The common way to delete duplicates from a file is to sort its records. Duplicates can then be deleted either on the fly during sorting or in a second pass. Here, we present a new method based on hashing. Multiple passes are made over the file, and detected duplicates are moved in place to the tail end of the file. The algorithm requires only linear time on average and works with O(1) extra space.
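For orientation, the conventional sort-based approach mentioned above might be sketched as follows. This is only the baseline method, not the hashing algorithm of the paper. The function name `dedup_sorted_in_place` and the use of a Python list to stand in for the file are assumptions made for illustration. Like the hashing method, the sketch leaves the duplicates at the tail end, but it needs O(n log n) time, and Python's built-in sort is not an O(1)-space sort.

```python
def dedup_sorted_in_place(records):
    """Sort-based baseline (hypothetical helper, not the paper's algorithm).

    Sorts the list, then makes a second pass that swaps each first
    occurrence to the front. Returns the number of distinct records;
    records[:count] holds them in sorted order and records[count:]
    holds the duplicates.
    """
    records.sort()  # O(n log n); list.sort may use O(n) temporary space
    if not records:
        return 0
    write = 1  # records[:write] are the distinct records found so far
    for read in range(1, len(records)):
        # The unread suffix is still sorted, so a record is new exactly
        # when it differs from the last distinct record.
        if records[read] != records[write - 1]:
            records[write], records[read] = records[read], records[write]
            write += 1
    return write


data = [3, 1, 3, 2, 1, 1]
count = dedup_sorted_in_place(data)
unique_part = data[:count]
duplicate_tail = data[count:]
```

After the call, `unique_part` holds one copy of each record and `duplicate_tail` holds the remaining copies, so the file could simply be truncated at position `count`.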
W. Hesselink and J. Jongejan:
Duplicate Deletion Derived, Comm. ACM, Vol. 35, No. 7 (July 1992) 99-107
including a response by us.
Prof. Rechenberg of the University of Linz, Austria, found both our original paper and the comment by Hesselink & Jongejan hard to understand, so he wrote a very well-designed paper:
P. Rechenberg, H. Dobler: "Duplicate Deletion, A Didactical Explanation
of an Intricate Algorithm by Teuhola and Wegner",
Technical Report TR 92/9, Universitaet Linz, Institut fuer Informatik,
Altenbergerstr. 69, A-4040 Linz/Auhof, Austria.
His email address is Rechbg@soft.uni-linz.ac.at