mp3dup looks for duplicate files in directories that it scans recursively.
It was written to find duplicate mp3 files, but it can be used
to find duplicates among files of any type.
mp3dup is designed to scale well to very large numbers
of files.
The algorithm is as follows.
Every file found is inserted into a hash table keyed on the size of the file.
Files that share a size are md5summed, and those md5sums are inserted
into a second hash table.
Files that share both size and md5sum are compared byte by byte.
Non-regular files are skipped.
Empty files are reported as empty.
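
The steps above can be sketched roughly as follows. This is an
illustrative Python sketch, not mp3dup's own code; the names md5_of and
find_duplicates are hypothetical, and the choice to skip symbolic links
along with other non-regular files is an assumption.

```python
import filecmp
import hashlib
import os
import stat
from collections import defaultdict


def md5_of(path, chunk_size=1 << 16):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return (empty_files, duplicate_groups) for the tree under root."""
    by_size = defaultdict(list)
    empty_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            if not stat.S_ISREG(info.st_mode):
                continue  # assumption: symlinks, FIFOs, devices are skipped
            if info.st_size == 0:
                empty_files.append(path)
            else:
                by_size[info.st_size].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[md5_of(path)].append(path)
        for remaining in by_digest.values():
            # Final byte-for-byte check guards against MD5 collisions.
            while len(remaining) > 1:
                first, rest = remaining[0], remaining[1:]
                same = [first] + [
                    p for p in rest if filecmp.cmp(first, p, shallow=False)
                ]
                if len(same) > 1:
                    groups.append(sorted(same))
                remaining = [p for p in rest if p not in same]
    return sorted(empty_files), sorted(groups)
```

Keying on size first means most files are never read at all: only files
that share a size are hashed, and only files that share a hash are
compared in full.
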
Partial matches
mp3dup can also find files whose first or last N bytes are identical.
This is done by hashing the head or tail of each file and
byte-comparing the files whose hashes collide.
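
A minimal Python sketch of this idea is shown below. It is not taken from
mp3dup; read_part and partial_matches are hypothetical names, MD5 is
assumed as the hash, and ignoring files shorter than N bytes is an
assumption.

```python
import hashlib
import os
from collections import defaultdict


def read_part(path, nbytes, from_end=False):
    """Return the first (or last) nbytes of a file, fewer if it is shorter."""
    with open(path, "rb") as handle:
        if from_end:
            size = os.path.getsize(path)
            handle.seek(max(0, size - nbytes))
        return handle.read(nbytes)


def partial_matches(paths, nbytes, from_end=False):
    """Group files whose first (or last) nbytes are identical."""
    by_digest = defaultdict(list)
    for path in paths:
        part = read_part(path, nbytes, from_end)
        if len(part) < nbytes:
            continue  # assumption: files shorter than nbytes are ignored
        by_digest[hashlib.md5(part).hexdigest()].append(path)

    groups = []
    for candidates in by_digest.values():
        if len(candidates) < 2:
            continue
        # Byte-compare the files that share a digest, so that a hash
        # collision cannot produce a false match.
        by_bytes = defaultdict(list)
        for path in candidates:
            by_bytes[read_part(path, nbytes, from_end)].append(path)
        groups.extend(sorted(g) for g in by_bytes.values() if len(g) > 1)
    return sorted(groups)
```

Calling partial_matches(paths, 4096) would group files by their first
4096 bytes, and passing from_end=True would group them by their last
4096 bytes.
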
ID3-matches
Files with identical ID3 tags (identical last 128 bytes) can also be found.
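
As an illustration, the comparison of the last 128 bytes could look like
the Python sketch below. It is not mp3dup's own code; id3v1_tag and
id3_matches are hypothetical names. An ID3v1 tag begins with the marker
"TAG", and requiring that marker is an assumption here, since the text
above only asks for identical last 128 bytes.

```python
import os
from collections import defaultdict

ID3V1_SIZE = 128  # the trailing block compared for ID3 matches


def id3v1_tag(path):
    """Return the trailing 128-byte block, or None if it is not a tag."""
    if os.path.getsize(path) < ID3V1_SIZE:
        return None
    with open(path, "rb") as handle:
        handle.seek(-ID3V1_SIZE, os.SEEK_END)
        tag = handle.read(ID3V1_SIZE)
    # Assumption: only blocks starting with the "TAG" marker are counted.
    return tag if tag.startswith(b"TAG") else None


def id3_matches(paths):
    """Group files whose trailing ID3v1 blocks are byte-for-byte identical."""
    by_tag = defaultdict(list)
    for path in paths:
        tag = id3v1_tag(path)
        if tag is not None:
            by_tag[tag].append(path)
    return sorted(sorted(group) for group in by_tag.values() if len(group) > 1)
```

Because the whole 128-byte block is used as the dictionary key, no
separate byte comparison is needed in this sketch.
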
mp3dup was written by Alexander Haväng, eel@musiknet.se, 2002