mp3dup

mp3dup looks for duplicate files in directories that it scans recursively.
It was written to check for duplicate mp3 files, but it can find duplicates among files of any type.
mp3dup is designed to scale well and to work with very large numbers of files.

The algorithm is as follows.
Each file found is inserted into a hash table keyed by file size. Files that share a size are then MD5-summed, and those MD5 sums are inserted into a second hash table.
Files with the same size and the same MD5 sum are compared byte by byte.
Non-regular files are skipped.
Empty files are reported as such.
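
The steps above can be sketched in Python. This is only an illustration of the size, MD5 and byte-comparison stages, not mp3dup's own code; the names md5_of and find_duplicates are hypothetical, and returning empty files as a separate list is an assumption about how "reported as such" might look.

```python
import filecmp
import hashlib
import os
import stat
from collections import defaultdict


def md5_of(path, chunk_size=1 << 16):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return (duplicate_groups, empty_files) for the tree under root."""
    by_size = defaultdict(list)
    empty_files = []
    # Stage 1: hash table keyed by file size.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            if not stat.S_ISREG(info.st_mode):
                continue  # skip symlinks, FIFOs, sockets, devices
            if info.st_size == 0:
                empty_files.append(path)  # reported separately (assumption)
                continue
            by_size[info.st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        # Stage 2: second hash table keyed by MD5 sum.
        by_md5 = defaultdict(list)
        for path in same_size:
            by_md5[md5_of(path)].append(path)
        # Stage 3: byte-by-byte comparison to rule out MD5 collisions.
        for remaining in by_md5.values():
            while len(remaining) > 1:
                first, rest = remaining[0], remaining[1:]
                same = [first] + [
                    p for p in rest if filecmp.cmp(first, p, shallow=False)
                ]
                if len(same) > 1:
                    groups.append(sorted(same))
                remaining = [p for p in rest if p not in same]
    return groups, empty_files
```

Because only files that share a size are ever read, most files in a large collection are never opened at all, which is what lets this approach cope with many files.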

Partial matches

mp3dup can also find files that are identical in their first or last N bytes.
This is accomplished by hashing the head or tail part of each file, and byte-comparing the collisions.
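
A rough Python sketch of this idea follows. The names read_part and partial_matches are hypothetical, the use of MD5 for the head or tail hash is an assumption (the text above does not say which hash is used), and files shorter than N bytes are simply skipped here, which is also an assumption.

```python
import hashlib
import os
from collections import defaultdict


def read_part(path, nbytes, from_end=False):
    """Return the first (or last) nbytes of a file; fewer if it is shorter."""
    with open(path, "rb") as handle:
        if from_end:
            size = os.fstat(handle.fileno()).st_size
            handle.seek(max(0, size - nbytes))
        return handle.read(nbytes)


def partial_matches(paths, nbytes, from_end=False):
    """Group files whose first (or last) nbytes are identical."""
    by_hash = defaultdict(list)
    for path in paths:
        part = read_part(path, nbytes, from_end)
        if len(part) < nbytes:
            continue  # file too short to take part (assumption)
        by_hash[hashlib.md5(part).digest()].append(path)

    groups = []
    for candidates in by_hash.values():
        if len(candidates) < 2:
            continue
        # Compare the actual bytes so a hash collision cannot
        # produce a false match.
        by_bytes = defaultdict(list)
        for path in candidates:
            by_bytes[read_part(path, nbytes, from_end)].append(path)
        groups.extend(sorted(g) for g in by_bytes.values() if len(g) > 1)
    return groups
```

Calling partial_matches(paths, 4096) would group files by their first 4096 bytes, and passing from_end=True would group them by their last 4096 bytes.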

ID3 matches

Files with identical ID3 tags can also be found by comparing the last 128 bytes of each file, which is where an ID3v1 tag is stored.
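
As an illustration, the comparison could look like the Python sketch below. The names read_id3v1 and id3_matches are hypothetical. Requiring the block to begin with the ID3v1 marker "TAG" is an extra check added in this sketch; the description above only speaks of identical last 128 bytes.

```python
import os
from collections import defaultdict

ID3V1_SIZE = 128  # an ID3v1 tag occupies the final 128 bytes of a file


def read_id3v1(path):
    """Return the trailing 128-byte ID3v1 block, or None if there is none."""
    with open(path, "rb") as handle:
        if os.fstat(handle.fileno()).st_size < ID3V1_SIZE:
            return None
        handle.seek(-ID3V1_SIZE, os.SEEK_END)
        block = handle.read(ID3V1_SIZE)
    # Extra check (assumption): a real ID3v1 tag starts with b"TAG".
    return block if block.startswith(b"TAG") else None


def id3_matches(paths):
    """Group files whose trailing ID3v1 blocks are byte-for-byte identical."""
    by_tag = defaultdict(list)
    for path in paths:
        tag = read_id3v1(path)
        if tag is not None:
            by_tag[tag].append(path)
    return [sorted(group) for group in by_tag.values() if len(group) > 1]
```

Two differently encoded copies of the same song would be grouped together by this check even though their audio data differs, as long as their tags are identical.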

Download

Latest version 0.3, released 020203. mp3dup-0.3.tar.gz
Version 0.2, released 020124. mp3dup-0.2.tar.gz
Initial version 0.1, released 020121. mp3dup-0.1.tar.gz

mp3dup was written by Alexander Haväng, eel@musiknet.se, 2002.