Very recently, I wrote a blog post suggesting that people use audio fingerprinting, rather than metadata, to find and identify duplicate songs in their libraries. I also provided a half-assed script that attempted to make this easy.
That script simply did not work as expected, and writing something that reliably scans for duplicates turned out to be slightly more complicated.
Introducing fsduplicates
Today I’m introducing a better tool that I have built since I wrote that blog post. The tool is called fsduplicates, and it is a command line tool that interacts directly with the AcoustID database. fsduplicates is incomplete, but it is pretty usable for now. It is written in Swift 3 and it is fully open source.
fsduplicates has two main functions: scanning your songs and fetching the AcoustID of each one from the AcoustID database (called the indexing process), and duplicate identification. Using the tool is really easy.
Indexing is done using the -f flag. Note that you can pass the -v flag to any command to enable verbose output.
fsduplicates -f DIR_TO_SEARCH DIR_TO_OUTPUT
An example:
fsduplicates -f /Volumes/iTunes/Music/Nightwish ~/Documents/fsduplicates_nightwish
This will recursively index the contents of the folder you passed as DIR_TO_SEARCH. Do note that this process can take a long time, not only because of the fingerprinting itself but also because the tool has to play well with AcoustID’s rules. All the results will be dropped in DIR_TO_OUTPUT (also called a Library). fsduplicates will not move or delete any files. The Library directory will contain three plain text files with info about the songs:
library contains a list of all the songs it indexed. This is a simple list of file paths to songs. fsduplicates uses it to avoid reindexing songs and save you time. The advantage of having this file is that, if you are indexing a large directory and you want to stop, the indexing process can be restarted later without losing progress.
fps_library contains a list of AcoustIDs and the files that match them. If you wanted to analyse your duplicates manually, you would use these pairs. Each line stores the data as acoustid:filepath to make it easier to parse using standard Bash tools.
no_fps_library contains a list of file paths that did not have a matching fingerprint in AcoustID’s database. Consider contributing the fingerprints of these songs to their service to help them improve.
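If you want to inspect fps_library by hand, the acoustid:filepath format can be grouped with awk. The following is only a rough sketch and not part of fsduplicates: the function name list_duplicates is made up for illustration, and it assumes that every line has the acoustid:filepath shape and that the AcoustID itself never contains a colon (file paths may).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of fsduplicates.
# Prints every AcoustID that maps to more than one file, followed by its files.
# Assumes each line of the input is "acoustid:filepath".
list_duplicates() {
  local fps_file="$1"
  awk '
    {
      sep = index($0, ":")            # split on the first colon only
      if (sep == 0) next              # skip lines without a colon
      id = substr($0, 1, sep - 1)
      path = substr($0, sep + 1)
      count[id]++
      paths[id] = paths[id] "  " path "\n"
    }
    END {
      # Note: awk does not guarantee the order of the groups.
      for (id in count)
        if (count[id] > 1)
          printf "%s (%d files)\n%s", id, count[id], paths[id]
    }
  ' "$fps_file"
}
```

With the example Library from above, you could call it as list_duplicates ~/Documents/fsduplicates_nightwish/fps_library.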
After the indexing process is done, you can show results using the -s flag.
fsduplicates -s LIBRARY
Example:
fsduplicates -s ~/Documents/fsduplicates_nightwish
This will group fingerprints with file paths that matched them to make it easier to see which songs are duplicated.
-----------------------------------
Showing Duplicates for 08fcc296-7d3f-483f-86ea-cfbe725d291d:
1. /Volumes/iTunes/Music/Nightwish/Bless The Child/02 The Wayfarer.m4a
2. /Volumes/iTunes/Music/Nightwish/Century Child/12 The Wayfarer.m4a
3. /Volumes/iTunes/Music/Nightwish/Ever Dream/03 The Wayfarer.m4a
4. /Volumes/iTunes/Music/Nightwish/Highest Hopes/2-01 The Wayfarer.m4a
5. /Volumes/iTunes/Music/Nightwish/Tales From The Elvenpath/15 Wayfarer.m4a
6. /Volumes/iTunes/Music/Nightwish/Wishsides/2-03 The Wayfarer.m4a
-----------------------------------
-----------------------------------
Showing Duplicates for e9ffe05f-ad4a-4906-afca-26cbbf628787:
1. /Volumes/iTunes/Music/Nightwish/Bless The Child/09 Lagoon.m4a
2. /Volumes/iTunes/Music/Nightwish/Century Child/11 Lagoon.m4a
3. /Volumes/iTunes/Music/Nightwish/Highest Hopes/2-08 Lagoon.m4a
4. /Volumes/iTunes/Music/Nightwish/Tales From The Elvenpath/14 Lagoon.m4a
5. /Volumes/iTunes/Music/Nightwish/Wishsides/2-07 Lagoon.m4a
-----------------------------------
If you pass in the -i flag, you will be able to choose an action for each group. Currently, the only options are to create symbolic links to each file in the group inside the Library directory, or to skip the group. In the future I will add more actions, like the ability to directly delete the duplicates or move them entirely to the Library path.
fsduplicates -s -i ~/Documents/fsduplicates_nightwish
Sample output:
-----------------------------------
Showing duplicates for 08fcc296-7d3f-483f-86ea-cfbe725d291d:
1. /Volumes/iTunes/Music/Nightwish/Bless The Child/02 The Wayfarer.m4a
2. /Volumes/iTunes/Music/Nightwish/Century Child/12 The Wayfarer.m4a
3. /Volumes/iTunes/Music/Nightwish/Ever Dream/03 The Wayfarer.m4a
4. /Volumes/iTunes/Music/Nightwish/Highest Hopes/2-01 The Wayfarer.m4a
5. /Volumes/iTunes/Music/Nightwish/Tales From The Elvenpath/15 Wayfarer.m4a
6. /Volumes/iTunes/Music/Nightwish/Wishsides/2-03 The Wayfarer.m4a
-----------------------------------
What do you want to do?:
(s)ymbolic link all to Library
(i)gnore
OPTION:
Creating symbolic links is useful, because you can then drop them into a music player to listen to them.
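If you would rather build such a folder of links yourself, for example from paths you picked out of fps_library, something along these lines should work. This is a sketch of the general idea and not how fsduplicates does it internally: the function name link_group and the numbered "N - name" naming scheme are my own assumptions, chosen so that files with the same name in different albums do not collide.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of fsduplicates.
# Usage: link_group DEST_DIR FILE [FILE...]
# Creates one numbered symbolic link in DEST_DIR for every FILE given.
link_group() {
  local dest_dir="$1"
  shift
  mkdir -p "$dest_dir" || return 1
  local n=1 path
  for path in "$@"; do
    # Prefix each link with a counter so identical file names do not clash.
    ln -s "$path" "$dest_dir/$n - $(basename "$path")" || return 1
    n=$((n + 1))
  done
}
```

Pass absolute paths for the files, since a symbolic link made from a relative path is resolved relative to the directory the link lives in. The original files are never touched, so removing the folder of links afterwards is safe.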
Downloading fsduplicates
Head over to the project page on GitHub and download it from there. The installation instructions and more complete usage instructions are in the README.md. The project is open source and I welcome contributions, even if they just clean up the hacky code I wrote.
Warnings and other notes
I consider this product to be mostly incomplete, but it’s complete enough for my needs. You agree to use this tool at your own risk and not to blame me if it creates a black hole in your computer.
Hi Andy, this project does exactly what I have been wanting to do for a long time. My mp3 library dates back to the days of Napster. I followed your instructions to install chromaprint and fsduplicates on my Mac (10.13.3). However, when I run “fsduplicates -f input_dir output_dir” I get “ERROR: Unknown option -hash”, which repeats for every mp3 file that it finds. The output library files are 0 bytes. Any idea what went wrong?
Here is an example of the output when I activate verbose mode: ERROR: Unknown option -hash
Error on file /Users/iMac/Documents/jerry/dupsmp3/Luis Fonsi – Despacito ft. Daddy Yankee.mp3: The file does not contain a valid fingerprint
The index has been created
Hmmm, it looks like the hash command may not be part of the underlying chromaprint anymore. I will try to see what’s wrong this weekend. If I decide to fix it I ask for your patience, as this project is old and I’d basically have to rewrite it.