Just very recently, I wrote a blog post suggesting that people use audio fingerprinting, rather than metadata, to search for and identify duplicate songs in their libraries. I also provided a half-assed script to attempt to do it easily.

That script simply did not work as expected, and writing something that would scan for duplicates turned out to be slightly more complicated.

Introducing fsduplicates

Today I’m introducing a better tool that I have built since I wrote that blog post. The tool is called fsduplicates, and it is a command line tool that interacts directly with the AcoustID database. fsduplicates is incomplete, but it is pretty usable for now. It is written in Swift 3 and it is fully open source.

fsduplicates has two main functions: scanning your songs and fetching the AcoustID of each one from the AcoustID database (called the indexing process), and duplicate identification. Using the tool is really easy.

Indexing is done using the -f flag. Note that you can add the -v flag to any command to enable verbose output.

fsduplicates -f DIR_TO_SEARCH DIR_TO_OUTPUT

An example:

fsduplicates -f /Volumes/iTunes/Music/Nightwish ~/Documents/fsduplicates_nightwish

This will recursively index the contents of the folder you passed as DIR_TO_SEARCH. Do note that this process can take a long time, not only because of the fingerprinting itself but also because the tool has to play well with AcoustID’s rules. All the results will be dropped in DIR_TO_OUTPUT (also called a Library). fsduplicates will not move or delete any files. The Library directory will contain three plain text files with info about the songs:

  • library contains a list of all the songs it indexed. This is a simple list of file paths to songs. fsduplicates uses it to avoid reindexing songs and save you time. The advantage of having this file is that, if you are indexing a large directory and you want to stop, the indexing process can be resumed later without losing progress.
  • fps_library contains a list of AcoustIDs and the files that match them. If you wanted to analyse your duplicates manually, you would use this file. Each line stores the data as acoustid:filepath to make it easier to parse using standard Bash tools.
  • no_fps_library contains a list of file paths that did not have a matching fingerprint in AcoustID’s database. Consider contributing the fingerprints of these songs to their service to help them improve.
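
To illustrate how the library file lets an interrupted run pick up where it left off, here is a minimal Python sketch of the idea. It is not the tool's actual Swift code. It assumes the file holds one path per line, and the function names load_indexed and pending_songs are hypothetical.

```python
from pathlib import Path


def load_indexed(library_file):
    """Return the set of paths already recorded in the `library` file.

    Assumption: the file stores one file path per line.
    """
    path = Path(library_file)
    if not path.exists():
        return set()  # first run: nothing has been indexed yet
    with path.open(encoding="utf-8") as handle:
        return {line.rstrip("\n") for line in handle if line.strip()}


def pending_songs(candidates, library_file):
    """Keep only the candidate paths that were not indexed before, in order."""
    indexed = load_indexed(library_file)
    return [song for song in candidates if song not in indexed]
```

Calling pending_songs with the full list of songs found under DIR_TO_SEARCH would then return only the ones that still need fingerprinting.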

After the indexing process is done, you can show results using the -s flag.

fsduplicates -s LIBRARY

Example:

fsduplicates -s ~/Documents/fsduplicates_nightwish

This will group fingerprints with file paths that matched them to make it easier to see which songs are duplicated.

-----------------------------------
Showing Duplicates for 08fcc296-7d3f-483f-86ea-cfbe725d291d:
1. /Volumes/iTunes/Music/Nightwish/Bless The Child/02 The Wayfarer.m4a
2. /Volumes/iTunes/Music/Nightwish/Century Child/12 The Wayfarer.m4a
3. /Volumes/iTunes/Music/Nightwish/Ever Dream/03 The Wayfarer.m4a
4. /Volumes/iTunes/Music/Nightwish/Highest Hopes/2-01 The Wayfarer.m4a
5. /Volumes/iTunes/Music/Nightwish/Tales From The Elvenpath/15 Wayfarer.m4a
6. /Volumes/iTunes/Music/Nightwish/Wishsides/2-03 The Wayfarer.m4a
-----------------------------------

-----------------------------------
Showing Duplicates for e9ffe05f-ad4a-4906-afca-26cbbf628787:
1. /Volumes/iTunes/Music/Nightwish/Bless The Child/09 Lagoon.m4a
2. /Volumes/iTunes/Music/Nightwish/Century Child/11 Lagoon.m4a
3. /Volumes/iTunes/Music/Nightwish/Highest Hopes/2-08 Lagoon.m4a
4. /Volumes/iTunes/Music/Nightwish/Tales From The Elvenpath/14 Lagoon.m4a
5. /Volumes/iTunes/Music/Nightwish/Wishsides/2-07 Lagoon.m4a
-----------------------------------
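
If you would rather build these groups yourself from fps_library, the grouping can be sketched in a few lines of Python. This is only an illustration of the acoustid:filepath format described earlier, not how fsduplicates does it internally, and group_by_acoustid is a hypothetical name. The one detail worth noting is that the line is split on the first colon only, since a file path may contain colons while an AcoustID does not.

```python
from collections import defaultdict


def group_by_acoustid(lines):
    """Map each AcoustID to its file paths, keeping only real duplicates."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Split on the FIRST colon only: paths may contain more colons.
        acoustid, separator, filepath = line.partition(":")
        if not separator or not filepath:
            continue  # blank or malformed line, skip it
        groups[acoustid].append(filepath)
    # A group is a duplicate only if two or more files share the AcoustID.
    return {aid: sorted(paths) for aid, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    sample = [
        "e9ffe05f-ad4a-4906-afca-26cbbf628787:/Volumes/iTunes/Music/Nightwish/Century Child/11 Lagoon.m4a",
        "e9ffe05f-ad4a-4906-afca-26cbbf628787:/Volumes/iTunes/Music/Nightwish/Bless The Child/09 Lagoon.m4a",
    ]
    for acoustid, paths in group_by_acoustid(sample).items():
        print("Showing duplicates for " + acoustid + ":")
        for number, path in enumerate(paths, start=1):
            print(str(number) + ". " + path)
```

To run it against a real Library, pass an open fps_library file instead of the sample list.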

If you pass in the -i flag, you will be able to choose an action for each group. Currently, the only options are to create symbolic links to every file in the group inside the Library directory, or to skip the group. In the future I will add more actions, like the ability to directly delete the duplicates or move them entirely to the Library path.

fsduplicates -s -i ~/Documents/fsduplicates_nightwish

Sample output:

-----------------------------------
Showing duplicates for 08fcc296-7d3f-483f-86ea-cfbe725d291d:
1. /Volumes/iTunes/Music/Nightwish/Bless The Child/02 The Wayfarer.m4a
2. /Volumes/iTunes/Music/Nightwish/Century Child/12 The Wayfarer.m4a
3. /Volumes/iTunes/Music/Nightwish/Ever Dream/03 The Wayfarer.m4a
4. /Volumes/iTunes/Music/Nightwish/Highest Hopes/2-01 The Wayfarer.m4a
5. /Volumes/iTunes/Music/Nightwish/Tales From The Elvenpath/15 Wayfarer.m4a
6. /Volumes/iTunes/Music/Nightwish/Wishsides/2-03 The Wayfarer.m4a
-----------------------------------
What do you want to do?:
(s)ymbolic link all to Library       (i)gnore

OPTION: 

Creating symbolic links is useful, because you can then drop them into a music player to listen to them.
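
For the curious, linking a group by hand could look roughly like the Python sketch below. The layout is my own assumption for illustration: one subfolder per AcoustID and a numeric prefix on each link so that identical file names do not collide. fsduplicates may name and place its links differently, and link_group is a hypothetical name.

```python
import os
from pathlib import Path


def link_group(paths, library_dir, acoustid):
    """Create one symbolic link per duplicate inside library_dir/acoustid.

    Assumed layout (not necessarily what fsduplicates does): links are
    prefixed with their position in the group to avoid name collisions.
    """
    target_dir = Path(library_dir) / acoustid
    target_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for number, source in enumerate(paths, start=1):
        link = target_dir / f"{number} {Path(source).name}"
        # Leave existing links alone so the function can be re-run safely.
        if not (link.is_symlink() or link.exists()):
            os.symlink(source, link)
        links.append(link)
    return links
```

The original files are never touched, which matches the promise that nothing gets moved or deleted.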

Downloading fsduplicates

Head over to the project page on GitHub and download it from there. The installation and more complete usage instructions are in the README.md. The project is open source and I welcome contributions, even if they are just cleaning up the hacky code I wrote.

Warnings and other notes

I consider this product to be mostly incomplete, but it’s complete enough for my needs. You agree to use this tool at your own risk and not blame me if it creates a black hole in your computer.
