DupeFinder is a simple application for locating, moving, renaming and deleting duplicate files in a directory structure. It's perfect both for users who haven't kept their hard drives very well organized and need to do some cleaning to free space and for users who like to keep lots of backup copies of important data "just in case" something bad should happen.

An application designed for only occasional maintenance such as this needs to be easy to learn so that users don't spend more time figuring it out than actually getting work done with it. DupeFinder sports a very clean and simple interface that stays out of the way and lets you concentrate on what's important: your data.

News

2008.05.11
DupeFinder 1.1.0 is available. This version removes the need for an external md5sum command line utility. This improves performance calculating MD5 digests for small files and eliminates a cumbersome dependency for Windows users.
2006.06.13
DupeFinder 1.0.2 is available. This is a bugfix version to fix problems with new versions of Qt/PyQt and to avoid problems with symbolic links.
2005.06.23
DupeFinder 1.0.1 is available. This is a bugfix version to fix a problem where the interface code did not work correctly with newer versions of PyQt (3.14 or newer). Only people with problems running the 1.0 version need to download this update.
2004.05.12
DupeFinder 1.0 released.

Features

Although DupeFinder is quite a small application, it should have all of the features you will need to remove and reorganize large directories full of duplicate files:

While everything works pretty well in most cases, there are a few issues with DupeFinder to be aware of. I hope to fix most of the following bugs sometime soon:

* These bugs were previously reported but no longer appear to occur; they likely affect only older versions of Python, Qt, or PyQt.

Requirements

DupeFinder is built on two primary tools: the Python language and the Qt application toolkit. A Python interpreter and the Qt libraries are included in most desktop Linux, BSD and UNIX distributions. Mac OS X (at least the newer versions) includes Python, and Qt is also available for free, though it is not part of a standard install.

Qt is primarily a C++ toolkit, so the PyQt bindings for Python are also required. These are not a standard part of most Linux and similar distributions, though they are available for all of the systems mentioned.

Versions prior to 1.1.0 require the md5sum utility. This utility is standard on Linux and similar systems, though I've read that on Mac OS X it goes by the name md5 instead. I have not confirmed this, but if so, simply change the single occurrence of md5sum in FindDupFiles.py to md5 to run the app on a Mac.
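
To illustrate what dropping the external utility involves, here is a minimal sketch of computing a file's MD5 digest in-process with Python's standard hashlib module. This is not DupeFinder's actual code: the function name md5_digest and the chunk size are assumptions made for the example, and it is written for a current Python 3 interpreter.

```python
import hashlib


def md5_digest(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`.

    Hypothetical helper for illustration; the file is read in
    fixed-size chunks so large files never have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Hashing inside the interpreter avoids starting a separate process for every file, which is presumably where the improvement for small files comes from.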

Running DupeFinder on Windows should be possible but probably isn't worth the effort, unless most of the components are already in place for other applications. Qt and PyQt for Windows are only available with a commercial license (this will change when Qt 4 is released). Python is a separate install. Alternatively it is probably possible to satisfy all of the dependencies through X11 on Cygwin.

One more thing: although DupeFinder is intended to be run graphically and interactively, the FindDupFiles.py script can be run standalone from the console. It takes a root search directory followed by any number of file extension filters as command line arguments and outputs the identified duplicate file groups (in no particular order) to STDOUT. This output can be piped to a pager such as less for immediate inspection or redirected straight to a text file using the ">" shell operator (on UNIX-like systems) for logging/reporting.
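
The console behaviour described above can be sketched roughly as follows. This is not the real FindDupFiles.py: the function names, the decision to group files by size before hashing, the skipping of symbolic links, and the treatment of filters as case-insensitive filename suffixes are all assumptions made for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=65536):
    """Hex MD5 digest of a file, read in chunks (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root, extensions=()):
    """Return lists of paths under `root` whose contents share an MD5 digest.

    `extensions` is an optional sequence of filename suffixes such as
    ".txt"; when it is empty, every regular file is considered.
    """
    suffixes = tuple(ext.lower() for ext in extensions)
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if suffixes and not name.lower().endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            # Assumption: symbolic links and special files are ignored.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_md5(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups


def main(argv):
    """Print duplicate groups for argv = [root, ext, ...] to STDOUT."""
    root, extensions = argv[0], argv[1:]
    for group in find_duplicates(root, extensions):
        print("\n".join(group))
        print()  # blank line separates one group from the next
```

Calling main(sys.argv[1:]) at the bottom of such a script would turn it into a command line tool whose output can be paged or redirected in the same way. Comparing sizes first means only files that could possibly match are ever hashed.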

Screenshots

Here are a couple of images showing DupeFinder in action. There's not much more to it than what you see here; the actions to choose directories and move or rename files use standard Qt dialogs.

DupeFinder 1.0 Start Dialog

DupeFinder 1.0 Results Dialog

License

DupeFinder is Free Software, and is licensed under the GPL (GNU General Public License) version 2.0.

Downloads

DupeFinder is currently available only as Python source. Standalone binaries may be made available in the future.

DupeFinder 1.1.0 Source

Everything needed to run the app, assuming Python, Qt and PyQt are installed ;-) (The md5sum utility is no longer required as of version 1.1.0.)

This package contains four source files in a self-contained directory. Simply extract the data from the archive to any desired location, then run the application by executing

python DupeFinder.py
inside the directory from your system's command line.

Also included are the Qt Designer *.ui files for the two dialog classes. Neither is necessary for running the program (two of the *.py files are these interfaces compiled directly into Python code), but any developers who want to modify DupeFinder will need them, and they're small enough to not warrant separate downloads.

Older versions available here.

DupeFinder Test Data

A small directory structure containing files which can be used to test the capabilities of DupeFinder in a risk-free manner. File names identify the file size and content. Identical files all have the same main name but may differ in extension: for example, file 2a is identical to 2a.ext, and is the same size as file 2b but contains different data.


Contact me at arkaein@monsterden.net with any questions, suggestions, bug reports or patches for DupeFinder.


Last Updated 2008.05.11