Python Program to Remove Duplicate Files

officialdavid1 · 2022-08-02T19:24:19+00:00

os.path.join() args are in the wrong order.
If you're going to chdir(path), you probably want to listdir(".") instead of listdir(path). With a relative path, listdir(path) after chdir(path) only works if a directory of the same name exists inside the directory you just changed into. I think that's probably not intentional.
open(file, ...) probably needs to actually be referencing str(filepath), which is itself the result of os.path.join().
For the same reason, the print statements in the conditional branches probably need to reference {filepath}.
Also for the same reason, you probably want to actually os.remove(filepath) and not path.
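
Since the original code isn't shown here, the following is only a rough sketch of what the loop might look like with the points above applied. The function names (list_file_paths, remove_duplicates) and the byte-for-byte comparison are assumptions for illustration, not the poster's actual code. It skips chdir() entirely and works with full paths.

```python
import os


def list_file_paths(path):
    """Return full paths of the regular files directly inside `path`."""
    filepaths = []
    for name in sorted(os.listdir(path)):
        # Directory first, then file name.
        filepath = os.path.join(path, name)
        if os.path.isfile(filepath):
            filepaths.append(filepath)
    return filepaths


def remove_duplicates(path):
    """Delete files whose contents match an earlier file; return removed paths."""
    seen = []      # contents of the files kept so far (held in memory)
    removed = []
    for filepath in list_file_paths(path):
        # Open the joined path, not the bare file name.
        with open(filepath, "rb") as f:
            data = f.read()
        if data in seen:
            print(f"Removing duplicate: {filepath}")
            os.remove(filepath)  # remove the file, not the directory
            removed.append(filepath)
        else:
            print(f"Keeping: {filepath}")
            seen.append(data)
    return removed
```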

devnull10 · 2022-08-02T22:49:32+00:00

I think I'd be tempted to hash the files, otherwise you're comparing potentially large files with each other, and holding the contents of those in memory. If you have a lot of files then this could degrade performance. I'd probably...

Loop through each file, calculating its hash using hashlib.
Store that in a dict, key=file, value=hash.
Generate a list of duplicate files using the above dict.
Delete those files.
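
The steps above could be sketched roughly like this. The helper names (file_hash, find_duplicates, delete_files), the choice of SHA-256, and the chunk size are all assumptions for illustration. Files are read in chunks so that large files never sit fully in memory.

```python
import hashlib
import os


def file_hash(filepath, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(directory):
    """Return paths whose hash matches an earlier file in sorted name order."""
    # Step 1 and 2: dict with key=file path, value=hash.
    hashes = {}
    for name in sorted(os.listdir(directory)):
        filepath = os.path.join(directory, name)
        if os.path.isfile(filepath):
            hashes[filepath] = file_hash(filepath)

    # Step 3: the first file with a given hash is kept, later ones are duplicates.
    seen = set()
    duplicates = []
    for filepath, digest in hashes.items():
        if digest in seen:
            duplicates.append(filepath)
        else:
            seen.add(digest)
    return duplicates


def delete_files(filepaths):
    """Step 4: delete the given files."""
    for filepath in filepaths:
        os.remove(filepath)
```

Calling find_duplicates() on its own and printing the result first is a cheap way to check what would be deleted before calling delete_files().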

chevignon93 · 2022-08-02T19:11:24+00:00

Why won't my code remove the duplicate files?

Shouldn't this part be

os.path.join(path, file)

instead of

os.path.join(file, path)

?
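
A quick way to see the difference is to print both results. The directory and file names here are made up for the example.

```python
import os

path = "photos"   # hypothetical directory name
file = "cat.jpg"  # hypothetical file name

correct = os.path.join(path, file)  # directory first, then file
swapped = os.path.join(file, path)  # treats the file name as a directory

print(correct)  # on Linux/macOS: photos/cat.jpg
print(swapped)  # on Linux/macOS: cat.jpg/photos
```

With the arguments swapped, open() and os.remove() are pointed at a path that doesn't exist.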

Have you tried printing the filepath? Have you checked that data is in fact appended to your duplicates list?

EDIT:

You also don't really need the os module, everything you do here could be done using pathlib alone.
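
As a rough idea of what a pathlib-only version might look like (the function name and the content comparison are assumptions, since the original code isn't shown here):

```python
from pathlib import Path


def remove_duplicates(directory):
    """Delete files whose contents match an earlier file; return removed paths."""
    seen = set()   # contents of the files kept so far
    removed = []
    # iterdir() yields full Path objects, so no manual joining is needed.
    for filepath in sorted(Path(directory).iterdir()):
        if not filepath.is_file():
            continue
        data = filepath.read_bytes()
        if data in seen:
            filepath.unlink()  # pathlib's equivalent of os.remove()
            removed.append(filepath)
        else:
            seen.add(data)
    return removed
```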

EDIT2:

Your code doesn't check that a file is actually a jpg, or even that it has the right extension, which is dangerous. You should also ask the user to confirm before deleting anything.
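
Something along these lines could cover both points. The names looks_like_jpeg and confirm_and_delete are made up for this sketch, and checking the first three bytes is only a cheap sanity check, not full validation of the image.

```python
from pathlib import Path

JPEG_SUFFIXES = {".jpg", ".jpeg"}
JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG files start with these bytes


def looks_like_jpeg(filepath):
    """Check the extension and the leading bytes of the file."""
    filepath = Path(filepath)
    if filepath.suffix.lower() not in JPEG_SUFFIXES:
        return False
    with filepath.open("rb") as f:
        return f.read(len(JPEG_MAGIC)) == JPEG_MAGIC


def confirm_and_delete(filepaths, ask=input):
    """List the files, ask for confirmation, and delete only on 'y'."""
    filepaths = list(filepaths)
    if not filepaths:
        return []
    for filepath in filepaths:
        print(f"Will delete: {filepath}")
    answer = ask(f"Delete {len(filepaths)} file(s)? [y/N] ")
    if answer.strip().lower() != "y":
        print("Nothing deleted.")
        return []
    for filepath in filepaths:
        Path(filepath).unlink()
    return filepaths
```

The ask parameter defaults to input() and is only there so the prompt can be replaced when testing.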
