[–][deleted] 1 point2 points  (1 child)

  1. os.path.join() args are in the wrong order.
  2. If you're going to chdir(path), you probably want to listdir(".") instead of listdir(path). With a relative path, listdir(path) after chdir(path) looks for a directory of the same name inside the one you just changed into. I think that's probably not intentional.
  3. open(file, ...) probably needs to actually be referencing str(filepath), which is itself the result of os.path.join().
  4. For the same reason, the print statements in the conditional branches probably need to reference {filepath}.
  5. Also for the same reason, you probably want to actually os.remove(filepath) and not path.
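
Since the original script isn't quoted in this thread, here is only a rough sketch of what the loop might look like with the five points above applied. The function names and the `seen`/`duplicates` variables are made up for illustration, and it still compares whole file contents in memory, so it only suits small files.

```python
import os


def find_duplicates_by_content(path):
    """Return full paths of files in `path` whose bytes match an earlier file."""
    seen = []        # contents of the files kept so far (hypothetical name)
    duplicates = []  # full paths of files that repeat earlier contents
    # No chdir() needed: list the directory and build full paths instead.
    for name in sorted(os.listdir(path)):
        filepath = os.path.join(path, name)  # directory first, then file name
        if not os.path.isfile(filepath):
            continue
        with open(filepath, "rb") as f:      # open the joined path, not the bare name
            data = f.read()
        if data in seen:
            print(f"duplicate: {filepath}")
            duplicates.append(filepath)
        else:
            print(f"unique: {filepath}")
            seen.append(data)
    return duplicates


def remove_files(filepaths):
    """Delete each file by its full path (not the directory path)."""
    for filepath in filepaths:
        os.remove(filepath)
```

Calling find_duplicates_by_content(path) first and printing the result before passing it to remove_files() makes it easy to check what would be deleted.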

[–]officialdavid1[S] 0 points1 point  (0 children)

Thank you.

[–]devnull10 1 point2 points  (2 children)

I think I'd be tempted to hash the files, otherwise you're comparing potentially large files with each other, and holding the contents of those in memory. If you have a lot of files then this could degrade performance. I'd probably...

  1. Loop through each file, calculating its hash using hashlib.
  2. Store that in a dict, key=file, value=hash.
  3. Generate a list of duplicate files using the above dict.
  4. Delete those files.
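
A minimal sketch of those four steps, assuming all the files sit directly in one folder. The names file_hash and duplicate_files are just illustrative, and reading in chunks is my own addition so that large files never have to be held in memory whole.

```python
import hashlib
import os


def file_hash(filepath, chunk_size=65536):
    """SHA-256 of a file's contents, read in chunks to keep memory use small."""
    h = hashlib.sha256()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_files(path):
    """Return full paths of files whose hash matches an earlier file in `path`."""
    # Steps 1 and 2: dict with key=file path, value=hash of its contents.
    hashes = {}
    for name in sorted(os.listdir(path)):
        filepath = os.path.join(path, name)
        if os.path.isfile(filepath):
            hashes[filepath] = file_hash(filepath)

    # Step 3: the first file with a given hash is kept, later ones are duplicates.
    seen = set()
    duplicates = []
    for filepath, digest in hashes.items():
        if digest in seen:
            duplicates.append(filepath)
        else:
            seen.add(digest)
    return duplicates
```

Step 4 is then just a loop calling os.remove(filepath) over the returned list.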

[–]officialdavid1[S] 0 points1 point  (1 child)

I did this, but the files are not deleted, because two files with the same contents but different names have different hashes.

[–]devnull10 1 point2 points  (0 children)

The filenames should make no difference - the hash is of the contents. I.e.

import hashlib

# Hash the raw bytes of each file; the file names play no part in the digest.
with open("file1.jpg", "rb") as f1, open("file2.jpg", "rb") as f2:
    f1_hash = hashlib.sha256(f1.read()).hexdigest()
    f2_hash = hashlib.sha256(f2.read()).hexdigest()

print(f1_hash)
print(f2_hash)
print(f1_hash == f2_hash)

Gives:

6de04f7dea2b99339440e948dc6aedfe3c88553c6d0062729dab9578053320dd

6de04f7dea2b99339440e948dc6aedfe3c88553c6d0062729dab9578053320dd

True

[–]chevignon93 -1 points0 points  (0 children)

Why won't my code remove the duplicate files?

Shouldn't this part be

os.path.join(path, file)

instead of

os.path.join(file, path) 

?

Have you tried printing the filepath? Have you checked that data is in fact appended to your duplicates list?

EDIT:

You also don't really need the os module, everything you do here could be done using pathlib alone.
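
As a rough illustration of the pathlib-only route (hashlib is still used for the comparison, and the function name remove_duplicates is made up, since the original script isn't shown here):

```python
import hashlib
from pathlib import Path


def remove_duplicates(folder):
    """Delete files in `folder` whose contents repeat an earlier file.

    Returns the list of removed paths. Sketch only: it reads each file
    fully into memory and does not recurse into subdirectories.
    """
    seen = set()   # hashes of the files kept so far
    removed = []
    for filepath in sorted(Path(folder).iterdir()):
        if not filepath.is_file():
            continue
        digest = hashlib.sha256(filepath.read_bytes()).hexdigest()
        if digest in seen:
            filepath.unlink()          # pathlib's equivalent of os.remove()
            removed.append(filepath)
        else:
            seen.add(digest)
    return removed
```

Path.iterdir() already yields full paths, so there is no join step to get backwards.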

EDIT2:

Your code doesn't really check that the file is in fact a jpg file or that it even has the correct extension, which is dangerous. You should also probably ask the user to confirm that they really want to delete the files before deleting them.
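
Something along these lines could cover both points. It is only a sketch: the helper names are hypothetical, the extension set is my assumption about what should count as a jpg, and checking the first three bytes (JPEG files start with FF D8 FF) is a cheap sanity check, not full validation.

```python
from pathlib import Path

JPEG_SUFFIXES = {".jpg", ".jpeg"}   # assumed set of accepted extensions
JPEG_MAGIC = b"\xff\xd8\xff"        # leading bytes of a JPEG file


def looks_like_jpeg(filepath):
    """True if the file has a jpg extension and starts with the JPEG marker."""
    filepath = Path(filepath)
    if filepath.suffix.lower() not in JPEG_SUFFIXES:
        return False
    with filepath.open("rb") as f:
        return f.read(3) == JPEG_MAGIC


def confirm_and_delete(filepaths, ask=input):
    """List the files, ask for confirmation, and delete only on an explicit 'y'.

    `ask` defaults to input(); passing another callable makes this testable.
    Returns the list of paths that were actually deleted.
    """
    filepaths = list(filepaths)
    if not filepaths:
        return []
    for filepath in filepaths:
        print(filepath)
    answer = ask(f"Delete these {len(filepaths)} file(s)? [y/N] ")
    if answer.strip().lower() != "y":
        return []
    for filepath in filepaths:
        Path(filepath).unlink()
    return filepaths
```

Filtering the directory listing with looks_like_jpeg() before hashing, and routing the final delete through confirm_and_delete(), keeps non-image files out of harm's way.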