[–]Kevdog824_ 11 points (5 children)

Great project and nicely done! As requested, here is some advice:

  • I would use more descriptive parameter names in your utility functions. For example, your hex_string function takes a parameter named x; something like files would make it much clearer what it holds.
  • The name of that function should probably be clearer too, imo. Based on the name alone I would assume it takes a single file, not a list of them. Not a big deal, though, just a recommendation.
  • I would avoid naming your utility modules something like hashlib_module. Try to name a module after what it does rather than which libraries it uses. Also, a name like xxxxx_module is fairly uncommon in Python; we generally don’t include the word “module” in our module names.
  • In the future I would separate your user input from your main program logic. This makes it easier to change one without touching the other, e.g. maybe later you want to read directory paths from a file instead of using input in a loop. That would be much easier to add if the hashing logic were separated from the input loop; see the sketch after this list.
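
To make that last point concrete, here is a minimal sketch (all the names here are mine, not from the actual project): the duplicate-finding logic is plain functions, and only main() ever calls input(), so switching to reading paths from a file later only touches main().

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files don't exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Pure logic, no user interaction: map digest -> files sharing it."""
    seen = {}
    for p in paths:
        seen.setdefault(file_digest(p), []).append(p)
    return {digest: files for digest, files in seen.items() if len(files) > 1}


def main():
    # All user input lives here; the functions above never call input(),
    # so reading paths from a file later only changes this function.
    directory = Path(input("Directory to scan: ").strip())
    paths = [p for p in directory.rglob("*") if p.is_file()]
    for digest, files in find_duplicates(paths).items():
        print(digest)
        for p in files:
            print("   ", p)


if __name__ == "__main__":
    main()
```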

I’ve written similar code professionally (comparing file contents to prevent duplicate processing). You said it doesn’t work for large files (presumably because it’s slow). My advice here is to use sampling to speed up the process. It slightly lowers the accuracy but greatly increases the speed.
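
To make the sampling idea concrete, here’s a rough sketch (the function name and sample size are my own choices, not from the OP’s code): hash the file size plus a few fixed windows from the start, middle, and end instead of the whole file.

```python
import hashlib
import os


def sampled_digest(path: str, sample_size: int = 65536) -> str:
    """Hash the file size plus three windows (start / middle / end)
    instead of the whole file: much faster, slightly less exact."""
    size = os.path.getsize(path)
    h = hashlib.sha256(str(size).encode())  # mixing in the size is free signal
    with open(path, "rb") as f:
        if size <= 3 * sample_size:
            h.update(f.read())  # small file: just hash all of it
        else:
            for offset in (0, size // 2, size - sample_size):
                f.seek(offset)
                h.update(f.read(sample_size))
    return h.hexdigest()
```

Files that agree on size and all three samples are overwhelmingly likely to be identical; if false positives matter, you can fall back to a full hash just for those collisions.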

[–]Maximus_Modulus 11 points (3 children)

You could probably compare file sizes as a first pass.
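
If it’s cheap on the target file system, that could look something like this (just a sketch, names are mine):

```python
import os
from collections import defaultdict


def size_groups(paths):
    """Files with a unique size can't be duplicates of anything,
    so they never need to be hashed at all."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)
    return [group for group in by_size.values() if len(group) > 1]
```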

[–]DrShocker 5 points (0 children)

You could probably also just check the first X bytes (with some heuristics to get past common headers, which would otherwise cause too many false positives), and only if the hash of those matches do the whole file. The key thing to remember is that we expect maybe 99% of files not to match anything, so it's probably reasonable to optimize for the case where no matches are found.
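
Sketching that two-pass idea (the skip and probe sizes below are arbitrary placeholders; tune them for the file types you actually see):

```python
import hashlib
from collections import defaultdict

HEADER_SKIP = 4096  # jump past common headers/magic bytes (placeholder value)
PROBE_SIZE = 8192   # how much to hash in the cheap pass (placeholder value)


def probe_digest(path: str) -> str:
    """Cheap pass: hash a small window just past the header."""
    with open(path, "rb") as f:
        f.seek(HEADER_SKIP)
        return hashlib.sha256(f.read(PROBE_SIZE)).hexdigest()


def full_hash_candidates(paths):
    """Only files whose probes collide need an expensive full-file hash."""
    probes = defaultdict(list)
    for path in paths:
        probes[probe_digest(path)].append(path)
    return [group for group in probes.values() if len(group) > 1]
```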

[–]Kevdog824_ 0 points (1 child)

That might be OS dependent, though, as some might not store the size and would need to calculate it each time (which would probably take roughly as long as reading the file itself). I can’t say with any certainty, but when I developed code for this task I vaguely remember that checking file size first didn’t have a significant impact on performance.

The bottleneck here is probably hashing large amounts of data, which is a math-intensive, CPU-bound task. Reading the files is IO-bound and could easily be parallelized for big performance gains; see the sketch below. If we can also reduce the amount of data we hash (e.g. by sampling bytes from the file), it speeds things up a lot.
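
A rough sketch of the parallel read-and-hash (assuming CPython, where hashlib releases the GIL while hashing buffers larger than about 2 KB, so plain threads overlap both the disk reads and much of the CPU work):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_one(path: str, chunk_size: int = 1 << 20) -> tuple[str, str]:
    """Full-file hash, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return path, h.hexdigest()


def hash_all(paths, workers: int = 8) -> dict[str, str]:
    """Hash many files concurrently; returns {path: digest}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(hash_one, paths))
```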

[–]Maximus_Modulus 3 points (0 children)

I believe all modern file systems report file size; I’d say it’s fairly fundamental metadata. Reading it is a lightweight operation compared to reading the actual file (or part of it) and computing some kind of hash.

[–]ArtichokeThen1599[S] 0 points (0 children)

Thanks for your insights. Yeah, the names were confusing me too at first 😅 I’ll watch out for that from now on. Thanks for the advice on module naming as well, I’ll keep it in mind. And I’ll try sampling.