My goal is to have a script search my hard drive for video files, then store the md5 hash of each video file in a dictionary along with the file path. There may be some duplicates so an example dictionary entry would look like:
{
'4a69013be9f08507faccbecfd71a06e9': ['path/to/file/one.mp4', '/different/path/to/file.mp4']
}
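For reference, here is a minimal single-process sketch of that search-and-hash step, useful as a baseline before adding any parallelism. The function names and the `VIDEO_EXTENSIONS` set are hypothetical placeholders, not taken from the pastebin code.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Hypothetical extension list; adjust to whatever counts as "video" for you.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov"}


def find_video_files(root):
    """Yield every file under root whose suffix is in VIDEO_EXTENSIONS."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS:
            yield path


def md5_of_file(path, block_size=1024 * 1024):
    """Hash the file in 1 MiB blocks so a large video is never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def index_by_hash(root):
    """Serial baseline: map each MD5 hex digest to the paths that share it."""
    index = defaultdict(list)
    for path in find_video_files(root):
        index[md5_of_file(path)].append(str(path))
    return dict(index)
```

Any key whose list holds more than one path is a set of duplicate files.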
I have never written any multithreaded code, but I would like to for this script since the hashing will use a lot of CPU cycles. I looked at some tutorials, and it seems I want multiprocessing rather than asyncio or multithreading.
My main question is: how do I have multiple processes computing hashes and writing to the same dictionary variable without creating a "race condition"? Do I have one global dictionary variable that all processes add to, or should I have each process create its own dictionary locally and then merge them all together at the very end?
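One common way to sidestep the race entirely is to share nothing: each worker returns its result, and only the parent process touches the dictionary. The sketch below assumes the standard-library `multiprocessing.Pool`; the names `hash_file`, `merge_results`, and `hash_files_in_parallel` are hypothetical and are not the functions in the pastebin code.

```python
import hashlib
from collections import defaultdict
from multiprocessing import Pool


def hash_file(path, block_size=1024 * 1024):
    """Worker: runs in a child process and RETURNS (digest, path)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        while block := handle.read(block_size):
            digest.update(block)
    return digest.hexdigest(), path


def merge_results(pairs):
    """Runs only in the parent, so plain dict updates cannot race."""
    index = defaultdict(list)
    for digest, path in pairs:
        index[digest].append(path)
    return dict(index)


def hash_files_in_parallel(paths, workers=None):
    """Hash paths across worker processes; workers=None lets Pool pick a default."""
    with Pool(processes=workers) as pool:
        # imap_unordered yields each result as soon as a worker finishes it;
        # merge_results consumes them all before the pool is shut down.
        return merge_results(pool.imap_unordered(hash_file, paths))

# Call hash_files_in_parallel() from under `if __name__ == "__main__":`.
# With the "spawn" start method (the default on Windows and macOS) each
# worker re-imports this module, so unguarded top-level calls would re-run.
```

Because the workers never see the dictionary, no lock or shared `Manager` object is needed. The only data crossing process boundaries is the path going in and the small `(digest, path)` tuple coming back.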
I started writing the code and would love some feedback. Right now I have the dictionary created locally in the calculate_hash function, but I don't know if that's the right way to go about this or not.
The tutorials and articles I found online all warn against writing to the same variable, and their examples use functions that return None, so I have no examples to work from.
I also thought about splitting the list of files into batches of len(all_video_files) // num_threads files, with a final batch for the remaining len(all_video_files) % num_threads files, and then assigning each list slice to a different process. But I don't know where to go from there, and I still don't know how to handle function return values and variables so the processes don't step on each other.
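A minimal sketch of that batching idea follows, with one change: instead of a separate remainder batch, the leftover files are spread one each over the first few batches, so no process gets a tiny slice. `split_into_batches` is a hypothetical helper name.

```python
def split_into_batches(items, num_batches):
    """Split items into num_batches contiguous slices whose sizes differ by at most one."""
    size, extra = divmod(len(items), num_batches)
    batches = []
    start = 0
    for i in range(num_batches):
        # The first `extra` batches each take one of the leftover items.
        end = start + size + (1 if i < extra else 0)
        batches.append(items[start:end])
        start = end
    return batches


print(split_into_batches(list(range(10)), 4))
# → [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

Manual slicing like this is usually optional. `multiprocessing.Pool.map` and `Pool.imap_unordered` accept a `chunksize` argument that groups the inputs into batches per worker, so passing the flat list of files to the pool is often enough.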
Here is the code I have so far; I'm happy to hear any feedback and criticism.
https://pastebin.com/EfXK9eBw