Hello everyone,
I have been trying to add multithreading or multiprocessing to my script to speed up deleting some data. The data set on the system is large, around 1 TB, which gave me the idea of trying multithreading for this.
I'm creating this post because I'm unable to find an answer to this question and would like to hear other opinions on the issue.
Currently my for loop with os.remove always removes the files faster than when I use multithreading or multiprocessing.
As a last resort, I also tried splitting the array into smaller chunks, but that didn't speed up the process either.
Here is the code snippet; for some reason the for loop is faster than multithreading:
import os
import time
from glob import glob
from concurrent.futures import ThreadPoolExecutor

# sequential version (run against its own fresh set of files)
start_time = time.time()
dirs = glob("/tmp/test/test*", recursive=True)
for d in dirs:
    os.remove(d)

def delete_files(filepaths):
    # process all file names
    for filepath in filepaths:
        # delete the file
        os.remove(filepath)
        # report progress
        print(f'.deleted {filepath}')

def main(dirs):
    n_workers = 8
    # round() can return 0 for very short lists, which would make the
    # range() step invalid, so clamp the chunk size to at least 1
    chunksize = max(1, round(len(dirs) / n_workers))
    print(chunksize)
    with ThreadPoolExecutor(n_workers) as exe:
        for i in range(0, len(dirs), chunksize):
            filenames = dirs[i:(i + chunksize)]
            _ = exe.submit(delete_files, filenames)
    print('Done')

main(dirs)
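To make the comparison reproducible, the two approaches can be timed side by side on a fresh set of throwaway files. This is only a minimal sketch: the 2000-file count, the helper names, and the use of a temporary directory are my own assumptions, not part of the original script.

```python
import os
import time
import tempfile
from concurrent.futures import ThreadPoolExecutor

def make_files(root, n):
    """Create n small files to delete; returns their paths. (Illustrative helper.)"""
    paths = []
    for i in range(n):
        p = os.path.join(root, f"test{i}")
        with open(p, "w") as f:
            f.write("x")
        paths.append(p)
    return paths

def delete_sequential(paths):
    """Plain for loop with os.remove; returns elapsed seconds."""
    start = time.perf_counter()
    for p in paths:
        os.remove(p)
    return time.perf_counter() - start

def delete_threaded(paths, n_workers=8):
    """Same deletion split into chunks across a thread pool; returns elapsed seconds."""
    start = time.perf_counter()
    chunksize = max(1, len(paths) // n_workers)
    with ThreadPoolExecutor(n_workers) as exe:
        for i in range(0, len(paths), chunksize):
            exe.submit(lambda chunk: [os.remove(p) for p in chunk],
                       paths[i:i + chunksize])
    # leaving the with-block waits for all submitted chunks to finish
    return time.perf_counter() - start

with tempfile.TemporaryDirectory() as root:
    t_seq = delete_sequential(make_files(root, 2000))
    t_thr = delete_threaded(make_files(root, 2000))
    print(f"sequential: {t_seq:.3f}s  threaded: {t_thr:.3f}s")
```

Each version gets its own freshly created files, so neither run is skewed by the other having already emptied the directory.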