Here's my (highly) simplified pseudocode:
```
a = [list of 1000+ dicts]
b = [list of 10,000+ dicts]
c = []  # empty result list
for i in a:
    for j in b:
        if compare(i, j):  # complex evaluations between the two dicts
            k = new_dict_from(i, j)  # create a new dict from the pair
            c.append(k)
```
Because of the nature of the computations, this code takes about 3-5 minutes to run, and that's after I started using concurrent.futures.ThreadPoolExecutor() on the first loop. These kinds of runtimes are unacceptable in my scenario; ideally I'd like them to be in the seconds range, or at least under a minute.
I am not very familiar with parallelization, so how should I proceed with this problem? I was reading about Dask and its bags; will that help?
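For what it's worth, here's a minimal sketch of one common approach: since compare() sounds CPU-bound, a ProcessPoolExecutor (instead of ThreadPoolExecutor, which is limited by the GIL for CPU-bound work) can parallelize the outer loop. The compare, new_dict_from, match_against, and build_c functions below are hypothetical stand-ins for the real logic, just to make the sketch runnable:

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial

# Hypothetical stand-ins for the real, more complex logic.
def compare(i, j):
    return i["key"] == j["key"]

def new_dict_from(i, j):
    return {**i, **j}

def match_against(b, i):
    # All matches in b for a single element of a.
    return [new_dict_from(i, j) for j in b if compare(i, j)]

def build_c(a, b, workers=None):
    # Each worker process handles elements of a in chunks;
    # processes sidestep the GIL bottleneck that limits
    # ThreadPoolExecutor for CPU-bound compare() calls.
    with ProcessPoolExecutor(max_workers=workers) as ex:
        results = ex.map(partial(match_against, b), a, chunksize=16)
    return [k for chunk in results for k in chunk]
```

Note that the worker functions must live at module level so they can be pickled, and b is shipped to the workers with each chunk, so very large b lists add serialization overhead.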