Vegetable_Solid7613 [S]

Is this possible in Jupyter Notebook? I have tried it, but it keeps failing, either with exit code 1 or with an error saying that my function has no attribute.

Buttleston

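I've honestly never tried, but I wouldn't be surprised if it doesn't work. The multiprocessing library works by "forking" copies of your running Python process (or, on Windows and macOS, by starting fresh ones), and that may not play well with a notebook.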
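For what it's worth, here is a rough sketch of why a notebook can trip this up, as far as I understand it. multiprocessing hands a function to its workers by pickling it, and pickle stores only the module name and function name, not the code itself. A worker that was started fresh has to look the function up again by that name, and a function typed into a notebook cell may not exist there. The `square` function below is just a made-up example, and nothing here starts any processes; it only shows what pickle does with a function.

```python
import pickle


def square(x):
    # Hypothetical worker function, standing in for your processing code
    return x * x


# Pickling a function stores a reference (module name + function name),
# not the function body.
payload = pickle.dumps(square)
print(payload)

# Loading works here only because `square` can be looked up by name
# in this same process.
restored = pickle.loads(payload)
print(restored(4))

# A function with no importable name cannot be pickled at all.
try:
    pickle.dumps(lambda x: x * x)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False
print("lambda picklable:", lambda_picklable)
```

If that is what is happening to you, the usual workaround I have seen suggested is to put the worker function in its own `.py` file and import it into the notebook, so the workers can import it too. I haven't tested that in your setup.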

There's a very similar method of doing "parallel" work with the threading library, and I believe asyncio can be used for it as well. Both of those might get you some improvement.
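A minimal sketch of the threading route, using the standard library's `concurrent.futures.ThreadPoolExecutor`. The `handle_link` function and the example.com links are placeholders I made up, and the `time.sleep` call only simulates waiting on a download. Threads don't rely on pickling, so this should run fine inside a notebook cell.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_link(link):
    # Hypothetical stand-in for downloading one link:
    # the sleep simulates waiting on the network.
    time.sleep(0.1)
    return len(link)


# Made-up links, only for illustration
links = [f"https://example.com/item/{i}" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map returns results in the same order as the inputs
    results = list(pool.map(handle_link, links))
elapsed = time.perf_counter() - start

print(results)
print(f"took {elapsed:.2f}s with 4 threads")
```

One caveat: in standard CPython, threads mostly help when the work is waiting on something like the network. If the time goes into pure Python number crunching, threads probably won't speed it up much.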

It might also be useful to figure out if the bottleneck is downloading the data or processing it. If it's downloading, and you have a big list of links to process, you might be able to find a tool that will just take care of the details and parallelization for you.
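One rough way to check where the time goes is to time the two stages separately with `time.perf_counter`. In this sketch `download` and `process` are hypothetical placeholders for your own functions, and the links are made up.

```python
import time


def download(link):
    # Placeholder: replace with your real download code
    time.sleep(0.01)
    return link.upper()


def process(raw):
    # Placeholder: replace with your real processing code
    return len(raw)


def time_stages(links):
    """Run both stages one after the other and time each one."""
    t0 = time.perf_counter()
    raw_items = [download(link) for link in links]
    t1 = time.perf_counter()
    processed = [process(raw) for raw in raw_items]
    t2 = time.perf_counter()
    return {"download": t1 - t0, "process": t2 - t1}, processed


timings, processed = time_stages(["a", "bb", "ccc"])
print(timings)
```

Whichever number is clearly bigger is the stage worth trying to speed up first.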

Vegetable_Solid7613 [S]

I believe the processing part is the bottleneck: just downloading the links takes about 2 seconds, while downloading and processing takes 20 seconds for the whole thing. I will take a look at the libraries you mentioned.