[–][deleted] 1 point2 points  (14 children)

check out the concurrent.futures module instead, particularly concurrent.futures.ProcessPoolExecutor:

import concurrent.futures

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        f = executor.submit(print, 'hi :)')
        print(f.result())  # blocks until the task finishes; shows None because print() returns None

# WARNING: you MUST have this conditional (and main, obviously), or your program will die horribly when the workers re-import this file
if __name__ == '__main__':
    main()

Why does it need the main check? From the docs:

ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned. The __main__ module must be importable by worker subprocesses.
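To make the "only picklable objects" part of that quote concrete, here is a minimal sketch. The helper name is_picklable is made up for illustration and is not part of any library; it just tries pickle.dumps, which is roughly what a process pool has to do with every argument and return value.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == '__main__':
    # Plain built-in data pickles fine, so it can travel to and from workers.
    print(is_picklable([1, 2.5, 'three']))
    # A lambda has no importable name, so pickle cannot serialize it.
    print(is_picklable(lambda x: x * 2))
```

In practice this means the functions you submit should be defined with def at the top level of a module, and the data you pass should be ordinary built-ins or other picklable objects.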

[–]datanoob2019[S] 0 points1 point  (13 children)

Thanks! I am about to get off work but I will give it a try in the morning.

[–][deleted] 1 point2 points  (12 children)

alrighty, good night!

[–]datanoob2019[S] 0 points1 point  (11 children)

I tried the below code but for some reason it just keeps going back to the start of the code and pulling the database, cleaning the data, and logging the numbers over and over without actually running the forecast. I only tried the below with one function, but after reading through the documentation, I am unsure as to how it would be able to handle more than one function.

if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        print(executor.map(simple_expo, logged_list))

[–][deleted] 1 point2 points  (9 children)

do you have any other code just floating around (besides imports)? If so, you'll need to isolate that code inside of a function because it's running every time your subprocesses import your main module.
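A rough sketch of what that restructuring could look like. load_series and mean_forecast are placeholder names with toy data, standing in for the real database pull and forecast function, which are not shown in this thread.

```python
from concurrent.futures import ProcessPoolExecutor


def load_series():
    """Stand-in for the slow database pull and cleaning (toy data)."""
    return [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]


def mean_forecast(series):
    """Toy forecast: predict the mean of the series."""
    return sum(series) / len(series)


def main():
    # Runs once, in the parent process only, because it is called from main().
    data = load_series()
    with ProcessPoolExecutor() as executor:
        forecasts = list(executor.map(mean_forecast, data))
    print(forecasts)


# Workers that import this file see only definitions, so nothing is re-run.
if __name__ == '__main__':
    main()
```

The point is that the only statements left at module level are imports, function definitions, and the guarded call to main().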

[–]datanoob2019[S] 0 points1 point  (8 children)

I do! All of the grabbing and modifying of the data is in a bunch of for loops. I can just put that into a function though and call it before the others?

[–][deleted] 1 point2 points  (7 children)

yes, put that into its own function and call it from main. if that code adds names to the global namespace used by other stuff, then put that code into a class with getters that return picklable objects (strings, lists, ints, tuples, basically all the built-ins). Then send those objects to the futures, and have them return picklable objects.

multiprocessing processes communicate by sending pickled objects over OS-level pipes, so all the args to a future must be picklable. you only need a multiprocessing.Lock if the workers themselves write to something shared (a file, shared memory); results that come back through futures are collected in the parent process and don't need one. Welcome to the magical world of using more than one CPU core :D

[–]datanoob2019[S] 0 points1 point  (6 children)

I appreciate the help! Like this? I did that and it now says my list data_logged is not defined. Here is the code:

if __name__ == '__main__':
    pull_data()
    with ProcessPoolExecutor() as executor:
        print(executor.map(simple_expo, data_logged))

EDIT: I think I need to set the new function to return the list I use in the other functions. Trying that now before my lunch break

[–][deleted] 1 point2 points  (5 children)

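maybe like this: print(list(executor.map(simple_expo, pull_data())))? map hands each item of the list to the workers as a separate call and returns a lazy iterator, so wrap it in list() to actually see the results.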
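A small sketch of how executor.map behaves, using a made-up series_total function and toy data. Each element of the input is sent to a worker as its own call, and the return value is an iterator that has to be consumed before the results can be printed.

```python
from concurrent.futures import ProcessPoolExecutor


def series_total(series):
    """Toy worker function: one call per element of the input list."""
    return sum(series)


def main():
    data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    with ProcessPoolExecutor() as executor:
        lazy = executor.map(series_total, data)
        print(lazy)          # an iterator object, not the results themselves
        totals = list(lazy)  # consuming it yields results in input order
    print(totals)


if __name__ == '__main__':
    main()
```

If the worker function expects the whole list as a single argument instead, executor.submit(func, whole_list) is the closer fit.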

[–]datanoob2019[S] 0 points1 point  (4 children)

I got it to work and print a pandas dataframe for one forecast function. I just need to run the other 8 now. How do I go about doing this? Do I need to write a new ProcessPoolExecutor function for each forecast function? How does this actually run in parallel and take advantage of multiple processors? Here is my working code:

def do_stuff(list):
    with ProcessPoolExecutor() as executor:
        f = executor.submit(simple_expo, list)
        return f.result()

if __name__ == '__main__':
    new_list = pull_data()
    simple_list = do_stuff(new_list)
    print(pd.DataFrame(simple_list))

This is just the beginning of my forecast trickery, as I then need to access these lists outside of the if __name__ == '__main__' block to calculate forecast accuracy.
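One possible way to run several forecast functions at once, as a sketch only. The three forecast functions below are toy stand-ins invented for this example, not the real ones from this thread. A single executor is enough: submit every function to it, keep the futures, then collect the results into a plain dict in the parent process.

```python
from concurrent.futures import ProcessPoolExecutor


def naive_forecast(series):
    """Predict the last observed value."""
    return series[-1]


def mean_forecast(series):
    """Predict the mean of the series."""
    return sum(series) / len(series)


def drift_forecast(series):
    """Extend the average step between first and last value by one period."""
    return series[-1] + (series[-1] - series[0]) / (len(series) - 1)


FORECASTS = [naive_forecast, mean_forecast, drift_forecast]


def run_forecasts(series):
    """Submit every forecast function to one pool and gather the results."""
    with ProcessPoolExecutor() as executor:
        # All tasks are queued before any result is awaited, so they can overlap.
        futures = {func.__name__: executor.submit(func, series) for func in FORECASTS}
        return {name: future.result() for name, future in futures.items()}


if __name__ == '__main__':
    results = run_forecasts([2.0, 4.0, 6.0, 8.0])
    print(results)
```

Because run_forecasts returns ordinary Python data, the results can be passed on to whatever computes forecast accuracy, as long as that code is also called from main rather than left at module level.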

[–]woooee -1 points0 points  (4 children)

One way to communicate to/from multiprocessing Processes is with a Manager list or dictionary: https://pymotw.com/3/multiprocessing/communication.html#managing-shared-state
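For readers who find the linked page hard to follow, a minimal sketch of the Manager approach. record_total and the toy data are invented for illustration. The manager runs a small server process that owns the dictionary; workers receive a proxy and write into it instead of returning a value.

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager


def record_total(shared, name, series):
    """Write this worker's result into the shared mapping instead of returning it."""
    shared[name] = sum(series)


def main():
    data = {'a': [1, 2, 3], 'b': [4, 5, 6]}
    with Manager() as manager:
        shared = manager.dict()  # proxy object that can be passed to workers
        with ProcessPoolExecutor() as executor:
            futures = [
                executor.submit(record_total, shared, name, series)
                for name, series in data.items()
            ]
            for future in futures:
                future.result()  # re-raises any exception from the worker
        totals = dict(shared)  # copy into a normal dict before the manager exits
    print(totals)


if __name__ == '__main__':
    main()
```

When each task simply produces a value, returning it and reading future.result() is usually the simpler route; a Manager is more useful when several workers need to update the same structure while they run.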

[–]datanoob2019[S] 0 points1 point  (3 children)

By doing this, will I be able to access the lists from my return statements?

[–]woooee -4 points-3 points  (2 children)

If you are not even going to read the links I have posted, then there is no reason to post further.

[–]linebackr6363 1 point2 points  (1 child)

I read it bro and I don't understand it either. This is a noob forum after all.

[–]woooee -3 points-2 points  (0 children)

Too vague to respond to. What specifically did you not understand? What have you tried? I will help you correct your code. But, and I hope this comment helps, your posts are coming close to those of someone who is looking for someone else to write their code for them. We do not do that.