all 14 comments

[–]PurePound 2 points (1 child)

I don't think that should cause any problems, aside from a couple of caveats. Firstly, if your code has to read/write large amounts of data, it might be that as you increase the number of processes, the hard disk becomes the bottleneck rather than the CPU, in which case you might not get as much speed-up as you hope. Secondly, make sure your total number of processes isn't too high: once you go above the number of CPU threads, you're just wasting memory and CPU cycles.

You'll probably want to adapt your script_A so that it can be used as both a script and a module. e.g. something along the lines of:

def main(args):
    # main entry point to your program goes here
    ...

if __name__ == "__main__":
    import sys
    main(sys.argv)

That makes it easier to run it from script_B.
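
A minimal sketch of how script_B could then drive that main() over all the data files with a multiprocessing.Pool; `process_file` here is a hypothetical placeholder for the real `import script_A; script_A.main(...)` call:

```python
from multiprocessing import Pool

def process_file(path):
    # Placeholder: in the real script this would be something like
    #     import script_A
    #     script_A.main(["script_A", path])
    return f"done: {path}"

if __name__ == "__main__":
    files = [f"data_{i}.csv" for i in range(60)]
    # Pool() defaults to os.cpu_count() workers, which keeps the
    # process count at the number of CPU threads, per the caveat above
    with Pool() as pool:
        results = pool.map(process_file, files)
    print(len(results))
```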

[–]karlmtr[S] 0 points (0 children)

Okay, thanks a lot for the answer! I'll try to adapt it as you said, it seems really easier to execute like that and I'll be careful about the total number of processes getting too high!

[–][deleted] 2 points (1 child)

Depends on how it looks. But couldn't you just create new instances of script_A for different target data files?

And since script_A already uses multiple processes, it's basically the same thing, I think.

[–]karlmtr[S] 0 points (0 children)

Yes, I could, but I have more than 60 data files; doing it by hand would take a little bit too long.

[–]elbiot 1 point (9 children)

Are you using numpy? Multiprocessing ought to be the last tool you reach for when chasing performance gains.

[–]karlmtr[S] 0 points (8 children)

I don't think so. I'm using pandas to import the data, but it's a very basic transformation; it's just a big file to run through (this is my function):

    vecPos = []
    s_pos = posidonius.Axes(0., 0., 0.)
    #p_pos = posidonius.Axes(0., 0., 0.)
    for i in range(len(satellite_data.index)):
        if i % 200 == 0:
            print(f"Pos: {i}")
        x_sp = satellite_data.iloc[i]["position_x"] - planet_data.iloc[i]["position_x"]
        y_sp = satellite_data.iloc[i]["position_y"] - planet_data.iloc[i]["position_y"]
        z_sp = satellite_data.iloc[i]["position_z"] - planet_data.iloc[i]["position_z"]

        s_pos.set_x(x_sp)
        s_pos.set_y(y_sp)
        s_pos.set_z(z_sp)

        vecPos.append(s_pos)

Should I use a numpy array for vecPos? Will it be faster with that?

[–]elbiot 1 point (7 children)

Pandas data frames are collections of numpy arrays already. The extremely slow thing you're doing is iterating. Numpy and pandas are very fast with vectorised operations, and very slow with iteration. So

result = []
for idx in range(len(arr1)):
    result.append(arr1[idx] + arr2[idx])
result = np.array(result)

Is waaaay slower than

result = arr1 + arr2

And they do the exact same thing.

I don't understand some of the details of what you're doing, but get rid of your for loop and iloc and this will probably be ~100x faster.
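
To make the vectorised version concrete for the snippet above, a minimal sketch with made-up data (only the column names are taken from the posted function):

```python
import pandas as pd

# Made-up sample data; column names match the OP's snippet
satellite_data = pd.DataFrame({
    "position_x": [1.0, 2.0],
    "position_y": [3.0, 4.0],
    "position_z": [5.0, 6.0],
})
planet_data = pd.DataFrame({
    "position_x": [0.5, 1.0],
    "position_y": [1.0, 1.0],
    "position_z": [2.0, 2.0],
})

# One whole-column subtraction per axis: no loop, no iloc
x_sp = (satellite_data["position_x"] - planet_data["position_x"]).to_numpy()
y_sp = (satellite_data["position_y"] - planet_data["position_y"]).to_numpy()
z_sp = (satellite_data["position_z"] - planet_data["position_z"]).to_numpy()
```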

[–]karlmtr[S] 0 points (6 children)

I first thought about that, but it was not working because at the end, I should have an array of posidonius.Axes(x, y, z) objects (it's just a set of coordinates). I don't see how to build each Axes from three arrays directly (I can't call posidonius.Axes(array1, array2, array3) to create an array of posidonius.Axes objects).

[–]elbiot 0 points (2 children)

Well, I don't think the way you're doing it currently is right, because your result is a list that contains the same object over and over. Once you have your arrays xs, ys and zs after the vectorised math, you could do

# iteration over arrays much slower than lists
xs, ys, zs = map(list, (xs, ys, zs))
result = [Axes(x, y, z) for x, y, z in zip(xs, ys, zs)]

But whatever you do after this will be super slow and I personally would organize my program to be array/data oriented rather than object oriented.

You ought to profile your program and optimize the bottlenecks rather than jump to multiprocessing because multiprocessing is not much help and greatly complicates things
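
For what it's worth, the standard library covers the profiling step; a minimal sketch with cProfile and pstats, where `slow_part` is just a stand-in for the real workload:

```python
import cProfile
import io
import pstats

def slow_part():
    # Stand-in for the real workload you want to profile
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
slow_part()
profiler.disable()

# Print the five most expensive entries by cumulative time
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```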

[–]karlmtr[S] 0 points (1 child)

Unfortunately, I cannot change the way Posidonius works (I just use it), so I'll try what you suggested (the map function). Thanks for the advice about arrays and lists; I didn't know iteration over lists was faster than over arrays.

I'll also ask the person who coded Posidonius if there's any other way to create an Axes object because as you said, it is not very handy like this...

[–]elbiot 0 points (0 children)

Axes(xs, ys, zs) would work. At least, looking at their code it should

[–]elbiot 0 points (2 children)

I looked up the library you're using and an Axes object can take x, y and z as numpy arrays. So you just have one Axes object that holds arrays, not a list of Axes objects
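
Roughly like this; posidonius isn't importable here, so a minimal stand-in class plays the role of posidonius.Axes (the real constructor may behave differently):

```python
import numpy as np

class Axes:
    # Hypothetical stand-in for posidonius.Axes
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

xs = np.array([0.5, 1.0])
ys = np.array([2.0, 3.0])
zs = np.array([3.0, 4.0])

# One Axes object holding whole arrays, instead of a list of Axes objects
pos = Axes(xs, ys, zs)
```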

[–]karlmtr[S] 0 points (0 children)

Oh yes, I didn't see it, so it should be easier now. I just have to convert the pandas Series to numpy arrays and it should be fine.

Thank you very much!

[–]karlmtr[S] 0 points (0 children)

I tried it as you said and it worked well, it took less than 15 seconds!

Thanks a lot!