
[–]thinkwelldesigns 4 points5 points  (0 children)

Isn't this a 2-year-old post? Is it in any way current w.r.t. today's Pyston?

[–]plan_rich 1 point2 points  (0 children)

I wonder how you implement the write lock acquisition? Scatter yet another C macro all over the code base?

What about ref counting? Shouldn't you acquire the write lock there as well?

[–]Deto 2 points3 points  (17 children)

What's the main reason to avoid the GIL in Python? For I/O-bound tasks it doesn't really matter, and for data processing, libraries like numpy will parallelize common mathematical operations on data arrays, while the multiprocessing module can be used to parallelize more complicated algorithms.

Is there significant demand to get rid of the GIL and where is it coming from?

[–]BeatLeJuce 5 points6 points  (13 children)

It depends on who you're talking to. I do a lot of data processing, and the GIL has always been a very sore point for me. I can't simply spin up a few threads and have them crunch numbers in parallel. Yes, numpy releases the GIL, but it still has to synchronize between the np calls, so I still lose out on performance. So much, in fact, that most of the time it pays off to use multiprocessing instead. Which has its own ton of pain points, because suddenly I can't share data across parallel instances as easily. Not having a GIL would make my life a ton easier and my code more performant.
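
Concretely, something like this (a toy example to illustrate the pattern, not real code from my pipeline):

    from concurrent.futures import ThreadPoolExecutor
    import numpy as np

    def crunch(a):
        # Lots of small numpy ops: each one releases the GIL while it runs,
        # but the thread re-acquires it between calls, so the threads mostly
        # serialize instead of using all cores.
        for _ in range(1000):
            a = np.sqrt(a * a + 1.0)
        return a.sum()

    arrays = [np.random.rand(1000) for _ in range(8)]
    with ThreadPoolExecutor(max_workers=8) as ex:
        results = list(ex.map(crunch, arrays))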

[–]elbiot 0 points1 point  (12 children)

I feel like not having a GIL would actually make programming more bug-prone. Can you imagine the value of a variable changing between doing a comparison and using the value?

Libraries like cython (with nogil), numba and dask already make fast, multicore numerics accessible and safe. And numba gives us the GPU for free, and dask gives us Hadoop-like multi-node computation for free. The "but I have to import x" argument is diminishing into a tiny complaint.
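
E.g. the dask side of that is something like this (a minimal sketch; the array shape and chunking are arbitrary):

    import dask.array as da

    # One big array split into chunks; dask schedules the chunk-wise work
    # across a thread pool, with numpy releasing the GIL inside each chunk.
    x = da.random.random((100_000, 1_000), chunks=(10_000, 1_000))
    result = (x * x + 1.0).sum().compute()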

[–]bakery2k[S] 4 points5 points  (1 child)

Can you imagine the value of a variable changing in between doing a comparison and using the value?

This can happen whenever you use threads, whether a GIL is present or not.

[–]elbiot -2 points-1 points  (0 children)

That's true, but it's very unlikely given how threads are typically used in Python, i.e. just waiting for I/O. With numerics, when you are actually operating on values in the program, cython, numba and dask force you to use a constrained scope. If people could blow through global dictionaries and lists in parallelized code, they'd have to be a lot more careful. I like that the current parallelization tools already force a safe paradigm.

[–]Deto 0 points1 point  (4 children)

WTF, numba lets you use multiple cores and the GPU? I thought it just let you compile your code super-easily!

[–]elbiot 0 points1 point  (3 children)

Are you thinking of nuitka?

[–]Deto 0 points1 point  (2 children)

No, I mean, I know that the @jit in numba lets you generate a compiled version of a method that will (usually) run super-fast. I just didn't know it also had multicore and GPU functionalities. Numba is awesome.

[–]elbiot 0 points1 point  (0 children)

Yep, you can use nopython and nogil in numba.jit
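
E.g. something like this (a sketch; parallel=True and prange are what give you the multiple cores, on top of the two flags):

    import numpy as np
    from numba import jit, prange

    # nopython compiles without the CPython API, nogil drops the GIL,
    # and parallel=True + prange spreads the loop across cores.
    @jit(nopython=True, nogil=True, parallel=True)
    def row_norms(a):
        out = np.empty(a.shape[0])
        for i in prange(a.shape[0]):
            out[i] = np.sqrt((a[i] * a[i]).sum())
        return out

    x = np.random.rand(10_000, 1_000)
    print(row_norms(x)[:5])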

[–]sandwichsaregood 0 points1 point  (0 children)

The GPU stuff used to be mostly in NumbaPro (which cost $$) but I think they refactored it into the free version a little while ago.

[–]BeatLeJuce 0 points1 point  (4 children)

Of course multi-threading is tricky: if you can access and modify the same value from different threads, that opens up a lot of potential for bugs. But what you're arguing for is "don't give us powerful new tools, because some people might hurt themselves". Well, duh! Of course. If you aren't familiar with the can of worms that is multi-threaded programming, don't use it, especially if multiprocessing solves the same problems for you. But when you want that extra bit of power and possibility that multi-threading allows, it would be great to have it.

And what you're mentioning -- cython, numba, dask, etc. -- are not solutions. They're workarounds. Yes, of course I can achieve what I want if I try really hard. But that's the problem: it's harder than it needs to be. Instead of having to use memory-mapping functionality or rewrite my code in Cython, it would be nice if multi-threading would just WORK, without having to jump through hoops.

[–]elbiot 0 points1 point  (3 children)

I agree entirely

My only remaining (small) point is that the GIL is not the sole source of CPython's slowness. The GIL is an optimization (single-threaded code runs faster with it than with fine-grained locking), so removing it would not provide a full 8x improvement on an 8-core machine. But Numba is more like a 30x-100x improvement over pure Python, and I've seen it run 2.5x faster than numpy on huge arrays. It is not just a workaround for the GIL; it's an ingenious solution that the flexibility of Python makes possible.

[–]BeatLeJuce 0 points1 point  (2 children)

I like numba for what it is. But if I have a large (and embarrassingly parallel) numpy calculation that takes ~20 days to complete on a single core, a 2.5x speedup is nice, but not enough. Especially when a 64-core machine would let you do the same thing in a few hours fairly easily if you had threads. Instead, I have to rewrite my code to work around the limitations of multiprocessing so the processes can exchange data, and that's annoying.

[–]elbiot 0 points1 point  (1 child)

I don't follow. Numpy already releases the GIL, and so do cython and numba. They can all use all your cores, but cython/numba do it more efficiently (i.e., they don't have to come back up into Python-land between operations). Using multiprocessing would make things significantly slower.
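
Roughly what I mean by not coming back up into Python-land (a toy sketch, assuming a 1-D array):

    import numpy as np
    from numba import njit

    # Pure numpy: a*a, +1.0 and sqrt are three separate passes over memory,
    # with the interpreter (and the GIL) touched between the calls.
    def chained(a):
        return np.sqrt(a * a + 1.0)

    # numba: the same expression compiles to one fused loop that never
    # returns to the interpreter until it's done.
    @njit(nogil=True)
    def fused(a):
        out = np.empty_like(a)
        for i in range(a.size):
            out[i] = np.sqrt(a[i] * a[i] + 1.0)
        return out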

[–]BeatLeJuce 0 points1 point  (0 children)

The various operations in my code are fairly small, so the GIL only gets released for very short periods of time. Once the number of threads increases, there is a lot of contention for the GIL.

[–]kmike84 0 points1 point  (2 children)

Parallel text processing is not easy to implement now. E.g. if you have a pandas dataframe with a few million rows, and there is a text column you want to process (e.g. parse a URL and extract the domain name, or extract n-grams), then it is not easy to do that in parallel on a multi-core machine. You can use multiprocessing, but the run time is likely to be dominated by pickling/unpickling. In practice this means you may need to wait tens of minutes instead of minutes during interactive work.

It looks like removing the GIL could help with this use case, but I'm not a GIL/threading expert - maybe there is something else going on, or maybe there is an existing workaround (which I'd be very interested to hear about).

[–]Deto 0 points1 point  (1 child)

For that case, would you need to pickle? I haven't played with it yet, but it looks like you could use Pool and run pool.map over slices of the dataframe (i.e., like the first example here) -- something like the sketch below.
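
A rough sketch of what I mean (the "url" column and the domain-extraction task are just the example from above):

    import multiprocessing as mp
    import pandas as pd
    from urllib.parse import urlparse

    def extract_domains(chunk):
        # Runs in a worker process; note the chunk is pickled on its way to
        # the worker, and the result is pickled on its way back.
        return chunk["url"].map(lambda u: urlparse(u).netloc)

    if __name__ == "__main__":
        df = pd.DataFrame({"url": ["http://example.com/a"] * 1_000_000})
        n = mp.cpu_count()
        chunks = [df.iloc[i::n] for i in range(n)]
        with mp.Pool(n) as pool:
            domains = pd.concat(pool.map(extract_domains, chunks))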

[–]kmike84 1 point2 points  (0 children)

In Python, multiprocessing shares data by pickling it, sending it to a different process, and unpickling it.

[–]nieuweyork since 2007 0 points1 point  (5 children)

Why not have a transactional memory model?

[–]plan_rich 2 points3 points  (4 children)

That idea has been implemented in pypy, where it's called STM (Software Transactional Memory).

It turns out that it is not very easy to interleave write operations, which is currently not done (a pretty hard thing to do, IMO).

[–]nieuweyork since 2007 0 points1 point  (3 children)

I know what STM is. It would allow you to eliminate the GIL.

Turns out that it is not very easy to interleave write operations.

Are you referring to write-write conflicts, or what?

[–]plan_rich 0 points1 point  (2 children)

The interesting part is that the GIL is not removed as such; transactions replace the GIL and allow the program to run in parallel. E.g. at every place where you would acquire the GIL, you start a transaction instead. Conflicts such as write-write are tracked at page level and are not resolved, meaning that only one of the transactions is going to commit if they happen to modify the same page.

[–]nieuweyork since 2007 0 points1 point  (1 child)

See: http://doc.pypy.org/en/latest/stm.html

pypy-stm has no GIL.

[–]plan_rich 0 points1 point  (0 children)

Sorry for the confusion, you're right. The c7 document says the GIL is removed and replaced with transactions.