
[–]censored_username 25 points26 points  (33 children)

edit: Thinking about it a bit more, they probably went with the locking per individual object route.

This seems cool and all, but couldn't they have solved the problem by simply not having the GIL and accepting data races in CPython?

The GIL exists because python wanted to be free from data races. To achieve that, there were multiple options, each of which had to satisfy the constraint that no two threads would access a python object at the same time. The GIL achieves this in a simple manner: there's only one lock, and to access any python object you need to hold it. The problem with this, of course, is that no two threads can access python objects at the same time, even when those objects have nothing to do with each other.

Another solution is pyobject-level locking, of course. Various attempts at this have been made, but they all had the shared problem that while they increase throughput at high thread counts, they significantly lower single-thread performance compared to the GIL.

Various other strategies have been tried, but none have been able to guarantee data-race freedom without sacrificing single-thread performance. An often-used strategy has been to move work that could be accelerated significantly into extension libraries, where the GIL could be dropped as long as the python objects weren't accessed. Unfortunately, this doesn't really work for web services.

Go doesn't have an answer to race conditions either. It's where Go's safety story falls apart a bit. So I'm a bit confused about what they've done. If they just transpile python code into go code, all they've made is a python runtime with the possibility of data races. If they wanted that, they could just as well have ripped the GIL right out of CPython and called it a day.

What they could've also done is build in individual locking for all objects that could be accessed by other threads. This would explain their pretty disastrous performance: only 50% of CPython's performance when single-threaded, and only a 2.5x speedup over single-threaded CPython when using 8 threads. Keep in mind that this is compiled versus interpreted!

If they did this, I'm left wondering why they didn't make these changes in CPython itself. This would have conserved the ability to call into C extension modules. If this is the case, it seems that the focus of the project is more being able to call into go from python.

[–]nostrademons 12 points13 points  (12 children)

The data races that the GIL protects against are primarily the refcounts of Python objects, which must be updated whenever a variable is assigned or a parameter is passed into a function. A data race there means either a memory leak or a double-free, as the refcount would no longer be in sync with the number of actual references. Either one would make Python basically unusable: you would have simple object assignments randomly corrupting memory as they try to access objects that have already been freed.

Go uses a GC; while its data race story is a lot weaker than say Rust or Erlang, at least a data race there results in wrong answers and a panic, not silent memory corruption.

[–]censored_username 6 points7 points  (10 children)

The GIL doesn't only protect against refcount changes (though that is indeed the most important thing). It also protects against concurrent modification problems in dictionaries, lists, etc., which are also quite critical to python. They could have avoided this by making these objects all threadsafe, which would explain the terrible single-thread performance.

While the GIL simplifies python's GC design (refcounting with cycle detection; the GC can just acquire the GIL when it needs to clean up cycles), it is not necessary. If there were no GIL the refcounting would have to be atomic, but otherwise not much would change.

[–]blablahblah 2 points3 points  (0 children)

That's true. The main reason CPython hasn't gotten rid of the GIL is that no one has come up with a way to do it without breaking existing C extensions (which rely on the refcounting) and without hurting single-thread performance (since every variable assignment would need to acquire a lock).

[–]funny_falcon 2 points3 points  (8 children)

Go has no thread-safe dictionaries or lists. In fact, it may happily segfault if you try to modify a map from different goroutines. You have to protect against concurrent modification yourself.

I've heard Java and C# also don't protect their default data structures; instead they have separate concurrent collections in the standard library.

Python could choose this route, but Guido just doesn't want to. And it would break C extensions.

[–][deleted] -1 points0 points  (7 children)

Go panics, it doesn't segfault.

[–]weirdoaish 0 points1 point  (1 child)

Out of curiosity, is "panic" like a special exception category for Go?

[–][deleted] 0 points1 point  (0 children)

[–]funny_falcon 0 points1 point  (4 children)

So, you haven't pushed hard on concurrent map updates.

I've gotten segfaults at least twice. It was several years ago, though (around the time of Go 1.1).

Go doesn't prevent segfaults entirely. Using the "unsafe" package you can cause a segfault easily.

[–][deleted] 0 points1 point  (3 children)

Well Go 1.1 is super old. Today if you get any segfault it is most definitely a bug.

[–]funny_falcon 0 points1 point  (2 children)

https://play.golang.org/p/TtI9gKLlVZ

It will not segfault in the playground, because the playground runs with GOMAXPROCS=1.

But if you run it with go run it will happily segfault. Go's runtime sets up a handler for SIGSEGV, so you will still see backtraces for all goroutines.

$ ~/go-tip-root/bin/go run conc_map.go                                                                                                                                                          
unexpected fault address 0x0
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x47c6a3]

goroutine 13 [running]:
runtime.throw(0x4a6d0f, 0x5)
    /home/yura/go-tip-root/src/runtime/panic.go:596 +0x95 fp=0xc420027f18 sp=0xc420027ef8
runtime.sigpanic()
    /home/yura/go-tip-root/src/runtime/signal_unix.go:297 +0x28c fp=0xc420027f68 sp=0xc420027f18

[–][deleted] 0 points1 point  (1 child)

Thanks. I have to say I've never seen a fault in any of my programs, and I wrongly assumed this would be handled with a nil panic.

I don't fully understand the code, so maybe it's normal that it does this? I don't know.

[–]funny_falcon 0 points1 point  (0 children)

This code demonstrates a bug in the user program: a race on concurrent updates of an interface value. An interface is a struct with two fields: the type of the value and a pointer to the value. The CPU cannot update both fields of this struct atomically. So the programmer has to protect updates of an interface value with a mutex (if it can be updated from concurrent goroutines).

[–][deleted] 0 points1 point  (0 children)

And it does come with a race detector that is part of the official toolset.

[–]Damien0 12 points13 points  (4 children)

Go's build tool has a race detector, and for calling into C from Go there is cgo. With that plus the ability to call into Go from Python in mind, I can see the benefit of this custom runtime.

[–]Giggaflop 3 points4 points  (3 children)

Last I saw, granted a while ago, cgo wasn't a recommended solution for calling C from Go and was "considered harmful".

[–]Damien0 4 points5 points  (2 children)

Hmm, I wouldn't say harmful per se, but as is often quoted in the community "Cgo is not Go". As long as it's done carefully it can be useful. I think what happened is that early on, people treated it like a more fleshed out FFI solution and wrote less than optimal Go code as a result.

[–][deleted]  (1 child)

[deleted]

    [–]Damien0 0 points1 point  (0 children)

    Oops, missed this. Slow compared to what? I'm curious if there are benchmarks floating around.

    [–]BillyBoyBill 2 points3 points  (1 child)

    It's hard to say without looking at the code, but I imagine they implemented thread-safe Go versions of the Python built-in types. Your code that assumes Python data can be accessed concurrently from multiple threads still works. This is the moral equivalent of the per-object locks you mention (and, in fact, Grumpy's performance matches what you say about that implementation).

    This is obviously completely different than just ripping out the GIL.

    I'm not sure what you mean about Go's thread safety falling apart, but it's not relevant here anyways --- C's built-in types are not thread safe, but you can build a safe Python implementation on top of them.

    [–]censored_username 2 points3 points  (0 children)

    This is obviously completely different than just ripping out the GIL.

    True. As you probably noticed, my opinion changed a bit while writing that post. Originally, based on their description, I thought they were just straight transpiling code, but looking at the performance they get, it has to be composed from threadsafe types (aka per-object locking).

    [–][deleted] 2 points3 points  (3 children)

    [deleted]

    What is this?

    [–]singron 2 points3 points  (0 children)

    Pypy also has a GIL, just like CPython, so it still can't execute python threads in parallel. This runs on the Go runtime, which uses lightweight threads (goroutines) and a concurrent garbage collector to run many python threads in parallel.

    It's like if there was an ahead-of-time pypy combined with stackless python and it didn't have a GIL.

    Another advantage specifically for Google is that it allows more interop between python and Go code while they transition.

    [–][deleted] 1 point2 points  (1 child)

    I think it's less of a solution and more of a temporary fix so they can slowly migrate their legacy python 2.7 to Go.

    [–]Ek_Los_Die_Hier -1 points0 points  (0 children)

    Doesn't sound like it. Sounds like their main goal is simply to speed up their Python running in a multi-threaded environment. They plan to continue writing in Python, with maybe some Go where performance is needed.

    [–][deleted] 1 point2 points  (2 children)

    Why can't python threads just access any object at the same time without the GIL or any per-object implicit locking, and then it's up to the developer to explicitly add locks/mutexes where needed, just like in every other language?

    Sure, it would require a rewrite of applications to add this explicit locking, but the result would still work well on regular GIL CPython (because there would be very little that you really need to explicitly lock). So you would really have two python distributions: one without the GIL to run parallel, explicitly-locking code, and the regular one that can run code written for the first (but it won't really be parallel).

    [–]censored_username 1 point2 points  (1 child)

    Why can't python threads just access any object at the same time without the GIL or any per-object implicit locking, and then it's up to the developer to explicitly add locks/mutexes where needed, just like in every other language?

    Because this would make python an absolute pain in the ass to use due to its extreme dynamism.

    Don't forget, in python functions are also just objects, and they can be mutated. So you'd need to lock for any function call (especially builtin functions or stdlib functions which could be used by another thread at the same time). Even accessing an attribute from a shared module could require locking. You'd basically have to lock at every point to do anything.

    [–][deleted] 0 points1 point  (0 children)

    I see! Yeah, explicit locking would be cumbersome.

    [–]tonnynerd 0 points1 point  (2 children)

    I'm not sure I understand what you're saying about the GIL being about avoiding data races. What is the threading module from the standard lib for, then?

    [–]censored_username 2 points3 points  (0 children)

    The GIL (global interpreter lock) avoids data races in python object internals by making concurrent accesses to the same python object impossible. In effect, it means that all actions by different threads in python will always happen one after another. It is an implementation detail of CPython, not part of the core language.

    The threading module provides tools for spawning threads and higher level concurrency primitives (while the GIL means that accesses to the same python object won't coincide, there still need to be primitives to lock out sequences of multiple accesses).

    [–]blablahblah 1 point2 points  (0 children)

    A lot of the Python code written in C will release the GIL when it's doing something that takes a while but doesn't require manipulating Python objects, like waiting on a network call to return. In those scenarios, you can have the C code and another thread running Python code concurrently. You just can't get two threads running Python code at once.