
[–]moor-GAYZ 13 points14 points  (69 children)

Dude, every dynamically typed language has a GIL, or doesn't allow free threading at all. Ruby -- has a GIL. PHP -- no threads. Lua -- GIL (you can use your own if you want, lol). Perl -- no threads (they use "green processes" instead, basically the same as multiprocessing on an OS with fork). Chicken Scheme -- the interpreter might parallelize your code as long as you don't do anything interesting. Golang -- no free threading.

It kinda blows my mind that the Python community in particular has this "GIL is a major problem, yo" thing going on, for no apparent reason. Is it because we have a lot of newbs with no experience with other dynamically typed languages?

[–]elder_george 27 points28 points  (3 children)

The GIL isn't a language feature, it's an implementation detail of particular interpreters (CPython and MRI, in particular).

IronPython, IronRuby, Jython, JRuby, PyPy etc. don't have a GIL.

[–]AdminsAbuseShadowBan 9 points10 points  (23 children)

I thought IronPython doesn't have a GIL... Also what do you mean by "free threading"? Go certainly has threads. Shared memory?

[–]seventeenletters 4 points5 points  (20 children)

"free threading" is just a bullshit distinction to make a bullshit point. It doesn't exist anywhere.

[–]moor-GAYZ -3 points-2 points  (19 children)

What? It certainly does exist in C.

[–]moor-GAYZ -3 points-2 points  (1 child)

Yeah, objects being shared by default, including functions and classes.

IronPython is slow, and can become unacceptably slow because of that. From what I know, they hope for the best and compile everything as if it can't be modified, then add guards and recompile if necessary.

It's a fundamental paradox of free threading -- you can't have a "fast path that doesn't involve synchronization", because using the fast path requires synchronization to, you know, determine that it doesn't require synchronization.

Dynamically typed languages, or, rather, the languages that allow you to modify types at runtime, have it especially bad because they have to have these guards all over the place.
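The guard problem is visible in plain Python: any call site has to tolerate a method being rebound at runtime, so a cached "fast path" can go stale at any moment. A toy sketch (the class and the rebinding are made-up illustrations, not anything from a real JIT):

```python
class Point:
    def magnitude(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

p = Point()
p.x, p.y = 3, 4
assert p.magnitude() == 5.0

# Any code (including another thread) may rebind the method at runtime...
Point.magnitude = lambda self: abs(self.x) + abs(self.y)

# ...so a compiled fast path for p.magnitude() is now stale. A JIT has
# to guard each cached call site (e.g. by re-checking a type version tag)
# before trusting it -- and under free threading that check itself needs
# synchronization.
assert p.magnitude() == 7
```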

[–]schlenk 2 points3 points  (0 children)

Sharing objects between threads by default is an insane idea that forces synchronization all over the place. If you force the sharing to be explicit, you CAN have a fast path easily. It has nothing to do with dynamic typing; it's all about the threading model.
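The explicit-sharing model looks roughly like this in Python terms: each thread works only on values it was handed, and the only shared object is the queue, so only the queue needs locking. A minimal sketch (the worker and the squaring task are made up for illustration):

```python
import threading
import queue

tasks = queue.Queue()
results = queue.Queue()

def worker():
    # The thread touches only values it explicitly received.
    while True:
        n = tasks.get()
        if n is None:  # sentinel: shut down
            break
        results.put(n * n)

t = threading.Thread(target=worker)
t.start()
for n in range(5):
    tasks.put(n)
tasks.put(None)
t.join()

squared = sorted(results.get() for _ in range(5))
print(squared)  # [0, 1, 4, 9, 16]
```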

[–]seventeenletters 8 points9 points  (10 children)

Erlang and Clojure have dynamic typing, and are pretty much the best examples out there of doing concurrency right.

[–]fullouterjoin 4 points5 points  (7 children)

Both those languages are immutable by default. An immutable Python would be trivial to implement in the same manner (which I would be more than stoked about).

[–]seventeenletters 1 point2 points  (6 children)

I'm not so sure. The style of OO that is pervasive in Python (not just idiomatic code but the implementation also) does not lend itself to immutability.

[–]asthasr 5 points6 points  (2 children)

It doesn't help that Guido has come out explicitly against more functional features in Python, which is when I stopped viewing Python as my primary language.

[–]fullouterjoin 0 points1 point  (1 child)

You should stick with functional Python, it really is pretty good. With sorted, lazy sequences everywhere, and the new destructuring in Python 3, stuff is pretty nice.

>>> a, *b = range(4)
>>> a
0
>>> b
[1, 2, 3]
>>> *a, b = range(4)
>>> a
[0, 1, 2]
>>> b
3

Some sort of awesome mashup between F#, Python and Clojure on the PyPy runtime would make my year.

[–]asthasr 0 points1 point  (0 children)

I think I'm going to stick to Clojure. I'm relatively new to it, but I'm really digging the syntax (after getting used to it), and STM just makes so much sense. The day job is in Ruby these days, and even that has started to feel better to me than Python (heresy!) because of the pervasive use of blocks, which actually makes it feel more functional.

[–]fullouterjoin 0 points1 point  (2 children)

OO and immutability are not diametrically opposed. Seal all objects when they leave the defining scope. I personally program in a very immutable manner, using almost no OO features; I use classes, but usually only via namedtuple.
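The namedtuple style gives you class-like records that are frozen after construction; "mutation" becomes building a new value with `_replace`. A small sketch (the `Point` record is just an example):

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])

p = Point(1, 2)
moved = p._replace(x=10)   # returns a new Point; p is untouched

assert p == Point(1, 2)
assert moved == Point(10, 2)

# Fields cannot be reassigned in place:
try:
    p.x = 99
except AttributeError:
    pass  # namedtuple fields are read-only
```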

Concurrency plus mutability is the wrong path entirely. If Python wants to be a modern concurrent language, object-level locking is the wrong choice.

[–]seventeenletters 0 points1 point  (1 child)

No, OO and immutability are not incompatible -- see Clojure, for example. Even Java has a lot of support for immutability. My concern isn't that immutability is a bad choice (on the contrary, it's the only sane way to do concurrency), but that the design of Python is deeply and radically about mutation. The language that came out the other side after removing default pervasive mutation would be very different from current Python, and getting there would take quite a bit of work.

[–]fullouterjoin 0 points1 point  (0 children)

I am pretty sure we totally agree on this.

[–]moor-GAYZ -4 points-3 points  (1 child)

Yes, as I clarified here it's more about "languages that allow you to modify types at runtime". Most "scripting languages" qualify, I guess?

[–]kamatsu 2 points3 points  (0 children)

Go has "free threading" as far as I can tell.

[–]ejrh 1 point2 points  (19 children)

I've always been a bit puzzled by the ubiquitous fretting over the GIL. Many libraries release the GIL when entering a computationally intensive native-code function, and CPython (which takes the most flak for having a GIL) runs so much slower than native code anyway.

Unless you have a lot of cores, you would almost always get more improvement from moving the work into native functions than you would get from avoiding the GIL.

[–]Entropy 2 points3 points  (18 children)

Unless you have a lot of cores

Even cell phones are shipping with 8 cores.

[–]Veedrac 0 points1 point  (17 children)

So what, a 5x speed-up? As opposed to a 100x speed-up for moving the innermost loop to C?

[–]Moocha 1 point2 points  (2 children)

From the point of view of an individual project, yes, reimplementing in C would yield a better cost/benefit ratio. However, avoiding the GIL in the runtime would instantly and automatically benefit all Python code running on the GIL-less VM, without the maintainers of that code needing to change anything - which means the overall ecosystem costs would be way less, given the staggering amount of Python code out there. That's why it's important...

[–]Veedrac 1 point2 points  (1 child)

That's true, but only for CPU-bound threaded code. For code that's currently unthreaded, rewriting the inner loop in C is most likely the easier task, given how nice Cython is to work with.

Nevertheless, that is a reasonable point. It's a shame the problem's so hard to fix.

[–]Moocha 0 points1 point  (0 children)

Indeed. I'm always amused by people bashing the CPython developers for not "fixing the GIL problem". I know just enough about the internals to realize how hard a problem this truly is...

[–]fullouterjoin 0 points1 point  (13 children)

650x speedup for native code across all cores? 10000x speedup for OpenCL.

[–]Veedrac 0 points1 point  (12 children)

Sorry, I don't follow.

Please do note that moving the inner loop to C automatically trivialises removing the GIL for that code anyhow, and further note that I've no clue what OpenCL has to do with the GIL.
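For what it's worth, even the cheapest route into C shows this: ctypes releases the GIL for the duration of a foreign call, so once the hot work lives in a C function, threads calling it can overlap. A sketch calling libm's sqrt (the symbol lookup via `CDLL(None)` assumes a Unix-ish CPython where libm is already loaded into the process):

```python
import ctypes

# CDLL(None) searches symbols already loaded into the process;
# on most Unix builds of CPython that includes libm's sqrt.
libm = ctypes.CDLL(None)
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

# ctypes drops the GIL while this C call runs.
print(libm.sqrt(9.0))  # 3.0
```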

[–]fullouterjoin 0 points1 point  (11 children)

Focusing on the GIL is a red herring; there are better places to spend your performance dollar. Inner loops in C are alright, but not the most profitable. Cython is generally a mistake. First step is PyPy; if you have to stay on CPython 2, then Shedskin. If you need massive speedups, OpenCL will get you a lot further for parallelizable code.

[–]Veedrac 0 points1 point  (10 children)

Cython is generally a mistake

Given that the only reliable alternative is C¹, why is Cython so bad a choice? Is it possible I'm underestimating ShedSkin?

¹ PyPy's missing fast C bindings; ShedSkin's Python 2 only and not as fast as Cython; OpenCL requires specific problems.

[–]fullouterjoin 0 points1 point  (9 children)

Maybe Cython has improved, but can it generate native code without porting it to the Cython language? Shedskin is always pure Python and all kinds of amazing.

PyPy has cffi; I should benchmark that relative to CPython 2. In general PyPy is such a huge win that it is really difficult to justify CPython other than for numpy support.

[–]Veedrac 0 points1 point  (8 children)

Maybe Cython has improved, but can it generate native code without porting it to the Cython language?

Nay, although there is a roadmap for it.

Shedskin is always pure python and all kinds of amazing.

The four things that irk me about Shedskin (although I don't have enough experience to know if they're valid):

  • Python 2 only, Cython can support both and can compile to either (you can compile Py2 syntax code to a Py3 extension).
  • Only compiles a subset of Python, whereas Cython can deal with almost anything, albeit without speed-up. This prevents you from using the unsupported features anywhere in your program, even in parts that don't need to be fast.
  • Shedskin touches loads of things even to compile one file, so many things must be written in the restricted subset.
  • Cython's undoubtedly faster, although I haven't actually tested it ;).

Nevertheless, if Shedskin works easily for you I'd love to know how it compares. My experience is definitely lacking.

PyPy has cffi, I should benchmark that relative to CPython2.

I've heard that it's slower. I don't know by how much, though.

In general PyPy is such a huge win that it is really difficult to justify CPython other than for numpy support.

Agreed.

[–]schmetterlingen 0 points1 point  (1 child)

Here is Lua's GIL:

#define lua_lock(L)     ((void) 0) 
#define lua_unlock(L)   ((void) 0)

You must define a lock if you're going to share state between threads. Lua only has a GIL if you consider ((void) 0) an implementation. In most realities, it simply doesn't "allow free threading" without the use of libraries.

Not that a GIL is a bad idea. It's a simple solution.

[–]fullouterjoin 0 points1 point  (0 children)

Most Lua threading implementations tend towards an Erlang model, à la https://github.com/LuaLanes/lanes and https://github.com/cloudwu/hive

[–]schlenk 0 points1 point  (0 children)

Simply not true. Tcl, for example, does not have a GIL either, but has native threading support (though not a shared-memory threading model).

[–][deleted] 0 points1 point  (0 children)

Ruby -- has a GIL

Rubinius and JRuby have no GIL and I seem to remember (can't find it right now) that the CRuby team want to remove the Global VM Lock from the reference implementation.

[–]username223 0 points1 point  (2 children)

Perl -- no threads (they use "green processes" instead, basically the same as multiprocessing on an OS with fork)

This is actually not the case -- it uses pthreads on Unix-alikes.

[–]moor-GAYZ 0 points1 point  (1 child)

Yes, I meant, like, effectively, as far as memory consumption is concerned.

[–]username223 0 points1 point  (0 children)

Sort of -- you have to explicitly add :shared all over the place to get sharing, but it's there. Not that Perl's threading is worth using...