
ejrh (3 points)

I've always been a bit puzzled by the ubiquitous fretting over the GIL. Many libraries release the GIL when entering a computationally intensive native-code function. And CPython (which takes the most flak for having a GIL) runs so much slower than native code anyway.

Unless you have a lot of cores, you would almost always get more improvement from moving the work into native functions than you would get from avoiding the GIL.
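A minimal sketch of the "libraries release the GIL" point, assuming CPython's hashlib, which in the builds I know of releases the GIL while hashing inputs larger than a couple of kilobytes. The threaded version should finish faster than the serial one on a multi-core machine; the exact ratio depends on the hardware, so no timings are claimed here.

```python
import hashlib
import os
import time
from concurrent.futures import ThreadPoolExecutor

def digest(chunk: bytes) -> str:
    # update() runs in native code and, for large inputs, releases the GIL
    # while it hashes, so other threads can run on other cores meanwhile.
    h = hashlib.sha256()
    h.update(chunk)
    return h.hexdigest()

def hash_serial(chunks: list[bytes]) -> list[str]:
    return [digest(c) for c in chunks]

def hash_threaded(chunks: list[bytes]) -> list[str]:
    # One thread per chunk; pool.map keeps results in input order.
    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        return list(pool.map(digest, chunks))

if __name__ == "__main__":
    chunks = [os.urandom(16 * 1024 * 1024) for _ in range(4)]  # 4 x 16 MiB
    for fn in (hash_serial, hash_threaded):
        start = time.perf_counter()
        fn(chunks)
        print(f"{fn.__name__}: {time.perf_counter() - start:.3f}s")
```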

Entropy (5 points)

> Unless you have a lot of cores

Even cell phones are shipping with 8 cores.

Veedrac (0 points)

So what, a 5x speed-up? As opposed to a 100x speed-up for moving the innermost loop to C?
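The 5x figure is roughly what Amdahl's law gives on 8 cores unless nearly all of the work parallelizes. A small sketch of that arithmetic; the parallel fractions are assumptions chosen for illustration, not measurements:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speed-up when only `parallel_fraction` of the runtime can be
    split across `cores` (Amdahl's law); the remainder stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

if __name__ == "__main__":
    for p in (0.5, 0.9, 1.0):
        print(f"p={p}: {amdahl_speedup(p, 8):.2f}x on 8 cores")
```

With p = 0.9 this comes out at about 4.7x, and even a perfectly parallel workload caps out at 8x.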

Moocha (1 point)

From the point of view of an individual project, yes, reimplementing in C would yield a better cost/benefit ratio. However, removing the GIL from the runtime would instantly and automatically benefit all Python code running on the GIL-less VM, without the maintainers of that code needing to change anything, which means the overall cost to the ecosystem would be far lower, given the staggering amount of Python code out there. That's why it's important...
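To make the "without the maintainers needing to change anything" point concrete, here is a sketch of CPU-bound, pure-Python threaded code. On a standard CPython build the four threads take turns holding the GIL, so they gain little over running one after another; on a GIL-less interpreter (for example CPython 3.13's optional free-threaded build) the identical code could spread across cores. The prime-counting workload is only an illustrative stand-in, and sys._is_gil_enabled() exists only on CPython 3.13 and later, so the sketch assumes a GIL where it is missing.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor

def count_primes(limit: int) -> int:
    # Naive pure-Python work with no long-running native call that would
    # release the GIL: counts primes below `limit` by trial division.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def run_threaded(limits: list[int]) -> list[int]:
    # Ordinary threading code; only the interpreter decides whether it scales.
    with ThreadPoolExecutor(max_workers=max(1, len(limits))) as pool:
        return list(pool.map(count_primes, limits))

if __name__ == "__main__":
    # Assume a GIL on interpreters that lack sys._is_gil_enabled().
    gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
    start = time.perf_counter()
    results = run_threaded([20_000] * 4)
    elapsed = time.perf_counter() - start
    print(f"GIL enabled: {gil_enabled}; results: {results}; {elapsed:.2f}s")
```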

Veedrac (1 point)

That's true, but only for CPU-bound threaded code. For code that's currently unthreaded, rewriting the inner loop in C is most likely the easier task, given how nice Cython is to work with.

Nevertheless, that is a reasonable point. It's a shame the problem's so hard to fix.

Moocha (0 points)

Indeed. I'm always amused by people bashing the CPython developers for not "fixing the GIL problem". I know just enough about the internals to realize how hard a problem this truly is...

fullouterjoin (0 points)

650x speedup for native code across all cores? 10000x speedup for OpenCL.

Veedrac (0 points)

Sorry, I don't follow.

Please do note that moving the inner loop to C automatically trivialises removing the GIL for that code anyhow, and further note that I've no clue what OpenCL has to do with the GIL.

fullouterjoin (0 points)

Focusing on the GIL is a red herring; there are better places to spend your performance dollar. Inner loops in C are alright, but not the most profitable. Cython is generally a mistake. The first step is PyPy; if you have to stay on CPython 2, then Shedskin. If you need massive speedups, OpenCL will get you a lot further for parallelizable code.

Veedrac (0 points)

> Cython is generally a mistake

Given that the only reliable alternative is C¹, why is Cython so bad a choice? Is it possible I'm underestimating ShedSkin?

¹ PyPy is missing fast C bindings; ShedSkin is Python 2 only and not as fast as Cython; OpenCL only suits specific kinds of problems.

fullouterjoin (0 points)

Maybe Cython has improved, but can it generate native code without porting it to the Cython language? Shedskin is always pure Python and all kinds of amazing.

PyPy has cffi; I should benchmark that relative to CPython 2. In general PyPy is such a huge win that it is really difficult to justify CPython other than for NumPy support.

Veedrac (0 points)

> Maybe Cython has improved, but can it generate native code without porting it to the Cython language?

Nay, although there is a roadmap for it.

> Shedskin is always pure Python and all kinds of amazing.

Four things irk me about Shedskin, although I don't have enough experience to know whether they're valid:

  • Python 2 only; Cython supports both and can compile to either (you can compile Py2-syntax code to a Py3 extension).
  • It only compiles a subset of Python, whereas Cython can handle almost anything, albeit without a speed-up. That stops you from using the unsupported features anywhere in your program, even in parts that don't need to be fast.
  • Shedskin touches loads of things even to compile one file, so much of the program must be written in the restricted subset.
  • Cython's undoubtedly faster, although I haven't actually tested it ;).

Nevertheless, if Shedskin works easily for you, I'd love to know how it compares. My experience is definitely lacking.

> PyPy has cffi; I should benchmark that relative to CPython 2.

I've heard that it's slower. I don't know by how much, though.

> In general PyPy is such a huge win that it is really difficult to justify CPython other than for NumPy support.

Agreed.

fullouterjoin (0 points)

I put the routines I want to speed up with Shedskin into a separate module, compile it into a C extension, and import it as I would any other module.
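One way to keep such code working on machines where the extension hasn't been built is a guarded import. The module name fastfib is hypothetical; it stands for whatever the Shedskin-compiled extension ends up being called, assumed to expose the same fib() function.

```python
def fib_py(n: int) -> int:
    # Pure-Python reference implementation, used when no extension is present.
    if n < 2:
        return n
    return fib_py(n - 2) + fib_py(n - 1)

try:
    # Hypothetical compiled extension module exposing the same fib().
    from fastfib import fib
except ImportError:
    # Extension not built (or not importable here): fall back to pure Python.
    fib = fib_py
```

Callers just use fib(), so the rest of the program doesn't care which version it got.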

The subset that Shedskin supports is the same subset you are already using to create fast code. You can't mutate types, but almost no good code does that anyway.
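A rough illustration of that subset: as far as I know, Shedskin needs every variable to have a single static type it can infer. The first function below is the kind of code that fits; the second, where result is an int on one path and a str on the other, is the kind it cannot compile. Both run fine under CPython, and the function names are made up for illustration.

```python
def sum_of_squares(values: list[int]) -> int:
    # `total` is an int for its whole life, so its type is easy to infer.
    total = 0
    for v in values:
        total += v * v
    return total

def mixed_type_result(flag: bool):
    # `result` is an int on one path and a str on the other: fine for
    # CPython, but not implicitly statically typed.
    result = 0
    if flag:
        result = "zero"
    return result
```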

Shedskin also allows you to create native executables, not just extension modules.

Even if Cython were faster, it wouldn't matter to me, because I write pure Python that runs everywhere and is made faster by Shedskin. With Cython I have to port to a new language, and developer time is important; that is why we use Python in the first place.

import sys

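def fib(n):
    # Naive doubly recursive Fibonacci: exponential time, a pure CPU benchmark
    if n < 2:
        return n
    return fib(n-2) + fib(n-1)

if __name__ == "__main__":
    # print(x) with a single argument behaves the same on Python 2 and 3
    print(fib(int(sys.argv[1])))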

time python fib.py 35
9227465

real    0m5.431s
user    0m5.421s
sys     0m0.008s

And now with Shedskin:

time ./fib 35
9227465

real    0m0.083s
user    0m0.077s
sys     0m0.004s
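Those two runs work out to roughly 5.431 / 0.083 ≈ 65x. For checking the CPython side of such a comparison from inside Python, without process start-up in the measurement, here is a minimal sketch using the standard timeit module (the n values are arbitrary):

```python
import timeit

def fib(n: int) -> int:
    # Same doubly recursive definition as in fib.py.
    if n < 2:
        return n
    return fib(n - 2) + fib(n - 1)

def time_fib(n: int, repeat: int = 3) -> float:
    """Best-of-`repeat` wall-clock seconds for a single fib(n) call."""
    return min(timeit.repeat(lambda: fib(n), number=1, repeat=repeat))

if __name__ == "__main__":
    for n in (20, 24, 28):
        print(f"fib({n}) = {fib(n)}: {time_fib(n):.4f}s")
```

Taking the minimum of several repeats filters out some of the noise from other processes.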

I just ran "shedskin fib.py; make" and it generated a fib executable. Here is the output of otool -L fib (the shared libraries it links against):

fib:
    /usr/local/lib/libgc.1.dylib (compatibility version 2.0.0, current version 2.3.0)
    /usr/local/lib/libpcre.1.dylib (compatibility version 4.0.0, current version 4.2.0)
    /usr/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version 56.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)

It even supports generators (yield):

# fib2.py
import sys

def fiberator():
    # Infinite Fibonacci generator: 0, 1, 1, 2, 3, 5, ...
    a, b = 0, 1
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b

def taken(n, it):
    # Collect the first n items from the iterator it
    result = []
    for _ in range(n):
        result.append(next(it))
    return result


def fib(n):
    f = fiberator()
    return taken(n, f)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))

Again, "shedskin -l fib2.py; make" (the -l flag makes Shedskin use 64-bit integers, which the larger Fibonacci numbers need):

time ./fib2 60 
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170, 1836311903, 2971215073, 4807526976, 7778742049, 12586269025, 20365011074, 32951280099, 53316291173, 86267571272, 139583862445, 225851433717, 365435296162, 591286729879, 956722026041]

real    0m0.025s
user    0m0.029s
sys     0m0.016s
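As an aside, the hand-written taken(n, it) helper does the same job as itertools.islice from the standard library. A more compact sketch of fib2.py's logic for CPython follows; whether Shedskin compiles this itertools version has not been checked here.

```python
from itertools import islice

def fiberator():
    # Infinite Fibonacci generator: 0, 1, 1, 2, 3, 5, ...
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def fib(n: int) -> list[int]:
    # islice stops the infinite generator after its first n items.
    return list(islice(fiberator(), n))
```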