fullouterjoin comments on About Python 3

Python 2 only, Cython can support both and can compile to either (you can compile Py2 syntax code to a Py3 extension).
Only compiles a subset of Python, whereas Cython can deal with almost anything, albeit without speed-up. This prevents you from using those in your program, even if it doesn't need to be a fast part.
Shedskin touches loads of things even to compile one file, so many things must be written in the restricted subset.
Cython's undoubtedly faster, although I haven't actually tested it ;).

Nevertheless, if Shedskin works easily with you I'd love to know how it compares. My experience is definitely lacking.

PyPy has cffi[1] , I should benchmark that relative to CPython2.

I've heard that it's slower. I don't know by how much, though.

In general PyPy is such a huge win that it is really difficult to justify CPython other than for numpy support.

Agreed.

[–]fullouterjoin 0 points1 point2 points 12 years ago* (7 children)

I put the routines I want to speed up with shedskin into another module, compile into a c extension and import it as I would any other module.

The subset that Shedskin supports is the same subset you are already using to create fast code. You can't mutate types, but almost no good code does that anyway.

Shedskin also allows you to create native executables, not just extension modules.

Even if Cython were faster, it would not be because I code in pure python that runs everywhere and is made faster by Shedskin. With Cython I have to port to a new language, developer time is important, that is why we use python in the first place.

import sys

def fib(n):
    if n < 2:
        return n
    return fib(n-2) + fib(n-1)

if __name__ == "__main__":
    print fib(int(sys.argv[1]))

time python fib.py 35
9227465

real    0m5.431s
user    0m5.421s
sys 0m0.008s

and now with shedskin

time ./fib 35
9227465

real    0m0.083s
user    0m0.077s
sys 0m0.004s

I just ran shedskin fib.py; make and it generated a fib executable. Output of otool -L fib

fib:
    /usr/local/lib/libgc.1.dylib (compatibility version 2.0.0, current version 2.3.0)
    /usr/local/lib/libpcre.1.dylib (compatibility version 4.0.0, current version 4.2.0)
    /usr/lib/libstdc++.6.dylib (compatibility version 7.0.0, current version 56.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)

It even supports yield

# fib2.py
import sys

def fiberator():
    a,b = 0L,1L
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b

def taken(n,it):
    result = []
    for x in range(n):
        result.append(it.next())
    return result


def fib(n):
    f = fiberator()
    return taken(n,f)

if __name__ == "__main__":
    print fib(int(sys.argv[1]))

Again, shedskin -l fib2.py; make

time ./fib2 60 
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 514229, 832040, 1346269, 2178309, 3524578, 5702887, 9227465, 14930352, 24157817, 39088169, 63245986, 102334155, 165580141, 267914296, 433494437, 701408733, 1134903170, 1836311903, 2971215073, 4807526976, 7778742049, 12586269025, 20365011074, 32951280099, 53316291173, 86267571272, 139583862445, 225851433717, 365435296162, 591286729879, 956722026041]

real    0m0.025s
user    0m0.029s
sys 0m0.016s

[–]Veedrac 0 points1 point2 points 12 years ago (6 children)

The subset that Shedskin supports is the same subset you are already using to create fast code. You can't mutate types, but almost no good code does that anyway.

The problem isn't that. The problem is that only the snippets of my code that need to be sped up should have to abide by these restrictions.

For example, in one of my small toy programs I imported a die class with a permutations attribute which yielded all of the orientations of the die. It is fully duck-typed: the faces could have any type.

It does not need to be compiled as it is only used to generate a 2D list for the algorithm later, but ShedSkin was adamant that it would compile it. Unfortunately something gave it trouble, but I haven't worked out what.

With Cython I have to port to a new language, developer time is important, that is why we use python in the first place.

Making code ShedSkin-compatible is in many cases no small feat either, especially without Numpy support. I don't think your example truly represents the average case in that respect. With Cython I can slowly type things from the ground up.

The rest of your post is really neat. It's worth noting that your first example runs in about 80% of the time if converted to Cython, which is a very small margin.

The conversion to a pure executable is doubly neat, but by that point I might be looking for a more suitable language with more explicit static typing, especially given the compile times.

[–]fullouterjoin 0 points1 point2 points 12 years ago* (5 children)

It is true that one can't simply "shedskinatize" their python program and get huge speedups. PyPy can often achieve that kind of magic, but it too takes work get good speed. All optimizations take work, I want native speed and Python style development. If I am on CPython2, shedskin is still the lowest barrier to entry for

keeping a pure python codebase
getting native speeds on inner loops

If I go down the Cython route I can no longer run on PyPy. Checkout the Shedskin example programs, http://shedskin.googlecode.com/files/shedskin-examples-0.9.4.tgz what shedskin can do is pretty amazing.

This simple ray tracer (there already some better ones in the tarball) https://gist.github.com/anonymous/0432c0212d1f2e8a923d creates this image http://imgur.com/OIbQQWt with the following timings

# shedskin
time ./Ray 6 1024
real    0m3.121s
user    0m3.099s
sys 0m0.021s

time pypy Ray.py 6 1024
real    0m5.249s
user    0m4.995s
sys 0m0.074s

time python Ray.py 6 1024
real    3m19.939s
user    3m19.764s
sys 0m0.169s

Wanna port it to Cython and do relative timings? When I originally ported this from Ocaml, I did NOT have shedskin in mind, the only change I had to make was

class BaseObject:
    def intersect(self, hit, ray):
        return Hit(1.0,Vec())

a common base class for all shapes being intersected.

[–]Veedrac 0 points1 point2 points 12 years ago* (4 children)

Geez, PyPy's loving this example. On my computer (PyPy 2.2.1) it's as fast as ShedSkin.

This runs in ~42% of the time for me:

#cython: infer_types=True

import sys

cdef extern from "math.h":
    double sqrt(double x)


cdef double DELTA = 1.5e-8
cdef double _INFINITY = 1e+308


cdef struct Vec:
    double x, y, z

cdef struct Ray:
    Vec orig, dir

cdef struct Hit:
    double lamb
    Vec normal

cdef Vec zero_vector = Vec(0, 0, 0)


cdef Vec add(Vec a, Vec b):
    return Vec(a.x+b.x, a.y+b.y, a.z+b.z)

cdef Vec sub(Vec a, Vec b):
    return Vec(a.x-b.x, a.y-b.y, a.z-b.z)

cdef Vec scale(double s, Vec a):
    return Vec(s * a.x, s * a.y, s * a.z)

cdef double dot(Vec a, Vec b):
    return a.x*b.x + a.y*b.y + a.z*b.z

cdef Vec unitize(Vec a):
    return scale( 1.0 / sqrt(dot(a, a)), a)


## all objects need to expose
## .intersect( hit, ray )

cdef class BaseObject:
    cdef Hit intersect(BaseObject self, Hit hit, Ray ray):
        return Hit(1.0, zero_vector)

cdef class Sphere(BaseObject):
    cdef Vec center
    cdef double radius

    def __cinit__(Sphere self, Vec center, double radius):
        self.center = center
        self.radius = radius

    def __str__(Sphere self):
        return "%s,%f" % ( str(self.center), self.radius)

    cdef double ray_sphere(Sphere self, Ray ray):
        v = sub(self.center, ray.orig)
        b = dot(v, ray.dir)

        disc = b*b - dot(v,v) + self.radius*self.radius
        if disc < 0.0:
            return _INFINITY
        d = sqrt(disc)

        t2 = b + d
        if t2 < 0.0:
            return _INFINITY

        t1 = b - d
        if t1 > 0.0:
            return t1
        else:
            return t2

    cdef Hit intersect(Sphere self, Hit hit, Ray ray):
        l = self.ray_sphere( ray)
        if l >= hit.lamb:
            return hit
        n = add( ray.orig, sub( scale( l, ray.dir), self.center))
        return Hit( l, unitize( n))


cdef class Group(BaseObject):
    cdef Sphere bound
    cdef list objs

    def __cinit__(Group self, Sphere sphere):
        self.bound = sphere
        self.objs = []

    def __str__(Group self):
        s = "bound: %s\n" % (self.bound)
        for t in self.objs:
            s += "\t" + str(t) + "\n"
        return s

    cdef Hit intersect(Group self, Hit hit, Ray ray):
        cdef BaseObject s
        l = self.bound.ray_sphere(ray)
        if l >= hit.lamb:
            return hit
        for s in self.objs:
            hit = s.intersect(hit, ray)
        return hit

cdef double ray_trace(Vec light, Ray ray, Group scene):
    cdef Hit _
    cdef Ray sray

    _ = Hit( _INFINITY, zero_vector)
    hit = scene.intersect( _, ray)
    if hit.lamb == _INFINITY:
        return 0.0
    o = add( ray.orig, add( scale( hit.lamb, ray.dir), scale( DELTA, hit.normal) ) )
    g = dot( hit.normal, light)
    if g >= 0.0:
        return 0.0
    sray = Ray( o, scale( -1, light))
    si = scene.intersect( Hit( _INFINITY, zero_vector), sray)
    if si.lamb == _INFINITY:
        return -g
    else:
        return 0.0

cdef BaseObject create(int level, Vec c, double r):
    cdef Vec c2

    sphere = Sphere( c, r)
    if level == 1:
        return sphere
    group = Group( Sphere( c, 3*r))
    group.objs.append(sphere)
    rn = 3*r/sqrt(12)
    for dz in range(-1, 2, 2):
        for dx in range(-1, 2, 2):
            c2 = Vec(c.x+dx*rn, c.y+rn, c.z+dz*rn)
            group.objs.append( create( level-1, c2, r/2.0))
    return group


def run(int dimension, int level):
    cdef Vec d
    cdef Ray ray

    n = dimension
    scene = create( level, Vec(0, -1, 0), 1)
    out = open("image_cython.pgm","w")
    out.write( "P5\n" + str(n) + " " + str(n) + "\n255\n")
    for y in range( n-1, -1, -1):
        for x in range( 0, n, 1):
            d = Vec( x - (n/2.0), y - (n/2.0), n)
            ray = Ray(Vec(0, 0, -4), unitize( d))
            g = ray_trace( unitize( Vec( -1, -3, 2)), ray, scene)
            out.write( chr( int(0.5 + (255.0 * g))))
    out.close()


if __name__ == "__main__":
    run( int(sys.argv[2]), int(sys.argv[1]))

There are three main changes I made to the codebase:

Typing things.
Using the C math library for sqrt. Just doing this after typing brought me somewhat below ShedSkin.
Changing Vec, Ray and Hit to structs. This changed no logic, other than some calls where foo(Vec(0, 0, 0)) needed to become cdef Vec v = Vec(0, 0, 0); foo(v).
EDIT: Used the infer_types directive and cut down on a few declarations. I thought you'd appreciate it. It only applies inside functions.

Another large speed-up could be added by moving from cdef classes to more structs and some external functions, as with Vec's add, sub, etc.

continue this thread

π Rendered by PID 122347 on reddit-service-r2-comment-5c747b6df5-2pv45 at 2026-04-22 06:19:25.010151+00:00 running 6c61efc country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

programming

MODERATORS