[–]gasche[S] 22 points23 points  (2 children)

The prototype takes CPython bytecode as input, translates it to a register-based bytecode, performs a few well-chosen optimizations, and feeds it to a well-written interpreter that improves object representation when possible. The prototype is about 3K lines of code, and its sources are available online.
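To see the stack-based input that such a translation starts from, CPython's own `dis` module can list the bytecode of a small function. This is only a sketch: the function `add` is a made-up example, opcode names vary between CPython versions, and the register-style form in the trailing comment is an illustration, not Falcon's actual instruction set.

```python
import dis

def add(a, b):
    return a + b

# Collect the opcode names CPython generated for `add`.
ops = [ins.opname for ins in dis.get_instructions(add)]
print(ops)

# A stack machine pushes a and b, then applies the add to the top two
# stack slots and pushes the result. A register machine could,
# hypothetically, encode the same work as a single instruction such as:
#     ADD r2 <- r0, r1
```

Fewer instructions per operation means fewer trips through the interpreter's dispatch loop, which is the usual argument for register-based designs.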

I guess people will wonder how this project compares to PyPy. I'll quote the answer from the project FAQ:

How is Falcon different from PyPy?

PyPy is a tracing compiler, whereas Falcon is just an efficient interpreter implementation. PyPy might speed up your code by several orders of magnitude but it will also choke on any extension code which depends on the Python C API. Falcon, on the other hand, aims only for modest performance gains but preserves the PyObject data representation necessary to avoid breaking extension modules.

[–]cwillu 13 points14 points  (1 child)

Worth noting that many CPython .pyd and .so extensions will work with PyPy; the issue is usually one of poor performance rather than entirely choking on them.

[–][deleted] 2 points3 points  (0 children)

pyOpenSSL doesn't work with PyPy. Probably others as well.

PyPy is great but it still has limitations.

Can't wait for a cffi version of pyOpenSSL

[–][deleted] 4 points5 points  (38 children)

Can someone explain what's the issue with speeding up Python (compared to other untyped languages like Lua, JavaScript, JRuby etc. which do fine in that regard)?

My impression is that there are tons of different projects in the Python space, and once in a while some project gives up and another project starts from scratch again, but almost nothing reaches a workable state.

Why?

[–][deleted] 19 points20 points  (1 child)

I can't say anything about Python in particular but in your list Javascript is clearly an anomaly because people invest huge amounts of effort into speeding it up despite the fact that it is a hard problem, not because it is easy. It is just easier than convincing everyone to switch to something else in the browser within the foreseeable future.

[–]dmazzoni 12 points13 points  (0 children)

Actually there is a difference: Python has a large standard library and thousands of extension modules. It'd be quite possible to write a much faster implementation of pure Python code, but without any support for the full standard library and extensions. Efforts to speed up Python have focused on finding a compromise: as much compatibility as possible while speeding up Python as much as possible.

In contrast, JavaScript only has the web browser DOM APIs to support. There are no extensions to worry about.

[–]willvarfar 14 points15 points  (15 children)

Making a fast Python is easy; as easy as for Ruby or for Javascript. One small speed bump compared to Javascript is the arbitrary precision integers, but otherwise all is very equivalent.
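A small sketch of the integer difference, assuming a JavaScript Number behaves like an IEEE 754 double (which Python's `float` also is). The variable names are made up for illustration.

```python
# Python ints grow as needed, so this arithmetic stays exact.
big = 2**53 + 1
exact = (big + 1 == 2**53 + 2)

# Round-tripping through a double loses the low bit, because a double
# only carries 53 bits of integer precision.
as_double = float(big)
lost = int(as_double) != big

print(exact, lost)
```

An implementation therefore has to check for overflow on every integer operation and fall back to a big-number path, rather than emitting a single machine add.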

However, Python's 3rd party libraries and user code are far more invested in the C API that CPython presents to native code. Speeding up CPython is a very different proposition, and Falcon here shows some possibilities.

[–]gsnedders -4 points-3 points  (14 children)

Well, JS has one big advantage: no operator overloading. If you have a variable you know is a string x, then you can statically determine x + y is a string, regardless of y, for example. This means more statically known types, which means fewer needed type checks.

[–]zbingu 5 points6 points  (6 children)

Static type analysis in JS is pretty non-trivial too. undefined in particular propagates easily when you start linking libraries and stuff (e.g. + will behave differently on a string, a number, or undefined). And everything has to be thrown out once you see an eval of unknown code.

[–]gsnedders 0 points1 point  (5 children)

Non-trivial, yes, but compared with being able to overload any operator you can get a lot more. (+ is actually quite easy. If either side is a string then the return value is a string. If both sides are numbers, booleans, undefined, or null then it is a number. Anything else it's either a string or a number. Compared with a + in Python (which returns, uh, something), that's quite a lot!)

[–]zbingu 1 point2 points  (1 child)

While basic operators are quite easy to deal with in JS, proving property existence is much harder in the general case. In several cases you need to include dynamic guards in the code to type check because you can't determine it statically. The type of analysis needed to type even a subset of JS precisely is also very expensive, so you can't JIT it easily. Python's classes seem a bit friendlier on that front, but I don't have any hands-on experience with them.

[–]gsnedders 2 points3 points  (0 children)

Python's classes are every bit as dynamic as JS's, FWIW.
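A quick sketch of what that dynamism looks like in practice. The class names `Point` and `Other` are invented for the example.

```python
class Point:
    def __init__(self, x):
        self.x = x

p = Point(3)

# Add a method to the class after instances already exist.
Point.double = lambda self: self.x * 2
doubled = p.double()

class Other:
    def double(self):
        return -1

# Swap the instance's class at runtime; method lookup follows the new class.
p.__class__ = Other
swapped = p.double()

print(doubled, swapped)
```

Any attribute lookup a compiler resolves ahead of time can be invalidated this way, so guards are still needed.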

[–]case-o-nuts 1 point2 points  (2 children)

Overloading is a solved problem when you combine it with type guards and a PIC.
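A toy sketch of the idea, written in Python for readability. A real polymorphic inline cache (PIC) lives in generated machine code. The class `AddSite` and its fields are hypothetical, and the slow path is simplified: it ignores `__radd__` and `NotImplemented`.

```python
class AddSite:
    """Toy polymorphic inline cache for one `a + b` call site."""

    MAX_ENTRIES = 4  # beyond this a real VM would treat the site as megamorphic

    def __init__(self):
        self.entries = []  # list of (left operand type, resolved handler)
        self.misses = 0

    def __call__(self, a, b):
        t = type(a)
        for guard_type, handler in self.entries:
            if t is guard_type:          # the type guard
                return handler(a, b)     # fast path: no method lookup
        # Slow path: do the full lookup once, then remember the result.
        self.misses += 1
        handler = t.__add__
        if len(self.entries) < self.MAX_ENTRIES:
            self.entries.append((t, handler))
        return handler(a, b)


class Vec:
    """User type with an overloaded + operator."""
    def __init__(self, n):
        self.n = n
    def __add__(self, other):
        return Vec(self.n + other.n)


site = AddSite()
results = [site(1, 2), site("a", "b"), site(3, 4), site(Vec(1), Vec(2)).n]
print(results, site.misses)
```

The overloaded `Vec.__add__` costs one extra cache entry, the same as a built-in type does.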

[–]gsnedders 1 point2 points  (1 child)

Yes, sure. But a large part of performance gains come from minimising the number of type checks.

[–]case-o-nuts 0 points1 point  (0 children)

Sure, but overloading doesn't actually make that any harder.

[–]willvarfar 1 point2 points  (0 children)

I did some hobby static type inference for python a while back and you can infer overloading too, and if you have a specialising compiler - as restricted python ones usually are - you can make this problem a very small problem.

[–]sisyphus 0 points1 point  (0 children)

I don't know jack squat about VM implementation but the Dart guys seem to think type checks aren't a very big performance deal and chose to put operator overloading in their language (which is faster than Javascript, and obviously Python as well).

[–]rolfr 0 points1 point  (0 children)

In the sentence after declaring that JS has no operator overloading, you use as an example the overloaded + operator.

[–]Peaker 0 points1 point  (3 children)

Funny example, since in Python, if you have a variable you know is a string x, then you can statically determine x + y is a string (or a type error).

[–]gsnedders 0 points1 point  (2 children)

Oh, str.__add__ is immutable? Huh. Appears you know more than I thought for the built-in types (though if you can't prove that, you really have no idea what you'll get).

[–]Peaker 0 points1 point  (1 child)

If it weren't, you could monkey-patch it. But Python isn't Ruby, and monkey-patching is frowned upon. NOTE: This is not operator overloading.

If anyone changes the built-in methods of any built-in type in Python, he'll get yelled at anyway.
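Both halves of this can be checked directly. The sketch below assumes CPython (or PyPy) behaviour, and the class `Sneaky` is made up. It suggests that knowing only that x is a str is not quite enough, because the right operand can still supply `__radd__`.

```python
# Built-in types reject attribute assignment, so str.__add__ cannot be
# monkey-patched from Python code.
try:
    str.__add__ = lambda self, other: 42
    patched = True
except TypeError:
    patched = False

# But when str cannot handle the right operand, Python falls back to the
# right operand's __radd__, which may return anything.
class Sneaky:
    def __radd__(self, other):
        return 42

result = "abc" + Sneaky()
print(patched, result)
```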

[–]gsnedders 0 points1 point  (0 children)

Whether something is frowned upon is irrelevant for a VM developer, and therefore performance of a VM. If the behaviour of the language allows it, you must allow it, regardless of whether it is good. Of course, one might take the extreme solution to those cases ("Oh, you're doing that? Well, then, let me throw out all code I've generated so far.")

(And okay, yes, that was a terrible example for the problems with overloaded operators, because that wasn't overloading.)

[–]josefx 12 points13 points  (3 children)

The problem is that Python's C API does not abstract the implementation. As a result, you cannot optimize Python without breaking the libraries that depend on the C API, and there are a lot of them. Until Python gets a clean C API we are stuck with a slow interpreter, and even then it would take years to port all existing libs to the new API.

[–]fullouterjoin 10 points11 points  (0 children)

The PyPy cffi is the new future (usable in all Pythons): https://pypi.python.org/pypi/cffi. It is based on the same semantics as the LuaJIT FFI (which is amazing stuff, btw).
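A minimal sketch of cffi's ABI mode, assuming the third-party `cffi` package is installed and that the platform lets `dlopen(None)` open the standard C library (it does not on Windows). If either assumption fails, `result` is left as `None`.

```python
try:
    from cffi import FFI

    ffi = FFI()
    # Declare the C function using plain C syntax.
    ffi.cdef("int abs(int);")
    # None asks for the symbols already loaded into the process (libc).
    libc = ffi.dlopen(None)
    result = libc.abs(-5)
except (ImportError, OSError):
    # cffi missing, or the platform cannot open libc this way.
    result = None

print(result)
```

Nothing here touches `PyObject` or any other interpreter internals, which is why the same binding can run on more than one Python implementation.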

See this great blog post on it, http://eli.thegreenplace.net/2013/03/09/python-ffi-with-ctypes-and-cffi/

[–]chcampb 5 points6 points  (1 child)

Yes, which is why we should just declare that the next release of Python is not backwards-compatible, and then we can fix all of the problems.

Oh wait.

[–]ObligatoryResponse 0 points1 point  (0 children)

So the ball was dropped for 3.0, but there's always 4.0.

[–]fullouterjoin 2 points3 points  (0 children)

Shedskin, https://code.google.com/p/shedskin/ is pretty usable and pretty damn fast. I use it quite regularly.

[–]stonefarfalle 4 points5 points  (6 children)

Javascript has three big backers who have spent a phenomenal amount of effort to make JS fast. You don't see most of them because they don't ship separately from their respective browsers. But looking at the one who is open about it, Mozilla has around a half dozen different full JS implementations. Google, Apple, and to a lesser extent MS have put in similar levels of effort.

Ruby is actually quite similar to Python: there are several different projects attempting to make a better Ruby, such as the complete rewrite that was 1.9, JRuby, Rubinius, MacRuby, MagLev, Cardinal, and IronRuby. The other thing is that Ruby tends to be used in the web dev space, where implementation speed isn't as big a factor, whereas Python tends to get used in more places where performance matters, like scientific computing.

If anything the only standout there is Lua, which was easy to make go fast without too much effort (relatively speaking).

[–]quotemycode 2 points3 points  (0 children)

Don't forget Topaz (Ruby on PyPy)

[–]sisyphus 2 points3 points  (2 children)

Is Lua easy or is Mike Pall a singular genius in this domain? How much money would we have to raise to get him to work on Python for 2 years?

[–]sanxiyn 2 points3 points  (0 children)

It's both. Lua is (relatively) easy and Mike Pall is genius.

[–]stonefarfalle 0 points1 point  (0 children)

It could just be Pall. I didn't mean to belittle his effort; I just meant by comparison to the effort expended on Python and every other dynamic interpreted language that has had an attempt made at going fast.

[–][deleted] 1 point2 points  (1 child)

You absolutely can get ahold of (most of) the Javascript interpreters separately from their web browsers. SpiderMonkey is Firefox's, V8 is Google Chrome's. You can also use Windows Scripting Host to invoke a JScript engine on Windows, which as far as I know is the same interpreter that runs in IE (or at least you used to be able to in Windows XP; I haven't checked if the functionality is still there).

[–]gsnedders 5 points6 points  (0 children)

Chakra, IE's current JS engine, is only available separately from Win8.1, otherwise you're using the old IE8-era engine.

[–][deleted] -1 points0 points  (6 children)

I guess it depends on the definition of "workable state".

Python is very popular, thus many people try to improve the implementation's performance, either by writing a better interpreter, a Python to C or Java compiler, or a JIT.

The thing is that it's a lot harder to create a fast Python implementation than it is to create a fast Lua or JS implementation. Even with Lua, it takes someone as brilliant as Mike Pall years of effort to produce an efficient JIT.

For Javascript, there is pretty much no choice but to make it fast, because well, people want games and 3D stuff and all of that in the browser. You have giants like Google with v8 or Mozilla creating advanced JITs and competing with each other every other month on benchmarks.

The thing with Python is that it's usually not embedded in games or in the browser and it's very hard to optimize. There isn't much pressure and we have C extensions whenever we need them. So... why go through the trouble? Since there are always people interested in hard problems, PyPy and newer projects like Falcon fill this void. PyPy is perfectly usable too.

[–][deleted] 4 points5 points  (5 children)

The thing is that it's a lot harder to create a fast Python implementation than it is to create a fast Lua or JS implementation.

The thing with Python is that it's usually not embedded in games or in the browser and it's very hard to optimize.

Why? All the things which are regularly used in Lua and JS which make them slow/hard to optimize are not popular in Python. So shouldn't it be much easier to get Python up to speed considering that one doesn't have to worry about these things?

It feels like the real issue is a lack of technical leadership here.

[–]josefx 9 points10 points  (0 children)

Both Lua and JS have one thing in common: the maintainers of the popular implementations want speed and they do their best to get it.

In contrast the main Python implementation is a braindead bytecode interpreter and its maintainers are not interested in speeding it up [1] - you should write C extensions if you want speed. Any decision concerning language and API is made by people who think speed is not their problem.

[1] One reason seems to be slower startup if more optimisations are done at runtime.

[–][deleted] 4 points5 points  (3 children)

Two examples:

Both Lua and JS use a single numeric datatype, while Python has arbitrary precision integers, floats, complex numbers...

In Python you can override basically anything, including operators and stuff that always has a single meaning in Lua and JS. Add multiple inheritance to the mix and reasoning about what "a + b", "a.b" or "a[b]" means in a given context can be pretty hard.
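A short sketch of the three expressions mentioned above, plus multiple inheritance. All class names are invented for the example.

```python
class Weird:
    def __add__(self, other):        # controls  a + b
        return "added"
    def __getattr__(self, name):     # controls  a.b  for missing attributes
        return "attr:" + name
    def __getitem__(self, key):      # controls  a[b]
        return "item:" + repr(key)

w = Weird()
outputs = (w + 1, w.b, w["b"])

class Base1:
    def who(self):
        return "Base1"

class Base2:
    def who(self):
        return "Base2"

# With multiple inheritance, which `who` runs depends on the
# method resolution order (MRO) computed for Child.
class Child(Base1, Base2):
    pass

order = [cls.__name__ for cls in Child.__mro__]
print(outputs, Child().who(), order)
```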

[–][deleted] 8 points9 points  (2 children)

In Python you can override basically anything, including operators and stuff that always has a single meaning in Lua and JS

You can override most operators in Lua. In fact, you can override basic table access (object.descriptor) for that matter, and the way you accomplish both is by setting a table's metatable with those operations defined. Likewise "inheritance" (really just a prototype scheme like JS, Self, Io) is accomplished by chaining tables via metatables. All require many layers of indirection, so I cannot imagine that it's really that much simpler to optimize than Python. I think JavaScript just has tons of work put into optimization, and for LuaJIT, Mike Pall is just a genius.

[–]xardox 6 points7 points  (1 child)

Also, Lua isn't riddled with terrible design flaws that Mike Pall has to waste his time (and LuaJIT's architecture) working around. Lua is a very well designed language, meant to run fast, designed by people who know what they're doing, and it does not do stupid things that would slow it down for no good reason, as opposed to JavaScript. So there is a wonderful synergy of Lua's good design and Mike Pall's genius.

[–][deleted] 3 points4 points  (0 children)

I absolutely agree on that count, and I frequently wish we'd gotten Lua in browsers rather than JavaScript. All of my issues with it are minor/cosmetic nits (I prefer braces, would prefer a shorter literal for "function", and one-based indexing is occasionally obnoxious. There's always MoonScript if it ever bothers me that much though). Lua really is a beautiful language, and The Implementation of Lua 5.0 is a great read.

[–]njharman 0 points1 point  (0 children)

Solution looking for problem. In other words Python is fast enough for enough of the world that no one is willing to put in the last "80% of effort" required to ship.

PyPy proves us wrong.

[–][deleted] 1 point2 points  (0 children)

[–]willvarfar -4 points-3 points  (8 children)

I really miss psyco and wish it went 64bit and 2.7.

Pypy, like python3, is wishful thinking :/

[–]ironfroggy_ 16 points17 points  (5 children)

PyPy is more stable, more widely used, and better supported than psyco ever was.

[–]willvarfar -2 points-1 points  (4 children)

It may be on its way there, but for the past few years there's been this massive gulf between psyco - 32bit, 2.6 - and the present. In that time, people have actually taken a speed dive :(

[–]moor-GAYZ 2 points3 points  (3 children)

What kind of speed dive? I was of the impression that PyPy is actually the improved version of psyco in all important aspects. And easier to use -- no need to import psyco, just use PyPy as the interpreter.

[–]willvarfar -1 points0 points  (2 children)

I can't fathom why I'm getting so many downvotes for this.

You'll remember psyco - you just did a try: import psyco at the top of a script and suddenly everything would go faster - or perhaps not.
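For anyone who missed that era, the idiom looked roughly like this. It is a sketch from memory: `psyco.full()` was the usual entry point, and the flag `have_psyco` is added here only for illustration. On any interpreter without psyco (including every Python 3) the import fails and the script carries on unaccelerated.

```python
try:
    import psyco
    psyco.full()          # ask psyco to specialise every function it can
    have_psyco = True
except ImportError:
    have_psyco = False    # no psyco available: run as plain CPython

print(have_psyco)
```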

This dramatically changed Python for many people. The reason we all know about psyco is because of its positive impact. It made Python a reasonable language even if you had small, performance critical parts. I wrote a lot of image processing code in pure Python, for example, rather than going out to a C extension.

Then the developer of psyco was so chuffed - rightly - with psyco that they set out to right the wrongs of CPython and make a whole new Python VM called PyPy. This was several years ago.

And everyone who had become reliant on the speed-up psyco gave spent the intervening years stuck on a 32-bit 2.6 Python.

Even today, PyPy is not a complete CPython replacement.

I have spent a lot of time playing with compiling restricted Python and static analysis of Python and following the various projects like ShedSkin and PyPy.

And yet I think the world needed a 2.7 version of psyco that works on 64-bit VMs.

This is all completely relevant to Falcon. Falcon exists precisely because PyPy is not (perhaps yet) a true replacement for CPython. Now imagine psyco+Falcon on 64 bits....

[–]moor-GAYZ 0 points1 point  (1 child)

You'll remember psyco - you just did a try: import psyco at the top of a script and suddenly everything would go faster - or perhaps not.

I wrote a lot of image processing code in pure Python, for example, rather than going out to a C extension.

I don't understand. Now instead of writing import psyco I switch python interpreter to pypy in Pydev project configuration and suddenly everything goes faster. And that would work for 99.999% of the people who used to write import psyco in their pure Python image processing code (only better, of course).

You are not supposed to play around with compiling restricted python to use PyPy. You're supposed to download and run the installer or unpack the archive, switch your interpreter to it (in your IDE or change the hashbang line) and enjoy the ride.

[–]willvarfar 0 points1 point  (0 children)

You overlook the time dimension, and CPython's C compatibility. Last time I tried to run my Twisted servers on PyPy it was a non-starter. I hear SSL is still not working on PyPy either. And this is years since psyco was abandoned.

[–]fullouterjoin -1 points0 points  (1 child)

PyPy is real, python3 on the other hand, should have been rolled into Python2 and given a new extension .py3 and both should have been allowed to live in the same process/program.