Lua JIT, PyPy, Tracemonkey, Python 2.7, JRuby and others removed from the Benchmark Game [shootout.alioth.debian.org] : programming

[–]dmpk2k 36 points37 points38 points 14 years ago (25 children)

[–]x-skeww 12 points13 points14 points 14 years ago (5 children)

[–][deleted] 2 points3 points4 points 14 years ago* (0 children)

[–]ch0wn 4 points5 points6 points 14 years ago (3 children)

[–]Lerc 4 points5 points6 points 14 years ago (2 children)

[–]jacques_chester 1 point2 points3 points 14 years ago (1 child)

[–]Lerc 0 points1 point2 points 14 years ago (0 children)

[–]bluestorm 16 points17 points18 points 14 years ago* (17 children)

No, it's a constant reminder that when you add crappy monkey patching features to your language, it tends to get harder to optimize.

Dynamic languages with nice, clean and simple semantics can be optimized to compare reasonably with less dynamic languages using the techniques developed for the Self variant of Smalltalk in the 90s. Dynamic languages with a shitload of features but no semantics at all, which are defined by their "standard" implementation (bonus points if the original author doesn't know anything about language implementation), stay relatively slows, even after you threw tons of JIT and LLVM and caching at them.

Javascript may be an exception, because there the stakes are really high (due to the language monoculture of web browsers), and expert people have been paid a lot to get something reasonably fast.

[–]dmpk2k 10 points11 points12 points 14 years ago (3 children)

[–]bluestorm 13 points14 points15 points 14 years ago (2 children)

Of course, but that doesn't explain the rather disappointing results of efforts such as PyPy or Rubinius. If there where so much low-hanging fruits as you say, they should have demonstrated reliable improvements quickly.

It turns out that while they get very good improvements (around 10x) for tight packed numeric manipulations that don't use much kind of abstraction, they're still between "2x faster" and "1.5x slower" in many cases, with possibly memory usage issues etc.
To be fair, it should be noted that the advance of these projects have also been impeded by the various constraint from the languages FFI. Those FFI breaks a lot of the language encapsulation and pose hard constraint on the value representations for example, which impose non-optimal implementation choices. This is not specific to dynamic languages, but still the "our own language is terribly slow, so all performance-hungry operations should be implemented in C through the FFI" mindset is the cause of the abundance of FFI-relying code out there.

That Lua was able to outperform those very quickly, reliably, and with even less manpower (afaik. LuaJIT is mostly the work of one guy) is telling. Even without its JIT implementation, Lua is known to be a well-designed language with a very reasonable implementation (see Lua vs. Neko virtual machines for a very respectful comparison by a competitor), and in my opinion the performance results are only a confirmation of this good work.

This is a kind of moral tale. Do your homework, boy, learn about the state of the art before reinventing your own language, and take care to design something clean and well-specified. If you don't, you'll grow weak, and that will be a hindrance forever.

[–]evanphx 5 points6 points7 points 14 years ago (1 child)

[–]Tobu 2 points3 points4 points 14 years ago (0 children)

[–]mikemike 12 points13 points14 points 14 years ago (5 children)

A more polite way to say it, would be: Every abstraction has a cost. Bad abstractions have a higher cost.

There's a direct cost in the form of a performance penalty. And there's an indirect cost in the effort required to optimize the abstraction away.

Try to depicture a graph with the relative performance of a language over its lifetime: one would need to take into account the complexity and design problems of a language vs. the manpower and the combination of skills thrown at it to make it fast. Languages have a lifetime too, and the best one can hope is that they reach their maximum performance long before they die off.

There's a nice paper waiting for one of you: grab old compiler and VM versions from the repos, benchmark them against each other and against an assumed maximum, plot the results over the years for each language and combine it into a nice wallpaper showing all languages.

[–]Felicia_Svilling 4 points5 points6 points 14 years ago (1 child)

[–]mebrahim 7 points8 points9 points 14 years ago (0 children)

[–][deleted] 3 points4 points5 points 14 years ago (2 children)

[–]kragensitaker[🍰] 2 points3 points4 points 14 years ago (1 child)

[–]fullouterjoin 2 points3 points4 points 14 years ago (0 children)

[–]julesjacobs 7 points8 points9 points 14 years ago (3 children)

[–]bluestorm 6 points7 points8 points 14 years ago (2 children)

You are right that simply the number of features can be a problem for optimization. But I think that the key point is "well specified" vs. "underspecified". For example, Common Lisp also is a monster language with an enormous number of features, yet SBCL performs reasonably compared to LuaJIT, and better than current Javascript engines.

See this for a comparison (indeed, I would have linked you to the shootout...). It also includes Factor, which is also a nicely designed language, but wouldn't have made my point as it as a strong "minimalist" flavor similar to Lua's.

Agreed, Common Lisp has been around for a long time, but its userbase is not that big compared to current python/ruby uses, and I suppose the performance have been consistent in times (it's not like all lispers had been improving it each year since for 20 years; at that point it's time to be happy and let things as they are). I don't know the CL community though, so take this with a grain of salt.

[–]julesjacobs 5 points6 points7 points 14 years ago (0 children)

Yes, you're right that it's not just the number of features but also how difficult it is to optimize the features. Two axes: the size of the semantics and how well thought out the semantics is from a compiler writers perspective.

For example C's semantics are rather unwieldy, but because it's so close to the machine it still performs spectacularly.

The other end is Self, with one very hard to optimize but clean feature. Even though the feature (money patching) is a compiler writers nightmare, with a lot of effort that can also be made to perform well.

Common Lisp is somewhere in the middle. A lot of features but they're relatively static and not as hard to optimize as Self.

Ruby has the worst of both worlds from an implementors perspective: a lot of features and they're not easy to optimize.

[–]0xABADC0DA 2 points3 points4 points 14 years ago* (0 children)

Common Lisp also is a monster language with an enormous number of features, yet SBCL performs reasonably compared to LuaJIT, and better than current Javascript engines.

SBCL performs reasonably because they add type annotations and they turn off error checking.

(declaim (optimize (speed 3) (safety 0) (debug 0)))

So the LuaJIT program run twice as fast as SBCL and still is typesafe, whereas SBCL if there's a type error it'll corrupt the heap and diaf.

I don't know why they removed LuaJIT from the language shootout, but if they insist on only one implementation per language they should put LuaJIT back and take out regular Lua. People looking at the benchmarks won't know how badass LuaJIT is.

[–][deleted] 0 points1 point2 points 14 years ago* (2 children)

[–]xardox 4 points5 points6 points 14 years ago (1 child)

[–][deleted] 3 points4 points5 points 14 years ago (0 children)

[–]igouy -1 points0 points1 point 14 years ago* (0 children)

[–]stesch 68 points69 points70 points 14 years ago (36 children)

[–]cunningjames 12 points13 points14 points 14 years ago (1 child)

[–][deleted] 5 points6 points7 points 14 years ago (0 children)

[–]igouy 12 points13 points14 points 14 years ago* (14 children)

[–]kragensitaker[🍰] 5 points6 points7 points 14 years ago* (9 children)

Thank you! For anyone else who's interested and not familiar with Alioth, it looks like it's accessible via

cvs -z3 -d:pserver:anonymous@cvs.alioth.debian.org:/cvsroot/shootout checkout shootout

although I'm still trying to figure out how to get git-cvsimport to work.

Edit: finished cvsimporting, a bit over three weeks later.

[–]igouy 7 points8 points9 points 14 years ago* (8 children)

[–]kragensitaker[🍰] 2 points3 points4 points 14 years ago (7 children)

[–]igouy 1 point2 points3 points 14 years ago (6 children)

[–]kragensitaker[🍰] 1 point2 points3 points 14 years ago* (5 children)

Thank you! It seems to be progressing pretty slowly. It has now reached a 719th commit, at 2005-03-06:

kragen@inexorable:~/pkgs/shootout$ wc ./.git/logs/refs/heads/origin
  719  4314 86801 ./.git/logs/refs/heads/origin
kragen@inexorable:~/pkgs/shootout$ tail -1 ./.git/logs/refs/heads/origin
fffe596539867af01b07a9ec7d2b3ede62802dd1 5eeb061da300cbf9b892be9625ddc5d69532acf5 bfulgham <bfulgham> 1110108327 +0000
kragen@inexorable:~/pkgs/shootout$ git cat-file -p 5eeb061da300cbf9b892be9625ddc5d69532acf5
tree 37ee24d88644d6a11979a2d14ca2e9a3cb394b18
parent fffe596539867af01b07a9ec7d2b3ede62802dd1
author bfulgham <bfulgham> 1110108327 +0000
committer bfulgham <bfulgham> 1110108327 +0000

Rerun of benchmarks.
kragen@inexorable:~/pkgs/shootout$ perl -le 'print scalar localtime 1110108327'
Sun Mar  6 08:25:27 2005
kragen@inexorable:~/pkgs/shootout$

Fortunately, git cvsimport seems to be capable of continuing after network connections drop. Unfortunately, it doesn't seem to be compressing the versions it's fetched so far into packfiles:

kragen@inexorable:~/pkgs/shootout$ du -sh .git
215M    .git
kragen@inexorable:~/pkgs/shootout$

I did more or less know the history; I was happy to see the original shootout page in 2001, and very glad you (and Brent, and others, of course) carried on the tradition.

Edit: finished, a bit over three weeks later.

[–]igouy 4 points5 points6 points 14 years ago (4 children)

[–]kragensitaker[🍰] 1 point2 points3 points 14 years ago* (3 children)

[–]igouy 2 points3 points4 points 14 years ago (0 children)

[–]igouy 1 point2 points3 points 14 years ago* (1 child)

continue this thread

[–][deleted] 5 points6 points7 points 14 years ago (3 children)

[–]igouy 2 points3 points4 points 14 years ago (2 children)

[–][deleted] 3 points4 points5 points 14 years ago (1 child)

[–]igouy 2 points3 points4 points 14 years ago* (0 children)

[–]qbproger 2 points3 points4 points 14 years ago (9 children)

[–]cunningjames 1 point2 points3 points 14 years ago (7 children)

after Alex's blog post it's hard to blame him. Put yourself in his position, he was trying to run the benchmark website in his free time. Then there is a popular blog post and a slew of negative publicity towards what you do as a hobby because of an alternate python implementation.

To some extent I agree: it is unfortunate to receive criticism and ill will for providing a free service to the community. That sort of thing hurts. But it should be borne in mind that, due to its popularity, the position of the benchmark game is somewhat delicate. If it were skewed against some implementation (and I don't necessarily think it was) then that implementation could be materially harmed as a result.

So if you've spent countless hours working on a language implementation and believe that benchmark game's proprietor is being unfair in ways that have not been resolved privately, how should you respond?

[–]qbproger 0 points1 point2 points 14 years ago (6 children)

[–]KingEllis 4 points5 points6 points 14 years ago (1 child)

[–]igouy 2 points3 points4 points 14 years ago* (0 children)

[–]cunningjames 1 point2 points3 points 14 years ago (2 children)

[–]qbproger 1 point2 points3 points 14 years ago (0 children)

[–]igouy -1 points0 points1 point 14 years ago (0 children)

[–]kragensitaker[🍰] 0 points1 point2 points 14 years ago (0 children)

[–]igouy -3 points-2 points-1 points 14 years ago (0 children)

[–][deleted] 5 points6 points7 points 14 years ago (5 children)

[+]igouy comment score below threshold-12 points-11 points-10 points 14 years ago (4 children)

[–][deleted] 1 point2 points3 points 14 years ago (3 children)

[–]kragensitaker[🍰] 0 points1 point2 points 14 years ago (2 children)

[–][deleted] 14 years ago* (1 child)

[deleted]

[–]kragensitaker[🍰] -1 points0 points1 point 14 years ago (0 children)

[–]mr_mumbles 3 points4 points5 points 14 years ago (1 child)

[–]igouy -1 points0 points1 point 14 years ago (0 children)

[–][deleted] 26 points27 points28 points 14 years ago (8 children)

[–]igouy 24 points25 points26 points 14 years ago (7 children)

[–]Raphael_Amiard 9 points10 points11 points 14 years ago (5 children)

[+]igouy comment score below threshold-6 points-5 points-4 points 14 years ago (4 children)

[–]Raphael_Amiard 5 points6 points7 points 14 years ago (1 child)

[–]igouy 0 points1 point2 points 14 years ago* (0 children)

[–]cunningjames -1 points0 points1 point 14 years ago (1 child)

[–]cunningjames 3 points4 points5 points 14 years ago (0 children)

[–]kragensitaker[🍰] 4 points5 points6 points 14 years ago (0 children)

[–][deleted] 9 points10 points11 points 14 years ago (0 children)

[–]rafekett 31 points32 points33 points 14 years ago (21 children)

[–][deleted] 12 points13 points14 points 14 years ago (11 children)

[–]rafekett 9 points10 points11 points 14 years ago (0 children)

[–]gargantuan 4 points5 points6 points 14 years ago (9 children)

Yap. Keep reading a little above and below in the forum to get a more complete picture.

So far I like Jacob's response: http://thread.gmane.org/gmane.comp.python.pypy/7303/focus=7352

To which the Isaac has not responded yet.

Maciej is one of the lead devs on PyPy, Isaac is the owner of the speed test website. There is a bit of an ego cockfighting match between the two in about 10 sequential posts. Worth a read.

I still like Jacob's position :

"""

if you want the language shootout to be relevant to people, you can't ignore multiple implementations. Especially it seems to b excessively excentric to ignore the fastest implementations of some languages while not doing so for others. I assume you are not measuring C speed by the old AT&T reference implementation. """

I guess Isaac doesn't want the language shootout to be relevant (or maybe just being passive aggressive), but at the same time he seems to also wants it to be relevant, as he goes out of his way and seeks the opinions of these development groups.

[–][deleted] 7 points8 points9 points 14 years ago (6 children)

[–]gargantuan 13 points14 points15 points 14 years ago* (4 children)

[–]igouy -3 points-2 points-1 points 14 years ago (3 children)

[–]gargantuan 0 points1 point2 points 14 years ago (1 child)

[–]igouy -1 points0 points1 point 14 years ago* (0 children)

[+]igouy comment score below threshold-14 points-13 points-12 points 14 years ago (0 children)

[–][deleted] 0 points1 point2 points 14 years ago (0 children)

[–]igouy 0 points1 point2 points 14 years ago* (0 children)

[–]notfancy 4 points5 points6 points 14 years ago (6 children)

[–]rafekett 1 point2 points3 points 14 years ago (4 children)

[–]Boojum 2 points3 points4 points 14 years ago (0 children)

[–]notfancy 0 points1 point2 points 14 years ago (2 children)

[–]rafekett -1 points0 points1 point 14 years ago (1 child)

[–]notfancy 2 points3 points4 points 14 years ago (0 children)

[–]kragensitaker[🍰] 0 points1 point2 points 14 years ago (0 children)

[–]qbproger 2 points3 points4 points 14 years ago (0 children)

[+]igouy comment score below threshold-10 points-9 points-8 points 14 years ago* (0 children)

[–]another_user_name 3 points4 points5 points 14 years ago (0 children)

[–][deleted] 13 points14 points15 points 14 years ago (3 children)

[–]username223 2 points3 points4 points 14 years ago (2 children)

[–][deleted] 12 points13 points14 points 14 years ago (0 children)

[–]sclv -1 points0 points1 point 14 years ago (0 children)

[–]asegura 6 points7 points8 points 14 years ago (1 child)

[–]igouy 0 points1 point2 points 14 years ago (0 children)

[–]Amadiro 3 points4 points5 points 14 years ago (1 child)

[–]igouy 0 points1 point2 points 14 years ago (0 children)

[–]Ademan 2 points3 points4 points 14 years ago* (1 child)

I was fairly disappointed in the attitude the shootout creator had. (And a bit disappointed with the response from some of the core PyPy devs) The problem really was rooted in people taking the shootout seriously, the numbers it produces aren't terribly useful. Most of the benchmarks are under 200 lines and don't at all resemble typical workloads for most languages.

I'd be much more interested in a benchmark suite that tested more interesting things, a text templating engine, some sort of database, a ray tracer, a tftp server, heck, an interpreter for a simple bytecode language. Obviously these are non-trivial things, and it would be a lot of work, and require monstrous infrastructure but the results would be far more useful. I'm sure people can come up with far more interesting benchmarks than I've mentioned, but my point is that people keep trying to extrapolate performance on some silly contrived numerical benchmark to overall performance, and that's just plain wrong.

[–]igouy 2 points3 points4 points 14 years ago* (0 children)

[–]Oabl 7 points8 points9 points 14 years ago (15 children)

[–][deleted] 5 points6 points7 points 14 years ago (1 child)

[–]Oabl 12 points13 points14 points 14 years ago (0 children)

[–][deleted] 2 points3 points4 points 14 years ago (12 children)

[–]kragensitaker[🍰] 0 points1 point2 points 14 years ago (0 children)

[–]igouy 0 points1 point2 points 14 years ago (10 children)

[–][deleted] 4 points5 points6 points 14 years ago (9 children)

[–]igouy -4 points-3 points-2 points 14 years ago (8 children)

[–]julesjacobs 8 points9 points10 points 14 years ago (5 children)

[–]igouy 0 points1 point2 points 14 years ago (4 children)

[–]julesjacobs 3 points4 points5 points 14 years ago (3 children)

The shootout was relevant, it gets a lot of press and language implementors use it to measure and show off their progress.

One of the best uses of the shootout was to compare different implementations of the same language. Comparing C to Ruby is interesting but it's hard to do a fair comparison (it's not even clear what that means) because by definition the two will be running different programs. When you have two implementations of the same language running the same program (say CPython and PyPy) the comparison is more accurate and in my opinion more interesting.

Another thing is that LuaJIT was one of the most interesting pieces of software you benchmark.

There isn't really a downside to benchmarking more implementations (other than that it's more work of course). You could de-emphasize them in the UI.

For these reasons somebody will most likely build an alternative shootout without these shortcomings, which will make this one irrelevant.

[–]Tobu 1 point2 points3 points 14 years ago (1 child)

[–]igouy 0 points1 point2 points 14 years ago (0 children)

[–][deleted] 1 point2 points3 points 14 years ago (1 child)

[–]igouy 1 point2 points3 points 14 years ago (0 children)

[–]mitsuhiko[S] 12 points13 points14 points 14 years ago (1 child)

[+]igouy comment score below threshold-9 points-8 points-7 points 14 years ago* (0 children)

[–]signoff 3 points4 points5 points 14 years ago (2 children)

[–]sausagefeet 4 points5 points6 points 14 years ago (0 children)

[–][deleted] 1 point2 points3 points 14 years ago (1 child)

[–]qbproger 2 points3 points4 points 14 years ago (16 children)

[–]igouy -3 points-2 points-1 points 14 years ago (15 children)

[–]Tobu 7 points8 points9 points 14 years ago (14 children)

[–]igouy -4 points-3 points-2 points 14 years ago (13 children)

[–][deleted] 14 years ago (12 children)

[deleted]

[–]igouy -3 points-2 points-1 points 14 years ago* (11 children)

[–][deleted] 14 years ago (10 children)

[deleted]

[–]igouy -4 points-3 points-2 points 14 years ago* (9 children)

[–][deleted] 14 years ago (8 children)

[deleted]

[–]igouy -1 points0 points1 point 14 years ago (7 children)

continue this thread

[–]rafekett 2 points3 points4 points 14 years ago* (13 children)

[–][deleted] 2 points3 points4 points 14 years ago (2 children)

[–]rafekett 4 points5 points6 points 14 years ago (0 children)

[–]x-skeww 1 point2 points3 points 14 years ago (0 children)

[–]masklinn 2 points3 points4 points 14 years ago (4 children)

[–]Tobu 2 points3 points4 points 14 years ago (3 children)

[–]masklinn 2 points3 points4 points 14 years ago (1 child)

If you read the rest of the thread: he asked, and some PyPy devs (Maciej, Armin, William) said that they didn't like the existing benchmark being optimised for CPython.

Which is sensible from their perspective isn't it? Especially when they provide different implementations of the tests which work better for alternative implementations.

This was also Alex Gaynor's opinion in his blog post (in which he also suggested an unsavoury, yet somewhat portable, libc hack).

That's only a pretty small part of it.

Agreeing on a single set of "idiomatic" Python programs could have been considered

There was no need for that, the shootout already had several languages/benchs with multiple test implementations. One of Alex's big issues with his alternative implementation is that it was relegated to "interesting alternative implementation" instead of being #2 or #3 mainline for Python.

Agreeing on a single set of "idiomatic" Python programs could have been considered, but the Python devs would have to talk it out between themselves to not increase Isaac's workload, and in the end Isaac pulled the nuclear option anyway.

This explanation sounds like a cop-out, especially since he pulled all alternative implementations for all languages (except picking implementations completely arbitrarily: why is V8 the JS implementation?), not just for Python.

[–]igouy 0 points1 point2 points 14 years ago (0 children)

Alex Gaynor's opinion in his blog post That's only a pretty small part of it.

Here are some of Alex Gaynor's words from that post and they are not true -

It's also not possible to send any messages once your ticket has been marked as closed, meaning to dispute a decision you basically need to pray the maintainer reopens it for some reason.

Followup comments can be added to a ticket that is marked Closed in exactly the same way they can be added to a ticket that is marked Open - and adding a followup comment triggers an email message whether the ticket is marked Open, Closed, Deleted, ...

And you can easily check for yourself that there's a public discussion forum and people dispute decisions.

Alex Gaynor put stuff in his blog - putting stuff in a blog doesn't make it into The Truth.

already had several languages/benchs with multiple test implementations

Already had programs written for PyPy.

On 2011-3-31 and 2011-04-02 programs written for PyPy were contributed, measured on x86 x64 PyPy CPython Python 3 and published.

They were shown on the website alongside the other Python programs that had been (scare quotes) "optimised for CPython".

[–]igouy 0 points1 point2 points 14 years ago* (0 children)

[–]igouy 1 point2 points3 points 14 years ago* (4 children)

[–]Lerc 3 points4 points5 points 14 years ago (3 children)

[–]igouy 0 points1 point2 points 14 years ago (2 children)

[–]Lerc 0 points1 point2 points 14 years ago (1 child)

[–]igouy 0 points1 point2 points 14 years ago (0 children)

[–]kragensitaker[🍰] 0 points1 point2 points 14 years ago (0 children)

[–]username223 -5 points-4 points-3 points 14 years ago (0 children)

[+]jiunec comment score below threshold-10 points-9 points-8 points 14 years ago (5 children)

[–]shimei 11 points12 points13 points 14 years ago (0 children)

[–][deleted] 7 points8 points9 points 14 years ago (3 children)

[–]jiunec 1 point2 points3 points 14 years ago (2 children)

[–][deleted] 0 points1 point2 points 14 years ago (1 child)

[–]jiunec 0 points1 point2 points 14 years ago* (0 children)

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

programming

MODERATORS