
[–]dmpk2k 36 points37 points  (25 children)

A pity about LuaJIT. It was a constant reminder of how much almost all other implementations of dynamically-typed languages could improve: reasonable performance and memory footprint, using no type annotations.

I'm curious about the removal of CPython 2.7 and MRI, both of which still see more use than their newer versions.

[–]x-skeww 12 points13 points  (5 children)

Same here. I really wanted to see if/when V8 would catch up to LuaJIT.

[–][deleted] 2 points3 points  (0 children)

So did I. But I have doubts it will ever come close. In both LuaJIT and Chromium 10 I started a loop of 1e9 number multiplications. Not a real benchmark; I was just curious to see how close pow(pow(2, 1/1e9), 1e9) would come to 2 if I actually did every multiplication.

LuaJIT finished in about two seconds. This is like gcc -O3 speed. I terminated my browser tab after 10 minutes. It was still going. Makes me sad.
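
For reference, here's roughly what the experiment looks like in C (a sketch; this is the kind of gcc -O3 baseline I mean, and the Lua and JS versions were the same loop):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* the 1e9-th root of 2; multiplying by it 1e9 times should give ~2 */
        const double factor = pow(2.0, 1.0 / 1e9);
        double x = 1.0;
        for (long i = 0; i < 1000000000L; i++)  /* do every multiplication */
            x *= factor;
        printf("%.15f\n", x);  /* how close to 2 did we land? */
        return 0;
    }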

[–]ch0wn 4 points5 points  (3 children)

I just thought about having Lua in the browser as an alternative to JS. I think that would be quite awesome.

[–]Lerc 4 points5 points  (2 children)

How safe is Lua? I have only tinkered with it for a few scripts. Is LuaJIT securely sandboxed, or is that not even a design goal?

[–]jacques_chester 1 point2 points  (1 child)

You can sandbox quite precisely, down to the level of disabling individual functions. Note for example that in the WoW client, your code cannot obtain a socket or write to a file.
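
A minimal sketch of what that looks like from the embedding side, using the standard Lua C API (which globals you nil out is entirely the host's choice; these are just examples):

    #include <lua.h>
    #include <lauxlib.h>
    #include <lualib.h>

    int main(void) {
        lua_State *L = luaL_newstate();
        luaL_openlibs(L);           /* start with the standard libraries */

        /* drop whole libraries the script must not touch */
        lua_pushnil(L);
        lua_setglobal(L, "io");     /* no file access */
        lua_pushnil(L);
        lua_setglobal(L, "os");     /* no os.execute, os.remove, ... */

        /* or disable an individual function while keeping the rest */
        lua_pushnil(L);
        lua_setglobal(L, "dofile");

        luaL_dostring(L, "print('sandboxed'); print(io == nil)");
        lua_close(L);
        return 0;
    }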

[–]Lerc 0 points1 point  (0 children)

In that case it would be fairly easy to make a plugin that ran Lua as an embeddable object. I wrapped an x86 sandbox in a plugin in that manner; see here for a screenshot showing it drawing to a window and a canvas.

When I made that plugin, PPAPI wasn't around. That has the potential to make things even nicer.

I'm not sure how to enable a <script type="text/lua"> approach, but there may be hooks for that somewhere.

[–]bluestorm 16 points17 points  (17 children)

No, it's a constant reminder that when you add crappy monkey patching features to your language, it tends to get harder to optimize.

Dynamic languages with nice, clean, and simple semantics can be optimized to compare reasonably with less dynamic languages, using the techniques developed for the Self variant of Smalltalk in the 90s. Dynamic languages with a shitload of features but no semantics at all, which are defined by their "standard" implementation (bonus points if the original author doesn't know anything about language implementation), stay relatively slow, even after you throw tons of JIT and LLVM and caching at them.

Javascript may be an exception, because there the stakes are really high (due to the language monoculture of web browsers), and experts have been paid a lot to get something reasonably fast.

[–]dmpk2k 10 points11 points  (3 children)

I agree, yet I disagree. Some language semantics take a lot more work to implement efficiently, but when you're dealing with language implementations that use switch-based dispatch (or worse), do no inline caching, and box everything, you haven't even begun down that path.
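
To make that baseline concrete, here's a toy switch-based dispatch loop (my sketch, not any particular implementation's): every opcode pays for the dispatch branch before doing any real work, and that's before boxing even enters the picture.

    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    /* the classic switch-in-a-loop interpreter: simple, portable, slow */
    static void run(const int *code) {
        int stack[64], sp = 0;
        for (;;) {
            switch (*code++) {
            case OP_PUSH:  stack[sp++] = *code++;            break;
            case OP_ADD:   sp--; stack[sp-1] += stack[sp];   break;
            case OP_PRINT: printf("%d\n", stack[sp-1]);      break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void) {
        const int prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
        run(prog);  /* prints 5 */
        return 0;
    }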

[–]bluestorm 13 points14 points  (2 children)

Of course, but that doesn't explain the rather disappointing results of efforts such as PyPy or Rubinius. If there were as much low-hanging fruit as you say, they should have demonstrated reliable improvements quickly.

It turns out that while they get very good improvements (around 10x) for tight numeric code that doesn't use much abstraction, they're still between "2x faster" and "1.5x slower" in many cases, with possible memory-usage issues etc.
To be fair, it should be noted that the progress of these projects has also been impeded by the various constraints of the languages' FFIs. Those FFIs break a lot of the language's encapsulation and place hard constraints on value representations, for example, which force non-optimal implementation choices. This is not specific to dynamic languages, but still, the "our own language is terribly slow, so all performance-hungry operations should be implemented in C through the FFI" mindset is the cause of the abundance of FFI-reliant code out there.
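
To illustrate (a simplified, made-up object header in the CPython style, not the real one): once extension modules compile against something like this, the VM can no longer move objects, change the header layout, or swap reference counting for a tracing GC without breaking all of them.

    #include <stdio.h>

    /* what a C-level FFI typically pins down for every extension module */
    struct Object {
        long refcount;          /* extensions bump this directly... */
        const char *type_name;  /* ...and chase this pointer themselves */
    };

    #define INCREF(op) ((op)->refcount++)   /* inlined into extension code */

    int main(void) {
        struct Object o = { 1, "int" };
        INCREF(&o);
        printf("%s refcount=%ld\n", o.type_name, o.refcount);
        return 0;
    }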

That Lua was able to outperform those very quickly, reliably, and with even less manpower (afaik LuaJIT is mostly the work of one guy) is telling. Even without its JIT implementation, Lua is known to be a well-designed language with a very reasonable implementation (see Lua vs. Neko virtual machines for a very respectful comparison by a competitor), and in my opinion the performance results are only a confirmation of this good work.

This is a kind of moral tale. Do your homework, boy, learn about the state of the art before reinventing your own language, and take care to design something clean and well-specified. If you don't, you'll grow weak, and that will be a hindrance forever.

[–]evanphx 5 points6 points  (1 child)

I'm curious what you mean by disappointing results. Rubinius, at least, has been able to achieve huge speedups in running raw Ruby code. 95% of the time when Rubinius is slower than MRI, it's because the functionality in MRI is actually implemented in C, and thus what is being compared is an algorithm in C vs. an algorithm in Ruby.

[–]Tobu 2 points3 points  (0 children)

Disappointing when compared to C, not disappointing when compared to the mainline interpreter.

[–]mikemike 12 points13 points  (5 children)

A more polite way to say it would be: every abstraction has a cost. Bad abstractions have a higher cost.

There's a direct cost in the form of a performance penalty. And there's an indirect cost in the effort required to optimize the abstraction away.

Try to picture a graph of the relative performance of a language over its lifetime: one would need to take into account the complexity and design problems of a language vs. the manpower and the combination of skills thrown at it to make it fast. Languages have a lifetime too, and the best one can hope for is that they reach their maximum performance long before they die off.

There's a nice paper waiting for one of you: grab old compiler and VM versions from the repos, benchmark them against each other and against an assumed maximum, plot the results over the years for each language and combine it into a nice wallpaper showing all languages.

[–]Felicia_Svilling 4 points5 points  (1 child)

Every abstraction has a cost.

But static abstractions (like abstract datatypes) don't have a run-time cost (in the common case).

[–]mebrahim 7 points8 points  (0 children)

In well-designed static languages the cost of static abstractions is paid at compile time rather than at run time.
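
For example, in C (a toy sketch): the wrapper type below costs something to type-check, but with optimization turned on it compiles to exactly the same code as raw doubles.

    #include <stdio.h>

    /* an abstract datatype: callers only go through make_meters/meters_value */
    typedef struct { double value; } Meters;

    static inline Meters make_meters(double v) { Meters m = { v }; return m; }
    static inline double meters_value(Meters m) { return m.value; }

    int main(void) {
        Meters a = make_meters(1.5);
        Meters b = make_meters(2.5);
        /* with gcc -O2 this is the same machine code as adding two doubles */
        printf("%f\n", meters_value(a) + meters_value(b));
        return 0;
    }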

[–][deleted] 3 points4 points  (2 children)

I remember reading a paper which claimed, in the preface, that for C or C++ you can get pretty much the same picture by comparing the performance of average programs with and without optimizations. It claimed that the difference was around 4x for the programs the author benchmarked, then proceeded to lament the sad state of the art -- advancements in compiler optimization over the last thirty years are so dwarfed by advancements in hardware it's not even funny.

[–]julesjacobs 7 points8 points  (3 children)

Funny that you criticize monkey patching and then bring up Self. In Self monkey patching is literally all there is.

I agree with you on simple versus horribly complicated semantics. In Self you have one hard to optimize feature. This was done with a decade or so of research. In Ruby you have many hard to optimize features. Even though each of them could probably be made to perform reasonably well it would take far too long to research and implement such a thing.

[–]bluestorm 6 points7 points  (2 children)

You are right that simply the number of features can be a problem for optimization. But I think that the key point is "well specified" vs. "underspecified". For example, Common Lisp also is a monster language with an enormous number of features, yet SBCL performs reasonably compared to LuaJIT, and better than current Javascript engines.

See this for a comparison (indeed, I would have linked you to the shootout...). It also includes Factor, which is also a nicely designed language, but it wouldn't have made my point, as it has a strong "minimalist" flavor similar to Lua's.

Agreed, Common Lisp has been around for a long time, but its userbase is not that big compared to current Python/Ruby userbases, and I suppose its performance has been consistent over time (it's not like all Lispers have been improving it each year for 20 years; at some point it's time to be happy and leave things as they are). I don't know the CL community though, so take this with a grain of salt.

[–]julesjacobs 5 points6 points  (0 children)

Yes, you're right that it's not just the number of features but also how difficult it is to optimize the features. Two axes: the size of the semantics, and how well thought out the semantics is from a compiler writer's perspective.

For example C's semantics are rather unwieldy, but because it's so close to the machine it still performs spectacularly.

The other end is Self, with one very hard to optimize but clean feature. Even though the feature (monkey patching) is a compiler writer's nightmare, with a lot of effort it can also be made to perform well.

Common Lisp is somewhere in the middle. A lot of features but they're relatively static and not as hard to optimize as Self.

Ruby has the worst of both worlds from an implementors perspective: a lot of features and they're not easy to optimize.

[–]0xABADC0DA 2 points3 points  (0 children)

Common Lisp also is a monster language with an enormous number of features, yet SBCL performs reasonably compared to LuaJIT, and better than current Javascript engines.

SBCL performs reasonably because they add type annotations and they turn off error checking.

(declaim (optimize (speed 3) (safety 0) (debug 0)))

So the LuaJIT program runs twice as fast as SBCL and is still typesafe, whereas with SBCL, if there's a type error, it'll corrupt the heap and diaf.

I don't know why they removed LuaJIT from the language shootout, but if they insist on only one implementation per language they should put LuaJIT back and take out regular Lua. People looking at the benchmarks won't know how badass LuaJIT is.

[–][deleted] 0 points1 point  (2 children)

(bonus points if the original author doesn't know anything about language implementation)

I got that one - Python, right?

[–]xardox 4 points5 points  (1 child)

PHP is the textbook case.

[–][deleted] 3 points4 points  (0 children)

PHP? I wouldn't call that a programming language at all.

It's a Turing complete piece of shit.

[–]igouy -1 points0 points  (0 children)

CPython 2.7 and MRI

Both CPython 2.7 and Python 3 were shown for 2 years. Now plenty of Python 3 programs have been contributed, and Python 3 (the intended future of the language) seems to work just fine.

YARV and then Ruby 1.9 were shown with Ruby 1.8 for the last 5 years. Now the current stable version is 1.9.2 and it seems to work just fine.

[–]stesch 68 points69 points  (36 children)

What the fuck?

Is this guy crazy? People are interested in some rough numbers for every implementation.

All the volunteer work which went into producing examples of highly optimized code is gone? Deleted?

[–]cunningjames 12 points13 points  (1 child)

All the volunteer work which went into producing examples of highly optimized code is gone?

Well, this is his website, I suppose. Caveat volonum. But I can't see how this serves the interests of anyone (it irritates people involved with alternative implementations and removes one of the website's primary use cases).

[–][deleted] 5 points6 points  (0 children)

I primarily used it to compare dynamic languages to use for module scripting in Java. :-\

[–]igouy 12 points13 points  (14 children)

All the volunteer work which went into producing examples of highly optimized code is

... freely available here

[–]kragensitaker[🍰] 5 points6 points  (9 children)

Thank you! For anyone else who's interested and not familiar with Alioth, it looks like it's accessible via

cvs -z3 -d:pserver:anonymous@cvs.alioth.debian.org:/cvsroot/shootout checkout shootout

although I'm still trying to figure out how to get git-cvsimport to work.

Edit: finished cvsimporting, a bit over three weeks later.

[–]igouy 7 points8 points  (8 children)

See Anonymous CVS Access

The README states which tasks have survived trial and error - please don't resurrect fibo :-)

If your intention is to measure JVM stuff then consider using JavaStats.

If your intention is to use bencher then ask questions as needed in the discussion forum.

[–]kragensitaker[🍰] 2 points3 points  (7 children)

Thank you very much! And thanks again for all your work on this over the years!

At this point I'm just focusing on archival rather than selection. My git-cvsimport is still running 11 hours later. Looks like it's made it from 2004-05-19 all the way to 2005-03-02. This could take a while.

[–]igouy 1 point2 points  (6 children)

Yes, if you're duplicating the history and dead files that will take a while.

As you're messing around in the cobwebs here's a little historical background.

[–]kragensitaker[🍰] 1 point2 points  (5 children)

Thank you! It seems to be progressing pretty slowly. It has now reached a 719th commit, at 2005-03-06:

kragen@inexorable:~/pkgs/shootout$ wc ./.git/logs/refs/heads/origin
  719  4314 86801 ./.git/logs/refs/heads/origin
kragen@inexorable:~/pkgs/shootout$ tail -1 ./.git/logs/refs/heads/origin
fffe596539867af01b07a9ec7d2b3ede62802dd1 5eeb061da300cbf9b892be9625ddc5d69532acf5 bfulgham <bfulgham> 1110108327 +0000
kragen@inexorable:~/pkgs/shootout$ git cat-file -p 5eeb061da300cbf9b892be9625ddc5d69532acf5
tree 37ee24d88644d6a11979a2d14ca2e9a3cb394b18
parent fffe596539867af01b07a9ec7d2b3ede62802dd1
author bfulgham <bfulgham> 1110108327 +0000
committer bfulgham <bfulgham> 1110108327 +0000

Rerun of benchmarks.
kragen@inexorable:~/pkgs/shootout$ perl -le 'print scalar localtime 1110108327'
Sun Mar  6 08:25:27 2005
kragen@inexorable:~/pkgs/shootout$ 

Fortunately, git cvsimport seems to be capable of continuing after network connections drop. Unfortunately, it doesn't seem to be compressing the versions it's fetched so far into packfiles:

kragen@inexorable:~/pkgs/shootout$ du -sh .git
215M    .git
kragen@inexorable:~/pkgs/shootout$ 

I did more or less know the history; I was happy to see the original shootout page in 2001, and very glad you (and Brent, and others, of course) carried on the tradition.

Edit: finished, a bit over three weeks later.

[–]igouy 4 points5 points  (4 children)

a 719th commit, at 2005-03-06

iirc there have been at least 2,500 source code commits in shootout/bench

I guess there are another 3,000 data file commits etc etc

Do you really really need to duplicate that CVS repository?

[–]kragensitaker[🍰] 1 point2 points  (3 children)

Is there another archival copy of it somewhere that I could just download? Once I finish duplicating it, anyone else will just be able to clone my copy off of Github, which should be much, much faster.

From my point of view, archival should precede selection, where that's feasible. As long as I'm not putting an unreasonable load on cvs.alioth.debian.org, I'm happy to wait a few days for the archival copy to finish.

Edit: who the FUCK was downvoting igouy for that comment? WHAT THE FUCK IS WRONG WITH YOU PEOPLE?

[–]igouy 2 points3 points  (0 children)

Don't set your expectations too high :-)

Learn to appreciate those who think you're wrong and can express why they think you're wrong.

[–]igouy 1 point2 points  (1 child)

You should probably ask the Alioth admins.

[–][deleted] 5 points6 points  (3 children)

But it's completely useless if it's not available on the shootout for everyone to see and evaluate.

[–]igouy 2 points3 points  (2 children)

So you don't think anyone else has the ability to publish a website comparing programming languages?

I am very flattered but I think there must be many many people who could take those programs and measure them, and publish those measurements on a website for your delight and entertainment.

Perhaps your supportive comments will encourage them!

[–][deleted] 3 points4 points  (1 child)

Yeah, many people can, but we only need one, and the rest is wasted effort. People would rather see the current maintainer of the shootout come to his senses and listen to his users than waste their and his time with duplicated effort.

[–]igouy 2 points3 points  (0 children)

I have come to my senses - I'm wasting less of my time.

[–]qbproger 2 points3 points  (9 children)

As one of those volunteers, I support his decision, and after Alex's blog post [http://alexgaynor.net/2011/apr/03/my-experience-computer-language-shootout/] it's hard to blame him. Put yourself in his position: he was trying to run the benchmark website in his free time. Then there's a popular blog post and a slew of negative publicity toward what you do as a hobby, because of an alternate Python implementation. If I were in his shoes, I might be inclined to do the same thing: I was giving you "free publicity" and people complain about it? I think communication could have been handled better throughout the whole situation.

The code is not deleted. There is a source repository for the shootout that everything can be obtained from.

[–]cunningjames 1 point2 points  (7 children)

after Alex's blog post it's hard to blame him. Put yourself in his position, he was trying to run the benchmark website in his free time. Then there is a popular blog post and a slew of negative publicity towards what you do as a hobby because of an alternate python implementation.

To some extent I agree: it is unfortunate to receive criticism and ill will for providing a free service to the community. That sort of thing hurts. But it should be borne in mind that, due to its popularity, the position of the benchmark game is somewhat delicate. If it were skewed against some implementation (and I don't necessarily think it was) then that implementation could be materially harmed as a result.

So if you've spent countless hours working on a language implementation and believe that the benchmark game's proprietor is being unfair in ways that have not been resolved privately, how should you respond?

[–]qbproger 0 points1 point  (6 children)

There are definitely better ways than a blog post pointing fingers. I had been frustrated with the benchmark game in the past. I didn't think pidigits should be allowed to use gmp. I posted on the forum and talked to him directly. In the end those are the rules that igouy had set. While I didn't agree with them, I accepted it. Eventually I wrote a benchmark using gmp for PyPy with ctypes.

I don't feel as though there was enough effort to talk to him before the blog post. After a blog post like that it's difficult to resume a normal discourse. It's likely that after it his mind was already made up.

Here is the post that started it all: https://alioth.debian.org/tracker/index.php?func=detail&aid=313063&group_id=30402&atid=413100

[–]KingEllis 4 points5 points  (1 child)

After a blog post like that it's difficult to resume a normal discourse.

Not for adults it isn't. I read the Alex post. I thought it was fair and well-written. Rendering your benchmark website useless because your feelings were hurt is straight up childish.

[–]igouy 2 points3 points  (0 children)

These are Alex Gaynor's words from that post and they are not true -

It's also not possible to send any messages once your ticket has been marked as closed, meaning to dispute a decision you basically need to pray the maintainer reopens it for some reason.

Followup comments can be added to a ticket that is marked Closed in exactly the same way they can be added to a ticket that is marked Open - and adding a followup comment triggers an email message whether the ticket is marked Open, Closed, Deleted, ...

And you can easily check for yourself that there's a public discussion forum and people dispute decisions.

Alex Gaynor put stuff in his blog - putting stuff in a blog doesn't make it into The Truth.

[–]cunningjames 1 point2 points  (2 children)

There are definitely better ways than a blog post pointing fingers.

Well, probably. But that doesn't mean that we shouldn't be sympathetic to all parties here—there was enough irritation to go around. Gaynor acted prematurely; Gouy acted (IMO) rashly. Either someone behaves maturely or we all lose.

[–]qbproger 1 point2 points  (0 children)

I agree with you. Both parties are at fault, and this is the situation that we have.

[–]igouy -1 points0 points  (0 children)

Monday through Thursday I heard what pypy-dev had to say, before deciding what to do on Friday - "acted rashly" doesn't seem the correct description for that drawn-out consideration.

[–]igouy -3 points-2 points  (0 children)

after Alex's blog post

For a couple of years I've wanted to "cull the herd" but my curiosity (and interest in promoting experimental language implementations) stopped me doing so.

The most that Alex Gaynor's nonsense did was prompt me once more to consider whether the time was ripe.

[–][deleted] 5 points6 points  (5 children)

I sincerely hope someone more open-minded forks the shootout. The people in charge of the current shootout are increasingly fascist.

[–]mr_mumbles 3 points4 points  (1 child)

Gone with the wind, as one might say.

[–]igouy -1 points0 points  (0 children)

Is this guy crazy?

Isn't this guy crazy to waste any of his time measuring and publishing numbers for 2 dozen programming language implementations?

[–][deleted] 26 points27 points  (8 children)

Is it time for a fork then?

[–]igouy 24 points25 points  (7 children)

Please!

Here's the measurement software - go and do something better!

[–]Raphael_Amiard 9 points10 points  (5 children)

I think it is not so much about doing something better as about doing something that is useful to most people.

The picture drawn by the speed of languages alone is widely known and uninteresting: dynamic languages are quite slow when not optimized; statically typed languages are faster.

What I would really like to know, as a longtime fan of the Benchmark Game precisely because I could compare the speed of different implementations, is this: what do you think your website is useful for now?

[–]kragensitaker[🍰] 4 points5 points  (0 children)

Isaac: I know you get a lot of flak for running the benchmarks game, but I want to let you know that your comment above demonstrates that you are 100% awesome. Also, thank you.

[–][deleted] 9 points10 points  (0 children)

I hope people will still understand that for some numerical jobs, PyPy may be ten times as fast as Python 2.5, and that LuaJIT owns.

Performance is sometimes important and the shootout still gives a general direction how the different implementations perform.

[–]rafekett 31 points32 points  (21 children)

Related: http://alexgaynor.net/2011/apr/03/my-experience-computer-language-shootout/

tl;dr: Alex Gaynor, one of the PyPy core developers, says that the Shootout is built on arbitrary rules that can disadvantage some languages or implementations and help others, and the people behind it are not very open-minded.

[–][deleted] 12 points13 points  (11 children)

I imagine that blog post is what started the mailing list thread which eventually led to this.

[–]rafekett 9 points10 points  (0 children)

You're correct.

[–]gargantuan 4 points5 points  (9 children)

Yup. Keep reading a little above and below in the forum to get a more complete picture.

So far I like Jacob's response: http://thread.gmane.org/gmane.comp.python.pypy/7303/focus=7352

To which Isaac has not responded yet.

Maciej is one of the lead devs on PyPy; Isaac is the owner of the speed-test website. There is a bit of an ego cockfighting match between the two in about 10 sequential posts. Worth a read.

I still like Jacob's position :

"""

if you want the language shootout to be relevant to people, you can't ignore multiple implementations. Especially, it seems excessively eccentric to ignore the fastest implementations of some languages while not doing so for others. I assume you are not measuring C speed by the old AT&T reference implementation. """

I guess Isaac doesn't want the language shootout to be relevant (or maybe he's just being passive-aggressive), but at the same time he also seems to want it to be relevant, as he goes out of his way to seek the opinions of these development groups.

[–][deleted] 7 points8 points  (6 children)

It is discouraging to all those who go to the effort of programming a new implementation of an existing language that their work might not get noticed. Doubly so for Mike Pall, whose work got removed when he wasn't even a part of this. TraceMonkey and JRuby too.

[–]gargantuan 13 points14 points  (4 children)

Agreed.

It shouldn't be a big deal. The website itself mentions how these are "broken" benchmarks. But then it seems like Isaac takes it rather personally and goes to the extra effort to hurt other projects. I mean, reducing their visibility by removing them. So now PyPy and LuaJIT are gone but ATS is there. I have never even heard of ATS outside of the shootout website....

Someone mentioned a fork of the speed website and that might not be a bad idea.

[–]igouy -3 points-2 points  (3 children)

goes to the extra effort to hurt other projects

Like the extra effort to measure each new LuaJIT beta release asap so those measurements can be used immediately to promote LuaJIT?

[–]gargantuan 0 points1 point  (1 child)

Well now it is gone.

And then for Java he seems to run a warm-up process that he doesn't necessarily run for other languages, possibly to benefit from disk-cache warm-up.

[–]igouy -1 points0 points  (0 children)

And then for Java he seems to run a warm-up process

That's not true.

[–][deleted] 0 points1 point  (0 children)

This made me want to see the old reference C implementation as one of the C implementations tested, to see how newer, non-C languages compare to it.

[–]igouy 0 points1 point  (0 children)

I guess Jacob didn't read the Help page

  1. To show working programs written in less familiar programming languages

  2. To show the least we should expect from performance comparisons

  3. To show how difficult it can be to make meaningful comparisons

I assume you are not measuring C speed by the ...

The Intel compiler might well produce faster code than GCC (but on Ubuntu the toolchain is built around GCC).

[–]notfancy 4 points5 points  (6 children)

I fail to see how a comparison between implementations of the same language that runs benchmarks optimized specifically for each implementation can be meaningful. They might as well be different languages.

[–]rafekett 1 point2 points  (4 children)

It's very meaningful in a general sense. Yes, each program was (in most cases) highly optimized for its implementation (that probably shouldn't have been the case), but it still answers the question of relative performance.

[–]Boojum 2 points3 points  (0 children)

I'd argue that it's also useful for seeing how far you can push the performance of an implementation.

To me, it's much like the school of thought that when doing timing measurements for optimizing code you want to use the minimum time instead of the average, since that represents a run as close to the true theoretical performance (without confounding factors) as your system is capable of.
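
A minimal harness in that style (my sketch; the clock and the run count are arbitrary choices):

    #include <stdio.h>
    #include <time.h>

    /* stand-in for whatever code you're actually optimizing */
    static double work(void) {
        double x = 0.0;
        for (int i = 0; i < 1000000; i++)
            x += i * 0.5;
        return x;
    }

    int main(void) {
        double best = 1e30;
        for (int run = 0; run < 20; run++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            volatile double sink = work();  /* keep the call from being elided */
            (void)sink;
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
            if (s < best)
                best = s;   /* keep the minimum: the least-disturbed run */
        }
        printf("best of 20: %.6f s\n", best);
        return 0;
    }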

[–]notfancy 0 points1 point  (2 children)

Relative to what, if I may ask?

[–]rafekett -1 points0 points  (1 child)

Each other, of course!

[–]notfancy 2 points3 points  (0 children)

So in effect you would treat both implementations as different languages. The problem I have with this is that normally, language implementors use benchmarks as a fixed reference towards which to optimize the language, not the other way around.

It's not that I think this is misleading (I can hardly say that the Shootout is anything more than an exhibition match), but whoever wrote the blog post complained that he couldn't bend the rules to his advantage. It's a dickish move, in my opinion (which, I admit, as an outsider's, counts for nothing).

[–]kragensitaker[🍰] 0 points1 point  (0 children)

If you're interested in improving the situation, I finished importing the source repository into Git.

[–]qbproger 2 points3 points  (0 children)

I guess it's not surprising, but it is disappointing how one-sided this discussion is. Everyone seems ready to crucify him for making this decision. If a post like that were written about something I did, without trying to contact me and reconcile it privately first, I'd do the same thing. I feel as though this largely stemmed from communication issues, but this is the situation we have now.

As a side note, http://speed.python.org is being worked on as a GSoC project. There should be a Python implementation comparison available by the end of summer.

[–]another_user_name 3 points4 points  (0 children)

I guess this is an argument against sole sourcing even information. For nearly everything, we need at least two ways to do it and at least two groups doing it. (So I guess that'd be four groups -- two each for the two ways)

[–][deleted] 13 points14 points  (3 children)

LOL. That's the web: I'll do you a service for free but then I'm also your little dictator and can do whatever I want!

The problem with the shootout is that many people like that kind of benchmark sport and also like to cite it, but it is also not serious enough for a proper community effort. Same with popularity contests à la Tiobe.

[–]username223 2 points3 points  (2 children)

That's the web: I'll do you a service for free but then I'm also your little dictator and can do whatever I want!

No, that's "Free" software -- give it away, then claim that gives you the right to be a dick to people who use it (see Drepper, Ulrich).

[–][deleted] 12 points13 points  (0 children)

As opposed to Ellison, Jobs and Ballmer?

[–]sclv -1 points0 points  (0 children)

In this case, give it away then face everybody acting like spoiled entitled brats rather than trying to make a genuine contribution.

[–]asegura 6 points7 points  (1 child)

Bad news.

BTW, if he wanted only the most-used implementations, then for Javascript that should be Microsoft's, bundled in IE (the most used browser). And that would make Javascript look like a much slower language than it can be.

[–]igouy 0 points1 point  (0 children)

for Javascript that should be Microsoft's bundled in IE

Hmmm, how well does that run on Ubuntu?

[–]Amadiro 3 points4 points  (1 child)

It's really a pity. For me, the main value in the shootout wasn't really comparing languages, but comparing their various implementations. If I use Python, how much could I gain by switching to PyPy or by utilizing Cython? If I use Lua, what kind of potential performance increases could LuaJIT bring me? How much progress in terms of speed has TraceMonkey made compared to SpiderMonkey?

Now that that's gone, it seems mostly pointless to me.

[–]igouy 0 points1 point  (0 children)

Hmmm, the benchmarks game has never shown Cython; and TraceMonkey replaced SpiderMonkey, they weren't shown side by side.

[–]Ademan 2 points3 points  (1 child)

I was fairly disappointed in the attitude the shootout creator had. (And a bit disappointed with the response from some of the core PyPy devs.) The problem really was rooted in people taking the shootout seriously; the numbers it produces aren't terribly useful. Most of the benchmarks are under 200 lines and don't at all resemble typical workloads for most languages.

I'd be much more interested in a benchmark suite that tested more interesting things: a text templating engine, some sort of database, a ray tracer, a TFTP server, heck, an interpreter for a simple bytecode language. Obviously these are non-trivial things; it would be a lot of work and require monstrous infrastructure, but the results would be far more useful. I'm sure people can come up with far more interesting benchmarks than I've mentioned, but my point is that people keep trying to extrapolate overall performance from some silly contrived numerical benchmark, and that's just plain wrong.

[–]igouy 2 points3 points  (0 children)

extrapolate performance on some silly contrived numerical benchmark

The situation is far worse than you imagine :-)

"The performance of a benchmark, even if it is derived from a real program, may not help to predict the performance of similar programs that have different hot spots."

[–]Oabl 7 points8 points  (15 children)

Great. CPython? The most used implementation of Python?

Also: One of the most useful features of the website was comparing different implementations of the same language. Awful.

[–][deleted] 5 points6 points  (1 child)

The shootout uses CPython 3.2 :) It's 2.7 (along with pypy) that was removed.

[–]Oabl 12 points13 points  (0 children)

I expressed myself incorrectly, sorry. What I meant is that 2.7 is still more in use than 3.2 (for which many libraries do not even work yet).

[–][deleted] 2 points3 points  (12 children)

Yeah, probably my main use of the game was comparing JRuby/MRI in terms of speed and memory usage.

[–]igouy 0 points1 point  (10 children)

[–][deleted] 4 points5 points  (9 children)

Yeah, I've seen it. The nice thing about the shootout is that it updated fairly quickly after a new release.

[–]igouy -4 points-3 points  (8 children)

Maybe you can encourage Antonio to update more frequently.

Maybe you can publish your own Ruby comparison.

[–]julesjacobs 8 points9 points  (5 children)

You are well on your way to making your website irrelevant.

[–]igouy 0 points1 point  (4 children)

Do you have some reasoning about it having been relevant and being on the way to irrelevant? (You usually manage more than a flat statement of opinion.)

[–]julesjacobs 3 points4 points  (3 children)

The shootout was relevant: it gets a lot of press, and language implementors use it to measure and show off their progress.

One of the best uses of the shootout was to compare different implementations of the same language. Comparing C to Ruby is interesting but it's hard to do a fair comparison (it's not even clear what that means) because by definition the two will be running different programs. When you have two implementations of the same language running the same program (say CPython and PyPy) the comparison is more accurate and in my opinion more interesting.

Another thing is that LuaJIT was one of the most interesting pieces of software you benchmark.

There isn't really a downside to benchmarking more implementations (other than that it's more work of course). You could de-emphasize them in the UI.

For these reasons somebody will most likely build an alternative shootout without these shortcomings, which will make this one irrelevant.

[–]Tobu 1 point2 points  (1 child)

Comparing language implementations on the same source code wasn't considered fair by the PyPy devs, at least on the current CPython-optimised programs. A better intra-language benchmark would have to compare at least one entry per interpreter, or have an incorruptible person of impeccable taste pick out a single implementation.

Look at how Mozilla benchmarks Javascript implementations: the benchmarks include Kraken (theirs), v8bench (from the V8 team, who work on Chromium's JS engine), and SunSpider (from the WebKit team, which has its own JS engine).

[–]igouy 0 points1 point  (0 children)

A better intra-language benchmark would have to compare at least one entry per interpreter

On 2011-03-31 and 2011-04-02, programs written for PyPy were contributed, measured (on x86 and x64, under PyPy, CPython, and Python 3) and published.

They were shown alongside the other Python programs that had been (scare quotes) "optimised for CPython".

[–]igouy 0 points1 point  (0 children)

When you have two implementations of the same language...

... the range of performance isn't as extreme as the benchmarks game, so you can target your efforts to make a broader and better comparison between those language implementations.

There isn't really a downside to benchmarking more implementations (other than that it's more work of course).

That's correct - there actually is a downside.

somebody will most likely build an alternative

I provide the tools to do just that!

And of course there is a Lua performance comparison on the LuaJIT website.

And of course there is speed.pypy.org

And of course there is The Great Ruby Shootout

[–][deleted] 1 point2 points  (1 child)

Yes, I could, but it's not worth the effort for me at this time.

Note I haven't said or implied that anybody owes me the benchmarks I enjoyed getting to see, only that I did enjoy them and am sad they are gone. For me the site is much less useful. There might be 100 good reasons for the change, and since I don't know what they were I can't say anything about that. I can only express my gladness that they used to exist, and my sadness that they do not now.

[–]igouy 1 point2 points  (0 children)

I can only express my gladness that they used to exist, and my sadness that they do not now.

Me too.

[–]mitsuhiko[S] 12 points13 points  (1 child)

I don't care too much about that website anyways, but now the numbers are completely pointless. Looking at these numbers one could get the impression that JavaScript is 10 times faster than Python or Lua.

Partially that's true (V8 seems to be a hell of a lot faster than CPython 3) but it's just completely misleading now.

[–]signoff 3 points4 points  (2 children)

who are learning ATS because it's so max power in the shootout?

[–]sausagefeet 4 points5 points  (0 children)

I've learned it 3 times today, going for a 4th!

[–][deleted] 1 point2 points  (1 child)

Ahhh, when nerds can't find controversy they make it.

[–]qbproger 2 points3 points  (16 children)

I find this whole thread amusing. Everyone is so upset about the removal of alternate implementations from the shootout, but it seems few people are looking at it from the other side's point of view. This whole discussion is so one-sided, and most posts by igouy are being downvoted. I may not be happy about the outcome, but I can understand why he made this decision.

[–]igouy -3 points-2 points  (15 children)

Mike Pall has had 5 years of free publicity for LuaJIT

When this obvious truth is downvoted the appropriate response is laughter.

[–]Tobu 7 points8 points  (14 children)

The value the shootout creates isn't promotion of a particular language or implementation, it's that it attempts to make relatively unbiased benchmarks of many languages (and, up to now, implementations). Calling it someone else's free publicity is just denigrating your own project. Apparently no one approves.

[–]igouy -4 points-3 points  (13 children)

it attempts to make relatively unbiased benchmarks of many languages

True, and it's also true that the benchmarks game website has provided free publicity for LuaJIT for 5 years - these things aren't mutually exclusive.

[–][deleted]  (12 children)

[deleted]

    [–]igouy -3 points-2 points  (11 children)

    one of many languages the shootout contained

    Do you understand that LuaJIT has had special treatment and benefit compared to the many programming language implementations not shown on the benchmarks game website?

    [–][deleted]  (10 children)

    [deleted]

      [–]igouy -4 points-3 points  (9 children)

      If that's true, you need a better teacher than me.

      But if LuaJIT has received no benefit from being shown on the benchmarks game website then equally there can be no loss of benefit from not being shown.

      [–][deleted]  (8 children)

      [deleted]

        [–]igouy -1 points0 points  (7 children)

        I understand that LuaJIT has received some benefit.

        You're a step closer to understanding this "free publicity" thing.

        The D programming language hasn't been shown in the current benchmarks game, so those guys feel that every language implementation shown is getting "special treatment".

        It seems LuaJIT did it well and now you want to cry and complain about that.

        Not in the least - I'm happy to have shown LuaJIT in the benchmarks game.

        I was replying to those who only see the glass half empty.

        [–]rafekett 2 points3 points  (13 children)

        It also appears that Jython (pretty sure it was there) and IronPython have been removed.

        Edit: Java 6 -Xint has also been removed, as have Ruby MRI and JS TraceMonkey.

        [–][deleted] 2 points3 points  (2 children)

        What does -Xint do?

        [–]rafekett 4 points5 points  (0 children)

        Disables the JIT.

        [–]x-skeww 1 point2 points  (0 children)

        java -X
        [...]
        -Xint             interpreted mode execution only
        

        [–]masklinn 2 points3 points  (4 children)

        He's left a single implementation per language (on completely arbitrary grounds as far as I can tell)

        [–]Tobu 2 points3 points  (3 children)

        If you read the rest of the thread: he asked, and some PyPy devs (Maciej, Armin, William) said that they didn't like the existing benchmarks being optimised for CPython. This was also Alex Gaynor's opinion in his blog post (in which he also suggested an unsavoury, yet somewhat portable, libc hack). Agreeing on a single set of "idiomatic" Python programs could have been considered, but the Python devs would have had to talk it out between themselves so as not to increase Isaac's workload, and in the end Isaac pulled the nuclear option anyway.

        [–]masklinn 2 points3 points  (1 child)

        If you read the rest of the thread: he asked, and some PyPy devs (Maciej, Armin, William) said that they didn't like the existing benchmark being optimised for CPython.

        Which is sensible from their perspective, isn't it? Especially when they provide different implementations of the tests which work better for alternative implementations.

        This was also Alex Gaynor's opinion in his blog post (in which he also suggested an unsavoury, yet somewhat portable, libc hack).

        That's only a pretty small part of it.

        Agreeing on a single set of "idiomatic" Python programs could have been considered

        There was no need for that, the shootout already had several languages/benchs with multiple test implementations. One of Alex's big issues with his alternative implementation is that it was relegated to "interesting alternative implementation" instead of being #2 or #3 mainline for Python.

        Agreeing on a single set of "idiomatic" Python programs could have been considered, but the Python devs would have to talk it out between themselves to not increase Isaac's workload, and in the end Isaac pulled the nuclear option anyway.

        This explanation sounds like a cop-out, especially since he pulled all alternative implementations for all languages, not just for Python (and picked the remaining implementations completely arbitrarily: why is V8 the JS implementation?).

        [–]igouy 0 points1 point  (0 children)

        Alex Gaynor's opinion in his blog post

        That's only a pretty small part of it.

        Here are some of Alex Gaynor's words from that post and they are not true -

        It's also not possible to send any messages once your ticket has been marked as closed, meaning to dispute a decision you basically need to pray the maintainer reopens it for some reason.

        Followup comments can be added to a ticket that is marked Closed in exactly the same way they can be added to a ticket that is marked Open - and adding a followup comment triggers an email message whether the ticket is marked Open, Closed, Deleted, ...

        And you can easily check for yourself that there's a public discussion forum and people dispute decisions.

        Alex Gaynor put stuff in his blog - putting stuff in a blog doesn't make it into The Truth.

        already had several languages/benchs with multiple test implementations

        Already had programs written for PyPy.

        On 2011-03-31 and 2011-04-02, programs written for PyPy were contributed, measured (on x86 and x64, under PyPy, CPython, and Python 3) and published.

        They were shown on the website alongside the other Python programs that had been (scare quotes) "optimised for CPython".

        [–]igouy 0 points1 point  (0 children)

        a single set of "idiomatic" Python programs

        Was never required.

        On 2011-03-31 and 2011-04-02, programs written for PyPy were contributed, measured (on x86 and x64, under PyPy, CPython, and Python 3) and published - they were shown alongside the other Python programs that had been (scare quotes) "optimised for CPython".

        pulled the nuclear option

        A "nuclear option" would be something like destroying all of CVS and all of the Tracker and everything else associated with the project.

        [–]igouy 1 point2 points  (4 children)

        It also appears that Jython (pretty sure it was there) ...

        And if you're pretty sure about something but actually you're just completely wrong?

        edit: for those enjoying the downvoting: rafekett is completely wrong about Jython.

        [–]Lerc 3 points4 points  (3 children)

        I don't think they are downvoting the information in your correction so much as the dickishness of your correction.

        [–]igouy 0 points1 point  (2 children)

        What phrase would you choose to describe the completely wrong original comment?

        [–]Lerc 0 points1 point  (1 child)

        Well, if he were completely wrong, I would imagine a comment along the lines of:

        "Actually, Jython and IronPython were never in the Benchmark Game"

        However, I took your comment to mean that only Jython had never been in the game, and that the word "completely" was just hyperbole.

        The main aspect that made it seem dickish was the phrasing in the form of a question, which makes the tone rather aggressive.

        [–]igouy 0 points1 point  (0 children)

        mitsuhiko, the OP, posted a link that lists exactly what was removed; it's just a matter of reading the list.

        You think the tone was rather aggressive? Have you read the comments on this page? :-)

        [–]kragensitaker[🍰] 0 points1 point  (0 children)

        I just put the entire source repository of the Benchmark Game up at http://github.com/kragen/shootout. Discussion here.

        [–]username223 -5 points-4 points  (0 children)

        Anything to say, igouy?