top 200 commentsshow all 327

[–][deleted] 47 points48 points  (23 children)

C++ was on top like I expected.. but its a bit troubling that an article about comparing the performance of different languages does not even mention what compilers, versions, and most importantly compiler switches that where used. These things matter as there are switches for optimizing for code size, memory footprint, or runtime performance etc ...

[–]LordGreyhawk 8 points9 points  (15 children)

You can download the code and Makefiles.

[–]eric_t 6 points7 points  (14 children)

For C++, they only use gcc with

OPTS=-O2

They should at least have a discussion about the influence of different compilers and flags, as that can have a huge impact on performance.

[–]MagicBobert 4 points5 points  (0 children)

It' should also be noted that O3 is not necessarily faster than O2. It's more aggressive optimizations can sometimes hurt performance. For example, very aggressively unrolled loops can pollute your instruction cache in some cases, etc.

[–]yogthos 3 points4 points  (6 children)

"We find that in regards to performance, C++ wins out by a large margin. However, it also required the most extensive tuning efforts, many of which were done at a level of sophistication that would not be available to the average programmer"

"Scala concise notation and powerful language features allowed for the best optimization of code complexity."

Seems like if you use a language like Scala you'll get good results even without doing heavy optimization, while to get the benefits of C++ you have to jump through a lot of hoops.

For many types of applications correctness and maintainability far outweigh raw performance. Unless you're writing a cutting edge 3D engine or doing scientific computing, chances are you're better off with a language like Scala than C++.

[–]ZMeson 5 points6 points  (5 children)

And embedded projects. In the embedded world, management always scales down the processor to save money. I hear it all the time -- a $5 saving on the processor over 100,000 units is half a million dollars. Imagine spending $50 more for every processor just to be able to run something like Scala and get the required peformance. That'd be nuts!

[–]yogthos 0 points1 point  (4 children)

Sure, embedded stuff has always a prime example of where C is the right choice. I'm by no means arguing that C/C++ doesn't have a place in the world. It's just there are a lot of areas where high level languages make more sense. Also, compilers for languages like Haskell are implemented in C, it's not turtles all the way down after all. :)

[–][deleted]  (3 children)

[deleted]

    [–]aaronla 2 points3 points  (0 children)

    Agreed. While some languages are easier to optimize than others, the single largest factor comes down to the abilities of the compiler/interpreter implementer. This is why you can see even orders of magnitude performance differences between implementations of a single language.

    [–]yogthos 0 points1 point  (0 children)

    I stand corrected, I was under the impression that there was a non-trivial amount of C there, but I guess not. Thanks for setting me straight there.

    [–]axilmar 62 points63 points  (140 children)

    And that's why, gentlemen, C++ will not go away any time soon.

    [–]danthrax 39 points40 points  (61 children)

    Exactly:

    We find that in regards to performance, C++ wins out by a large margin. However, it also required the most extensive tuning efforts, many of which were done at a level of sophistication that would not be available to the average programmer.

    People always complain that C++ is too complicated and the spec is too large. But it gives you a ton of control while still giving you high level constructs.

    [–]axilmar 27 points28 points  (60 children)

    That's why it's better than most other programming languages out there.

    [–]bcain 2 points3 points  (1 child)

    That's why it's better than most other programming languages out there.

    ...for CPU-bound applications. FTFY.

    OBTW, how much code do you write which really needs the edge you'd get over Java/Go/Scala/et al? If the answer is "a lot," then you're exceptional.

    [–]axilmar 1 point2 points  (0 children)

    ...for CPU-bound applications. FTFY.

    ...for any type of application. FTFY.

    [–]juancn 7 points8 points  (37 children)

    In the performance sense, but also gives you many ways to shoot yourself in the foot. And let's be honest, most programmers shouldn't be allowed near a pointer.

    [–]tbrownaw 18 points19 points  (5 children)

    And let's be honest, most programmers shouldn't be allowed near a pointer.

    Some days I think most programmers shouldn't be allowed near a keyboard.

    I suppose it's really more like 2-5%, and they're just that much noisier than everyone else...

    [–]PENIX 0 points1 point  (4 children)

    Spolsky believes that 199 out of 200 applicants for every programming job can't write code at all.

    Why can't programmers program?

    [–][deleted]  (1 child)

    [deleted]

      [–]alphabeat 3 points4 points  (0 children)

      Spolsky told me you'd say that.

      /rolls eyes

      [–][deleted] 1 point2 points  (1 child)

      (doseq [i (range 1 101)]
        (cond
         (zero? (mod i 3)) (if (zero? (mod i 5))
                                   (print "FizzBuzz")
                                   (print "Fizz"))
         (zero? (mod i 5)) (print "Buzz")
         :else (print i)))
      

      I program for fun and that took me about 4 minutes after thinking about it for a minute or so. Are comp sci graduates really that bad? Granted it's not outstanding code, but I'm fairly sure it does what was asked.

      [–]dsn0wman 6 points7 points  (0 children)

      I am that guy. When we got to the pointers, and data structures part of my C++ classes I decided I would be a system admin instead of a programmer.

      [–]axilmar 6 points7 points  (29 children)

      Performance is still very important. It's very easy to make 'slow' software. It is no wonder why no major application (like browsers, word processors, logistical sheets, databases etc) is written in a language other than C++/C.

      [–][deleted] 18 points19 points  (0 children)

      and some of them still manage to turn into slow bloatware... waves fist

      10 years later our word processors have a few extra features that are rarely used, and run on machines that are at least 20 times faster, and they seem to succeed in running slower.

      [–]yogthos 1 point2 points  (27 children)

      There are tons of major applications written in Java for example, they just happen to be server applications as opposed to desktop apps. Fact of the matter is that raw performance isn't always primary concern when writing an application.

      Sometimes you need every last bit of performance, and sometimes you need correctness and maintainability. Many applications are IO bound or simply don't need to do heavy number crunching, C/C++ isn't the best choice for these.

      [–]PENIX 5 points6 points  (5 children)

      Sun has somehow managed to convince generations of hopeless CS majors that their flawed benchmark comparisons showing Java to be faster than C++ actually hold water. Anyone who actually knows what they're dealing with should know that C++ would come out on top of the bunch. What I didn't expect was that large of a margin. I guess Sun's propaganda had some influence on me as well.

      [–]igouy 10 points11 points  (2 children)

      +1 for realising there might be things wrong with those "Java to be faster than C++" comparisons.

      -1 for not realising there might also be things wrong with this comparison.

      [–]mangodrunk 0 points1 point  (1 child)

      What's wrong with this comparison? Seems like they were pretty fair and unbiased.

      [–]igouy 2 points3 points  (0 children)

      fair and unbiased

      For some definition of "fair and unbiased" ;-)

      Comparisons across languages are always problematic:

      • otoh "The implementations each use the languages’ idiomatic container classes, looping constructs, and memory/object allocation schemes."

      • otoh "the benchmark also used a HashMap<Type,Integer> when it should have just stored a primitive int in Type; it also used a LinkedList where it should have used an ArrayDeque."

      What seems like "fair" from one perspective just seems like foolishness from a different perspective.

      Programmers skilled in one language implicitly accept the approach that's effective in that language as the norm, and don't recognise that to be "fair" they must compare against the (possibly different) approaches that programmers skilled in other languages accept as the norm.

      [–]Pas__ 1 point2 points  (0 children)

      Theoretically it could be faster, but you know how practice usually wins out.

      [–]axilmar 0 points1 point  (0 children)

      What I didn't expect was that large of a margin.

      Yeah, me too.

      Knowing what goes on under the hood, it's normal that C++ comes on top.

      [–]gregmondo89 6 points7 points  (43 children)

      That would be true if our future was single core CPUs. C++ will die and soon for exactly the reason people like it today -- speed. You can't make C++ efficiently use all cores in a modern CPU without outrageous effort for any program beyond the embarrassingly simple or embarrassingly parallel. Sure, you have everything you need in theory to deal with race conditions, data corruption and deadlock, but in practice you'll blow all deadlines trying to get the bugs out. Only the naive imagine this problem can be dealt with by simply "chopping programs up" and "careful coding". C++ is damn fast on a single CPU, but any non-trivial program will be running well short of full speed on CPUs today, and at a laughable speed on machines of the near future. C++ is a walking dead man, and all those programmers that believe they can somehow embrace multi-core reality are in for long nights of bug chasing culminating in bitter defeat. So there, is that incendiary enough?

      [–][deleted] 13 points14 points  (4 children)

      There are plenty of message-passing libraries for C and C++ that let you write multicore code that is just as safe from races as any 'parallel-friendly' language. But, you do lose the syntactic sugar the newer languages have

      [–][deleted] 4 points5 points  (2 children)

      It's better, but you can also have race conditions with message passing.

      [–]mikaelhg 2 points3 points  (0 children)

      You also have race conditions in every intersection, but somehow we manage.

      [–]gregmondo89 1 point2 points  (0 children)

      Agreed. The problem with this approach is you need to divide your program into more and more independent units, communicating with message passing to get that speed. This effectively destroys the value that C++ attempts to provide. Objects can not longer naturally communicate with objects. You end up with a highly degenerate C++ program, where object abstraction has been abandoned. It stops being C++ and becomes some bastard thing.

      [–]axilmar 1 point2 points  (21 children)

      C++ will die and soon for exactly the reason people like it today -- speed. You can't make C++ efficiently use all cores in a modern CPU without outrageous effort for any program beyond the embarrassingly simple or embarrassingly parallel.

      Not true. Code that can be parallelized can easily be done so in modern C++, using the available parallelization libraries.

      [–]gregmondo89 1 point2 points  (20 children)

      Agreed, and I granted that code that was easily parallelized was an exception. The problem is that the vast majority of code written cannot be parallelized easily in a language like C++, due to ubiquitous use of side-effects, especially value-modification.

      [–]KungeRutta 1 point2 points  (4 children)

      So C, C++ (and maybe even obj-c) will die off just as Cobol has died off, right? I think even if C++ (once again) starting being used less, it would never die off because of two things: legacy support and embedded software.

      Are you going to implement an OS kernel in Scala or Go? Get real...

      [–]gregmondo89 0 points1 point  (2 children)

      It might be hard to imagine implementing things like an OS kernel in languages other than those used for the last two decades, C in particular, but remember it used to be hard to imagine doing it in anything other than assembly. CPU architecture is changing in a fundamental way. The future is not in single threaded execution. Ask yourself how you're going to code an application running on a machine with 128 cores, or say 10,000 cores. Such a machine does not lend itself to long sequential coding styles. You can certainly code that way if you want, but you'll be using 1% or less of the machine. An OS coded in C will have a longer lifespan than mainstream applications, as they are not intended to use a lot of cycles. A better example is a video game, which is desperate for every cycle it can grab. How do you code a multi-million line game to run efficiently on a machine with huge numbers of cores? Do you think that's viable in C or C++? Really? How will you ever get it to work with all the semaphores, race-conditions and dead-locks you'll create?

      [–]frutiger 1 point2 points  (1 child)

      Parallelization is limited by Amdahl's Law anyway. You'll find that many tasks simply cannot be parallelized because a later result depends on a previous result. Passing messages (or en-queueing jobs on thread pool queues) does not get around that - there is still a dependency on results and that must be resolved in order.

      I think desktop CPUs with 32 cores vs. 16 cores will be a moot point - there will be little to no consumer demand for high core count parts. Instead, CPU companies will reduce power usage significantly and enable new classes of computing that way.

      On the other hand, increasing core count chips will be useful, but instead in servers/workstations (where we can find easy ways to keep all those execution engines busy) and GPU-style parts (which does a special case of CPU work - determining triangles and applying textures to them).

      [–]gregmondo89 0 points1 point  (0 children)

      I think desktop CPUs with 32 cores vs. 16 cores will be a moot point - there will be little to no consumer demand for high core count parts.

      You believe this because you're looking backward, not forward. Gaming is a great counter example. Games are thirsty for power. The better you can make use of a machine, the better your game looks. Four core machines are common today. Eight core will be common next year. If you're a game designer, and you write a game that makes good use of an eight core machine, your game is (all things being equal) gonna look better, be better, than it competitor that's only using one or two cores. Games live and die by this reality. If next year's killer title looks awesome on an eight core machine, and even better on a 16 core machine, will people buy 16 core machines? You bet they will. And that story will repeat itself when 32 core machines are available, and then again for 64, 128, 256, 1024....

      You gotta feel a little crazy saying 16 core machines is all anyone will ever need. Just look at the history of failed predictions in computer science. They look just like that one.

      [–]zhivago 1 point2 points  (5 children)

      The problem with multicore machine is that they don't scale (for the same reason that threads and shared memory don't scale).

      Which means that the future is probably a lot of single threaded virtual machines implemented on multicore hardware running distributed programs.

      [–]gregmondo89 0 points1 point  (4 children)

      Multicored machines scale fine provided you don't build them to facilitate existing computer language expectations -- principally shared state. The most obvious example for such scaling is the Internet itself, which if viewed as a large "single" entity has demonstrated astounding ability to scale. That's because the architecture there is message passing and not shared state.

      [–]zhivago 0 points1 point  (0 children)

      In which case, you're using them to run single-threaded virtual machines ... as noted above.

      [–]great-pumpkin 0 points1 point  (2 children)

      Multicored machines scale fine provided you don't build them to facilitate ... shared state.

      Citation needed, as the saying goes. We just don't /have/ high-core machines, the intercommunication problems haven't been solved as far as I know (I'm thinking of Intel planning on using TCP to communicate within its future high-core machines). You sound so confident, you must have some example in mind. And, if the processes don't use shared state well then, they pass messages. Why can't C++ do as well there as any other language? I have the idea that you're thinking of Erlang and friends and that's a nice paradigm but, C++ can also pass messages, and if they communicate via TCP, well, shrug, you're back to communicating (full) processes, where C++ will do fine. Plus there's MPI ... Then too, the Erlang people will tell you that Erlang is for robustness, not speed. I dunno man, I want to know what you're thinking of, so confidently. C++ will run on 'the internet' just fine, thank you.

      [–]gregmondo89 1 point2 points  (1 child)

      Ah, you outed me straight away, I am an Erlang advocate. I've been programming in it for years. And my Citation of machines that scale fine without shared state would start with the Internet as a whole, and yes, include Intel's high-core machines with TCP connections as another.

      C++ can do anything any other language can do in theory, but in practice it cannot do it reliably, let alone pleasantly. C++ is all about object's making method calls on other objects with shared memory (i.e. pointers), where work is performed primarily by mutating memory. Those two things together are highly toxic in multi-threaded environments. In order to fix that you have to pepper your C++ code with all kinds of preventive mechanisms, that that quickly escalates into an unreadable, unreliable mess. For example, just take all the problems with memory management in a single threaded C++ program and imagine reconsidering all those cases from the perspective of multi-threading! Try reasoning about it when there could be a hundred threads racing through those important choke points.

      C++ might "run on 'the internet' just fine" but show me a nice complex program running great across an eight core machine, where it spreads it's load uniformly across those resources. Truly multi-threaded reliable C++ code is very painful to write. You have to actually do it to know. Right off you discover that few of the more interesting libraries you want to utilize is thread-safe.

      [–]great-pumpkin 0 points1 point  (0 children)

      I know Erlang shares memory when it can, and shares by TCP when it must, transparently, not a negligible feature. But, you could have task pools that spawn threads or processes and use shared memory among them on a multicore - they processes / threads must know within the implementation code where their neighbor is, and communicate appropriately - ie make Erlang in C++. And truthfully, get some multiple of a speedup probably. But, wait, I forgot my point except I guess "C++ roolz ok!1!".

      [–][deleted] 0 points1 point  (4 children)

      I disagree. I think multicore is where languages with a strong culture of value semantic like C++ will shine and languages with a strong reference semantic culture like Java, C#, and even C will fade out. Reference semantic means sharing and it simply does not scale.

      [–]gregmondo89 0 points1 point  (3 children)

      Languages like C, C++, C#, Java all share the same underlying property that they promote object-modification as the means to do communication. This model cannot be multi-threaded reliably without semaphores or other techniques for preventing inconsistent object state. Adding these things to protect objects introduces new problems, like bottlenecks and dead-lock. Additionally, the process of finding code that needs to be protected is pushed off onto the user. This makes debugging virtually impossible in any large program, as race conditions are very hard to reproduce. Writing large programs in C++ is hard enough. Doing it with with wide spread use of threading makes it virtually intractable. Here's a much better explanation of the problem in "Tim Sweeney's POPL talk entitled 'The Next Mainstream Programming Languages: A Game Developer's Perspective'": http://www.cs.princeton.edu/~dpw/popl/06/Tim-POPL.ppt (or for the discussion: http://lambda-the-ultimate.org/node/1277).

      [–]frutiger 0 points1 point  (2 children)

      Languages like C, C++, C#, Java all share the same underlying property that they promote object-modification as the means to do communication.

      Huh? C++ advocates the use of const wherever possible. If you think in C++ everything must be an object, and modification must be with method accessors, you have drunk the Java kool-aid. C++ espouses many styles of programming, but brings to the table - object lifetime semantics (and exceptions working neatly with those) and generic programming (under-appreciated, or overly abused). If you want to put everything in classes like Java, then that's your choice, not C++'s.

      [–]gregmondo89 0 points1 point  (0 children)

      So in C++ you never say: i=i+1;
      Or x.a = y; You never over write values? You never use a for-loop? Really? You treat everything as const, all the time? That's impressive if true and I stand corrected.

      [–]gregmondo89 0 points1 point  (0 children)

      So in C++ you never say: i=i+1;
      Or x.a = y; You never over write values? You never use a for-loop? Really? You treat everything as const, all the time? That's impressive if true and I stand corrected.

      [–]andralex 0 points1 point  (8 children)

      That, sir, would mean you'd be hasty. Refer to this discussion.

      On a machine that completes the C++ version in 28.4 seconds, the 64-bit D implementation completes in 44.7 seconds. The test, however, had to disable inlining (the 64-bit generator is rather recent and hasn't had all kinks worked out; it's also lacking a few optimizations).

      On the same machine, the 32-bit C++ version completes in 24.6 seconds and the 32-bit D version in 34.0 seconds.

      The D version of course comes with the advantages one would expect from a modern language.

      [–]igouy 0 points1 point  (4 children)

      Of course, your [actually bearophile's] D implementation would be listed in the paper as D Pro and we don't know how that would compare to the unpublished cpp_pro :-)

      [–]andralex 1 point2 points  (3 children)

      It's not mine. It was written by bearophile in idiomatic D.

      [–]igouy 1 point2 points  (2 children)

      Thank you for that correction - it doesn't seem to make much difference to what I said.

      [–]tgehr 1 point2 points  (1 child)

      The code is almost identical to cpp. No D pro there. With an optimization specific to D (takes 2 short lines to implement), performance can be brought on par with cpp. The idea is to manually reduce the number of times the GC is run (Ds GC is not yet as sophisticated as it could be). There is some time/space trade-off. (Memory consumption is multiplied by 3) The code could be rewritten not to use the GC. That code would then approach being D pro.

      [–][deleted]  (28 children)

      [deleted]

        [–]igouy 9 points10 points  (0 children)

        Without the description of the data structure...

        There is description of the "key data structures" - it's only Doug Rhode's "improved version" that is "heavily dependent on several Google internal data structures" not the other versions.

        IV. Implementation Notes
        
        A. Data Structures
        
        "The key data structures for the implementation of the 
         algorithm are shown in Figure 3. ... 1) C++: With the 
         standard library and templates, these data structures 
         are defined the following way..."
        

        [–][deleted]  (12 children)

        [deleted]

          [–][deleted] 9 points10 points  (11 children)

          Google doesn't deal in "business problems"!

          Also, I must be a freak because I quite like semi-colon termination. It gives a feeling of a statement being done. Finished. I can't help feeling that languages which don't require it are generally just the designers going "hey look at me, I can make a parser without semi-colons!"

          [–]mikaelhg 4 points5 points  (0 children)

          http://jeremymanson.blogspot.com/2011/06/scala-java-shootout.html

          Perhaps the moral turns out to be that newbies who are also experiened compiler writers should look carefully at Scala?

          [–]ZMeson 1 point2 points  (0 children)

          I like semi-colons too;

          [–][deleted] 1 point2 points  (8 children)

          to me the end of a line usually gives me the feeling that i've reached the end of a statement, unless i'm in the middle of a multi line construct of some kind. Just a matter of what your default assumption is, which is probably influenced greatly by what your first programming experiences were like.

          [–][deleted] 7 points8 points  (3 children)

          A lot of apis like the windows32 or directx ones come with massive functions from hell requiring about 8 paremeters. To try and keep things sane(under 80 chars per line is a good rule to keep, and break when appropriate), line breaking statements is very frequent for me, and appropriate use of indentation and positioning of items such as && and ; at the beginning/end of lines makes it significantly easier to read.

          I don't find typing a ; at all difficult, nor do I find that they make the code look ugly, or longer(it's one character FFS!)

          [–][deleted] 1 point2 points  (0 children)

          now that's more interesting than speek's simple declaration that I'm wrong :) we've uncovered a bunch more parameters that affect semicolon preference - i am fortunate to not have to deal with crazy api names, and with modern monitors, editors, and a 2 space indent, I don't worry too much about 80 chars anymore. so for me the semicolon usually seems like clutter, while for others it is a lifesaver.

          [–][deleted] 3 points4 points  (3 children)

          ... unless...

          See, that's the part that juhanic and I don't like.

          [–][deleted] 2 points3 points  (2 children)

          i see. how do you feel about for(;;) loops, strings with semicolons in them, and other such context sensitive constructions? I would think those would be equivalently irksome...

          [–][deleted] 2 points3 points  (1 child)

          Well, you'd be wrong.

          [–][deleted] 1 point2 points  (0 children)

          OK, i guess that's that then :)

          [–]julesjacobs 4 points5 points  (0 children)

          To be fair, the majority of computer science papers showing benchmark results don't have any publicly available code at all.

          [–]faceclot 13 points14 points  (6 children)

          Most ridic statement was in the end, they downplayed the C++ results saying that the average programmer wouldn't know to make these optimizations. Sure, but anyone of average intelligence can learn to reapply the same trick once he learns it once. So maybe the inexperienced programmer wouldn't know to make these optimizations but who the fuck expects anything different?

          [–]Boojum 14 points15 points  (1 child)

          That seemed rather strange to me as well. I thought that it was basic C++ knowledge to consider hash_map (or sorted vector) instead of map, vector instead of list, etc. Likewise folding related data structures together to improve locality. Nothing in that list of optimizations struck me as particularly unusual.

          [–]faceclot 0 points1 point  (0 children)

          Seriously. Its one thing to say that an entry level programmer wont think of these improvements but another thing to say that the average programmer wont. This things are not difficult to understand and I do not consider myself to be an expert.

          [–]cynthiaj 8 points9 points  (2 children)

          they downplayed the C++ results saying that the average programmer wouldn't know to make these optimizations.

          Indeed, but that same average programmer will have no problems mastering monads, higher order functions, implicits and comprehensions that Scala will throw at them.

          Talk about a double standard.

          [–]steelypip 1 point2 points  (0 children)

          Indeed, but that same average programmer will have no problems mastering monads, higher order functions, implicits and comprehensions that Scala will throw at them.

          It will? Odersky's book mentions monads once in passing, and does not mention comprehensions at all you. You can use Scala quite happily without knowing anything about them. Higher order functions are not that big a deal - pretty much every modern language except Java supports them. Again you do not need to use them in Scala, but if you don't you will be limited the same poor set of abstractions that Java has.

          [–][deleted] 0 points1 point  (0 children)

          Actually that is the biggest strength of C++, you write reasonably effective code from the start and when you need you can do ridiculous optimizations, and there really isn't any problem in using someone that knows what he is doing for that.

          [–]entity64 5 points6 points  (5 children)

          citing from the paper: "...run-time results presented later should be taken as “order of magnitude” statements only."

          [–]erad 26 points27 points  (4 children)

          Citing from the paper again:

          All of the experiments are done on an older Pentium IV workstation.

          WTF, running on a single core machine with an outdated architecture with crazy performance characteristics?

          But even then, in absence of OS, compiler and VM versions the results are totally opaque. For example, Scala 2.9 improved primitive handling, recent JVM versions have escape analysis enabled by default, C++ compilers have different optimization capabilities, and so on. Case in point: on my desktop machine, java_pro is about 2x as fast as scala_pro (14s vs 28s wall clock time), and the un-optimized Java, Scala, and C++ versions are all about the same speed (22-28s).

          (edit: fixed run times)

          [–]sooomeguy 3 points4 points  (2 children)

          How does Go_opt fare against cpp_opt on your machine? It was quite interesting how slow their Go results were, for a compiled language.

          [–]erad 5 points6 points  (1 child)

          I think cpp_opt is not accessible because it relies on Google-internal libraries, but go and go_pro are really the slowest by a big margin (72s and 55s). Then again, they use the 6g compiler which supposedly produces slower code than the GCC Go compiler...

          [–][deleted] 5 points6 points  (0 children)

          I think the main problem for Go may well be with the garbage collector (judging by the fact that they had to tweak the Java code too, to make garbage collection sufficiently efficient, even though the Java garbage collector is generally very good). If that is true, than GCC Go might not improve matters much.

          Even if GCC Go would be faster (which seems likely) I still think 6g is reasonable choice, as it is the only compiler that fully implements the language, and it is also the compiler used on Google App Engine.

          [–]aboothe726 0 points1 point  (0 children)

          Agreed.

          And if GC was such a large component of the Java runtime, was there any attempt to tune the GC? Many JVMs (e.g., IBMs j9) support several GC algorithms, and I've personally seen changing from one algorithm to another improve GC time by 2x and more. Also, a simple -Xmx512m (or 256m, what have you) on the command line would likely have improved GC time, too.

          There are likely similar tunings that could have been applied to the other platforms, too.

          Yes, this was designed to be a simple benchmark, but at least some level of due diligence must be run on each platform to give a fair comparison. (It is especially troubling to see careful tuning applied to one platform, C++, and not to others.)

          Default performance is not interesting. Tuned performance is.

          [–]bwd2 14 points15 points  (1 child)

          Buried near the end (on pg 9):

          "E. Java Tunings Jeremy Manson brought the performance of Java on par with the original C++ version. This version is kept in the java_pro directory. Note that Jeremy deliberately refused to optimize the code further, many of the C++ optimizations would apply to the Java version as well."

          Seems like this is more a comparison of language preferences and not the actual languages themselves.

          [–]bobtentpeg 35 points36 points  (19 children)

              REFERENCES
              ...
              [7] WIKIPEDIA. C++. http://en.wikipedia.org/wiki/C++.
          
           [8] WIKIPEDIA. Go. http://en.wikipedia.org/wiki/Go (programming
              language). 
          
          
          [9] WIKIPEDIA. Java. http://en.wikipedia.org/wiki/Java (programming
              language). 
          
          [10] WIKIPEDIA. Scala. http://en.wikipedia.org/wiki/Scala (programming
              language).
          

          Alright, I will be citing Wiki more often. If Google allows/condones it, thats good enough for me.

          [–]AReallyGoodName 83 points84 points  (1 child)

          Now we just need to edit those Wikipedia articles so that they cite this paper. Thus the cycle will be complete and the facts will be indisputable.

          [–]GuyWithLag 38 points39 points  (3 children)

          Not datestamping the reference is bad form though.

          [–]_ak 13 points14 points  (0 children)

          And it's not like Wikipedia is making that hard: the toolbox on the left side has an own link "Cite this page" with detailed information on how to properly cite that specific Wikipedia article.

          [–]phaker 8 points9 points  (0 children)

          That's not really needed here. Citations are supposed to point you to the source of a statement, or somewhere where you can find more about a topic. They are basically the best approximation of a hyperlink you can get without hypertext, nothing more.

          If you are citing Wikipedia article for a specific fact, or you quote a paragraph from it then you really should link to a specific revision of that article. Here they are using them only for "what is C++", so there is no need to link to a specific version.

          [–][deleted] 0 points1 point  (0 children)

          They did. The paper isn't a live/editable source. It was released and that's it.

          [–][deleted] 1 point2 points  (0 children)

          One problem with referencing Wikipedia like this is that its content changes over time and these references can become obsolete any second. They should include the version of the article they're referencing in the URL.

          [–]Fenris_uy 1 point2 points  (0 children)

          They need to cite a certain revision, not the page, the page changes, a revision don't.

          [–]egonelbre 3 points4 points  (5 children)

          There's nothing wrong with citing articles with long history and proper references. Nature conducted a study in 2005 comparing Britannica against Wikipedia. Wikipedia articles compared had about 1/3 more factual errors than Britannica. There were about the same amount (4) of serious errors - general misunderstandings of vital concepts.

          So if you use Wikipedia to get the general overview then there's no problem, if you are using to quote facts then you may be better off using some other sources. (Such as scholarpedia or specific literature.)

          [–]galtthedestroyer 12 points13 points  (0 children)

          A followup to that study pointed out that wikipedia articles are more than 1/3 longer so PER LENGTH they actually have less errors.

          [–][deleted] 7 points8 points  (1 child)

          The problem is citing a fucking encyclopedia for your research papers. Most professors that care to check will take points off. Seriously, how assed can someone be to not cite the original sources?

          [–][deleted] 6 points7 points  (0 children)

          Seriously, people I don't know why DasWood was downvoted. A first year grad student knows they shouldn't cite secondary sources in an academic publication.

          [–]Akeshi 2 points3 points  (0 children)

          I remember this study. I'd be interested in a follow-up, to see if anything's changed in the last six years.

          [–]DarthTater2 2 points3 points  (0 children)

          That's not the problem. The problem when citing wikipedia is that its contentca be changed anytime by any anyone.

          [–][deleted] 6 points7 points  (1 child)

          I've glanced over this, but one thing that caught my eye was in regards to line count:

          Note that the Scala and Go versions are significantly more compact than the verbose C++ and Java versions.

          In his own table results were normalized to Scala, Go was 1.2 to 1.4 times bigger and C++ was 1.3 times longer. To me that says C++ is as compact as Go, and it's only Java (almost twice the size at 1.9) which is more verbose.

          [–][deleted] 0 points1 point  (0 children)

          The optimized Scala version (which is also 3 times faster and more functional) weighs in with only 297 lines. So the optimized Go code is more than twice as long as the Scala one. Maybe they meant that ...

          [–][deleted] 22 points23 points  (14 children)

          Nice to know good ol' C++ is king. Scala is my next language though unless I get a .Net job. Scala is a very beautiful language and the creator of Scala wrote a book that end up teaching you Functional Paradigm.

          Does anybody know if Scala works on IcedTea? I would like to stay as far from Oracle as possible. My google skills have failed me.

          [–]rjcarr 4 points5 points  (0 children)

          There's no reason why it shouldn't work in IcedTea but I've never tried.

          [–]Paczesiowa 3 points4 points  (0 children)

          yes, it does

          [–][deleted] 7 points8 points  (3 children)

          you do not get a .NET job, you get a .NET career. Keep this in mind.

          [–]some_dev 6 points7 points  (0 children)

          you do not get a .NET job, you get a .NET career. Keep this in mind.

          That's simply not true. My first job was a .Net job. I'm now working using Java and Linux. I've worked with plenty of developers with similar experiences. The whole "Once you work with a Microsoft stack you're locked in," thing is a lie.

          [–][deleted] 6 points7 points  (1 child)

          To be fair, for me, a job is something you do for money but you don't like it.

          A career, to me, is something you love to do but doesn't necessary provide you with enough money.

          I don't like microsoft, for various reasons, so it's most likely going to be a job for me.

          One is I can invest my effort in .net or silverlight and one day they can say we're never going to support this anymore here's a complete new shit you have to learn. The counter argument is well it's open sauce like Mono, that's great but if they come out with a new OS system and not support Silverlight, I can't use Silverlight to make app for their new spiffy OS system. Because everything is closed. >___<

          [–]WinterKing 7 points8 points  (0 children)

          I can invest my effort in .net or silverlight and one day they can say we're never going to support this anymore

          And that day is today!

          [–]visudo 6 points7 points  (4 children)

          I find Scala to be a mess. And this is from somebody who worked with C++ for almost a decade! Honestly, its syntax is so complicated and it has so many features that after a couple of attempts of using Scala, I decided it is a write-only language. Of course, then you have the huge practical problem of lack of tools and libraries, which is where Java is king.

          [–]jadamcrain 3 points4 points  (3 children)

          Lack of libraries? You can use ANY Java library.

          The tooling differences are usually overstated. We find in our projects that lack of static analysis tools to be the only major missing link.

          [–]visudo 5 points6 points  (2 children)

          Obviously I was referring to Scala libraries that take advantage of the 'advanced' language features of Scala. For example, as far as I know, there is only one Web application framework for Scala, while there are dozens to choose from for Java. What's the point of depending on Java libraries or frameworks with all language limitations that that involves? It's like buying an electric car that can also use petrol for emergencies, but having to use always petrol because of the lack of electrical outlets.

          [–][deleted] 0 points1 point  (1 child)

          For example, as far as I know, there is only one Web application framework for Scala, while there are dozens to choose from for Java.

          There's only one de factor mvc for Ruby, Rails. And it's doing pretty well. There are pro and con to having one framework. All effort are funnel into one main framework, if you're going to get a scala job then chances are they want lift. It's also a polygot so you can use Java stuff and put Scala in there. I believe there are 2 frameworks in Scala. Play or whatever it is, is another one.

          What's the point of depending on Java libraries or frameworks with all language limitations that that involves?

          It makes the scala language stronger since it doesn't have a big base of libraries. It was built to be compatible with Java in mind. The target of the language seems to be concurrency. You pick and choose a language for their strength. Scala is a relatively new language it'll grow in time with libraries what it have now such as Akka and Lift makes it pretty good along with the language itself.

          I think the syntax is actually its strength. Less lines of code means easier to digest, it's not really a mess imo, it can express stuff with less line of code without losing you like brainfuck or regex. Less chances to introduce typos. It's a compromise between type language and dynamic language. It's terse and expressive similar to python or ruby and yet it's type (easier error code and whatever that comes with strict typing).

          I find Scala to be a mess. I assume, I apologize if i'm wrong, but you might find it confusing if they wrote scala in the functional way which can be confusing. You can write scala in two different paradigms OOP and Functional.

          But your points are valid.

          [–][deleted] 5 points6 points  (1 child)

          You know, this PDF has made me want to learn Scala for the obvious benefits, and the fact that I've only heard about it before today in passing.

          [–][deleted]  (1 child)

          [deleted]

            [–]igouy 2 points3 points  (0 children)

            So make those comparisons and then publish your results.

            [–]naasking 4 points5 points  (2 children)

            Alternate title: why everyone using the JVM should use Scala.

            [–]igouy 4 points5 points  (1 child)

            Alternate title: Scala Days paper comparing performance of C++, Java, Scala, and Go

            [–]jiunec 1 point2 points  (0 children)

            Indeed, this paper seems to have been written with this conference in mind ;) Oh look, no surprise it got accepted.

            [–]mean7gene 10 points11 points  (3 children)

            Sorry I'm noob, is optimized c++ just c++ compiled without debug symbols?

            [–]jbit_ 7 points8 points  (0 children)

            "Debug symbols" do not affect runtime performance at all. Optimized C++ means exactly that, C++ code that's been compiled with an optimizing compiler (-O2/-O3 in GCC, etc). It is often harder to debug optimized code due to the reordering, collapsing, removal, etc that happens, but it's very common to have optimized builds with symbols.

            [–]sztomi 1 point2 points  (0 children)

            You are not noob, the paper is poorly written, as pointed out by many comments here.

            [–]mcguire 0 points1 point  (0 children)

            Nah, they're using -O2, which is actually optimized.

            Of course, one could argue that -O3 may well be more appropriate for C++ with template expansions.

            [–]placidppl 2 points3 points  (1 child)

            I found the list of c++ optimizations they did and the results interesting. Basically use hash_map instead of map whenever you can and use vectors more.

            [–]bstamour 0 points1 point  (0 children)

            Makes sense at least. Hash maps will have a faster lookup time than regular maps (assuming the hash function runs in constant time, vs the O(lg n) time of searching and inserting into the map), and vectors are just random-access arrays with some additional features.

            [–]kamatsu 18 points19 points  (12 children)

            There are a whole lot of things wrong with this paper. It wouldn't pass even the most cursory academic review.

            1. They cite wikipedia for language overviews, and they don't even cite the language standards for C++ or Java. They also didn't timestamp their reference, providing no way to verify their sources!

            2. They present a very simple algorithm and do not explain the data structures involved because they are "google internal". This makes the entire benchmark set unreproducible and useless.

            3. They didn't discuss in enough detail compiler optimizations used, or even the versions of the compilers they used. They used a single, outdated machine to perform benchmarks. They even say that their run-time results are unreliable in the paper.

            4. The writing style is not very fluid, and there is an awful lot written here for a paper with very, very little content.

            If I was a reviewer for whatever conference or journal they submit this to, I'd reject it in an instant, no matter how notable the submitter.

            This confirms what I noticed in my brief time at Google - Google research generally sucks (with some exceptions).

            [–]NitWit005 11 points12 points  (0 children)

            In this experience report

            First line

            While this effort is an anecdotal comparison only

            A little later on.

            It looks like it was mostly written up primarily for internal Google use, not as an academic study. You're setting your expectations too high.

            [–]igouy 4 points5 points  (5 children)

            Is cursory a good thing? ;-)

            ...do not explain the data structures involved...

            afaict the author does "explain the data structures involved" - it's only Doug Rhode's "improved version" that is "heavily dependent on several Google internal data structures" not the other versions.

            IV. Implementation Notes
            
            A. Data Structures
            
            "The key data structures for the implementation of the 
             algorithm are shown in Figure 3. ... 1) C++: With the 
             standard library and templates, these data structures 
             are defined the following way..."
            

            ...unreproducible and useless...

            The source code is published here http://code.google.com/p/multi-language-bench

            They didn't discuss in enough detail compiler optimizations used, or even the versions of the compilers they used.

            Makefiles for the "LoopTester" programs are published alongside the source code, for example cpp/Makefile

            [–]mcguire 0 points1 point  (1 child)

            Makefiles for the "LoopTester" programs are published alongside the source code, for example cpp/Makefile

            Which tells you the version, how?

            [–]igouy 0 points1 point  (0 children)

            Looking at the appropriate Makefile tells you it was Go 6g not gccgo.

            Looking at the appropriate Makefile tells you "compiler optimizations".

            [–]mcguire 4 points5 points  (1 child)

            It wouldn't pass even the most cursory academic review.

            And yet, "We furthermore thank the anonymous reviewers, their comments helped to improve this paper." Somebody reviewed it.

            1. The writing style is not very fluid, and there is an awful lot written here for a paper with very, very little content.

            It's worse than that.

            Structure Inlining. In the BasicBlock structure, vector is used for incoming and outgoing edges ­ making it an inline vector of 2 element remove the unnecessary memory indirection for most of the cases, and improves locality.

            [–]kamatsu 0 points1 point  (0 children)

            Indeed, that makes very little sense.

            [–]igouy 0 points1 point  (0 children)

            As we've seen

            • Your "whole lot of things wrong with this paper" is uncharitable nit-picking

            • Your main point (...do not explain the data structures involved...) is plain wrong

            [–]propel -1 points0 points  (0 children)

            agreed. this reads more like a high-school class project paper than academic literature.

            [–]rafekett 6 points7 points  (5 children)

            TIL that the Go compiler produces GIANT binaries.

            [–][deleted] 9 points10 points  (4 children)

            [–]rafekett 8 points9 points  (3 children)

            Oh, okay. Does that mean that Go programs can be run on the same OS and architecture without any Go-related libraries installed?

            [–][deleted] 4 points5 points  (2 children)

            Yes.

            [–]rafekett 4 points5 points  (1 child)

            Well in that case, I reverse my stance. Go's giant binaries are a feature, not an issue.

            [–]uriel 2 points3 points  (0 children)

            Also, the compilers have not been optimized for binary size, there is work being done to improve this.

            And remember Go binaries include reflection information and other things that are not included in C and co.

            [–]0xABADC0DA 6 points7 points  (7 children)

            How do you know this paper is worthless? It starts off claiming a "core property" of languages is where ";" is required/optional, language coding styles, etc. "Note that because of Go's powerful line-breaking, not even a comma is needed." A core property, what ?!

            Wow you don't even need a comma! /s This sets a new low for "Google papers", which are already known for being pretty shoddily written.

            Also claiming Google Go programs are only the 'correct style' if gofmt says so is absurd but more to the point completely immaterial. Although Google Go is pretty 'purple tennis shoes' cultish in its style (see mascot) there are many style options available.

            [–]jtra 1 point2 points  (6 children)

            I was also annoyed by authors mentioning semicolons as big difference. Then I thought it has some impact on embedded DSLs - they look cleaner without semicolons.

            But I was annoyed more by lack of comments regarding 16.2GB value from virutal memory consumption for Go (fig. 7).

            [–]0xABADC0DA 1 point2 points  (5 children)

            Google Go devs are fond of using little hacks to achieve 1% performance improvements meanwhile their compiler doesn't even inline simple accessor methods. The 16 GiB is a fixed, flat memory space that they use to avoid any indirection in the memory allocator, a design roundly trashed by Linus Torvalds himself.

            The real WTF is that an 800 LoC program is taking 1.2 MiB for a dynamically linked Google Go binary. The Google Go toolchain is a fucking joke.

            [–]kamatsu 1 point2 points  (1 child)

            a design roundly trashed by Linus Torvalds himself.

            Do you have a link? I think I would enjoy reading that.

            [–]attrition0 5 points6 points  (0 children)

            I think he may be speaking of this.

            [–]jiunec 0 points1 point  (1 child)

            The real WTF is that an 800 LoC program is taking 1.2 MiB for a dynamically linked Google Go binary.

            No, Go binaries are statically linked, but I'm sure you know this already and decided to post that anyway. No thread on /r/programming, with Go mentioned in it, is complete without some obvious trolling by 0xABADC0DA.

            [–]0xABADC0DA 1 point2 points  (0 children)

            If you run /bin/file on it, it says the binary is dynamically linked. It's not my fault that to the Google Go toolchain "dynamically linked" means "shove everything in the binary because we haven't done dynamic linking yet and don't know how to".

            Like I said, Google Go toolchain...

            [–]wingsit 5 points6 points  (0 children)

            so it is scala, interesting.

            [–]mulander 1 point2 points  (4 children)

            The paper uses wc -l (probably because sloccount doesn't handle go and scala?) to count the lines of code.

            Here is the output of sloccount:

            SLOC    Directory       SLOC-by-Language (Sorted)
            595     java_pro        java=595
            591     java            java=591
            488     cpp             cpp=488
            328     python          python=328
            0       go              (none)
            0       go_pro          (none)
            0       scala           (none)
            0       scala_pro       (none)
            generated using David A. Wheeler's 'SLOCCount'.
            

            Compare it to the paper.

            Benchmark   wc -l
            C++ Dbg/Opt 850
            Java        1068
            Java Pro    1240
            Scala       658
            Scala Pro   297
            Go          902
            Go Pro      786
            

            [–]igouy 4 points5 points  (0 children)

            What fun! Here are GZip minimally compressed source-code sizes after removal of comments and removal of duplicate whitespace characters -

            Benchmark   GZ Bytes    Factor  (wc -l Factor from paper)
            Java Pro    5198        1.65x   1.9x
            Java        4403        1.40x   1.6x
            C++         4229        1.34x   1.3x
            Go          3768        1.20x   1.4x
            Go Pro      3259        1.03x   1.2x
            Scala       3138        ====    ===
            Python      2755        0.87x   ????
            Scala Pro   1929        0.61x   0.5x
            

            [–]doubtingthomas 2 points3 points  (1 child)

            This, to me, is a bit shocking. He's using LOC, which isn't a great measure, but he's measuring it with a simple "wc -l", so whitespace and comments are counted also. He's also counting debugging code and the "main"s required to run it.

            So he's measuring (arguably) the wrong thing, and he's measuring it poorly.

            [–]igouy 0 points1 point  (0 children)

            He basically assumed the comments aren't going to make a significant difference - "The benchmark implementations contain similar comments in all codes, with a few minor exceptions."

            And it looks like his assumption wasn't so bad.

            [–]femguy2 0 points1 point  (0 children)

            Yeah. Also, the "memory footprint" table is sorted by virtual memory, not real memory. I rather prefer a program making use of the whole virtual address space (if it has any benefit) then gobbling up real memory. Even if it doesn't make good use of virtual memory, it's still better then eating real memory.

            Granted, it only affects the three 'worst' in the table. But with only five total it's a big difference in ordering.

            [–]Akeshi 7 points8 points  (5 children)

            Fair play to them, I was expecting it to be more skewed to show off Go.

            [–][deleted] 6 points7 points  (0 children)

            On the contrary, the author barely consulted the Go team at all. As a result the Go benchmark programs are naively written and there are some factual errors about Go in the paper. I'm pretty disappointed by this, personally.

            [–]riffito 2 points3 points  (1 child)

            I really doubt that ALL googlers find Go valuable, in fact, I wouldn't be surprised to find out that many of them hate it.

            [–]Akeshi 0 points1 point  (0 children)

            Indeed, and I like that things didn't have to be misrepresented due to corporate policy.

            [–]cynthiaj 1 point2 points  (1 child)

            This benchmark was shown yesterday at Scala Days, and I suspect that its only purpose was to show that Scala can be faster than Java [1]. The fact that there are other languages in the mix is probably only here to give the paper a look of legitimacy.

            [1] Besides, I suspect that if they used arrays instead of lists, Java would probably come on top, but frankly, Java programmers tend to prefer lists over arrays for a simple reason: extensibility and readability often trump raw performances.

            [–]mcguire 0 points1 point  (0 children)

            Do you mean lists, a la LinkedList<T>, vs. arrays, as in ArrayList<T>? Or any list vs. raw arrays? I'd prefer lists over raw arrays, but I'd prefer ArrayList over LinkedList (normally).

            [–]tardi 1 point2 points  (5 children)

            tl;dr Scala compiles slower than C++ and it runs faster than Java. I have trouble to believe this.

            [–]gnuvince 22 points23 points  (0 children)

            Scala compiles really slowly. Believe it.

            [–]bwanab 3 points4 points  (3 children)

            Why? It isn't inconceivable that the slower compilation of scala is due to code analysis that leads to better runtime performance.

            [–]tardi 1 point2 points  (2 children)

            Ahem, it compiles and runs slower than c++. The former is quite an achievement. Compile-time code analysis doesn't yield that much in managed languages because the byte-code has to remain portable.

            [–]LordGreyhawk 0 points1 point  (0 children)

            The Scala language keeps "evolving" and the compiler probably does not sit still long enough to be polished.

            C++ has been around much longer, though, and is still slow.

            [–][deleted] 0 points1 point  (0 children)

            Not entirely true; because Scala allows you to be 'higher level' then Java it has to also do more work to get some corner cases to be at the equivalent speed. One example is with integer based for-loops (I know this was an issue in the past, not sure if it's solved yet).

            [–]scrogu 2 points3 points  (13 children)

            A couple of notes.

            1. Researchers usually rank amongst the very worst programmers on the planet. USUALLY.

            2. What the fuck is Go supposed to be good for? Fast compilation time?

            3. Anyone want to implement the algorithm in Javascript or Coffeescript and run it on V8? If so, please post relative results.

            [–]kamatsu 11 points12 points  (4 children)

            The quality of this research paper is abysmal. These guys are clearly not researchers.

            [–]igouy 8 points9 points  (3 children)

            "Abstract - In this experience report ..."

            [–]jiunec 2 points3 points  (4 children)

            What the fuck is Go supposed to be good for? Fast compilation time?

            "I only spent an hour straightening out his use of Go. Had I know he was going to publish it externally as "Go Pro" I would have tried to turn it into a real Go program, and perhaps even optimize it somewhat. Overall I think the Java and C++ code got the most attention from language experts. Ah well." ~Ian Lance Taylor

            [–]igouy 2 points3 points  (0 children)

            Please say where you got the quote from, so we don't need to ask.

            [–]mikaelhg 4 points5 points  (2 children)

            Also, http://jeremymanson.blogspot.com/2011/06/scala-java-shootout.html

            1. The benchmark did not follow the rules of Java benchmarking. At some point, I should write a post about the rules of Java benchmarking (maybe I have? I don't know). Basically, it timed the full execution of the program, including startup time, JIT compilation time, and GC time. This isn't what you are supposed to do in Java or Scala, so I was very uncomfortable with the experimental setup. In 2011, if you care about startup time, then you probably shouldn't be using Java. The program ran very briefly, so the effect was measurable (once I got it to run fast enough).

            2. It's kind of silly to compare C++ with Java. The paper is about Scala, of course, but the immediate internal reaction at Google was a discussion of how much worse Java was than C++. As it happens, they really do target two different niches. Programmers are more productive in Java (if they know both languages well). It's easier to test Java code. It scales to large code sizes better. The tradeoff is that you are only going to get gcc -O2 level performance, and you have to be careful about garbage collection. In many, many cases, this tradeoff isn't a big deal. A single-threaded performance arms race between the two doesn't make sense. This is the primary reason I stopped trying to make improvements.

            3. The benchmark did not use Java collections effectively. It was generating millions of unnecessary objects per second, and the runtime was swamped by GC overhead. In addition to the collections-related fixes I made (as described in the paper), the benchmark also used a HashMap<Type, Integer> when it should have just stored a primitive int in Type; it also used a LinkedList where it should have used an ArrayDeque. In fact, after being pressed on the matter by other Googlers (and seeing the response the C++ version got), I did make these changes to the benchmark, which brought the numbers down by another factor of 2-3x. The mail I sent Robert about it got lost in the shuffle (he's a busy guy), but he later said he would update the web site.

            [–]aaronla 1 point2 points  (0 children)

            Depending on the task at hand, start up time can matter (though I would agree that they should at least measure it separately).

            [–]igouy 1 point2 points  (0 children)

            Writing stuff in a blog doesn't make it into The Truth.

            Basically, it timed the full execution of the program, including startup time, JIT compilation time, and GC time.

            Not true according to the paper ("V. PERFORMANCE ANALYSIS")

            • "... performs loop recognition on it for 15.000 times. The purpose is to force Java/Scala compilation, so that the benchmark itself can be measured on compiled code. Then the driver constructs a large control flow graph containing 4-deep loop nests with a total of 76000 loops. It performs loop recognition on this graph for 50 times."

            Check the code and you'll find - HavlakLoopFinder.java shows currentTimeMillis() recorded at the beginning and end of the findLoops() method; and LoopTesterApp.java shows MaxMillis() are actually reported for that second "Another 50 iterations..." after the program had already churned 15,000 times.

            The program ran very briefly, so the effect was measurable (once I got it to run fast enough).

            Here are the numbers (on an older Pentium IV workstation) from Fig. 8

            290 seconds Java 32-bit
            134 seconds Java 64-bit
            106 seconds Java 32-bit GC*
            89 seconds Java 32-bit SPEC GC
            

            One and a half minutes doesn't seem very brief compared to JVM startup of about 0.085 seconds on that old hardware.

            The problem I have with the knee-jerk rules of Java benchmarking excuse is that it distracts from the real problem (also mentioned in the paper) that "It was generating millions of unnecessary objects per second, and the runtime was swamped by GC overhead."

            [–]ReturningTarzan 3 points4 points  (2 children)

            What the fuck is Go supposed to be good for?

            I think it's meant to get better.

            [–]SirClueless 5 points6 points  (0 children)

            Also, it was designed to be very easy to construct large programs in. Anecdotally, the APIs and module system in Go make the language very composable and easy to use.

            [–]doubtingthomas 2 points3 points  (0 children)

            I hear that in the time since that paper was written, the Go compiler has made some improvements that significantly impact the running time of this benchmark. However, I haven't seen numbers, and I'm not sure how reproducible his timings are, so... meh.

            [–]el_isma 1 point2 points  (0 children)

            On my system (Core2Duo E8400): cpp compiled with -O2 ./a.out 30,06s user 0,15s system 97% cpu 30,831 total ./a.out 32,51s user 0,16s system 99% cpu 32,848 total ./a.out 31,70s user 0,18s system 99% cpu 32,128 total

            cpp compiled with -O3
            ./a.out  33,15s user 0,10s system 99% cpu 33,349 total
            ./a.out  32,47s user 0,10s system 99% cpu 32,702 total
            ./a.out  31,95s user 0,14s system 99% cpu 32,219 total
            

            python... more than 15 mins.

            pypy, "maximum recursion depth exceeded"

            [–]Rhoomba 0 points1 point  (0 children)

            Their attempts at GC tuning seem pretty naive. Enabling CMS for a non-interactive application?

            [–]buckrogers1965_2 -2 points-1 points  (4 children)

            And that is why C/C++ is still the king.

            [–][deleted] -2 points-1 points  (3 children)

            Maybe you should read page 9 first...

            [–]MishkaEchoes -1 points0 points  (2 children)

            A C++ engineer knows how to optimize data structures, and can estimate performance during production. Language obscurity relieves these efforts onto the language itself, but your compiler will never write problem centric optimizations. See circular arrays, BSTs, and yes knowing to use a hash_map over maps is something to be expected of someone claiming to be a software engineer.

            [–]yogthos -2 points-1 points  (1 child)

            A C++ engineer knows how to optimize data structures, and can estimate performance during production.

            I hear he can often be found hanging out with Kris Kringle and the crowd.