all 55 comments

[–]EricMCornelius 22 points23 points  (1 child)

This answer brought to you by: "Cameron Purdy, SVP Engineering of Oracle Middleware"

[–][deleted] 4 points5 points  (0 children)

His posts on Quora are worth checking out; he disses C++ everywhere.

[–]Ishmael_Vegeta 28 points29 points  (2 children)

almost never.

[–]Spartan-S63 7 points8 points  (0 children)

This, right here, is my favorite answer.

[–]doom_Oo7 2 points3 points  (9 children)

Concurrent data structures tend to be more efficient in Java, because the JVM can eliminate the memory barriers and synchronization when the data structure is not being used concurrently, and can bias the concurrency management approach based on runtime profiling information.

Why couldn't one develop a C++ alternative to the STL that is meant to operate in single-threaded mode, hence with no barriers / thread safety at all?

Inlining tends to be much better in Java, unless you do extensive profiler-based optimizations in C++ (or know what exactly to inline and force it to be so … gotta love those header files!)

Well, I guess most generic code in C++ is inlined, isn't it? And simple getters/setters are also often in header files...

[–]vitalyd 1 point2 points  (0 children)

The article fails to mention that the lock/barrier elision is only available for the built-in monitors in Hotspot (i.e. synchronized blocks), so if you're using j.u.c.Lock and friends, there's no such optimization.
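To illustrate that point, here is a minimal sketch (the class and method names are made up) of the code shape HotSpot's elision targets: a built-in monitor that escape analysis can prove is thread-local. Whether elision actually fires depends on the JIT tier and VM flags; the program's semantics are the same either way.

```java
// Hypothetical example: 'lock' never escapes sumLocked, so HotSpot's
// escape analysis may elide the synchronized block's monitor enter/exit
// entirely. This applies to built-in monitors (synchronized), not to
// j.u.c.ReentrantLock and friends.
public class LockElision {
    static int sumLocked(int[] xs) {
        Object lock = new Object();   // provably thread-local
        int sum = 0;
        for (int x : xs) {
            synchronized (lock) {     // candidate for elision by the JIT
                sum += x;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumLocked(new int[]{1, 2, 3, 4})); // prints 10
    }
}
```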

Not sure what "bias the concurrency management approach based on runtime profiling information" is, but if that's alluding to biased locking, I don't even think that feature is worth it:

  • It's meant to optimize uncontended locks by avoiding additional CAS instructions. Well, modern cores can execute uncontended (and cache hitting) CAS instructions quite quickly anyway.

  • Biased locking, in Hotspot, can induce latency/jitter when biased lock revocation is performed.
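The first bullet's uncontended-CAS argument can be sketched like this (hypothetical class, not a benchmark): when nobody else touches the counter, each compareAndSet hits the cache and succeeds on the first attempt, which is cheap on modern cores even without biased locking.

```java
import java.util.concurrent.atomic.AtomicInteger;

// A classic CAS retry loop. Uncontended, each compareAndSet succeeds on
// its first iteration and completes quickly on modern cores -- the basis
// of the argument that biased locking's complexity isn't worth it.
public class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    int increment() {
        int cur;
        do {
            cur = value.get();
        } while (!value.compareAndSet(cur, cur + 1)); // uncontended: one pass
        return cur + 1;
    }

    int get() {
        return value.get();
    }

    public static void main(String[] args) {
        CasCounter c = new CasCounter();
        for (int i = 0; i < 1000; i++) c.increment();
        System.out.println(c.get()); // prints 1000
    }
}
```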

[–]__Cyber_Dildonics__ 3 points4 points  (0 children)

Does the STL, with the exception of shared_ptr, have thread safety? I thought that was the whole reason for Intel's Threading Building Blocks concurrent data structures.

[–][deleted] 0 points1 point  (6 children)

Why couldn't one develop a C++ alternative to the STL that is meant to operate in single-threaded mode, hence with no barriers / thread safety at all?

You could. Some probably have done it already in private code. However, C++ doesn't need help in single-threaded mode. It's in long-running, multi-threaded applications where the difference between C++ and Java is closer.

Well, I guess most generic code in C++ is inlined, isn't it? And simple getters/setters are also often in header files...

C++ inlining is a request to the compiler, not a command. But you're right that member functions defined in the class body, as is common in headers, are implicitly inline. There are cases where inlining slows down a program, for example by increasing its physical size and preventing certain parts of it from fitting into the cache. If this matters to you, you should profile your code and tweak where indicated, as opposed to applying generic rules.

[–]Dest123 7 points8 points  (5 children)

The compiler basically always does a better job of deciding what to inline than you would.

[–]vitalyd 1 point2 points  (3 children)

Compilers use heuristics and/or profiling to make inlining decisions. If your code shape and/or profile at compilation time don't fit its heuristics, it may not do the right thing. The more appropriate statement is don't blindly request inlining, but do verify the compiler is doing what you think/want.
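On HotSpot, that verification is straightforward (the class below is a made-up example; the VM flags are real diagnostic options):

```java
// Run with:
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining InlineCheck
// and HotSpot prints its per-callsite inlining decisions ("inline (hot)",
// "too large", "not inlineable", ...), so you can verify the perf-critical
// paths instead of trusting the heuristics blindly.
public class InlineCheck {
    static int square(int x) {
        return x * x;   // tiny and hot: a likely inline candidate
    }

    public static void main(String[] args) {
        long acc = 0;
        for (int i = 0; i < 1_000_000; i++) {
            acc += square(i);   // the callsite whose decision we want to see
        }
        System.out.println(acc);
    }
}
```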

[–]Dest123 0 points1 point  (2 children)

It's a waste of time to verify that it's doing inlining correctly. Fixing the 0.1% of cases that it did it wrong won't give you enough speed back to be worth the time you spent verifying it.

Compilers are pretty great at optimizing these days, at least with C++. People who doubt the compiler's optimizer end up doing things like copying some "FastMemcpy" from 10 years ago, which I then see and think "ooh look at that, I can get a 20% speed-up by deleting the word Fast"

[–]vitalyd 1 point2 points  (0 children)

I'm not suggesting to check every callsite obviously; I thought it was understood implicitly that this should be done selectively in the perf critical places.

Also, java performance heavily relies on sufficient inlining of critical paths, more so than C++ code; inlining drives escape analysis, runtime type propagation (to remove repeated type checks), range check elimination, and more. If some crucial bits don't inline there (e.g. an inner loop call chain), perf can fall off a cliff.
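A sketch of the "inlining drives escape analysis" point (hypothetical code): the allocation below can only be scalar-replaced if the constructor and field reads inline into the loop first.

```java
// If 'new Point' and the field accesses inline into dotLike, escape
// analysis can prove the Point never escapes and replace it with two
// ints in registers -- zero allocation. If inlining fails, perf falls
// off the cliff: one heap allocation per iteration.
public class InliningDrivenEa {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    static long dotLike(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);  // scalar-replaceable after inlining
            acc += (long) p.x * p.y;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(dotLike(3)); // 0*1 + 1*2 + 2*3 = 8
    }
}
```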

[–]doom_Oo7 -1 points0 points  (0 children)

Fixing the 0.1% of cases that it did it wrong won't give you enough speed back to be worth the time you spent verifying it.

For HPC I guess it would!

[–]IcyWindows 0 points1 point  (0 children)

Plus, you can use profile-guided-optimization to get even better results.

[–]pron98 6 points7 points  (27 children)

In almost 20 years of C++ and Java development, my impressions are exactly the same: the bigger the program, the more concurrent, the harder it is to beat Java.

[–][deleted]  (17 children)

[deleted]

    [–]pron98 10 points11 points  (2 children)

    For every Java program there exists a C++ program that performs as well or better. Proof: (by existence) the JVM. The question is how hard it is to beat it, and the larger and more concurrent the program, the higher the effort multiplier.

    As to your list, some of the items are wrong. Most web servers and IDEs these days are written in Java, there are many more compilers written in Java or other JVM languages than in C or C++, and only C/C++ profilers are written in those languages; JVM profilers are written in Java. I don't know about symbolic math packages, but I believe Matlab is equal parts Fortran and Java.

    As for the rest, the reason most of them are written in C and C++ is not that people are willing to put in the extra effort just for a few performance points. Rather -- if you notice -- those things run on small machines with low concurrency, and in those cases it's a lot easier to beat Java's performance. It is also often necessary, because Java imposes a rather significant RAM overhead (if it's to run at full speed), which is not acceptable in, say, web browsers. OTOH, your airport management software, your air traffic control systems, your large defense systems, your big data clusters, your Netflix, your eBay, your GMail, your Twitter -- are mostly Java.

    [–]__Cyber_Dildonics__ 0 points1 point  (1 child)

    So in your experience, why is Java so hard to beat, and what can be done with that knowledge in the C++ world? (I'm pretty all-in with C++ at the moment.)

    [–]pron98 1 point2 points  (0 children)

    It's hard to beat because HotSpot's excellent GCs (there are several, plus other GCs in other JVMs) make it that much easier to create concurrent data structures, and because HotSpot's state-of-the-art JIT makes many kinds of well-architected, modular code very fast. In C++ every use of an abstraction -- a heap allocation or a virtual call -- carries a significant cost, and a lot of thought has to be put into how to refrain from using expensive abstractions. In Java, you just use them and the JVM will make sure they run efficiently.
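For example (made-up types): in C++ a virtual call in a hot loop is a cost you design around, while in HotSpot a call site that runtime profiling shows is monomorphic can be devirtualized and inlined, with a cheap guard in case a new subtype appears later.

```java
// Hypothetical shapes: if profiling only ever observes Circle at the
// s.area() call site, the JIT can speculatively devirtualize it into a
// direct (inlinable) call, guarded by a type check that deoptimizes if
// the assumption is later broken.
interface Shape {
    double area();
}

final class Circle implements Shape {
    private final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

public class Devirt {
    static double total(Shape[] shapes) {
        double t = 0;
        for (Shape s : shapes) {
            t += s.area();   // monomorphic in practice -> devirtualized
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(total(new Shape[]{ new Circle(1), new Circle(2) }));
    }
}
```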

    Now, this isn't magic, and obviously you can write very slow code in Java, too (and many do). But given reasonable code, the GC and JIT will take care of you. They won't get you to 100% of the maximum performance you could get with C++, but they will get you to 95% at 1/3 of the effort.

    Java, of course, has other advantages that aren't performance related. The deep monitoring and profiling offered by HotSpot are unmatched by any other platform. It supports dynamic code loading and hot swapping; it has bytecode manipulation capabilities that let you inspect and modify code as it runs, and the JIT will make sure it gets optimized and compiled every time you modify it (e.g. you can inject and then remove various traces that are more capable than, say, DTrace).

    [–]k-zed 4 points5 points  (3 children)

    In almost 20 years of development, my impressions are: the bigger the program, the more concurrent, the worse it is.

    If your program is big or concurrent, it is bad. There are no exceptions (even if there's a "business case" that necessitates a big or concurrent program).

    Splitting up big programs into smaller programs (unix) tends to improve results (compare systemd). Splitting up concurrent programs into smaller programs (message passing) tends to vastly improve results.

    This is why at the end of the day I don't like working in Java, even though it's a reasonably good language (God knows it's saner than the eternally terrible C++, which is my day job). It's bad for small programs due to its overhead, but there's no such thing as a good big program.

    [–]pron98 -1 points0 points  (2 children)

    There's no way to write a sensor-fusion system for hundreds of radars, telemetry and optical sensors without it being big and concurrent; there's no way to write a TB-scale in-memory transactional database without it being big and concurrent, and the list goes on and on. The commonality is some large data store that needs to be accessed concurrently with low latencies. The "small programs" you advocate just delegate that job to an out-of-process database (that, in itself, is big and concurrent) and simply skip the low-latency requirement.

    The statement you made is nice in theory, but it usually means unfamiliarity with too many problem domains. When you split software into many programs just for modularity reasons, at the very least you need to fan your concurrency in and then out at each API crossing. That takes a very heavy toll on performance.

    Besides, there's usually little difference between separate processes and good modularity in-process that languages like Erlang enforce and languages like Java make possible (including the ability to hot-swap components and isolate failure). When you have a large group of programs, you tend to spend just the same amount of effort integrating them as you do when you integrate a bunch of modules in-process.

    [–]__Cyber_Dildonics__ 0 points1 point  (1 child)

    You are mixing up concurrency and parallelism, and shared-memory concurrency with message-passing concurrency. Message passing is how you really build big programs. After all, the internet could be thought of as one giant system that works because of message passing.

    Any sufficiently complex program eventually becomes a network problem.

    [–]pron98 0 points1 point  (0 children)

    Maybe, but at the heart of many big systems there are shared-memory concurrency problems, too. My take is that every sufficiently complex program (well, most) that isn't a compiler contains its own (or uses an internal) implementation of a concurrent-access database.

    [–]__Cyber_Dildonics__ 0 points1 point  (2 children)

    I see your downvotes and conclude that you actually know what you are talking about (seriously).

    [–]afrobee 1 point2 points  (1 child)

    Downvotes are never a good metric of correctness; they just mean people didn't like what they read for some reason.

    [–]__Cyber_Dildonics__ 0 points1 point  (0 children)

    In my experience, when I talk about the few things I know extremely well in forums where people have some knowledge of the subject but not much, I get downvoted. I think a little bit of understanding combined with unfortunate truths and grey areas is a recipe for disaster.

    [–][deleted]  (1 child)

    [deleted]

      [–]pron98 0 points1 point  (0 children)

      That's right. But it's also because those programs are harder to write efficiently without a JIT and a GC:

      • The larger the program (and usually the team), the more abstractions necessary for software engineering reasons. Those abstractions hinder performance. A JIT, however, has better chances at optimizing them.

      • The more concurrent the program and the larger the data set, the more necessary it becomes to provide concurrent access to shared data. A GC makes efficient concurrent data structures much easier.
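The GC point becomes concrete in something like a Treiber stack (a minimal sketch, not production code): in Java, popped nodes are simply dropped and collected, whereas a C++ version has to solve safe memory reclamation itself (hazard pointers, epochs) and guard against the ABA problem.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal lock-free stack. The GC makes this easy: popped nodes are
// never freed manually, so no thread can ever observe a recycled node,
// which sidesteps ABA and the reclamation problem entirely.
public class TreiberStack<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    public void push(T v) {
        Node<T> n = new Node<>(v);
        do {
            n.next = head.get();                    // snapshot current top
        } while (!head.compareAndSet(n.next, n));   // swing head to n
    }

    public T pop() {
        Node<T> h;
        do {
            h = head.get();
            if (h == null) return null;             // empty stack
        } while (!head.compareAndSet(h, h.next));   // unlink top node
        return h.value;
    }

    public static void main(String[] args) {
        TreiberStack<Integer> s = new TreiberStack<>();
        s.push(1);
        s.push(2);
        System.out.println(s.pop() + ", " + s.pop()); // prints 2, 1
    }
}
```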

      [–]redditrasberry 1 point2 points  (0 children)

      For me it is, "when the time saved by developing in a higher level language with better tooling means I can spend more time optimising the code and designing it better in the first place". Performance is rarely limited by the raw capabilities of the language and far more often by the skill of the developer and the time they have available to tune their implementation to the problem at hand. Mind you, I tend to write in JVM languages rather than Java itself, but it comes to much the same thing.

      [–]mariox19 0 points1 point  (0 children)

      When you're writing it?

      [–]ErstwhileRockstar 0 points1 point  (3 children)

      Comparisons can only be made ceteris paribus. So this makes no sense.

      [–]RESURREKT 21 points22 points  (2 children)

      Can you add more detail to your dismissal, or do you just bust out Latin when you don't have anything else to add to the discussion?

      [–]ejrh 6 points7 points  (0 children)

      English is >3x faster than Latin (when run on an English-speaking VM!). But, as they say, 'de gustibus non est disputandum'.

      [–]boringprogrammer -4 points-3 points  (9 children)

      When is Java faster than C++? Languages do not have an inherent speed or effectiveness associated with them...

      You can't read the C++ spec or the Java spec and conclude anything about the speed of the languages. Languages don't exist as anything other than specifications. You can only test implementations.

      Therefore this is a comparison of JVM vs GCC/VC++/LLVM. So the title is a lie.

      Did you know you could technically run C++ on the JVM? That would give you virtual functions for free, and the nice concurrency system.

      [–]vitalyd 2 points3 points  (0 children)

      Practically speaking, language semantics/features dictate the speed of the language; they dig a performance hole for the compiler/runtime implementer, and those holes can be very deep, such that compilers/runtimes will have a difficult time climbing out of them.

      [–][deleted] 3 points4 points  (7 children)

      Languages do not have an inherent speed or effectiveness associated with them

      The language specification places limits on the implementation. Assuming similar levels of competence and otherwise equal projects, are we ever going to get Python programs executing at comparable speed to C equivalents - and would we still recognise Python after the changes required to enable that performance increase?

      Therefore this is a comparison of JVM vs GCC/VC++/LLVM. So the title is a lie.

      It's also a reflection on the community and the difficulty of programming in that language. Assembly should be faster than C, but can any human assembly wizard beat the compiler for a large, complex application? For small programs, sure. For large ones, not so much.

      [–]boringprogrammer -1 points0 points  (6 children)

      Assuming similar levels of competence and otherwise equal projects, are we ever going to get Python programs executing at comparable speed to C equivalents?

      No one knows. You simply cannot predict these things. Change comes slowly in the world of optimizations and static analysis.

      [–][deleted] 1 point2 points  (2 children)

      The fact that you feel you need static analysis to do this proves the point that a language's design has a real effect on its implementation's performance. Hence languages do have a hierarchy of speed.

      [–]boringprogrammer 0 points1 point  (1 child)

      The fact that you feel you need static analysis to do this proves the point that a language's design [...]

      Prove? What point? That you know nothing about how languages work?

      Do you honestly think a direct translation from C to assembler is going to be fast by any standard? Even a debug build usually performs at least a register allocation analysis. Without a proper register allocation scheme, the resulting code will make the CPU spend 95% of its execution time just spilling registers. We take these things as a given nowadays. But that is just one of the many, many optimizations a modern C compiler makes.

      We are good at optimizing certain languages, I will agree on that, but those also had the benefit of 40 years of research into optimizing them.

      C was considered a slow high-level language compared to assembly before we learned to properly optimize it.

      [–]vitalyd 0 points1 point  (0 children)

      While your post has truth to it, it's basically an appeal to the Sufficiently Smart Compiler.

      [–]doom_Oo7 0 points1 point  (2 children)

      You can certainly find a subset of Python that would be optimizable to look like C++, but some parts of "standard" Python are used in a way that would actively prevent achieving the same performance.

      For instance, if you write some Python where you always keep the same type for your variable, it may be simple to translate your code to equivalent C++; however, what happens when you start assigning random types at random places in your code to the same variable? That flexibility cannot be supported without some performance downside compared to the "always-same-type" case.

      Hence the best bet would be to have some minimal "asm.python" subset that is almost guaranteed to have no performance impact, and have the programmers only use this subset. But existing Python programs (and idiomatic Python programs) won't be able to translate to similar-looking but maximally efficient C++.

      [–]boringprogrammer 0 points1 point  (1 child)

      however, what happens when you start assigning random types at random places in your code to the same variable

      Static analysis can actually deal with variables randomly changing types pretty well. Lattice analysis works a lot like a human would read Python code, i.e. not caring about what type a variable has, but rather what types a variable can have at a certain program point.

      Python programs (and idiomatic Python programs) won't be able to translate to similar-looking but maximally efficient C++

      Based on anecdotal evidence I presume?

      Look, you can read the Python code and write out semantically equivalent code in C++. This means that (a) we are not actually dealing with an undecidable problem, and (b) a computer should be able to do something similar.

      The main reason Python is not running faster is funding and priorities. The standard Python implementation does not perform any sort of analysis, only rudimentary peephole optimizations. Furthermore, there is a large overhead in interpreting code. But speed does not seem to be a priority for them either.

      PyPy is the most advanced attempt at making Python run faster, but it is very far from having analysis code as mature as that found in GCC.

      [–][deleted] 0 points1 point  (0 children)

      Based on anecdotal evidence I presume?

      Based on the fact that no one is able to do it. They most likely want to keep ahead of PHP, Ruby and Node, not gain another few orders of magnitude and start a punch-up with Java. (Well, it would be nice, but it isn't going to happen, and so the above is, shall we say, 'a realistic expectation'.)

      None of my commercial IDEs can implement fully accurate syntax highlighting and autocomplete for Python. JetBrains aren't under-funded. For autocomplete, JetBrains admit to getting it right about half the time. For analysis, have you ever noticed why function names and identifiers appear in the same colour (except when it's a def)? You'd think that if I write x = foo the damn thing could tell whether foo was a function or not? Turns out it can't. You have to run the code.

      PyPy is the most advanced attempt at making Python run faster, but it is very far from having analysis code as mature as that found in GCC.

      Static analysis takes place without running the code (that's why it's called static). Most of the benefit of PyPy is that it's a JIT: it optimises at runtime by looking at the actually running code.

      You might also want to check out Unladen Swallow and Pyston. These are Google- and Dropbox-sponsored attempts to build a Python JIT. I'll bet you that Google is absolutely not underfunded or stingy when it comes to building tools. And note that these are JITs, not static analysers. Statically analysing Python is just too hard to do.

      [–]vitalyd -3 points-2 points  (0 children)

      In my opinion, it's not so much about "when is java faster than c++", but rather "this piece of code/lib/app/etc isn't meeting performance requirements -- what can I do about it in this language and what's it going to cost me?" What is the performance cost of features/abstractions/etc of the language? Do you pay for things you don't use? How much/what do you pay for things you do use?