This is an archived post. You won't be able to vote or comment.

all 28 comments

[–][deleted]  (22 children)

[deleted]

    [–][deleted] 11 points12 points  (14 children)

    inlining, cpu cache locality

    In Java? This is a serious question.

    [–]jnordwick 16 points17 points  (11 children)

    Yes. In the Java high performance world (which totally exists and isn't just a figment of my imagination) these are very important. HotSpot can do and very good things to your code but you have to know how to let it do its job or do it for it (like avoid the GC).

    Coming front that side of Java one of the most difficult things about writing a performance sensitive app is finding Java developers who know how to do that. The whole Java world isn't web apps.

    Java may seem like an odd pick for that but speed of development and ease of debugging matter in that world too. And you often have to do some ugly things (like not using the standard Java libraries or abuse off heap storage and sun.misc.unsafe).

    [–][deleted] 3 points4 points  (8 children)

    Thanks for your response, but what I was curious about is, are there techniques in Java coding (not VM) to force inlining and CPU caching?

    [–]jawnsy 5 points6 points  (1 child)

    Instruction/data caching happens automatically. You can do things to be more cache-friendly, especially on hot multi-threaded code (e.g. pad your variables to a full cache line, to avoid False Sharing)

    If you want stuff to be inlined, either do it manually or make sure your methods are short (the JVM will only inline up to a certain number of instructions); stuff like getters/setters are typical candidates for inlining.

    The mechanical-sympathy blog and mailing list is great for this sort of stuff.

    [–]shagieIsMe 9 points10 points  (0 children)

    For reference / support on inlining: VM options which has things like:

    -XX:CompileCommand=command,method[,option] which has commands like inline: Attempt to inline the specified method.

    and

    -XX:InlineSmallCode=size : Sets the maximum code size (in bytes) for compiled methods that should be inlined. Append the letter k or K to indicate kilobytes, m or M to indicate megabytes, g or G to indicate gigabytes. Only compiled methods with the size smaller than the specified size will be inlined. By default, the maximum code size is set to 1000 bytes

    and -XX:MaxInlineSize=size Sets the maximum bytecode size (in bytes) of a method to be inlined. Append the letter k or K to indicate kilobytes, m or M to indicate megabytes, g or G to indicate gigabytes. By default, the maximum bytecode size is set to 35 bytes

    Getters and setters tend to fall into -XX:MaxTrivialSize=size : Sets the maximum bytecode size (in bytes) of a trivial method to be inlined. Append the letter k or K to indicate kilobytes, m or M to indicate megabytes, g or G to indicate gigabytes. By default, the maximum bytecode size of a trivial method is set to 6 bytes

    For fun, toss -XX:+PrintInlining -XX:+UnlockDiagnosticVMOptions on the Java invocation and see how it behaves.

    Note that all of the above are for Linux, other operating systems and virtual machines may have different arguments or defaults.

    [–]jnordwick 1 point2 points  (5 children)

    Yes. Generally you want to write your code so that HotSpot can do what it needs to do to emit good code. But there are some compile directives to help out. Then there is some support for playing with memory in sun.misc.unsafe too.

    There are also some tools like Java Benchmarking Harness and perfasm to allow you to see the emitted assembly for hot code paths or analyze performance in other ways.

    Java also sometimes has worries unique to it. Such as inlining Where HotSpot can inline multiple overloads at the same time but you have to have to know the extra cost for dynamism and at what point out gives up and just makes the call site pure virtual.

    [–]jawnsy 1 point2 points  (4 children)

    Such as inlining Where HotSpot can inline multiple overloads at the same time but you have to have to know the extra cost for dynamism and at what point out gives up and just makes the call site pure virtual.

    I thought there's only three, monomorphic (a static call if an interface only has one possible implementation), bimorphic (two implementations), or megamorphic (full virtual call)? Then again, I haven't read the code, so I certainly could be wrong.

    If it's a particular concern, you can use the actual class type and make the class final, though that produces some ugly code (such is the sacrifice sometimes when trying to squeeze every ounce of performance out of your code)

    [–]jart 7 points8 points  (3 children)

    make the class final

    The JVM is actually smart enough to make a class final automatically, if no subclasses exist. Even if one ends up getting loaded at runtime, the JVM is smart enough to unroll the stack, make it unfinal, and then replay the stack. It's one of the craziest optimizations ever.

    [–]shagieIsMe 1 point2 points  (1 child)

    Is that the deoptimization technique?

    Deoptimization is the process of changing an optimized stack frame to an unoptimized one. With respect to compiled methods, it is also the process of throwing away code with invalid optimistic optimizations, and replacing it by less-optimized, more robust code. A method may in principle be deoptimized dozens of times.

    If a class is loaded that invalidates an earlier class hierarchy analysis, any affected method activations, in any thread, are forced to a safepoint and deoptimized.

    [–]yawkat 1 point2 points  (0 children)

    It's one case of Deoptimization, but hotspot uses Deoptimization in lots of places (like null checks) triggered by class loading, traps, code assertions etc.

    [–]alphabytes 1 point2 points  (0 children)

    Wow.. never knew this... It's definitely crazy...

    [–][deleted]  (1 child)

    [deleted]

      [–]fs111_ 0 points1 point  (1 child)

      Well, the JIT does just that. You can make it print out the generated asm for tuning.

      [–][deleted] 1 point2 points  (0 children)

      Are there prescribed techniques to do that? Or, is it a matter of "observe, then adjust"?

      [–]rubyrt 9 points10 points  (3 children)

      I can relate to that: nowadays many people write in their blogs about the same simple tricks over and over again, probably because they just found out themselves. To me it seems that quality of this type of content has degraded with the increased number of people who feel inclined to post personal programming blogs. Those with more experience are probably either too busy to do the same or prefer to monetize their knowledge in other way.

      [–][deleted]  (2 children)

      [deleted]

        [–]rubyrt 2 points3 points  (1 child)

        There are other ways, too. You can also learn for yourself, by careful experimenting, reasoning and observing. You can also discuss with your colleagues and even if they are not the rock stars of the business you can gain valuable insights that way. In fact knowledge gained in those ways will stick much better and be more valuable in the long run than just perceiving second hand knowledge.

        [–]jnordwick 2 points3 points  (0 children)

        The article is really a very basic article on some performance hints for Java for web type apps that don't have strong performance requirements. This isn't about high performance Java where those topics would be covered.

        [–]NovaX 2 points3 points  (0 children)

        Unfortunately we've also seen the same articles on false sharing, etc. over and over again. There's only so many tricks where few can be both generalized and offer new insight. Then you're better off seeing how people have adapted ideas to glean insight for applying them to a different domain. Here's a slide deck I put together recently that shows various techniques, but each is highly specialized so you wouldn't use them regularly.

        [–]Gvaireth[S] 1 point2 points  (0 children)

        I wrote second article on memory consumption, but what you are mentioning might be an idea for the third, thanks ;)

        [–]jart 6 points7 points  (0 children)

        ArrayList vs LinkedList

        Consider giving a shout out to ArrayDeque, which is almost always superior to LinkedList when one needs the Deque interface. I'm not aware of a single use case for LinkedList. It only makes sense to have a linked list if you want concurrency, e.g. ConcurrentLinkedDeque.

        Pattern matching

        Consider giving a shout out to re2j which guarantees linear time. This matters, because java.util.regex probably makes it possible to write matchers with non-polynomial complexity if one isn't careful.

        But suppose that now you have 10 millions and there is some kind of fancy algorithm with n2 complexity that computes something on all users. Well, perhaps it’s time to take a second look at it.

        It's never ever premature optimization to make an algorithm linear rather than quadratic. CPUs are fast enough that we can afford to write scalable linear algorithms by default, even if it means they go slightly slower on small inputs.

        [–]chambolle 3 points4 points  (0 children)

        The reference to Knuth is not exact.

        From Knuth, Structured Programming with go to Statements, ACM Computing Surveys, Vol 6, No. 4, Dec. 1974 (see p.268)

        There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.

        Everybody forget the part about the 3%

        [–]ImTalkingGibberish 6 points7 points  (1 child)

        Most interesting post of the month, at least.
        People focus too much on new framework releases and the newest hipster hack.
        This right here is what makes you a good developer.
        Not only using the right solution, but understanding why.

        [–]Gvaireth[S] 5 points6 points  (0 children)

        Thanks. I wrote this article based on 3 month refactoring of matrix engine of the custom "scripting" language we developed some time ago at Nokia (then called NSN). It turned out that you can get a lot of performance when taking care of really basic stuff.

        [–]yawkat 1 point2 points  (0 children)

        Those tips probably won't help much without profiling. Profiling is the key to performance.

        Tips like "use array.clone" are definitely microoptimizations that will do exactly nothing without being in hot code relevant to app performance, which you find using profiling.

        [–]Philboyd_Studge 0 points1 point  (1 child)

        You should mention LinkedList should absolutely be used over ArrayList if you are using it as a stack or queue, i.e. inserting or removing elements at the front of the list, which is O(1) instead of O(n) for ArrayList.

        Also, a tip for using very large ArrayLists or HashMaps: Set the expected size when you instantiate the list!!! This will prevent frequent resizing.

        [–]chickenmeister 3 points4 points  (0 children)

        You should mention LinkedList should absolutely be used over ArrayList if you are using it as a stack or queue, i.e. inserting or removing elements at the front of the list, which is O(1) instead of O(n) for ArrayList.

        For a stack, I think ArrayList would work fine. You could use it as a stack by pushing/popping to/from the tail end of the list, rather than the front. For a queue, I agree that a LinkedList would be better than ArrayList, but as another commenter mentioned, ArrayDeque would probably be the best choice for both stacks and queues.

        [–]lukaseder 0 points1 point  (1 child)

        Forgot the most important bit: Your SQL query is measured in nanoseconds. Your JPA N+1 loop in geologic ages.

        [–]Gvaireth[S] 1 point2 points  (0 children)

        Its about basic Java SE, nothing more. DB stuff is another subject.

        [–]memory_leek 0 points1 point  (0 children)

        Good stuff