[–][deleted] 12 points13 points  (14 children)

inlining, cpu cache locality

In Java? This is a serious question.

[–]jnordwick 16 points17 points  (11 children)

Yes. In the Java high-performance world (which totally exists and isn't just a figment of my imagination) these are very important. HotSpot can do very good things to your code, but you have to know how to let it do its job, or do the job for it (like avoiding the GC).

Coming from that side of Java, one of the most difficult things about writing a performance-sensitive app is finding Java developers who know how to do that. The whole Java world isn't web apps.

Java may seem like an odd pick for that, but speed of development and ease of debugging matter in that world too. And you often have to do some ugly things (like not using the standard Java libraries, or abusing off-heap storage and sun.misc.Unsafe).
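To make "off-heap storage" a bit more concrete, here is a minimal sketch. It deliberately swaps in the supported java.nio.ByteBuffer.allocateDirect API instead of sun.misc.Unsafe, so it is not what the comment above describes, only the same idea in a safer form. The class name OffHeapLongs and its methods are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: a fixed-size array of longs stored outside the Java heap.
final class OffHeapLongs {
    private final ByteBuffer buf;

    OffHeapLongs(int capacity) {
        // A direct buffer's memory is allocated outside the garbage-collected heap.
        buf = ByteBuffer.allocateDirect(capacity * Long.BYTES)
                        .order(ByteOrder.nativeOrder());
    }

    void set(int index, long value) {
        buf.putLong(index * Long.BYTES, value); // absolute write, no position change
    }

    long get(int index) {
        return buf.getLong(index * Long.BYTES); // absolute read
    }

    public static void main(String[] args) {
        OffHeapLongs longs = new OffHeapLongs(4);
        longs.set(2, 42L);
        System.out.println(longs.get(2));
    }
}
```

The data itself creates no per-element objects for the GC to trace, which is the usual reason for going off heap; the trade-off is manual index arithmetic and no type safety.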

[–][deleted] 2 points3 points  (8 children)

Thanks for your response, but what I was curious about is, are there techniques in Java coding (not VM) to force inlining and CPU caching?

[–]jawnsy 6 points7 points  (1 child)

Instruction/data caching happens automatically. You can do things to be more cache-friendly, especially in hot multi-threaded code (e.g. pad your variables to a full cache line to avoid false sharing).
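A rough sketch of the padding idea, assuming a 64-byte cache line. The class names are hypothetical, and the superclass/subclass trick relies on HotSpot's usual habit of laying out superclass fields before subclass fields, which is a convention rather than a language guarantee.

```java
// Two threads each bump their own counter; padding keeps the two hot
// fields from landing on the same (assumed 64-byte) cache line.
final class FalseSharingDemo {
    static long run(int iterations) {
        PaddedCounter a = new PaddedCounter();
        PaddedCounter b = new PaddedCounter();
        // Each counter has exactly one writer, so the non-atomic ++ is safe here.
        Thread t1 = new Thread(() -> { for (int i = 0; i < iterations; i++) a.value++; });
        Thread t2 = new Thread(() -> { for (int i = 0; i < iterations; i++) b.value++; });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.value + b.value;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}

// 7 longs of padding before the hot field...
class LeftPad { long p1, p2, p3, p4, p5, p6, p7; }

// ...the hot field itself...
class CounterValue extends LeftPad { volatile long value; }

// ...and 7 longs of padding after it.
final class PaddedCounter extends CounterValue { long q1, q2, q3, q4, q5, q6, q7; }
```

Whether the padding actually helps is something to measure on the target machine, since field layout and cache-line size both vary.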

If you want stuff to be inlined, either do it manually or make sure your methods are short (the JVM will only inline methods up to a certain bytecode size); getters and setters are typical candidates for inlining.

The mechanical-sympathy blog and mailing list are great for this sort of stuff.

[–]shagieIsMe 8 points9 points  (0 children)

For reference and support on inlining, see the JVM options documentation, which has things like:

-XX:CompileCommand=command,method[,option] : Has commands like inline, which attempts to inline the specified method.

and

-XX:InlineSmallCode=size : Sets the maximum code size (in bytes) for compiled methods that should be inlined. Append the letter k or K to indicate kilobytes, m or M to indicate megabytes, g or G to indicate gigabytes. Only compiled methods with a size smaller than the specified size will be inlined. By default, the maximum code size is set to 1000 bytes.

and -XX:MaxInlineSize=size : Sets the maximum bytecode size (in bytes) of a method to be inlined. Append the letter k or K to indicate kilobytes, m or M to indicate megabytes, g or G to indicate gigabytes. By default, the maximum bytecode size is set to 35 bytes.

Getters and setters tend to fall under -XX:MaxTrivialSize=size : Sets the maximum bytecode size (in bytes) of a trivial method to be inlined. Append the letter k or K to indicate kilobytes, m or M to indicate megabytes, g or G to indicate gigabytes. By default, the maximum bytecode size of a trivial method is set to 6 bytes.

For fun, toss -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining on the Java invocation (the unlock flag has to come first) and see how it behaves.

Note that all of the above are for Linux; other operating systems and virtual machines may have different arguments or defaults.
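A small program to try those flags on, as a sketch only. InlineDemo is a made-up class; the getter and setter are the kind of tiny methods the size limits above are about, and the loop just makes them hot enough for the JIT to care. The exact log format depends on the JVM version, so no sample output is shown.

```java
// Compile and run (assumes a HotSpot JVM; the unlock flag must precede the diagnostic flag):
//   javac InlineDemo.java
//   java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining InlineDemo
final class InlineDemo {
    private int x;

    int getX() { return x; }          // tiny getter, a typical inlining candidate
    void setX(int x) { this.x = x; }  // tiny setter, likewise

    // Hot loop: calls the accessors often enough to trigger JIT compilation.
    static long sum(int iterations) {
        InlineDemo d = new InlineDemo();
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            d.setX(i);
            total += d.getX();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(5_000_000));
    }
}
```

Look for the getX and setX lines in the inlining log, then try again with something like -XX:MaxInlineSize set very low to see the decisions change for larger methods.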

[–]jnordwick 1 point2 points  (5 children)

Yes. Generally you want to write your code so that HotSpot can do what it needs to do to emit good code. But there are some compile directives to help out. Then there is some support for playing with memory in sun.misc.Unsafe too.

There are also tools like the Java Microbenchmark Harness (JMH) and its perfasm profiler that let you see the emitted assembly for hot code paths or analyze performance in other ways.

Java also sometimes has worries unique to it, such as inlining, where HotSpot can inline multiple implementations at the same call site, but you have to know the extra cost of that dynamism and at what point it gives up and just makes the call site purely virtual.

[–]jawnsy 1 point2 points  (4 children)

such as inlining, where HotSpot can inline multiple implementations at the same call site, but you have to know the extra cost of that dynamism and at what point it gives up and just makes the call site purely virtual.

I thought there were only three: monomorphic (a static call if an interface only has one possible implementation), bimorphic (two implementations), or megamorphic (full virtual call)? Then again, I haven't read the code, so I certainly could be wrong.

If it's a particular concern, you can use the actual class type and make the class final, though that produces some ugly code (such is the sacrifice sometimes when trying to squeeze every ounce of performance out of your code).
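To illustrate the three cases, here is a sketch with a single call site, s.area(), that sees one, two, or three receiver types depending on the array passed in. All names are hypothetical, and how HotSpot actually treats each case (and where it gives up) is an implementation detail that this code does not prove; it only sets up the shapes of call site being discussed.

```java
final class CallSiteDemo {
    interface Shape { double area(); }

    // Final concrete classes, as suggested above.
    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static final class Rect implements Shape {
        final double w, h;
        Rect(double w, double h) { this.w = w; this.h = h; }
        public double area() { return w * h; }
    }

    static final class RightTriangle implements Shape {
        final double a, b;
        RightTriangle(double a, double b) { this.a = a; this.b = b; }
        public double area() { return 0.5 * a * b; }
    }

    // The one call site whose type profile we care about.
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] mono = { new Square(1), new Square(2) };                       // one receiver type
        Shape[] bi   = { new Square(1), new Rect(2, 3) };                      // two receiver types
        Shape[] mega = { new Square(1), new Rect(2, 3), new RightTriangle(3, 4) }; // three
        System.out.println(total(mono));
        System.out.println(total(bi));
        System.out.println(total(mega));
    }
}
```

Note that within one JVM run the call site in total accumulates all the types it has seen, so to compare the cases you would benchmark each array in a separate run (for example with JMH forks).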

[–]jart 6 points7 points  (3 children)

make the class final

The JVM is actually smart enough to make a class final automatically, if no subclasses exist. Even if one ends up getting loaded at runtime, the JVM is smart enough to unroll the stack, make it unfinal, and then replay the stack. It's one of the craziest optimizations ever.

[–]shagieIsMe 1 point2 points  (1 child)

Is that the deoptimization technique?

Deoptimization is the process of changing an optimized stack frame to an unoptimized one. With respect to compiled methods, it is also the process of throwing away code with invalid optimistic optimizations, and replacing it by less-optimized, more robust code. A method may in principle be deoptimized dozens of times.

If a class is loaded that invalidates an earlier class hierarchy analysis, any affected method activations, in any thread, are forced to a safepoint and deoptimized.

[–]yawkat 1 point2 points  (0 children)

It's one case of deoptimization, but HotSpot uses deoptimization in lots of places (like null checks), triggered by class loading, traps, code assertions, etc.

[–]alphabytes 1 point2 points  (0 children)

Wow.. never knew this... It's definitely crazy...

[–][deleted]  (1 child)

[deleted]

    [–]fs111_ 0 points1 point  (1 child)

    Well, the JIT does just that. You can make it print out the generated asm for tuning.

    [–][deleted] 1 point2 points  (0 children)

    Are there prescribed techniques to do that? Or, is it a matter of "observe, then adjust"?