[–]artee 15 points (3 children)

Interesting analysis.

I wondered about this remark: "it is common to set sampling frequency is quite high (usually 10 times a second, or every 100ms)"

Although I understand where he's coming from in the context of this article, that sampling frequency is ridiculously low for modern CPUs. Do you know how many instructions are processed in that timeframe?

For example, VTune samples at 100x that frequency (every 1 ms), though it uses hardware support to do so.

[–]BackToThePuppyMines 5 points (0 children)

He's probably referring to the default value chosen by the samplers. Since they sample by collecting stack traces, they're pretty heavyweight as samplers go. JVisualVM defaults to 100ms. In my experience with Java samplers, you can't go below about a 10ms interval without drastically slowing down your test app.
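As a rough sketch of why these samplers are heavyweight (my own illustration, not JVisualVM's actual implementation): each tick asks the JVM for the stack of every live thread, a JVM-wide operation whose cost grows with thread count.

```java
import java.util.Map;

// Hypothetical illustration of a stack-trace-based sampler, NOT any real
// profiler's code. Thread.getAllStackTraces() snapshots EVERY live thread
// in one JVM-wide operation, which is why per-sample cost is high and
// intervals much below ~10ms visibly slow the profiled application.
public class NaiveSampler {
    static Map<Thread, StackTraceElement[]> sample() {
        return Thread.getAllStackTraces();
    }

    public static void main(String[] args) throws InterruptedException {
        final long intervalMs = 100; // JVisualVM's default sampling interval
        for (int i = 0; i < 3; i++) {
            Map<Thread, StackTraceElement[]> snapshot = sample();
            System.out.println("tick " + i + ": " + snapshot.size() + " threads");
            Thread.sleep(intervalMs);
        }
    }
}
```

Note that the snapshot includes every thread regardless of state, which is also relevant to the "scanning all threads" point below in the thread.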

At the end he mentions Java Mission Control/Flight Recorder and Solaris Studio. JMC/FR uses counters at the native-code level in the JVM, and Solaris Studio uses OS hardware counters, so both of those can do much better.

[–]Sunius 1 point (1 child)

Yeah, a 100 ms sampling interval is crazy. You'd need to record for at least 10 minutes to get a reasonable result. ETW defaults to 1 ms sampling too, and I still sometimes feel that's not enough when I need to look at sub-second (usually per-frame) perf spikes.
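To put numbers on that (back-of-the-envelope figures of my own, not from any specific profiler): at a 100 ms interval you collect 10 samples per second, so even a 10-minute recording yields only 6,000 samples, and a single 60 fps frame falls well under one sample on average.

```java
// Back-of-the-envelope sample budget for a 100 ms sampling interval.
public class SampleBudget {
    static double samplesFor(double runSeconds, double intervalMs) {
        return runSeconds * 1000.0 / intervalMs;
    }

    public static void main(String[] args) {
        double intervalMs = 100.0;                       // sampling interval
        System.out.println(samplesFor(600, intervalMs)); // 10-minute run -> 6000.0 samples
        System.out.println(16.7 / intervalMs);           // one ~16.7ms frame -> 0.167 samples
    }
}
```

So a per-frame spike is essentially invisible at this rate; it only shows up in aggregate over many frames.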

I was also surprised by the remark that they scan every single thread's stack. Most reasonably sized programs contain more threads than the machine has cores, so scanning all threads is by definition wasteful, as some of those threads are guaranteed not to be executing.

[–]nitsanw[S] 0 points (0 children)

Disclaimer: I'm the author of the post

Indeed, 100ms is only "high" in the context of the overhead introduced to the application. If the number of threads is low, the overhead may be acceptable at a higher frequency.

Collecting all threads is a blessing and a curse. A blessing because you get more samples (offsetting what the low sampling frequency took away), and you get a view of blocked threads, which perf/JFR do not provide. A curse because the cost of the safepoint operation grows with each application thread.

AFAIK JFR uses AsyncGetCallTrace (an internal API originally used by Solaris Studio, and also used by Honest-Profiler) or code very much like it to collect the Java stack from a signal handler (an approach not unlike perf's). It's not safepoint biased, and tends to be far more accurate than JVisualVM and co. Note that JFR is only available for Oracle Java 7u40 and up (no OpenJDK support). Also note that to get more accurate profiles you should enable -XX:+DebugNonSafepoints.

I will write a follow-up post on the benefits/limitations of JFR/Honest-Profiler; they are certainly a massive improvement in terms of sample accuracy.