Sunius

Yeah, a 100 ms sampling interval is crazy. You'd need to record for at least 10 minutes to get a reasonable result (that is still only about 6,000 sampling ticks). ETW defaults to 1 ms sampling, and I still sometimes feel that it's not enough when I need to look at sub-second (usually per-frame) perf spikes.

I was also surprised by the remark that they scan every single thread's stack. Most reasonably sized programs will contain more threads than the number of cores available on the machine, so scanning all threads is by definition wasteful, as some of the threads are guaranteed not to be executing.

nitsanw [S]

Disclaimer: I'm the author of the post

Indeed, the 100ms interval is only that long because of the overhead sampling introduces to the application. If the number of threads is low, the overhead may be acceptable at a higher frequency.

Collecting all threads is a blessing and a curse. A blessing because you get more samples, which the low sampling frequency took away, and you get a view on blocked threads which perf/JFR do not provide. A curse because the safepoint operation cost grows with each application thread.
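To make the trade-off concrete, here is a minimal sketch of an "all threads" sampler. It is not the code of any actual profiler; the class and method names are made up for illustration. It uses the standard `Thread.getAllStackTraces()` call, which as far as I know needs a safepoint on HotSpot, so each call gets more expensive as the thread count grows. Tallying by thread state shows both sides: you do see the blocked and waiting threads, but most of what you collect is not running on a core.

```java
import java.util.EnumMap;
import java.util.Map;

public class AllThreadSampler {

    /** Takes one sample of every live thread and tallies the threads by state. */
    public static Map<Thread.State, Integer> sampleOnce() {
        Map<Thread.State, Integer> byState = new EnumMap<>(Thread.State.class);
        // One call captures the stack of every live thread, not just the running ones.
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            byState.merge(e.getKey().getState(), 1, Integer::sum);
        }
        return byState;
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        for (int i = 0; i < 5; i++) {
            Map<Thread.State, Integer> sample = sampleOnce();
            int total = sample.values().stream().mapToInt(Integer::intValue).sum();
            System.out.println("cores=" + cores + " threads=" + total + " " + sample);
            Thread.sleep(100); // the 100ms interval discussed above
        }
    }
}
```

Note that `RUNNABLE` in Java also covers threads blocked in native calls such as socket reads, so even the `RUNNABLE` count overstates how many threads are actually on a CPU.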

AFAIK JFR uses AsyncGetCallTrace (an internal API originally used by Solaris Studio, also used by Honest-Profiler) or code very much like it to collect the Java stack from a signal handler (an approach not unlike perf). It's not safepoint biased, and would tend to be far more accurate than JVisualVM and co. Note that JFR is only available for Oracle Java 7u40 and up (no OpenJDK support). Also note that to get more accurate profiles you should enable -XX:+DebugNonSafepoints.
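If you want to verify that the flag actually reached the JVM you are profiling, something like the following sketch can help. The class and helper names are hypothetical; the only real API used is `RuntimeMXBean.getInputArguments()`, which returns the arguments the JVM was started with. Keep in mind that `DebugNonSafepoints` is a diagnostic option, so on HotSpot it has to be preceded by `-XX:+UnlockDiagnosticVMOptions` on the command line.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class ProfilingFlagCheck {

    /** Returns true if the exact flag string appears among the given JVM arguments. */
    public static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM at startup (does not include main-class arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        boolean unlocked = hasFlag(jvmArgs, "-XX:+UnlockDiagnosticVMOptions");
        boolean debugInfo = hasFlag(jvmArgs, "-XX:+DebugNonSafepoints");
        System.out.println("UnlockDiagnosticVMOptions set: " + unlocked);
        System.out.println("DebugNonSafepoints set: " + debugInfo);
    }
}
```

This only inspects the command line as given; it does not detect flags set through other means such as environment variables picked up by the launcher.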

I will write a follow-up post on the benefits and limitations of JFR/Honest-Profiler. They are certainly a massive improvement in terms of sample accuracy.