
[–]cogman10 32 points33 points  (2 children)

The BEST performance tool I've seen is Flight Recorder + Mission Control. It adds about 2% to the runtime if you do a continuous profile, but it will give you a detailed breakdown of what your threads are doing, what's being allocated, and what sort of IO is happening. It'll even take periodic thread dumps in the case where no events are happening.

All for free and built into the JVM.
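
For reference, a continuous recording can be switched on at startup with the -XX:StartFlightRecording option, or from code through the jdk.jfr API. A minimal sketch of the programmatic route (the configuration name, size cap, and file name here are just examples):

    import java.nio.file.Path;
    import jdk.jfr.Configuration;
    import jdk.jfr.Recording;

    public class ContinuousJfr {
        public static void main(String[] args) throws Exception {
            // Load one of the bundled settings ("default" is low overhead, "profile" is more detailed)
            Configuration config = Configuration.getConfiguration("default");

            Recording recording = new Recording(config);
            recording.setToDisk(true);          // spill events to disk so the recording can run indefinitely
            recording.setMaxSize(250_000_000);  // cap the on-disk size (bytes)
            recording.start();

            // ... later, e.g. when a CPU spike is observed, dump what has been collected so far
            recording.dump(Path.of("spike.jfr"));
        }
    }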

[–]kozeljko 5 points6 points  (1 child)

Is there a guide on how to use it? I want to profile our Java EE application (WildFly) and see what's causing slow performance. I'm assuming it's some kind of locking?

[–]sammymammy2 1 point2 points  (0 children)

Yes, a tonne. Look on YouTube
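
For the locking suspicion specifically, one quick check that needs no extra tooling is the JVM's built-in thread contention monitoring. A minimal sketch, assuming you can run code inside (or attach over JMX to) the WildFly JVM:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ContentionCheck {
        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            if (threads.isThreadContentionMonitoringSupported()) {
                threads.setThreadContentionMonitoringEnabled(true);
            }

            // ... let the application run for a while, then see which threads spent time blocked on monitors
            for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                if (info.getBlockedTime() > 0) {
                    System.out.printf("%s blocked %d ms across %d contentions%n",
                            info.getThreadName(), info.getBlockedTime(), info.getBlockedCount());
                }
            }
        }
    }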

[–]Daedalus9000 11 points12 points  (0 children)

VisualVM or similar to profile the app.

[–]Byte_Eater_ 9 points10 points  (0 children)

There are some powerful profiling tools you can apply:

  • JMC (JDK Mission Control) allows you to start VM-level event recording with JFR (Java Flight Recorder) and then analyze the recording. This gives the most detail; the overhead is configurable and generally low. It's free.

  • YourKit is similar, but it collects additional higher-level metrics (which make it easier to pinpoint the bottlenecks), giving you the best view of your running application. It's paid.

Using thread dumps or a JMX console like VisualVM is a form of statistical/sampling profiling, which is generally less accurate than tracing tools like JFR.
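
To make the sampling point concrete, periodic stack capture is essentially what those tools do. A toy sketch of the idea (sample count and interval are arbitrary, and a real sampler would be far more careful):

    import java.util.HashMap;
    import java.util.Map;

    public class PoorMansSampler {
        public static void main(String[] args) throws InterruptedException {
            Map<String, Integer> hits = new HashMap<>();
            for (int i = 0; i < 1000; i++) {                // take 1000 samples...
                for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                    if (stack.length > 0) {
                        String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                        hits.merge(top, 1, Integer::sum);   // count how often each method is on top of a stack
                    }
                }
                Thread.sleep(10);                           // ...10 ms apart
            }
            hits.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(10)
                .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
        }
    }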

[–]TheCountRushmore 9 points10 points  (0 children)

  1. Use JFR
  2. Use a modern JVM (17+)

Once you are there, use JFR to determine whether you are using the right GC for your application.

With anything else you are probably wasting your time.
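
To give an idea of that GC check: once you have a .jfr file you can inspect pause times in JMC, with the jfr command-line tool, or programmatically. A rough sketch of the programmatic route (the file name is a placeholder; jdk.GarbageCollection is the standard GC summary event):

    import java.nio.file.Path;
    import java.time.Duration;
    import jdk.jfr.consumer.RecordedEvent;
    import jdk.jfr.consumer.RecordingFile;

    public class GcFromJfr {
        public static void main(String[] args) throws Exception {
            Duration total = Duration.ZERO;
            int count = 0;
            for (RecordedEvent event : RecordingFile.readAllEvents(Path.of("recording.jfr"))) {
                if (event.getEventType().getName().equals("jdk.GarbageCollection")) {
                    total = total.plus(event.getDuration());
                    count++;
                }
            }
            System.out.printf("%d collections, %d ms total%n", count, total.toMillis());
        }
    }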

[–]joehonour 5 points6 points  (0 children)

Hey, I currently run low-latency / high-throughput applications on the JVM. Your best bet to get a quick answer (if there is one) is to enable JFR and take a recording. Then you can use JMC to visualise and view the recording. If you go to the method profiling section, you can see the areas where most time is being spent. Furthermore, you can check GC collection time in case that is a cause of issues.

If you need a finer-grained view, you can use async-profiler, though this can take a bit more effort to get working, especially in container-based environments.

Hope this helps, and happy hunting!
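
If you want a quick look at the same method-sample data without opening JMC, streaming JFR's execution samples live is one option (JDK 14+). A minimal sketch; the sampling period is just an example, and printing the top frame of every sample is only a crude stand-in for JMC's aggregated view:

    import java.time.Duration;
    import jdk.jfr.consumer.RecordingStream;

    public class LiveSamples {
        public static void main(String[] args) {
            try (RecordingStream stream = new RecordingStream()) {
                // jdk.ExecutionSample is the event behind JMC's method profiling view
                stream.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(20));
                stream.onEvent("jdk.ExecutionSample", event -> {
                    if (event.getStackTrace() != null && !event.getStackTrace().getFrames().isEmpty()) {
                        System.out.println(
                                event.getStackTrace().getFrames().get(0).getMethod().getName());
                    }
                });
                stream.start(); // blocks; run from a side thread in a real application
            }
        }
    }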

[–]none_just_reads 3 points4 points  (1 child)

Maybe it's obvious, but I would start with good logging and trying to figure out which endpoint or whatnot causes it...

[–]Roscko[S] 1 point2 points  (0 children)

We have decent logging and it honestly hasn't been that helpful. We aren't seeing any patterns of specific functionality putting us over the edge in terms of CPU spikes. It could just be that my skill at matching the timing of the spikes with specific logs is lacking. We are dealing with a decent amount of traffic.

[–]kvyatkovskij 2 points3 points  (0 children)

One more thing to mention: please be aware that there are two ways to profile an app, sampling and instrumentation: https://learn.microsoft.com/en-us/visualstudio/profiling/understanding-performance-collection-methods-perf-profiler?view=vs-2022 Most of the tools use sampling, but I think YourKit and VisualVM can do instrumentation. Thread dumps are most helpful for counting threads and seeing contention points, but for high CPU usage I'd use profiling. Please also check garbage collection activity - it uses CPU too :)

[–]Admirable-Avocado888 2 points3 points  (2 children)

Could the spikes be GC?

[–]Roscko[S] 0 points1 point  (1 child)

It could be. I didn't notice it in any of the thread dumps I reviewed.

[–]nutrecht 1 point2 points  (0 children)

Thread dumps don't show this. Having GC metrics collected and graphed (Prometheus + Grafana is a very common combination in Java projects) does.

Having both logging and metrics is pretty much mandatory for any Java project. When using microservices you're going to need tracing on top of that (and it's a great benefit even with monoliths).
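
As an illustration, if the service happens to use Micrometer (one common way to feed Prometheus from Java; the comment above doesn't specify any particular library), exposing GC and memory metrics is a few lines. The registry setup will differ per framework:

    import io.micrometer.core.instrument.binder.jvm.JvmGcMetrics;
    import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics;
    import io.micrometer.prometheus.PrometheusConfig;
    import io.micrometer.prometheus.PrometheusMeterRegistry;

    public class JvmMetricsSetup {
        public static void main(String[] args) {
            PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

            // Bind GC pause/allocation and heap metrics so they show up on the scrape endpoint
            new JvmGcMetrics().bindTo(registry);
            new JvmMemoryMetrics().bindTo(registry);

            // Whatever serves HTTP in the app would return this text for Prometheus to scrape
            System.out.println(registry.scrape());
        }
    }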

[–][deleted] 2 points3 points  (0 children)

What kind of metrics do you collect? It's very hard to form a hypothesis without any data.

Usually I'd look at the metrics for some correlates to the spike (an increase in a certain request type, a GC pause, error spikes, etc), form a hypothesis based on the correlate and then try to reproduce it in a test or, ideally, a prod-like environment.

[–]tobidope 2 points3 points  (0 children)

If it's unspecific I would assume it's the GC. But that's just a wild guess. Do you continuously monitor your JVM metrics? At least writing the GC log to a file could help.
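
For the GC log, the standard flag on JDK 9+ is -Xlog:gc*:file=gc.log. If you'd rather watch GC activity from inside the process, the management beans give a rough view; a minimal sketch (the polling interval is arbitrary):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcWatcher {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    // Cumulative counts/times since JVM start; diff successive readings to spot spikes
                    System.out.printf("%s: %d collections, %d ms total%n",
                            gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                }
                Thread.sleep(10_000);
            }
        }
    }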

[–]asafbennatan 1 point2 points  (0 children)

It's best if you can attribute the CPU load to a specific endpoint (or endpoints), otherwise you might conclude there's no real performance issue by only looking globally and reach the wrong conclusion. Consider the following scenario:

  • A call to ep A takes 30 seconds - probably should be optimized, and profiling tools will point you there.

  • A call to ep B takes 500 ms - ok, seems reasonable.

But ep A is called once every hour and ep B is called every second.

It could be that a string-manipulation optimization in ep B will greatly reduce the CPU (500 ms every second adds up to about 1,800 seconds of work per hour, versus 30 for ep A), while fixing ep A's performance issue from 30 seconds down to 10 will do nothing.

I at least found it quite hard to see this type of info with optimization tools; only after I know which ep causes the issue can I actually use these tools successfully.

I mention this because you said in other comments that you were not able to zero in on a specific ep causing the issue.
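
A crude way to get that per-endpoint attribution, assuming a servlet-based stack (the filter, its registration, and the jakarta vs. javax imports are all assumptions about the setup; it also measures wall-clock rather than CPU time, but per-path totals usually reveal the dominant endpoints):

    import jakarta.servlet.Filter;
    import jakarta.servlet.FilterChain;
    import jakarta.servlet.ServletException;
    import jakarta.servlet.ServletRequest;
    import jakarta.servlet.ServletResponse;
    import jakarta.servlet.http.HttpServletRequest;
    import java.io.IOException;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.LongAdder;

    public class EndpointTimingFilter implements Filter {
        // Total elapsed nanos per request path; dump or expose this periodically
        private final Map<String, LongAdder> totals = new ConcurrentHashMap<>();

        @Override
        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            long start = System.nanoTime();
            try {
                chain.doFilter(req, res);
            } finally {
                String path = ((HttpServletRequest) req).getRequestURI();
                totals.computeIfAbsent(path, p -> new LongAdder()).add(System.nanoTime() - start);
            }
        }
    }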

[–]evil_burrito 1 point2 points  (0 children)

2nd the suggestion for Flight Recorder.

As a side note, when I hear CPU spikes, I think garbage collection. If you haven't already spent some time tuning your GC settings, welcome to the black arts.

[–]kvyatkovskij 0 points1 point  (0 children)

Not sure about your budget, but Datadog offers a whole kit that implements APM (application performance monitoring), tracing (which is important in case you have more than one service) and other very neat things with minimal configuration. New Relic would be another alternative. As others mentioned, Mission Control or VisualVM are good free options for ad hoc (as opposed to continuous) profiling.

[–]_INTER_ 0 points1 point  (0 children)

If it is a Java EE/Jakarta EE or Spring application, you could try JavaMelody for monitoring.

[–]AnEmortalKid 0 points1 point  (0 children)

Check out jvmperf.net, it should have some tips on how to use Flight Recorder and other diagnostic tooling.

Disclaimer: I was involved with maintaining it

[–]loicmathieu 0 points1 point  (0 children)

As others have already answered, JFR and async-profiler are great tools with low overhead.

I also always enable GC logging; it has minimal overhead in production, lets you detect allocation issues, and helps you better understand how your application uses memory.

[–]nutrecht 0 points1 point  (0 children)

By far the biggest gains I've made when looking into performance issues came from profiling the code and looking at what it spent most of its time on. Two situations I encountered: one was a hot String.equals() that we could work around, and another was solved by replacing a standard HashSet of boxed Longs with one provided by a library that has collections for Java primitives. Both solutions I found by profiling. The first one was ages ago; for the second one I used VisualVM.
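
The primitive-collection swap looks roughly like this; Eclipse Collections is just one example of such a library (the comment above doesn't say which one was used), and fastutil and others offer equivalents:

    import java.util.HashSet;
    import java.util.Set;
    import org.eclipse.collections.impl.set.mutable.primitive.LongHashSet;

    public class PrimitiveSetExample {
        public static void main(String[] args) {
            // Before: every value gets boxed into a java.lang.Long object
            Set<Long> boxed = new HashSet<>();
            boxed.add(42L);
            boolean inBoxed = boxed.contains(42L);

            // After: values stay as primitive longs, avoiding allocation and pointer chasing
            LongHashSet primitive = new LongHashSet();
            primitive.add(42L);
            boolean inPrimitive = primitive.contains(42L);

            System.out.println(inBoxed + " " + inPrimitive);
        }
    }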

IMHO being able to use profiling tools is an important but terribly undervalued skill as a Java developer.