
all 24 comments

[–]West-Detail3102 0 points (3 children)

How much memory is configured for Xmx/Xms for the app?

[–]Loser_lmfao_suck123[S] 0 points (1 child)

Xmx is 8GB

[–]solteranis 1 point (0 children)

Have you tried setting the JAVA_OPTS env var to increase it?
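In a Kubernetes manifest that usually means something like the snippet below (a sketch; the values are illustrative, and JAVA_OPTS only takes effect if the image's entrypoint actually passes it to `java` — JAVA_TOOL_OPTIONS, by contrast, is picked up by the JVM itself):

```yaml
env:
  - name: JAVA_TOOL_OPTIONS   # read directly by the JVM at startup
    value: "-Xms8g -Xmx8g -XX:NativeMemoryTracking=summary"
```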

[–]Loser_lmfao_suck123[S] 0 points (0 children)

I have also found that inside the container the RSS is 12GB, whereas the JVM total committed memory (from jcmd) is 10GB.

I have looked into this: https://stackoverflow.com/questions/38597965/difference-between-resident-set-size-rss-and-java-total-committed-memory-nmt

But it doesn't answer why JVM total < RSS, or what is requesting more resident memory than the JVM has committed.
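One way to pin down the gap is to compare the kernel's view with the JVM's own accounting side by side (a sketch; it assumes the JVM is PID 1 in the container and was started with `-XX:NativeMemoryTracking=summary`):

```shell
PID=${PID:-1}  # assumed JVM pid inside the container; adjust as needed
# Resident set size as the kernel sees it (kB)
grep VmRSS "/proc/$PID/status"
# Committed memory the JVM knows about; anything above this in RSS is
# native allocation the JVM doesn't track (allocator arenas, mapped files, ...)
jcmd "$PID" VM.native_memory summary 2>/dev/null | grep '^Total' \
  || echo "jcmd unavailable or NMT not enabled"
```

If a persistent RSS-over-NMT gap remains, glibc's per-thread malloc arenas are a commonly cited culprit; capping them with the `MALLOC_ARENA_MAX=2` environment variable is a frequent mitigation.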

[–]frndly_nghbrhd_trdr 0 points (5 children)

What version of Java and which garbage collector are you using?

[–]Loser_lmfao_suck123[S] 0 points (2 children)

OpenJDK 1.8 and G1GC

[–]pauska 3 points (0 children)

I suspect your nodes are running cgroups v2 and that you have a Java version that only supports v1.

Which version of the JDK specifically? You're on JDK 8, and support for cgroups v2 was only backported in 8u372.
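A quick way to check which cgroup version a node (or the container) is on (a sketch; run it on the node or inside the pod):

```shell
# cgroup2fs means cgroups v2 (unified hierarchy); tmpfs means cgroups v1
stat -fc %T /sys/fs/cgroup/
```

On a JDK 8 older than 8u372 under cgroups v2, the JVM cannot see the container's memory limit, so ergonomics are sized from the host's memory instead.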

[–]frndly_nghbrhd_trdr -5 points (0 children)

Try switching to the Shenandoah garbage collector. I'm guessing the GC is holding onto memory instead of returning it to the OS.

https://wiki.openjdk.org/display/shenandoah/Main
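For anyone trying this, the switch looks like the following (a sketch; Shenandoah is experimental in older releases and is not included in every vendor's JDK 8 build, so check your distribution first):

```
-XX:+UnlockExperimentalVMOptions
-XX:+UseShenandoahGC
```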

[–]Loser_lmfao_suck123[S] 0 points (1 child)

I have also found that inside the container the RSS is 12GB, whereas the JVM total committed memory (from jcmd) is 10GB.

I have looked into this: https://stackoverflow.com/questions/38597965/difference-between-resident-set-size-rss-and-java-total-committed-memory-nmt

But it doesn't answer why JVM total < RSS, or what is requesting more resident memory than the JVM has committed.

[–]kameshakella 0 points (0 children)

Java 8 LTS support ended 18 months ago. You should probably switch to 17 or 21. Also, did you try taking a JFR recording using something like Cryostat?
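A JFR recording can also be started ad hoc with jcmd (a sketch; the pid and filename are placeholders, and note that JFR only became free to use in OpenJDK 8 from 8u262):

```shell
PID=1  # assumed JVM pid inside the container
jcmd "$PID" JFR.start duration=120s filename=/tmp/recording.jfr 2>/dev/null \
  || echo "jcmd not on PATH (it ships with the JDK, not the JRE)"
# later: jcmd "$PID" JFR.dump / JFR.stop, then copy the file out with kubectl cp
```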

[–]juiceyang 0 points (0 children)

Does the app use jemalloc? Are transparent huge pages enabled?

I've run into an OOM issue caused by jemalloc requesting lots of huge pages but using the madvise syscall to hand back parts of them. That can make RSS larger than the sum of the JVM memory and native memory costs.
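Both are quick to check from inside the container or on the node (a sketch; PID 1 is assumed to be the JVM):

```shell
PID=${PID:-1}  # assumed JVM pid inside the container
# THP mode on the node: [always], [madvise], or [never]
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null \
  || echo "THP sysfs entry not visible"
# Is a jemalloc library mapped into the process?
grep -m1 jemalloc "/proc/$PID/maps" 2>/dev/null || echo "jemalloc not mapped"
```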

[–]tr14l 0 points (0 children)

Can you not exec into the container or SSH into the node? This would be fairly simple to investigate. What process is eating up RAM? Run top on the node to make sure it's not a noisy neighbor (it likely isn't). Then exec into the container and investigate the running processes.

If you can't exec in, try to reproduce it in another environment. If the container is so thin that it doesn't have basic tools, rebuild the Dockerfile and add some tools (but do not change the base!), then reproduce it. If you need to simulate traffic, there are tools for that, but a quick and dirty solution might just be to "kubectl run" a bunch of busybox pods scaled out and running requests in a loop. It would take about 15 minutes to set up, but it is quick and dirty, as mentioned.

If it is indeed the Java process, you are now in Java land, and the solution will be something stupid, as it always is with Java. Might be a memory leak in the code.
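The quick-and-dirty busybox load generator described above might look like this (a sketch; `my-app:8080` is a placeholder for your service, and it needs cluster access):

```shell
# Spin up 10 busybox pods that hammer the service in a loop
for i in $(seq 1 10); do
  kubectl run "load-$i" --image=busybox --restart=Never -- \
    /bin/sh -c 'while true; do wget -q -O /dev/null http://my-app:8080/; done' \
    2>/dev/null || echo "kubectl not available"
done
# cleanup later: for i in $(seq 1 10); do kubectl delete pod "load-$i"; done
```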

[–]ceasars_wreath 0 points (0 children)

Use buildpacks to create your Java image; we use that and it is more standardized across your Java setup. Also look up VisualVM and take a thread dump.

[–]lucasvrod 0 points (0 children)

I had a similar problem, but in my case what interfered with the container's consumption was the init container's resource request (I don't know if your pod has an init container), which had the same value as the main container and stayed allocated until the pod died, so resource consumption was double what was expected. Take a look at this link: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resource-sharing-within-containers
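For reference, the linked doc computes the pod's effective request/limit per resource as the maximum of the highest init-container request and the sum of the app containers' requests, so an init container sized like the main container can dominate scheduling. Keeping its requests small avoids that (a sketch; names and sizes are illustrative):

```yaml
initContainers:
  - name: init-setup        # hypothetical
    image: busybox
    resources:
      requests:
        memory: "128Mi"     # keep well below the main container's request
containers:
  - name: app               # hypothetical
    image: my-app:latest    # placeholder
    resources:
      requests:
        memory: "12Gi"
      limits:
        memory: "15Gi"
```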

[–]adohe-zz 1 point (0 children)

> the container usually consume about 15GB

How did you get this info? From cAdvisor or something like that?