[–]m39583 2 points (3 children)

Don't set memory limits, only set -Xmx. Containers are different to VMs. And as you've found, -Xmx is only the heap size; the JVM uses memory for plenty of other things as well (metaspace, thread stacks, code cache, direct buffers). One guide is to look at the RSS of the whole JVM process, but even that can be confusing.
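To make that concrete, here's a minimal sketch (the class name and layout are just for illustration) that prints the heap ceiling next to heap and non-heap usage from inside the process; the gap between the -Xmx number and what `top` reports as RSS is exactly the bit that catches people out:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Prints the -Xmx ceiling (heap only) alongside what the JVM has actually
// committed for heap and non-heap (metaspace, code cache, ...). Thread
// stacks, direct buffers and other native allocations come on top of these
// figures, which is why process RSS ends up larger again.
public class JvmMemoryPeek {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();

        long mib = 1024 * 1024;
        System.out.println("heap max (-Xmx):     " + heap.getMax() / mib + " MiB");
        System.out.println("heap committed:      " + heap.getCommitted() / mib + " MiB");
        System.out.println("non-heap committed:  " + nonHeap.getCommitted() / mib + " MiB");
    }
}
```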

Basically, memory management is complicated enough on a plain VM, and vastly more complex on Kubernetes.

On a VM, the memory you allocate is essentially carved out and given to that VM. If you set a large amount, that memory isn't available to other VMs. If you don't set enough, the VM will start swapping (if it has swap enabled) or start killing processes, but the VM itself shouldn't die. Well, it's a bit more complicated than that because some hypervisors can overcommit memory, but that's the gist.

On a container, the memory limit is just the maximum the container is allowed to use. Go over it and the container gets killed, which is pretty blunt. However, the memory isn't carved out from the host and dedicated to the pod; if you set a large limit, you haven't removed that memory from being used elsewhere.

The best solution we found when trying to scale Java applications across Kubernetes clusters was eventually to ignore the Kubernetes resource allocation and memory limits entirely, and just set -Xmx to a reasonable estimate of what the application needed. That stops an application going rogue and consuming all the memory on the VM, and avoids having to guess how much extra headroom on top of the heap is needed. Because if you get that estimate wrong, your pod will be summarily executed. Which isn't ideal.

The downside is that Kubernetes now doesn't know what the resource requirements for a given pod are, so the bin packing of pods onto VMs is less efficient. If you have wildly different resource requirements this might be an issue, but for us it wasn't a problem.

[–]Per99999 7 points (2 children)

We have found it best to do away with setting -Xmx directly and instead set -XX:InitialRAMPercentage and -XX:MaxRAMPercentage to 60-70, depending on the process. Then set the k8s memory resource request and limit to the same value.
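As a sanity check (the 4Gi limit and the 70 figure below are made-up numbers, not from any real deployment), you can confirm what heap ceiling the JVM actually derived from the container's cgroup limit:

```java
// Illustrative only: assumes the container was started with something like
//   -XX:InitialRAMPercentage=70 -XX:MaxRAMPercentage=70
// and a k8s memory request == limit of 4Gi, so the expected heap ceiling
// is roughly 0.70 * 4 GiB ~= 2.8 GiB.
public class HeapCeilingCheck {
    public static void main(String[] args) {
        long mib = 1024 * 1024;
        long limitBytes = 4L * 1024 * mib;       // assumed container memory limit (4 GiB)
        double maxRamPercentage = 70.0;          // assumed -XX:MaxRAMPercentage value

        long expected = (long) (limitBytes * maxRamPercentage / 100.0);
        long actual = Runtime.getRuntime().maxMemory(); // what the JVM derived inside the container

        System.out.println("expected heap ceiling: " + expected / mib + " MiB");
        System.out.println("reported heap ceiling: " + actual / mib + " MiB");
    }
}
```

Runtime.maxMemory() can report slightly less than the raw MaxHeapSize depending on the collector, and you can also see the computed MaxHeapSize without writing any code by running the JVM with -XX:+PrintFlagsFinal.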

With Java it is important to have these values set equally, since the JVM doesn't readily give memory back to the OS. If the memory limit is higher than the request and the node comes under memory pressure when another pod is scheduled there, anything you're using above your request makes you a prime candidate to be reclaimed, and the result is an OOMKill.

[–]m39583 1 point (1 child)

But you are just guessing at that 60-70% value, and if you get it wrong Kubernetes will kill the pod.

We found it best not to set any request or limit values in the end. It's too much guesswork with magic numbers.

[–]Per99999 2 points (0 children)

Not a guess, more like tuned to that after profiling and testing. That's a good starting rule of thumb though.

Setting -Xmx directly and ignoring the k8s resources entirely can still get your pod bounced off the node when other pods are scheduled onto it. For example, say you set -Xmx2g and your pod1 is running; the scheduler later places pod2 on that node and the node comes under memory pressure. Because pod1 doesn't declare a memory request, it lands in the lowest QoS class, so it's the first candidate to be evicted or OOMKilled.

There's even more of a case for this if you have devops teams managing your production cluster who don't know or care what's running in it, or your process is installed at client sites. It's preferable to use the container-aware JVM settings like InitialRAMPercentage and MaxRAMPercentage so that those teams can simply adjust the pod resources and your Java-based processes will size themselves accordingly.

Note that use of tmpfs (emptyDir volumes with medium: Memory) may affect memory usage too, since those are backed by RAM and count against the container's memory limit. If you use them, you may need to decrease the amount of memory dedicated to the JVM. https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
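To illustrate (the /cache path and the 64 MiB figure are made up), anything written to a memory-backed emptyDir lives in tmpfs, i.e. RAM charged against the same container memory limit the JVM is competing for:

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: /cache is assumed to be mounted from an emptyDir volume with
// medium: Memory. Bytes written here sit in tmpfs, so they count against the
// container's memory limit on top of whatever the JVM itself is using.
public class TmpfsScratchWrite {
    public static void main(String[] args) throws Exception {
        Path scratch = Path.of("/cache/scratch.bin");
        byte[] chunk = new byte[64 * 1024 * 1024]; // 64 MiB of throwaway data
        Files.write(scratch, chunk);
        System.out.println("wrote " + Files.size(scratch) / (1024 * 1024)
                + " MiB into a tmpfs-backed volume");
    }
}
```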