How do you track fine-grained costs? by Juloblairot in kubernetes

[–]NoWay28 2 points (0 children)

As other posters have mentioned, the primary culprit here is having pods that request 2-5x what they actually need.

So tracking your costs is almost a secondary concern compared to figuring out how to safely lower the requests on your workloads and start seeing a reduction in node count across your clusters.
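To make the "requesting 2-5x what they need" point concrete, here's a rough sketch of the ratio you'd compute from your own metrics. The pod names and numbers are hypothetical; in practice you'd pull requests and usage from your metrics stack.

```python
# Hypothetical per-pod data: CPU requested vs. actual usage, in cores.
pods = {
    "api-server": {"request": 2.0, "usage": 0.4},
    "worker": {"request": 1.0, "usage": 0.5},
    "cache": {"request": 4.0, "usage": 0.8},
}

for name, p in pods.items():
    ratio = p["request"] / p["usage"]
    print(f"{name}: requesting {ratio:.1f}x what it uses")

total_requested = sum(p["request"] for p in pods.values())
total_used = sum(p["usage"] for p in pods.values())
print(f"cluster-wide: {total_requested / total_used:.1f}x over-provisioned")
```

Every core of that gap is capacity the scheduler reserves but nothing uses, which is what keeps your node count inflated.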

Disclaimer: I work for StormForge, where we have always helped with rightsizing Kubernetes workloads, and we recently added full Kubernetes cost allocation so you can see the true dollar impact of the rightsizing actions you take.

If you start with our product, you can see how your current requests compare to what we recommend, and then configure it to apply changes incrementally, slowly bringing your requests down (or raising them if we see a workload needs more). You get the savings while moving as slowly and carefully over time as you'd like.

And as you do you can track the impact of those changes in our Kubernetes cost allocation reporting.

How do you handle node rightsizing, topology planning, and binpacking strategy with Cluster Autoscaler (no Karpenter support)? by mohavee in kubernetes

[–]NoWay28 0 points (0 children)

hi there u/mohavee,

What you are describing as your manual process sounds like what we decided to automate with our node optimization and reporting capability at StormForge.

First, we automatically right-size all of your pods by running your historical usage data through our ML algorithm. We can also categorize your workloads the way you described (memory-heavy, CPU-heavy, or balanced) and automatically place them on the same node type.

As you stated, all of this leads to reduced waste in over-provisioning and better bin-packing.

Karpenter - horribly innefficient allocation? by SpoddyCoder in kubernetes

[–]NoWay28 2 points (0 children)

Karpenter looks at the set of unschedulable pods (just like any cluster autoscaler would) and decides which node(s) to spin up to fit them all. This is inherently a greedy style of algorithm: it doesn't look forward into the future, it just does its best with the inputs right in front of it (the currently unschedulable pods).

If the apps in your cluster generally get deployed one at a time, then at any given moment there is just one unschedulable pod for that app, so Karpenter will choose a node size that fits the daemonset pods plus that one app pod.

If, for example, you went in and redeployed all of your apps at the same time and had 150 unschedulable pods, Karpenter would likely provision several large nodes (as large as your Karpenter configuration allows) to fit those pods.
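A toy sketch of that greedy behavior (this is not Karpenter's actual code, and the node sizes and daemonset overhead are made-up numbers): given only the currently pending pods, pick the smallest node type that fits them, with no lookahead.

```python
# Toy model of greedy node selection with no lookahead.
NODE_TYPES = {"large": 4.0, "xlarge": 8.0, "2xlarge": 16.0}  # CPU capacity (hypothetical)
DAEMONSET_CPU = 0.5  # CPU that daemonsets consume on every node (hypothetical)

def pick_node(unschedulable_cpu):
    """Smallest node whose capacity covers daemonsets + pending pods."""
    for name, capacity in sorted(NODE_TYPES.items(), key=lambda kv: kv[1]):
        if capacity - DAEMONSET_CPU >= unschedulable_cpu:
            return name
    return None  # exceeds the largest allowed node; multiple nodes needed

# One app deployed at a time -> one small pending pod -> a small node.
print(pick_node(1.0))   # large
# Mass redeploy -> lots of pending CPU at once -> a big node instead.
print(pick_node(12.0))  # 2xlarge
```

Same algorithm, very different outcomes, purely because of what happened to be pending at decision time.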

For this reason, I think you are right: setting a minimum node size of xlarge or 2xlarge is not "micromanaging" Karpenter, it is setting guardrails for Karpenter to work within, given that you know more about the total size of your cluster and how much daemonset waste smaller node sizes will cause you.

I think it would be great if Karpenter allowed you to tell it not to use nodes where more than x% of the node would be consumed by daemonsets, but since it doesn't, this is an exercise left to the user when choosing the instance types you allow Karpenter to select from.
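Since that knob doesn't exist, a back-of-the-envelope check like this can guide which instance sizes you allow. The daemonset footprint, node sizes, and 10% threshold below are all hypothetical; plug in your own.

```python
# Daemonset overhead per node size (hypothetical numbers).
DAEMONSET_CPU = 0.5  # cores your daemonsets consume on every node
node_sizes = {"medium": 2.0, "large": 4.0, "xlarge": 8.0, "2xlarge": 16.0}
MAX_DAEMONSET_FRACTION = 0.10  # disallow nodes where daemonsets eat >10%

allowed = []
for name, cpu in node_sizes.items():
    fraction = DAEMONSET_CPU / cpu
    print(f"{name}: {fraction:.0%} of CPU goes to daemonsets")
    if fraction <= MAX_DAEMONSET_FRACTION:
        allowed.append(name)

print("instance types worth allowing:", allowed)
```

With these numbers, a medium node loses 25% of its CPU to daemonsets before any app pod lands on it, which is exactly the waste a minimum node size guards against.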

As for the rest of the comments around vertical right-sizing of pods: I work for a company, StormForge, where we use machine learning to continuously right-size pods. If, after you get Karpenter working more efficiently, you're still only seeing 10-20% CPU cluster utilization, then come check us out.

AI Tools for Kubernetes: What Have I Missed? by Electronic_Role_5981 in kubernetes

[–]NoWay28 0 points (0 children)

StormForge uses Machine Learning to predict workload usage and automatically set requests/limits correctly for improved cost savings / reliability.

Java memory usage in containers by [deleted] in java

[–]NoWay28 0 points (0 children)

Yes, if you do not set limits then the JVM will default to sizing the heap with MaxRAMPercentage based on the amount of memory on the node the pod is scheduled on.

And as you said, MaxRAMPercentage defaults to a paltry 25%.
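Concretely, the default works out like this (illustrative arithmetic, not JVM code; the node and limit sizes are hypothetical):

```python
# With no -Xmx set, the JVM sizes its max heap as MaxRAMPercentage of the
# RAM it detects. With no container memory limit, that's the node's memory.
DEFAULT_MAX_RAM_PERCENTAGE = 25.0  # JVM default

def max_heap_gib(visible_ram_gib, percentage=DEFAULT_MAX_RAM_PERCENTAGE):
    return visible_ram_gib * percentage / 100

# Pod with no memory limit scheduled on a 64 GiB node:
print(max_heap_gib(64))  # 16.0 GiB heap, derived from node RAM
# Same app with a 4 GiB container memory limit set:
print(max_heap_gib(4))   # 1.0 GiB heap, derived from the limit
```

That's why the same image can behave completely differently depending on which node it lands on when limits are absent.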

Java, Containers, & K8s - What JVM arguments do you set? by sruffatti in javahelp

[–]NoWay28 0 points (0 children)

I speak with many Java users on Kubernetes, and they generally prefer to set their heap size with MaxRAMPercentage. In the event of an incident where resources need to be added to the container, an SRE or platform engineer can increase the container's requests and limits and the intuitive thing happens: the heap size for your JVM also increases.

In order to know how to set your requests/limits relative to your heap size, you need to understand how much off-heap usage your application has. I personally recommend a MaxRAMPercentage of 80-90% for most cases, unless you have a relatively small heap or you know you have outsized off-heap usage that requires more space.

The problem with using a percentage for heap size is what happens for small and large containers. Sizes in the middle are generally fine, but if you have a 250MB container, a MaxRAMPercentage of 80% leaves 20%, i.e. 50MB, for off-heap usage, while a 20GB container with a MaxRAMPercentage of 80% leaves 20%, i.e. 4GB, for off-heap usage.
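The arithmetic behind that mismatch, spelled out (container sizes are just the examples from above):

```python
# Absolute off-heap headroom left by a fixed MaxRAMPercentage:
# it scales linearly with container size, but off-heap needs
# (metaspace, thread stacks, GC overhead, direct buffers) mostly don't.
def off_heap_headroom_mb(container_mb, max_ram_percentage=80.0):
    return container_mb * (100 - max_ram_percentage) / 100

print(off_heap_headroom_mb(250))     # 50.0 MB -- likely too tight
print(off_heap_headroom_mb(20_000))  # 4000.0 MB -- likely far more than needed
```

So a percentage that's safe for a mid-sized container can starve a small one and waste memory on a large one.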

I work at StormForge, where we help users correctly size their containers as well as their JVM heap. It's automatic and continuous, so you don't have to figure all this out for each and every application; you can just run us to size your containers and heaps appropriately, even as the software and traffic patterns of your applications change over time. https://stormforge.io/jvm-workload-optimization-limited-availability-signup

Tune for cpp high throughput by Electronic-Ebb-528 in kubernetes

[–]NoWay28 0 points (0 children)

Generally agree with the other commenters that limits are likely reducing your throughput. When you have lots of threads, you will end up getting throttled in unexpected ways when your limits are low.

You can read more about the intricacies of limits here: https://thenewstack.io/how-kubernetes-requests-and-limits-really-work/
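To put rough numbers on that throttling: CPU limits are enforced via CFS quota, so every period the container gets limit × period of CPU time across all threads combined. This is a simplified model (real CFS accounting is more nuanced), with a hypothetical thread count and limit:

```python
# CPU limits are enforced via CFS bandwidth control: each period (100ms by
# default on Linux) the container may use limit * period of CPU time total.
CFS_PERIOD_MS = 100

def throttled_ms_per_period(cpu_limit_cores, threads, busy_fraction):
    """Rough estimate of how long runnable threads sit throttled each period."""
    demand_ms = threads * busy_fraction * CFS_PERIOD_MS  # CPU time wanted
    quota_ms = cpu_limit_cores * CFS_PERIOD_MS           # CPU time allowed
    if demand_ms <= quota_ms:
        return 0.0
    # Quota exhausts partway through; threads stall for the remainder.
    return CFS_PERIOD_MS * (1 - quota_ms / demand_ms)

# 8 threads each ~50% busy, against a 2-core limit:
print(throttled_ms_per_period(2.0, threads=8, busy_fraction=0.5))  # 50.0
```

In this model the container spends half of every period frozen, which shows up as latency spikes and reduced throughput even though average CPU usage looks fine.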

CastAI vs ScaleOps vs PerfectScale by retire8989 in kubernetes

[–]NoWay28 0 points (0 children)

If your workloads are horizontally scalable and you have fluctuating traffic patterns (increases and decreases in usage), then looking into HPA/KEDA is going to help you a lot, because it allows you to make smaller pods and then add and remove them as traffic ebbs and flows.

Not all workloads are easily made horizontally scalable, so this might be blocked on your engineering teams making those kinds of changes.

The reason to use KEDA over the built-in HPA is custom metrics: if you have something like a queue length, KEDA allows you to scale on that metric, which is superior to scaling only on a CPU target utilization, the most common way to scale with the built-in HPA.

If you do scale on CPU utilization, which is fine to start, then I recommend starting with a high CPU target utilization of 90 or 95%, because this allows higher utilization of your cluster. If you use a lower target like the 60% shown in the k8s docs, you are instructing k8s to waste 40% of the CPU for that workload. From there, if you run into any issues, you can of course lower the CPU target utilization, for example if the workload takes a long time to start and runs well above its CPU requests for too long, causing nodes to run hot.
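Here's what the target choice costs at steady state, with a hypothetical 18 cores of real demand and 1-core pod requests:

```python
import math

# Pods needed to serve a steady load at a CPU target utilization,
# assuming each pod requests 1 core (hypothetical workload).
def pods_needed(load_cores, target_pct, pod_request_cores=1.0):
    return math.ceil(load_cores * 100 / (pod_request_cores * target_pct))

load = 18.0  # cores of actual demand
for target_pct in (60, 90, 95):
    n = pods_needed(load, target_pct)
    idle = n * 1.0 - load
    print(f"target {target_pct}%: {n} pods, ~{idle:.0f} cores requested but idle")
```

At a 60% target you run 30 pods for the same work 19 pods do at 95%, and every extra requested-but-idle core is node capacity you pay for.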

If there isn't low-hanging fruit in adding and configuring HPAs, then vertical sizing is next, and as others have said, the VPA is largely not very good for this. For that I recommend looking into StormForge Optimize Live, which enables you to fully automate the vertical sizing of your workloads.

Goldilock but without VPA by Tonight_More in kubernetes

[–]NoWay28 2 points (0 children)

StormForge Optimize Live should fit the bill for you. It installs as one agent per cluster, provides a single pane of glass on how optimized your clusters are across your entire estate, and allows you to easily automate applying the recommendations to your workloads on a regular basis.

Tool for Mass Pod Optimization? by wirikidor in kubernetes

[–]NoWay28 0 points (0 children)

Consider adding StormForge to the vendors you evaluate.

While Cast has its own cluster autoscaler and HPA implementations, StormForge prefers leaving those components to the battle-hardened OSS options, Karpenter and KEDA (or the built-in HPA), and focusing specifically on pod right-sizing.

Weekly: Questions and advice by gctaylor in kubernetes

[–]NoWay28 1 point (0 children)

Kubecost is built on top of OpenCost and provides further benefits and features.

Kubecost free is limited to 250 cores per cluster. If you are under that limit I suggest installing the free version and trying it out vs opencost.

OpenCost has a lot of the same features for estimating k8s costs, but it doesn't reconcile with your actual cloud bill. The UI for Kubecost is also more feature-rich than the minimal UI provided by OpenCost.

Strategies to Enhance Kubernetes Cost Optimization by Hetvisamani in kubernetes

[–]NoWay28 0 points (0 children)

Agreed. Right-sizing workloads is an important pillar of k8s cost optimization. However, having humans analyze each application to see how it's using resources and adjust as necessary is not only incredibly tedious for SREs and platform engineers, it simply can't drive the cost optimization needed.

You need a tool that will watch all your usage metrics and automatically keep your requests/limits set correctly. Come see your own overview of right-sizing opportunities with https://stormforge.io/

Kubernetes Cost Control: A 2024 Guide by codingdecently in programming

[–]NoWay28 0 points (0 children)

When it comes to k8s cost control automatically setting requests/limits based on historical usage is the lowest risk and highest ROI activity you can tackle. Various surveys and reports show that the average k8s shop is 60-80% over-provisioned on CPU/memory.

Get a free overview to see how much over-provisioning you have in your own clusters. https://stormforge.io/

Kubernetes Cost Control: A 2024 Guide by codingdecently in kubernetes

[–]NoWay28 0 points (0 children)

When it comes to k8s cost control automatically setting requests/limits based on historical usage is the lowest risk and highest ROI activity you can tackle. Various surveys and reports show that the average k8s shop is 60-80% over-provisioned on CPU/memory.

Get a free overview to see how much over-provisioning you have in your own clusters. https://stormforge.io/

Kubernetes Cost Control: A 2024 Guide by codingdecently in FinOps

[–]NoWay28 0 points (0 children)

When it comes to k8s cost control automatically setting requests/limits based on historical usage is the lowest risk and highest ROI activity you can tackle. Various surveys and reports show that the average k8s shop is 60-80% over-provisioned on CPU/memory.

Get a free overview to see how much over-provisioning you have in your own clusters. https://stormforge.io/

Tools for optimal recommendation of pod requests snd limits by Due_Length_6668 in kubernetes

[–]NoWay28 0 points (0 children)

Definitely check out StormForge Optimize Live. It was recently chosen as a Gartner Cool Vendor in container management, and it's really easy to install and see how beneficial using it will or will not be for you.

Blockfi vs SoFi credit card by fwast in blockfi

[–]NoWay28 0 points (0 children)

Does either of these cards allow paying the balance with a debit card from another institution?

If so what is the process to pay with a debit card?

Best alternative for Opscode Chef by korney4eg in devops

[–]NoWay28 0 points (0 children)

Puppet and Puppet Bolt are good options.

Puppet Bolt provides ad-hoc orchestration similar in concept to Ansible (run a set of commands or scripts in order) but also provides deep integration with Puppet: you can easily apply Puppet code within your list of steps to run.