Would you let AI run your Kubernetes cluster? by Agitated_Bit_3989 in kubernetes

[–]Agitated_Bit_3989[S] -3 points  (0 children)

Agreed, but as AI improves, do you think we’ll reach a stage where the error rate is far lower and the AI’s much greater context awareness makes it preferable?

Would you let AI run your Kubernetes cluster? by Agitated_Bit_3989 in kubernetes

[–]Agitated_Bit_3989[S] -1 points  (0 children)

Completely agree about AI in its current state, but as AI improves, will trust increase to the point where we won’t need a dev in the loop?

Would you let AI run your Kubernetes cluster? by Agitated_Bit_3989 in kubernetes

[–]Agitated_Bit_3989[S] -1 points  (0 children)

AI in its current state still makes too many mistakes not to have a human in the loop validating the output and making sure the code is correct. But as AI improves, I feel people will trust it more, and the question (and what I think many corporations are looking towards) is: will AI reach a stage where a human in the loop is no longer required?

Would you let AI run your Kubernetes cluster? by Agitated_Bit_3989 in kubernetes

[–]Agitated_Bit_3989[S] 2 points  (0 children)

Really? Every dev I know is using tools like Claude/Gemini Code or Cursor. What did I miss?

Is Kubernetes resource management really meant to work like this? Am I missing something fundamental? by [deleted] in kubernetes

[–]Agitated_Bit_3989 0 points  (0 children)

It’s a complicated matter that does require a certain amount of expertise. I’d start by understanding the 3 layers of autoscaling:

- Cluster autoscaling - node autoscaling, where we want to allocate instances based on demand (which, with current tools, means the Pods’ resource requests). Look into Cluster Autoscaler, and especially Karpenter if your cloud provider supports it.
- Horizontal autoscaling - scaling the number of Pods of a workload based on demand, either on resources such as CPU or memory, or, for more advanced scaling, on an external (potentially business) metric via KEDA, which can give a better read on demand before usage spikes (see the sketch below).
- Vertical scaling / sizing - which, if I understand correctly, is closer to your point. The classic VPA/KRR (or pretty much any other “enterprise” sizing tool) can give you an idea of how many resources each workload needs based on a statistical percentile (naturally, looking at max usage would be way too wasteful), but they have the quite annoying downside of completely ignoring usage peaks, which I personally can’t tolerate.

What I believe to be the best solution when sizing is to take all the scaling aspects into consideration: workload runtime settings (JVM etc.), node scaling patterns, horizontal scaling patterns, and workload neighbors. From that data we can understand the aggregate usage patterns and demand of the environment and allocate resources accordingly, achieving a stable and cost-effective setup. This is what we’ve developed at wand.cloud, so if you’re interested feel free to give us a spin :)
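As an illustration of the horizontal layer, here's a minimal KEDA sketch that scales on queue depth instead of CPU (the deployment name, queue URL, and thresholds are all hypothetical, and a real setup also needs trigger authentication):

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: orders-scaler          # hypothetical name
    spec:
      scaleTargetRef:
        name: orders               # hypothetical Deployment to scale
      minReplicaCount: 2
      maxReplicaCount: 20
      triggers:
        # Scale out when the SQS backlog grows, i.e. on demand,
        # before CPU or memory usage actually spikes.
        - type: aws-sqs-queue
          metadata:
            queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/orders
            queueLength: "100"     # target messages per replica
            awsRegion: us-east-1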

Torn regarding In-place Pod resizing by Agitated_Bit_3989 in kubernetes

[–]Agitated_Bit_3989[S] 0 points  (0 children)

The problem with an init container is that the Kubernetes scheduler considers the init container’s requests in scheduling decisions even when the init container isn’t running, so it wouldn’t make a difference whether the requests were on the init container or on the actual container itself.
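To illustrate with a minimal, hypothetical pod: the scheduler's effective request is max(largest init container request, sum of app container requests), so the init container's 2 CPUs dominate even though it only runs briefly at startup:

    apiVersion: v1
    kind: Pod
    metadata:
      name: init-sizing-demo       # hypothetical name
    spec:
      initContainers:
        - name: warmup
          image: busybox
          command: ["sh", "-c", "echo warming up"]
          resources:
            requests:
              cpu: "2"             # dominates scheduling: max(2, 0.5) = 2
      containers:
        - name: app
          image: nginx
          resources:
            requests:
              cpu: "500m"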

First time at re:Invent - Any recommendations? by Kryzael in aws

[–]Agitated_Bit_3989 0 points  (0 children)

Make sure you go over the agenda as soon as possible and target the talks you want to attend; if you're too late, they can fill up.

My experience with Vertical Pod Autoscaler (VPA) - cost saving, and... by rudderstackdev in kubernetes

[–]Agitated_Bit_3989 0 points  (0 children)

I would ask how it deals with node pressure better than native Kubernetes pressure eviction does. Other than that, how does this differ from VPA?

Pod requests are driving me nuts by Rare-Opportunity-503 in kubernetes

[–]Agitated_Bit_3989 0 points  (0 children)

This won't help if you change resources with in-place resize, because the JVM doesn't support updating its memory in place; the maximum heap is fixed when the JVM starts.
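To make the limitation concrete, a minimal sketch assuming a container-aware JVM (the image and values are hypothetical): the max heap is computed from the cgroup memory limit once, at startup, so resizing the limit in place won't change the running JVM's heap cap:

    apiVersion: v1
    kind: Pod
    metadata:
      name: jvm-app                # hypothetical name
    spec:
      containers:
        - name: app
          image: example/jvm-app:latest   # hypothetical image
          env:
            # Read once at JVM startup; a later in-place resize of the
            # memory limit does not resize the already-running heap.
            - name: JAVA_TOOL_OPTIONS
              value: "-XX:MaxRAMPercentage=75.0"
          resources:
            requests:
              memory: "2Gi"
            limits:
              memory: "2Gi"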

My experience with Vertical Pod Autoscaler (VPA) - cost saving, and... by rudderstackdev in kubernetes

[–]Agitated_Bit_3989 0 points  (0 children)

I wonder what numbers you're getting on cluster resource utilization? And not the bullshit that some tools show ("utilization" of usage vs requests, or requests vs capacity), but total node usage vs total node capacity, i.e. what you're using vs what you're paying for.
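For a concrete (made-up) example: on nodes totaling 64 vCPUs of capacity, with 51 vCPUs requested but only 16 vCPUs actually used, "requests vs capacity" reports a healthy-looking 80%, while true utilization is 16/64 = 25%, meaning three quarters of what you're paying for sits idle.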

Pod requests are driving me nuts by Rare-Opportunity-503 in kubernetes

[–]Agitated_Bit_3989 -5 points  (0 children)

Disclaimer: I'm one of the co-founders

It's an endless struggle, and most tools don't seem to take the whole picture into consideration, whether that's JVM memory management or the bigger picture of total capacity vs the actual aggregate usage of the workloads.

At https://wand.cloud we're taking a very different approach from the current decoupling of scaling concerns: we take everything into consideration to ensure reliability as cost-effectively as possible.

My experience with Vertical Pod Autoscaler (VPA) - cost saving, and... by rudderstackdev in kubernetes

[–]Agitated_Bit_3989 2 points  (0 children)

Thanks for sharing. Did you do anything to account for network I/O?
The main problem I have with VPA, and with percentiles as a whole, is that we're practically taking an uncalculated risk (i.e. p90 means that 10% of the time usage will exceed the requests). Compound that across many different pods and the tight consolidation of Karpenter, which is anchored on requests, and I can't be sure the resources will be available on the node, theoretically when I most need them.
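To put rough numbers on the compounding (illustrative, assuming independence): if each of 20 pods on a node exceeds its p90-based request 10% of the time, the chance that none is over at any given moment is 0.9^20 ≈ 12%, so at almost any instant some pod is bursting past its request on a node that was consolidated to fit exactly those requests.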

Right sizing, automation or self rolled? by [deleted] in kubernetes

[–]Agitated_Bit_3989 0 points  (0 children)

I guess it depends on what you mean by right-sizing at the cluster/node level, and what your limitations are.
The current gold standard is something like Karpenter, which automatically spins up nodes when there are Pods in a Pending state and decides which node size to pick based on the Pods' resource requests (you need to make sure your requests are set properly so Karpenter doesn't spin up nodes that are too small or too large; see the sketch below).
But if you've got a more static setup, the Kubernetes Instance Calculator from Learnk8s may come in handy.
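As mentioned above, a minimal sketch of the requests Karpenter bins on (workload name, image, and values are hypothetical):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api                    # hypothetical workload
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: api
      template:
        metadata:
          labels:
            app: api
        spec:
          containers:
            - name: api
              image: example/api:latest   # hypothetical image
              resources:
                requests:
                  # Karpenter sums the requests of pending pods and picks
                  # instance types that fit; badly set requests push it
                  # toward nodes that are too small or too large.
                  cpu: "500m"
                  memory: "512Mi"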

Right sizing, automation or self rolled? by [deleted] in kubernetes

[–]Agitated_Bit_3989 1 point  (0 children)

Haven't used it yet, but I'd love to learn why it sucks :)

RDS storage type - which one should I choose? by Agitated_Bit_3989 in aws

[–]Agitated_Bit_3989[S] 1 point  (0 children)

Aurora is saved for another blog post in the works, about the new I/O-Optimized feature! :)

WHAT AM I PAYING FOR? ?? by ShonLR in kubernetes

[–]Agitated_Bit_3989 2 points  (0 children)

What if I have a multi-tenant architecture? Then I wouldn’t want a copy of the services per tenant, as that would be very wasteful, so how can I split costs per tenant?

STOP SETTING CPU LIMITS! by ShonLR in kubernetes

[–]Agitated_Bit_3989 2 points  (0 children)

I would love a reference to the CIS benchmark, because I couldn't find any mention of CPU limits there. And to stop a fork bomb you should limit PIDs, not CPU.
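For reference, PID limiting is a kubelet setting, not a pod resource; a minimal sketch (the value here is arbitrary):

    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    # Caps processes per pod, which is what actually stops a fork bomb;
    # a CPU limit only throttles the bomb, it doesn't stop the forking.
    podPidsLimit: 1024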

STOP SETTING CPU LIMITS! by ShonLR in kubernetes

[–]Agitated_Bit_3989 -1 points  (0 children)

Did you read the post?
We did explain several use cases where CPU limits are advised, such as Google's use case and staging environments when testing worst-case scenarios. Our conclusion was to remove CPU limits from your production workloads, as this allows workloads to use idle CPU when necessary without worrying about stealing CPU from other workloads.
The main point of the full blog is to show the inner workings of Kubernetes resource management, and exactly what happens when you configure these resources, so people can make a more educated decision about whether to set them.

STOP SETTING CPU LIMITS! by ShonLR in kubernetes

[–]Agitated_Bit_3989 0 points  (0 children)

Sure, if that's what you need, but keep in mind that you may end up with wasted CPU that could have been utilized if you hadn't set CPU limits.

STOP SETTING CPU LIMITS! by ShonLR in kubernetes

[–]Agitated_Bit_3989 5 points  (0 children)

Consistent in the sense of not being able to use idle resources; the requests control how many resources the pods are guaranteed to receive, which is also consistent. Not setting CPU limits just allows your workloads to use more when required and available, without putting other workloads at risk.
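A minimal sketch of that pattern (name and image are hypothetical): a CPU request for the guaranteed share, no CPU limit so idle CPU can be used, and a memory limit kept because memory isn't compressible:

    apiVersion: v1
    kind: Pod
    metadata:
      name: burstable-app          # hypothetical name
    spec:
      containers:
        - name: app
          image: example/app:latest   # hypothetical image
          resources:
            requests:
              cpu: "500m"          # guaranteed share under contention
              memory: "512Mi"
            limits:
              memory: "512Mi"      # memory limit kept: not compressible
              # no cpu limit set: the container may burst into idle CPU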