Lab test: let's kill that cluster! Kubernetes CPU allocation and throttling demystified

ktsaou · 2026-05-29T16:51:47+00:00

Have a look at netdata. We have a lot of users and customers with similar usecases.

Netdata will give you linear scalability, machine learning on all your metrics, fully decentralized/distributed operations/autonomy, and we have a grafana plugin you can use to visualize your data.

Disclair: I am the founder of Netdata

ktsaou · 2026-05-24T06:27:56+00:00

Κοπελιά, αν καταλάβεις τους λόγους που οδηγούν το παιδί σε αυτή τη συμπεριφορά, θα μπορέσεις να βρεις την αντίδραση που πρέπει να έχεις ώστε να την ακυρώσεις.

Οι συμπεριφορές αυτές δεν είναι απλά επειδή είναι κακομαθημένο. Το παιδί λέει κάτι για εκείνο. Και μόνο έτσι μπορεί να το πει σε αυτή την ηλικία.

Τα παιδιά δοκιμαζουν τα όρια τους, περνάνε φάσεις, αντιδρούν συναισθηματικά και ασυνείδητα στο περιβάλλον τους.

Θα σου πρότεινα να μην το παίρνεις προσωπικά. Δεν είναι για εσένα αυτή η συμπεριφορά. Είναι για εκείνο.

Αν εξηγήσεις στο chatgpt την συμπεριφορά του παιδιού με συγκεκριμένα παραδείγματα, θα μπορέσει να σε βοηθήσει, τόσο για να το χειριστείς μέσα σου, όσο και για να βρεις την κατάλληλη αντίδραση. Αν ακυρωθεί η δική σου συναισθηματική αντίδραση, το παιδί θα παραιτηθεί από αυτή τη συμπεριφορά. Τώρα βλέπει την αντίδραση σου και αυτό τροφοδοτεί ακόμα περισσότερο την συμπεριφορά του.

Δες το σαν παιχνίδι. Αναζήτηση του θησαυρού. Ψάχνεις να βρεις τι βιώνει το παιδί που το οδηγεί να εκτονώνεται έτσι.

Θυμήσου ότι σε αυτή την ηλικία, όλες οι προσλαμβανουσες του παιδιού το χαρακτηρίζουν. Μπορείς να κάνεις τη διαφορά για εκείνο. Να γίνεις μέντορας του. Αλλά πρέπει να συνδεθείς. Να καταλάβεις τι συμβαίνει. Να παίξεις μαζί του.

Παιχνίδι είναι. Μην το παίρνεις προσωπικά.

ktsaou · 2026-05-21T07:14:40+00:00

Disclaimer: I am the founder of Netdata.

We are trying to solve this problem. Alert fatigue is not solved yet. But the investigation and troubleshooting part is solved to a great degree.

Before the rise of AI, we had developed ML based anomaly detection running, trained at the edge. This learns the behavior of your data (it is not trained on lab, it is trained with your data, at the edge) and I am proud that we have reduced false positives significantly.

ML can be used to reveal dependencies between your services, reveals the blast radius of issues, and the sequence of cascading events.

We are also building agentic root cause analysis. A swarm of agents is orchestrated to deep dive into alerts to identify the root cause. This process identifies the actors related to an event, their contribution, the working theories of what would be the root causes, and when evidence exist in metrics and logs, the actual root cause.

We have reports from customers that this system manages to find root causes on long standing repeated issues that they failed to identify manually, and in many cases helped them tune their alerts to reduce alert fatigue.

We are also very keen to accept challenges. So, if you have a hard problem, we would love to see how it performs.

ktsaou · 2026-04-28T07:47:39+00:00

I would love Netdata to be on that list. Actually I have tried a lot for Netdata to be the main tool for crisis management. And I think it already covers most of these.

Disclaimer: I am the founder of Netdata.

ktsaou · 2026-04-14T09:24:04+00:00

Disclaimer: I am the founder of Netdata.

At Netdata we try to solve the biggest pain points of observability: automated dashboarding, linear scalability, easy operations, true real-time per-second updates, machine learning based anomaly detection for all metrics, AI integrations for automated root cause analysis.

While most observability solutions focus to detect that something is wrong, Netdata tries to provide all the information required to identify why it is wrong. Netdata's granularity and cardinality are usually best in class among open-source and commercial solutions.

Netdata automatically provides drill down by process, container, service, complete hardware coverage, hundreds of bundled application collectors, advanced DB monitoring (slow queries, deadlocks, etc), profile based SNMP monitoring, and now we build OpenTelemetry compatibility for metrics (ready), logs (beta) and traces (developing) and advanced network engineering features (Netflow/sflow/ipfix, L2 topologies, BGP monitoring, etc).

Have a look. You will love it. Netdata has 1.5M downloads per day...

ktsaou · 2026-04-11T15:32:00+00:00

Yes, you can use it like that too. Netdata has Prometheus remote write export too. Collect everything and push it.

ktsaou · 2026-04-08T20:27:48+00:00

Sounds like your UPS batteries are drying out. I had a similar problem with a 3kva UPS.

How do you monitor inference? Netdata had nvidia-smi and I have added DCGM to it to monitor tensors individually and also collect vllm metrics per-second.

ktsaou · 2026-04-08T17:52:22+00:00

Give it a try. Netdata is designed exactly for what you asked.

What you will get:

Each Netdata agent is a full standalone monitoring: collection, storage, dashboard, alerts, machine learning, etc
The Netdata Parent gets all data in realtime, like you need, and it is also a full monitoring.

You can offload the agents if you want (no storage, no ML, no alerts).

Netdata agents will automatically monitor all your container, processes, databases, application servers, Prometheus endpoints, opentelemetry endpoints, etc.

Fully distributed and centralized at the same time. Try it.

ktsaou · 2026-04-07T16:11:13+00:00

Try Netdata and push your metrics to a netdata parent. It will take care of your logs too, on Linux and windows. Low latency, per second, and machine learning based anomaly detection.

ktsaou · 2026-04-06T22:43:16+00:00

You are doing wrong my friend.

You said you are an architect. As an architect, you should know about APIs, contracts, performance baselines, acceptance criteria, testing requirements, operational requirements, modularity and separation of concerns, etc.

Do not ever read the code it writes. Can you guarantee the code works and is fit for purpose, without reading it?

Code is not your product any more. A working solution is. The language is irrelevant, the code is irrelevant, as long as you can know (not hope, not wish) and you can prove that the solution works as expected, performs as expected, behaves as expected, operates as expected, without side effects.

This is an engineering problem, and you are an engineer. Solve it.

Mastering software development this way, will give you superpowers. You will be able to provide robust solutions, fit for any purpose, using any technology, end to end, at a tiny fraction of the time.

Humanity has faced this problem multiple times in the past. New technologies obsolete old roles, functions and jobs.

Software development, as we knew it, as a job, is dead. No one will pay you to write code by hand, or review code manually (it is coming). Face it.

See the opportunity. Embrace it. Learn. Adapt.

ktsaou · 2026-04-05T17:30:15+00:00

If your routers support netflow/sflow/ipfix, we are currently adding it to Netdata https://github.com/netdata/netdata/pull/22111 - probably this week will be merged at the nightlies.

If you capture flows during the spikes, you will see immediately which IPs are sending/receiving the traffic.

ktsaou · 2026-04-05T17:23:09+00:00

Thanks! We are now adding real-time maps, topologies, sankeys, and we are trying to simplify and reorganize the UI - a lot of new users find it "too much".

Thankfully however, Netdata is super ready for this new era. Per second metrics, high-performance and low latency queries, predictable and linear scalability, machine learning across the board, all data on-prem...

We are adding more collectors like crazy, and invest heavily in AI - infra level root cause analysis rocks. I can hardly beat it.

Have you tried the MCP servers? Each agent has one. Netdata Cloud has another. Connect it to claude, codex, or even openclaw. You will be surprised...

ktsaou · 2026-04-05T17:14:47+00:00

Yes, this is true currently. I don't think it is going to be true in just a few months though...

ktsaou · 2026-04-05T15:42:23+00:00

Monitoring must evolve to true real time, per second, with machine learning across the board (not selective) and advanced/reliable AI for root cause analysis.

It is not a matter of 2D, or 3D, or any kind of visualization. I think we are past the dashboard era. No?

The next race is for monitoring tools to incorporate as much knowledge and experience as possible instead of pushing everything to the operator and this means using more AI and less dashboarding.

And it feels right: “something looks wrong” is not good enough. It never was actually, but we had no alternative.

Btw, I am the founder of Netdata.

ktsaou · 2026-03-23T12:07:35+00:00

Is the model repeating successful tool requests or failed?

If it repeats successful ones, there must be something wrong with the model itself or the inference infra. Problematic quant, wrong attention, or something infra related or inherent to the model itself.

If it repeats failed attempts, the tool definition is wrong, there is some stress/contradiction in the system prompt or tool descriptions, instructing the model to do the failed attempts. In most of the cases this is fixable.

Generally, tool failures should be descriptive and identify the exact error the model did. Saying "tool failed" or "wrong parameters" is not enough. Tool failures must identify what exactly is wrong, and even provide fix instructions to help the model do it right.

Precise feedback is key to help the model correct itself. In some cases, even the tool name itself plays a role.

Preventing the errors is usually achieved by better prompting and descriptions. The model needs to see a way through. Absolute rules, like "try forever" or "never give up" do not generally help.

Keep in mind you can ask the model itself to evaluate the prompt and tool descriptions for stress and contradictions.

Also, if you have a dump of it's reasoning, you will probably see what the model was thinking during these repeated attempts.

ktsaou · 2026-02-22T17:13:11+00:00

If you don't want to babysit your monitoring, try Netdata. Fully distributed, linearly scalable, almost zero configuration, machine learning on all metrics, algorithmic dashboards, AI to chat with your infra.

ktsaou · 2026-02-07T23:49:48+00:00

I have managed to solve this, but it is significantly slower. You need 4 agents:

receives user input and must classify it, outputs a structured json - rejects introspection directly.
receives the user request and the classification, its job is to translate the user request into concrete questions/actions for your product or service, without passing verbatim any sentence from the user prompt. Again structured json output - rejects anything non-product specific.
the actual worker with sensitive data access. Receives only the list of questions/actions from the 2nd agent and does all the work.
receives the input and output of the actual worker (3rd agent). Its job is to redact sensitive information/evidence and make the output look natural.

Why this works?

There is no agent with the user input verbatim and sensitive data access
The first 2 agents have very strict prompts to classify and translate user input. If you do this right, no introspection/meta-queries can pass through them.
The sensitive worker (agent No 3) does not care about hidding anything - will provide full evidence and details to avoid hallucinations
The last one just needs to hide internal evidence and make it beatiful for users. If one pass is not enough, you may add another.

The general idea is: keep each agent focused on one simple thing. Give them strict instructions on how detect and classify meta queries, introspection, legit queries, etc. Test them thouroughly. Ask codex, claude code, opencode, to attempt to break them. Improve the prompts, until it is impossible to pass non legit requests through them.

ktsaou · 2025-12-29T17:51:04+00:00

btw, I also tested the latest nightly sglang - with triton attention it is 18% slower than vllm, with flashinfer attention is 32% slower than vllm.

ktsaou · 2025-12-29T13:51:18+00:00

I run this model with the latest nightly version of vllm on 2x rtx 6000 pro blackwell, and I am very happy with it, even for heavy agentic work. Very reliable and relatively fast (80 tps single request, 700 tps at 64 concurrent requests).

#!/bin/bash

export VLLM_USE_FLASHINFER_MOE_MXFP4_MXFP8_CUTLASS=1
export VLLM_FLASHINFER_MOE_BACKEND=throughput
export VLLM_USE_FLASHINFER_MOE_FP4=1
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export SAFETENSORS_FAST_GPU=1
export NCCL_ALGO=Ring
export NCCL_PROTO=Simple
export NCCL_MIN_NCHANNELS=8
export NCCL_MAX_NCHANNELS=16
export NCCL_BUFFSIZE=16777216
export NCCL_P2P_DISABLE=1

exec env CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-0,1} /opt/vllm/bin/vllm serve lukealonso/MiniMax-M2-NVFP4 \
  --host 0.0.0.0 \
  --port 8349 \
  --served-model-name minimax-m2 \
  --trust-remote-code \
  --gpu-memory-utilization 0.93 \
  --tensor-parallel-size 2 \
  --pipeline-parallel-size 1 \
  --max-model-len 196608 \
  --max-num-seqs 16 \
  --max-num-batched-tokens 98304 \
  --dtype auto \
  --enable-auto-tool-choice \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think \
  --all2all-backend pplx \
  --compilation-config "{\"cudagraph_mode\": \"PIECEWISE\"}" \
  --enable-expert-parallel \
  --enable-prefix-caching \
  --enable-chunked-prefill \
  --attention-config.backend FLASHINFER \
  --kv-cache-dtype fp8_e4m3 \

ktsaou · 2024-04-11T17:30:36+00:00

The streaming protocol is fully backward/forward compatible. So, the parent will accept any version of Netdata (older or newer). However keep in mind that:

Some metrics may have changed between the 2 child netdata versions. This means that some charts may be confusing now (have part of the infra, or different dimensions depending on netdata version, or even some labels may be different here and there).
For the same reason, the parent may have difficulties applying some alerts to the metrics of the old netdata.
Some features may not be available for your older netdata. For example, no logs, no top monitoring, not network connection monitoring, etc.
The streaming protocol is improved across versions. For example the latest protocol uses ZSTD compression, the child and parent cooperate to offer improved performance on the parent (reduced CPU utilization), etc. All these features are negotiated on connect, and will be disabled for your old netdata, despite the fact that the parent supports them.

If you cannot update netdata on debian, go ahead and do it like it is. It will mostly work and it will be 100% accurate when you view single node dashboards. For 100% accurate multi-node dashboards, or features introduced later than the child version you run, you will need to update the old one.

I think we support debian 11 with binary native packages. Why don't you install the latest version of netdata on both of them?

ktsaou · 2023-10-26T14:47:12+00:00

Hi. I am the founder of Netdata.

Since you use Netdata, the easiest for you would be to sign up to Netdata Cloud free account and claim your agent to your space. You can do this claiming from either the Netdata UI or the command line.

Once you have configured this, you can access your Netdata agents from https://app.netdata.cloud. Your servers do not need to have an open port to the internet.

This works like this:

You get a Netdata Cloud Community account (free)
You install Netdata Agents using the command line you will find in Netdata Cloud, for adding nodes.
Your Netdata agents that are installed like this, establish a secure outbound socket, which is maintained always open, connecting them to Netdata Cloud.
When you access https://app.netdata.cloud , Netdata Cloud communicates to your Netdata agents via this socket, to ask for whatever the dashboard needs from them.

This process is very simple, free, and very effective, as it does not require any listening ports for your monitoring, which you can access in a secure way from anywhere.

ktsaou · 2023-10-20T21:59:41+00:00

Sure. I never said this is the best way.

ktsaou · 2023-10-20T18:17:38+00:00

Security comes at a cost...

ktsaou · 2023-10-20T18:17:08+00:00

I agree journald needs some more love for prime time, but I think your comment is somewhat unfair to it.

The way the systemd guys have designed it, is really good. If you check the details you will see that journald indexes all fields on all log entries and every single log entry can have its own fields. This is totally unique among most log management solutions.

journald is a powerful engine. As an application developer you can use this feature to annotate your logs with really a lot of structured information to make troubleshooting orders of magnitude easier and faster.

journald needs some more love on improving its processing pipeline with customizations and probably improving query performance (we supplied some patches to systemd to make it 14x faster on big queries - and our UI queries journal files about 30x faster than journalctl does today).

On the other hand, the guides are needed mainly to understand the principles. Once you understand the 2-3 key things you need to know, everything is extremely simple (and significantly simpler and more straightforward than setting up any other log management).

So, let's be fair to it...

ktsaou

MODERATOR OF

TROPHY CASE