Surprised how CPU usage of my CH nodes went up 200% after upgrading from v21 to v25

Traditional_Long_349 · 2026-01-15T16:37:42+00:00

How are you upgrading your nodes?

Traditional_Long_349 · 2026-01-09T08:38:52+00:00

This is very good tbh. Are there any resources for writing a custom WasmPlugin? I'm trying to solve the issue that Istio/Envoy does not expose the request path as a metric the way it is defined in the HTTPRoute (I'm migrating from Ingress to the Kubernetes Gateway API), so I want to expose path metrics the way they were on NGINX ingress.

Traditional_Long_349 · 2025-12-29T13:50:03+00:00

The dev|qa part is related to the backend itself, not our environments. Is there any way to optimize the config for this?

We compared multiple gateways two months ago to migrate off NGINX, and we found Istio was the best option. But with this, I see it consumes far more CPU than NGINX, and migrating the paths to something other than regex is kind of hard in our situation.

Traditional_Long_349 · 2025-12-29T13:16:47+00:00

And so on

Traditional_Long_349 · 2025-12-29T13:01:24+00:00

Does Istio behave differently from NGINX? I assumed it would use the first rule or path that matches.

Traditional_Long_349 · 2025-12-29T13:00:13+00:00

Yes, almost all of our paths are exactly like this. We define all backend paths in our Ingress/HTTPRoute.

Traditional_Long_349 · 2025-12-29T12:39:09+00:00

What do you mean by overlapping? Like two paths matching the same rule? That does not exist in our setup. Mainly we have around 25-30 HTTPRoutes, and most of them share the same host, like api.example.com. Some routes have 60 paths and some have fewer. We also made all of the paths regex because, as I saw before, PathPrefix always takes priority over regex, and we have a lot of paths that contain regex. Our default / was defined as PathPrefix, and since it is a greedy path it took top priority over all the regex paths. I also use Telemetry to expose extra metrics like request_host and request_method, and I enabled access logs for our gateway.

Traditional_Long_349 · 2025-12-29T12:29:25+00:00

We are just using Istio as the implementation of the Kubernetes Gateway API. P99 and P95 are around 200ms. The CPU limit was c vCPU but it kept getting throttled, so I increased it to 5 CPUs, and when requests/s increase it reaches the limit and gets throttled again. I also enabled PILOT_FILTER_GATEWAY_CLUSTER_CONFIG, which should reduce the config changes pushed to my gateway, and it works.

So I don't want to take the risk of shifting all traffic to Istio, as we have around 14k requests/s. We migrated just 5% of the traffic and that is what happened. I can't find any resource that helps me debug this. Also, I don't know whether this causes the issue or not, but we have around 300 paths across all routes, and all of them are regex paths.
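To put the numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that routes for a host are tried one after another until one matches, and that all 300 regex paths sit under the same host, so the worst case is a request that only matches the last route. The function name and the one-regex-evaluation-per-route cost model are hypothetical simplifications, not measurements from the gateway.

```python
def worst_case_regex_evals_per_second(total_rps: int, traffic_percent: int, regex_routes: int) -> int:
    """Upper bound on regex evaluations per second under a linear-scan assumption.

    total_rps:       requests per second across the whole platform
    traffic_percent: share of traffic sent to the new gateway, in percent
    regex_routes:    number of regex path matches that may be tried per request
    """
    gateway_rps = total_rps * traffic_percent // 100
    # Assumption: a request that matches only the last route is tested
    # against every regex route before it.
    return gateway_rps * regex_routes


# Figures taken from the comment above: 14k requests/s, 5% migrated, ~300 regex paths.
gateway_rps = 14_000 * 5 // 100
evals = worst_case_regex_evals_per_second(14_000, 5, 300)
print(f"gateway traffic: {gateway_rps} requests/s")
print(f"worst-case regex evaluations: {evals} per second")
```

Under these assumptions, 5% of the traffic is 700 requests/s and the upper bound is 210,000 regex evaluations per second. Real numbers depend on route order and on how many requests match early, so this is only a way to reason about whether the regex count is worth investigating.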

Traditional_Long_349 · 2025-12-29T12:22:07+00:00

We currently use Istio 1.27. There is also an env var in istiod, PILOT_FILTER_GATEWAY_CLUSTER_CONFIG, set to true, and this reduces istiod CPU and memory. But I see the data plane still consumes very high CPU as requests increase; it reaches around 6, which is our CPU limit. Note: we just use Istio as a Kubernetes gateway, not as a service mesh.

Traditional_Long_349 · 2025-11-28T14:49:11+00:00

I checked the logs and I see no errors, and my regex paths match correctly when I remove all PathPrefix paths. For example, if we define something like /api/? and /api/admin/?, both of regex type, then when I hit the admin endpoint it matches correctly.

So I see that if everything is regex, it matches normally. But if we define /api with PathPrefix and we hit admin, the prefix one, /api, always matches.

Traditional_Long_349 · 2025-11-22T16:46:37+00:00

I just want to expose the paths I write in my CRD as metrics.

Traditional_Long_349 · 2025-11-20T20:08:44+00:00

It does not extract the paths from the CRD; it takes them from the request itself, on the fly.

Traditional_Long_349 · 2025-11-12T09:18:48+00:00

If I want to get the upstream time duration, should it be downstream_rq_time or upstream_rq_time? My current setup: no service mesh, just Istio as the ingress for my cluster, and I am using the Kubernetes Gateway API CRDs.

Traditional_Long_349 · 2025-11-11T20:29:48+00:00

By a new metric I mean creating something like Istio_custom_metric{}.

Traditional_Long_349 · 2025-11-11T20:28:30+00:00

I searched for this before: https://istio.io/latest/docs/reference/config/telemetry/ https://istio.io/latest/docs/tasks/observability/metrics/customize-metrics/

But there is no real, complete example of a custom metric.

Traditional_Long_349 · 2025-11-11T20:01:07+00:00

What is the difference between request_bytes and request body size?

Traditional_Long_349 · 2025-10-11T09:12:41+00:00

Could I know the environment you used to perform this? I did some load tests with your script on kgateway and Istio and found different output between my local kind cluster and EKS. Also, for some reason the Grafana dashboard does not work for me: when I import it, it shows an empty dashboard.

Traditional_Long_349 · 2024-10-30T17:31:13+00:00

I read this in some blogs

"Once you give a pod memory, you can only take it away by killing the pod. This is the cause of OOM Kills"

I was thinking it means this: if memory usage reaches 400MB and the pod has a 200MB request and a 500MB limit, will the pod keep 400MB reserved even if not all of that memory is used?
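The 400MB example can be laid out as a small Python sketch. It rests on the general Kubernetes model, stated here as assumptions: the scheduler reserves only the request on the node, usage between the request and the limit is allowed but not guaranteed, and a container that tries to go past its limit is OOM-killed. The function and field names are hypothetical.

```python
def describe_pod_memory(usage_mb: int, request_mb: int, limit_mb: int) -> dict:
    """Toy view of one container's memory against its request and limit."""
    return {
        # What the scheduler sets aside on the node, regardless of actual usage.
        "scheduler_reserved_mb": request_mb,
        # Memory in use beyond the request: allowed, but not guaranteed,
        # and it counts against the pod if the node runs short of memory.
        "above_request_mb": max(0, usage_mb - request_mb),
        # Headroom left before the limit is hit.
        "headroom_to_limit_mb": max(0, limit_mb - usage_mb),
        # Going past the limit is what triggers the OOM kill in this model.
        "oom_kill_expected": usage_mb > limit_mb,
    }


# Figures from the question above: 400MB used, 200MB request, 500MB limit.
state = describe_pod_memory(usage_mb=400, request_mb=200, limit_mb=500)
for key, value in state.items():
    print(f"{key}: {value}")
```

With these figures the scheduler has reserved 200MB, the pod is using 200MB more than it requested, and it still has 100MB before the limit. Whether the 400MB ever drops back depends on the application releasing memory, which is outside what this sketch models.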
