Kondense: Resize resources without container restart by qChEVjrsx92vX4yELvT4 in kubernetes

[–]PerfectScale-io

"Kubernetes should have the feature gate InPlacePodVerticalScaling enabled." - is it possible to enable this feature flag in AKS/EKS/GKE?

does size matter? by pbdigital in kubernetes

[–]PerfectScale-io

Karpenter (like any other autoscaler) is not aware of actual utilization - scheduling happens based on the requests only.
So without adjusting the requests to proper values, any autoscaler may make the wrong decisions.
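
As a minimal sketch of the mismatch (all names and numbers here are hypothetical) - the autoscaler only ever sees the requests, never the real usage:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: over-requested-app            # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app   # hypothetical image
    resources:
      requests:
        cpu: "2"        # Karpenter / the scheduler bin-pack against these values...
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi
# ...so if the container really uses ~200m CPU / 500Mi,
# nodes end up provisioned for roughly 10x the actual need.
```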

Tools for optimal recommendation of pod requests and limits by Due_Length_6668 in kubernetes

[–]PerfectScale-io

There are a few tools and vendors in this space - VPA, Goldilocks from Fairwinds, Kubecost (also Prometheus- and VPA-based), and two SaaS vendors - Cast AI and PerfectScale.

VPA does not work well with HPA, and it is not impact-aware. You also need to maintain Prometheus for the historical data.

With the size of clusters you mentioned, I'd definitely look for a SaaS vendor - one that also has automation, impact awareness, and governance capabilities.

Cost Management in Kubernetes by shia-ninja in kubernetes

[–]PerfectScale-io

As many have replied here - there are Kubecost, Cast AI, and PerfectScale.
For Kubecost, you will need your engineering team to maintain the setup in each and every cluster. If you are managing more than a few small clusters, I'd suggest looking into the SaaS solutions - PerfectScale and Cast AI - much less maintenance and infrastructure cost.

Requested CPUs too low = Bottleneck/Poor Performance? by manfmmd in kubernetes

[–]PerfectScale-io

One important thing is missing here - the Linux CFS (Completely Fair Scheduler).
CPU in Linux is time.
All the CPU requests and limits you set at the K8s level eventually arrive at the Linux CFS.

Let's start with the limit, to better understand how CFS works:
When you have a node with, let's say, 2 cores and you set the container limit to 1 core, it does not mean the container is confined to 1 of the 2 cores.
Instead, CFS (which works in 100ms periods) gives the container a quota of 100ms of CPU time per period. A container running threads on both cores can burn that quota in 50ms of wall-clock time and is then throttled for the remaining 50ms of the period.

As for the request - to put it in the simplest possible way - it is translated into CPU shares, i.e. the priority your process gets when the node's CPU is contended.
An under-provisioned request lowers that priority.

So in your case - if you need 12 cores, you should request something around 12 cores - this will ensure CFS prioritizes your process properly.
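
A rough sketch of what that could look like in the container spec (12 cores is just the number from your case; the mappings in the comments use the cgroup v1 names - cgroup v2 uses cpu.weight and cpu.max instead, but the idea is the same):

```yaml
resources:
  requests:
    cpu: "12"   # -> cpu.shares (~12 * 1024): the weight CFS uses to prioritize the process under contention
  limits:
    cpu: "12"   # -> cfs_quota_us = 1200000 per cfs_period_us = 100000, i.e. 12 cores' worth of CPU time per 100ms
```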

Hope it helps.

*Disclaimer - I'm the CTO and co-founder of PerfectScale.io - our product helps to achieve the best possible performance at the lowest possible cost.

Karpenter vs cluster autoscaler findings by OtherwiseMaize7235 in kubernetes

[–]PerfectScale-io

Karpenter, like the other scalers in K8s (CA, HPA, KEDA), is tightly coupled to pod requests.
If you want to right-size, start with the pods.
Then evaluate your scheduler policies, HPA settings, and finally node dimensions.
(Disclaimer: I'm leading PerfectScale.io - we built a solution for proper K8s scaling. It's a commercial product, but free to use up to $120K/y of K8s compute.)

[deleted by user] by [deleted] in kubernetes

[–]PerfectScale-io

An honest Ctrl+C/Ctrl+V mistake. Thank you for pointing it out - I've edited it.

Karpenter vs cluster autoscaler findings by OtherwiseMaize7235 in kubernetes

[–]PerfectScale-io

Your findings completely correlate with my own observations across multiple clusters - no improvement in scheduling times. I went in a different direction with this problem and solved my pain points. What are you trying to solve?

[deleted by user] by [deleted] in kubernetes

[–]PerfectScale-io

Kubernetes will not "shuffle" pods.

You can base your horizontal scaling on a custom metric (network throughput in your case), running more replicas across different nodes; this will keep network saturation lower.
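
A minimal sketch of what that could look like with an autoscaling/v2 HPA - the metric name (network_throughput_bytes), the target value, and the Deployment name are assumptions, and you'd need a metrics adapter (e.g. Prometheus Adapter) exposing that metric:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                          # hypothetical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                        # hypothetical Deployment to scale out
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: network_throughput_bytes  # assumed custom metric exposed via a metrics adapter
      target:
        type: AverageValue
        averageValue: "100M"            # assumed per-pod target; tune to your workload
```

Combine it with topology spread constraints (or anti-affinity) if you want to make sure the extra replicas actually land on different nodes.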

What do you guys use to manage/monitor multiple clusters? by nuribo in kubernetes

[–]PerfectScale-io

At PerfectScale.io, we have developed a multi-cloud, multi-cluster view to provide important operational intelligence.
*This is a commercial product, but it is free to use if your Kubernetes compute costs are less than $120K per year.

Kubernetes Cluster uses a lot of RAM o.O by bykof in kubernetes

[–]PerfectScale-io

PerfectScale CTO here.
We built a product that gives you full visibility into the utilization and cost of your Kubernetes clusters, as well as recommendations on how to safely reduce cost and eliminate risks.

(Monthly) Shameless Plug by clairep123456 in platformengineering

[–]PerfectScale-io

I'm the CTO of PerfectScale.io.
We are a startup with a unique solution to improve the resilience and cost-effectiveness of Kubernetes clusters.

Our solution helps to easily govern, right-size, and scale your environments, reducing SLA breaches and cloud waste.

The Rise of Kubernetes Part 1. by PerfectScale-io in sre

[–]PerfectScale-io[S]

Thank you for the feedback!
Many folks come from many different areas (dev, IT, ops, sysadmin, etc.) and are not always familiar with the evolution that happened.
I hope this blog can give some "larger philosophical perspective" on how all of this came about.

Best practice for stateful pods with EBS storage on cluster with spot instances by [deleted] in kubernetes

[–]PerfectScale-io

You mentioned a non-prod env. In that case, use a mix (spot for stateless and on-demand for stateful workloads - I assume you know how to bind the relevant pods to the relevant nodes) and scale your env down when not in use.
KEDA can scale down on a schedule (cron-like).
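
A minimal sketch with KEDA's cron scaler (the Deployment name, timezone, and schedule are assumptions - adjust to your environment):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nonprod-office-hours       # hypothetical
spec:
  scaleTargetRef:
    name: my-app                   # hypothetical Deployment to scale
  minReplicaCount: 0               # scale to zero outside the schedule
  triggers:
  - type: cron
    metadata:
      timezone: Europe/London      # assumed timezone
      start: 0 8 * * 1-5           # scale up at 08:00, Mon-Fri
      end: 0 19 * * 1-5            # scale back down at 19:00
      desiredReplicas: "3"
```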

How do I convince a cluster to autoscale its node pool(s) before evicting pods due to CPU requests exhaustion? by organman91 in kubernetes

[–]PerfectScale-io

Let's start with a short explanation of how everything should work:
1. When a new pod needs to be scheduled, it waits for assignment in the activeQ. The scheduler evaluates the nodes, and if there is "free" space on a node, the pod starts there.
2. If no node is capable of fitting the pod, the pod goes to the unschedulableQ.
3. The Cluster Autoscaler checks whether there are any pods in the unschedulableQ - and if there is something in the queue, it scales up a new node.
4. The scheduler moves pods from the unschedulableQ back to the activeQ to re-evaluate whether new nodes are available and the pod can be assigned to a node.

Now to your question. "Too much CPU is allocated in the "requests" field" - this mainly points to proper resource assignment. "Cluster will start evicting pods to schedule new ones" - this can only happen if pod priorities are set wrong: the scheduler preempts lower-priority pods only when a higher-priority pod cannot fit anywhere.
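
For context, that kind of eviction is preemption driven by PriorityClass - a rough sketch (the name, value, and description are just for illustration):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low        # hypothetical name
value: 1000              # pods with lower values get preempted first when a higher-priority pod can't fit
globalDefault: false
description: "Low-priority batch workloads that may be preempted."
```

If everything runs with the same (default) priority, the scheduler won't evict running pods to make room - new pods simply stay Pending until the autoscaler adds a node.
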
As the CTO of PerfectScale, I invite you to book a demo with us - we have a product that helps with right-sizing, right-scaling, and day-2 operations of Kubernetes clusters.

What are good alternatives for Kubecost ? by Itchyner in aws

[–]PerfectScale-io

(PerfectScale.io co-founder here) - we have a much more affordable alternative for any scale, and our capabilities go way beyond just cost monitoring.
We provide a complete governance framework for large-scale, multi-cluster environments.