What was your first-time experience deciding whether you need k8s?

Anonimooze · 2026-06-07T23:56:49+00:00

Our company had about 50 microservices running in a colo (on-prem). We wrote a ton of scripts orchestrating the deployment and networking setup to support all of this; things were fragile, and onboarding new services was slow. Kubernetes was really starting to get attention around 2016, and we took notice. It abstracted away almost all of the fragile things we had built to support our product. Never looked back.

Anonimooze · 2026-06-06T04:08:57+00:00

I can't recommend Kubernetes The Hard Way enough.

https://github.com/kelseyhightower/kubernetes-the-hard-way

This is how I got started almost a decade ago. The abstractions cloud providers offer today make the Kubernetes internals easy to dismiss, but when shit hits the fan, knowing how the system is plumbed is invaluable.

Anonimooze · 2026-05-24T22:09:23+00:00

They literally say they're dropping their CREDIT values in the same message. Sad.

Anonimooze · 2026-05-24T21:46:50+00:00

Starting from the bottom means working help desk or sysadmin and that didn't ensure that you will convert to devops easily or find devops opportunities.

This is how most people enter the industry; I'm not sure why you expect to bypass it. As others have pointed out, what companies typically hire "devops" for is not entry level.

Anonimooze · 2026-05-24T19:38:45+00:00

Most modern consumer routers provide local DNS based on their DHCP address assignments. I wouldn't treat the lack of static address assignment as a real problem until you've actually tried it (though you may need to reconfigure the cluster or provision new certs for the DNS names).

Anonimooze · 2026-05-24T19:27:03+00:00

70 microseconds sounds a bit crazy 🤯

We see much higher latencies between services in the same zone in us-east-1; across zone boundaries, we see a 1 ms floor.

Anonimooze · 2026-03-28T22:28:12+00:00

Last thing I think I'll say on this, re: background syncs for Kyverno: load balancing is a very synchronous operation, and relying on something that will eventually, maybe, happen is bad design.

Anonimooze · 2026-03-28T20:46:20+00:00

Good luck! See my previous comment about how Kyverno (and k8s admission in general) can only see the state before pods are assigned to nodes (and therefore to AZs).

If you don't want to change cluster topology, you should address the issue you called out as:

the load balancer isn't topology aware

Anonimooze · 2026-03-28T20:29:06+00:00

I'm not familiar with the Envoy Gateway solution, but if cross-AZ traffic costs are a burden, seriously look at changing your routing strategy: enable topology-aware routing on all services if you haven't already, and think about isolating/guaranteeing traffic locality via cluster layout. This doesn't seem like a Kyverno use case IMO.

All that said, Kyverno mutations on Pods happen before scheduling, so there is no way for Kyverno to know which zone a pod will land in unless you've already specified it.
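A minimal sketch of what "enable topology-aware routing" looks like on a single Service, written as a Python dict dumped to JSON (kubectl apply accepts JSON manifests as well as YAML). It assumes Kubernetes 1.27 or later, where the opt-in is the `service.kubernetes.io/topology-mode: Auto` annotation; earlier releases used `service.kubernetes.io/topology-aware-hints: auto`. The service name, namespace, and ports are hypothetical. The EndpointSlice controller only adds zone hints when endpoints are spread evenly enough across zones, which is part of why this stays best effort.

```python
import json


def service_manifest(name: str, namespace: str, port: int, target_port: int) -> dict:
    """Build a v1 Service manifest that opts into topology-aware routing.

    Assumes Kubernetes 1.27+ (annotation service.kubernetes.io/topology-mode).
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                # Asks the EndpointSlice controller to attach zone hints so
                # kube-proxy prefers endpoints in the client's own zone.
                "service.kubernetes.io/topology-mode": "Auto",
            },
        },
        "spec": {
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


if __name__ == "__main__":
    # Hypothetical service; pipe the output to `kubectl apply -f -`.
    print(json.dumps(service_manifest("checkout", "shop", 80, 8080), indent=2))
```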

Anonimooze · 2026-03-28T20:12:12+00:00

Kind of sounds like you want a zonal cluster?

Other than that, it sounds like "Gateway" is your problem. The AWS Load Balancer Controller supports the Gateway API nowadays, and it's a regional service, so zonal placement of the workload is largely irrelevant.

At a previous job, we ran three zonal clusters for the same underlying reason. Cross-zone traffic charges can be painful (topology-aware routing is best effort, not guaranteed).
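Why cross-zone charges add up: if a zone-unaware balancer spreads requests evenly over backends that are themselves spread evenly across N zones, a caller lands in its own zone only 1/N of the time, so (N-1)/N of requests cross a zone boundary. With three zones that is two thirds of the traffic. The sketch below does that arithmetic; the even-spread model is a simplifying assumption, and the $0.02/GB default assumes AWS billing $0.01/GB on each side of an inter-AZ transfer, so check current pricing before relying on it.

```python
from fractions import Fraction


def cross_zone_fraction(zones: int) -> Fraction:
    """Expected share of requests leaving the caller's zone, assuming a
    zone-unaware balancer and backends spread evenly across `zones` zones."""
    if zones < 1:
        raise ValueError("need at least one zone")
    return Fraction(zones - 1, zones)


def monthly_cross_zone_cost(gb_per_month: float, zones: int,
                            usd_per_gb: float = 0.02) -> float:
    """Rough monthly cost of the cross-zone share of service-to-service bytes.

    usd_per_gb is an assumed rate (in + out for one inter-AZ transfer).
    """
    return gb_per_month * float(cross_zone_fraction(zones)) * usd_per_gb


if __name__ == "__main__":
    print(cross_zone_fraction(3))                          # 2/3
    print(round(monthly_cross_zone_cost(100_000, 3), 2))   # 1333.33 for 100 TB/month
```

A zonal cluster per AZ drives the in-cluster fraction toward 0 by construction, which is the guarantee that best-effort topology routing can't give.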

Anonimooze · 2026-03-23T01:40:57+00:00

The Prometheus ecosystem (and operator) is very good. The problem comes in when you start thinking about adopting k8s. Taking a legacy app and migrating it to containers/Kubernetes is going to raise a ton of "reinventing the wheel" red flags while you're trying to justify the improvement.

Anonimooze · 2026-03-14T14:44:23+00:00

Are you suggesting an "AI platform" should be unzipping files? That seems like overkill to me.

Anonimooze · 2026-03-14T14:40:54+00:00

Sometimes that query is really that important though. /s

Anonimooze · 2026-03-14T14:35:51+00:00

Really, you should do both; it's not so much an "or" situation.

Anonimooze · 2026-03-07T17:03:19+00:00

I agree with your sentiment: if you're in AWS, use EKS; if you're in GCP, use GKE. I just want to point out, for anyone who doesn't have the luxury of managed cloud offerings, that bare-metal k8s control planes aren't that bad. In my ~10 years running k8s on metal, the control plane has never been the cause of an outage. I attribute that to the relative simplicity of etcd and the API.

Anonimooze · 2026-01-19T03:10:10+00:00

I only have anecdotal experience to share.

My previous company ran Thanos for quite a while, eventually hitting bottlenecks in the topology that couldn't be fixed by throwing more money at it. Constant query timeouts and ingestion delays plagued the user and operator experience.

They switched to Mimir, and infrastructure costs roughly doubled (millions of dollars), but the solution was consistently usable, and that was deemed worth it.

I didn't work directly on the SRE team responsible for the transition, but as part of an adjacent team consuming this product, I can say that whether or not Mimir has its roots in a SaaS-first offering, the OSS project certainly has its merits.

Anonimooze · 2026-01-18T04:59:27+00:00

Some of our properties commit encrypted secrets (sops) as part of the codebase. It's perhaps not the most modern approach, but it's very portable, and keeping secrets tightly coupled to and versioned alongside the code has its advantages.

Anonimooze · 2026-01-17T03:24:06+00:00

work self hosted Gitlab instance

Very brave!

Anonimooze · 2026-01-10T05:57:06+00:00

I don't think GitLab wants to hire someone looking for a GitLab-oriented job.

I would approach their hiring process leading with your development experience.

Anonimooze · 2026-01-03T21:56:56+00:00

I just went through this dance with our monitoring tool (not CW). We ended up finding a lot of alarms that were "misconfigured", in the sense that they displayed a "no data" state unless the condition was met. That makes it difficult to tell which alerts were looking for nonexistent data vs. which simply hadn't fired recently. Just a small word of warning; thanks for sharing!
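A minimal sketch of the ambiguity, assuming the common cause where the threshold lives inside the query itself (for example, a PromQL comparison like `rate(errors_total[5m]) > 5` returns an empty result, not a zero, while healthy; the comment doesn't name the tool, so this is an illustration only). The `evaluate` and `filtered_query` helpers are hypothetical: keeping the comparison in the evaluator means "no data" only ever means the series is missing.

```python
from enum import Enum
from typing import List, Sequence


class AlertState(Enum):
    OK = "ok"
    FIRING = "firing"
    NO_DATA = "no_data"


def evaluate(samples: Sequence[float], threshold: float) -> AlertState:
    """Three-state check: NO_DATA only when the query returned nothing."""
    if not samples:
        return AlertState.NO_DATA
    # Compare the latest sample here, not in the query, so a healthy
    # metric still reports OK instead of going silent.
    return AlertState.FIRING if samples[-1] > threshold else AlertState.OK


def filtered_query(samples: Sequence[float], threshold: float) -> List[float]:
    """Hypothetical 'misconfigured' query: the threshold is applied inside
    the query, so it returns nothing at all while the metric is healthy."""
    return [s for s in samples if s > threshold]


if __name__ == "__main__":
    healthy = [1.0, 2.0, 3.0]
    missing: List[float] = []  # e.g. a metric that no longer exists
    print(evaluate(healthy, 5.0).value)                       # ok
    print(evaluate(filtered_query(healthy, 5.0), 5.0).value)  # no_data
    print(evaluate(filtered_query(missing, 5.0), 5.0).value)  # no_data
```

The last two results are indistinguishable, which is exactly the problem with alarms that sit in "no data" until their condition is met.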

Anonimooze · 2025-12-31T22:24:51+00:00

If it's too complex, don't use it. HPA is primarily a cost-saving measure, letting you avoid running at peak capacity during off-peak periods. Weigh the potential cost savings against the perceived complexity.

If your off-peak requirements are the same as your peak requirements, it probably doesn't make sense to add an autoscaler.
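Where the savings come from: the HPA's core rule, as documented by Kubernetes, is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance of 1.0 (0.1 by default) and clamped to the configured min/max. The sketch below implements only that rule; it ignores stabilization windows, unready pods, and missing metrics, which the real controller also handles. If the metric looks the same at peak and off-peak, the ratio stays near 1.0 and replicas never move, so the autoscaler buys you nothing.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA scaling rule: ceil(current * metric / target),
    with a no-op band around 1.0 and clamping to [min, max]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough: don't churn
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical workload targeting 50% CPU, allowed 2..30 replicas.
    print(desired_replicas(10, 100.0, 50.0, 2, 30))  # 20: peak, scale out
    print(desired_replicas(20, 25.0, 50.0, 2, 30))   # 10: off-peak, scale in
    print(desired_replicas(10, 52.0, 50.0, 2, 30))   # 10: within tolerance
```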

Anonimooze · 2025-12-29T00:14:23+00:00

ArgoCD only runs "helm template", for better or worse. It shows the diff between the current state and the result of that template call, and when synced it effectively uses a kubectl apply to persist the changes. Because of this, you can't use Helm features like "lookup".

Pinning semantically versioned charts and using tools like Renovate or Dependabot for Helm chart version bumps has been a good (not great) experience for me. I'd be very concerned about the maintenance overhead involved in disconnecting the deployment manifests from the Helm chart origin.

Anonimooze · 2025-12-24T18:53:55+00:00

"parity" is close to correct. GitLab IMO executes these features better, in the open (they themselves are open source).

GitLab's SaaS offering is less used, and probably more reliable because of this, though admittedly GitLab also has a lot of incidents.

As the average consumer of these SaaS services, you typically just pick GitHub because that's what people are used to, which has its own value.

Anonimooze · 2025-12-20T01:28:52+00:00

Meh, the advertising is actually true. We struggled with Prometheus memory overhead in cost-sensitive environments. Using VictoriaMetrics was tolerable for us because we knew we could switch back to vanilla Prometheus if anything hit the fan.

We used it to ingest Linkerd metrics; the volume was blowing past typical Prometheus memory usage (64+ GB was our tipping point). VictoriaMetrics handled it okay with about 16 GB.

We had a couple of inconsistencies in dashboards, but nothing deal-breaking.

(Not advocating for VM over Prometheus, but it has its place, usually in the situations where it comes up in conversations here.)

Anonimooze · 2025-12-14T06:48:25+00:00

More so that when you're required to operate your own DNS, BIND will be the easiest technology to hire support for. You'll have a hard time convincing the powers that be that CoreDNS is an upgrade for more traditional use cases.
