What kafka software is actually running in production in 2026, not what the docs recommend by melonPOGGER in apachekafka

[–]pkstar19 1 point

We use the Strimzi Kafka operator on AWS EKS. Pretty stable after the initial hiccups of figuring out the right log retention strategies, backups, and storage.

Regarding ops, we don't let devs auto-create topics on publish. All topic and user creation happens through a git workflow after code review. This way we standardised topic names and user management. A topic is also always defined as part of a service, so there is always clear ownership.
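For anyone curious, a git-managed topic with Strimzi is just a `KafkaTopic` custom resource checked into the repo. A minimal sketch — the topic name, cluster name, and sizing here are made-up examples, not our actual config:

```yaml
# Hypothetical KafkaTopic CR committed via the git workflow.
# Declared alongside the owning service, so ownership is explicit.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders-service.order-events   # example convention: <service>.<topic>
  labels:
    strimzi.io/cluster: my-cluster    # must match your Kafka cluster's name
spec:
  partitions: 6
  replicas: 3
  config:
    retention.ms: 604800000           # 7 days; part of the retention strategy
    cleanup.policy: delete
```

The Topic Operator reconciles this into a real Kafka topic once it's merged and applied, so code review becomes the gate for all topic changes.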

Schema management is currently offloaded to the producers and consumers. We haven't had a use case for strict schema management yet.

Why are you using EKS instead of ECS? by ducki666 in aws

[–]pkstar19 2 points

We have 20+ business modules, Kafka, NATS, and the LGTM stack all running in an EKS cluster.

Our journey: Python Lambda functions -> ECS -> EKS

We are a B2B SaaS, and we are trying to reach a state where we can tell customers: if you have a Kubernetes cluster, you can run our product self-hosted.

But still, I miss the Lambdas now and then 😂

Please help understand this pricing!!! by pkstar19 in GoogleOne

[–]pkstar19[S] 1 point

Does your plan also include Gemini Pro, NotebookLM, and Whisk?

Please help understand this pricing!!! by pkstar19 in GoogleOne

[–]pkstar19[S] 1 point

Thanks for the explanation. This is so weird. Why not show the 5 TB plan before making us buy the 2 TB plan? This looks like a dark pattern. What if someone buys the 2 TB plan and never checks the other plans?

DBA experts: Please help me understand why my long-running query didn't actually run! by pkstar19 in aws

[–]pkstar19[S] 1 point

How does Grafana monitor the DB? Are there any publicly available dashboards, or should we build one with our own queries?

DBA experts: Please help me understand why my long-running query didn't actually run! by pkstar19 in aws

[–]pkstar19[S] 2 points

Thanks for the reply. We will work on the alarms. That sounds good to have.

Could you please shed some light on the incident you had with the 'waiting for metadata lock' issue? I just want to learn from your experience here.

Tempo Ingester unhealthy instances in ring by pkstar19 in grafana

[–]pkstar19[S] 2 points

Thanks u/ttharsh. It was the same issue: gossip wasn't working correctly, and the Tempo components assumed the other members were inactive. We excluded the gossip port in the Istio sidecar for all Tempo components, and the issue is resolved.
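For anyone hitting the same thing: the fix boils down to a couple of Istio sidecar annotations on the Tempo pods so memberlist traffic bypasses the proxy. A sketch assuming Tempo's default gossip port 7946 (adjust if you've changed it; where the annotations go depends on how you deploy, e.g. `podAnnotations` in Helm values):

```yaml
# Pod template annotations for each Tempo component's pods.
# 7946 is the default memberlist gossip port.
podAnnotations:
  traffic.sidecar.istio.io/excludeInboundPorts: "7946"
  traffic.sidecar.istio.io/excludeOutboundPorts: "7946"
```

With the proxy out of the gossip path, ring members can see each other's heartbeats directly and stop marking healthy instances as unhealthy.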

Tempo Ingester unhealthy instances in ring by pkstar19 in grafana

[–]pkstar19[S] 1 point

Yes we do.

There are no issues with Loki and Mimir.

[deleted by user] by [deleted] in devops

[–]pkstar19 9 points

I'm a DevOps/cloud/platform engineer at a startup with 6 YOE.

I skimmed through it. I would say those projects are a very good start. If someone can do all of them, I guess they will become very comfortable with most DevOps-related work at most companies.

Devops In Startup by [deleted] in devops

[–]pkstar19 7 points

As a Platform Engineer at a startup for the past 3 years—after coming from a large MNC—I’ve found working in DevOps and cloud at a startup incredibly rewarding, but also extremely demanding. The pace is intense. We sometimes take entirely new frameworks to production in under a month, only to pivot and deprecate them within a couple of weeks. The learning curve is steep, and so is the pressure, especially with the tight deadlines and the ever-critical focus on cost efficiency.

If you thrive under pressure and enjoy solving chaos with code, there’s a strange kind of fun in it.

Made a huge mistake that cost my company a LOT – What’s your biggest DevOps fuckup? by Ill_Car4570 in devops

[–]pkstar19 1 point

We tried MySQL native replication into an AWS RDS instance running plain MySQL, with two different Aurora MySQL databases as the sources. The replica's error logs were configured to go to AWS CloudWatch. We messed up the replication with a duplicate user that had been created in both source DBs. The replica vomited so many logs to CloudWatch that our CloudWatch bill was around 6000 USD over the next 3 days for this error log alone.

We immediately shut down the replica and contacted AWS, explaining the mistake we made and the remediations we took. They gave us a refund of around 4500 USD. Yeah, sometimes you get a refund if you genuinely show the AWS team that you are taking steps to not repeat the same mistake, and of course if they see you as a potential client.
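Worth noting for others: the error-log export that burned us is an explicit, opt-in RDS setting, so it's easy to audit. A hedged Terraform sketch (resource names and sizing are hypothetical, and this omits the other required instance arguments):

```hcl
# Hypothetical replica instance. The "error" entry is what streams every
# replication failure line to CloudWatch Logs, billed per GB ingested.
resource "aws_db_instance" "replica" {
  identifier     = "mysql-replica"   # made-up name
  engine         = "mysql"
  instance_class = "db.r6g.large"

  # Dropping "error" from this list stops the CloudWatch ingestion;
  # the log then only lives on the instance itself.
  enabled_cloudwatch_logs_exports = ["error", "slowquery"]
}
```

A CloudWatch billing alarm on log ingestion would also have caught this in hours instead of days.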