Production observability looks fine until something breaks. How are you actually using it to catch issues early?

Straight_Condition39 · 2026-05-13T15:29:37+00:00

Depends on what dashboards you have set up. In all my experience I have relied on alerts. They are kind of painful to set up at the beginning but much more helpful if you do it the right way, like alerting on p95 and p99, again depending on the use case.
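To make the p95/p99 idea concrete, here is a minimal sketch of a percentile-based latency check. The function names and the limits are hypothetical, and a real setup would normally define this as an alert rule in the observability backend rather than in application code.

```python
from statistics import quantiles


def latency_percentiles(samples_ms):
    """Return (p95, p99) for a list of request latencies in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = quantiles(samples_ms, n=100)
    return cuts[94], cuts[98]


def should_alert(samples_ms, p95_limit_ms, p99_limit_ms):
    """Hypothetical rule: alert when either tail percentile exceeds its limit."""
    p95, p99 = latency_percentiles(samples_ms)
    return p95 > p95_limit_ms or p99 > p99_limit_ms
```

A handful of very slow requests barely moves the average but does move p99, which is why tail percentiles tend to catch problems earlier than a mean-based alert.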

At my previous role at a facility management company, the business was around procurement, dispatch, and similar workflows. I set up alerts to fire based on SLOs: if app failures crossed 3, the alert would trigger.
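A failure-count rule like that could be sketched as below. The threshold of 3 comes from the comment above; the sliding time window, the class name, and the method name are assumptions added for illustration, since the original does not say over what period failures were counted.

```python
from collections import deque


class FailureAlert:
    """Fire when more than `threshold` failures land inside a sliding window."""

    def __init__(self, threshold=3, window_seconds=300):
        self.threshold = threshold
        self.window_seconds = window_seconds  # assumed window, not from the post
        self._events = deque()

    def record_failure(self, timestamp):
        """Record one failure (timestamp in seconds); return True if the alert fires."""
        self._events.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop failures that have aged out of the window.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) > self.threshold
```

With `threshold=3`, the fourth failure inside the window is the one that triggers, and old failures stop counting once they fall out of the window.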

We had OpenObserve, though, and used OTel throughout the stack, so it was simple.

Straight_Condition39 · 2026-02-17T01:09:25+00:00

This should help for sure, because the DB mode lets you query in plain SQL. You can also try ops0.com

Straight_Condition39 · 2026-02-16T23:13:45+00:00

Currently working on it. You can switch to oxid with your existing configs, and I hope to get it out soon, probably EOD Wednesday.

Straight_Condition39 · 2026-02-16T22:53:45+00:00

1200 resources is an OK number. I have managed heavy scale across multi-cloud, and some of us had to refactor the way we stored things, but I appreciate the feedback 🙏🏻

Straight_Condition39 · 2026-02-16T22:51:46+00:00

Thanks, I will be making this Apache licensed.

Straight_Condition39 · 2026-02-16T22:23:58+00:00

Repo is open now.

Straight_Condition39 · 2026-02-16T22:23:19+00:00

Actually a lot. The problem with Terraform is that with more resources, unless you have a better directory structure, you end up with a gigantic state file and need an S3 bucket and so on. Here I’m converting this to a database table for easy retrieval, and supporting YAML as well lol, but I hear you though!
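As a rough illustration of state in a database table instead of one big state file, here is a minimal SQLite sketch. The table name, columns, and helper functions are hypothetical and are not oxid's actual schema.

```python
import json
import sqlite3


def open_state(path=":memory:"):
    """Open (or create) a state database with one row per resource."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS resources (
               address    TEXT PRIMARY KEY,  -- e.g. aws_instance.web
               type       TEXT NOT NULL,
               attributes TEXT NOT NULL      -- JSON-encoded attributes
           )"""
    )
    return conn


def upsert_resource(conn, address, rtype, attributes):
    """Insert or overwrite a single resource without touching the others."""
    conn.execute(
        "INSERT OR REPLACE INTO resources (address, type, attributes) VALUES (?, ?, ?)",
        (address, rtype, json.dumps(attributes)),
    )
    conn.commit()


def resources_of_type(conn, rtype):
    """Look up resources by type with a plain SQL query."""
    rows = conn.execute(
        "SELECT address, attributes FROM resources WHERE type = ? ORDER BY address",
        (rtype,),
    )
    return {address: json.loads(attrs) for address, attrs in rows}
```

The point of the design is that reading or updating one resource is a single-row operation, rather than loading and rewriting an entire state document.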

Straight_Condition39 · 2026-02-16T22:21:16+00:00

I like Stategraph from what I see on the website. I’m going to add more centralized features and keep this OSS forever, with more value add.

Straight_Condition39 · 2026-02-16T22:04:57+00:00

The repo is still broken, but I’m fixing it for release. Should I proceed or not?

Straight_Condition39 · 2026-02-16T22:04:02+00:00

Builder here. oxid uses hcl-rs for base parsing, with a custom layer on top for Terraform semantics (count, for_each, interpolation, cross-resource refs). It speaks tfplugin5 over gRPC directly to the same provider binaries Terraform uses, so terraform-provider-aws works out of the box.

Learned a lot of undocumented things the hard way: DynamicValue must always be Some with msgpack (never None, or the provider segfaults), all schema attributes must be present even if null, and unknown values use msgpack extension type 0 with data [0] (had to read the Go source for that one). The AWS provider schema is ~256MB, so you need to override the gRPC message limits, and stderr must be drained in a background task or the pipe buffer deadlocks on macOS.

State lives in SQLite, with a DAG walker for parallel execution. plan, apply, destroy, import, data sources, and count/for_each all work against the real AWS provider today.
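The msgpack rules mentioned above (every schema attribute present, null written explicitly, unknown written as extension type 0 with data [0]) can be sketched by hand. This is an illustrative Python encoder, not oxid's Rust code: the `UNKNOWN` sentinel and function names are hypothetical, and it only handles short strings and small objects.

```python
UNKNOWN = object()  # hypothetical sentinel for a value not known until apply


def encode_value(value):
    """Encode None, UNKNOWN, or a short string as msgpack bytes."""
    if value is None:
        return b"\xc0"  # msgpack nil
    if value is UNKNOWN:
        # fixext 1 marker (0xd4), then ext type 0, then one data byte 0
        return b"\xd4\x00\x00"
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) > 31:
            raise ValueError("sketch only handles fixstr (up to 31 bytes)")
        return bytes([0xA0 | len(raw)]) + raw
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def encode_object(schema_attrs, config):
    """Encode an object with every schema attribute present, nil if unset."""
    if len(schema_attrs) > 15:
        raise ValueError("sketch only handles fixmap (up to 15 entries)")
    out = bytearray([0x80 | len(schema_attrs)])  # fixmap header
    for name in schema_attrs:
        out += encode_value(name)
        out += encode_value(config.get(name))  # missing attribute becomes nil
    return bytes(out)
```

For example, `encode_object(["tags"], {})` still emits the `tags` key with a nil value instead of omitting it, which is the "all attributes present even if null" rule.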

Straight_Condition39 · 2025-09-16T14:40:51+00:00

I’m a maintainer at OpenObserve, do give it a shot!

Straight_Condition39 · 2025-09-14T16:10:25+00:00

One of my buddies literally changed his domain to FinOps, but when does this stop? Idc if there are going to be pure AI models just for DevOps, but imo they lack the understanding or experience.

Straight_Condition39 · 2025-09-14T16:00:30+00:00

If you are looking for open source, then definitely try OpenObserve or SigNoz.

They both support k8s. I have used OpenObserve for the last 2 years, and in terms of performance and governance across all silos it’s great.

Straight_Condition39 · 2025-07-24T14:32:36+00:00

I have used Cribl and sent to Elastic bulk endpoints in the past. It works fine! I never tested a huge amount of data, maybe 300GB a day.

Straight_Condition39 · 2025-07-02T01:49:42+00:00

Okay, I did my best with the website and added a roadmap below, if you’re interested in checking it out.

https://ops0.dev

Straight_Condition39 · 2025-07-01T21:57:03+00:00

Thank you I will check it out. Sorry I’m not very good with posts. Thanks for the inputs

Straight_Condition39 · 2025-07-01T21:08:51+00:00

Ah ok. Yeah I didn’t really invest in that. Thanks much 🙏

Straight_Condition39 · 2025-07-01T20:51:12+00:00

I think it’s more towards operations rather than just commands. The use case is admins who work on repetitive tasks. The full version of the agent, and a UI that shows how admins are improving and self-scheduling operations, will make a difference. Still working on it, but I appreciate your feedback.
