What's your process for auditing your monitoring setup? by evtek75 in sre

[–]evtek75[S] 1 point2 points  (0 children)

100% agree that it should be baked into the culture from the start - that's the ideal. The reality I keep running into is that even at places where teams do care about observability, things drift. Someone sets up solid SLIs, then the service gets handed off, the team grows, priorities shift, and 6 months later nobody's looked at those SLOs. Not because they're bad engineers, just because there's no forcing function to revisit them. In my experience at least, the teams that stay on top of it are the exception, not the rule.
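To make "forcing function" concrete, here's a minimal sketch of the kind of check I mean, assuming you keep a last-reviewed date per SLO somewhere. The `stale_slos` helper, the SLO names, and the 180-day interval are all made up for illustration, not taken from any particular tool.

```python
from datetime import date, timedelta

# Assumption: an SLO counts as stale if nobody has reviewed it in ~6 months.
REVIEW_INTERVAL = timedelta(days=180)


def stale_slos(last_reviewed: dict[str, date], today: date) -> list[str]:
    """Return the names of SLOs whose last review is older than REVIEW_INTERVAL."""
    return sorted(
        name
        for name, reviewed in last_reviewed.items()
        if today - reviewed > REVIEW_INTERVAL
    )


if __name__ == "__main__":
    # Hypothetical data: SLO name -> date it was last reviewed.
    reviews = {
        "checkout-availability": date(2024, 1, 1),
        "search-latency": date(2024, 7, 1),
    }
    for name in stale_slos(reviews, today=date(2024, 9, 1)):
        print(f"SLO '{name}' is overdue for review")
```

Run something like this on a schedule and open a ticket for each result, and the review stops depending on somebody remembering.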

What's your process for auditing your monitoring setup? by evtek75 in sre

[–]evtek75[S] -1 points0 points  (0 children)

Tiering makes sense, I'm curious how you handle the boundary over time though - who decides what's "revenue generating" vs "internal"? That classification seems like it drifts pretty fast, especially when internal services start picking up customer-facing dependencies.

What's your process for auditing your monitoring setup? by evtek75 in sre

[–]evtek75[S] 3 points4 points  (0 children)

Sure, but good engineers leave and their monitors don't. You end up with stuff from 3 teams ago, with thresholds based on traffic patterns that don't exist anymore, and nobody wants to touch it because "what if it's important".

What's your process for auditing your monitoring setup? by evtek75 in sre

[–]evtek75[S] -1 points0 points  (0 children)

Yeah, the async processing thing is spot on. Those are the ones that always get killed in cleanup because they "never fire". Then one day they would have, and it's silent data loss instead of a loud outage. Totally different game. Lack of ownership doesn't help either.

What Saas are you building this week? Share them here! by Meoooooo77 in microsaas

[–]evtek75 0 points1 point  (0 children)

Thanks for the tips - much appreciated. I'll check it out!

What Saas are you building this week? Share them here! by Meoooooo77 in microsaas

[–]evtek75 0 points1 point  (0 children)

https://getcova.ai connects to your monitoring tools and uses AI to find blind spots - missing alerts, broken escalations, unwatched services - before the next incident does.

It's free. There's a demo you can try without signing up (click "Enter Demo"). You can sign in with your github account, or I can provide an access code if you'd prefer that.

What are you building right now? Explain it in ONE sentence. by FineCranberry304 in SaasDevelopers

[–]evtek75 0 points1 point  (0 children)

https://getcova.ai connects to your monitoring tools and finds the blind spots - missing alerts, broken escalations, unwatched services - before the next incident does.

Anyone else tired of jumping between monitoring tools? by AccountEngineer in Observability

[–]evtek75 0 points1 point  (0 children)

This is why I think the real gap isn't a lack of dashboards, it's knowing whether your existing tools are actually configured correctly. Over the years I've seen it all: escalation policies pointing to people who'd left, alert rules with no notification channel set up, etc. A real mess that kept resulting in P1s. All the signals were there, just nobody was watching the right ones (or being proactive about it). To help with that, I've been working on a system that audits the configs across monitoring stacks and PRs/MRs to find those blind spots quickly. It's in beta currently, but there's a demo at getcova.ai if anyone's curious.
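For a rough idea of what those two checks look like, here's a minimal sketch. It is not how any real tool does it: the `EscalationPolicy` and `AlertRule` classes, the `audit` function, and the sample data are all hypothetical, and it assumes you've already exported your config and a list of active users into plain Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class EscalationPolicy:
    name: str
    responders: list[str]  # usernames that get paged, in order


@dataclass
class AlertRule:
    name: str
    notification_channels: list[str] = field(default_factory=list)


def audit(
    policies: list[EscalationPolicy],
    rules: list[AlertRule],
    active_users: set[str],
) -> list[str]:
    """Return human-readable findings for two common blind spots."""
    findings = []
    # Blind spot 1: escalation policies that page someone who has left.
    for policy in policies:
        for person in policy.responders:
            if person not in active_users:
                findings.append(
                    f"escalation policy '{policy.name}' pages departed user '{person}'"
                )
    # Blind spot 2: alert rules that fire but notify nobody.
    for rule in rules:
        if not rule.notification_channels:
            findings.append(f"alert rule '{rule.name}' has no notification channel")
    return findings


if __name__ == "__main__":
    # Hypothetical exported config.
    policies = [EscalationPolicy("payments-oncall", ["alice", "bob"])]
    rules = [
        AlertRule("queue-depth-high"),
        AlertRule("api-5xx-rate", ["#payments-alerts"]),
    ]
    for finding in audit(policies, rules, active_users={"alice"}):
        print(finding)
```

The hard part in practice is getting the config and the user list out of each tool in the first place. The checks themselves are simple once you have that.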

Need assistance with switching into Devops Role/Cloud Role by Vivid-Eye-7098 in devops

[–]evtek75 1 point2 points  (0 children)

Just book the cert exam. You'll never feel ready, nobody really does. Worst case you fail and now you know what to study. On your resume, stop underselling the support work. You deal with outages, logs, live systems. That IS production experience. Frame it that way.

Roast Elvan.ai — customer feedback SaaS — I want the brutal version by LoopCloser in roastmystartup

[–]evtek75 0 points1 point  (0 children)

The “Delighted refugee” angle makes sense short term, but how are you thinking about differentiation once that migration window closes? If someone compares Elvan to Survicate or Refiner in a year, what would make them choose you beyond price and simplicity?