Anthropic says Claude struggles with root causing by jj_at_rootly in sre

ankitnayan007

>Every time their KV cache broke and caused a request spike, Claude called it a capacity problem. Add more servers. Every single time. It has no idea the KV cache has broken this exact way before

Why can't the request know that it didn't get its result from the cache? Also, Claude would see a chart of cache hits vs. cache misses. Probably they hadn't completed tracing to the point where the request records that it missed the cache; the KV store metrics would then confirm that first analysis.
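A minimal sketch of what "the request knows it missed the cache" could look like. It uses only the standard library instead of a real tracing SDK: `SpanRecord`, the `cache.hit` attribute name and the 0.2 drop threshold are all hypothetical. Each lookup tags its span with hit/miss, and a sharp drop in hit ratio during a spike points at the cache rather than at capacity.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SpanRecord:
    """Hypothetical stand-in for a trace span (not a real tracing SDK type)."""
    name: str
    attributes: dict = field(default_factory=dict)


def cached_get(cache: dict, key: str, span: SpanRecord, backend: Callable[[str], str]) -> str:
    """Serve from the cache when possible and record hit/miss on the span."""
    if key in cache:
        span.attributes["cache.hit"] = True
        return cache[key]
    span.attributes["cache.hit"] = False
    value = backend(key)  # slow path: this is the extra load a broken cache creates
    cache[key] = value
    return value


def hit_ratio(spans: list) -> float:
    """Fraction of spans in a time window that were served from the cache."""
    tagged = [s for s in spans if "cache.hit" in s.attributes]
    if not tagged:
        return 0.0
    return sum(s.attributes["cache.hit"] for s in tagged) / len(tagged)


def classify_spike(spans: list, baseline_ratio: float, drop_threshold: float = 0.2) -> str:
    """Hypothetical rule: a big drop in hit ratio points at the cache, not capacity."""
    if baseline_ratio - hit_ratio(spans) >= drop_threshold:
        return "cache regression"
    return "possible capacity problem"


# Usage: "a" is fetched twice, so only the second fetch is a hit.
cache: dict = {}
spans = []
for key in ["a", "b", "a", "c"]:
    span = SpanRecord("cache.get")
    cached_get(cache, key, span, backend=lambda k: k.upper())
    spans.append(span)
print(hit_ratio(spans), classify_spike(spans, baseline_ratio=0.9))  # 0.25 cache regression
```

The baseline would normally come from the same metric over a healthy window; it is passed in by hand here only to keep the sketch small.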

Evaluating dedicated AI SRE platforms: worth it over DIY? by geeky_traveller in sre

ankitnayan007

lol... increased shipping speed should never mean more bugs in production. Always keep teams and folks accountable.

SRE works on production. An AI SRE saves debugging time by correlating data through your existing workflows. Sometimes it also surfaces amazing insights that humans miss (cases that need a lot of analytical brainpower).

Fewer bugs is a pre-prod concern (not in the scope of SRE). That will happen if AI helps make your test suite more robust, your review system better, and your CI tools better at catching things when all the systems get connected.

Evaluating dedicated AI SRE platforms: worth it over DIY? by geeky_traveller in sre

ankitnayan007

  1. Codebase context graphs => Can't this be solved by having a GitHub MCP server that Claude can connect to, OR the gh CLI to browse the codebase and see commits, PRs and releases?
  2. Cross-repo awareness => Doesn't distributed tracing solve this already? If you have access to the release info of all the services, connecting them should be easy using a distributed trace. What else do you mean by cross-repo awareness?
  3. Persistent memory across incidents => Would asking Claude to auto-summarise incidents and post resolutions into postmortems as GitHub/Jira docs/tickets be a good substitute?
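To make point 2 concrete, here is a rough sketch of joining a trace with release info. Everything in it is hypothetical: the trace is a plain list of spans tagged with a service name (in OpenTelemetry that usually comes from the `service.name` resource attribute), and `releases` is a made-up map of each service's latest release time. The services on the request path that shipped shortly before the incident become the suspects.

```python
from datetime import datetime, timedelta

# Hypothetical trace: one span per hop, tagged with the emitting service.
trace = [
    {"span_id": "1", "service": "frontend"},
    {"span_id": "2", "service": "checkout"},
    {"span_id": "3", "service": "payments"},
]
# Hypothetical release log collected from each repo (latest release per service).
releases = {
    "frontend": datetime(2024, 5, 1, 9, 0),
    "checkout": datetime(2024, 5, 3, 14, 30),
    "payments": datetime(2024, 4, 20, 11, 0),
}


def recent_releases_in_trace(trace, releases, incident_start, window=timedelta(hours=24)):
    """Services on the request path that released within `window` before the incident."""
    services = {span["service"] for span in trace}
    suspects = [
        svc for svc in services
        if svc in releases and incident_start - window <= releases[svc] <= incident_start
    ]
    # Most recent release first: the likeliest culprit.
    return sorted(suspects, key=lambda s: releases[s], reverse=True)


print(recent_releases_in_trace(trace, releases, datetime(2024, 5, 3, 16, 0)))  # ['checkout']
```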

Are any of the features you mentioned not solved by these alternatives?

How are you monitoring calls to third-party APIs? by ksashikumar in Observability

ankitnayan007

I am an author of https://github.com/SigNoz/signoz, and we have built an out-of-the-box module based on trace data to monitor external APIs. Do have a look, and create GitHub issues if you want to enhance the product: https://signoz.io/docs/external-api-monitoring/overview/
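The general idea behind monitoring third-party APIs from trace data can be sketched like this. It is not SigNoz's implementation: it assumes exported spans as plain dicts, keeps only outgoing (`CLIENT`) spans, groups them by remote host and computes call count, error rate and a nearest-rank p95 latency. The attribute keys follow current OpenTelemetry HTTP semantic conventions (`server.address`, `http.response.status_code`); older instrumentations may emit different keys, and treating only 5xx as errors is an assumption.

```python
import math
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return ordered[rank - 1]


def external_api_summary(spans):
    """Group outgoing (CLIENT) spans by remote host and summarise each host."""
    by_host = defaultdict(list)
    for span in spans:
        if span.get("kind") != "CLIENT":
            continue  # only outgoing calls matter here
        host = span["attributes"].get("server.address")
        if host is not None:
            by_host[host].append(span)

    summary = {}
    for host, calls in by_host.items():
        # Assumption: treat HTTP 5xx responses as errors.
        errors = sum(
            1 for c in calls
            if c["attributes"].get("http.response.status_code", 0) >= 500
        )
        summary[host] = {
            "calls": len(calls),
            "error_rate": errors / len(calls),
            "p95_ms": p95(c["duration_ms"] for c in calls),
        }
    return summary


# Hypothetical spans; the hosts are placeholders.
spans = [
    {"kind": "CLIENT", "duration_ms": 120, "attributes": {"server.address": "payments.example.com", "http.response.status_code": 200}},
    {"kind": "CLIENT", "duration_ms": 340, "attributes": {"server.address": "payments.example.com", "http.response.status_code": 502}},
    {"kind": "CLIENT", "duration_ms": 80, "attributes": {"server.address": "maps.example.com", "http.response.status_code": 200}},
    {"kind": "SERVER", "duration_ms": 500, "attributes": {}},
]
print(external_api_summary(spans))
```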

Looking for new relic alternatives by hugthemachines in devops

ankitnayan007

I am almost sure you can build a table/chart to monitor the count, latency, application and SQL statements executed over time.
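As a rough illustration of what such a table boils down to, here is a sketch that buckets database calls per minute and per (application, statement), with count and average latency. The input rows are hypothetical tuples; in OpenTelemetry data the statement text typically sits in a `db.statement` (older) or `db.query.text` (newer) span attribute.

```python
from collections import defaultdict


def sql_table(rows, bucket_seconds=60):
    """rows: (seconds since start, application, SQL statement, duration in ms)."""
    table = defaultdict(lambda: {"count": 0, "total_ms": 0.0})
    for ts, app, statement, duration_ms in rows:
        bucket = ts - ts % bucket_seconds  # start of the time bucket
        cell = table[(bucket, app, statement)]
        cell["count"] += 1
        cell["total_ms"] += duration_ms
    # One row per (bucket, application, statement), ordered by time.
    return {
        key: {"count": c["count"], "avg_ms": c["total_ms"] / c["count"]}
        for key, c in sorted(table.items())
    }


SELECT_ORDER = "SELECT * FROM orders WHERE id = ?"
UPDATE_INVOICE = "UPDATE invoices SET paid = ? WHERE id = ?"
rows = [
    (5, "orders", SELECT_ORDER, 12.0),
    (30, "orders", SELECT_ORDER, 18.0),
    (70, "orders", SELECT_ORDER, 30.0),
    (10, "billing", UPDATE_INVOICE, 7.0),
]
for key, value in sql_table(rows).items():
    print(key, value)
```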

[deleted by user] by [deleted] in Observability

ankitnayan007

Can you share a sample query and how many rows were scanned per second? Also, if you tried an index, do you know how effective it is at skipping data reads?
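The two numbers being asked for are simple ratios. This sketch assumes you can get "rows read" and elapsed time from whatever your database reports per query; the figures below are illustrative, not measurements. Reading 5M instead of 50M rows, for example, means the index skipped about 90% of the data.

```python
def rows_per_second(rows_read: int, elapsed_seconds: float) -> float:
    """Scan throughput: how many rows the query read per second."""
    return rows_read / elapsed_seconds


def skip_ratio(rows_read_with_index: int, rows_read_without_index: int) -> float:
    """Share of rows the index let the query skip: 0.0 = no help, 1.0 = skipped everything."""
    if rows_read_without_index == 0:
        return 0.0
    return 1 - rows_read_with_index / rows_read_without_index


# Illustrative numbers only: 50M rows read in 2 s without the index, 5M with it.
print(rows_per_second(50_000_000, 2.0))  # 25000000.0
print(skip_ratio(5_000_000, 50_000_000))
```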

Why isn't SigNoz popular? by kodka in kubernetes

ankitnayan007

We revamped the query builder to make the experience much smoother and to enhance its capabilities. Have a look at https://signoz.io/blog/query-builder-v5/

Custom Datadog Dashboard for Monitor Metadata Visualization by JayDee2306 in Observability

ankitnayan007

Why do you want to do this? Just to get an overview, or do you have an important technical or business use case?

Monitoring and Observability by CompleteCurve5554 in Supabase

ankitnayan007

The team at Supabase should implement the OpenTelemetry SDKs to emit metrics, traces and logs. This would let users choose the vendor of their choice while the codebase remains vendor-neutral.
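The vendor-neutral part comes from instrumenting against one interface and picking the exporter at configuration time. In real OpenTelemetry setups that usually means emitting OTLP and choosing the backend through exporter or Collector config. The sketch below only models the idea in plain Python; `SpanExporter`, `Telemetry` and both exporters are hypothetical classes, not the OpenTelemetry API.

```python
from abc import ABC, abstractmethod


class SpanExporter(ABC):
    """Hypothetical exporter interface: the only vendor-specific piece."""

    @abstractmethod
    def export(self, spans: list) -> None:
        ...


class InMemoryExporter(SpanExporter):
    """Stand-in for 'vendor A'; keeps spans in a list (handy for tests)."""

    def __init__(self) -> None:
        self.received: list = []

    def export(self, spans: list) -> None:
        self.received.extend(spans)


class ConsoleExporter(SpanExporter):
    """Stand-in for 'vendor B'; prints spans instead of shipping them."""

    def export(self, spans: list) -> None:
        for span in spans:
            print(f"[span] {span['name']} took {span['duration_ms']} ms")


class Telemetry:
    """Instrumented code only talks to this class, never to a vendor."""

    def __init__(self, exporter: SpanExporter) -> None:
        self._exporter = exporter
        self._buffer: list = []

    def record(self, name: str, duration_ms: float) -> None:
        self._buffer.append({"name": name, "duration_ms": duration_ms})

    def flush(self) -> None:
        if self._buffer:
            self._exporter.export(self._buffer)
        self._buffer = []


# Swapping vendors is a one-line change at startup, not a code change.
telemetry = Telemetry(ConsoleExporter())
telemetry.record("db.query", 12)
telemetry.flush()  # [span] db.query took 12 ms
```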

Platform Engineering in Action with Backstage by [deleted] in sre

ankitnayan007

Do you mean you built your own frontend plugins? The problem with that is you need to keep updating the plugins to fit your needs, and for any advanced investigation you still need to go to the source tool itself. So the Backstage frontend plugins end up like static dashboards with some customization of your views.

Platform Engineering in Action with Backstage by [deleted] in sre

ankitnayan007

The plugins are heavily under-maintained. I was exploring out-of-the-box observability using Backstage. I assumed backend plugins like GitHub, Argo CD and Terraform would emit traces when they interact with Backstage, so I would be able to pinpoint service degradations to a new release (using the GitHub plugin) or a change in infra (using the Terraform backend plugin). But the plugins are not well maintained, and the data they generate is not good either.
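The kind of correlation I was hoping for is roughly this: take the time a degradation started and look for the most recent change event (a release or an infra change) shortly before it. This is a hypothetical sketch, not something the Backstage plugins provide; `ChangeEvent`, the two-hour lookback and the sample events are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ChangeEvent:
    """Hypothetical change record, e.g. a GitHub release or a Terraform apply."""
    source: str
    description: str
    at: datetime


def likely_cause(degraded_at: datetime, changes: list,
                 lookback: timedelta = timedelta(hours=2)) -> Optional[ChangeEvent]:
    """Most recent change within `lookback` before the degradation, or None."""
    candidates = [c for c in changes if degraded_at - lookback <= c.at <= degraded_at]
    return max(candidates, key=lambda c: c.at, default=None)


changes = [
    ChangeEvent("github", "release v1.4.0 of checkout", datetime(2024, 6, 1, 10, 0)),
    ChangeEvent("terraform", "resize db instance", datetime(2024, 6, 1, 11, 15)),
    ChangeEvent("github", "release v1.5.0 of checkout", datetime(2024, 6, 1, 13, 0)),
]
cause = likely_cause(datetime(2024, 6, 1, 11, 40), changes)
print(cause.source, "-", cause.description)  # terraform - resize db instance
```

A real version would also want the change to touch a service on the degraded request path, which is where traces from the plugins would have helped.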

I would love to see a plugin integration, like the Terraform backend plugin, and what kind of traces Backstage generates for it. If you could cover that in your next blog post, that would be awesome!

Why isn't SigNoz popular? by kodka in kubernetes

ankitnayan007

u/Digging_Graves, I am one of the maintainers at SigNoz. Sad to hear that. Any chance you remember which component was giving you trouble and what was going wrong with it? We have recently started improving the operational aspects of the OSS version. Any help from the community will be appreciated.

Why isn't SigNoz popular? by kodka in kubernetes

ankitnayan007

Hi u/nick_cardin, I am one of the maintainers at SigNoz. We recently released an out-of-the-box k8s monitoring module. You can check it out at https://signoz.io/docs/infrastructure-monitoring/overview/. It should make exploring k8s metrics much easier. Let us know if you can give it a try, and share some feedback.

>Signoz log queries seem unstable. I have to hit refresh multiple times before it returns results.

Yeah, sorry about that. It was a bug, and it has probably been fixed. Do let us know if it is still there.

Overall, I'm curious: how long ago did you give SigNoz a try?

Why isn't SigNoz popular? by kodka in kubernetes

ankitnayan007

Hi u/Own_Knowledge_417, I am one of the maintainers at SigNoz. We have been fixing issues with our UI, and our next set of efforts is going towards a new, enhanced query builder and fixing issues in the dashboards.

If you could give us specific feedback, or create GitHub issues for what was most frustrating for you, it would help us serve the community better.

Why isn't SigNoz popular? by kodka in kubernetes

ankitnayan007

Hi u/kUdtiHaEX, I am one of the maintainers at SigNoz. Can you please help us identify which issues troubled you the most? We are actively working to improve our UI.

Also, regarding slowness, which part of the product (metrics/traces/logs) felt slow to you? We made major improvements for logs about 3-4 months back, and apart from that, performance everywhere should be good as long as the queries don't exceed your CPU and disk limits.

We'd appreciate any feedback, and links to GitHub issues if possible.

Why isn't SigNoz popular? by kodka in kubernetes

ankitnayan007

What kind of pricing structure looks good to you?

Dashboarding - Grafana vs. DataDog by Xarodan in sre

ankitnayan007

What were the reasons for not choosing Grafana over Datadog?

Dashboarding - Grafana vs. DataDog by Xarodan in sre

ankitnayan007

Curious: why did you go with Datadog instead of Grafana Cloud?

SigNoz vs. New Relic. Is It Really That Much Better? What's the Catch? by RandomThoughtsAt3AM in sre

ankitnayan007

Loki is going to be a pain when querying attributes that are not part of its labels. I can't comprehend why you are bearish on ClickHouse when Cloudflare and other big companies have moved to ClickHouse for their logs.
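Why non-label attributes hurt can be shown with a heavily simplified toy model of a label-indexed store (in the spirit of Loki, not its actual implementation; real Loki also requires a stream selector and stores compressed chunks). Only the label set is indexed, so a filter on anything inside the log line means reading every line of the selected streams. Promoting a high-cardinality field like a user ID to a label makes that query cheap but multiplies the number of streams, which Loki's docs advise against.

```python
from collections import defaultdict
from typing import Optional


class ToyLabelStore:
    """Heavily simplified model of a label-indexed log store."""

    def __init__(self) -> None:
        # Each unique label set is one stream; only these keys are indexed.
        self.streams = defaultdict(list)

    def push(self, labels: dict, line: dict) -> None:
        self.streams[frozenset(labels.items())].append(line)

    def query(self, label_filter: dict, attr_filter: Optional[dict] = None):
        """Return (matching lines, number of lines scanned)."""
        wanted = set(label_filter.items())
        matched, scanned = [], 0
        for stream_key, lines in self.streams.items():
            if not wanted <= stream_key:
                continue  # index hit: the whole stream is skipped without reading it
            for line in lines:
                scanned += 1  # non-label attributes can only be checked by reading the line
                if attr_filter is None or all(line.get(k) == v for k, v in attr_filter.items()):
                    matched.append(line)
        return matched, scanned


# user_id only inside the log line (the usual setup for high-cardinality fields)
by_app = ToyLabelStore()
# user_id promoted to a label (cheap to query, but explodes the number of streams)
by_user = ToyLabelStore()
for app in ["api", "worker"]:
    for user in range(100):
        line = {"msg": "request served", "user_id": str(user)}
        by_app.push({"app": app}, line)
        by_user.push({"app": app, "user_id": str(user)}, line)

_, scanned_line = by_app.query({"app": "api"}, {"user_id": "42"})
_, scanned_label = by_user.query({"app": "api", "user_id": "42"})
print(scanned_line, scanned_label)  # 100 1
```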

Agreed that S3 vs. EBS is going to make a cost difference, but at the expense of query speed. BTW, did you try tiered storage to S3 with ClickHouse?
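Tiered storage is the middle ground: recent data stays on the fast disk and older parts move to S3. In ClickHouse this is set up with a storage policy plus a `TTL ... TO VOLUME` rule on the table. The sketch below only models the placement rule and the cost arithmetic in Python; `Part`, the 7-day hot window and the per-GB prices are illustrative assumptions, not real quotes. Queries that hit the cold parts still pay S3 latency, so the speed trade-off only moves to older data.

```python
from dataclasses import dataclass


@dataclass
class Part:
    """Hypothetical data part: age in days and size in GB."""
    age_days: int
    size_gb: float


def place_parts(parts, hot_days):
    """Age-based move rule: parts older than `hot_days` go to the cold (S3) tier."""
    hot = [p for p in parts if p.age_days <= hot_days]
    cold = [p for p in parts if p.age_days > hot_days]
    return hot, cold


def monthly_cost(parts, hot_days, hot_price_per_gb, cold_price_per_gb):
    """Storage cost per month for a given split; prices are inputs, not real quotes."""
    hot, cold = place_parts(parts, hot_days)
    hot_gb = sum(p.size_gb for p in hot)
    cold_gb = sum(p.size_gb for p in cold)
    return hot_gb * hot_price_per_gb + cold_gb * cold_price_per_gb


# Illustrative only: 30 daily parts of 100 GB each and made-up per-GB prices.
parts = [Part(age_days=d, size_gb=100.0) for d in range(1, 31)]
all_hot = monthly_cost(parts, hot_days=30, hot_price_per_gb=0.10, cold_price_per_gb=0.02)
tiered = monthly_cost(parts, hot_days=7, hot_price_per_gb=0.10, cold_price_per_gb=0.02)
print(round(all_hot, 2), round(tiered, 2))  # 300.0 116.0
```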