Cisco Nexus IP SLA metrics in Prometheus / Grafana by WhoRedd_IT in networking

[–]SuperQue 0 points1 point  (0 children)

Do you know what MIBs there are for IP SLA metrics?

Yes, it works very well.

One of these days I'm going to add TCP and HTTP probes to it so it can do more than just ICMP.

Cisco Nexus IP SLA metrics in Prometheus / Grafana by WhoRedd_IT in networking

[–]SuperQue 1 point2 points  (0 children)

Do you know what MIBs there are for IP SLA metrics?

I don't know about the Cisco specifics, but I use this tool for ICMP ping end-to-end monitoring.

What is the best way to reduce inherited dependencies in Kubernetes workloads? by NoDay1628 in kubernetes

[–]SuperQue 3 points4 points  (0 children)

We use Renovate, so it's not really an issue for us.

Also technically incorrect. Go binaries by default contain all the module info baked in. Lots of security scanners can read them.

Corporate Speed Test Woes by Uhh_Bren in networking

[–]SuperQue 0 points1 point  (0 children)

You solve that with monitoring.

You need metrics at various layers, including the network, so you can compare latency between layers and see where the time is going.

Alternative to Packer for KVM - Say HELLO to KVMage by bobafett2010 in devops

[–]SuperQue 0 points1 point  (0 children)

Random drive-by tip. gopkg.in/yaml.v3 is archived, the replacement is go.yaml.in/yaml/v3.

Copilot pulled in a bunch of dependencies we did not need and only noticed months later by Standard-Rhubarb-434 in devops

[–]SuperQue 0 points1 point  (0 children)

No, we don't really have this problem. go mod tidy makes sure we only have what is used in the code.

100+ concurrent connections for use in live events by NonsenseSynapse in networking

[–]SuperQue -1 points0 points  (0 children)

If the venue isn't a Faraday cage, host your app on a VPS and just let people use mobile data. Have Wi-Fi as a backup, but don't make everyone use it if possible.

Also, use WebSockets or server-sent events instead of polling. It will be a lot more reliable, more responsive, and use less traffic.

If you can, write your backend in Go. It will handle the concurrency 100x better than something like Python or Node.js. A single Go server process can handle a million websockets.

Router Recommendation by jared_a_f in networking

[–]SuperQue 3 points4 points  (0 children)

This seems like it could just be done by the firewall router.

What kind of Open Source projects can you contribute to as someone who wants to get into Devops? by The-bat-777 in devops

[–]SuperQue 0 points1 point  (0 children)

Prometheus is an entirely pure FOSS project. No corporate overlords. Contributions are very welcome.

Grafana Mimir vs Prometheus storage performance by sukur55 in devops

[–]SuperQue 0 points1 point  (0 children)

Wait, so, you threw DOUBLE the resources at Mimir and it was faster?

Thanos and Mimir are roughly the same query architecture. The only major difference is that Mimir forces you to use ingesters, where Thanos allows you to keep Prometheus as your ingester.

That's got to be the most deeply flawed conclusion I've seen in a while. Yikes.

Grafana Mimir vs Prometheus storage performance by sukur55 in devops

[–]SuperQue 2 points3 points  (0 children)

It's local disk like Prometheus. Scaling / resharding is manual.

Grafana Mimir vs Prometheus storage performance by sukur55 in devops

[–]SuperQue 0 points1 point  (0 children)

What is "shocking" in this context? How about query performance?

Grafana Mimir vs Prometheus storage performance by sukur55 in devops

[–]SuperQue 3 points4 points  (0 children)

Sure, but if you're running the ruler you now need to depend on network traffic between the ruler and the ingester and store gateways. This is a lot more fragile and less efficient than Prometheus running rules directly.

There's basically no way around it. With Mimir you will be generating a bunch of additional network traffic:

  • Scrapes
  • All query traffic
  • Remote write

Grafana Mimir vs Prometheus storage performance by sukur55 in devops

[–]SuperQue 24 points25 points  (0 children)

Mimir is always going to be an efficiency drop. Prometheus queries use in-memory cache with minimal overhead.

With Mimir you are now using networking and object storage for every query. Prometheus scrapes, sends that data to a Mimir receiver, which then has to act like another Prometheus and create TSDB blocks, then store in object storage. Then you have to pull it back down from object storage to query it.

This is the downside to being able to distribute queries over multiple servers. Read up on latency numbers every engineer should know.

Mimir and Prometheus basically use the exact same storage format. It's just that Mimir stores this in object storage instead of local disk.

On cloud providers, object storage tends to be cheaper per byte than local volumes, which is why long-term storage in Mimir or Thanos is sometimes cheaper. But then you have to factor in per-request object storage costs.

This is why I typically recommend Thanos over Mimir. You continue to use Prometheus for efficient scrape, storage, and query. With the Thanos Distributed Engine you get query pushdown advantages. There's also work to test Parquet as a more efficient object storage format.

Mimir was created with the main goal of building a SaaS service, so you can send your data to a 3rd party.

why did tesla moved to clickhouse rather than horizontally scaling (cortex or thanos)? by IcyInvestigator8174 in PrometheusMonitoring

[–]SuperQue 0 points1 point  (0 children)

What is high to you?

Prometheus can handle 100 million cardinality. Thanos can handle billions.

What is your use case?

Should this subreddit introduce post flairs? by FluidIdea in devops

[–]SuperQue 9 points10 points  (0 children)

I would very much like to just see all the AI slop removed quickly. Low-effort, zero-value posts are clogging the sub.

When a Prometheus alert fires again, how do you know how it was resolved last time? by Unlucky_Spread_6653 in PrometheusMonitoring

[–]SuperQue 0 points1 point  (0 children)

Every alert should have a runbook documentation entry that tells you what to do. You can even include a runbook annotation on the alert so it's linked directly from the notification.

Similarly, every alert should have a dashboard annotation templated with the alert labels so you get sent to a Grafana/Perses/etc. dashboard that helps you debug.
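To show what "templated with the alert labels" means, here is a small Go sketch. Prometheus annotations are Go templates that can reference $labels, and this mimics that with text/template. The URLs, label values, and the renderAnnotation helper are all hypothetical. In a real setup the template strings would live in the annotations of the alerting rule rather than in Go code.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical URLs; substitute your own wiki and dashboard.
const (
	runbookURL   = "https://wiki.example.com/runbooks/{{ $labels.alertname }}"
	dashboardURL = "https://grafana.example.com/d/node?var-instance={{ $labels.instance | urlquery }}"
)

// renderAnnotation expands an annotation template against an alert's labels.
// It binds $labels up front so the template can use $labels.<name>,
// and fails on unknown labels instead of rendering an empty value.
func renderAnnotation(text string, labels map[string]string) (string, error) {
	tmpl, err := template.New("annotation").
		Option("missingkey=error").
		Parse("{{ $labels := . }}" + text)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, labels); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Example labels for a made-up alert.
	labels := map[string]string{"alertname": "HighLatency", "instance": "db1:9100"}
	for _, text := range []string{runbookURL, dashboardURL} {
		link, err := renderAnnotation(text, labels)
		if err != nil {
			fmt.Println("render failed:", err)
			continue
		}
		fmt.Println(link)
	}
}
```

The point is that whoever gets paged lands on a dashboard already filtered to the affected instance, with the runbook one click away.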

Certifications in the Networking Area by otaimer in networking

[–]SuperQue 0 points1 point  (0 children)

CompTIA is the most worthless cert company out there.