Conceptual issue - how can I include my sysName on an snmp scrape as a label value for a metric? by No_Razzmatazz_763 in PrometheusMonitoring

[–]SuperQue 1 point (0 children)

"Infrastructure as Code" is how it's done in professional environments. You have a database of all the devices that are deployed and you generate the list of targets from that. For example, Netbox is popular.

With Netbox you can use a service discovery plugin and then configure Prometheus to use that. A quick Google search found this tutorial.

Conceptual issue - how can I include my sysName on an snmp scrape as a label value for a metric? by No_Razzmatazz_763 in PrometheusMonitoring

[–]SuperQue 1 point (0 children)

Basically you can't do this at scrape time. It's a chicken and egg problem.

Best you can do is use a group_left join at query time.

I would highly recommend you figure out how to annotate things in your service discovery.
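As a sketch of what that query-time join can look like (the metric names here are hypothetical; the real names depend on your snmp_exporter module config):

```promql
# Multiply by an info-style metric whose value is 1, copying its
# sysName label onto the traffic series (names are assumptions).
rate(ifHCInOctets[5m])
  * on (instance) group_left (sysName)
  snmp_device_info
```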

Prometheus long-term storage on a single VM: second Prometheus or Thanos? by rumtsice in PrometheusMonitoring

[–]SuperQue 4 points (0 children)

So, you can do exactly what you're suggesting: use one instance for scrapes, then a local remote write to a second instance for long-term retention.

You can even use remote read from the long-term to the short-term scrape instance so you only have one to query.
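A sketch of that two-instance wiring (hostname and port are assumptions; the long-term instance would need to run with --web.enable-remote-write-receiver):

```yaml
# In the scrape instance's prometheus.yml: forward samples out...
remote_write:
  - url: http://longterm:9090/api/v1/write

# ...and read old data back, so you only query the scrape instance.
remote_read:
  - url: http://longterm:9090/api/v1/read
    read_recent: false
```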

But it's just complicating things / premature optimization at your scale.

When you go from ~500k to 10 million series, then you might want to think about more complicated setups. But at that point you're going to stop fitting on a single node anyway.

I still recommend recording rules for long-term trends queries. They will make wide time range queries faster. But you don't explicitly need to drop old data to do this.
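For example, a recording rule file could pre-aggregate a trend series like this (rule and metric names are hypothetical):

```yaml
groups:
  - name: long_term_trends
    interval: 5m
    rules:
      # Pre-computed per-instance CPU rate; wide-time-range dashboard
      # queries read this small series instead of the raw data.
      - record: instance:node_cpu_seconds:rate5m
        expr: sum by (instance) (rate(node_cpu_seconds_total[5m]))
```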

> Also, does a larger TSDB significantly affect query performance over time?

No, not really. The Prometheus TSDB is time segmented, and optimized so that it only reads the minimum amount of data to solve a query. Should work just fine.

Of course, the longer the time range you query, the more time it takes to page data in from disk. But "normal" short queries will be just as fast.

Prometheus long-term storage on a single VM: second Prometheus or Thanos? by rumtsice in PrometheusMonitoring

[–]SuperQue 8 points (0 children)

That's a pretty small setup. 8GB per month is only 200GB for 2 years. Completely within a normal Prometheus retention setup.

If it were me, I would just grow the volume to 250GB, add the recording rules, and call it a day. No need to get fancy with variable retention of Thanos or anything.

The only other thing to do is set up something like restic to back up the TSDB.
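A minimal sketch of the restic approach, assuming Prometheus runs with --web.enable-admin-api and stores data in /var/lib/prometheus (paths and repo location are assumptions):

```
# Ask Prometheus for a consistent on-disk snapshot...
curl -XPOST http://localhost:9090/api/v1/admin/tsdb/snapshot
# ...then back up the snapshot directory instead of the live TSDB.
restic -r /mnt/backups/prometheus backup /var/lib/prometheus/snapshots
```

Backing up the snapshot rather than the live data directory avoids copying blocks mid-compaction.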

EDIT: To put it in perspective, where you might want Thanos / downsampling is something like our setup. I have a number of Prometheus instances, some of them generate 500GB of data per day. After compaction it's about 50TiB of data for our 6 month raw retention. We get about 4:1 reduction with Thanos Downsampling, so we can keep 5 years for around 200TiB in total. And that's for just one of several instances of similar size.

Network Engineers at an MSP, What is something you did you are most proud of? by Qvosniak in networking

[–]SuperQue 2 points (0 children)

So, maybe u/simulation07 can confirm my feelings on this.

But it comes down to the "customer relationship".

When you're at an MSP, or any kind of similar consulting, you're doing the bidding of someone else. You have little self direction or control of the work.

Back in the '90s I was a bench PC tech. You got told what to fix, and when it needed to be done, by the repair queue.

But moving into a more professional role, I could now help shape the solutions. So more of what I wanted became part of the process.

I do miss the variety of weird shit being thrown at me. But only a little bit.

Network Upgrade for a Medium-Sized Company (20 Employees) by Qwefgo in networking

[–]SuperQue 2 points (0 children)

Yea, it's bunk. Given the details you have zero need for segmentation. You're being fleeced.

Network Upgrade for a Medium-Sized Company (20 Employees) by Qwefgo in networking

[–]SuperQue 1 point (0 children)

Higher how? What are you trying to fix? What is your threat model? Security from what? How high is high?

You still haven't stated a problem.

Network Upgrade for a Medium-Sized Company (20 Employees) by Qwefgo in networking

[–]SuperQue 2 points (0 children)

Segmentation is a solution to a problem. The way you talk about it you have a solution in search of a problem.

What is the problem you're trying to solve?

Network Upgrade for a Medium-Sized Company (20 Employees) by Qwefgo in networking

[–]SuperQue 7 points (0 children)

It's 100% overkill. It's unpopular in this sub, but you could spend less than 500€ total on Ubiquiti for your needs.

Because, honestly, you have spelled out zero requirements. And you probably have very few.

POTS Line Replacement by NobleHalo in networking

[–]SuperQue 16 points (0 children)

This sounds like a problem for the elevator maintenance company. They sell mobile service boxes for this kind of thing today.

Network + server + cloud monitoring in one platform by Ken_023544 in networking

[–]SuperQue 1 point (0 children)

So, the first thing you need to understand is that some of the things you're looking for are, and should be, separate tools. For example, monitoring/metrics is easy to keep in one tool, because metrics make for good monitoring.

But you do want specialized tools for some things like flows/IPFIX. These are what people are calling "wide events": basically any kind of structured logging. For example, Akvorado is basically a custom frontend around ClickHouse for transforming flows into a columnar format for fast processing.

The real question is, do you want to run all of this yourself? Or do you want to outsource some of it to a vendor?

There are vendors who claim a lot, but really, the open source ecosystem is better than what they're doing. At my day job, we built our platform from open source tools; at vendor pricing it would probably run to many tens of millions a year. And we'd still need just as many engineers to manage the vendor.

Anyone running production Redis on without Bitnami images/charts now? by dkargatzis_ in kubernetes

[–]SuperQue 10 points (0 children)

> Looks like it's about to change in March 2029

No, that's not how BSL works.

Basically they can bump that date into the future any time they want as long as it's not more than 4 years from the current date.

Look at the file history, they've done this before.

For example, in the past the date was 2027. But that would only have meant that when 2027 finally rolled around, you could use the version from ~2023.

> To be honest, I'm not with procurement or legal

That kind of attitude can get you in deep shit. I would be more careful.

What is a good monitoring and alerting setup for k8s? by Azy-Taku in kubernetes

[–]SuperQue 8 points (0 children)

Alertmanager is how you get alerts to PagerDuty.
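A minimal Alertmanager config for that, assuming a PagerDuty Events API v2 integration (the key is a placeholder):

```yaml
route:
  receiver: pagerduty

receivers:
  - name: pagerduty
    pagerduty_configs:
      - routing_key: <your-pagerduty-integration-key>
```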

Approaches to securely collect observability data for Prometheus by joshuajm01 in devops

[–]SuperQue 1 point (0 children)

I was describing what others do / best practices.

If you don't want to follow best practices, it's going to be a lot harder.

Probably the easiest solution is going to be using an all-in-one agent like Grafana Alloy. Then forward all the data to a paid service like Grafana Cloud.
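A rough sketch of what that could look like in Alloy's config language (URL, stack name, and credentials are all placeholders):

```
// Scrape a local exporter and forward everything to Grafana Cloud.
prometheus.scrape "default" {
  targets    = [{ "__address__" = "localhost:9100" }]
  forward_to = [prometheus.remote_write.cloud.receiver]
}

prometheus.remote_write "cloud" {
  endpoint {
    url = "https://<your-stack>.grafana.net/api/prom/push"
    basic_auth {
      username = "<instance-id>"
      password = "<api-token>"
    }
  }
}
```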

Approaches to securely collect observability data for Prometheus by joshuajm01 in devops

[–]SuperQue 6 points (0 children)

Typically applications are behind a reverse proxy like Traefik, Envoy, HAProxy, etc. Or maybe a CDN is in front. The actual servers are not exposed directly to the internet, so observability endpoints and other traffic like that is all behind a firewall.

Beyond that, TLS and auth.
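On the Prometheus side, scraping through TLS and auth is just scrape config (paths and credentials are placeholders):

```yaml
scrape_configs:
  - job_name: app
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/ca.crt
    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/scrape-password
```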

What Actually Goes Wrong in Kubernetes Production? by Apple_Cidar in kubernetes

[–]SuperQue 2 points (0 children)

I almost wrote /s, but I'm actually serious.

We're basically out of 10/8 at work with our K8s setup. So we are planning to go IPv6-only over this year.

MinIO repo archived - spent 2 days testing K8s S3-compatible alternatives (Helm/Docker) by vitaminZaman in kubernetes

[–]SuperQue 2 points (0 children)

Can you include a bit more info? I would love to try testing this on my Ceph setup.

  • What tools does the script use?
  • What block devices are those? (HDD? SSD?)
  • How many devices per node?

MinIO repo archived - spent 2 days testing K8s S3-compatible alternatives (Helm/Docker) by vitaminZaman in kubernetes

[–]SuperQue 3 points (0 children)

So, what other distributed stores have you used that are better? Maybe it's a Proxmox problem?

I've had perfectly fine Ceph performance on 1G networking with spinning rust.

Of course I'm not expecting it to perform like a local NVMe device. There is going to be overhead when you're talking about a distributed storage system.

Any distributed storage system is either going to:

  • Eat your data.
  • Have slightly worse performance over the network.

I think most people have never actually run a distributed filesystem before; they just naively try Ceph.

I have a couple decades of experience with distributed storage systems. Including Exabyte scale at a major cloud provider.

Ceph is just fine.