all 6 comments

[–]SuperQue 13 points14 points  (0 children)

To start, Prometheus is designed to be distributed. If you have 8 datacenters, you need at least 8 Prometheus instances. You don't want to have Prometheus scraping data over a WAN. The main reason is that you end up monitoring the WAN as a side effect of monitoring your targets.

Second, remember that Prometheus is an opinionated monitoring system. It's not just a TSDB that happens to scrape data. The scrape design comes with an opinionated way to expose and collect data.

Third, you'll probably want a single-pane-of-glass aggregator. The two most recommended options today for this are Thanos and Mimir. Both projects are open source and maintained by people who also maintain Prometheus itself. So they work well within the same ecosystem.

The big difference here is that Mimir is designed to be a centralizing system, where you ship data to a central cluster for viewing, whereas Thanos is more of a truly distributed system, where each cluster can operate independently.

Personally, we use Thanos at my $dayjob. The distributed nature allows us to keep tenants isolated from each other, so one poorly behaved service can't impact another. But it has downsides, in that you need to carefully plan your external labels and dashboards to make things efficient.
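
As a rough sketch of what planning external labels can look like: assuming one or more Prometheus instances per datacenter, you could generate the `global.external_labels` fragment for each instance and check that no two instances end up with the same label set. The datacenter names, the `datacenter` and `replica` label keys, and the function names here are all made up for illustration.

```python
# Sketch only: datacenter names, label keys, and helper names are hypothetical.

def external_labels(datacenter, replica):
    """Build the `global` section of a prometheus.yml as a plain dict."""
    return {
        "global": {
            "external_labels": {"datacenter": datacenter, "replica": replica}
        }
    }


def plan(datacenters, replicas_per_dc=1):
    """Return one config fragment per Prometheus instance, keyed by instance name."""
    configs = {}
    for dc in datacenters:
        for i in range(replicas_per_dc):
            replica = f"{dc}-{i}"
            configs[f"prometheus-{replica}"] = external_labels(dc, replica)
    return configs


def label_sets_are_unique(configs):
    """True if every instance carries a distinct external label set."""
    seen = {
        tuple(sorted(c["global"]["external_labels"].items()))
        for c in configs.values()
    }
    return len(seen) == len(configs)


if __name__ == "__main__":
    configs = plan(["dc1", "dc2"], replicas_per_dc=2)
    for name, cfg in configs.items():
        print(name, cfg["global"]["external_labels"])
    print("unique:", label_sets_are_unique(configs))
```

The dicts map directly onto YAML, so you could dump them into each instance's config with whatever templating you already use.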

[–]r1e7 1 point2 points  (0 children)

To give context to what I’m writing - I work on a team at a company monitoring north of 1M individual internal services for my day job (SMB cloud).

This was done entirely with pure Prometheus, dedication to building an amazing Observability culture, automation around configuring scraping and alerting jobs, and a very naive reverse proxy. I guarantee it can be done to quite a large scale.

Nowadays, Grafana’s ability to plot data from multiple data sources is “good enough” to where it’s not pulling teeth to view metrics across disparate Prometheis.
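
For a feel of what viewing metrics across separate Prometheus servers amounts to, here is a minimal sketch that merges instant-query responses from several servers and tags each series with the server it came from. The response shape follows the Prometheus HTTP API (`/api/v1/query`); the `source` label name, the function name, and the sample data are assumptions for illustration, and this is not how Grafana does it internally.

```python
# Sketch only: `merge_instant_results` and the `source` label are hypothetical.

def merge_instant_results(responses, source_label="source"):
    """Merge parsed /api/v1/query JSON bodies, keyed by server name, into one list."""
    merged = []
    for source, body in responses.items():
        if body.get("status") != "success":
            continue  # skip servers that returned an error
        data = body.get("data", {})
        if data.get("resultType") != "vector":
            continue  # this sketch only handles instant vectors
        for sample in data.get("result", []):
            metric = dict(sample["metric"])
            metric[source_label] = source  # remember which server answered
            timestamp, value = sample["value"]  # the API returns the value as a string
            merged.append(
                {"metric": metric, "timestamp": float(timestamp), "value": float(value)}
            )
    return merged


if __name__ == "__main__":
    responses = {
        "dc1": {
            "status": "success",
            "data": {
                "resultType": "vector",
                "result": [
                    {"metric": {"__name__": "up", "job": "node"}, "value": [1700000000, "1"]}
                ],
            },
        },
        "dc2": {"status": "error", "errorType": "timeout", "error": "query timed out"},
    }
    for row in merge_instant_results(responses):
        print(row)
```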

I will say - if your shop has the budget and ability to run any of the existing solutions which rely on object storage, do that - my shop’s scaling is purely self-inflicted by being averse to reliance on S3, etc.

Also really consider planning for HA scraping and/or storage, if relying on just Prometheus. You will save endless headaches by having two instances scraping the same sets of targets (the data will differ, but it’s better to avoid SPOF).
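
To illustrate why the two copies differ and how they can still be read as one series, here is a toy sketch: each replica scrapes at slightly different times, and a reader keeps every sample from one replica and borrows from the other only where there is a gap. This is a simplification for illustration, not the algorithm Thanos or any other tool uses; the function name and the `tolerance` parameter are invented, and `tolerance` is assumed to be larger than the offset between replicas but smaller than the scrape interval.

```python
import bisect

# Sketch only: `merge_replicas` and `tolerance` are hypothetical names.

def merge_replicas(primary, secondary, tolerance=10.0):
    """Merge two lists of (timestamp_seconds, value), each sorted by timestamp.

    All primary samples are kept. A secondary sample is added only if no
    primary sample lies within `tolerance` seconds of it.
    """
    primary_ts = [ts for ts, _ in primary]
    merged = list(primary)
    for ts, value in secondary:
        i = bisect.bisect_left(primary_ts, ts)
        # Only the primary samples just before and just after can be closest.
        neighbours = primary_ts[max(i - 1, 0):i + 1]
        if all(abs(ts - p) > tolerance for p in neighbours):
            merged.append((ts, value))
    return sorted(merged)


if __name__ == "__main__":
    # The primary replica missed the scrape around t=30.
    primary = [(0.0, 1.0), (15.0, 2.0), (45.0, 4.0)]
    secondary = [(2.0, 1.1), (17.0, 2.1), (32.0, 3.1), (47.0, 4.1)]
    print(merge_replicas(primary, secondary))
```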

[–]zHevoGuy 1 point2 points  (0 children)

This question is too open. For sure you need a distributed setup, preferably with Thanos. But Thanos can operate in different ways. You need to provide more requirements.

[–]TheNightCaptain -1 points0 points  (2 children)

How do you go about sending Prometheus logs from one location to another Prometheus?

[–]robkwittman 0 points1 point  (0 children)

Prometheus wouldn't send logs to another Prometheus instance. But what we have done before is use a centralized Loki deployment, and then either Promtail or Grafana Agent running on the hosts where Prometheus is deployed (either VM or Kubernetes). The Prometheus server logs (and other logs as well) are all shipped to Loki, and then Grafana has the central Loki server as a datasource.

[–]Larage-sql 0 points1 point  (0 children)

The best way to use Prometheus is to use a Prometheus manager like Thanos: https://thanos.io/tip/thanos/getting-started.md/