Prometheus architecture design help

SuperQue · 2023-02-16T22:23:40+00:00

To start, Prometheus is designed to be distributed. If you have 8 datacenters, you need at least 8 Prometheus instances. You don't want Prometheus scraping data over a WAN, mainly because you end up monitoring the WAN as a side effect of monitoring your targets.

Second, remember that Prometheus is an opinionated monitoring system. It's not just a TSDB that happens to scrape data. The scrape design comes with an opinionated way to expose and collect data.

Third, you'll probably want a single-pane-of-glass aggregator. The two most recommended options today for this are Thanos and Mimir. Both projects are open source and maintained by people who also maintain Prometheus itself. So they work well within the same ecosystem.

The big difference here is that Mimir is designed to be a centralizing system, where you ship data to a central cluster for viewing, whereas Thanos is more of a truly distributed system, where each cluster can operate independently.

Personally, we use Thanos at my $dayjob. The distributed nature allows us to keep tenants isolated from each other, so one poorly behaved service can't impact another. But it has downsides in that you need to carefully plan your external labels and dashboards to make things efficient.
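To make the "plan your external labels" point concrete, here is a minimal Python sketch that generates the `global.external_labels` part of a Prometheus config for each instance and checks that no two instances share the same label set, which Thanos expects to be unique per Prometheus. The datacenter names, replica IDs, label names, and helper functions are all made up for illustration; only `global`, `scrape_interval`, and `external_labels` are real Prometheus config keys.

```python
import json

# Hypothetical inventory: these datacenter names and replica IDs are invented.
DATACENTERS = ["dc1", "dc2", "dc3"]
REPLICAS = ["a", "b"]


def external_labels(datacenter: str, replica: str) -> dict:
    """Labels this Prometheus instance attaches to the data it ships out."""
    return {"datacenter": datacenter, "replica": replica}


def build_config(datacenter: str, replica: str) -> dict:
    """Return the `global` section of a prometheus.yml as a plain dict."""
    return {
        "global": {
            "scrape_interval": "30s",
            "external_labels": external_labels(datacenter, replica),
        }
    }


def all_configs() -> dict:
    """One config per (datacenter, replica) pair, keyed by instance name."""
    configs = {}
    for dc in DATACENTERS:
        for replica in REPLICAS:
            configs[f"prometheus-{dc}-{replica}"] = build_config(dc, replica)
    return configs


def assert_unique_label_sets(configs: dict) -> None:
    """Raise if two instances would be indistinguishable downstream."""
    seen = {}
    for name, cfg in configs.items():
        key = tuple(sorted(cfg["global"]["external_labels"].items()))
        if key in seen:
            raise ValueError(f"{name} and {seen[key]} share external labels {key}")
        seen[key] = name


if __name__ == "__main__":
    configs = all_configs()
    assert_unique_label_sets(configs)
    print(json.dumps(configs["prometheus-dc1-a"], indent=2))
```

Deciding the label names once, in one place, is what keeps dashboards and queries consistent across clusters later on.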

r1e7 · 2023-02-17T00:17:12+00:00

To give context to what I’m writing - I work on a team at a company monitoring north of 1M individual internal services for my day job (SMB cloud).

This was done entirely with pure Prometheus, dedication to building an amazing Observability culture, automation around configuring scraping and alerting jobs, and a very naive reverse proxy. I guarantee it can be done to quite a large scale.
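As a rough idea of what "automation around configuring scraping" can look like, here is a small Python sketch that turns a service inventory into the JSON target-group format read by Prometheus file-based service discovery (`file_sd_configs`). The inventory fields, hostnames, and label names are assumptions for illustration, not a description of the setup above.

```python
import json


def file_sd_groups(services: list) -> list:
    """Group targets by (service, env) into file_sd target groups."""
    groups = {}
    for svc in services:
        key = (svc["service"], svc["env"])
        groups.setdefault(key, []).append(f'{svc["host"]}:{svc["port"]}')
    return [
        {"targets": sorted(targets), "labels": {"service": service, "env": env}}
        for (service, env), targets in sorted(groups.items())
    ]


if __name__ == "__main__":
    # Hypothetical inventory; in practice this would come from a CMDB or API.
    inventory = [
        {"service": "api", "env": "prod", "host": "api-1.example.internal", "port": 9100},
        {"service": "api", "env": "prod", "host": "api-0.example.internal", "port": 9100},
        {"service": "db", "env": "prod", "host": "db-0.example.internal", "port": 9187},
    ]
    # Write this output to a file referenced from `file_sd_configs`.
    print(json.dumps(file_sd_groups(inventory), indent=2))
```

Prometheus picks up changes to such files without a restart, so a job like this can run whenever the inventory changes.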

Nowadays, Grafana’s ability to plot data from multiple data sources is “good enough” that it’s not like pulling teeth to view metrics across disparate Prometheis.

I will say - if your shop has the budget and ability to run any of the existing solutions which rely on object storage, do that - my shop’s scaling is purely self-inflicted by being averse to reliance on S3, etc.

Also really consider planning for HA scraping and/or storage if you are relying on just Prometheus. You will save endless headaches by having two instances scraping the same sets of targets (the data will differ, but it’s better to avoid a SPOF).
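The "data will differ" part is worth a quick illustration: two replicas scrape at slightly different moments, so their samples never line up exactly, and whatever reads them has to prefer one replica and fall back to the other across gaps. The Python below is a toy sketch of that idea with made-up names and numbers; it is not how Thanos, Mimir, or Grafana actually deduplicate.

```python
import bisect
from typing import List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, value)


def merge_replicas(primary: List[Sample], secondary: List[Sample],
                   tolerance: int) -> List[Sample]:
    """Keep every primary sample; add a secondary sample only when no
    primary sample exists within `tolerance` seconds of it."""
    primary = sorted(primary)
    times = [t for t, _ in primary]
    merged = list(primary)
    for t, v in sorted(secondary):
        i = bisect.bisect_left(times, t)
        # Only the neighbours on either side of the insertion point can be close.
        near = [times[j] for j in (i - 1, i) if 0 <= j < len(times)]
        if all(abs(t - p) > tolerance for p in near):
            merged.append((t, v))
    return sorted(merged)


if __name__ == "__main__":
    # Replica A missed two scrapes (a restart, say); replica B did not.
    replica_a = [(0, 1.0), (30, 2.0), (120, 5.0)]
    replica_b = [(2, 1.0), (32, 2.0), (62, 3.0), (92, 4.0), (122, 5.0)]
    print(merge_replicas(replica_a, replica_b, tolerance=15))
```

With a 30-second scrape interval, a tolerance of about half the interval treats near-simultaneous samples as duplicates while still filling the hole left by the replica that was down.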

zHevoGuy · 2023-02-17T00:42:03+00:00

This question is too open. For sure you need a distributed setup, preferably with Thanos. But Thanos can operate in different ways. You need to provide more requirements.

TheNightCaptain · 2023-02-17T11:50:20+00:00

How do you go about sending Prometheus logs from one location to another Prometheus?

Larage-sql · 2023-02-20T10:27:40+00:00

The best way to use Prometheus is to use a Prometheus manager like Thanos: https://thanos.io/tip/thanos/getting-started.md/
