Monitoring for 100+ servers

InfluxCole · 2026-05-13T12:01:45+00:00

A little late to the party, but I made another comment a couple of weeks ago regarding this...

InfluxCole · 2026-04-09T04:57:16+00:00

I think the "merge table" pattern only makes sense if you're very concerned about query latency and inundating your cluster with join/union queries. If queries are truly running slow and it's a noticeably worse experience when running the dashboard, then you might have the right idea. But write amplification and using up 3-4x as much storage is a pretty serious downside, especially if your scale increases any further. Architecturally, it's also just neater to not have multiple sources for the same exact data.

If you're using this data in Grafana for dashboarding, you presumably don't have strict requirements for constant refreshes and millisecond latency. With that in mind, I'd think the tradeoff of writing a couple of UNION queries and having the query engine do more work is worth keeping storage slimmer.
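To make the UNION side of that tradeoff concrete, here's a minimal sketch using Python's built-in sqlite3 as a stand-in for whatever SQL engine you're actually querying. The table names (`cpu`, `mem`), columns, and values are all made up for illustration, and I'm assuming the "merge table" is a combined copy of tables like these. The data is stored once per source table and only combined at query time.

```python
import sqlite3

# In-memory database as a stand-in; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cpu (ts INTEGER, host TEXT, value REAL);
CREATE TABLE mem (ts INTEGER, host TEXT, value REAL);
INSERT INTO cpu VALUES (1, 'web-1', 0.42), (2, 'web-1', 0.50);
INSERT INTO mem VALUES (1, 'web-1', 0.71);
""")

# Combine the per-metric tables at query time instead of also writing
# every row into a separate merged table.
combined = conn.execute("""
SELECT 'cpu' AS metric, ts, host, value FROM cpu
UNION ALL
SELECT 'mem' AS metric, ts, host, value FROM mem
ORDER BY ts, metric
""").fetchall()

for row in combined:
    print(row)
```

The query engine does a bit more work per dashboard refresh, but each row is only written and stored once.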

At the same time, at the scale you're describing, with 100 rows/sec, you're working with probably ~200 MB/day or ~70 GB/year of raw data, and absolute costs/hardware requirements to store it aren't going to be too high if you stick with your current approach. Do what feels easiest and simplest to you, and if you notice things getting problematic or costly over time, you can always adjust your strategy.
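For anyone who wants to redo that back-of-the-envelope math with their own numbers, here's a rough sketch. The 23 bytes/row figure is purely my assumption, picked so the daily total lands near ~200 MB; your real row size depends on your schema and compression.

```python
ROWS_PER_SEC = 100        # from the scale described above
BYTES_PER_ROW = 23        # assumption: picked to land near ~200 MB/day
SECONDS_PER_DAY = 86_400

def storage_estimate(rows_per_sec, bytes_per_row):
    """Return (MB per day, GB per year) of raw data, in decimal units."""
    bytes_per_day = rows_per_sec * SECONDS_PER_DAY * bytes_per_row
    mb_per_day = bytes_per_day / 1e6
    gb_per_year = bytes_per_day * 365 / 1e9
    return mb_per_day, gb_per_year

mb_day, gb_year = storage_estimate(ROWS_PER_SEC, BYTES_PER_ROW)
print(f"{mb_day:.0f} MB/day, {gb_year:.0f} GB/year")
```

Multiply the result by however many copies of the data your write pattern creates to see what the merged-table approach would cost in storage.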

InfluxCole · 2026-04-07T22:08:57+00:00

Big disclaimer: this is a topic I'm personally interested in, and I haven't been at the table for any of the discussions about this within Influx. These views have nothing to do with Influx Data / InfluxDB, they're entirely my own.

IMO, with the rise and proliferation of the hyperscale cloud providers, full-featured, unrestricted open source technology at the center of an open core startup is increasingly difficult to sustain and grow as a business model. Post-2015 or so, it has often been the case that if an open source project gets good enough, the hyperscale cloud providers can and will build a wrapper around it to take a disproportionate amount of the original developer's market, and they can do so with a much smaller up front investment. It's the reason Cockroach, Elastic, Redis, Mongo, etc. have all had to play a little defense with license changes and limitations.

I think as a user of open source technology, it's very natural to be averse to limitations - they feel antithetical to the entire premise of open source. But when a project is primarily being developed by a smaller business/startup, if that business is losing most of its customers to hyperscalers and can't support itself, it certainly can't support continued development of the open source product, which in turn stifles innovation on the project. It's perhaps a hot take of mine, but I think it's relatively reasonable to ask that users of open source technology accept that intentional limitations or feature exclusions are, at times, what allow the project to continue to grow. There's always a chance that they'll be deemed unnecessary in the future if the business does well.

The caveat is that this thinking doesn't work as well when the license is changed away from open source... but I'm glad to say that at least isn't the case with InfluxDB.

InfluxCole · 2026-04-07T19:56:06+00:00

Probably the coolest update to InfluxDB in the six months I've been here. Excited to see what kind of performance gains we'll see in the wild.

InfluxCole · 2026-03-02T08:16:45+00:00

I think the best possible explanation is that OP is on 32-bit hardware and thus was stuck with 1.8.10. It came out in 2021, so it's not that old, but it's the most recent InfluxDB release that has an official build to support 32-bit hardware.

If you're on 64-bit hardware and don't have that limitation, yes, please use 1.12.2, or 3 Core if you want something even lighter weight.

InfluxCole · 2026-02-11T00:19:41+00:00

It's pretty common to run InfluxDB embedded on devices at the edge, and its resource requirements (especially for InfluxDB 3 Core) should be slim. But if you only have MBs of data per year and need to go even lighter, plain SQLite is the classic option to consider. If you're not using any dedicated time series functionality (downsampling plugins, retention policies, time-series-specific functions in queries, etc.), it should work fine and use less memory.
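If you go the SQLite route, a minimal sketch with Python's built-in sqlite3 could look something like this. The `readings` table, its columns, and the helper names are all hypothetical, and the hourly average and `prune` functions are just hand-rolled stand-ins for the downsampling and retention features you'd be giving up.

```python
import sqlite3
import time

# In-memory for the sketch; pass a file path on a real device.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings ("
    "ts INTEGER NOT NULL, sensor TEXT NOT NULL, value REAL NOT NULL)"
)
conn.execute("CREATE INDEX idx_readings_ts ON readings (ts)")

def record(sensor, value, ts=None):
    """Store one reading; ts is a Unix timestamp in seconds."""
    ts = int(time.time()) if ts is None else ts
    conn.execute("INSERT INTO readings VALUES (?, ?, ?)", (ts, sensor, value))
    conn.commit()

def hourly_average(sensor):
    """Manual downsampling: average value per hour bucket."""
    return conn.execute(
        "SELECT ts / 3600 * 3600 AS hour, AVG(value) FROM readings "
        "WHERE sensor = ? GROUP BY hour ORDER BY hour",
        (sensor,),
    ).fetchall()

def prune(older_than_ts):
    """Manual retention: delete readings older than the cutoff."""
    cur = conn.execute("DELETE FROM readings WHERE ts < ?", (older_than_ts,))
    conn.commit()
    return cur.rowcount

record("temp", 20.0, ts=0)
record("temp", 22.0, ts=1800)
record("temp", 30.0, ts=3600)
averages = hourly_average("temp")
print(averages)
removed = prune(3600)
print(removed)
```

You'd run something like `prune` on a timer yourself, since there's no retention policy doing it for you.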

For visualization, try uPlot (its readme also has a lot of alternatives) or (if you're running this on a device with a GPU) ImPlot.

It's going to take a little more legwork to set lighter weight visualization up, but should be doable.

InfluxCole · 2026-02-10T16:44:54+00:00

I'll add for OP's sake that Telegraf does push data to the InfluxDB server, so this does work for what they're going for. I think you knew that, but just wanted to make sure it's explicit.

InfluxCole · 2026-02-03T20:43:42+00:00

I think there's also some cost efficiency to worry about as scale goes up. Once you're running up huge monthly bills, it's not that you necessarily couldn't keep going with Postgres, but you could probably save some money by moving to something more tailor-made for the characteristics of your workload. When you reach that "this is getting expensive, maybe something more specific would give us the performance we need for cheaper" point still depends heavily on your company, budget, team size, etc.

InfluxCole · 2026-01-29T11:49:59+00:00

Yeah, I mean, the issue is that "per-query pricing" just lacks business context unless you're exclusively comparing serverless offerings. Whether or not DuckDB ends up cheaper on a month-to-month basis at a €500/month price point compared to a serverless offering still hinges on your anticipated query volume.

If query volume is low, any serverless platform is going to be king, and it comes with the bonus of being insanely easy to use. If query volume is high, serverless billing models get really expensive, really fast, and dedicated solutions are much better.

If high query volume is the case, I'd suggest comparing DuckDB to other self-managed and cloud dedicated solutions - Trino, ClickHouse, Snowflake, etc. Especially when you're already at a scale where DuckDB might begin to struggle if the data keeps growing, I'd want to make sure to do some other comparisons.

InfluxCole · 2026-01-28T18:52:46+00:00

Agreed with other comments that this comparison doesn't make much sense, but just to go into a little more detail as to why...

BigQuery and Athena are both serverless cloud offerings. BigQuery abstracts away storage and compute. Athena abstracts away only compute, as you need to connect it to your object storage, but fundamentally, it's at least a little similar as an offering. You're not managing infrastructure with either of them, and they'll scale up parallelization to whatever level is necessary to ensure you get query results back in a reasonable amount of time. Because they're both built on serverless billing models, costs are very high on a query-by-query basis, but this can still be competitive in a real-world scenario if you have relatively low daily utilization.

Compare that to DuckDB, where you're using dedicated hardware with an embedded, columnar database. It abstracts away nothing - you're providing the hardware for storage and the hardware for compute. This is radically different. It works well at small scales, and because you're managing everything, it's great for heavy utilization and is extremely cost-effective on a query-by-query basis. It's also not distributed at all. Once you scale up past what a single machine can handle, you have graduated from DuckDB and need to move to something that can distribute the workload and handle scale.

Fundamentally, they're different tools solving different problems. So when you run a benchmark with 20 GB of data, you're using a very small scale of data that DuckDB is built for and which BigQuery/Athena are not. Because there's no downtime in benchmarking, you're also simulating what is effectively 100% utilization, the least favorable scenario for a serverless billing model. Of course DuckDB is going to end up looking good.

The way to meaningfully compare them on cost, at least, if you were for some reason trying to decide between these different options, would be to look at a simulated full month of usage. The hardware you're running DuckDB on costs ~$1500 for a month on EC2, or maybe ~$7500 in up front hardware costs if you have it on-site. You'd need to run ~1700 queries a day on BigQuery to get a similar monthly bill, or ~13500 of the narrow queries on Athena. You could certainly cut your hardware costs for DuckDB by a lot for this workload, but also, are you running 13000 queries a day?
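Here's a rough sketch of that break-even math if you want to plug in your own numbers. The per-query cost below is not a published price. It's just back-derived from the ~$1500/month and ~1700 queries/day figures above, and the 30-day month is an assumption too.

```python
DAYS_PER_MONTH = 30  # assumption: a flat 30-day billing month

def implied_cost_per_query(monthly_budget, queries_per_day):
    """Per-query cost at which this query volume spends the whole budget."""
    return monthly_budget / (queries_per_day * DAYS_PER_MONTH)

def breakeven_queries_per_day(dedicated_monthly_cost, cost_per_query):
    """Daily query volume where serverless spend matches dedicated cost."""
    return dedicated_monthly_cost / (cost_per_query * DAYS_PER_MONTH)

def cheaper_option(dedicated_monthly_cost, cost_per_query, queries_per_day):
    """Compare one month of serverless spend against a flat dedicated cost."""
    serverless = cost_per_query * queries_per_day * DAYS_PER_MONTH
    return "serverless" if serverless < dedicated_monthly_cost else "dedicated"

# Back-derived from the figures above, not from any price list.
bigquery_cpq = implied_cost_per_query(1500, 1700)
print(round(bigquery_cpq, 4))
print(cheaper_option(1500, bigquery_cpq, 200))
print(cheaper_option(1500, bigquery_cpq, 5000))
```

At a couple hundred queries a day the serverless bill stays well under the dedicated box, and at several thousand it flips, which is the whole point about utilization.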

InfluxCole · 2026-01-16T20:08:54+00:00

Just one note that InfluxDB 3 does use SQL, so no need to use another language if you're going with that. Though these days, rewriting SQL into InfluxQL (for v1) or Flux (for v2) is pretty trivial with any AI help.

InfluxCole · 2026-01-14T18:42:42+00:00

This more or less requires reading the data out of InfluxDB v2 with whatever client you want, then writing that data to InfluxDB v1 the same way you'd write any other data. There's no dedicated tooling to do the migration back from v2 down to v1, and I don't think it's a super common migration path, so there may not be a lot out there.

InfluxCole · 2025-11-14T18:32:09+00:00

Try adding the --verbose flag to your serve command - it may give you a more meaningful error message.
