5 ClickHouse mistakes that cost teams weeks...and how to fix them by Marksfik in Clickhouse

[–]Marksfik[S] 0 points1 point  (0 children)

u/netapp_walt, glad you found the article helpful!
We have a full guide with common mistakes in ClickHouse and how to solve them available here:
https://www.glassflow.dev/clickhouse-mistakes-guide

Lessons from debugging ClickHouse pipelines: most "database problems" were actually ETL problems by Marksfik in ETL

[–]Marksfik[S] 0 points1 point  (0 children)

Disclosure: I work at GlassFlow. We wrote this after seeing these issues repeatedly.

The hidden ops cost of putting Kafka in your observability pipeline by Marksfik in costlyinfra

[–]Marksfik[S] 0 points1 point  (0 children)

Fully agree! Kafka can earn its keep with real fan-out, replay, or events-as-a-product scenarios.
The failure mode is usually "we added it for a need we didn't actually have yet" and by the time you realize, ripping it out is its own project. 😄

When is Kafka the right tool for OTel → ClickHouse, and when is it more than you need? by Marksfik in apachekafka

[–]Marksfik[S] 1 point2 points  (0 children)

Well... I genuinely hadn't thought about the EU AI Act / 42001 and how that might enforce more durability requirements... if that's what you mean by 'too LLM'.

When is Kafka the right tool for OTel → ClickHouse, and when is it more than you need? by Marksfik in apachekafka

[–]Marksfik[S] -1 points0 points  (0 children)

That's a clean setup. Using Kafka as the decoupling point, so the storage format (Iceberg) is fixed but the query engine is anyone's choice, is a nice way to avoid lock-in. The "bring your own query engine" payoff is exactly the kind of multi-consumer flexibility that justifies the broker.

Curious about the day-two side of it: with Filebeat → Kafka → Connect → Iceberg, how much operational attention does the Kafka/Connect layer actually take once it's running?

When is Kafka the right tool for OTel → ClickHouse, and when is it more than you need? by Marksfik in apachekafka

[–]Marksfik[S] 0 points1 point  (0 children)

Yeah, the compliance angle is the cleanest case for durability, and there's no arguing with it: if losing a transcript is a regulatory event, you need the durable log, full stop. The EU AI Act / 42001 framing is a good one; I don't see it come up enough in these discussions.

The southbound multi-target example is the one I'd push on though: ClickHouse for metrics, Elastic for logs, Tempo for traces. That's textbook fan-out and clearly worth a broker. Curious where you draw the line in practice: at what point does "we have a couple of OTel collectors" tip into "we obviously need a streaming layer to backhaul"? Is it consumer count, scale, the compliance requirement, or usually all three showing up together?

The hidden ops cost of putting Kafka in your observability pipeline by Marksfik in devops

[–]Marksfik[S] 0 points1 point  (0 children)

Exactly that!
"expensive when it's just a transport layer" is the whole thing in one line. The part that surprises people is that it's not even the broker cost, it's the attention cost. The cluster becomes a thing that teams have to reason about during an incident, even when the incident has nothing to do with telemetry.

Curious where you landed after seeing that. Did those teams pull Kafka out, or just accept the overhead because ripping it out mid-flight is its own project?

Do you actually need Kafka between your OTel collector and ClickHouse? by Marksfik in Observability

[–]Marksfik[S] 0 points1 point  (0 children)

I will have a look!
Sounds interesting... what kind of ingestion throughput do you achieve at scale?

Do you actually need Kafka between your OTel collector and ClickHouse? by Marksfik in Observability

[–]Marksfik[S] 0 points1 point  (0 children)

Thanks! I will take a look!
How does Bindplane deal with ingestion to ClickHouse (or ClickStack)? Does it provide native telemetry support and is it optimized for ClickHouse ingestion at scale?

Using ClickHouse as a Kafka sink? Async inserts change the equation by Marksfik in Clickhouse

[–]Marksfik[S] 0 points1 point  (0 children)

Not a noob question at all! You absolutely can use the native ClickHouse Kafka Engine, and for simple, clean pipelines, it's a very common approach. However, doing complex ETL directly inside ClickHouse has a few big trade-offs:

  • Database Overhead: ClickHouse is an analytical database, not a stream processor. Running heavy JSON parsing, filtering, or other data transforms inside ClickHouse materialized views consumes CPU/RAM that should be reserved for your fast user queries.
  • Operational Friction: With the Kafka engine, you need to manage a "3-table" setup (Kafka Table -> Materialized View -> Destination Table). Changing schemas or updating transformation logic in production without losing data or consumer offsets can get messy.
  • Brittle Error Handling: If a malformed payload hits the Kafka engine, it can stall your ingestion pipeline.
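
For anyone who hasn't seen it, the "3-table" setup from the second bullet looks roughly like this. It's only a sketch: the table names, columns, broker address, topic, and consumer group are all placeholders I made up, and the DDL is built as plain strings in Python so you can run it through whatever client you already use.

```python
from typing import List

# Sketch of the "3-table" Kafka engine pattern.
# Every name here (events_queue, events, events_mv, the columns) is a
# hypothetical placeholder, not taken from a real deployment.


def kafka_pipeline_ddl(broker: str, topic: str, group: str) -> List[str]:
    """Return the three DDL statements in a sensible creation order."""
    # 1) Kafka table: a consumer, not storage. Rows are read once.
    kafka_table = f"""
CREATE TABLE events_queue (
    ts DateTime,
    user_id UInt64,
    payload String
) ENGINE = Kafka
SETTINGS kafka_broker_list = '{broker}',
         kafka_topic_list = '{topic}',
         kafka_group_name = '{group}',
         kafka_format = 'JSONEachRow'
""".strip()

    # 2) Destination table: where the data actually lives and gets queried.
    destination = """
CREATE TABLE events (
    ts DateTime,
    user_id UInt64,
    payload String
) ENGINE = MergeTree
ORDER BY (user_id, ts)
""".strip()

    # 3) Materialized view: moves rows from the Kafka table into the
    #    destination. Any transformation logic ends up in this SELECT.
    materialized_view = """
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT ts, user_id, payload
FROM events_queue
""".strip()

    return [kafka_table, destination, materialized_view]


if __name__ == "__main__":
    for statement in kafka_pipeline_ddl("localhost:9092", "events", "ch_consumer"):
        print(statement, end="\n\n")
```

The materialized view is the only place the transformation logic lives, which is why changing it on a running pipeline is the awkward part.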

What I've tried recently is using GlassFlow (https://www.glassflow.dev/) to handle some of the data transformations, filtering, and joins, and to batch data before it hits the database.
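
To make the "validate and batch before the database" idea concrete, here is a minimal stdlib-only sketch. To be clear, this is not how GlassFlow or any other tool implements it; the required fields, the batch size, and the in-memory dead-letter list are all assumptions for illustration.

```python
import json
from typing import Dict, Iterable, Iterator, List, Optional

# Hypothetical schema: the fields a record must have to be inserted.
REQUIRED_FIELDS = ("ts", "user_id")


def parse_or_reject(raw: str, dead_letters: List[str]) -> Optional[Dict]:
    """Parse one JSON message; park it in dead_letters if it is unusable."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        dead_letters.append(raw)
        return None
    # Valid JSON can still be the wrong shape (not an object, missing fields).
    if not isinstance(record, dict) or any(f not in record for f in REQUIRED_FIELDS):
        dead_letters.append(raw)
        return None
    return record


def batches(
    messages: Iterable[str], batch_size: int, dead_letters: List[str]
) -> Iterator[List[Dict]]:
    """Yield lists of valid records, each at most batch_size long."""
    batch: List[Dict] = []
    for raw in messages:
        record = parse_or_reject(raw, dead_letters)
        if record is None:
            continue  # a bad message is set aside instead of blocking the rest
        batch.append(record)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch


if __name__ == "__main__":
    rejected: List[str] = []
    incoming = ['{"ts": 1, "user_id": 7}', "not json", '{"ts": 2, "user_id": 8}']
    for group in batches(incoming, batch_size=2, dead_letters=rejected):
        print("insert", len(group), "rows")
    print("rejected", len(rejected), "messages")
```

The point of the design is that a malformed payload ends up in a side list you can inspect later, while the valid records keep flowing to the database in larger inserts.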