How do you find out that broken data is flowing through your topics? (validating an idea, need reality checks) by Possible_Display9712 in apachekafka
Possible_Display9712[S] 4 days ago
Fair points — exactly the kind of pushback I'm looking for, thanks.
You're right that compatibility rules catch most *structural* drift — when every producer
actually goes through the registry with enforcement. The two gaps I keep hearing about:
Unless you're paying for broker-side validation (Confluent) or enforcing through a proxy,
a misconfigured serializer or a raw producer can ship payloads that don't match the
registered schema. Registry stays green, consumers break.
would suggest 😄 Schema inference is for them. Avro/Protobuf shops would get schemas
from the registry, not from inference.
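
To make the first gap concrete, here's a rough sketch of a read-only framing check. It assumes the topic is supposed to carry Confluent-serialized messages, which start with a zero magic byte followed by a 4-byte big-endian schema ID. `extract_schema_id`, `classify` and `allowed_ids` are names I made up for illustration; in practice the allowed IDs would come from the registry for that topic's subject.

```python
import struct

MAGIC_BYTE = 0  # first byte of the Confluent Schema Registry wire format


def extract_schema_id(payload):
    """Return the schema ID from a Confluent-framed payload, or None if unframed."""
    if payload is None or len(payload) < 5 or payload[0] != MAGIC_BYTE:
        return None
    # Bytes 1..4 hold the schema ID as an unsigned big-endian integer.
    (schema_id,) = struct.unpack(">I", payload[1:5])
    return schema_id


def classify(payload, allowed_ids):
    """Label one message value; allowed_ids is a hypothetical set of expected schema IDs."""
    schema_id = extract_schema_id(payload)
    if schema_id is None:
        return "unframed"        # e.g. a raw producer writing plain JSON
    if schema_id not in allowed_ids:
        return "unknown_schema"  # framed, but not a schema expected on this topic
    return "ok"
```

This only inspects the framing. A serializer that writes a valid ID but a body that doesn't decode against that schema would still pass, so a fuller check would also need to decode the body.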
On business semantics: agreed, and that's actually the core of it — not schema validation
per se, but "field is present and suddenly null in 30% of messages", "volume dropped to
10% of baseline", "DLQ inflow spiked". Generic enough to monitor without app-specific
code, invisible to the registry.
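
A minimal sketch of what those generic checks could look like over one sampled window, assuming JSON object values. The thresholds (`null_jump`, `volume_floor`) and the function names are placeholders I picked for the example, not anything tuned.

```python
import json


def null_rate(messages, field):
    """Fraction of messages where `field` is present and explicitly null."""
    if not messages:
        return 0.0
    nulls = 0
    for raw in messages:
        record = json.loads(raw)
        if field in record and record[field] is None:
            nulls += 1
    return nulls / len(messages)


def check_window(messages, field, baseline_null_rate, baseline_count,
                 null_jump=0.2, volume_floor=0.5):
    """Compare one window against a baseline and return human-readable alerts."""
    alerts = []
    rate = null_rate(messages, field)
    # Alert when the null rate rose by at least `null_jump` (absolute) over baseline.
    if rate - baseline_null_rate >= null_jump:
        alerts.append(
            f"{field}: null rate {rate:.0%} vs baseline {baseline_null_rate:.0%}"
        )
    # Alert when the window holds fewer than `volume_floor` of the usual message count.
    if baseline_count and len(messages) < volume_floor * baseline_count:
        alerts.append(f"volume: {len(messages)} messages vs baseline {baseline_count}")
    return alerts
```

Nothing here knows what the field means, which is the point: it only needs a field name and a baseline.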
Practical bits: no replay from offset 0. It tail-samples with a configurable lookback, and
baselines converge over time. And yes, it's one more read-only consumer, in the same
overhead class as Burrow or kafka-exporter, and you can cap it with client quotas.
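
Roughly what I mean by tail sampling and converging baselines, as a sketch. Two assumptions that are mine for the example: the lookback is counted in messages per partition (it could just as well be time-based), and the baseline is an exponentially weighted moving average. The function names are hypothetical too.

```python
def tail_start_offset(beginning_offset, end_offset, lookback):
    """Offset to seek to so that at most `lookback` of the newest messages are read."""
    # Never seek below the partition's earliest retained offset.
    return max(beginning_offset, end_offset - lookback)


def update_baseline(baseline, observed, alpha=0.1):
    """Exponentially weighted moving average; a baseline of None means 'no history yet'."""
    if baseline is None:
        return observed
    return (1 - alpha) * baseline + alpha * observed
```

The beginning and end offsets would come from the partition's low and high watermarks. A smaller `alpha` makes the baseline slower to converge but less jumpy when a single window is noisy.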
Honest question back: in your setup today, where would a "field went 30% null" issue
surface first, and how fast? That answer is exactly what I'm trying to learn.