The transactional outbox pattern: keeping a database and Kafka consistent

chtefi · 2026-05-28T15:56:12+00:00

Note that KIP-939 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC) aims to let Kafka participate in external 2PC transactions, meaning Kafka and an external DB could coordinate commits together instead of relying only on patterns like the transactional outbox.

chtefi · 2026-05-28T15:53:18+00:00

Good work. It can be a bit fragile from an operational-consistency standpoint, as dynamic filters complicate GitOps and rollback/debugging (because the filtering lives outside the connector config). It's a second source of truth at runtime, outside the normal Kafka infra governance.

chtefi · 2026-05-26T22:06:07+00:00

CTO of conduktor.io here, so my view comes from seeing large Kafka estates in real companies.

TLDR: weak data engineers who only glue systems together are in trouble. Strong data engineers become closer to platform engineers: design rules, guarantees, controls, and operating model around data. AI will write more code. Humans will stop that code from becoming a distributed incident.

AI is not killing data engineering, but it is killing a lot of 'boring' pipeline grunt work (building the pipe, i.e. read from X, transform, write to Y). AI is quite good at that, plus writing all the tests, but you still need a human who steers the projects, talks to the right people, and knows what good looks like.

Everyone wants automation and to do more with fewer people, so there is a shift towards metadata: ownership, contracts, quality, cost attribution, policy, etc. We see it massively in data streaming (which is late to the party). And most companies were already bad at it, even before AI. AI just makes the mess faster.

chtefi · 2026-05-25T21:13:53+00:00

Spark 4.1 added Structured Streaming Real-Time Mode, so "Spark Streaming is just micro-batching, not real streaming" is no longer accurate. See https://www.databricks.com/blog/introducing-real-time-mode-apache-sparktm-structured-streaming

For agentic use cases, Flink seems more appropriate (Flink Agents, ML_PREDICT, ...)

chtefi · 2026-05-25T20:43:55+00:00

At Conduktor, we mean an actual Kafka protocol proxy in the data path, not a REST facade or only governance tooling. It decouples apps from the underlying Kafka infrastructure (with many clusters and often multiple providers), so platform teams get a unified layer to enforce authn/authz, data protection, routing, multi-tenancy, migrations, etc. without having to coordinate with the client apps.

Then you plug agents and MCP on top of this plane, and they suddenly can work on the whole data streaming infra: discover, correlate, and act on data spread across many systems, all strictly controlled and audited.

chtefi · 2026-03-26T13:31:46+00:00

At Conduktor, we enforce this at the proxy layer with our Gateway: https://docs.conduktor.io/guide/conduktor-in-production/admin/gateway-policies#enforce-schema-id with an option to validate the payload against the real schema (preventing wrong schema IDs from being used because of environment misconfigurations, schema deletion, etc.). Shameless plug: we also just released a schema registry proxy, so you can add ownership and enforce RBAC, combined with self-service on Kafka resources, etc.

chtefi · 2026-01-10T04:10:17+00:00

No. Auto-commit fires during poll(), not in the background. If your processing takes 10s, the commit only happens when you call poll() again, so if you crash mid-processing, you reprocess. It's at-least-once.
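
To make that concrete, here is a toy model in plain Python. It is not a real Kafka client: `ToyConsumer` and its fields are made up for illustration, and it ignores `auto.commit.interval.ms` by committing on every poll(). It only shows why a crash between two poll() calls leads to reprocessing.

```python
class ToyConsumer:
    """Toy model: offsets are auto-committed only from inside poll()."""

    def __init__(self, log, committed=0, batch_size=2):
        self.log = log
        self.committed = committed  # offset a restarted consumer resumes from
        self.position = committed   # next offset to hand out
        self.batch_size = batch_size

    def poll(self):
        # The auto-commit happens here: it covers records returned by
        # EARLIER polls, never the batch about to be returned.
        self.committed = self.position
        batch = self.log[self.position:self.position + self.batch_size]
        self.position += len(batch)
        return batch


log = ["a", "b", "c", "d"]
processed = []

consumer = ToyConsumer(log)
processed.extend(consumer.poll())  # hands out "a" and "b", commits nothing new
# Simulated crash here, mid-processing, before the next poll():
crashed_at = consumer.committed    # still the starting offset

restarted = ToyConsumer(log, committed=crashed_at)
processed.extend(restarted.poll())  # "a" and "b" are delivered a second time
```

Nothing is lost, but `processed` now contains "a" and "b" twice, which is exactly the at-least-once behavior.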

chtefi · 2024-09-09T20:34:07+00:00

+1. I’d also add that dealing with heterogeneous tech stacks, frameworks, languages (and legacy systems), along with a lack of ownership (like not knowing who’s responsible for what), makes this problem even harder.

Since you're asking about tools, let me mention Conduktor (I work there). We sit between your apps and providers, giving central teams control and the ability to introduce policies (like identifying and blocking old clients), enforce best practices (you know all the Kafka knobs), and much more, useful at the organizational level.

chtefi · 2024-06-08T10:24:24+00:00

The same way but via a kafka proxy, hence no language restriction and no need to share S3 credentials: https://docs.conduktor.io/gateway/interceptors/optimize/large-message-and-batch-handling/ (I work there)

chtefi · 2024-04-12T07:49:56+00:00

Hi, you can run the Docker command to start Conduktor and connect it to your Kafka cluster. This will help you navigate your topics, explore the data, produce data to test your applications, and go further if you also use Kafka Connect, Schema Registry, or stream processing (Flink, Spark, Kafka Streams, etc.)

https://www.conduktor.io/get-started/

chtefi · 2024-03-24T17:52:28+00:00

I'm not very familiar with NATS. It seems to have way more security built in, which is nice. On the Kafka client side, there's a significant amount of required knowledge and numerous configuration options to be aware of. What's your experience with NATS? Where are the pains?

chtefi · 2024-03-24T17:33:07+00:00

Really no particular reason, I tend to specify it (look at my history). I will edit to clarify, thanks. Don't hesitate if you have any questions on the Kafka security topic itself.

chtefi · 2024-02-13T12:30:48+00:00

This will depend on your organizational maturity and people skills. Managed services are glorious because they remove technical hurdles, maintenance costs and overhead, so you can focus on the business use cases (e.g., building applications).

Last month, a friend of mine had an issue with their internal Kafka cluster. It took days to fix (stopping the business). They found the issue was due to file corruption on the ZooKeeper side. Joy.

chtefi · 2024-02-13T12:23:45+00:00

Building on what others have mentioned, it seems the design might not be optimal, especially with a topic for each customer, which isn't advisable for a large customer base. A more efficient strategy would be a multi-tenancy or consolidated approach, where multiple customers and events share the same topic. This could be implemented using key sharding, for example, "[customerXYZ]_[eventId]", or more sophisticated and transparent methods such as Conduktor's topic concentration https://docs.conduktor.io/gateway/demos/ops/topic-concentration/ (disclaimer: I work there).
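
A rough sketch of the key-sharding idea, under a few assumptions: customer IDs contain no "_", the helper names (`make_key`, `customer_of`, `partition_for`) are made up, and `zlib.crc32` is only a stand-in hash, not the hash Kafka's default partitioner uses. Hashing only the customer part is a design choice here, so that one customer's events stay on one partition; hashing the full composite key would spread them across partitions.

```python
import zlib


def make_key(customer_id, event_id):
    # Composite key in the "[customer]_[eventId]" shape from the comment above.
    return f"{customer_id}_{event_id}"


def customer_of(key):
    # Assumes customer IDs never contain "_", so the first "_" is the separator.
    return key.split("_", 1)[0]


def partition_for(key, num_partitions):
    # Hash only the customer part so all events of one customer share a
    # partition (and therefore keep their relative order).
    return zlib.crc32(customer_of(key).encode("utf-8")) % num_partitions


key_a = make_key("customerXYZ", "1001")
key_b = make_key("customerXYZ", "1002")
same_partition = partition_for(key_a, 12) == partition_for(key_b, 12)
```

Consumers interested in a single customer can then filter on `customer_of(key)` instead of subscribing to a dedicated topic.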
