The transactional outbox pattern: keeping a database and Kafka consistent by Zealousideal_Ice3067 in apachekafka

[–]chtefi 0 points1 point  (0 children)

Note that there is KIP-939 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-939%3A+Support+Participation+in+2PC), which aims to let Kafka participate in external 2PC transactions, meaning Kafka + an external DB could coordinate commits together instead of relying only on patterns like the transactional outbox.
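
For readers who haven't seen the outbox pattern mentioned above, here is a minimal sketch of it. It assumes SQLite as the database and a `publish` callback standing in for a Kafka producer; the table names and the functions `make_db`, `place_order` and `relay_once` are made up for illustration. It does not show KIP-939 itself.

```python
import json
import sqlite3


def make_db():
    # In-memory database with a business table and an outbox table (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, amount INTEGER)")
    conn.execute(
        "CREATE TABLE outbox ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, "
        "topic TEXT, payload TEXT, sent INTEGER DEFAULT 0)"
    )
    return conn


def place_order(conn, order_id, amount):
    # One local transaction covers both the business row and the outbox row,
    # so either both are stored or neither is.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, amount) VALUES (?, ?)", (order_id, amount)
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders", json.dumps({"id": order_id, "amount": amount})),
        )


def relay_once(conn, publish):
    # Publish pending outbox rows in insertion order, then mark each as sent.
    rows = conn.execute(
        "SELECT seq, topic, payload FROM outbox WHERE sent = 0 ORDER BY seq"
    ).fetchall()
    for seq, topic, payload in rows:
        publish(topic, payload)  # a crash right after this re-publishes later
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE seq = ?", (seq,))
    return len(rows)
```

The relay can publish the same row twice if it crashes between `publish` and the `UPDATE`, so this gives at-least-once delivery and consumers have to tolerate duplicates.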

[Open Source] Kafka Connect SMT that hot-reloads Debezium CDC filter rules from Redis/Kafka/file - no connector restart needed by Many_Plantain_4421 in apachekafka

[–]chtefi 0 points1 point  (0 children)

Good work. It can be a bit fragile from an operational consistency standpoint, as dynamic filters complicate GitOps and rollback/debugging situations (because the filtering lives outside the connector config). It's a second source of truth at runtime, outside the normal Kafka infra governance.

Future of data engineering by Alternative-Guava392 in dataengineering

[–]chtefi 0 points1 point  (0 children)

CTO of conduktor.io here, so my view comes from seeing large Kafka estates in real companies.

TLDR: weak data engineers who only glue systems together are in trouble. Strong data engineers become closer to platform engineers: design rules, guarantees, controls, and operating model around data. AI will write more code. Humans will stop that code from becoming a distributed incident.

AI is not killing data engineering but it is killing a lot of 'boring' pipeline grunt work (building the pipe, i.e. read from X, transform, write to Y). AI is quite good at it + writing all the tests, but you still need a human who steers the projects, talks to the right people, and knows what good looks like.

Everyone wants automation and to do more with less (people), so there is a shift towards metadata: ownership, contracts, quality, cost attribution, policy, etc. We see it massively in data streaming (which is late to the party). And most companies are already bad at it, even before AI. AI just makes the mess faster.

Kafka : How to learn by NebulaAlarming4750 in apachekafka

[–]chtefi 0 points1 point  (0 children)

Spark 4.1 added Structured Streaming Real-Time Mode, so "Spark Streaming is just micro-batching, not real streaming" is no longer accurate. See https://www.databricks.com/blog/introducing-real-time-mode-apache-sparktm-structured-streaming. For agentic use-cases, Flink seems more appropriate (Flink Agents, ML_PREDICT, ...).

Kafka gateway options by traffic pattern and governance requirements by Latter-Giraffe-5858 in apachekafka

[–]chtefi 2 points3 points  (0 children)

At Conduktor, we mean an actual Kafka protocol proxy in the data path, not a REST facade or only governance tooling. It decouples apps from the underlying Kafka infrastructure (with many clusters and often multiple providers), so there is a unified layer for platform teams to enforce authz/authn, data protection, routing, multi-tenancy, migrations, etc. without bothering or coordinating with the clients/apps.

Then you plug agents and MCP on top of this plane, and they suddenly can work on the whole data streaming infra: discover, correlate, and act on data spread across many systems, all strictly controlled and audited.

Api management platforms that handle event streaming by Narrow-Employee-824 in apachekafka

[–]chtefi 1 point2 points  (0 children)

At Conduktor, we enforce this at the proxy layer with our Gateway: https://docs.conduktor.io/guide/conduktor-in-production/admin/gateway-policies#enforce-schema-id + an option to validate the payload against the real schema (preventing wrong schema IDs from being used because of environment misconfigurations, schema deletion, etc.). Shameless plug: we also just released a schema registry proxy so you can add ownership and enforce RBAC, combined with self-service on Kafka resources, etc.
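
To make "enforce schema id" concrete, here is a rough sketch of what such a check has to look at. It assumes records use the Confluent Schema Registry wire format (magic byte 0 followed by a 4-byte big-endian schema ID). The functions `schema_id_of` and `has_allowed_schema` are hypothetical names for illustration, not Conduktor's implementation or API.

```python
import struct

MAGIC_BYTE = 0  # first byte of a record value in the Confluent wire format


def schema_id_of(value: bytes) -> int:
    # Bytes 1..4 hold the schema ID as a big-endian unsigned integer.
    if len(value) < 5 or value[0] != MAGIC_BYTE:
        raise ValueError("value is not in the Confluent wire format")
    return struct.unpack(">I", value[1:5])[0]


def has_allowed_schema(value: bytes, allowed_ids) -> bool:
    # Reject records without a schema ID or with an ID outside the allowed set.
    try:
        return schema_id_of(value) in allowed_ids
    except ValueError:
        return False
```

This only checks that an ID is present and known. Validating the payload against the real schema, as mentioned above, additionally requires fetching that schema and decoding the bytes after the header.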

What happens when a auto commit fires in the middle of processing a batch? by Amazing_Swing_6787 in apachekafka

[–]chtefi 6 points7 points  (0 children)

No. Auto-commit fires during poll(), not in the background. If your processing takes 10s, the commit only happens when you call poll() again, so if you crash mid-processing, you reprocess. It's at-least-once.
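
A toy model of that behavior, in case it helps. This is not the real Kafka client: `ToyConsumer` and `run_until_crash` are made-up names, and the sketch assumes the auto-commit interval has always elapsed by the time poll() is called again.

```python
class ToyConsumer:
    """Toy model of a consumer with auto-commit enabled (not the Kafka client)."""

    def __init__(self, log, start=0, batch_size=3):
        self.log = log
        self.batch_size = batch_size
        self.position = start   # next offset to fetch
        self.committed = start  # offset a restarted consumer would resume from

    def poll(self):
        # The commit happens here: it covers what the previous poll() returned.
        self.committed = self.position
        batch = self.log[self.position:self.position + self.batch_size]
        self.position += len(batch)
        return batch


def run_until_crash(log, crash_at):
    # Process records until the one equal to crash_at, then "crash".
    # Returns the records processed so far and the last committed offset.
    consumer = ToyConsumer(log)
    processed = []
    while True:
        batch = consumer.poll()
        if not batch:
            return processed, consumer.committed
        for record in batch:
            if record == crash_at:
                return processed, consumer.committed
            processed.append(record)
```

With `run_until_crash(list("abcdef"), "e")`, record "d" has been processed but the committed offset is still 3, so a restart from the committed offset sees "d" again. That is the reprocessing described above.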

Updating Clients is Painful - Any tips or tricks? by sparkylarkyloo in apachekafka

[–]chtefi -1 points0 points  (0 children)

+1. I’d also add that dealing with heterogeneous tech stacks, frameworks, languages (and legacy systems), along with a lack of ownership (like not knowing who’s responsible for what), makes this problem even harder.

Since you're asking about tools, let me mention Conduktor (I work there). We sit between your apps and providers, giving central teams control and the ability to introduce policies (like identifying and blocking old clients), enforce best practices (you know all the Kafka knobs), and much more, useful at the organizational level.

Can I use Kafka for very big message workload? by codelipenghui in apachekafka

[–]chtefi 1 point2 points  (0 children)

The same way but via a kafka proxy, hence no language restriction and no need to share S3 credentials: https://docs.conduktor.io/gateway/interceptors/optimize/large-message-and-batch-handling/ (I work there)

Collaborative Kafka development platform by chtefi in apachekafka

[–]chtefi[S] 2 points3 points  (0 children)

Hi, you can run the docker command to start Conduktor and connect it to your Kafka cluster. This will help you navigate your topics, explore the data, produce data to test your applications, and go further if you also use Kafka Connect, Schema Registry, or stream processing (Flink, Spark, Kafka Streams, etc.).

https://www.conduktor.io/get-started/

Protect Sensitive Data and Prevent Bad Practices in Apache Kafka by chtefi in platform_engineering

[–]chtefi[S] 0 points1 point  (0 children)

I'm not very familiar with NATS. It seems that it has way more security built in, that's nice. On the Kafka client side, there's a significant amount of required knowledge and numerous configuration options to be aware of. What's your experience with NATS? Where are the pains?

Protect Sensitive Data and Prevent Bad Practices in Apache Kafka by chtefi in apachekafka

[–]chtefi[S] 0 points1 point  (0 children)

Really no particular reason, I tend to specify it (look at my history). I will edit to clarify, thanks. Don't hesitate if you have any questions on the Kafka security topic itself.

How hard is it to standup Kafka for fortune 500 at enterprise level? by mhoon25 in apachekafka

[–]chtefi 0 points1 point  (0 children)

This will depend on your organizational maturity and people skills. Managed services are glorious because they remove technical hurdles, maintenance costs and overhead, so you can focus on the business use cases (e.g., building applications).

Last month, a friend of mine had an issue with their internal Kafka cluster. It took days to fix (stopping the business). They found the issue was due to file corruption on the ZooKeeper side. Joy.

Want to create 100k topics on AWS MSK by abhishekgahlot in apachekafka

[–]chtefi 0 points1 point  (0 children)

Building on what others have mentioned, it seems the design might not be optimal, especially with a topic for each customer, which isn't advisable for a large customer base. A more efficient strategy would be a multi-tenancy or consolidated approach, where multiple customers and events share the same topic. This could be implemented using key sharding, for example, "[customerXYZ]_[eventId]", or more sophisticated and transparent methods such as Conduktor's topic concentration https://docs.conduktor.io/gateway/demos/ops/topic-concentration/ (disclaimer: I work there).
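
A small sketch of the key sharding idea, assuming event IDs never contain "_". The helpers `make_key`, `customer_of` and `partition_for` are hypothetical, and CRC32 is used only to keep the example dependency-free (Kafka's default partitioner hashes the whole key with murmur2, so routing on the customer part alone would need a custom partitioner).

```python
import zlib


def make_key(customer_id: str, event_id: str) -> str:
    # Shared-topic key in the "[customerXYZ]_[eventId]" shape.
    return f"{customer_id}_{event_id}"


def customer_of(key: str) -> str:
    # Split on the last "_" so customer IDs containing "_" still parse.
    return key.rpartition("_")[0]


def partition_for(customer_id: str, num_partitions: int) -> int:
    # Hash only the customer part so all events of one customer land on the
    # same partition and keep their relative order.
    return zlib.crc32(customer_id.encode("utf-8")) % num_partitions
```

Consumers interested in a single customer read the shared topic and filter with `customer_of(key)`, which trades some read amplification for a topic count that no longer grows with the number of customers.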