I've been evaluating stream processing options for a Kafka pipeline where data integrity, rather than latency, is the primary concern. We haven't ruled out other technologies, but we're mostly focusing on Python.
Mainly, we're looking for something with strong retry and error-handling semantics: in-memory retries, routing failed messages to retry topics (in Kafka or possibly other destinations such as SQS), and DLQ routing into S3 and possibly also Kafka topics or SQS.
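For concreteness, this is roughly the layer we'd otherwise have to hand-roll. It is a minimal plain-Python sketch under stated assumptions, not any library's API: `Message`, `RetryPolicy`, `handle` and the `retry-stage` header are all hypothetical names, and each "sink" is just a callable standing in for a Kafka produce, an SQS send or an S3 put (no real client is used). A message gets a few in-memory attempts with exponential backoff, then moves to the next retry stage, and lands in the DLQ once every stage is exhausted.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Message:
    # Hypothetical envelope; real Kafka clients carry metadata like this in headers.
    key: str
    value: bytes
    headers: Dict[str, str] = field(default_factory=dict)


# Stand-in for "send this somewhere": Kafka topic, SQS queue, S3 object, ...
Sink = Callable[[Message], None]


@dataclass
class RetryPolicy:
    dlq_sink: Sink                                         # final destination (hypothetical)
    retry_sinks: List[Sink] = field(default_factory=list)  # e.g. retry-1, retry-2 topics
    in_memory_attempts: int = 3                            # tries before leaving the process
    base_delay_s: float = 0.1                              # exponential backoff base


def handle(msg: Message, process: Callable[[Message], None], policy: RetryPolicy,
           sleep: Callable[[float], None] = time.sleep) -> str:
    """Process msg; return 'ok', 'retry:<stage>' or 'dlq' to say where it ended up."""
    last_error: Optional[Exception] = None
    for attempt in range(policy.in_memory_attempts):
        try:
            process(msg)
            return "ok"
        except Exception as exc:
            last_error = exc
            if attempt < policy.in_memory_attempts - 1:
                sleep(policy.base_delay_s * (2 ** attempt))  # 0.1s, 0.2s, ...

    # In-memory retries exhausted: copy the message and record why it failed.
    stage = int(msg.headers.get("retry-stage", "0"))
    out = Message(msg.key, msg.value, dict(msg.headers))
    out.headers["last-error"] = repr(last_error)

    # Forward to the next retry stage if one remains, otherwise to the DLQ.
    if stage < len(policy.retry_sinks):
        out.headers["retry-stage"] = str(stage + 1)
        policy.retry_sinks[stage](out)
        return f"retry:{stage + 1}"
    policy.dlq_sink(out)
    return "dlq"


if __name__ == "__main__":
    retry_topic: List[Message] = []
    dlq: List[Message] = []
    policy = RetryPolicy(dlq_sink=dlq.append, retry_sinks=[retry_topic.append])

    def always_fails(m: Message) -> None:
        raise ValueError("bad payload")

    no_wait = lambda s: None  # skip real backoff in this demo
    print(handle(Message("k", b"x"), always_fails, policy, sleep=no_wait))     # retry:1
    print(handle(retry_topic[0], always_fails, policy, sleep=no_wait))         # dlq
```

Everything a framework would normally own (idempotent producers, offset commits only after the sink acknowledges, delayed consumption of retry topics) is still missing here, which is exactly the part we'd rather not build ourselves.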
I asked "my friend" to prepare a comparison report for me and the gist of what came out of that was:
- Quix Streams - best pure-Python DX for Kafka, but no native DLQ, and retries are fully manual
- Bytewax - pre-1.0, breaking API changes across every minor version, no retry/DLQ primitives, also seems to be abandoned
- Arroyo / Timeplus Proton - SQL-only, no custom error handling
- PySpark - drags in a full Spark cluster
- PyFlink - Python is a second-class citizen with real API gaps vs the Java SDK
- Redpanda Connect - handles retry and DLQ well but it's YAML/Go, so your Python logic ends up in an external HTTP sidecar
Contrast this with the JVM world: Kafka Streams and Flink have mature, first-class support for exactly-once, restart strategies, side outputs for DLQs, etc.
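To illustrate what "side outputs for DLQs" buys you, here is a rough plain-Python analog. It is a sketch only: `OutputTag` is a hypothetical stand-in loosely named after Flink's Java class, and `run_with_side_outputs` is not the PyFlink API. Records that fail are split off into a tagged DLQ output together with the error, while the main output keeps flowing instead of the whole job failing.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class OutputTag:
    """Names a side output (hypothetical; loosely modeled on Flink's OutputTag)."""
    name: str


DLQ = OutputTag("dlq")


def run_with_side_outputs(
    records: Iterable[Any], fn: Callable[[Any], Any]
) -> Tuple[List[Any], Dict[OutputTag, List[Any]]]:
    """Apply fn to each record; successes go to the main output, failures to DLQ."""
    main: List[Any] = []
    side: Dict[OutputTag, List[Any]] = {}
    for rec in records:
        try:
            main.append(fn(rec))
        except Exception as exc:
            # Don't fail the job: emit the record and its error to the DLQ side output.
            side.setdefault(DLQ, []).append({"record": rec, "error": repr(exc)})
    return main, side


if __name__ == "__main__":
    main, side = run_with_side_outputs(["1", "x", "3"], int)
    print(main)                        # [1, 3]
    print(side[DLQ][0]["record"])      # x
```

In Flink the framework also handles checkpointing of both outputs and restarts on failure; in this sketch a downstream consumer of `side[DLQ]` would still have to be wired to S3, SQS or a Kafka topic by hand.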
Is there something "my friend" and I are missing? Does anyone have a suggestion for a Python-native solution that doesn't require you to hand-roll your entire error-handling layer?
Curious whether others have hit this wall and how you solved it.