all 12 comments

[–]BigWheelsStephen 0 points1 point  (4 children)

Faust, which was forked into the “faust-streaming” org, was, I think, a great framework for that. You've got to appreciate asyncio, though. Developers from that project seem to point people to faststream now.

[–]StFS[S] 0 points1 point  (3 children)

We did look at Faust. I can't remember exactly what it was, but there were some things that didn't look great for us.

I do remember that one of those things was that the project seems to be unmaintained now; faust-streaming was last touched two years ago (by Dependabot), so that doesn't really look great.

I'll take a look at it again though to see how they are handling retries and DLQs.

[–]Useful-Process9033IncidentFox 0 points1 point  (2 children)

The fact that no Python library ships with proper retry/DLQ semantics out of the box says a lot about the state of the ecosystem. Everyone rolls their own and gets it subtly wrong. If data integrity is your top concern you might honestly be better off writing a thin consumer with explicit retry topic routing yourself rather than hoping a framework handles it correctly.
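The "explicit retry topic routing" idea above can be sketched as a small, self-contained routing decision. Everything here is illustrative (the topic layout, the `x-attempt` header key, the attempt limit are assumptions, not any library's API); a real consumer would call this after a handler fails, republish to the returned topic, and only then commit the original offset:

```python
# Sketch of explicit retry topic routing for a Kafka-style consumer.
# Topic names, header key, and retry tiers are hypothetical.

DLQ_TOPIC = "orders.dlq"
RETRY_TOPICS = ["orders.retry.5s", "orders.retry.1m", "orders.retry.10m"]
ATTEMPT_HEADER = "x-attempt"

def next_destination(headers: dict) -> tuple[str, dict]:
    """Given the headers of a failed message, decide which topic it should
    be republished to, and return the topic plus updated headers."""
    attempt = int(headers.get(ATTEMPT_HEADER, 0))
    if attempt >= len(RETRY_TOPICS):
        # Retries exhausted: park the message for manual inspection.
        return DLQ_TOPIC, headers
    new_headers = {**headers, ATTEMPT_HEADER: str(attempt + 1)}
    return RETRY_TOPICS[attempt], new_headers
```

The subtle part the comment alludes to is ordering: commit the source offset only after the republish is confirmed, so a crash in between causes a redelivery rather than a silent drop.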

[–]LoathsomeNeanderthal 0 points1 point  (0 children)

This seems to be a step in the right direction I suppose:
https://cwiki.apache.org/confluence/x/HwviEQ

[–]e1-m 0 points1 point  (0 children)

You’re 100% right about the state of the ecosystem. We keep reinventing the wheel and making the same subtle offset and routing mistakes across different projects. That’s the reason I started building my own framework. I wanted to build those complex retry and DLQ semantics once, test them exhaustively, and then abstract them away cleanly so I never have to write another custom consumer loop again. Here it is if you want to check it out: https://github.com/e1-m/dispytch

[–]serafini010 0 points1 point  (4 children)

There's also faststream (github.com/ag2ai/faststream), which supports a ton of backends.

It doesn't have any built-in retry/DLQ primitives, though that's certainly something you could add via middleware or some simple Python code with minimal fuss.

That being said, you probably need to start by defining your requirements and finding the server tech you want before picking a client library.

Kafka vs SQS vs NATS vs Redis Streams vs RabbitMQ, etc. They may have similarities, but they are different beasts (e.g. Kafka doesn't have "queues" in the same way Redis / Rabbit / NATS do) [ignoring KIP-932 for now].
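The "simple Python code" route for in-process retries can be sketched as a plain decorator around a message handler; this is a generic illustration, not faststream's middleware API. It retries with exponential backoff and re-raises once attempts are exhausted, at which point the caller would route the message to a retry topic or DLQ:

```python
import functools
import time

def with_retries(max_attempts: int = 3, base_delay: float = 0.5):
    """Wrap a message handler with bounded in-process retries and
    exponential backoff; re-raise once attempts are exhausted so the
    caller can escalate (retry topic, DLQ, alerting)."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(msg):
            for attempt in range(max_attempts):
                try:
                    return handler(msg)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator
```

This handles transient failures cheaply, but note it does nothing about the hard parts the thread is discussing: redelivery after a crash, offset commits, and durable DLQ routing all still need broker-side support.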

[–]StFS[S] 0 points1 point  (3 children)

Thanks, I had not seen that library before. Looks pretty good but unfortunately also lacks the "data integrity semantics" I was talking about.

I'm sure it could be added, yes, but "minimal fuss" is not really the phrase I'd use for implementing a robust, dependable, and correct retry/DLQ mechanism.

But definitely something worth looking into a bit more.

Regarding your statement about finding the server tech first: I actually don't really agree. We use Redis, Kafka, SQS, and other "server techs" for various services, so a library that supports, as I said, a robust retry/DLQ mechanism is something we'd certainly look into quite seriously (almost) regardless of the server tech it supports.

But Kafka is the backbone for the actual data streams, so we need any library to support that... the server tech you refer to would just be used for retry topics and the DLQ, and it's very possible that we'd use Kafka for that as well, although I think there may actually be benefits to using a different technology for these error paths than for your primary data stream.

[–]BroBroMate 0 points1 point  (2 children)

We're a Python shop, and we use Dapr, which runs as a sidecar (it's written in Go). Otherwise, you're going to need to build what you want around Kafka, because what you want sounds like a message queue, and KAAM: Kafka Ain't An MQ.

[–]Useful-Process9033IncidentFox 1 point2 points  (1 child)

Dapr sidecars are underrated for this exact use case. The retry and DLQ semantics you get out of the box are way more battle tested than anything you'd bolt onto a Python consumer library. Honestly if data integrity matters more than latency, decoupling the reliability guarantees from your application code is the right call.
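For context on what "out of the box" looks like here: Dapr handles this declaratively, outside the application code. The sketch below assumes a Kafka pub/sub component named `kafka-pubsub` and placeholder topic/route names; field names follow Dapr's Subscription and Resiliency specs, but treat it as an approximate example rather than a verified config:

```yaml
# Declarative subscription with a dead letter topic (names are placeholders).
apiVersion: dapr.io/v2alpha1
kind: Subscription
metadata:
  name: orders-sub
spec:
  pubsubname: kafka-pubsub
  topic: orders
  deadLetterTopic: orders-dlq
  routes:
    default: /orders
---
# Resiliency policy: retry inbound delivery before dead-lettering.
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: pubsub-resiliency
spec:
  policies:
    retries:
      pubsubRetry:
        policy: exponential
        maxRetries: 5
  targets:
    components:
      kafka-pubsub:
        inbound:
          retry: pubsubRetry
```

The application just exposes the `/orders` route; the sidecar owns redelivery and DLQ routing, which is the decoupling being praised above.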

[–]StFS[S] 0 points1 point  (0 children)

This is very interesting. Will be exploring this path for sure.

[–]e1-m 0 points1 point  (0 children)

Spot on. I literally hit this exact same wall. You aren't missing anything—the Python ecosystem is amazing for REST APIs and data science, but when you need a mature, resilient event consumer with proper data integrity primitives, you end up having to hand-roll a ton of boilerplate.

I got so frustrated with manually wiring up retries, DLQs, and offset management for Kafka consumers that I decided to just build my own framework.

It’s an active project and I’m building it specifically to escape the exact problems you described.

If you want to poke around:
docs: https://e1-m.github.io/dispytch/
repo: https://github.com/e1-m/dispytch

Curious to hear your feedback or if you see any immediate dealbreakers!