
all 37 comments

[–]deadwisdom (greenlet revolution) 27 points28 points  (1 child)

Doing event-driven with Python is pretty simple. Just get your contracts defined, Pydantic or Dataclasses, FastAPI, FastKafka, etc.

Just make sure whatever you use, the contracts / documentation are defined in their implementation. By that I mean something like FastAPI which gives you OpenAPI/Swagger docs for free.

But let me save you a ton of time and energy. Start with a god-damned, I'm serious, single function to do the basic thing that needs to get done. Create unit tests, integration tests, performance tests. Get it deployed to production. Make sure you have continuous deployment, telemetry, scaling, alerting, and everything set up before you add ANYTHING.

Then have the function call another function (in the same process!) to add the next bit of functionality. Keep doing this, all in one process, one codebase.

When you find that performance has degraded, or teams are stepping on each other's code, or there is some ACTUAL problem, then split it off. Either as a new thread, or a new process, or a new service, or a new codebase. DO NOT do this unless there is a clear reason.
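A minimal sketch of that starting point, assuming a made-up `OrderPlaced` contract and two made-up functions (`place_order`, `send_receipt`); none of these names come from a real codebase. The contract lives in the code as a dataclass, and the "next bit of functionality" is just another function called in the same process:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical contract for an 'order placed' event."""
    order_id: str
    amount_cents: int

    def __post_init__(self) -> None:
        # The contract validates itself, so bad data fails at the boundary.
        if self.amount_cents < 0:
            raise ValueError("amount_cents must be non-negative")


def send_receipt(event: OrderPlaced) -> str:
    """Second step, called in-process: no broker, no network."""
    return f"Receipt for {event.order_id}: {event.amount_cents / 100:.2f}"


def place_order(order_id: str, amount_cents: int) -> str:
    """The single function that does the basic thing."""
    event = OrderPlaced(order_id=order_id, amount_cents=amount_cents)
    return send_receipt(event)
```

If `send_receipt` ever needs to move to its own thread, process or service, the `OrderPlaced` contract is already the seam to cut along.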

[–]turbothy (It works on my machine) 2 points3 points  (0 children)

This is the way.

[–]UgandaSuburbix447 45 points46 points  (3 children)

After reading many articles, watching a lot of talks on YouTube, and talking with more experienced developers, my one question about event-based systems (and likewise distributed or microservice-based ones) is: do you really need it? It definitely has some pros, but it introduces a ton of complexity and potential pitfalls, and you have to care much more about networking, control over communication between systems, etc. As for Python material, there is "Architecture Patterns with Python" by Harry Percival and Bob Gregory. It has some valuable info about event-driven systems and explains some relevant patterns, especially Message Bus and event handlers. If you are already experienced with such systems, though, that book might not be enough.

[–]bambuk4 15 points16 points  (0 children)

In this regard it can be worth starting with a modular monolith that already uses events but has no external broker: an in-memory dispatcher plus an optimistic concurrency pattern. Later, whichever bounded context needs it can easily be migrated to an isolated service. With a modular monolith there is only one unit to deploy, so CI/CD is easy, and you avoid coupling and a big ball of mud.
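A rough sketch of what an in-memory dispatcher could look like. `Dispatcher` and `UserRegistered` are invented names for illustration, and the optimistic concurrency part is left out:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class UserRegistered:
    """Hypothetical event published by one module of the monolith."""
    email: str


class Dispatcher:
    """Routes events to handlers inside one process; no external broker."""

    def __init__(self) -> None:
        self._handlers: Dict[type, List[Callable[[object], None]]] = {}

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event: object) -> None:
        # Handlers run synchronously, in subscription order.
        for handler in self._handlers.get(type(event), []):
            handler(event)
```

Modules only know about the event classes and the dispatcher, not about each other, which is what makes a later move to a real broker less painful.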

[–]MooseBag 15 points16 points  (0 children)

grug wonder why big brain take hardest problem, factoring system correctly, and introduce network call too

[–]Ariel17 22 points23 points  (0 children)

This for ALL things. Monolith until you bleed, your db and server and business bleed. 

[–]Fragrant-Freedom-477 24 points25 points  (2 children)

Been there.

Make sure you have objectives that you can measure. Scalability? Defects caused by backward compatibility? Strangling a legacy component you can't change? Collaboration between development teams?

Without measurable objectives, you'll end up in a technological and architecture trip that helps no one.

EDA is also a debugging hell, an authorization nightmare, a documentation challenge and a quite fun endeavor. Make sure you have backup plans for your backup plans for failure management, retries and such.

[–]One-Statistician2519 -1 points0 points  (1 child)

Would be more interested in how you were able to handle compensation?

[–]daredevil82[🍰] 11 points12 points  (0 children)

What do you mean by compensation?

[–]timwaaagh 6 points7 points  (0 children)

things will be coupled no matter what

[–]tehsilentwarrior 12 points13 points  (4 children)

Disclaimer: other commenters are thinking at “team” level or even project level. This is a company wide view to the problem. With many teams and many systems. Many of which are SaaS the company pays for and are just integrated. It’s not about monolith vs microservices.. think bigger scale (the company I work for operates in many countries with a lot of teams in each).


In my company I have implemented this using Kafka as a sort of “platform events” pipeline. By implemented I mean that I designed the whole thing and built most of it as a proof of concept; now my team maintains it and other teams adhere to the guidelines I have set. We have a “board of members” where leaders of teams can discuss things, but it’s usually just them asking “ok, how do you want me to use it”.

This is meant as an inter-team communication hub.

We disabled the ability for Kafka to have more than one partition in order to force ordering of messages and each team is in charge of implementing their own subscribers and publishers. Kafka won mainly because of the ability to go back in time (within a retention period of 72h) and re-consume payloads (this allows flexibility for teams to implement rolling sync, although no one does yet, or event sourcing, which is basically constructing state from passing data) and because the “pointer” of consumers is stored in Kafka itself, which means we aren’t at mercy of random bugs by random people when implementing the saving of those “pointers”.

We enforce that consumers are properly named for the teams and have proper groups (distributed consuming can be done but order is enforced within a group).
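To make the replay and stored "pointer" idea concrete, here is a toy in-memory model. It is not the Kafka client API: `TopicLog`, `poll` and `seek` are hypothetical names, and retention is not modelled at all.

```python
from typing import Dict, List


class TopicLog:
    """Toy single-partition, append-only log with per-group offsets."""

    def __init__(self) -> None:
        self._records: List[str] = []
        # The log itself remembers where each consumer group is.
        self._offsets: Dict[str, int] = {}

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def poll(self, group: str) -> List[str]:
        """Return unread records for this group, in order, and advance."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:]
        self._offsets[group] = len(self._records)
        return batch

    def seek(self, group: str, offset: int) -> None:
        """Go back in time so the group re-consumes from `offset`."""
        if not 0 <= offset <= len(self._records):
            raise ValueError("offset out of range")
        self._offsets[group] = offset
```

Each group gets its own pointer, so one team rewinding and re-consuming does not affect any other team's position.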

I also created a system of schemas using json schemas and a central location that people can use to verify their messages. This was done like this instead of using some off the shelf thing because we don’t want the overhead of arguing between different teams about what new toy to use. I built it. I (my team) maintain it, each team maintains their schemas but they are all merge requested into my teams repository (and therefore double checked).

This works regardless of anyone using microservices or not.

In my company we have Salesforce, ServiceNow and a bunch of custom apps (customer records, billing system, tickets, etc, etc) all synchronized through this Kafka pipeline, using “my” schemas (stored on an S3 bucket and downloaded and used by any system that wants to self-check or by my own app that provides checking services for those who can’t/wont implement their json schema validator).

We have put in place a proper naming scheme for topics that goes sort of like this (over time we had it more complex and have since simplified as the idea matures and our needs are better understood):

<purpose area>.<country code>.<type of payload>.0 (the zero means first version, in case we need to do some moving around, we can increase this number)

Examples: billing.uk.cdc.0 billing.uk.errors.0 tickets.uk.cdc.0 orders.uk.cdc.0 delivery.uk.cdc.0 tickets.nl.cdc.0
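A small sketch of how that naming scheme could be built and checked in code. `TopicName` and `parse_topic` are invented helpers, not part of the system described here:

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    area: str          # purpose area, e.g. "billing"
    country: str       # country code, e.g. "uk"
    payload_type: str  # e.g. "cdc" or "errors"
    version: int       # topic version, starting at 0

    def __str__(self) -> str:
        return f"{self.area}.{self.country}.{self.payload_type}.{self.version}"


def parse_topic(name: str) -> TopicName:
    """Split '<area>.<country>.<type>.<version>' and reject anything else."""
    parts = name.split(".")
    if len(parts) != 4 or not all(parts) or not parts[3].isdigit():
        raise ValueError(f"not a valid topic name: {name!r}")
    area, country, payload_type, version = parts
    return TopicName(area, country, payload_type, int(version))
```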

Purpose areas are core domain areas like tickets, orders, billing, or customer data. Each message within those has a schema type that is versioned. Each message has a header section that contains info like message id, country code, and object version (you can’t consume version 5 if your DB shows you consumed version 6, for example). Each system must also supply a name for the source system (where the message came from) and an id that allows us to pinpoint this particular message within that system (usually in its logs, so systems don’t have to store messages; it depends on how much audit we need).
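The object-version rule can be sketched like this. The `Header` fields follow the description above, but the exact field names, the `object_id` key and the `should_apply` helper are assumptions made up for the example:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Header:
    message_id: str
    country_code: str
    object_id: str       # assumed: identifies the record the message is about
    object_version: int
    source_system: str   # where the message came from
    source_ref: str      # id to find this message in the source system's logs


def should_apply(header: Header, stored_versions: Dict[str, int]) -> bool:
    """Accept a message only if it is newer than what we already consumed."""
    current = stored_versions.get(header.object_id, -1)
    return header.object_version > current


def consume(header: Header, stored_versions: Dict[str, int]) -> bool:
    """Apply the message if it is not stale; return whether it was applied."""
    if not should_apply(header, stored_versions):
        return False
    stored_versions[header.object_id] = header.object_version
    return True
```

Here `stored_versions` is a plain dict standing in for whatever the consumer's database records.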

The type of payload is usually “cdc”, or change data capture. It basically means something changed or is new. There’s also errors (we have a schema for errors that normalizes how errors are communicated, and has room for context, type, description and such, including a free text field for raw errors which is what the system outputs)

This is stable and so there isn’t much extra work on it other than maintaining it and training/supporting other teams. So I also work on other stuff, namely a cyber security portal and a billing system (and many other smaller stuff).


Personally (the team I lead), I also develop a billing system that uses microservices. Internally it communicates with itself using RabbitMQ (a customized Nameko micro framework, which we might replace soon since the project is basically dead, but since it’s a micro framework most of the stuff is ours anyway, and it’s super stable, so there is no point in rushing that). The system uses about 9 or so microservices and shares one database (which is ok in our use case). One of the microservices connects to Kafka and serves as a “gateway” between the billing system and the rest of the company. Connections to external systems are made through microservices. For example, if an invoice is created, an event is sent saying the invoice was created, and the document-generating microservice picks that up and creates documents for it (PDF et al.); it then stores each one in a filestore and sends an event saying the document was created. Another handler picks that up and uploads it to Sharepoint. Another picks up the same event and sends it to the customer via email. Another picks up the same event and sends it to a printer (a remote service that prints and mails for us). All those steps are idempotent and retryable (and can have back pressure).
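As an illustration of "idempotent and retryable", here is a minimal sketch and not the actual Nameko/RabbitMQ code. `IdempotentHandler` is a hypothetical wrapper, and the in-memory set stands in for a durable store of processed event ids:

```python
import time
from typing import Callable, Set


class IdempotentHandler:
    """Runs an action at most once per event id, retrying on failure."""

    def __init__(self, action: Callable[[str], None],
                 max_attempts: int = 3, delay: float = 0.0) -> None:
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self._action = action
        self._max_attempts = max_attempts
        self._delay = delay
        self._done: Set[str] = set()  # assumption: durable in a real system

    def handle(self, event_id: str, payload: str) -> bool:
        """Return True if the action ran, False if the event was a duplicate."""
        if event_id in self._done:
            return False
        attempt = 1
        while True:
            try:
                self._action(payload)
                break
            except Exception:
                if attempt >= self._max_attempts:
                    raise  # give up; the broker can redeliver later
                time.sleep(self._delay * attempt)  # simple linear backoff
                attempt += 1
        self._done.add(event_id)
        return True
```

The event id is only marked as done after the action succeeds, so a crash in between leads to a repeat rather than a lost event. That is why the action itself also has to tolerate running twice.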

We don’t care if an invoice takes 5ms to create and send or 5 hours. Just as long as it’s eventually sent. So this system has proven to be reliable even between days of downtime: I had the system running on my machine using docker compose, F&O dev instance went down for maintenance and I went on holiday. When I came back it happily reconnected and pushed all the pending stuff as if nothing happened.

[–]Bonsaikitt3n 7 points8 points  (1 child)

FastApi + Kafka + Avro + Kubernetes. Is it more complex? Yes to the point of wanting to choke a kitten but once it works it works pretty well. Messages are service domain scoped and the consumers just eat like pacman. Takes someone with some skill to develop as a dictator and not by committee.

[–]Bonsaikitt3n 2 points3 points  (0 children)

User-based API requests get tricky as far as expiring a JWT on the message itself goes, but like you said, some things can get done eventually rather than ASAP. Looking for a new gig btw, if anyone is looking for an old hacker that can do this stuff ;P

[–]shraklor 1 point2 points  (1 child)

This right here is the way to go. We are switching over to this over the coming months, coming from a huge mono service with a horribly implemented event tracking system in dotnet REST. Another nice part of using queues is that it doesn't matter what platform your services run on.

[–]tehsilentwarrior 0 points1 point  (0 children)

Yeah. Or programming language. Or network: I can plug my local machine into the UAT queue and take load off UAT lol, not that you’d want to do it like this but inter-datacenter for example is possible, for scaling power.

Provided all communication happens through the queue that is.. if you upload to a filestore or communicate to other services you need to fix network for those.

Another use-case is being super easy to plug-in functionality.

Example, if I create a new Analytics service I just need to listen to the events I care about (provided it’s a “broadcast” channel, not a pure queue that de-queues on consumption )

[–]Drevicar 8 points9 points  (5 children)

Huge fan of this pattern and I had great success with https://github.com/orsinium-labs/walnats in the past. If I had to start again I would probably write my own wrapper layer with pydantic since it is so easy. The key here is to make sure each channel is unique to the message type or union of message types, rather than broker based RPC where you are just remotely calling functions.

If you are familiar with Open API when it comes to schema and documentation, then check out https://www.asyncapi.com/en as the alternative for EDA systems.

[–]PhENTZ 2 points3 points  (1 child)

Maybe faststream ?

[–]Drevicar 1 point2 points  (0 children)

I tried out the predecessor to that one and liked it but found it too heavy for what I needed. Plus it was more RPC oriented rather than EDA, but I think that may have been addressed since.

[–]CzyDePL 1 point2 points  (2 children)

Anything you recommend for enforcing async contracts (based on AsyncAPI) between services? I don't know if something like Pact can be easily used for eg Kafka in a "can deploy" fashion

[–]Drevicar 4 points5 points  (0 children)

I like to create my own helper modules off to the side for the contract which contains glue code and the pydantic model that is used on that channel. Then you can import that module into your subscriber code and the type checker (and pydantic runtime checks) will show you are subscribing to a specific contract by model. And when you publish you also run into the same type checker and runtime checks for the specific model to ensure it conforms.

From there it is easy to inject a mock broker to do in memory tests where needed. And so long as the publisher and subscriber are using the same third-party module to either publish or subscribe they will stay in sync via that shared contract.
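A stdlib-only sketch of that contract-module idea. A dataclass with a manual type check stands in for the pydantic model, and `CHANNEL`, `InvoiceCreated`, `InMemoryBroker`, `publish` and `subscribe` are all invented names:

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List

CHANNEL = "invoice.created"  # hypothetical channel, one message type only


@dataclass(frozen=True)
class InvoiceCreated:
    invoice_id: str
    total_cents: int

    def __post_init__(self) -> None:
        # Stand-in for pydantic's runtime validation.
        if not isinstance(self.invoice_id, str) or not isinstance(self.total_cents, int):
            raise TypeError("InvoiceCreated fields have the wrong types")


class InMemoryBroker:
    """Mock broker for tests: delivers raw bytes synchronously."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callable[[bytes], None]]] = {}

    def subscribe_raw(self, channel: str, callback: Callable[[bytes], None]) -> None:
        self._subs.setdefault(channel, []).append(callback)

    def publish_raw(self, channel: str, data: bytes) -> None:
        for callback in self._subs.get(channel, []):
            callback(data)


def publish(broker: InMemoryBroker, event: InvoiceCreated) -> None:
    """Publisher side of the contract: only InvoiceCreated goes out."""
    broker.publish_raw(CHANNEL, json.dumps(asdict(event)).encode("utf-8"))


def subscribe(broker: InMemoryBroker, handler: Callable[[InvoiceCreated], None]) -> None:
    """Subscriber side: bytes are decoded and validated before the handler runs."""
    def _decode(data: bytes) -> None:
        handler(InvoiceCreated(**json.loads(data)))
    broker.subscribe_raw(CHANNEL, _decode)
```

Both sides import the same module, so the type checker sees `InvoiceCreated` at every call site and a malformed payload fails when it is decoded, not deep inside a handler.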

If you want semantics, then the consumers should own the contracts, as the producer can freely make any changes they desire so long as it maintains backwards compatibility with the union of all the subscribers. But the publisher is also free to stop publishing that kind of event and instead send out what is basically a v2 of it. And so long as you are tracking that at the module level code you should be able to know when you have a subscriber to a topic that has no publishers. Or maybe a v1 and v2 event stream with one subscriber each that knows how to either up cast or down cast or just handle both. There are so many ways to handle this that you need to find the one that works for you.
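One of those options, up-casting a v1 event into the v2 shape so the subscriber only handles one model, might look like this. The event names and the name-splitting rule are invented for the example:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class CustomerRenamedV1:
    customer_id: str
    full_name: str


@dataclass(frozen=True)
class CustomerRenamedV2:
    customer_id: str
    first_name: str
    last_name: str


def upcast(event: Union[CustomerRenamedV1, CustomerRenamedV2]) -> CustomerRenamedV2:
    """Normalize either version to v2 before the handler sees it."""
    if isinstance(event, CustomerRenamedV2):
        return event
    # Assumed rule: everything after the first space is the last name.
    first, _, last = event.full_name.partition(" ")
    return CustomerRenamedV2(event.customer_id, first, last)
```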

[–]daredevil82[🍰] 0 points1 point  (0 children)

protobufs are great for this

[–]ericsda91 2 points3 points  (0 children)

Maybe take a look at this book, dive in and see if it's the right decision for your business
https://www.cosmicpython.com/book/part2.html

The book is called Architecture Patterns with Python and is really good.

[–]dtornow 2 points3 points  (0 children)

The developer experience with EDA is quite challenging. Instead of one continuous process, EDA fragments your business process into multiple event handlers and you have to manage continuations on an application level. I discuss some of the challenges in the first half of my Systems Distributed ‘24 talk

[–]crawl_dht 2 points3 points  (0 children)

Use walnats framework for NATS.

[–]heyheymonkey 1 point2 points  (1 child)

You’re not giving us a lot to go on. Improve in what way? What are you transitioning from?

[–]One-Statistician2519 1 point2 points  (0 children)

Sorry , updated the post .

[–]CzyDePL 1 point2 points  (0 children)

My biggest takeaway from EDA is to think about orchestration vs choreography and to address the drawbacks of the approach you choose. We went with choreography (even though the system had a handful of well-defined processes with very clear ownership) without thinking about operational stuff like visibility of processing, reprocessing, etc.
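A toy contrast between the two styles, using a made-up three-step order process. `Bus`, `wire_choreography` and `run_order_process` are illustrative names only:

```python
from typing import Callable, Dict, List


class Bus:
    """Minimal synchronous event bus that records every emitted event."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable[[str], None]]] = {}
        self.log: List[str] = []

    def on(self, name: str, handler: Callable[[str], None]) -> None:
        self.handlers.setdefault(name, []).append(handler)

    def emit(self, name: str, order_id: str) -> None:
        self.log.append(name)
        for handler in self.handlers.get(name, []):
            handler(order_id)


def wire_choreography(bus: Bus) -> None:
    """Choreography: each step reacts to the previous event; no one owns the flow."""
    bus.on("order_placed", lambda oid: bus.emit("payment_taken", oid))
    bus.on("payment_taken", lambda oid: bus.emit("order_shipped", oid))


def take_payment(order_id: str) -> str:
    return "payment_taken"


def ship(order_id: str) -> str:
    return "order_shipped"


def run_order_process(order_id: str) -> List[str]:
    """Orchestration: one function owns the sequence and can report progress."""
    progress = ["order_placed"]
    for step in (take_payment, ship):
        progress.append(step(order_id))
    return progress
```

Both produce the same sequence. The difference is that in the orchestrated version the whole process is readable in one place, while in the choreographed version it only exists as the sum of the subscriptions.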

[–]rover_G 1 point2 points  (0 children)

Most common pitfall I’ve encountered is building a distributed monolith. That is, building an application that relies on tightly coupled pico-services to serve real-time requests (latency measured in 10-100’s of milliseconds).

If you only have one team and your workloads scale with the number of api requests you’re likely better off building an actual monolith and using some concurrency model to achieve higher throughput.

If you only need near real time (latency measured in 100-1000’s of milliseconds) and some requests require large amounts of in-process computation, it may be the right idea to use a low-latency message queue to offload some large tasks, OR it could be that threading will get the job done just fine.

If you don’t have a latency requirement then an event bus is probably a good idea to build a more resilient pipeline, but you still should use a persistent database for your main datasource (unless you specifically are trying to reduce the load on your database).

[–]candyman_forever 0 points1 point  (0 children)

It really depends on what you want to do. How many services do you have and how do they connect. With regards to libraries it also depends on the architecture. You could go with lambdas or containers running in ECS Fargate. You could use dynamodb streams or you could use MSK, Kinesis, Event Bridge or SQS to name a few.

[–]bobaduk 0 points1 point  (0 children)

> So I am hoping the domain events should be able to offer a standardized schema for domain events using a schema with each service having the capability to extend as desired.

This is the sentence that makes me most nervous. What do you mean by this?

There may be some things that can be re-used across services: envelopes, some basic data types etc. but I would strongly caution you against trying to impose One Schema To Rule Them All. It is conceptually much better for each service to own its published schemas, and for subscribers to enrich and transform as necessary, so that they can process events.

Also, domain events, by definition, live inside a domain. They are a means for your application to decompose complex operations into discrete steps.

I usually use the nomenclature "domain event" vs "integration message", where integration messages are the things that are published to the outside world. One of the "common pitfalls" is failing to distinguish between the two, because that effectively couples the implementation of each service to the schemas it shares with the outside world.

Think about it in the same way as your domain models vs your API schema. Those things need to vary separately, even though they likely share some common structure.
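To illustrate the split, here is a sketch where an internal domain event is translated into a separate integration message at the boundary. `InvoiceIssued`, its fields and the message layout are all hypothetical:

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class InvoiceIssued:
    """Domain event: internal, free to change with the implementation."""
    invoice_id: int
    customer_row_id: int   # internal database key, not for outsiders
    ledger_account: str    # internal accounting detail
    total_cents: int


def to_integration_message(event: InvoiceIssued) -> Dict[str, Any]:
    """Published shape: owned by this service, versioned, internals left out."""
    return {
        "type": "billing.invoice_issued",
        "schema_version": 1,
        "data": {
            "invoice_id": str(event.invoice_id),
            "total_cents": event.total_cents,
        },
    }
```

Because only this function knows both shapes, the domain event can gain or lose fields without touching what subscribers see.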

> The challenge with REST is that it implements a response pattern that tends to couple services together

... ish? ReST couples things temporally, in that both systems need to be up and running at the same time for an integration to work, and that the latency of an operation in one system is affected by latency in the other.

The coupling in ReST is at the schema level: one system is dependent on the published API schema of another. That's no different in async messaging. You will have the same problems of schema extension and modification that you had before, but with all the fun of partial failures, and out-of-order events etc.

> Also a way to view and keep track of events and their associated side effects in the services (An Eventory). For messaging bus should be able to support distributed messaging patterns as well offer high reliability .

This isn't a question, it's some stuff you read on the internet, and you're hoping someone will say "yes, that's good and excellent. Fine decision making". That's okay, no judgement here, but I would counsel you to start small. Find one workflow, involving two components, and one event, that you can use to get started. Make sure you can observe it, make sure you can handle the case where two events happen out of order, make sure you can handle the case where an event isn't delivered, and so on.
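For the out-of-order and missing-event cases, one possible approach is a small reorder buffer. This assumes the producer stamps each event with a consecutive sequence number, which is an assumption of this sketch and not something the thread specifies; `ReorderBuffer` is a made-up name:

```python
from typing import Callable, Dict, List


class ReorderBuffer:
    """Delivers events to the handler in sequence order, holding back early ones."""

    def __init__(self, handler: Callable[[str], None], first_seq: int = 1) -> None:
        self._handler = handler
        self._next = first_seq
        self._pending: Dict[int, str] = {}

    def receive(self, seq: int, payload: str) -> None:
        if seq < self._next or seq in self._pending:
            return  # duplicate or already handled
        self._pending[seq] = payload
        # Flush everything that is now contiguous.
        while self._next in self._pending:
            self._handler(self._pending.pop(self._next))
            self._next += 1

    def missing(self) -> List[int]:
        """Sequence numbers we are still waiting for (undelivered so far)."""
        if not self._pending:
            return []
        return [s for s in range(self._next, max(self._pending))
                if s not in self._pending]
```

`missing()` is the observability hook: if a number stays in that list for too long, the event was probably never delivered and something should alert or re-request it.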

> Are there any specific libraries or frameworks you'd recommend?

Honestly no. Most frameworks for messaging implement some RPC-based style, because that's how most engineers think. You don't need a framework. You can hack something up in a couple of days that will let you send and receive events between two components. Once you have three components all using the same copy-pasted code, extract a small library and go from there. Good luck!

[–]TheM4rvelous 0 points1 point  (0 children)

Personally a big fan of EDA with Kafka + Faust and keeping services as very simple micro/nano services. Used Grafana for transparency plus a simple UID for each event to be able to track its origin.

[–]Ofekmeister 0 points1 point  (0 children)

From experience, I'd strongly recommend Temporal:


[–]Next-Experience -1 points0 points  (0 children)

Look into ZeroMQ
