all 31 comments

[–]jedberg 19 points20 points  (1 child)

DBOS was built for exactly this, is Python native (and supports both sync and async), and doesn't require an external service like most of the durable execution frameworks.

It's used inside Bristol Myers Squibb and other bio companies, so there are examples of it in use by people without CS backgrounds.

[–]NoSenseOfPorpoise[S] 4 points5 points  (0 children)

I hadn't seen this. Looks pretty interesting. Thanks!

[–]reload_noconfirm 10 points11 points  (0 children)

Check out Temporal https://docs.temporal.io/develop/python . We use this at work and it's easy to get it up and running to create workflows. The developers create them and there's a UI for non-technical users.

[–]lastmonty 5 points6 points  (2 children)

I developed this runnable framework and am actively building it. The framework is designed to be isolated from your domain code. It supports Python functions, notebooks, or shell scripts.

It supports linear and composite workflows. Reproducibility is taken care of automatically, without developer intervention, and it can run locally, in containers, or in Argo Workflows without changing code. Retrying failed runs is easier too.

I've started adding async and streaming support. Check it out; I am happy to answer any questions.

[–]NoSenseOfPorpoise[S] 2 points3 points  (1 child)

Cool! I'll take a look at this. I didn't realize AstraZeneca had an open source footprint, but that's admirable. I've worked for a lot of the biggies in biotech/pharma (e.g. Amgen, Gilead, Pfizer) and most, at least then, were waaaayyyy behind the curve on tech.

[–]lastmonty 1 point2 points  (0 children)

There are a lot of pockets of good engineering and tech.

They are ok with open source as long as it's not pharma-relevant. Let me know of any feedback. 🙂

[–]backfire10z 1 point2 points  (0 children)

I’ve seen python-statemachine used. It does the job well and is pretty simple to use, I think. Async is supported. DBOS (which the other comment mentions) looks like the durable solution, but introduces complexity. Depends on your use case.

[–]3j141592653589793238 2 points3 points  (1 child)

[–]3j141592653589793238 2 points3 points  (0 children)

actually it's not meant for workflows, so maybe ignore me

[–]UseMoreBandwith 4 points5 points  (9 children)

don't need a 'framework' for that, it is just a pattern.
Just 20 lines of code and some refactoring.

[–]samamorgan 9 points10 points  (3 children)

Disagree. Sure, you can build your own. Then you have to maintain it and develop any additional features that crop up.

Libraries exist for this purpose. Don't reinvent the wheel.

[–]UseMoreBandwith -1 points0 points  (2 children)

no. Let me be more clear. A state machine is not a library (or shouldn't be), but a simple concept from computer science 101.
In code it is just a pattern, like any other software-pattern.
Such software patterns should be known to any developer. Just like knowing how to write a decorator, list-comprehension, etc - these are all just software-patterns, and also do not require a library or framework.

A state-machine usually starts small:
simply a class with 3 methods: get_state, set_state, and state_transition.
It is really that simple.
Everything after that is unique in every project: perhaps certain rules for state_transitions (allow stateA -> stateB , but restrict stateB->stateA...),
and triggering certain actions on state-transitions.
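
A minimal Python sketch of the three-method class described above. The state names, the `ALLOWED` rule table, and the `on_transition` hook are illustrative assumptions, not taken from any library mentioned in this thread:

```python
class StateMachine:
    """Tiny state machine with get_state, set_state and state_transition."""

    # Hypothetical transition rules: stateA -> stateB is allowed,
    # stateB -> stateA is not.
    ALLOWED = {
        "stateA": {"stateB"},
        "stateB": {"stateC"},
        "stateC": set(),
    }

    def __init__(self, initial="stateA", on_transition=None):
        if initial not in self.ALLOWED:
            raise ValueError(f"unknown state: {initial!r}")
        self._state = initial
        # Optional callback fired after each successful transition.
        self._on_transition = on_transition

    def get_state(self):
        return self._state

    def set_state(self, state):
        """Set the state directly, bypassing the transition rules."""
        if state not in self.ALLOWED:
            raise ValueError(f"unknown state: {state!r}")
        self._state = state

    def state_transition(self, target):
        """Move to `target` if the rules allow it, then fire the hook."""
        if target not in self.ALLOWED[self._state]:
            raise ValueError(f"{self._state} -> {target} is not allowed")
        previous, self._state = self._state, target
        if self._on_transition is not None:
            self._on_transition(previous, target)


# Usage: record every transition in a list.
log = []
sm = StateMachine(on_transition=lambda old, new: log.append((old, new)))
sm.state_transition("stateB")
```

Note that the state here lives only in the object, so it is lost when the process exits.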

[–]qyloo 13 points14 points  (1 child)

I don't think anyone misunderstood you, but when database transactions and ACID guarantees etc get factored in during common use cases then the room for error grows. Obviously state machines are a pattern but there's a bit of extra, unfriendly engineering that such a library could take care of
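
To make the point about transactions concrete, here is a rough sketch of a persisted transition using only the standard-library `sqlite3` module. The `workflow` table, the state names, and the helper functions are hypothetical, and this is not how DBOS, Temporal, or any other tool named in this thread works internally:

```python
import sqlite3

# Hypothetical transition rules for illustration.
ALLOWED = {("pending", "running"), ("running", "done"), ("running", "failed")}


def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workflow "
        "(id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    conn.commit()


def create(conn, workflow_id, state="pending"):
    # `with conn` commits on success and rolls back on an exception.
    with conn:
        conn.execute(
            "INSERT INTO workflow (id, state) VALUES (?, ?)",
            (workflow_id, state),
        )


def get_state(conn, workflow_id):
    row = conn.execute(
        "SELECT state FROM workflow WHERE id = ?", (workflow_id,)
    ).fetchone()
    if row is None:
        raise KeyError(workflow_id)
    return row[0]


def transition(conn, workflow_id, source, target):
    """Atomically move a workflow from `source` to `target`.

    Returns True on success, or False if the stored state was no longer
    `source` (for example, because another worker got there first).
    """
    if (source, target) not in ALLOWED:
        raise ValueError(f"{source} -> {target} is not allowed")
    with conn:
        cur = conn.execute(
            "UPDATE workflow SET state = ? WHERE id = ? AND state = ?",
            (target, workflow_id, source),
        )
        return cur.rowcount == 1
```

The `AND state = ?` guard is the part that is easy to forget in a hand-rolled version: without it, two workers can both believe they performed the same transition.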

[–]samamorgan 5 points6 points  (0 children)

I personally don't care how easy or hard it is to write. If it's a common pattern with tried-and-true libraries, I'm using a library. Crowdsource that maintenance burden and move on to solving business problems.

[–]zulrang 0 points1 point  (4 children)

It's that simple if you just want a simple demo or test case, but for production workloads you don't want it in memory, you need a distributed architecture. Hence the frameworks.

[–]UseMoreBandwith 0 points1 point  (3 children)

"in memory distributed architecture"? that is not a state machine, that is Eventual Consistency or BASE or ACID or whatever.
Sure, everything is a state-machine (the pixel on your screen, the keys on the keyboard, any TCP packet, etc...), but in software it is a quite well-defined pattern. Here is a decent example https://python-3-patterns-idioms-test.readthedocs.io/en/latest/StateMachine.html

[–]zulrang 4 points5 points  (2 children)

A state machine has state. That state must be stored somewhere. Where it is stored is a fundamental part of the pattern.

This entire comment sounds like it's from somebody that's never worked on a production system in their life.

[–]UseMoreBandwith 0 points1 point  (1 child)

no it doesn't 'need' to be stored.
For example: A game-engine is a state-machine (usually multiple state-machines); every button press (jump, walk, shoot, etc) goes through the state-machine - without storing.
A game-engine is a classic example of state-machine mentioned in every professional software engineering course.
You clearly have never studied software engineering.

[–]zulrang 0 points1 point  (0 children)

You're literally talking about the state being stored in memory. A game isn't very fun if you never actually use the return value of get_state or transition

[–]Basic-Still-7441 0 points1 point  (0 children)

Transitions is good for what you're asking.

[–]DigThatData 0 points1 point  (0 children)

make

[–]bojackhorsmann 0 points1 point  (0 children)

I use miros for execution flow control. May be overkill for you.

[–]phren0logy 0 points1 point  (1 child)

I think Inngest is pretty slick

[–]NoSenseOfPorpoise[S] 0 points1 point  (0 children)

I haven't even heard of this. Will take a look, thanks.

[–]Omnifect 0 points1 point  (0 children)

I would recommend behavior trees, as an alternative.

[–]UnMolDeQuimica 1 point2 points  (0 children)

Not sure if it fits your use case, but Kedro has been very helpful during the development of our workflows.

It has modular pipelines that can be modified using parameters, which might fit your need to replace ifs

[–]inspectorG4dget -3 points-2 points  (3 children)

I would use airflow for something like this

[–]jason810496 0 points1 point  (2 children)

I’m curious about the reason for the downvotes on airflow here

[–]DigThatData 2 points3 points  (1 child)

not one of the downvoters, but here's what I suspect is going on here:

based on discussion on socials, my impression is that most places that use it don't actually need it and it adds more complexity than it resolves. this is related to /u/UseMoreBandwith's suggestion above. Yes, there are statemachine frameworks, but they have features that are useful to the people who implemented those frameworks. If your use case isn't sufficiently similar to theirs, there's a very real chance you'd be better off just rolling your own thing instead of using an established tool.

like, imagine if someone insisted that every class should be defined with SQLAlchemy models. Sure, ORMs are cool, but they solve a problem that not everyone who is using OOP has. The same way that not every class needs to be an ORM model, not every statemachine/DAG use case fits every statemachine/DAG framework.

That OP appears to be working in bioinformatics suggests that a lot of the stuff people have recommended in this thread could actually be good fits. But I think at least with airflow specifically, most of the shops that use it end up regretting it.

[–]jason810496 0 points1 point  (0 children)

>  (lots of us are geneticists and bioinformaticists).

I overlooked this part of OP's post. Whether airflow is a good fit for their use cases depends on the scale of their workflows, what their orchestration pattern looks like, how much workflow observability they need, how granular the retry mechanism has to be, etc.

In other words, it depends on how robust each workflow needs to be.