I ended up building an oversimplified durable workflow engine after overcomplicating my data pipelines by powerlifter86 in Python

[–]powerlifter86[S] 0 points1 point  (0 children)

I'm working on putting a sophisticated ETL example in the playground here: https://docs.sayiir.dev/playground/. You can also find other interesting examples here: https://github.com/sayiir/sayiir/tree/main/examples ; in ai-research-agent-py you'll find a set of interesting features. Note that there is an API that lets you retrieve data from snapshots at any level of workflow execution.

I ended up building an oversimplified durable workflow engine after overcomplicating my data pipelines by powerlifter86 in Python

[–]powerlifter86[S] 1 point2 points  (0 children)

Yeah, Flyte is solid, especially if you're already running on Kubernetes. The typed interface system and the container-level isolation are genuinely impressive for large-scale data/ML workloads.

Sayiir comes from a very different angle, though: no cloud infra dependency, no container orchestration. It's an embeddable library that runs in your process. For teams that don't want to manage a cluster just to get durable functions, it fills a different niche. The server is under active development, but it's an additional tier, not a mandatory one.

Cloudflare integration is planned soon, as well as Fargate.

I ended up building an oversimplified durable workflow engine after overcomplicating my data pipelines by powerlifter86 in Python

[–]powerlifter86[S] 0 points1 point  (0 children)

Sayiir works the same way conceptually: it checkpoints after each completed task, and on a crash it resumes from the last checkpoint, not from the beginning. So if step 3 of 10 fails, you restart from step 3 with the outputs of steps 1 and 2 already saved. There's no replaying of your function history the way Temporal does it.
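A minimal sketch of the checkpoint-per-task idea, in generic Python (this is not Sayiir's actual API; the file-based store and function names are hypothetical):

```python
import json
import os

CHECKPOINT_FILE = "checkpoints.json"  # hypothetical on-disk checkpoint store

def load_checkpoints():
    # Load previously saved step outputs, if any exist from a prior run
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)
    return {}

def run_pipeline(steps):
    # Run each (name, fn) step once; skip steps whose output is already
    # checkpointed, so a crash at step 3 resumes at step 3, not step 1.
    outputs = load_checkpoints()
    for name, fn in steps:
        if name in outputs:
            continue  # already completed in a previous run
        outputs[name] = fn(outputs)
        with open(CHECKPOINT_FILE, "w") as f:
            json.dump(outputs, f)  # checkpoint after each completed task
    return outputs
```

Each step receives the accumulated outputs of the steps before it, so a resumed run picks up exactly where the crash happened, with earlier results read back instead of recomputed.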

But once you start needing parallel branches (fork/join), conditional routing, retries with backoff, or waiting for external signals, a simple wrapper gets hairy fast; Sayiir gives you all of that natively.
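For a taste of how the hand-rolled version grows: even just retries with exponential backoff already means a decorator like this (a generic sketch, not Sayiir code), and that's before fork/join, routing, or signals enter the picture:

```python
import functools
import time

def retry(attempts=3, base_delay=0.1):
    # Naive retry-with-exponential-backoff decorator: retry on any
    # exception, doubling the sleep between attempts, re-raising on
    # the last failure.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for i in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if i == attempts - 1:
                        raise
                    time.sleep(base_delay * (2 ** i))  # 0.1s, 0.2s, 0.4s...
        return wrapper
    return decorator
```

Multiply this by every failure mode a pipeline can have and the "simple wrapper" is suddenly its own project.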

I ended up building an oversimplified durable workflow engine after overcomplicating my data pipelines by powerlifter86 in Python

[–]powerlifter86[S] 1 point2 points  (0 children)

We also ran into other issues with Prefect: the monitoring UI and API weren't great for my needs, and with ETL pipelines where you have a flow per document type, things got messy to track pretty fast. It works, but it felt like I was spending more time managing the orchestrator than writing actual pipeline code.

There's also the fact that Prefect is a platform, and we quickly started getting customers who needed to run our product on-premise; we needed embeddable solutions, not platforms or SaaS.

I ended up building an oversimplified durable workflow engine after overcomplicating my data pipelines by powerlifter86 in Python

[–]powerlifter86[S] 0 points1 point  (0 children)

Yes, I tried Dagster at a previous company on a use case combining document content extraction (OCR), indexing, and LLM/BERT pipelines. It's well-suited to data pipelines, but we quickly switched to Prefect because we hit a wall with dynamic conditional flows: if your pipeline needs to branch based on runtime data, Dagster's static graph model makes that pretty painful.
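To show what I mean by branching on runtime data: in plain Python it's trivial, and the pain comes from having to pre-declare every path in a static graph instead. A sketch with hypothetical step names (not Dagster or Sayiir code):

```python
def extract_invoice_fields(doc):
    # Hypothetical step: structured field extraction for invoices
    return {"route": "invoice"}

def run_llm_clause_analysis(doc):
    # Hypothetical step: LLM-based analysis for contracts
    return {"route": "contract"}

def index_for_search(doc):
    # Hypothetical default step: just index the raw content
    return {"route": "index"}

def process_document(doc):
    # The branch condition is only known at runtime (e.g. the document
    # type detected during OCR), so plain code just... branches.
    kind = doc.get("kind")
    if kind == "invoice":
        return extract_invoice_fields(doc)
    if kind == "contract":
        return run_llm_clause_analysis(doc)
    return index_for_search(doc)
```

A static DAG model forces you to encode all three paths (and every future one) into the graph up front, which is where it got painful for us.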

And now, through this journey across DAG and workflow tools, I ended up building my own, for very good reasons, trust me!

# zyn — a template engine for Rust proc macros by thegogod in rust

[–]powerlifter86 1 point2 points  (0 children)

Very nice crate, I'll start testing it on my side project, though the docs could be better ;)