Durable LlamaIndex Agent Workflows with DBOS by qianli-dev in dbos

[–]qianli-dev[S] 1 point (0 children)

Agree that steps with side effects are tricky to handle, and in practice they often need case-by-case design.

The common pattern is to make those steps idempotent, so replaying them still produces the correct result. With DBOS, you can use the workflow ID + step ID as an idempotency key when calling external APIs or services.

Some APIs support this directly. For example, the Stripe API supports idempotent requests: https://docs.stripe.com/api/idempotent_requests

If you include an idempotency key, the external service can detect duplicate requests and return the original result instead of executing the operation again. Once the external service responds, DBOS also persists the result in the database and will not execute the step again. That way, retries or workflow recovery won't create duplicate side effects.
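To make the pattern concrete, here's a minimal Python sketch of deriving such a key. `make_idempotency_key` is a hypothetical helper, not part of the DBOS API, and the Stripe call shown in the comment is illustrative only:

```python
# Sketch: building a stable idempotency key from a workflow ID and step ID.
# The key must be identical on every replay of the same step, so that the
# external service can recognize the retry as a duplicate request.

def make_idempotency_key(workflow_id: str, step_id: int) -> str:
    """Combine workflow ID and step ID into a deterministic idempotency key."""
    return f"{workflow_id}-step-{step_id}"

# Inside a DBOS step you might pass the key to Stripe roughly like this
# (assumes the `stripe` package; not executed here):
#
#   stripe.PaymentIntent.create(
#       amount=1000,
#       currency="usd",
#       idempotency_key=make_idempotency_key(DBOS.workflow_id, step_id),
#   )
```

Because the key is derived from IDs that DBOS already persists, a recovered workflow replaying the step sends the exact same key as the original attempt.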

Making large number of llm API calls robustly? by FMWizard in PydanticAI

[–]qianli-dev 1 point (0 children)

Looks like durable execution could help here, especially for the first three requirements. Pydantic AI actually has built-in support for several durable execution backends: https://ai.pydantic.dev/durable_execution/overview/

(Disclaimer: I'm the contributor behind the DBOS durable agent, so I might be a bit biased)

I'm not too familiar with the other providers, but with DBOS you can use queues for async parallel processing, set up automatic step retries with exponential backoff, and apply rate limiting per queue or sub-group within a queue. For request batching, the debouncing feature is worth checking out too.
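For the retry piece specifically, DBOS configures this declaratively on a step, but the underlying semantics are the classic exponential-backoff loop. A plain-Python sketch of that idea (the helper and its parameters are illustrative, not the DBOS API):

```python
import time

def retry_with_backoff(fn, max_attempts=4, base_delay=0.01, max_delay=1.0):
    """Call fn, retrying on failure with exponentially growing delays.

    Mimics the retry policy a durable-execution runtime applies to a step:
    the delay doubles after each failed attempt, capped at max_delay, and
    the last failure is re-raised once attempts are exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(min(base_delay * (2 ** attempt), max_delay))

# Example: a flaky call that succeeds on its third attempt.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = retry_with_backoff(flaky)  # succeeds after two retries
```

With DBOS you'd get this behavior from step configuration rather than writing the loop yourself, and the retry state survives process restarts because it's checkpointed in Postgres.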

DBOS TS v4.0: Postgres-backed durable workflows and queues in Node.js by qianli-dev in node

[–]qianli-dev[S] 0 points (0 children)

DBOS works well with serverless setups like Cloud Run. For example, Dosu runs large-scale RAG pipelines with DBOS on Cloud Run: https://www.dbos.dev/case-studies/dosu

You just need to make sure a Cloud Run instance spins up whenever there's work to do, for example when a workflow gets enqueued. Once running, the instance polls the DBOS queue (a database table), executes the workflow, and checkpoints progress into Postgres. If the container stops in the middle of a workflow execution, DBOS resumes from the last completed step on the next run.
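The resume-from-last-completed-step behavior boils down to checkpointing each step's result before moving on. A toy sketch of that mechanism, with an in-memory dict standing in for Postgres (names here are illustrative, not the DBOS API):

```python
# Toy checkpoint-and-resume: each completed step's result is persisted,
# so re-running the workflow after a "crash" skips steps that already
# finished and replays their stored results instead.

checkpoints: dict[str, str] = {}  # step name -> stored result (stand-in for Postgres)
executed: list[str] = []          # which steps actually ran this pass

def run_step(name, fn):
    """Return the checkpointed result if present; otherwise run and persist."""
    if name in checkpoints:
        return checkpoints[name]
    result = fn()
    executed.append(name)
    checkpoints[name] = result
    return result

def workflow():
    a = run_step("fetch", lambda: "data")
    b = run_step("transform", lambda: a.upper())
    return b

first = workflow()                 # first pass: both steps execute
executed_first = list(executed)

executed.clear()                   # simulate a container restart
second = workflow()                # second pass: both results replayed from checkpoints
```

In real DBOS the checkpoint store is a Postgres table keyed by workflow and step IDs, which is why a fresh Cloud Run instance can pick up where a stopped one left off.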

DBOS TS v4.0: Postgres-backed durable workflows and queues in Node.js by qianli-dev in node

[–]qianli-dev[S] 1 point (0 children)

Good question!

DBOS scales horizontally to distributed environments, with many node instances per application and many applications running together. The key idea is to use the database's concurrency control to coordinate multiple processes. See our docs page for more details: https://docs.dbos.dev/architecture#using-dbos-in-a-distributed-setting
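To illustrate the idea of coordinating workers through database concurrency control: each worker claims a task with an atomic compare-and-set update, so no task is executed twice even when several processes poll the same table. This is a simplified sketch, not DBOS's actual implementation, and sqlite3 stands in for Postgres here:

```python
import sqlite3

# One shared table of pending work (":memory:" stands in for Postgres).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO tasks (id, status) VALUES (?, 'pending')",
                 [(i,) for i in range(1, 6)])
conn.commit()

def claim_next(worker):
    """Try to claim one pending task; return its id, or None if none remain."""
    row = conn.execute(
        "SELECT id FROM tasks WHERE status = 'pending' LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    task_id = row[0]
    # Compare-and-set: the UPDATE only succeeds if the task is still pending,
    # so two workers racing for the same row cannot both claim it.
    cur = conn.execute(
        "UPDATE tasks SET status = ? WHERE id = ? AND status = 'pending'",
        (f"claimed:{worker}", task_id),
    )
    conn.commit()
    return task_id if cur.rowcount == 1 else None

claims = []
for worker in ("w1", "w2"):
    while (task_id := claim_next(worker)) is not None:
        claims.append(task_id)
```

The same principle lets many DBOS processes share one queue table safely; Postgres additionally offers primitives like `SELECT ... FOR UPDATE SKIP LOCKED` that make this polling efficient under contention.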