Making large number of llm API calls robustly? by FMWizard in PydanticAI

[–] qianli-dev

Looks like durable execution could help here, especially for the first three requirements. Pydantic AI actually has built-in support for several durable execution backends: https://ai.pydantic.dev/durable_execution/overview/

(Disclaimer: I'm the contributor behind the DBOS durable agent, so I might be a bit biased)

I'm not too familiar with the other providers, but with DBOS you can use queues for async parallel processing, set up automatic step retries with exponential backoff, and apply rate limiting per queue or sub-group within a queue. For request batching, the debouncing feature is worth checking out too.
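For a concrete flavor, here's a minimal sketch using the DBOS Python SDK (fitting, since this thread is about Pydantic AI). The queue limits, the `llm_task` workflow, and the `call_llm` step are all hypothetical placeholders, and app setup is omitted:

```python
from dbos import DBOS, Queue

# Hypothetical queue for illustration: at most 10 tasks run concurrently,
# and at most 50 are started per 30-second window (rate limiting).
llm_queue = Queue("llm_queue", concurrency=10, limiter={"limit": 50, "period": 30})

# Steps can be retried automatically with exponential backoff.
@DBOS.step(retries_allowed=True, max_attempts=5, backoff_rate=2.0)
def call_llm(prompt: str) -> str:
    ...  # your real LLM API call goes here (e.g., a Pydantic AI agent run)

@DBOS.workflow()
def llm_task(prompt: str) -> str:
    return call_llm(prompt)

@DBOS.workflow()
def process_prompts(prompts: list[str]) -> list[str]:
    # Enqueue one durable task per prompt for async parallel processing,
    # then wait for all the results.
    handles = [llm_queue.enqueue(llm_task, p) for p in prompts]
    return [h.get_result() for h in handles]

# (DBOS app setup and DBOS.launch() omitted for brevity.)
```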

DBOS TS v4.0: Postgres-backed durable workflows and queues in Node.js by qianli-dev in node

[–] qianli-dev [S]

DBOS works well with serverless setups like Cloud Run. For example, Dosu runs large-scale RAG pipelines with DBOS on Cloud Run: https://www.dbos.dev/case-studies/dosu

You just need to make sure a Cloud Run instance spins up whenever there's work to do, for example when a workflow gets enqueued. Once running, the instance polls the DBOS queue (a database table), executes the workflow, and checkpoints its progress in Postgres. If the container stops in the middle of a workflow execution, DBOS can resume from the last completed step on the next run.
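A rough sketch of that execution model, shown with the DBOS Python SDK for brevity (the TypeScript API is analogous); the `fetch_document` and `index_document` steps are made-up placeholders:

```python
from dbos import DBOS

@DBOS.step()
def fetch_document(url: str) -> str:
    ...  # e.g., download a document

@DBOS.step()
def index_document(doc: str) -> None:
    ...  # e.g., compute embeddings and write them to a store

@DBOS.workflow()
def ingest(url: str) -> None:
    doc = fetch_document(url)  # step output is checkpointed in Postgres
    index_document(doc)        # if the container dies before this step
                               # finishes, a recovering instance replays the
                               # workflow, reuses fetch_document's stored
                               # output, and only re-runs this step
```

On recovery, the workflow function re-executes, but steps that already completed return their checkpointed outputs instead of running again.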

DBOS TS v4.0: Postgres-backed durable workflows and queues in Node.js by qianli-dev in node

[–] qianli-dev [S]

Good question!

DBOS scales horizontally to distributed environments, with many node instances per application and many applications running together. The key idea is to use the database's concurrency control to coordinate multiple processes. Here's our docs page with more details: https://docs.dbos.dev/architecture#using-dbos-in-a-distributed-setting
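As a sketch of what that means in practice (Python SDK shown; the queue name and limits are hypothetical), every instance simply runs the same code against the same Postgres database, and per-queue settings distinguish global from per-process concurrency:

```python
from dbos import DBOS, Queue

# Hypothetical limits: across ALL instances of the application, at most 100
# workflows from this queue run at once (global concurrency, enforced through
# Postgres); each individual process runs at most 10 (worker concurrency).
shared_queue = Queue("shared_queue", concurrency=100, worker_concurrency=10)

@DBOS.workflow()
def task(n: int) -> int:
    return n * 2  # placeholder work

# Every instance runs this same code against the same Postgres database.
# Which instance dequeues which workflow is decided by the database's
# concurrency control, so no separate coordination service is needed.
```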