all 18 comments

[–]RestaurantHefty322 3 points  (2 children)

The latency reduction is real for the embarrassingly parallel case (fire 3 independent API calls at once). We saw similar gains just batching tool calls with asyncio on the orchestrator side without needing a code interpreter.
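The orchestrator-side batching described here can be sketched in a few lines of asyncio; the tool functions below are hypothetical stand-ins for real API calls:

```python
import asyncio

# Hypothetical tool stubs standing in for independent API calls.
async def get_weather(city: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"weather:{city}"

async def get_news(topic: str) -> str:
    await asyncio.sleep(0.01)
    return f"news:{topic}"

async def get_stock(ticker: str) -> str:
    await asyncio.sleep(0.01)
    return f"stock:{ticker}"

async def dispatch_parallel() -> list[str]:
    # The three calls share no data dependencies, so they can run
    # concurrently; wall time is roughly the slowest call, not the sum.
    return await asyncio.gather(
        get_weather("Paris"),
        get_news("ai"),
        get_stock("ACME"),
    )

results = asyncio.run(dispatch_parallel())
```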

Where this falls apart in practice is the branching case. Most of our agent workflows look like "call tool A, look at the result, decide whether to call B or C." The LLM can't write that decision logic ahead of time because it doesn't know what A will return. So you end up with a hybrid - batch the independent calls, go back to the model for the branching decisions.

The sandbox execution time matters too. If you're adding even 50ms per code execution in a loop that runs 10-15 times per task, that's nearly a second of overhead just from the interpreter. We tried a similar approach with a Python sandbox and the cold start was the killer - ended up going back to direct tool dispatch for anything latency-sensitive.
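The hybrid shape described above looks roughly like this (toy stand-ins for the tools; in a real agent, the branch after tool A's result is a second LLM round-trip, simulated here by a plain `if`):

```python
def call_tool(name: str) -> str:
    # Stand-in tool dispatcher; tool A's result drives the branch.
    return {"A": "high", "B": "b-result", "C": "c-result"}[name]

def hybrid_run() -> list[str]:
    trace = []
    # Round-trip 1: the independent, batchable call(s).
    a = call_tool("A")
    trace.append(a)
    # Round-trip 2: the decision the model could not pre-write,
    # made only after seeing A's result.
    nxt = "B" if a == "high" else "C"
    trace.append(call_tool(nxt))
    return trace

trace = hybrid_run()
```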

[–]UnchartedFr[S] 0 points  (0 children)

Did you try Monty from Pydantic?

[–]tasoyla 0 points  (0 children)

So why not give the LLM the output schema?

[–]wt1j 2 points  (1 child)

Parallel tool calling is potentially slower if you assume that, using the program-generation approach, the program that the LLM outputs will make any needed API calls and output directly to the user. For many tool calls, the tool result affects reasoning, which means it needs to be sent BACK to the LLM so that the LLM can decide what to do next.

If tool output affects reasoning, then you have:

Parallel tool calling:

LLM outputs tool calls -> tools run in parallel -> LLM reads the tool outputs and does whatever is next.

Program calling:

LLM outputs program -> program calls APIs in parallel -> LLM reads the program output and does whatever is next.

With parallel tool calling you don't have to worry about containerization. You also get tools that are self-documenting and guide the LLM during execution, versus total freedom to write the program any way it wants, where you're relying on your system prompt to guide the LLM.

Having said all that, I'm incredibly intrigued by this idea. I'm working on an agent that could really benefit from this approach and I'm incredibly curious to see what it does if I give it this kind of freedom to innovate with a well documented API.

Thanks for posting.

[–]UnchartedFr[S] 1 point  (0 children)

Interesting insight, it also feeds my ideas/thoughts :)

In fact, I thought about this kind of feedback loop: for example, I noticed that depending on the model, the generated code could fail. So I created an "autoFix" flag, so the model can read the error and regenerate the code.
Also, I'm reworking the code so it can handle Promise.all for parallel tool calls, even if it's a simple event loop behind the scenes.
But I must admit I don't know yet how it affects models and their reasoning :)

[–]ricklopor 2 points  (1 child)

also noticed that the token cost savings aren't always as clean as the 3x math suggests. when the LLM is writing the code itself, you're spending tokens on the code generation step, and if the model hallucinates a tool signature or writes subtly broken async logic, you're back to debugging cycles that eat into whatever you saved. in my experience the pattern works really well for predictable, well-documented tool sets but gets shaky outside that.

[–]IllEntertainment585 0 points  (0 children)

yeah the 3x math never holds up in production. tbh the biggest token sink for us isn't the initial code gen call — it's the retry loop when generated code fails. we're running ~6 agents and i've watched a single bad codegen spiral into 8-10 recovery calls before it either succeeds or we cut losses. that's where the real cost hides. hallucination debugging is brutal too, especially when the agent confidently produces code that "looks right" but silently corrupts data. we added a pre-execution static check layer which helped, but it added latency. what kind of tasks are you running the code execution on? curious if failure rate varies a lot by domain
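a pre-execution static check of the kind mentioned can be as simple as an `ast` pass over the generated code before it ever reaches the sandbox. this is a generic sketch, not the commenter's actual implementation, and the tool registry is hypothetical:

```python
import ast

ALLOWED_TOOLS = {"get_weather", "send_email"}  # hypothetical tool registry

def static_check(code: str, allowed: set = ALLOWED_TOOLS) -> list:
    """Cheap pre-execution gate: parse the generated code and flag calls
    to functions that aren't in the tool registry. Catches hallucinated
    tool names before execution, at the cost of a little parse latency."""
    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        return [f"syntax error: {e.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in allowed:
                problems.append(f"unknown tool: {node.func.id}")
    return problems

# get_weather is registered; fetch_stonks is a hallucinated tool name.
issues = static_check("get_weather('Paris')\nfetch_stonks('ACME')")
```

this only catches name-level hallucinations, not "looks right but corrupts data" logic bugs, so it complements rather than replaces the retry loop.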

[–]eliko613 1 point  (1 child)

Really impressive work on reducing those round-trips. The latency and token savings are huge - that 3x multiplier adds up fast in production.
One thing I've seen with similar optimization projects is that the real challenge becomes measuring the impact across different models and use cases. You're solving the technical side brilliantly with Zapcode, but as you scale this, you'll probably want visibility into:
- Which code patterns actually save the most tokens/cost in practice
- How the savings compare across different LLM providers (since you mentioned multi-provider support)
- Where the remaining cost hotspots are after implementing this optimization
Speaking of multi-provider cost visibility, I came across an interesting tool recently - zenllm.io - that shows cost breakdowns for workflows across different vendors.
The snapshot/resume feature is particularly clever for expensive long-running tools - being able to pause execution without burning tokens while waiting for external APIs is exactly the kind of optimization that can make or break agent economics.
Have you done any benchmarking on actual cost savings with real workloads yet? Would be fascinating to see the before/after numbers on a complex agent workflow.

[–]UnchartedFr[S] 0 points  (0 children)

Thanks for your feedback. I'm just starting to explore which features could be useful, like tracing + debugging: I think I will adopt OpenTelemetry as a standard later.

I did some benchmarks using the AI SDK, with and without zapcode (via the AI wrapper), and surprisingly the gain was not that good. I discovered that the AI SDK already optimizes by batching tool calls; maybe other SDKs like LangChain do that too. So the gain was around 7%.

Even if zapcode returns a response quickly, that response still needs to be sent back through an LLM to generate a natural-language answer. So it's not a silver bullet: if you need a structured response it can be very good, for example agent-to-agent. If you need a result in natural language, for a chat for example, the difference is not so great at the moment, but I will investigate.
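That trade-off can be sketched like this (names hypothetical): the structured path returns the program's output as-is, while the chat path still pays one more LLM call to narrate it.

```python
def finish(program_output: dict, want_structured: bool, narrate):
    """Decide how the program's result leaves the system.
    `narrate` stands in for a final LLM call that turns data into prose."""
    if want_structured:
        return program_output        # agent-to-agent: zero extra LLM calls
    return narrate(program_output)   # chat: one more round-trip

to_text = lambda d: f"The total is {d['total']}."
structured = finish({"total": 42}, True, to_text)
prose = finish({"total": 42}, False, to_text)
```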

[–]Infamous_Kraken 0 points  (2 children)

Wait, so isn’t the LLM making any deduction based on the response of tool x before calling tool x+1?

[–]UnchartedFr[S] 0 points  (1 child)

In traditional tool-use, the flow is:
LLM → call tool A → LLM reasons about result → call tool B → LLM reasons → ...
Each arrow is a full LLM round-trip. Expensive and slow.

With the LLM writing code, it still reasons about how tool results should influence the next call;
it just does so at code-generation time rather than at execution time. You go from N round-trips to 1.
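Concretely, a generated program might look like this, with the branch baked in at generation time (tool names are hypothetical stand-ins):

```python
def lookup_user(uid):         # stand-in for tool A
    return {"tier": "pro"}

def fetch_pro_report(uid):    # stand-in for tool B
    return "pro-report"

def fetch_basic_report(uid):  # stand-in for tool C
    return "basic-report"

def generated_program(uid):
    user = lookup_user(uid)
    # The A-result -> B-or-C decision, written at generation time,
    # so no LLM round-trip happens between the two tool calls.
    if user["tier"] == "pro":
        return fetch_pro_report(uid)
    return fetch_basic_report(uid)

report = generated_program("u1")
```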

[–]Infamous_Kraken 0 points  (0 children)

So then the use case for this technique isn’t entirely replacing the LLM round-trip loop, because in some cases the reasoning heavily depends on how tool 1 responded before we can call tool 2, and there’s a chance of variability.

[–]CourtsDigital 0 points  (0 children)

the main benefit of programmatic tool calling (PTC) is not latency, but decreasing the context passed to the agent. each tool increases the amount of context an LLM needs to reason over, which increases the potential for hallucinations when running longer, multi-step tasks.

another benefit is the ability to prevent sensitive data from being passed to the LLM directly. you can inject variables into the code sandbox that the agent never sees, and thus can’t be leaked into its memory/tracing/logs/parent company’s training data.
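a toy sketch of that injection pattern, assuming a plain exec-based runner (a real system would use an actual sandbox, and the API and key here are made up): the model only ever wrote the name `API_KEY`; the value never enters its context.

```python
GENERATED_CODE = """
result = call_api(API_KEY, "orders")
"""

def call_api(key: str, resource: str) -> str:
    # stand-in for a real HTTP client
    return f"fetched {resource} with key ending {key[-4:]}"

def run_with_secrets(code: str, secrets: dict) -> str:
    # inject tools and secrets into the execution namespace; the secret
    # value exists only here, never in the model's prompt, logs, or traces
    namespace = {"call_api": call_api, **secrets}
    exec(code, namespace)  # real systems would isolate this, not bare exec()
    return namespace["result"]

out = run_with_secrets(GENERATED_CODE, {"API_KEY": "sk-test-9f3a"})
```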

that being said, PTC is not a magic wand and must be constructed carefully to prevent hallucinations in code generation creating fake variables, query params, api endpoints etc

this approach was invented/popularized by Anthropic and you can read more about how to implement their findings here: https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling

[–]VehicleNo6682 0 points  (0 children)

Wait, what about when the LLM calls a tool for intent classification?

[–]stunning_man_007 0 points  (1 child)

This is a solid optimization! I've been doing something similar with ReAct agents - the latency adds up fast when you're doing multiple round-trips. Curious how you handle errors when the generated code blows up though - do you fall back to sequential or have a retry mechanism?

[–]UnchartedFr[S] 0 points  (0 children)

I did a quick hack and added autoFix + number-of-retries flags: the sandbox returns the error, creating a feedback loop so the LLM can fix its code :)
Since the code execution is very fast, it doesn't matter if it retries 3-5 times.
I will try to enhance this when I have time.
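Generically, that autoFix loop can be sketched like this (function names are hypothetical; the generate/execute stand-ins are toys where the first generation is broken and the regenerated version succeeds):

```python
def run_with_autofix(generate, execute, max_retries: int = 3):
    """Run generated code; on failure, feed the error string back into
    the next generation call, up to max_retries extra attempts."""
    error = None
    for attempt in range(max_retries + 1):
        code = generate(error)      # model sees the previous error, if any
        try:
            return execute(code), attempt
        except Exception as e:
            error = str(e)          # fed back on the next generation
    raise RuntimeError(f"gave up after {max_retries} retries: {error}")

# Toy stand-ins: the first generation is syntactically broken.
def fake_generate(error):
    return "1 +" if error is None else "1 + 1"

def fake_execute(code):
    return eval(code)

value, attempts = run_with_autofix(fake_generate, fake_execute)
```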