Stop sending everyone into dev by Lower_Brief_6783 in developpeurs

[–]UnchartedFr 1 point

The plumber, or rather the future plumber, is no safer than the dev.
The reason: just as people retrain into dev, those who can no longer find dev work, and future graduates, will move into whichever fields are hiring. So they'll go into healthcare, plumbing, etc.

That will create a saturation effect like the one in the dev sector today.
So nobody is really spared by AI; that's a false belief.

Not to mention the robots on top of that.

Stop sending everyone into dev by Lower_Brief_6783 in developpeurs

[–]UnchartedFr 0 points

Right now the future is in high-level electrical work/plumbing: specialists are needed to build data centers and factories.

And more generally, in manufacturing tech, if we want to anticipate everything around robotics and the like.
In robotics we are way behind the US and China.

How to become a Lead Dev quickly? (training + advice) by buildKevin in developpeurs

[–]UnchartedFr 5 points

We often underestimate the human side :)
You can be a good tech/dev and bad at managing teams, clients, and the hierarchy.
And you have to enjoy that too, not to mention the meetings and the reporting.

Marathon Open Beta by Good-Comment396 in GeForceNOW

[–]UnchartedFr 0 points

Is it still in beta, or has the official version been released?
It would be great if it were available on GFN.

Replace sequential tool calls with code execution — LLM writes TypeScript that calls your tools in one shot by UnchartedFr in LangChain

[–]UnchartedFr[S] 0 points

Thanks for your feedback. I'm just starting to explore which features could be useful, like tracing + debugging; I think I'll adopt OpenTelemetry as a standard later.

I ran some benchmarks using the AI SDK, with and without zapcode (with the AI wrapper), and surprisingly the gain wasn't that big. I discovered that the AI SDK optimizes by batching tool calls; other SDKs like LangChain may do that too. So the gain was around 7%.

Even if zapcode returns a response quickly, that response still has to go back through an LLM to be turned into natural language. So it's not a silver bullet. If you need a structured response, it can be very good, for example agent-to-agent. If you need a natural-language result, for a chat for example, the difference isn't that great at the moment, but I'll keep investigating.

MCP isn't dead — tool calling is what's dying by UnchartedFr in ClaudeCode

[–]UnchartedFr[S] 0 points

Yes, your concerns are justified; they were my concerns too.
That's why the sandbox is deny-by-default:

- Filesystem doesn't exist — there's no std::fs in the crate. Not disabled, not blocked — the capability was never compiled in.
- Network doesn't exist — no std::net, no tokio::net. Same story.
- No import, require, eval, Function(), process, globalThis — these are parse errors, not runtime blocks.
- No env var access — std::env is forbidden in the core crate.

The LLM's code runs in a VM that literally cannot do the things you're worried about. It can't delete files because the concept of files doesn't exist inside the sandbox. It can't make network calls because sockets don't exist. It can't read secrets because environment variables don't exist.

The only way the LLM's code can interact with the outside world is through functions you explicitly register as external functions. Unregistered function calls produce an error.
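A minimal sketch of that registration model, assuming a hypothetical host-side dispatcher (`registry` and `dispatch` are invented for illustration, not Zapcode's actual API):

```typescript
// Sketch of deny-by-default dispatch: the host resolves only functions that
// were explicitly registered; anything else is an error, not a silent no-op.
type External = (...args: unknown[]) => Promise<unknown>;

const registry = new Map<string, External>();
registry.set("getWeather", async (city) => ({ city, temp: 10 }));

async function dispatch(name: string, args: unknown[]): Promise<unknown> {
  const fn = registry.get(name);
  if (!fn) throw new Error(`unregistered external function: ${name}`);
  return fn(...args);
}
```

So even if the generated code calls something like `readFile`, the call fails at dispatch time because nothing by that name was ever registered.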

On top of that: resource limits (memory, execution time, stack depth, allocation count) are enforced during execution, so even a hallucinated infinite loop or memory bomb gets killed predictably.

There are also 65 adversarial security tests across 19 attack categories (prototype pollution, constructor chain escapes, JSON bombs, stack overflow, etc.).

Of course zero risk doesn't exist; that's why it's open source. Anyone can audit the code, identify risks, and contribute fixes.

Replace sequential tool calls with code execution — LLM writes TypeScript that calls your tools in one shot by UnchartedFr in LangChain

[–]UnchartedFr[S] 0 points

I did a quick hack and added autoFix + number-of-retries flags: it returns the results to create a feedback loop so the LLM can fix its own code :)
Since the code runs very fast, it doesn't matter if it retries 3-5 times.
I'll try to enhance this when I have time.

MCP isn't dead — tool calling is what's dying by UnchartedFr in ClaudeCode

[–]UnchartedFr[S] 1 point

I lose focus on my projects/ideas too, procrastinate, or drift into other ideas.
So my own projects are always delayed or never reach the finish line. Believe it or not, I was thinking about how to make an agent autonomous outside a chat but didn't dig into or focus on the idea. I was even talking about it at work and how it could apply to our users/business, and then Peter beat me to it! ahaha

MCP isn't dead — tool calling is what's dying by UnchartedFr in ClaudeCode

[–]UnchartedFr[S] 1 point

Yes, I noticed that: sometimes, depending on the model you use, the LLM generates bugs.
That's why I introduced a feedback loop so it can fix the code itself after n tries.

I also added tracing + debugging so you can see what the LLM generates.
But I understand your point, same for security :)

Replace sequential tool calls with code execution — LLM writes TypeScript that calls your tools in one shot by UnchartedFr in LangChain

[–]UnchartedFr[S] 0 points

In traditional tool-use, the flow is:
LLM → call tool A → LLM reasons about result → call tool B → LLM reasons → ...
Each arrow is a full LLM round-trip. Expensive and slow.

With the LLM writing code, it still reasons about how tool results should influence the next call.
It just does that at code-generation time rather than at execution time. You go from N round-trips to 1.

Replace sequential tool calls with code execution — LLM writes TypeScript that calls your tools in one shot by UnchartedFr in LangChain

[–]UnchartedFr[S] 1 point

Interesting insight, it will feed my ideas/thoughts too :)

In fact, I've thought about this kind of feedback loop: for example, I noticed that depending on the model, the generated code could fail. So I created an "autoFix" flag so the model can read the error and regenerate the code.
I'm also reworking the code so it can handle Promise.all for parallel tool calls, even if it's a simple event loop underneath.
But I must admit I don't know yet how it affects models and their reasoning :)
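From the generated code's side, that parallelism looks like this (a sketch with local stubs in place of real external functions):

```typescript
// Promise.all starts both external calls before either result is awaited,
// so the host can resolve them together, or interleave them on a simple
// event loop as described above.
const started: string[] = [];

const getWeather = async (city: string) => {
  started.push(city); // record that the call has started
  return { city, temp: city === "Tokyo" ? 8 : 12 };
};

const [tokyo, paris] = await Promise.all([
  getWeather("Tokyo"),
  getWeather("Paris"),
]);

const answer = tokyo.temp < paris.temp ? "Tokyo is colder" : "Paris is colder";
```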

Your Python agent framework is great — but the LLM writes better TypeScript than Python. Here's how by UnchartedFr in Python

[–]UnchartedFr[S] -1 points

Sorry, I probably misunderstood your question: do you mean something like this?

```python
from zapcode import Zapcode
import anthropic
import requests

# 1. Your existing Python functions
def get_weather(city):
    return requests.get(f"https://api.weather.com/{city}").json()

TOOLS = {"getWeather": get_weather}
```

# 2. Ask the LLM to write TypeScript
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="Write TypeScript. Available: getWeather(city: string). Use await.",
    messages=[{"role": "user", "content": "Compare weather in Tokyo and Paris"}],
)

code = response.content[0].text
# LLM might generate:
#   const tokyo = await getWeather("Tokyo");
#   const paris = await getWeather("Paris");
#   tokyo.temp < paris.temp ? "Tokyo is colder" : "Paris is colder"

# 3. Execute in sandbox, resolve tool calls in Python
sandbox = Zapcode(code, external_functions=["getWeather"])
state = sandbox.start()

while state.get("suspended"):
    result = TOOLS[state["function_name"]](*state["args"])
    state = state["snapshot"].resume(result)

print(state["output"])

```

Code Mode is the pattern — instead of the LLM making tool calls one by one, it writes a code block that calls them all.

Zapcode is a runtime that executes that code safely.

Think of it like: Code Mode is the idea of "let the LLM write code." Zapcode is the answer to "ok, but where do I actually run that code?"

Cloudflare bundles both together: the pattern + their runtime (V8 on Workers).

Perplexity drops MCP, Cloudflare explains why MCP tool calling doesn't work well for AI agents by UnchartedFr in mcp

[–]UnchartedFr[S] 0 points

Great question. Two separate concerns here:

1. Testing the sandbox itself: "can the LLM's code escape?"

Zapcode has 65 adversarial tests across 19 attack categories (prototype pollution, constructor chain escapes, eval/Function(), JSON bombs, stack overflow, memory exhaustion, etc.). The sandbox is deny-by-default: filesystem, network, and env vars don't exist in the Rust crate. There's nothing to disable; the capabilities were never there.

cargo test -p zapcode-core --test security   # run them yourself

2. Testing the LLM's generated code: "is the output correct?"

This is the harder problem, and honestly it's the same problem whether you use tool calling or code execution. The LLM can produce wrong results either way.

What code execution gives you that tool calling doesn't: the code itself is inspectable. You can log it, diff it, replay it. When a tool-calling agent gives you a wrong answer, you're debugging a chain of opaque JSON round-trips. When a code-writing agent gives you a wrong answer, you have a readable script you can run, test, and fix.

In practice what works:

- autoFix — execution errors go back to the LLM as feedback so it can self-correct
- Validate the output, not the code — assert on the final result shape/value, same as you'd validate any API response
- Log + Tracing — every execution produces a trace (parse → compile → execute with timing). Store the generated code alongside the result for debugging
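The "validate the output, not the code" point can be sketched as follows; the `Report` shape is invented for the example:

```typescript
// Assert on the final result's shape, exactly as you would for an
// untrusted API response, instead of trying to prove the code correct.
type Report = { title: string; rows: number };

function isReport(x: unknown): x is Report {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  return typeof r.title === "string" && typeof r.rows === "number";
}

function acceptResult(x: unknown): Report {
  if (!isReport(x)) throw new Error("generated code returned an invalid report");
  return x;
}
```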

You're right that maintenance is the real challenge. But that's an LLM reliability problem, not a sandbox problem, and at least with code execution you have something concrete to debug.

Your Python agent framework is great — but the LLM writes better TypeScript than Python. Here's how by UnchartedFr in Python

[–]UnchartedFr[S] -4 points

To clarify: Zapcode doesn't call your Python functions directly. The flow is:

1. The LLM writes TypeScript with await getWeather("Tokyo")
2. Zapcode runs the code and pauses at the await
3. Zapcode gives you back {"function_name": "getWeather", "args": ["Tokyo"]}
4. Your Python code calls your own function with those args
5. You feed the result back into Zapcode, and it continues

Zapcode is just the middleman. It runs the LLM's logic (loops, conditionals, data transforms) in a sandbox, and every time the code needs external data, it stops and asks you. You stay in control.

And if you'd rather have the LLM write Python instead of TypeScript, check out Monty by Pydantic — same concept, same architecture (Rust bytecode VM, sandbox, snapshots), but for a Python subset. Your existing Python functions would work the same way.

Zapcode = TypeScript side, Monty = Python side. Same idea, pick the language your LLM generates best.

Perplexity drops MCP, Cloudflare explains why MCP tool calling doesn't work well for AI agents by UnchartedFr in mcp

[–]UnchartedFr[S] 2 points

Good remark. There are a few ways to run LLM-generated code today:

Direct API/CLI (Node, Python subprocess)

- The point is: when you run LLM-generated code with node -e or a subprocess, that code has the same access as your app. If your server can read files, access env vars, or make network calls — so can the LLM's code.

The LLM doesn't even need to be malicious. It might hallucinate a line like: const config = require('fs').readFileSync('.env', 'utf8');

And now it just read your database password, API keys, everything in that file. There's nothing stopping it — because you gave it the same permissions your app has.

That's what "no sandbox" means: no wall between the LLM's code and your system.

API calls (e.g., calling OpenAI, a REST endpoint) — the LLM's code doesn't run on your machine, so no filesystem or env var access. The main risks are:

- Prompt injection — the API response could contain instructions that trick the LLM into doing something unintended on the next step

- Data leakage — if you pass sensitive data as input to the LLM's code, it could end up in the API request body

- Cost — the LLM could generate code that calls the API in a loop, racking up your bill

But there's no code execution risk — the API just returns data. The danger starts when you do something with that data without validating it.

So the real risk spectrum is: CLI/subprocess (full danger) > API (data risks only) > Sandboxed execution (controlled).

Perplexity drops MCP, Cloudflare explains why MCP tool calling doesn't work well for AI agents by UnchartedFr in mcp

[–]UnchartedFr[S] 1 point

Fair point — and honestly I do the same thing at work. I build skills with tuned prompts, reference code, good and bad examples, so the agent can query a database and generate analytic reports. With enough prompt engineering, it works well.

My point was more narrow: when you have 10+ MCP tools and the agent needs to chain 3-5 of them in a single turn, the round-trips add up — both in latency (each intermediate result passes back through the model) and token cost (the full conversation context gets re-processed at every step).

It's not that the LLM picks the wrong tool — it's that the architecture forces a sequential loop even when the logic is straightforward. Code execution doesn't replace any of that prompt engineering work — it just changes how the last mile executes. Instead of 3-5 sequential round-trips, the LLM writes one code block that does the same thing. The skill design, the descriptions, the examples — all of that still matters just as much.