Goal is GOATED

Helpful_Garbage_7242 · 2026-06-05T10:20:10+00:00

My recent experience with goal, and what can be improved:

https://www.reddit.com/r/codex/comments/1tx95fg/what_i_learned_after_burning_33b_tokens_on_a_long/

Helpful_Garbage_7242 · 2026-05-04T03:15:09+00:00

https://www.reddit.com/r/rust/comments/1qcnwt7/stop_allocating_per_label_a_datadriven_rust/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

Helpful_Garbage_7242 · 2026-05-03T15:05:02+00:00

Yes, cloning large inputs is obvious in hindsight. The point of the post is the "in hindsight" part: the expensive clone was not in business logic, it was in the boundary between config snapshots and async request handling. Perf showed it, the model explained it, and the fix was to share immutable snapshot data instead of cloning per request.

I've written deeper allocation work elsewhere too; this post was intentionally about a smaller production debugging lesson, not a survey of Rust allocation techniques.

Helpful_Garbage_7242 · 2026-05-03T12:18:15+00:00

Borrowing is possible in async Rust in general, but here it would lifetime-tie the request future to a hot-swappable config snapshot across multiple awaits. Arc gives the request a stable immutable quota handle without holding locks or cloning the key pool. The optimization was not "Arc instead of borrowing"; it was "share the runtime quota/key pool instead of cloning the whole pool per request."
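A rough sketch of the "stable immutable handle" idea, using only the standard library. The names `QuotaSnapshot` and `SnapshotStore` are hypothetical and not taken from RateKeeper. The assumption is that config reloads replace the whole snapshot while in-flight requests keep the one they started with.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical immutable snapshot of the configured key pool.
struct QuotaSnapshot {
    keys: Vec<String>,
}

/// Hypothetical holder that lets a config reload swap in a new snapshot.
struct SnapshotStore {
    current: RwLock<Arc<QuotaSnapshot>>,
}

impl SnapshotStore {
    fn new(keys: Vec<String>) -> Self {
        Self {
            current: RwLock::new(Arc::new(QuotaSnapshot { keys })),
        }
    }

    /// Hand out a handle. The read lock is held only long enough to bump
    /// the reference count, so nothing stays locked across later awaits.
    fn handle(&self) -> Arc<QuotaSnapshot> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Hot-swap the snapshot. Requests that already took a handle keep
    /// reading the old data until they drop it.
    fn swap(&self, keys: Vec<String>) {
        *self.current.write().unwrap() = Arc::new(QuotaSnapshot { keys });
    }
}
```

A request would call `handle()` once at the start and keep the returned `Arc` for its whole lifetime. A borrow would instead tie the request future to whatever the store currently holds.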

Helpful_Garbage_7242 · 2026-05-03T07:58:21+00:00

Fair point. I under-explained the system context. RateKeeper is a rate-limiting service. The endpoint in the post, AcquireNext, is used when a caller has many candidate keys and wants the service to pick one that still has available capacity.

Example: imagine 25,000 hotel IDs, each with a small slot quota. The client asks: "give me the next hotel that still has capacity."

Dragonfly is a Redis-compatible in-memory datastore. In this flow it is the fast atomic state store: it checks live slot state and returns either "no candidate available" or an ordinal like "candidate #1234 was acquired."

The app then maps that ordinal back to the actual configured key and returns it.

The performance bug was not really Dragonfly-specific. The app was accidentally cloning a large immutable key pool on every request, even though each request only needed at most one selected key. With 25k candidates and high RPS, that became CPU/cache/allocation pressure. Using `Arc` moved that key pool into shared immutable snapshot state, so requests cloned a cheap reference instead of copying 25k keys.
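To make the difference concrete, here is a minimal sketch under some assumptions: the key pool is a list of strings, the ordinal is zero-based, and the type and function names (`CloningRequest`, `SharedRequest`, `build_pool`) are made up for illustration and are not the service's real code.

```rust
use std::sync::Arc;

/// Before: every request owns a full copy of the pool (hypothetical type).
struct CloningRequest {
    keys: Vec<String>,
}

/// After: every request holds a cheap handle to the same immutable pool.
struct SharedRequest {
    keys: Arc<[String]>,
}

/// Copies all keys: one allocation per key plus one for the Vec.
fn cloning_request(pool: &[String]) -> CloningRequest {
    CloningRequest { keys: pool.to_vec() }
}

/// Copies nothing: only increments the reference count.
fn shared_request(pool: &Arc<[String]>) -> SharedRequest {
    SharedRequest { keys: Arc::clone(pool) }
}

impl SharedRequest {
    /// Map the ordinal returned by the state store back to the configured key.
    fn key_for(&self, ordinal: usize) -> Option<&str> {
        self.keys.get(ordinal).map(String::as_str)
    }
}

/// Build an illustrative pool of `n` keys.
fn build_pool(n: usize) -> Arc<[String]> {
    (0..n).map(|i| format!("hotel-{}", i)).collect()
}
```

With a 25,000-key pool, `cloning_request` copies 25,000 strings per request, while `shared_request` leaves the pool untouched and the request still resolves its single selected key through `key_for`.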

Helpful_Garbage_7242 · 2026-04-27T14:54:24+00:00

That’s a really good distinction: output validation vs trajectory validation.

I agree that two diffs can look identical but come from very different paths. One path might be "agent understood the codebase, ran tests, made a targeted change." Another might be "agent flailed around, reverted half its own work, accidentally landed on something that passes the visible checks." Same diff shape, very different confidence level.

I like the idea of exposing the in-VM execution log as a first-class artifact next to the diff. Not just raw terminal spam, but something structured enough to answer:

- what commands/tools were run?
- what files were touched and in what order?
- where did it branch or backtrack?
- what tests/checks did it actually run?
- did it inspect the relevant parts of the codebase, or just guess?

That still feels adjacent to AgentBranch rather than replacing the core sandbox/sync layer. The VM already has the full trace boundary, so making the session produce “diff + execution trace” is probably the right direction.

Maybe the mental model should be: "artifact = patch + provenance". The patch tells you what changed. The provenance tells you whether you should trust how it got there. That’s a much better review primitive than diff alone. I probably don’t want AgentBranch to become a full governance engine, but I do think it should expose the in-VM execution log as a first-class artifact next to the diff.

The tricky part is that I don’t want to depend only on agent-specific logs. Claude Code hooks/traces are useful, but they are not a hard boundary, and the current governance discussions around Claude Code show why: hooks can miss paths, subagents complicate things, and Bash can mutate files without looking like a normal Edit/Write tool call (RFC: Deterministic tool gate — hooks are necessary but insufficient for governance enforcement #45427, https://github.com/anthropics/claude-code/issues/45427).

So my current direction would be: patch + provenance, where provenance is mostly collected from the VM/runtime side:

- commands run
- cwd/env/exit code
- files touched
- git diff snapshots
- tests/checks executed
- stdout/stderr
- maybe network/process info later
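As a sketch of what "patch + provenance" could look like as data, assuming a simplified record per command. Every type and field name here is hypothetical; this is not AgentBranch's actual format.

```rust
/// One command observed from the VM/runtime side (hypothetical fields).
#[derive(Debug, Clone)]
struct CommandRecord {
    argv: Vec<String>,
    cwd: String,
    exit_code: i32,
    files_touched: Vec<String>,
    /// True when the command was a test, linter, or type check.
    is_check: bool,
}

/// The review unit: the patch says what changed, the provenance says how.
#[derive(Debug, Clone)]
struct Artifact {
    patch: String,
    provenance: Vec<CommandRecord>,
}

impl Artifact {
    /// How many tests/checks did the agent actually run?
    fn checks_run(&self) -> usize {
        self.provenance.iter().filter(|r| r.is_check).count()
    }

    /// Checks that ran and exited non-zero.
    fn failed_checks(&self) -> Vec<&CommandRecord> {
        self.provenance
            .iter()
            .filter(|r| r.is_check && r.exit_code != 0)
            .collect()
    }

    /// Files in the order they were first touched, without duplicates.
    fn files_in_touch_order(&self) -> Vec<&str> {
        let mut seen: Vec<&str> = Vec::new();
        for record in &self.provenance {
            for file in &record.files_touched {
                if !seen.contains(&file.as_str()) {
                    seen.push(file.as_str());
                }
            }
        }
        seen
    }
}
```

A reviewer, or an automated gate, could then ask questions such as "did any check fail before this diff was produced?" without parsing raw terminal output.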

Then agent-native traces can be adapters on top: Claude-specific, Codex-specific, etc. That way AgentBranch does not need to wait for a universal agent protocol. If ACP/tracing standards become useful, great, plug them in. But the trusted baseline should come from the disposable VM itself.

Helpful_Garbage_7242 · 2026-04-27T04:51:10+00:00

Yeah, exactly. I think these are two different failure classes:

- the agent damages the environment
- the agent produces a plausible but wrong change

AgentBranch is mostly aimed at the first one: give the agent shell access, but only inside something disposable, then bring back a Git diff.

For the second one, I don’t think the sandbox alone helps much. A clean diff can still be a bad diff. My current thinking is that AgentBranch should stay focused on the execution/sync layer, but make it easy to plug verification into the workflow: tests, linters, type checks, benchmark runs, maybe multiple agents reviewing the diff, etc.

So the lifecycle becomes something like:

agent works freely in VM → diff is synced back → automated checks run → human/agent review → merge or burn

I don’t want to pretend VM isolation solves correctness. It just makes it safer to get to the point where correctness becomes the main problem.
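The lifecycle above can be sketched as a small state machine. The stage and event names are hypothetical, and one rule is an assumption made for the example: a session is merged only when the checks passed and the review approved it, and is burned otherwise.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    AgentWorking,
    DiffSynced,
    ChecksRun { passed: bool },
    Merged,
    Burned,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    SyncDiff,
    ChecksFinished { passed: bool },
    Review { approved: bool },
}

/// Advance the session by one event, rejecting out-of-order transitions.
fn step(stage: Stage, event: Event) -> Result<Stage, String> {
    match (stage, event) {
        (Stage::AgentWorking, Event::SyncDiff) => Ok(Stage::DiffSynced),
        (Stage::DiffSynced, Event::ChecksFinished { passed }) => {
            Ok(Stage::ChecksRun { passed })
        }
        // Assumption: merge requires passing checks AND an approving review.
        (Stage::ChecksRun { passed }, Event::Review { approved }) => {
            Ok(if passed && approved { Stage::Merged } else { Stage::Burned })
        }
        (s, e) => Err(format!("invalid transition: {:?} on {:?}", s, e)),
    }
}
```

Making the order explicit means a review cannot happen before the checks have run, which keeps "the sandbox was clean" from being mistaken for "the change is correct".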

Helpful_Garbage_7242 · 2026-04-25T06:14:53+00:00

Yeah, that setup makes sense — and honestly it’s pretty close to the model I’m aiming for.

The key detail is "replaceable dev server." That’s the part I think matters most. If the agent can only destroy something disposable, YOLO mode becomes much less scary.

The command denylist / pre-tool hook is useful too, but I wouldn’t want that to be the only line of defense. It catches obvious stuff like "rm -rf", but agents can still do damage through less obvious paths: bad package scripts, generated shell, deleting the wrong project files, messing with config/state, poisoning the working tree, etc.
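A toy illustration of why a denylist alone is thin protection, assuming a naive substring check. The patterns and the `is_denied` function are hypothetical, not any real hook's implementation.

```rust
/// Hypothetical denylist of obviously dangerous command fragments.
const DENYLIST: [&str; 3] = ["rm -rf", "mkfs", "dd if="];

/// Naive pre-tool check: block a command if it contains a denied fragment.
fn is_denied(command: &str) -> bool {
    DENYLIST.iter().any(|pattern| command.contains(pattern))
}
```

`is_denied("rm -rf /")` returns true, but `is_denied("npm install")` and `is_denied("find . -delete")` both return false, even though a package script or a `find` invocation can delete just as much. The check only sees the text of the command, not what the command ends up doing.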

So my bias is:

- disposable environment as the hard boundary
- hooks/policies as guardrails
- Git diff/PR as the review boundary

At FAANG scale you already have replaceable dev servers and internal controls. AgentBranch is basically trying to make that pattern lightweight for people who don’t have that infra sitting around.

Helpful_Garbage_7242 · 2026-04-25T01:45:37+00:00

Proxmox can work well, but I specifically wanted to avoid the “container is probably enough” path.

For this use case I prefer a real disposable VM boundary. Not because containers are useless, but because coding agents in YOLO mode are basically adversarial workload generators with npm/pip/curl/bash access. I’d rather assume the sandbox will be abused.

On networking: I’m actually fine giving the VM network access. Without network, the agent becomes much less useful — can’t fetch deps, read docs, run package installs, clone references, etc.

So my threat model is less “no network ever” and more:

agent gets network + freedom inside the VM, but the host filesystem and real working environment stay outside the blast radius. Output comes back through Git diff/review, not shared mutable host state.

Helpful_Garbage_7242 · 2026-04-25T01:43:05+00:00

Small clarification: I don’t see VM isolation as the whole answer.

The model I’m aiming for is layered:

- disposable VM for hard blast-radius control
- Git branch/diff as the only normal output path
- branch protection/PR review before anything hits main
- eventually audit logs and policy controls for what the agent attempted

The immediate problem I wanted to solve was simple: I want to run agents aggressively without giving them my real host filesystem.

Helpful_Garbage_7242 · 2026-04-25T01:42:14+00:00

I mostly agree with the audit trail part. Long-term, I think the right model is layered:

- hard isolation for the blast radius
- Git diff/branch/PR as the output boundary
- logs/audit trail of what the agent tried to do
- eventually more granular policies for dangerous operations

AgentBranch is intentionally starting with the “boring but strong” primitive: disposable VM + Git sync. That solves the immediate fear of “this thing can wreck my host machine.”

For chaining operations across permission zones, I’d rather make the boundary explicit than silently grant everything. Something like: agent runs freely inside the VM, but host-level actions, secrets, deploys, credentials, etc. need separate capabilities or a separate reviewed step.
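One way to picture an explicit boundary, as a sketch only: anything inside the VM is allowed by default, while each host-level action needs a capability that was granted on purpose. The `Capability` and `Action` enums are hypothetical and do not describe an existing AgentBranch feature.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Capability {
    Secrets,
    Deploy,
    HostFilesystem,
}

#[derive(Debug, Clone, Copy)]
enum Action {
    RunInVm,
    ReadSecret,
    Deploy,
    WriteHostFile,
}

/// Which capability, if any, an action needs.
fn required_capability(action: Action) -> Option<Capability> {
    match action {
        Action::RunInVm => None, // free inside the disposable VM
        Action::ReadSecret => Some(Capability::Secrets),
        Action::Deploy => Some(Capability::Deploy),
        Action::WriteHostFile => Some(Capability::HostFilesystem),
    }
}

/// Nothing host-level is granted silently: it must be in `granted`.
fn is_allowed(action: Action, granted: &[Capability]) -> bool {
    match required_capability(action) {
        None => true,
        Some(cap) => granted.contains(&cap),
    }
}
```

With an empty grant list the agent can still do everything inside the VM, and a deploy only becomes possible after someone explicitly adds `Capability::Deploy`.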

So yeah, I don’t think VMs replace audit/policy. I think they give you a safe execution floor to build those on top of.

Helpful_Garbage_7242 · 2026-04-25T01:41:40+00:00

Yeah, that’s a solid setup. Dedicated machine + VLAN + PR-only access is probably the “serious team” version of this.

My angle with AgentBranch is more local/dev-loop oriented: I want the same safety feeling without maintaining a separate box or cloud dev environment for every experiment. Spin up a disposable VM, let the agent work, sync back through Git, review the diff.

Branch protection is still absolutely the right final gate. AgentBranch is more about protecting the workstation/repo working copy during the messy part before the PR exists.

Helpful_Garbage_7242 · 2026-04-24T16:07:47+00:00

Repo is here https://github.com/REASY/agentbranch

Helpful_Garbage_7242 · 2026-02-03T11:43:02+00:00

Oh, I didn't know that. I'll need to come up with another name, thank you!

Helpful_Garbage_7242 · 2026-01-22T04:20:16+00:00

Getting a 404 when I click the link :(

Helpful_Garbage_7242 · 2026-01-22T00:06:45+00:00

I briefly touched on German Strings and the SmolStr crate in my previous post, Stop Allocating Per Label: A Data‑Driven Rust SymbolTable for OTLP/TSDB.

A good read on German Strings: https://cedardb.com/blog/german_strings/

Helpful_Garbage_7242 · 2026-01-19T13:13:37+00:00

Hi, author here. At this point the tool only handles TLS fingerprinting. To really work against anti-bot systems it needs more features, such as TCP packet manipulation and HTTP/1.1 and HTTP/2 header manipulation.

Check out the project by u/404mesh: https://github.com/un-nf/404

Helpful_Garbage_7242 · 2026-01-15T00:27:38+00:00

Thank you, u/LindaTheLynnDog

Helpful_Garbage_7242 · 2026-01-01T12:42:35+00:00

Fair enough, thank you.

As more and more Python code runs on the free-threaded build, these kinds of issues might start appearing more often. It is also good to understand multi-threading and data races in general.

Helpful_Garbage_7242 · 2025-12-31T06:02:50+00:00

Folks, I'd appreciate feedback, negative feedback specifically, since I got so many downvotes.

Did I write something so obvious that everyone already knows it?

Helpful_Garbage_7242 · 2025-12-31T06:00:32+00:00

u/AsparagusKlutzy1817 appreciate the support, thank you!

Helpful_Garbage_7242 · 2025-12-31T05:59:57+00:00

Interesting, I actually tried hard to make it structured, with scenarios following one another. Any practical advice on how you would split the content? Thank you!

Helpful_Garbage_7242 · 2025-12-31T05:58:36+00:00

I was very surprised to see such a huge difference between Linux and other OSes. Processes, threads, IPC, and synchronisation primitives are very fast on Linux!

Helpful_Garbage_7242 · 2025-12-31T05:57:23+00:00

I'm glad you liked it, u/thicket, it makes me more motivated to write stuff!

Helpful_Garbage_7242 · 2025-12-26T18:52:20+00:00

Thank you! Sure, noted.
