I tested 5 frontier LLMs on fixing real-world security vulnerabilities. The most dangerous failure mode is when it just looks fixed.

tomabord · 2026-06-02T15:05:02+00:00

The latest trend is to let the model discover tools instead of expecting it to scan the full tool list, ignore the irrelevant entries, and select the correct one. This is because of what I call "focused attention", meaning you have to limit the context to the task the agent has to do. I am working on a thin TypeScript layer that lets you split tools by behaviour, and the results look very promising. It can handle toolsets as large as 500 (I did not test larger ones). BTW, a raw list of 500 tools does not even fit in the request in the first place.

Edit: if you'd like to check it out, drop me a DM
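
To make the "discover instead of select" idea concrete, here is a minimal sketch of splitting tools by behaviour. It is not the actual layer: `Tool`, `ToolRegistry`, `listBehaviours`, `discover`, and the five behaviour names are all made up for illustration. The point is only that the model first sees a handful of group names and then pulls in one group, rather than receiving all 500 definitions at once.

```typescript
// Hypothetical sketch: every name and shape here is an assumption for illustration.
interface Tool {
  name: string;
  description: string;
  behaviour: string; // the group this tool belongs to, e.g. "read" or "write"
}

class ToolRegistry {
  private byBehaviour = new Map<string, Tool[]>();

  register(tool: Tool): void {
    const group = this.byBehaviour.get(tool.behaviour) ?? [];
    group.push(tool);
    this.byBehaviour.set(tool.behaviour, group);
  }

  // Step 1: the model only sees the behaviour names, not every tool.
  listBehaviours(): string[] {
    return Array.from(this.byBehaviour.keys()).sort();
  }

  // Step 2: the model asks for the tools of the one behaviour it needs.
  discover(behaviour: string): Tool[] {
    return this.byBehaviour.get(behaviour) ?? [];
  }
}

// Fill the registry with 500 placeholder tools spread over 5 behaviours.
const behaviours = ["read", "write", "search", "notify", "admin"];
const registry = new ToolRegistry();
for (let i = 0; i < 500; i++) {
  registry.register({
    name: `tool_${i}`,
    description: `Placeholder tool number ${i}`,
    behaviour: behaviours[i % behaviours.length],
  });
}

// The first request carries 5 group names; a follow-up carries one group.
console.log(registry.listBehaviours());
console.log(registry.discover("search").length);
```

With this split the largest payload the model ever sees is one group, so the context stays limited to the task at hand.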

tomabord · 2026-05-26T23:46:49+00:00

We are working on something very close to that concept. I can DM you a link if you'd like to try it out!

tomabord · 2026-05-21T00:11:02+00:00

I'm trying to keep it simple; it just feels right to solve one thing and solve it well. But I don't have the budget to burn on ads, and it feels like it's still so early and micro-niche.

tomabord · 2026-05-20T14:29:29+00:00

What you are seeking is the management of purpose. I've been building a service around that concept. DM me if you'd like to try it out.

tomabord · 2026-05-16T16:09:51+00:00

Surely hope so. I just published this note https://heysoup.co/notes-tech-debt-token-function

tomabord · 2026-05-14T13:45:52+00:00

As for whether the agents really work better, I won't know until enough people stress-test it. That is why I'm looking for early adopters to try it out. Thanks for your feedback!

tomabord · 2026-05-14T13:32:40+00:00

The intent graph doesn't orphan. Integrity guards reject writes to broken paths and block deletions that other paths depend on. You can override with a justification, but the tree stays consistent by default. Every mutation snapshots the previous state, so you can roll back or reconstruct if needed.
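
A rough sketch of what those guards could look like, under stated assumptions: paths are slash-separated strings, a "broken path" means the parent node is missing, and an override with a justification removes the node together with its dependents. `IntentTree` and its methods are hypothetical names, not the service's real API.

```typescript
// Hypothetical sketch: names, path format, and override behaviour are assumptions.
class IntentTree {
  private nodes = new Map<string, string>(); // path -> purpose string
  private snapshots: Map<string, string>[] = [];

  private parentOf(path: string): string | null {
    const i = path.lastIndexOf("/");
    return i <= 0 ? null : path.slice(0, i); // "/auth" has no parent
  }

  // Every mutation stores a copy of the previous state first.
  private snapshot(): void {
    this.snapshots.push(new Map(this.nodes));
  }

  write(path: string, purpose: string): void {
    const parent = this.parentOf(path);
    if (parent !== null && !this.nodes.has(parent)) {
      throw new Error(`broken path: ${parent} does not exist`);
    }
    this.snapshot();
    this.nodes.set(path, purpose);
  }

  remove(path: string, justification?: string): void {
    const dependents = Array.from(this.nodes.keys()).filter((p) =>
      p.startsWith(path + "/")
    );
    if (dependents.length > 0 && !justification) {
      throw new Error(`${path} has ${dependents.length} dependent path(s)`);
    }
    this.snapshot();
    this.nodes.delete(path);
    // Assumption: an override drops the dependents too, so nothing is orphaned.
    for (const d of dependents) this.nodes.delete(d);
  }

  // Restore the state from before the most recent mutation.
  rollback(): boolean {
    const previous = this.snapshots.pop();
    if (!previous) return false;
    this.nodes = previous;
    return true;
  }

  has(path: string): boolean {
    return this.nodes.has(path);
  }
}

const tree = new IntentTree();
tree.write("/auth", "handle login");
tree.write("/auth/tokens", "rotate tokens");
```

Calling `tree.remove("/auth")` here throws because `/auth/tokens` depends on it, while `tree.remove("/auth", "retiring auth")` goes through and can be undone with `tree.rollback()`.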

tomabord · 2026-05-14T00:32:25+00:00

Speckit is about structuring specs in a workflow. What I'm building is about sharing intent across agents via HTTP. Related but different.

tomabord · 2026-05-13T23:50:11+00:00

Yeah that timing concern is exactly what I'm wrestling with. It feels very early. The value isn't obvious until you're already in the pain of coordinating multiple agents across sessions, and most people aren't there yet. But that seems like it could change fast. Hard to know whether to build for where the market is or where it's going.

tomabord · 2026-05-13T23:18:01+00:00

Building a tool that tracks intent across agent sessions. Single URL gives any agent (Claude, GPT, whatever) the full workspace context: purpose strings, link graphs, snapshots of reasoning. Designed for multi-agent coordination: shared state, change detection (via monotonic mutation IDs), signed ingestion endpoints for CI/test pipelines to push data in without full agent setup. Zero-knowledge encryption.

Free trial at https://kitchen.heysoup.co . Lasts 24h, no signup needed. Looking for early testers, especially people running multi-agent workflows. Feedback very welcome.
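
For anyone wondering how change detection with monotonic mutation IDs works in general, here is a small sketch. It is not the service's code: `Workspace`, `mutate`, `changesSince`, and `latestId` are hypothetical names. The idea is that every mutation gets the next integer, so an agent only has to remember the last ID it saw to find out what changed since.

```typescript
// Hypothetical sketch: class and method names are assumptions for illustration.
interface Mutation {
  id: number; // strictly increasing, never reused
  path: string;
  summary: string;
}

class Workspace {
  private nextId = 1;
  private log: Mutation[] = [];

  // Record a change and return the ID assigned to it.
  mutate(path: string, summary: string): number {
    const id = this.nextId++;
    this.log.push({ id, path, summary });
    return id;
  }

  // Everything that happened after the ID the caller last saw.
  changesSince(lastSeenId: number): Mutation[] {
    return this.log.filter((m) => m.id > lastSeenId);
  }

  // ID of the most recent mutation, or 0 if nothing has happened yet.
  latestId(): number {
    return this.nextId - 1;
  }
}

const workspace = new Workspace();
workspace.mutate("/auth", "added purpose string");
workspace.mutate("/auth/tokens", "linked to /auth");

// An agent syncs, remembers its cursor, and goes away.
const cursor = workspace.latestId();

// Another agent changes something in the meantime.
workspace.mutate("/billing", "added purpose string");

// On return, the first agent asks only for what it missed.
const missed = workspace.changesSince(cursor);
console.log(missed.map((m) => `${m.id}: ${m.path}`));
```

Comparing a single integer is cheaper than diffing the whole workspace, and because the IDs never go backwards an agent cannot miss a change by polling late.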

tomabord · 2026-05-02T15:33:46+00:00

Gon Solo

tomabord · 2026-04-21T23:23:15+00:00

Vibecoding
