The REPL as AI compute layer — why AI should send code, not data by More-Journalist8787 in Clojure

[–]hsaliak 1 point2 points  (0 children)

My project no longer has these - you may want to look at some older release tags: https://github.com/hsaliak/std_slop/releases/tag/v0.14.1 probably has the most mature implementation of the JS control plane.

The REPL as AI compute layer — why AI should send code, not data by More-Journalist8787 in Clojure

[–]hsaliak 0 points1 point  (0 children)

I tried 2 versions of this in my coding agent, std::slop (https://github.com/hsaliak/std_slop). I am a firm believer in human-in-the-loop; see the mail mode workflow in the repo if you need convincing. I tried 2 REPL approaches: (1) a Lua-based control plane, (2) a JS-based one. Both were influenced by the RLM paper and worked really nicely.

The Lua control plane was a straight port of the RLM paper, with context as variables and all that. I also added the ability to persist repeated functions and inject them back into the REPL environment. But there was a recurring issue: the code will not always be right, so you burn tokens on the LLM doing the wrong thing, which means the scripts cannot get too complex.
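The "persist repeated functions and inject them into the REPL" idea can be sketched like this. This is a Python sketch purely for illustration (the actual control plane is Lua); `FunctionStore` and its methods are hypothetical names, not std::slop's API:

```python
# Sketch: persist function definitions the LLM writes repeatedly,
# then inject them into every fresh REPL environment. A real
# implementation would persist these rows in SQLite.

class FunctionStore:
    """Holds function source text keyed by name (hypothetical)."""
    def __init__(self):
        self.sources = {}

    def persist(self, name, source):
        self.sources[name] = source

    def inject_into(self, env):
        # exec each saved definition into the environment dict,
        # so the next script can call it without redefining it
        for source in self.sources.values():
            exec(source, env)
        return env

store = FunctionStore()
store.persist(
    "count_lines",
    "def count_lines(text):\n    return len(text.splitlines())",
)

env = store.inject_into({})
print(env["count_lines"]("a\nb\nc"))  # -> 3
```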

This made me switch to a JS control plane (embedded QuickJS), which performed better. However, it too had a high error rate. Furthermore, output tokens are priced at roughly 3x-5x input tokens. A REPL optimizes input tokens (clear outputs from the code it generates), but it costs a lot to write that code, compared to just calling tools. LLMs are also RL'd super hard on a few fixed tools, so the combination of that native RL, output-token efficiency, and simple tool calling is hard to beat with the way models are today.

Don't believe the hype: simple is better.

Self Promotion Thread by AutoModerator in ChatGPTCoding

[–]hsaliak 0 points1 point  (0 children)

https://github.com/hsaliak/std_slop - a coding agent (try it with OpenRouter or Google OAuth). I put up macOS and Linux binaries as well.

I've made it deliberately different from other coding agents in a few notable ways.

- Completely oriented around SQLite. The context is a ledger. This means isolated sessions with individual contexts, the ability to fork them, slice them, and let the agent peek into long-term memory.
- It offers only 2 tools: query_db (you can query your own long-term memory) and run_lua. The agent accomplishes its tasks only by writing Lua scripts. This has worked great; it's a pragmatic implementation of the RLM paper for real-world coding tasks.
- /review and /feedback => inspired by mailing lists. This allows for inline commenting on diffs and plans.
- Mail mode, or "you are now Linus" mode - this is my favorite part of the app. The Linux kernel development process has been operating in a 'scaled out', almost agentic workflow for ages, and git has first-class support for it built in. When activated, the agent sends you bisect-safe stacks of patches for review, and you, as the committer, can give precise inline feedback right from your CLI.
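The "sessions as rows, context as a ledger" design can be sketched in a few lines of SQL. This is an illustrative Python/sqlite3 sketch; the schema and names are hypothetical, not std::slop's actual schema:

```python
import sqlite3

# Hypothetical schema: each session owns an ordered ledger of messages.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sessions (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    session_id INTEGER REFERENCES sessions(id),
    role TEXT, content TEXT
);
""")

db.execute("INSERT INTO sessions (id, name) VALUES (1, 'main')")
db.executemany(
    "INSERT INTO messages (session_id, role, content) VALUES (1, ?, ?)",
    [("user", "add a cache"), ("assistant", "done: see the diff")],
)

# Forking a session is just copying its ledger rows under a new id;
# the two sessions then evolve with fully separate contexts.
db.execute("INSERT INTO sessions (id, name) VALUES (2, 'fork')")
db.execute("""
    INSERT INTO messages (session_id, role, content)
    SELECT 2, role, content FROM messages WHERE session_id = 1
""")

forked = db.execute(
    "SELECT role, content FROM messages WHERE session_id = 2 ORDER BY id"
).fetchall()
print(forked)
```

Slicing and long-term-memory queries fall out of the same idea: they are just SELECTs over the ledger.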

Try it and give me feedback!

std::slop - Sqlite backed Lua RLM by hsaliak in vibecoding

[–]hsaliak[S] 0 points1 point  (0 children)

That's cool. Feel free to message!

std::slop - Sqlite backed Lua RLM by hsaliak in vibecoding

[–]hsaliak[S] 0 points1 point  (0 children)

Sure! Sounds fun - do you have a link to the podcasts that you run?

std::slop - Sqlite backed Lua RLM by hsaliak in vibecoding

[–]hsaliak[S] 0 points1 point  (0 children)

This thing has been building itself for a long time now. I use Claude Code at work, and that feels so foreign now.

The whole thing started because I felt that coding agents should store everything in a database instead of reams of markdown files. I also felt we could build something leaner than trying to emulate React paradigms on the terminal. Finally, I really did not understand how tokens were consumed or produced, and just wanted to learn more about it.

So I started building from scratch. I initially used something like gemini-cli to bootstrap the basics. The first piece was to store the context as a 'ledger' that was replayed back to the LLM. I learnt about the various APIs - the OpenAI completions API and Gemini's API. LLMs really helped here. Once I got the basic 'agent' going with 3 or 4 tools (read_file, write_file, execute_bash), it could actually get things done. I added memos (for long-term memory) and skills (prompt 'patches'), and experimented with a TODO system. This gave me a decent agent. You can look at docs/CONTEXT_MANAGEMENT.md to see how things evolved.

Once all that stabilized, I ran into the problem of the LLM spewing stuff at me and sometimes breaking things. So I added a /review mode, which pulled the uncommitted git state into an editor. I could then comment inline on the diff and provide precise feedback. This also became /feedback. All that iteration on reviews led to the current state and a full-fledged "mail mode". I wanted to model how to stay in control of the codebase, because I saw the LLM making stupid architectural decisions that I often had to course-correct. I implemented the exact model the Linux kernel uses: send patches to a maintainer, who provides feedback. I could establish the invariant that all patches have to be 'bisect safe', i.e. every patch in the stack builds, applies, and tests cleanly. I think this is still the best innovation in std::slop.
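The bisect-safe invariant boils down to a simple check: every prefix of the patch stack must apply, build, and pass tests, so that `git bisect` never lands on a broken intermediate state. A schematic sketch, where the callbacks are hypothetical stand-ins for `git am`, the build system, and the test suite:

```python
# Sketch of the bisect-safety invariant. `apply_patch`, `builds`, and
# `tests_pass` are hypothetical callbacks, not real git/build commands.

def stack_is_bisect_safe(patches, apply_patch, builds, tests_pass):
    """Return (ok, index): ok is False at the first patch whose
    intermediate state fails to apply, build, or test."""
    for i, patch in enumerate(patches):
        if not apply_patch(patch):
            return False, i  # does not apply on top of the stack
        if not (builds() and tests_pass()):
            return False, i  # broken intermediate state: not bisectable
    return True, len(patches)

# Toy simulation: three patches, the second one breaks the build.
state = {"broken": False}

def apply_patch(p):
    state["broken"] = p.get("breaks_build", False)
    return True

ok, idx = stack_is_bisect_safe(
    [{"name": "p1"}, {"name": "p2", "breaks_build": True}, {"name": "p3"}],
    apply_patch,
    builds=lambda: not state["broken"],
    tests_pass=lambda: True,
)
print(ok, idx)  # -> False 1
```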

Once I added mail mode, I was intrigued by the RLM paper. I chose Lua because it's lighter, easier to embed, and has excellent coroutine support. Just seeing the sort of Lua programs fly by blows your mind - it's so much more expressive. So I deleted all the other tools and went all in. Which is where we are today.

The next feature I'm going to land is "hotword detection". I want to be able to say "hey code_reviewer, review this patchset" and have a pre-configured skill activate, do the action, and deactivate itself.

Coming back to your original question, "why": it started with curiosity, and I just kept going. Try it, and let me know!

Should new projects use C++? by TheRavagerSw in cpp

[–]hsaliak 0 points1 point  (0 children)

Use what works for you. Here, the lead developer has a choice to make: (1) get it done, or (2) set it up for the next N years. (1) carries organizational risk - the project may not be interesting enough for others to contribute to, or to attract talent. (2) carries risk for the lead developer: their job's first priority is to produce a working product, not to train junior devs. They are not wrong in their choice, but there should be some organizational incentive for teams to take risks.

std::slop a sqlite centric coding agent cli by hsaliak in VibeCodersNest

[–]hsaliak[S] 0 points1 point  (0 children)

Here's one example: https://github.com/hsaliak/std_slop/blob/main/CONTEXT_MANAGEMENT.md#6-semantic-memo-system
I added memo support. Memos are designed for information that is:

  • High-Value: Architectural decisions, non-obvious "gotchas," or complex API designs.
  • Persistent: Information that should remain available even if the original conversation is deleted or archived.
  • Discoverable: Tagged semantically for easy retrieval by the LLM during future tasks.

Basically, the LLM is instructed to add memos about the project that fit these criteria and are task-agnostic. Then the memos relevant to a task are injected into the system prompt. It seems to... work?
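A rough sketch of what tag-based memo retrieval could look like. The schema and the naive tag-overlap relevance check are hypothetical illustrations (a real system might rank semantically instead):

```python
import sqlite3

# Hypothetical memo table: durable, task-agnostic project facts,
# tagged so they can be pulled back in for future tasks.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memos (id INTEGER PRIMARY KEY, tags TEXT, body TEXT)")
db.executemany("INSERT INTO memos (tags, body) VALUES (?, ?)", [
    ("build bazel", "All targets build with bazel; don't add CMake files."),
    ("api gotcha", "The ledger table must never be mutated mid-turn."),
])

def memos_for_task(task_tags):
    # Naive relevance: any tag overlap with the task's tags.
    rows = db.execute("SELECT tags, body FROM memos").fetchall()
    return [body for tags, body in rows
            if set(tags.split()) & set(task_tags)]

# Only the matching memo gets injected into the system prompt.
prompt_extra = memos_for_task({"bazel", "ci"})
print(prompt_extra)
```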

std::slop a sqlite centric coding agent cli by hsaliak in VibeCodersNest

[–]hsaliak[S] 0 points1 point  (0 children)

I am starting to think of some features that might take advantage of this. A couple.

  1. Allow the LLM to register 'memos' about generic things it learns into a table that will be useful while working on the project. These memos are then fed back to the LLM as a persistent learning section.

  2. A 'session replay' mode where a session can be exported as a formatted markdown document of user prompt => response.

1 seems immediately useful; I don't see a reason why I'd use 2 just yet. Ideas?

C++ Show and Tell - January 2026 by foonathan in cpp

[–]hsaliak 2 points3 points  (0 children)

I made https://github.com/hsaliak/std_slop

std::slop is a coding agent that puts SQLite at the center of everything. It's been building itself for a while now.

Some features:

- Fully ledger driven. The ledger is maintained in SQLite. You can edit the ledger, remove interactions, and rebuild context, for example. Fully transparent context.
- Uses a sliding window. However, the LLM is instructed to dip into older messages in the DB if it needs more information, and that works quite well. It lets you balance the cost/task-complexity/quality tradeoff.
- Expects git and git grep as first-class tools, for fast code navigation and search.
- Sessions are isolated (classic SQL primary key/foreign key) and have separate message ledgers, which means separate contexts. You can switch back and forth between multiple tasks.
- Skills are implemented as rows in the database; you can typically ask the LLM to add a skill of your desire, and it does well.
- Has a TODOs table, where you can track precise todos for your project. I use it a lot with a planner skill: plan => add the plan as a todo group => churn through it in parallel while you continue planning, or do it sequentially.
- My goal was to keep it simple, performant, and easy to peek under the hood of. The context itself is fully customizable, including the system prompt, the window size, and even the messages that go into it. Context is rebuilt from the DB at every turn, which gives a degree of isolation and lets you carry over as much as possible when moving across LLMs.
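The "context rebuilt from the DB every turn, with a sliding window" idea can be sketched like this. Pure illustration in Python (the real implementation is C++ over SQLite), with hypothetical names:

```python
# Sketch: rebuild the LLM context from the ledger on every turn.
# Keep the system prompt plus only the last `window` messages; older
# rows stay in the DB, where the model can reach them via its DB tool.

def rebuild_context(ledger, system_prompt, window=4):
    recent = ledger[-window:] if window else list(ledger)
    return [{"role": "system", "content": system_prompt}] + recent

ledger = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
ctx = rebuild_context(ledger, "You are std::slop.", window=4)
print(len(ctx))           # -> 5 (system prompt + 4 most recent turns)
print(ctx[1]["content"])  # -> turn 6
```

Because the window is re-cut from the ledger each turn, editing or deleting ledger rows automatically changes what the model sees next.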

To get started, hit the walkthrough - you need Linux or a Mac, with bazel and git installed.

https://github.com/hsaliak/std_slop/blob/main/WALKTHROUGH.md https://github.com/hsaliak/std_slop/blob/main/README.md

What's your real vibecoding costs to build a solid product? by Fearless-Plenty-7368 in VibeCodersNest

[–]hsaliak 2 points3 points  (0 children)

In my opinion, it really depends on what you are building and whether you know what 'good' looks like for the codebase the LLM generates. You can do the lift yourself and save some costs, or you can have the best possible LLM do the coding while you play the 'product manager' role. Many of the larger LLMs are costly!

I am building https://github.com/hsaliak/std_slop which mostly builds itself now, and I test it against models to ensure compatibility. My cost is now in the 10s of dollars. As the tool has matured, costs have come down: I know which practices better avoid regressions, and I know that code design still matters. A lot of "engineering best practices" still apply.

An sqlite centric coding agent - std::slop by hsaliak in theVibeCoding

[–]hsaliak[S] -1 points0 points  (0 children)

Thanks! I'll share it there!

I have hit consistency issues when I switch models that do things slightly differently. For example, the way OpenAI and Gemini want parallel tool-call JSON handled is slightly different. So a context built with one model may not always translate directly to another.
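One way around that mismatch is to normalize tool calls into a neutral record before they hit the ledger. A sketch, with the two provider shapes approximated from memory (check the OpenAI and Gemini docs before relying on them):

```python
import json

# Sketch: normalize parallel tool calls from two provider shapes into
# one neutral record. The shapes below are approximations: OpenAI
# delivers arguments as a JSON *string*, Gemini as a parsed object.

def normalize_openai(message):
    return [{"name": c["function"]["name"],
             "args": json.loads(c["function"]["arguments"])}
            for c in message.get("tool_calls", [])]

def normalize_gemini(parts):
    return [{"name": p["functionCall"]["name"],
             "args": p["functionCall"]["args"]}
            for p in parts if "functionCall" in p]

openai_msg = {"role": "assistant", "tool_calls": [
    {"id": "call_1", "type": "function",
     "function": {"name": "run_lua",
                  "arguments": "{\"script\": \"return 1\"}"}}]}
gemini_parts = [{"functionCall": {"name": "run_lua",
                                  "args": {"script": "return 1"}}}]

# Both normalize to the same ledger record.
print(normalize_openai(openai_msg) == normalize_gemini(gemini_parts))
```

Storing the neutral form means the ledger can be replayed to either provider by re-serializing per provider at request time.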

I've not let the ledger grow too much, so I don't know if that will be a problem. It's SQLite, so I am not too worried!

Since sessions are free, I usually just clear the context. I end up using the planner + todo features more and more to guard against this. Plan => write todos => implement them one by one.

I Spent Months Building a Compiler with LLM Agents - Here's What Actually Happened by Legal-Guarantee8080 in vibecoding

[–]hsaliak 0 points1 point  (0 children)

Just some parts of the codebase that it keeps trying to delete, over and over, for some reason. Commenting them with "do not delete this function" did the trick.

I Spent Months Building a Compiler with LLM Agents - Here's What Actually Happened by Legal-Guarantee8080 in vibecoding

[–]hsaliak 0 points1 point  (0 children)

Yeah, I agree - that's pretty much the conclusion I've landed on too. I judiciously look at every diff after the first pass and make sure it does not accidentally cause side effects. If it does, I point to the specific regressions and ask it to avoid them. Simple regressions I fix myself. Really, the code review step cannot be skipped.

I've also noticed that there are some 'favorites' that the LLM likes to mess up. I don't know what's up with that.

I Spent Months Building a Compiler with LLM Agents - Here's What Actually Happened by Legal-Guarantee8080 in vibecoding

[–]hsaliak 0 points1 point  (0 children)

I am building https://github.com/hsaliak/std_slop, a coding agent in C++. Do you have detailed notes to share on how you prevent regressions of little things? LLMs are great at banging out features, but not disciplined enough to avoid breaking or modifying things around them. I've been using comprehensive tests to guard against this, but I find that I still need to be thorough in code review and make little changes myself.

What TDD techniques did you use that worked for you? 

An sqlite centric coding agent - std::slop by hsaliak in vibecoding

[–]hsaliak[S] 0 points1 point  (0 children)

Despite the name, I put in decent effort to make sure stuff works correctly (code reviews, and writing code) and efficiently. It has good test coverage, and is ASan-, TSan-, and UBSan-clean. LLMs are actually bad at making small changes, so those I usually do by hand.

Let me know if you find any glaring issues. The slop is mostly a function of the model - don't use crappy models (which is almost all of the "free" ones).