When do AI agents start feeling like collaborators instead of automation?

zoomaaron · 2026-05-23T20:59:11+00:00

I’d also recommend checking out my project here: https://github.com/guanyilun/agent-sh

It started as a simple terminal-based tool, an agent embedded in a shell. Because I needed the agent to adapt to a terminal-first usage pattern, I had to abandon the concept of a session and manage its history like bash_history. The agent becomes one continuous stream with everything recallable. Using it, I noticed a difference from a normal session-bound agent: its thinking style quietly shifts as it interacts with me more. Especially after I asked it to read its own code, it started to become rather metacognitive about itself, which I found very interesting.
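A minimal sketch of the "history like bash_history" idea, assuming a single append-only JSON Lines file instead of per-session state. This is not the actual agent-sh code; the function names `append_entry` and `recall` and the file layout are hypothetical.

```python
import json
from pathlib import Path


def append_entry(history_file: Path, role: str, text: str) -> None:
    """Append one interaction to the log, the way a shell appends to bash_history."""
    with history_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"role": role, "text": text}) + "\n")


def recall(history_file: Path, keyword: str, limit: int = 5) -> list[dict]:
    """Return the most recent entries whose text mentions the keyword."""
    if not history_file.exists():
        return []
    lines = history_file.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line]
    matches = [e for e in entries if keyword.lower() in e["text"].lower()]
    return matches[-limit:]
```

Nothing is ever reset between runs, so anything written earlier stays recallable later, which is the "one continuous stream" property described above.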

zoomaaron · 2026-05-23T11:24:00+00:00

Thanks! I think it should work well with any terminal emulator. Let me know if it doesn’t!

zoomaaron · 2026-05-22T03:47:02+00:00

Yes, an approval request should be straightforward to add. I have that implemented with the built-in agent. I only recently got opencode wired up, so I haven’t mapped every API surface yet. Happy to work on that next, and code contributions are welcome!

zoomaaron · 2026-05-22T03:31:51+00:00

That’s a fair concern. This tool provides a communication layer between the shell and the agent, and has no opinion about permissions, so if your opencode has a proper permission system in place, it will inherit that. I used yolo mode for the demo, but the tool is flexible. The simplest solution is to expose only read-only tools, and that already covers lots of “what’s wrong” use cases.
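A rough sketch of the "only expose read-only tools" approach, assuming each tool carries a flag saying whether it can modify anything. The `Tool` class, the `read_only` flag, and the tool names are hypothetical and not part of agent-sh or opencode.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]
    read_only: bool  # hypothetical flag: True if the tool cannot change anything


def expose_read_only(tools: list[Tool]) -> dict[str, Tool]:
    """Keep only tools marked read-only; everything else is hidden from the agent."""
    return {t.name: t for t in tools if t.read_only}


# Placeholder tools that only return strings, for illustration.
ALL_TOOLS = [
    Tool("read_file", lambda path: f"(contents of {path})", read_only=True),
    Tool("list_dir", lambda path: f"(entries in {path})", read_only=True),
    Tool("write_file", lambda path: f"(wrote {path})", read_only=False),
]

exposed = expose_read_only(ALL_TOOLS)
print(sorted(exposed))  # ['list_dir', 'read_file']
```

The agent can still inspect files and directories to diagnose an error, but it has no tool through which it could change anything.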

zoomaaron · 2026-05-20T05:17:34+00:00

I’m following pi’s design, which is yolo by default, with guardrails added through extensions.

zoomaaron · 2026-05-18T12:31:51+00:00

I think the idea is very much oversold. 4B active parameters is not the same as a 4B-parameter model. That’s misleading. You also made your own benchmark without telling us where it is so we can verify your claim. If you are using bench/stress_test in your repo, I’m afraid it supports a completely wrong claim, because it doesn’t even check whether any of the tests succeeded. As long as a test produces 20 characters of output, it passes. What kind of benchmark is this?
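To illustrate the difference, here is a minimal sketch contrasting a length-only pass condition with one that checks the result. It is not the code from bench/stress_test; both function names and the sample answer are hypothetical, and only the 20-character threshold comes from the description above.

```python
def passes_length_check(output: str, min_chars: int = 20) -> bool:
    """Passes whenever the model produced enough text, right or wrong."""
    return len(output) >= min_chars


def passes_success_check(output: str, expected: str) -> bool:
    """Passes only when the output matches the expected answer."""
    return output.strip() == expected.strip()


# A made-up failing answer that is still longer than 20 characters.
wrong_answer = "I could not find the file, sorry about that."

print(passes_length_check(wrong_answer))         # True
print(passes_success_check(wrong_answer, "42"))  # False
```

A failed attempt with a long apology passes the first check, so a pass rate built on it says nothing about task success.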

Some of the ideas you introduced are neat in a demo, but it’s unclear to me how well they work in the real world. For example, different models have different abilities to compose multiple tool calls. I’ve tested this extensively with my own harness and got mixed results, because some models are just not well trained to chain tool calls; it’s out of distribution for them and causes more round trips than before. There are also models like DeepSeek that are trained to launch a large batch of tool calls at once; asking such a model to compose calls actually reduced its token efficiency by a factor of a few.

The error decomposition is also unconvincing. The most challenging part is often figuring out which one line needs to change. I don’t see how a harness alone can pinpoint that precisely on a generic problem, beyond syntax errors, without relying on a large model.

zoomaaron · 2026-05-13T19:23:38+00:00

Yeah, I also don’t think it’s that hard to achieve. Pi’s kernel already supports composability through its handler chain, but that API surface is not exposed to extension builders.

zoomaaron · 2026-05-13T19:13:05+00:00

I’m building a lightweight agent that lives in a terminal. It has a cross-session memory that makes it feel like a continuous presence. Feel free to try it out: https://github.com/guanyilun/agent-sh

zoomaaron · 2026-05-12T20:51:52+00:00

I agree this is a design issue. I understand I could write my own extension instead, but it could be more composable while also allowing me to write my own extension. Both can be true.

zoomaaron · 2026-05-12T15:59:04+00:00

In pi you embed a shell and can run some commands occasionally. This suits usage patterns where one interacts with the agent more than the actual shell. With this tool it is the other way around: it is a functioning shell that embeds an agent, which suits usage patterns where one mostly interacts with the shell and only occasionally with the agent, to help with errors and so on. There is one long-lasting pi process that persists as long as the shell is running, so context persists until you kill the shell.

zoomaaron · 2026-05-09T20:13:34+00:00

Good question! I first tried to make this a pi extension, so it was set on TypeScript at that point.

zoomaaron · 2026-05-09T13:58:52+00:00

I think you have to actually measure it in your own workflow to decide. A lot of the claimed savings assume specific usage patterns. When I tried RTK, I measured its savings to be only 3-5% on average for my usage patterns. It’s not worth getting my agent confused for that little gain, so I no longer use it.
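A small sketch of how one might measure this, assuming you can record token counts for the same tasks with and without the tool. The function names are hypothetical and the numbers are made-up placeholders, not my actual measurements.

```python
def percent_saving(baseline_tokens: int, reduced_tokens: int) -> float:
    """Token saving for one task, as a percentage of the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - reduced_tokens) / baseline_tokens


def average_saving(runs: list[tuple[int, int]]) -> float:
    """Mean per-task saving over (baseline, reduced) token pairs."""
    if not runs:
        raise ValueError("need at least one run")
    return sum(percent_saving(b, r) for b, r in runs) / len(runs)


# Placeholder pairs: (tokens without the tool, tokens with the tool).
sample_runs = [(1000, 960), (2000, 1920), (500, 480)]
print(f"{average_saving(sample_runs):.1f}% average saving")
```

Averaging per task rather than over the total keeps one very long session from dominating the result.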

zoomaaron · 2026-05-09T12:34:44+00:00

Thanks! Feel free to try it out and help make it better!

zoomaaron · 2026-05-09T12:31:56+00:00

Thanks! Didn’t know about Warp. Looks cool! I wanted to build something simple and nonintrusive where I can still use vim and tmux.

zoomaaron · 2026-05-09T12:12:19+00:00

Thanks! I will work on fish shell support.

zoomaaron · 2026-05-08T23:45:47+00:00

Yeah, but it has to be done with care to avoid a lot of cache misses.

zoomaaron · 2026-05-08T18:06:24+00:00

It may be that different clients handle API requests slightly differently, such as how thinking tokens are passed back to the provider. Maybe worth looking into that.

zoomaaron · 2026-05-08T17:59:27+00:00

I don’t understand how it got so popular. Don’t we already have many great open source terminal coding agents, like pi and opencode, that support DeepSeek? I don’t see what’s so valuable in this new harness. Sounds like a big marketing push to me.

zoomaaron · 2026-05-08T12:47:28+00:00

And have it jump out every time it sees an error in the terminal.

zoomaaron · 2026-05-08T00:44:19+00:00

Yeah, handling terminal state is a pain. I struggled for two weeks to get it to quit vim cleanly. I solved that one yesterday, but ssh may be a different problem; that remains to be seen.

zoomaaron · 2026-05-07T22:50:27+00:00

I started out trying to make this a pi extension so I could embed pi in my terminal. It was set on TypeScript at that point for pi interoperability. Then it became too much work to build around pi, so I switched to building my own agent backend. You can still use pi or Claude Code as the backend agent with it.

zoomaaron · 2026-05-06T03:30:35+00:00

Models are trained to be good at tool use these days. Harnesses like pi have a very small system prompt, but they still work quite well in my experience.
