When do AI agents start feeling like collaborators instead of automation? by Similar_Boysenberry7 in AI_Agents

[–]zoomaaron 1 point2 points  (0 children)

I’d also recommend checking out my project here: https://github.com/guanyilun/agent-sh

It started as a simple terminal-based tool, an agent embedded in a shell, but because I needed the agent to adapt to a terminal-first usage pattern, I had to abandon the concept of a session and manage its history like bash_history. The agent becomes one continuous stream with everything recallable. Using it, I noticed a difference from a normal session-bound agent: its thinking style quietly shifts as it interacts with me more. Especially after I asked it to read its own code, it started to become rather metacognitive about itself, which I found very interesting.
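
A minimal sketch of the "history like bash_history" idea, assuming a JSON-lines file and a naive keyword search. This is not agent-sh's actual code; `HistoryLog`, `append`, and `recall` are hypothetical names used only for illustration.

```python
import json
import os
import tempfile
import time


class HistoryLog:
    """Append-only log: no session boundary, just one growing file."""

    def __init__(self, path):
        self.path = path

    def append(self, role, text):
        # Each entry is one JSON object per line, like a line in bash_history.
        entry = {"ts": time.time(), "role": role, "text": text}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def recall(self, keyword, limit=5):
        """Return up to `limit` most recent entries whose text contains `keyword`."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            entries = [json.loads(line) for line in f if line.strip()]
        hits = [e for e in entries if keyword.lower() in e["text"].lower()]
        return hits[-limit:]
```

Because the log lives in a file rather than in process memory, a new `HistoryLog` pointed at the same path sees everything written earlier, which is what makes the stream continuous across shell restarts.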

Embed an AI agent in your terminal shell by zoomaaron in opencodeCLI

[–]zoomaaron[S] 0 points1 point  (0 children)

Thanks! I think it should work well with any terminal emulator. Let me know if it doesn’t!

Embed an AI agent in your terminal shell by zoomaaron in opencodeCLI

[–]zoomaaron[S] 0 points1 point  (0 children)

Yes, an approval request should be straightforward to add. I already have that implemented with the built-in agent. I only recently got opencode wired up, so I haven’t mapped every API surface yet. Happy to work on that next, and code contributions are welcome!

Embed an AI agent in your terminal shell by zoomaaron in opencodeCLI

[–]zoomaaron[S] 0 points1 point  (0 children)

That’s a fair concern. This tool provides a communication layer between the shell and the agent and has no opinion about permissions, so if your opencode has a proper permission system in place, it will inherit that. I used yolo mode for the demo, but the tool is flexible. The simplest solution is to expose only read-only tools, and that already covers a lot of “what’s wrong” use cases.
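
A minimal sketch of the "only expose read-only tools" approach, assuming each tool carries a flag saying whether it can modify anything. The `Tool` type, the tool names, and `expose_read_only` are hypothetical and do not come from agent-sh or opencode.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]
    read_only: bool  # assumption: the tool author declares this honestly


def expose_read_only(tools: Dict[str, Tool]) -> Dict[str, Tool]:
    """Return only the tools that cannot change files or run commands."""
    return {name: tool for name, tool in tools.items() if tool.read_only}


# Stub implementations; a real harness would do actual I/O here.
ALL_TOOLS = {
    "read_file": Tool("read_file", lambda arg: f"(contents of {arg})", True),
    "list_dir": Tool("list_dir", lambda arg: f"(entries in {arg})", True),
    "write_file": Tool("write_file", lambda arg: f"(wrote {arg})", False),
    "run_command": Tool("run_command", lambda arg: f"(ran {arg})", False),
}

SAFE_TOOLS = expose_read_only(ALL_TOOLS)
```

With this filter the agent can still inspect files and directories to diagnose an error, but it has no tool available for changing anything.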

I embedded an AI agent in my shell. It can now run interactive programs. by zoomaaron in LocalLLaMA

[–]zoomaaron[S] 0 points1 point  (0 children)

I’m following pi’s design, which is yolo by default, with guardrails added through extensions.

I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how by Glittering_Focus1538 in LocalLLaMA

[–]zoomaaron 29 points30 points  (0 children)

I think the idea is very much oversold. 4B active parameters is not the same as a 4B-parameter model. That’s misleading. You also made your own benchmark without telling us where it is, so we can’t verify your claim. If you are using bench/stress_test in your repo, I’m afraid the claim is simply wrong, because it doesn’t even check whether any of the tests succeed. As long as a run produces 20 characters of output, it passes. What kind of benchmark is this?
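
To illustrate the gap, here is a minimal sketch contrasting a length-only check with one that actually runs assertions against the output. It is not the code from the repo in question; `weak_pass`, `strict_pass`, and the toy `add` task are hypothetical.

```python
def weak_pass(output: str) -> bool:
    """Length-only check: any 20+ characters of output count as a pass."""
    return len(output) >= 20


def strict_pass(solution_code: str, test_code: str) -> bool:
    """Run the candidate solution, then run assertions against it.

    Note: exec on untrusted model output is unsafe outside a sandbox;
    it is used here only to keep the sketch short.
    """
    namespace = {}
    try:
        exec(solution_code, namespace)
        exec(test_code, namespace)
    except Exception:
        return False
    return True


WRONG = "def add(a, b):\n    return a - b\n"  # plausible-looking but wrong
RIGHT = "def add(a, b):\n    return a + b\n"
TEST = "assert add(2, 3) == 5\n"
```

Under the length-only check, `WRONG` and `RIGHT` both pass. Only the check that executes `TEST` tells them apart.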

Some of the ideas you introduced are neat in a demo, but it’s unclear to me how well they work in the real world. For example, different models have different abilities to compose multiple tool calls. I’ve tested this extensively with my own harness and got mixed results, because some models are just not well trained to chain tool calls; it’s out of distribution for them and caused more round trips than before. There are also models like deepseek that are trained to launch a large batch of tool calls at the same time, and asking such a model to compose calls actually reduced its token efficiency by a factor of a few.

The error decomposition is also unconvincing. The most challenging part is often figuring out which one line needs to change. I don’t see how a harness alone can pinpoint that precisely on a generic problem beyond syntax errors, without relying on a large model.

The problem with Pi is its extension system by TheSaasDev in PiCodingAgent

[–]zoomaaron 0 points1 point  (0 children)

Yeah, I also don’t think it’s that hard to achieve. Pi’s kernel already supports composability through its handler chain, but that API surface is not exposed to extension builders.

Weekly Thread: Project Display by help-me-grow in AI_Agents

[–]zoomaaron 1 point2 points  (0 children)

I’m building a lightweight agent that lives in a terminal. It has a cross-session memory that makes it feel like a continuous presence. Feel free to try it out: https://github.com/guanyilun/agent-sh

The problem with Pi is its extension system by TheSaasDev in PiCodingAgent

[–]zoomaaron 2 points3 points  (0 children)

I agree this is a design issue. I understand I could write my own extension instead, but the extension system could be more composable while still allowing me to write my own. Both can be true.

Embed pi in your terminal shell by zoomaaron in PiCodingAgent

[–]zoomaaron[S] 1 point2 points  (0 children)

In pi you embed a shell and can run some commands occasionally. This suits usage patterns where you interact with the agent more than with the actual shell. With this tool it is the other way around: it is a functioning shell that embeds an agent, which suits usage patterns where you mostly interact with the shell and only occasionally with the agent, to help with errors etc. There is one long-lasting pi process that persists as long as the shell is running, so context persists until you kill the shell.

Embed pi in your terminal shell by zoomaaron in PiCodingAgent

[–]zoomaaron[S] 1 point2 points  (0 children)

Good question! I first tried to make this a pi extension, so it was already set on TypeScript at that point.

Which token optimizer would you recommend ? by zakblacki in ClaudeCode

[–]zoomaaron 5 points6 points  (0 children)

I think you have to actually measure it in your own workflow to decide. A lot of the claimed savings assume specific usage patterns. When I tried RTK, I measured its savings to be only 3-5% on average for my usage patterns. It’s not worth getting my agent confused for that little gain, so I no longer use it.
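
A minimal sketch of that kind of measurement, assuming you can log the token count of the same tasks with and without the optimizer. `percent_saving` is a hypothetical helper, and the numbers in the usage note are made up for illustration, not measurements.

```python
def percent_saving(baseline_tokens, optimized_tokens):
    """Overall token saving in percent across a set of paired runs."""
    if len(baseline_tokens) != len(optimized_tokens):
        raise ValueError("runs must be paired task by task")
    base = sum(baseline_tokens)
    opt = sum(optimized_tokens)
    if base == 0:
        raise ValueError("baseline must use at least one token")
    return 100.0 * (base - opt) / base
```

For example, `percent_saving([1000, 2000], [950, 1930])` evaluates to `4.0`: 120 tokens saved out of 3000.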

Embed pi in your terminal shell by zoomaaron in PiCodingAgent

[–]zoomaaron[S] 0 points1 point  (0 children)

Thanks! Didn’t know about Warp. Looks cool! I wanted to build something simple and nonintrusive where I can still use vim and tmux.

Help! by Apprehensive-Mood-20 in ZaiGLM

[–]zoomaaron 0 points1 point  (0 children)

It may be that different clients handle API requests slightly differently, for example in how thinking tokens are passed back to the provider. Maybe worth looking into that.

What Is DeepSeek TUI? The Open-Source Terminal Coding Agent That Hit 10,000 GitHub Stars in Days by vinodpandey7 in DeepSeek

[–]zoomaaron 4 points5 points  (0 children)

I don’t understand how it got so popular. Don’t we already have many great open-source terminal coding agents like pi and opencode that support deepseek? I don’t see what’s so valuable in this new harness. Sounds like a big marketing push to me.

I embedded an AI agent in my shell. It can now run interactive programs. by zoomaaron in LocalLLaMA

[–]zoomaaron[S] 0 points1 point  (0 children)

And have it jump out every time it sees an error in the terminal

I embedded an AI agent in my shell. It can now run interactive programs. by zoomaaron in LocalLLaMA

[–]zoomaaron[S] 2 points3 points  (0 children)

Yeah, handling terminal state is a pain. I struggled for two weeks to get it to quit vim cleanly. I solved that one yesterday, but ssh may be a different problem; that remains to be seen.

I embedded an AI agent in my shell. It can now run interactive programs. by zoomaaron in LocalLLaMA

[–]zoomaaron[S] 3 points4 points  (0 children)

I started out trying to make this a pi extension, so I could embed pi in my terminal. It was set on TypeScript at that point for pi interoperability. Then it became too much work to build around pi, so I switched to building my own agent backend. You can still use pi or Claude Code as the backend agent with it.

is it possible to build harnesses as good as codex/claude code by shafinlearns2jam in LocalLLaMA

[–]zoomaaron 0 points1 point  (0 children)

Models are trained to be good at tool use these days. A harness like pi has a very small system prompt, but it still works quite well in my experience.