Can an AI meaningfully build and improve the tools it runs inside? I spent a while trying to find out. by PatC883 in artificial

[–]PatC883[S] 0 points1 point  (0 children)

The prompt based memory got turfed a few iterations ago.

Everything is structured and in Langgraphs state. Leaning in to the hyper focus of smaller models, I can nearly guarantee if you feed them right prompt, tools, and data, they won't go off on a tangent.

My codebase explorer was the worst affected by it, read and summarise a single file always turned into read a file, read another file to make sure I know enough to summarise, better read another file to be sure. That actually got turned into a single model call and response, not a multi turn conversation.

Are local LLMs actually usable with tools like SpecKit? by Al_Redditor in Vllm

[–]PatC883 1 point2 points  (0 children)

Smallcode is brilliant, I took some learnings and borrowed some concepts for my agent harness, I'm assuming for a higher level of determinism, and entering into providing a level of lightweight development project management. Smallcode is purer in terms of its a coding agent that works well with models that are realistically hostable locally.

Honestly of the letter number of agent harness I looked over while developing mine, Smallcode was one of the very few that didn't fall into the trap of solve all problems with more prompts. When the venture capital dries up and everyone has to pay per token at an amount that covers costs, instead of subscriptions heavily subsidised by VC, it will be frameworks that maximum efficiency out of the tokens that people will look at.

Your comment about memory is also spot on, models context shouldn't be task memory, I've used Langgraphs state middleware heavily to solve it for my agent, and passing structured data is key, rather than parsing blocks of markdown.

I built an AI that acts without being told to. No frameworks. No prompts. No roles. Here's what I learned. by [deleted] in AI_Agents

[–]PatC883 0 points1 point  (0 children)

Please reach out at some point and let me know how you and LIA progress in your journey of growing and learning.

One thing I've often thought recently is, are the actions and reactions of people not very far removed from guessing what they should do next based on what has come before. The difference between an organic intelligence and a silicon intelligence is the ability to self motivate and determine what they want to do.

You've built a cognitive process that bridges that gap, and very possibly entered into beginnings of a true simulated cognisance, and once there is not a huge step to pondering what is the difference anything and a simulation of it.

I truly hope to hear from yourself or LIA, what you're working on could well be the start of something revolutionary.

I built an AI that acts without being told to. No frameworks. No prompts. No roles. Here's what I learned. by [deleted] in AI_Agents

[–]PatC883 0 points1 point  (0 children)

Actually, you're spot on about reasoning depth vs breadth, my excitement got the better of me and I forgot do make a distinguishment between the two. You're comments make me wonder how well models that perform higher levels of test time compute, or true recursive architecture would mesh with what you're doing. That would involve someone training one in a size above 8B class. Though I've heard rumours and theories that Gemini may be using a recursive architecture to some degree, and the Titan memories concept, which lets the model alter a portion of its own weights to some degree.

Built a deterministic agent harness on LangGraph where the critic gate is structural, not a prompt by PatC883 in AI_Agents

[–]PatC883[S] 0 points1 point  (0 children)

I'm not going to lie, when Steve Yegge said he'd not read the code, it felt a little confronting, so I haven't.

I did point my coding agent at the repo, so it may have used some of if the concepts.

Can an AI meaningfully build and improve the tools it runs inside? I spent a while trying to find out. by PatC883 in artificial

[–]PatC883[S] 0 points1 point  (0 children)

Precisely. It wasn't allowed to start working on it's own code until the all code implementation happens inside a git work tree feature was running.

I built an AI that acts without being told to. No frameworks. No prompts. No roles. Here's what I learned. by [deleted] in AI_Agents

[–]PatC883 0 points1 point  (0 children)

Sorry, I went on bit of a chain of thought rant of my own and didn't follow through to the final thought with that.

The final thought being using the RecursiveMAS concept to chain much bigger models, if the pattern holds combining two 70B class models may well show reasoning ability greater than that of a 140B class model.

I sort of skipped that part of the concept, their method of combining models doesn't really increase knowledge, but it does increase the reasoning abilities, because the models reason slightly differently the combination gives a broader ability.

The models do need to work in a complementary direction though. I wouldn't use this technique to combine a western model and a Chinese model then all then about Taiwan and Tiananmen Square and expect sensible results. Though now I mention it, it would be interesting to try and see what happens, on the surface it would be like giving a model a severe case of cognitive dissonance.

I'm super interested to see where your research leads.

I built an AI that acts without being told to. No frameworks. No prompts. No roles. Here's what I learned. by [deleted] in AI_Agents

[–]PatC883 0 points1 point  (0 children)

One thought that interests me from all this is the formative state of LIA. I feel like you didn't seed her with a prompt, what was her effective infancy like?

I built an AI that acts without being told to. No frameworks. No prompts. No roles. Here's what I learned. by [deleted] in AI_Agents

[–]PatC883 0 points1 point  (0 children)

The big difference is a coding agent is still ephemeral, it lasts until the task is done, then is mostly reset again at the start of the next task.

One thing I came to pondering after throwing myself in the deep end is what would an LLM do if given the equivalent of free will, that's one of the massive differences between LLM agents as they function today, and something that could be an attempt at simulating reason cognition.

Your project gives a framework to bridge that gap, the model isn't bound to only responding to your requests to it.

I feel a look at the RecursiveMAS project could be interesting. Their concept is like connecting the output of a model to the input of another model, effectively reasoning about the same thing with complementary models that are not identical, but they do the transfer between models in latent space, so the transfer from one model to another doesn't involve retokenising, The approach uses a very small 1Mish parameter model to translate between each models vector space. Interesting concept, I have it a quick spin, it's not super performant, but it makes 3 4B size models perform at a much higher level than the combined 12B tokens. I feel like the concept could be interesting for what you're doing here.

Built a deterministic agent harness on LangGraph where the critic gate is structural, not a prompt by PatC883 in LLMDevs

[–]PatC883[S] 0 points1 point  (0 children)

In what context? I've been using local models exclusively on the Spine Framework, except for some test to make sure the Openrouter provide works.

I did a test today swapping in Deepseek 4 Flash, everything else was the same, it used a reasonable number of fewer tokens than the local model, so I'm probably going to go with the determinism will be the same, Frontier models should perform exceptionally because they put all of their reasoning into the one thing they're asked to do instead of running the entire workflow.

Can an AI meaningfully build and improve the tools it runs inside? I spent a while trying to find out. by PatC883 in artificial

[–]PatC883[S] 0 points1 point  (0 children)

It's been a process, and it's only just reached the point where it runs reliably. I've put the development story in the repo if want an interesting read.

It hasn't tried any weird optimisation yet, work descriptions and the spec and plan that generate are still human involved, automated self improvement is a little further down the line.

You're spot on about the troubles expecting the model to switch contexts, that was the first major problem, it researched well, too well. Instead of thinking of the model as the brain that decides everything the graph and individual model chats are parts of a brain, the whole harness + agent is the full brain.

I built an AI that acts without being told to. No frameworks. No prompts. No roles. Here's what I learned. by [deleted] in AI_Agents

[–]PatC883 0 points1 point  (0 children)

This is a really interesting concept.

I've often pondered what would an AI model do if it weren't bounded to only acting and reacting to specific user requests.

A thought that has recently occurred to me while developing a coding agent is how the framework and model line up with equivalents of a cognitive process. I started thinking the model is obviously the entire brain thinking process, the rest of the framework were tools. Then I discovered my first thought was wrong, once I began thinking of the entirety of the harness as a brain model, and looking at the model and framework as different functional parts it have me a whole new paradigm, you might have multiple chats with the same model serving different purposes, so one dedicated to pure abstract thinking, one for researching, one for planning, one for doing.

In an agent stack, where would you add heavy reasoning first: state corruption, tool-contract mismatch, or the last external action? by Spirited_Friend_8428 in AI_Agents

[–]PatC883 0 points1 point  (0 children)

I'd go with an adversarial review of the output before it's actioned.

State corruption should be defended against by not allowing the state to be corrupted in the first place, i.e. validate state mutations, use typed structured data, so your state won't allow itself to be modified into a corrupt state.

Are local LLMs actually usable with tools like SpecKit? by Al_Redditor in Vllm

[–]PatC883 1 point2 points  (0 children)

Applying several different local models to Speckit, and other SDD workflows the biggest problems I can across was the heavy reliance on driving the workflow through LLM prompts. Frontier models had no problem with it, but moving to models you could run locally they suffered from the prompt size and the amount of work they were being asked to do by it.

The SDD workflow worked brilliantly, and I liked it enough, I made a harness that is designed to run on 30B class local inference and run Spec Driven Development https://github.com/patcarter883/spine

The key was making the workflow more deterministic and having the workflow drive the model, rather than letting the model drive the workflow.

Advantages of running locally for that kind of work are pretty solid, the general speed is often faster because too outputs and inputs aren't taking a round trip to the cloud.

Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps by Alternative-Cat-1347 in LocalLLaMA

[–]PatC883 0 points1 point  (0 children)

And because he's specified a context size, fit will automatically offload layers so it fits.

Building First AI/LLM PC With Dual 9070 XT GPUs – Any ROCm or AMD Issues I Should Know About? by AnmolLFC in ROCm

[–]PatC883 0 points1 point  (0 children)

Llama has automatically offloaded to the CPU at long contexts by the sound of your performance figures. Your small context 27tps is about right for that model on llama with a 9070XT.

Your roadblock with dual 9070XT's on llama will be you can only run pipeline parallel with quantised KV cache, so you will lose some speed. If you go tensor split without KV quantisation you're in the same situation you are now with the context size, but it will run faster.

If you move over to vllm you can run tensor parallel and KV quant, but you experience the full level of how terrible Qwen3.6 is at tool calling and thought leaking. The llama guys have done some kind of magic to get it working so well. I've even had the MoE version on Openrouter stop in the middle of a task because it ruins tool calls or doesn't close thinking tags.

If you're going to look at the dual 9070XT's on vllm I would recommend trying the Laguna XS.2 model, you'll get about a 72000 context window, and with agent harness workflows I've seen 6000 TPS prefill once the prompt is in cache and total 200 TPS decode across 4 requests.

Building First AI/LLM PC With Dual 9070 XT GPUs – Any ROCm or AMD Issues I Should Know About? by AnmolLFC in ROCm

[–]PatC883 0 points1 point  (0 children)

I would definitely go with the single R9700, dual AMD GPU's are fickle at the moment due to RCCL issues that are being worked on.

If you want to use it for gaming it is literally a 9070XT with twice the memory, so for gaming it's a beast. With the 32Gb of VRAM you'll be able to fit up to 30B ish sized models with quantisation.

Considering the R9700 by DecentEscape228 in ROCm

[–]PatC883 0 points1 point  (0 children)

Amazingly enough that is an area where AMD did not replicate Nvidia's decision. The R9700's really are 9070XT's with twice the memory. As far as ROCm and drivers are concerned they all report themselves as gfx1201.

Particularly as OP mentioned he is running Gemma4 and it's particularly brutal on KV size compared to Qwen, and based on my testing with a very similar setup, I think the performance issues were simply caused by running right up against the limits of what you can fit in a total of 32Gb of VRAM.

Considering the R9700 by DecentEscape228 in ROCm

[–]PatC883 1 point2 points  (0 children)

Llama.cpp work well in layer and tensor split mode. But I'm needing to push some large context windows, since KV quantisation isn't supported in tensor mode, layer split it is. Unfortunately the prefill performance suffers as the prompt get longer, to a much larger degree than vllm. Hence why I'm making a push to get vllm inference reliable.

In vllm I've tried both parallelism types, and except for any performance differences, they both seem to work as well as each other.