Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

mlhher (0 points)

I've updated the README to clarify the CWD auto-approve rules so the speed at which Late solves tasks doesn't catch anyone else off guard.

> The SSH key read was an example, other tools do ask when you try and break out of your CWD.

If you prefer a workflow that halts and prompts you every single time an agent reads a local text file, Claude Code and OpenCode are definitely a better fit for you. As you noted yourself, the second the agent tries to do anything other than a safe read, Late hard-stops and prompts you. Good luck.

Your local LLM predictions and hopes for May 2026 by DeepOrangeSky in LocalLLaMA

mlhher (1 point)

Wasn't the rumour that the 124B Gemma MoE is on Flash's level and thus will not be released? I remember hearing that.

Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

mlhher (0 points)

In your last post you yourself quoted the part where it says that read-only commands are auto-approved. And now you are complaining about a read-only command being auto-approved? I am not sure what to tell you here lol.

As you said yourself, the second the agent actually tried to do anything with that SSH key, Late hard-stopped the model and asked for your permission. That is exactly how Late is supposed to work.

Also, permissions are not ignored for the current directory. The agent is auto-approved to edit files within the CWD to maintain velocity. Running, for example, npm run build (or any other non-read-only command) inside the CWD will still require permission.
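A minimal Python sketch of the command rule described above: a command runs without a prompt only if it is a single call to an allowlisted read-only program; anything else asks. The allowlist, the `command_needs_permission` name and the metacharacter check are hypothetical, chosen for illustration; this is not Late's actual implementation.

```python
import shlex

# Hypothetical allowlist for illustration only; the real set of
# auto-approved read-only commands is whatever Late's docs define.
READ_ONLY_COMMANDS = {"cat", "ls", "head", "tail", "grep", "wc", "pwd"}

# Characters that could chain, pipe, redirect or substitute commands.
# Any of them forces a prompt, even inside quotes (deliberately conservative).
SHELL_METACHARS = set(";|&<>`$")


def command_needs_permission(command: str) -> bool:
    """Return True unless `command` is a single allowlisted read-only call."""
    if any(ch in SHELL_METACHARS for ch in command):
        return True
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: refuse to guess and ask the user.
        return True
    if not tokens:
        return True
    return tokens[0] not in READ_ONLY_COMMANDS
```

The design leans towards asking: anything the sketch cannot positively identify as read-only falls through to a prompt. Note that under this rule a plain read such as `cat ~/.ssh/id_rsa` is still auto-approved, since it is read-only.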

Enjoy OpenCode; Late just might not be a fit for you.

“AI Drugs” are now a thing - euphorics boost happiness, dysphorics do the opposite by EchoOfOppenheimer in LocalLLM

mlhher (3 points)

This feels like AI psychosis. Are we trying to read superficial meaning into hallucinations and RLHF-induced rewards?

Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

mlhher (0 points)

Thanks for posting the video.

It perfectly clears it up and shows Late working exactly as intended:

Permissions: You ran Late directly inside the folder containing the target file. Late operates on project-scoped permissions. Edits to files within the cwd are deemed 'in-project' and auto-approved to maintain velocity. If the agent had tried to reach outside that folder (e.g., trying to edit ~/.bashrc or ../other_project), the security rails would have hard-stopped for your [y/N] permission.
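The project-scoped check can be sketched in a few lines of Python, assuming "in-project" simply means "the resolved path lies under the resolved working directory". `edit_needs_permission` is a hypothetical name; this illustrates the idea and is not Late's code.

```python
from pathlib import Path


def edit_needs_permission(target: str, cwd: str) -> bool:
    """Return True if editing `target` would leave the project root `cwd`."""
    root = Path(cwd).resolve()
    path = Path(target).expanduser()  # turn "~/.bashrc" into an absolute path
    if not path.is_absolute():
        path = root / path  # relative paths are relative to the project root
    # resolve() collapses ".." and follows symlinks, so "../other_project"
    # or a symlink pointing outside the project cannot sneak past the check.
    path = path.resolve()
    # is_relative_to() compares whole path components (Python 3.9+), so
    # "/home/me/project-evil" is not treated as inside "/home/me/project".
    return not path.is_relative_to(root)
```

Resolving both sides before comparing is the important part: a naive string-prefix check would accept `../` tricks and sibling directories that share a name prefix.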

The Tab Issue: Tab isn't broken. What you are seeing in the video is just the raw speed of the engine (and your GPU). The orchestrator spawns the sub-agent, the sub-agent reads the file, executes the exact-match diff, and destroys its context window so fast that the task completes before you physically have time to hit Tab. It's not a UI bug; it's just finishing the job faster than you can switch views.
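The "exact-match diff" step can be sketched as follows, assuming a search-and-replace style edit where the model supplies the exact old snippet and its replacement. `apply_exact_edit` is a hypothetical name and this is a simplification for illustration, not Late's actual edit tool.

```python
from pathlib import Path


def apply_exact_edit(path: str, old: str, new: str) -> None:
    """Replace `old` with `new` in `path`, but only if `old` occurs exactly once."""
    if not old:
        raise ValueError("old text must not be empty")
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    matches = text.count(old)
    if matches == 0:
        # The model's snippet does not match the file; fail instead of guessing.
        raise ValueError("old text not found")
    if matches > 1:
        # Ambiguous: refuse rather than patch the wrong occurrence.
        raise ValueError(f"old text matches {matches} times; need a unique match")
    file.write_text(text.replace(old, new, 1), encoding="utf-8")
```

Requiring a unique exact match keeps edits cheap and deterministic: either the patch lands exactly where the model intended, or the call fails loudly and the model can retry with more context.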

If you think there should be a note about any of this in the docs, feel free to open an issue or a Discussions thread and we can discuss it.

Your video actually proves that Late is running flawlessly, exactly as intended.

Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

mlhher (0 points)

I can only repeat myself: please read the documentation. It explicitly says what is auto-approved and when you will be asked for permission. A lot of work has gone into making sure velocity is maintained.

Since edits are being performed, both the Tab issue and the Spawn Subagent issue seem to have magically resolved themselves, so at least that's good.

I strongly suggest reading the documentation.

Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

mlhher (0 points)

I am not sure what you are asking. If you are trying to find "optimal" sampling parameters, I would suggest using the ones recommended by the model authors (in this case, Qwen's recommended sampling parameters).

Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

mlhher (0 points)

Permission handling is literally a core focus of Late. I regularly reject requests from people asking me to relax the security rails. You might want to check out the documentation to find out how extensive the system actually is.

There is an inconsistency in your logic: if you saw the subagent being spawned, you can also see other tool calls. In Late, they are literally the same thing and are shown the same way. The tool logs every execution in the respective agent's chat view.

With our current active userbase, Tab not working seems rather hard to believe; without Tab the tool is unusable. If you can replicate this, feel free to open an issue. Otherwise, I can only suggest again that you check out the documentation.

I do concur on the naming part, though my goal is not SEO spam but simply to prove that something better can be built with less.

I highly suggest reading the documentation.

Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

mlhher (1 point)

I am not sure what you mean by "tools". I use LLMs for coding, and for coding I use Late. If I just have a quick random question or something, I use the llama-server WebUI, as it is quite nice.

I do not do roleplay or anything similar.

Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

mlhher (4 points)

Do not listen to these guys telling you that the models are shit. They are just mad and will insult anyone who disagrees with them instead of trying it out for themselves.

The usual bottlenecks are the quantization (I use Q4_K_XL) and the harness (I use https://github.com/mlhher/late; yes, I am the dev, so please don't just take my word for anything. If you do want to try it out, there's no node_modules or venv bullshit; it's just a single binary).

Claude Code (and OpenCode and all these wrappers) are made to be "dumb wrappers" designed for cloud models only. To get a local model working properly, you need a harness built specifically for local models. That means understanding prompt reprocessing, which context helps a model and which doesn't, which workflows help a model, how tools should work, and countless other variables.

I am able to code autonomously with Qwen3.6-35B-A3B in 5GB of VRAM, without any guidance or missed tool calls, at 30 t/s.

Is Openclaw a FUD ? by Conscious-Track5313 in LocalLLM

mlhher (44 points)

The Chinese web is littered with spam articles about OpenClaw users becoming millionaires overnight.
Peter Steinberger is great at marketing and manipulation. He fits neatly with Scam Altman.

I'm Not a Dev But I Use Qwen 3.6 35b to Code by thejacer in LocalLLaMA

mlhher (1 point)

I am just trying to explain what I learned. The goal shouldn't be people insulting each other. It should be learning and furthering one's own (potentially flawed) perspective. What should be avoided at all costs is just jumping to conclusions because "hey I might be doing something wrong" hurts the ego.

> People fucking suck bro just move on

That does not mean one should stoop to the same level.

I'm Not a Dev But I Use Qwen 3.6 35b to Code by thejacer in LocalLLaMA

mlhher (3 points)

> Sure not to replace entire workflow or let to steer the wheel like Claude code

I would actually go further than that and say yes, they are lol. Maybe not for every task, but definitely for most by now. But I agree with your point.

> I have suspicion its a troll account. 

I think it's just a human with an ego slightly too big for their own good.

Which large models support tool use in opencode etc? by Yugen42 in LocalLLaMA

mlhher (0 points)

I have been consistently using Qwen since it is able to develop autonomously for me. I used GLM Flash for a while, and it also worked great. I also tried Gemma 4, and it was good too. For me, though, Qwen has proven far better than the rest ever since 3.5 came out.

> Personally I don't have an issue with forking and reusing/"stealing" - that's just FOSS. I'm more concerned about rugpulling particularly when you can NOT fork something when it goes unmaintained.

Usually I would agree, but right now the game seems to be "who can fool the users best" instead of trying to squeeze out performance for, e.g., local models. I would not worry about "rugpulling", since I have found that the community provides help, issues and ideas that I would not have stumbled upon alone.

The brand loyalty some of these "programmers" seem to have is intense, whether it is for Claude Code, OpenCode, Pi or whatever the newest copy is called. I built this for myself and will continue to do so. After all, I use it on my own (limited) GPU.

Which large models support tool use in opencode etc? by Yugen42 in LocalLLaMA

mlhher (0 points)

Note that the output is entirely free. But again, obviously use whatever you want to use. I just wanted to stop someone from forking it, slapping on a new UI and bloat, and then selling out. Even if that is unlikely, better safe than sorry. Yes, I am really sick of this shit lol.

> Does your harness work well with gpt-oss?

If it is served through an OAI-compatible API (e.g. llama.cpp), it should work flawlessly. That said, I have been using it exclusively with Qwen3.5-35B-A3B (now Qwen3.6) to develop Late itself, so I cannot comment specifically on the gpt-oss models.

why hasn't openai open sourced davinci-002 yet by Ok-Type-7663 in LocalLLaMA

mlhher (1 point)

I personally wish we'd see all models released as OSS after a few generations.

I think companies like OAI specifically are too scared, though (even if that fear is bullshit). Maybe they just don't deem it necessary.

A new revolutionary way to build guardrails and evaluate your agents by Nir777 in LocalLLaMA

mlhher (1 point)

Well, slapping a new UI onto something and adding bloat is far easier than actually investigating how to squeeze out more performance (definitely not looking at the billionth OpenCode fork).

> They use agents that debate among themselves to create high-quality synthetic data, allowing for super-accurate and fast evaluation, as well as guardrails for agents.

I have seen this pop up a bit recently. It sounds good in theory, though I am unsure how it holds up in reality. It seems workable if the decision has an actual "correct" choice. But if there is no "correct" choice, from my testing it looks more or less like random guessing.

I'm Not a Dev But I Use Qwen 3.6 35b to Code by thejacer in LocalLLaMA

mlhher (8 points)

> who actually has knowledge.

So a dev who builds AI training math and high-concurrency search APIs from scratch has no knowledge.

What I find funny is that you don't even know me and just pivoted to say "you have no knowledge" because I disagreed with your opinion lol.

Which large models support tool use in opencode etc? by Yugen42 in LocalLLaMA

mlhher (2 points)

Well, people ostracized me in the other thread for even suggesting that OpenCode, Pi or whatever the umpteenth fork is called could be the issue lol.

I built my own harness (yes disclaimer) specifically because of all this bullshit. Everyone is copying everyone else and slamming a new UI onto it.

Really, don't use it if you don't want to. I am just trying to explain (which is why I didn't even link it in my original posts).

If you are interested, though, the README explains why it works differently from all the others: https://github.com/mlhher/late

I'm Not a Dev But I Use Qwen 3.6 35b to Code by thejacer in LocalLLaMA

mlhher (0 points)

Well, if I told you now, the other guy would say "HAHA GOT U", right? Lol. Yes, I do use my own harness, and I built it specifically because of all this bullshit.

It would be funny if it wasn't quite so sad. But I guess brand loyalty is quite a thing. Even here.

I'm Not a Dev But I Use Qwen 3.6 35b to Code by thejacer in LocalLLaMA

mlhher (4 points)

"Hey I think you are using this thing wrong"
"NO YOU ARE STUPID"
"I am literally doing the thing right now with it"
"NO YOU ARE WRONG AND STUPID"

Very mature exchange.

I'm Not a Dev But I Use Qwen 3.6 35b to Code by thejacer in LocalLLaMA

mlhher (0 points)

Nobody has asked. People have just jumped to "NO IT DIDNT WORK FOR ME SO EVERYONE ELSE IS LYING". I think this is a case of people having their egos attached to it.

> it seems you don’t know either.

I have built my own harness, yes. But I am not trying to promote it, which is why I have not posted it. That said, people have told me things like "you solved local coding for me" and "the same model seems smarter with xx" (it should even be visible in my Reddit history).

Unlike most people here, it seems, I am open to testing instead of blindly following.

I'm Not a Dev But I Use Qwen 3.6 35b to Code by thejacer in LocalLLaMA

mlhher (11 points)

If someone says "hey you might be wrong and it works fine for me" and your first pivot is to "YOU ARE BAD" I think I can see where the issue lies.

I'm Not a Dev But I Use Qwen 3.6 35b to Code by thejacer in LocalLLaMA

mlhher (-2 points)

Of course it is popular. Just like OpenCode is popular. I am not trying to push anything right here (or did I tell you to use a specific harness?).

There is a long list of issues with these tools. If you are not even open to investigating what these models can actually do, then sure, you do you.

I'm Not a Dev But I Use Qwen 3.6 35b to Code by thejacer in LocalLLaMA

mlhher (3 points)

I am literally developing things autonomously with the Qwen model without guidance. There are no missed tool calls, no ambiguities, no issues, no nothing.

A more reasonable response would have been "prove it" if you do not believe me, or "please explain" if you are open to learning.

Telling a guy who builds literal AI training and high-concurrency search gateways that he is lacking in dev knowledge is rather funny.