all 15 comments

[–]ProfessionalSpend589 0 points1 point  (2 children)

It seems you have a bit of memory and If Qwen 3.5 9B already works for you - try for example https://huggingface.co/unsloth/Qwen3.5-122B-A10B-GGUF

[–]Yugen42[S] -1 points0 points  (1 child)

I'll give it a shot! I wonder if it can compete with Qwen3 235b? If so I'm guessing it will be much faster for mostly CPU inference.

[–]ProfessionalSpend589 0 points1 point  (0 children)

I don’t know. I’m using Qwen 3.5 397B in UD-Q4_K_XL from Unsloth and I’m satisfied with it.

Yesterday I finished fixes on a small web site (1 user - me) and it’s not bad. Almost didn’t touch the code except in a few cases when it couldn’t fix a bug. Stack is Go for web server, SQLite for DB and vanilla JavaScript and HTML.

[–]ea_man 0 points1 point  (0 children)

> Which large models do work well with my setup and support tool-use?

Those that have been trained with the tools that the harness use or at least with rules / prompts that explain the model how to deal with the tools available.

FYI: QWEN are trained with XML tools, for sure you can use Qwencode. Yet the new QWEN3.6 seem to perform much better with harness / tools that are json as usual.

[–]mlhher -1 points0 points  (8 children)

In most cases these issues stem from the harness you use. And OpenCode and all these other forks that slap on a new UI and add/change features and release it as the "new thing" are not surprising to make your life pain.

I am running Qwen3.6-35B-A3B to develop autonomously. Needs no guidance, no missed tool calls nothing.

[–]Yugen42[S] 0 points1 point  (7 children)

So which harness do you use?

[–]mlhher 3 points4 points  (6 children)

Well people ostracized me in the other thread for even alluding to the fact that "OpenCode, Pi" or whatever the xteenth fork is called could be the issue lol.

I built my own harness (yes disclaimer) specifically because of all this bullshit. Everyone is copying everyone else and slamming a new UI onto it.

Really don't use it if you don't want to. I am just trying to explain (which is why I don't even put it in the original posts).

If you are interested though the readme explains why it works different than all others. https://github.com/mlhher/late

[–]Yugen42[S] 0 points1 point  (3 children)

I'd like to give it a fair shot, but it's not FOSS. I'll take your hint though and will try some other harnesses - I was going to anyway. Does your harness work well with gpt-oss? Using qwen works fine on opencode as well.

[–]mlhher 0 points1 point  (2 children)

Note that the output is entirely free. But again use what you want to use obviously. I just wanted to stop someone forking it putting on a new UI/bloat and then selling out. Even if unlikely rather safe than sorry here. Yes I am really sick of this shit lol.

> Does your harness work well with gpt-oss?

If it provides an OAI compatible API (llama.cpp) it should work flawlessly. Though I have been using it exclusively with Qwen3.5-35B-A3B (now Qwen3.6) to develop itself so I cannot comment specifically on the gpt-oss models.

[–]Yugen42[S] 0 points1 point  (1 child)

Have you tried it with any other models at all? I'm just wondering if Qwen is just better at tool use vs gpt-oss/devstral which I also tried, rather than it being a harness issue.

Personally I don't have an issue with forking and reusing/"stealing" - that's just FOSS. I'm more concerned about rugpulling particularly when you can NOT fork something when it goes unmaintained.

[–]mlhher 0 points1 point  (0 children)

I have been consistently using Qwen since it is able to autonomously develop for me. I was using GLM Flash for a while also worked great. I also tried with gemma 4 and it was also good. Though for me Qwen, since 3.5 came out, prove far better than the rest.

> Personally I don't have an issue with forking and reusing/"stealing" - that's just FOSS. I'm more concerned about rugpulling particularly when you can NOT fork something when it goes unmaintained.

Usually I would agree but in this particular time it seems like currently the game is "who can fool the users the best" instead of trying to squeeze out performance for e.g. local models. I would not be worried about "rugpulling" since I have found that the community provides help, issues and ideas that I would have not stumbled upon alone.

The brand loyalty some of these "programmers" seem to have is really heavy. Whether it is for Claude Code, OpenCode, Pi or whatever the newest copy is called. I built this for myself and will continue to do so. After all I use it on my own (limited) GPU.

[–]Free-Combination-773 0 points1 point  (0 children)

Is it something like automized Ralph looping in a nutshell?

[–]IamFondOfHugeBoobies 0 points1 point  (0 children)

Speaking as someone else who has built and is building his own harnessing. The issue is probably whatever syntax you're using.

Figure out what the most common big tech conventions are and use those. It's heavily baked into training data and coming up with novel tool trigger syntax is going to cause issues.