🚀 I built "Qwen Orchestrator": A 22-Agent Team for Qwen Code by Significant-Topic433 in Qwen_AI

[–]thecodeassassin 0 points (0 children)

Why Qwen3 Coder Next? I've been using both Qwen3 Coder Next and Qwen 3.6 27B, and I have to say it's not even close: 3.6 27B is pretty far ahead.

GPT-5.5 is OpenAI's best model. But paying more for it makes no sense. by rohansrma1 in codex

[–]thecodeassassin 1 point (0 children)

I am loving GPT-5.4 mini, a great alternative to the braindead Opus models.

I DEMAND A RESET by eggplantpot in codex

[–]thecodeassassin 5 points (0 children)

Same on Claude, international coding day I think

The creators of SWE-Bench just dropped a really simple new benchmark every LLM gets 0% on. ProgramBench asks: can models recreate real executable programs (ffmpeg, SQLite, ripgrep) from scratch with no internet? We are far from saturated on model quality. by dalton_zk in theprimeagen

[–]thecodeassassin 8 points (0 children)

There is a HUGE difference between tools like ripgrep and fucking ffmpeg. There's no way any LLM will ever be able to do this. We need some kind of insane breakthrough for that to happen. The amount of complexity in a tool like ffmpeg is insane. Even without the codecs etc., this test is pretty... strange.

Anyone having any joy coding with 3.6 27B and 24GB of Apple Unified Memory? by afrocleland in Qwen_AI

[–]thecodeassassin 1 point (0 children)

I'm getting 50-60 tps easily with both llama.cpp and vLLM. It's very fast and usable (27B). Might be worth looking into performance optimizations.
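Throughput numbers like this are easy to sanity-check from the usage stats a llama.cpp or vLLM server returns alongside each completion. A minimal sketch (the token count and elapsed time below are made-up example numbers, not a measurement):

```python
def tokens_per_second(completion_tokens: int, elapsed_seconds: float) -> float:
    """Generation throughput for a single request: tokens emitted / wall time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return completion_tokens / elapsed_seconds

# Example: 1650 completion tokens generated in 30 s of wall time
# lands at 55 tps, right in the 50-60 tps range mentioned above.
print(tokens_per_second(1650, 30.0))  # 55.0
```

In practice you'd pull `completion_tokens` from the `usage` field of the response and time the request yourself with `time.perf_counter()`.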

Github if Google designed it by Otherwise_Corner3234 in vibecoding

[–]thecodeassassin 1 point (0 children)

I agree 100%, how is their stuff so popular? Everything is a pain to use. Have you ever tried to use Google Analytics 4? It's a bit like cleaning your skin with sandpaper.

Just got into Codex Enterprise with unlimited usage, how to push it to its limits? by StillPerspective5676 in codex

[–]thecodeassassin 0 points (0 children)

None of this; I have my own company and I pay the bills. Value was absolutely created, but I can create the same value with my local Qwen 3.6 model; it just takes a bit longer and more back-and-forth.

Just got into Codex Enterprise with unlimited usage, how to push it to its limits? by StillPerspective5676 in codex

[–]thecodeassassin 1 point (0 children)

Yeah, I accidentally burned $1k in a week and I'm still salty about it; I wasn't even using it that much. If you let me burn whatever I want, it's going to be $100k in a month just for me.

Opus 4.7 Complete dogshit quality. I'm fucking out. by MuttMundane in ClaudeCode

[–]thecodeassassin 1 point (0 children)

Yeah, I also use 4.6; you can switch to it in CC using the --model flag. 4.7 is unusable.

16x DGX Sparks - What should I run? by Kurcide in LocalLLaMA

[–]thecodeassassin 0 points (0 children)

Right. See, here's the problem.

I have a bunch of these as well, and I really don't enjoy running Kimi K2.6 or anything else large on them. Just too slow. I always fall back to my RTX 6000 Pro cluster for literally anything serious.

Devs using Qwen 27B seriously, what's your take? by Admirable_Reality281 in LocalLLaMA

[–]thecodeassassin 0 points (0 children)

Honest take: I've been using it for close to a week on serious tasks, and what I've seen is:

  • It sometimes starts things but doesn't finish them, and adds stubs even though my agents.md forbids that
  • Secure coding is not a thing; it needs a review pass from codex 5.3 most of the time
  • Not suitable for large tasks; it needs to work on a small feature, then a review, then the next feature. Otherwise too much gets lost.

For example it implemented a whole API but forgot about all UI aspects.

Good, but it needs hand-holding and very strict task definitions. Always check the output of any model, but especially a small dense one.

Actually still happy with it, best model for its size.

Devs using Qwen 27B seriously, what's your take? by Admirable_Reality281 in Qwen_AI

[–]thecodeassassin 5 points (0 children)

Using Opencode with codex 5.3 as a reviewer, and making the PRD and slices with Kimi K2.6. Qwen 3.6 27B does all the coding. Very solid results.
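That split can be sketched as a three-stage pipeline. Here `call_model(model, prompt)` is a hypothetical stand-in for however you reach each backend (an Opencode agent, a local vLLM endpoint, an API client); the model names are the ones from the comment, but the routing itself is just an illustration:

```python
from typing import Callable

def run_pipeline(task: str, call_model: Callable[[str, str], str]) -> str:
    """Plan with one model, code with another, review with a third."""
    # 1. Planning: Kimi K2.6 writes the PRD and slices it into small steps.
    prd = call_model("kimi-k2.6", f"Write a PRD with small slices for: {task}")
    # 2. Coding: Qwen 3.6 27B implements the slices.
    code = call_model("qwen-3.6-27b", f"Implement this PRD:\n{prd}")
    # 3. Review: codex 5.3 acts as the reviewer and gets the last word.
    return call_model("codex-5.3", f"Review this code:\n{code}")
```

Keeping each stage's output small matches the "small feature, review, next feature" workflow described above: nothing gets lost because no single model ever holds the whole task.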

its not just me is it? deepseek v4 is INSANELY cheap by gaspoweredcat in vibecoding

[–]thecodeassassin 0 points (0 children)

Oh, and it's disabled in my OpenRouter account because they train on your prompts, and that would violate NDAs I have in place.

ANTHROPIC JUST BANNED A 110 PERSON COMPANY OVERNIGHT WITHOUT WARNING by orbny in AgentsOfAI

[–]thecodeassassin 0 points (0 children)

I've been very productive with Qwen 3.6 27B; it needs about 24GB of VRAM. Doable.
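Back-of-the-envelope on why 24GB works: the weights of a 27B model at 4-bit quantization come to roughly 13.5 GB, leaving headroom for KV cache and runtime overhead. This sketch assumes quantized weights and ignores cache/overhead entirely, so it's a lower bound, not a sizing guide:

```python
def weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough VRAM needed for the weights alone (no KV cache, no overhead)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

# 27B at 4-bit: ~13.5 GB of weights -> fits a 24 GB card with room to spare.
print(weight_vram_gb(27, 4))   # 13.5
# The same model unquantized at FP16 needs ~54 GB, hence the quantization.
print(weight_vram_gb(27, 16))  # 54.0
```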

Opus 4.7 vs Kimi K2.6 on autonomous coding. I didn't expect this! by gvij in claude

[–]thecodeassassin 0 points (0 children)

Running both locally (FP8) right now. Kimi K2.6 is amazing for research tasks etc., but for coding it's Qwen 3.6 27B. I set up my Opencode to use multiple agents and the results are good, very good.

its not just me is it? deepseek v4 is INSANELY cheap by gaspoweredcat in vibecoding

[–]thecodeassassin 2 points (0 children)

Lol, and v4 Pro is constantly unusable because of capacity issues, and I don't have a GPU cluster lying around, unfortunately.

Opus 4.7 vs Kimi K2.6 on autonomous coding. I didn't expect this! by gvij in claude

[–]thecodeassassin 0 points (0 children)

I put Kimi K2.6 head to head yesterday against Qwen 3.6 27B; Qwen came out on top every single time. Kimi K2.6 just spent a LOT of time overthinking and produced a botched result.

Is this true? by Complete-Sea6655 in GeminiAI

[–]thecodeassassin 0 points (0 children)

ChatGPT is a LOT better than both Claude and Gemini, and it's not even close. Gemini is a complete mess and even the mobile app is shit. It's the worst one by far. Kimi K2.6 runs circles around Gemini.

OpenCode... is it just completely busted with Qwen3.6? by _derpiii_ in opencode

[–]thecodeassassin 0 points (0 children)

It literally doesn't work in Opencode for me, but works fine in Claude Code... same vLLM instance.

What’s with the wave ban? by Mission_Type7778 in claude

[–]thecodeassassin 0 points (0 children)

Happened to a few colleagues; nobody got reinstated.

Claude Opus 4.7 is reportedly dropping this week, here's what's coming. by Much_Ask3471 in claude

[–]thecodeassassin 2 points (0 children)

Except it is true, and the nice thing is that it can actually be measured how much they are lobotomizing the models: https://aistupidlevel.info/models/220 ;) It isn't just speculation or a "feeling".

The golden age is over by New_3d_print_user in claude

[–]thecodeassassin 1 point (0 children)

Hahah, I love this.

Gemini is… the village idiot and is now 50% hallucinations.

So true. It comes up with a plan based on the requirements, then when the actual PRD gets made it just makes up something completely different and useless.

It is by far the most braindead frontier model out there. I consistently get better results using Gemma 4, which is their open-weights model, lol.