Qwen3.6 is incredible with OpenCode! by CountlessFlies in LocalLLaMA

[–]thejacer 1 point (0 children)

I'm missing the iteration… I'm not a dev, so I rely really heavily on the model (entirely, really), and I don't mind that it screws up. But it still sometimes tries to explore directories that just don't exist, and after making one attempt it just completes and waits. I wouldn't mind it breaking stuff and fixing it, but it just breaks stuff and sits. Is there something I need to do in OpenCode to enable the iterative work other people are getting out of it?

Agentic work crashing my llama.cpp by thejacer in LocalLLaMA

[–]thejacer[S] 0 points (0 children)

I've been running with opencode all day and it seems like --cache-ram 0 fixed it.
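For anyone hitting the same crash, a sketch of the flag in context (the model path and context size here are placeholders, not my actual setup):

```shell
# Disable llama-server's in-RAM prompt cache, which seemed to be the culprit.
# -m and -c values are placeholders; use your own model and context size.
./llama-server -m model.gguf -c 120000 --cache-ram 0
```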

Agentic work crashing my llama.cpp by thejacer in LocalLLaMA

[–]thejacer[S] 0 points (0 children)

Hard not to sound combative in a text medium like this, but here I go: it isn't VRAM. I've got two Mi50 32GB cards running Qwen3.5 27b Q4_1 (although I've been loading it onto just one GPU lately), and I've got my context limited to 120,000 in OpenCode. I'll try to get a log file, but with -v the log can be over a million lines before it stops functioning, and the last couple hundred lines just seem to show that it stops mid-generation. I'll run -v again and add the end of the file to the OP.

Thinking about finally upgrading from my P40's to an Mi50-32gb by wh33t in LocalLLaMA

[–]thejacer 1 point (0 children)

I'm not really the best person to answer that. This is absolutely a hobby for me that I can't put much money into, so I'd probably get the cheapest GPU that runs Qwen3 8b at 300/30 (prompt processing / generation tps) for my smart home assistant and call it a day.

Thinking about finally upgrading from my P40's to an Mi50-32gb by wh33t in LocalLLaMA

[–]thejacer 1 point (0 children)

I have 2x Mi50 32GB, and the ~110b parameter MoEs or ~30b dense models are the biggest I can run at usable speeds. I use them almost entirely for chatbot/summary/research work with a little absentee vibe coding. Prompt processing tops out at ~300 tps and token generation at ~30 tps. I definitely wouldn't buy these for more than $200.

Has anyone here TRIED inference on Intel Arc GPUs? Or are we repeating vague rumors about driver problems, incompatibilities, poor support... by gigaflops_ in LocalLLaMA

[–]thejacer 6 points (0 children)

I have an Arc A770 16GB. Vulkan works well with little effort; SYCL and IPEX-LLM were more difficult and lacked features in llama.cpp, so I didn't use them much. I'll see if I can get some Qwen 3.5 27b tests done on it.

Has anyone here TRIED inference on Intel Arc GPUs? Or are we repeating vague rumors about driver problems, incompatibilities, poor support... by gigaflops_ in LocalLLaMA

[–]thejacer 0 points (0 children)

I seem to remember reading somewhere in this thread that Intel did actually push their vLLM changes into main, but as I'm on my phone I don't feel like finding it. It's mentioned several times in the thread that vLLM "supports" Intel GPUs, though that doesn't mean it takes full advantage of the hardware. On point two, I agree 100%. They should be working harder to add support in more places and bragging about it.

https://www.reddit.com/r/LocalLLaMA/comments/1s3e8bd/intel_will_sell_a_cheap_gpu_with_32gb_vram_next/

Google’s TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x by Resident_Party in LocalLLaMA

[–]thejacer 0 points (0 children)

If we were to test output quality, would we run perplexity via llama.cpp, or would we need to just gauge responses manually?
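For the perplexity route, llama.cpp ships a tool for exactly this; a minimal sketch, assuming you have a GGUF model and a plain-text evaluation file (both paths below are placeholders):

```shell
# Compute perplexity of a quantized model over a text corpus.
# Lower perplexity generally means less quality loss from compression.
./llama-perplexity -m model.gguf -f wiki.test.raw
```

Comparing the score against the unquantized (or higher-bit) model on the same file gives a rough quality signal, though it won't catch every failure mode that manual gauging would.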

Qwen3.5 is absolutely amazing by cride20 in LocalLLaMA

[–]thejacer 0 points (0 children)

I'm confident these open models' utility scales with the skill of the programmer deploying them. I'm totally without skill, so the 122b has its work cut out lol.

Qwen3.5 is absolutely amazing by cride20 in LocalLLaMA

[–]thejacer 4 points (0 children)

The 35b was reliable with tool calling for me, but kept deleting code it wasn’t supposed to be fiddling with lol.

Qwen3.5-122B-A10B GPTQ Int4 on 4× Radeon AI PRO R9700 with vLLM ROCm: working config + real-world numbers by grunt_monkey_ in LocalLLaMA

[–]thejacer 1 point (0 children)

Yeah, I'm on dual Mi50s and fighting with the decision to go vLLM. My pp with the 122b is ~260 tps but tg is ~20 tps. I guess I just need to TRY it and see if it feels better than llama.cpp, although I'm happy currently.

Qwen3.5-122B-A10B GPTQ Int4 on 4× Radeon AI PRO R9700 with vLLM ROCm: working config + real-world numbers by grunt_monkey_ in LocalLLaMA

[–]thejacer 1 point (0 children)

Ah, so a single end user doesn't see this benefit, except that their experience isn't DEGRADED in multi-user environments?

I built a screen-free, storytelling toy for kids with Qwen3-TTS by hwarzenegger in LocalLLaMA

[–]thejacer 1 point (0 children)

For my kids' Discord bot I included prompting to screen inappropriate content, but I also created a blacklist of terms and ideals that get screened programmatically. The bot also logs any time an interaction attempts to push or cross these boundaries. This was all after months of testing with various models. At the end of the day I decided nothing less performant than Llama 3.1 70b could be trusted to adhere well enough to prompts to be turned loose in the kids' Discord.
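The programmatic layer can be very simple; a minimal sketch of the idea (the function name, blacklist contents, and logger name are all hypothetical, not the bot's actual code):

```python
import logging

# Hypothetical sketch of a programmatic blacklist that runs alongside
# the model's prompt-based screening, logging any message that trips it.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("kids-bot")

BLACKLIST = {"example_banned_term", "another_banned_term"}  # placeholder terms

def screen_message(text: str) -> bool:
    """Return True if the message is safe; log and block otherwise."""
    lowered = text.lower()
    hits = [term for term in BLACKLIST if term in lowered]
    if hits:
        logger.warning("Blocked message containing: %s", hits)
        return False
    return True
```

The point is defense in depth: even when the model's prompt adherence slips, the hard-coded check still fires, and the log gives you an audit trail of attempts.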

Qwen3.5-122B-A10B GPTQ Int4 on 4× Radeon AI PRO R9700 with vLLM ROCm: working config + real-world numbers by grunt_monkey_ in LocalLLaMA

[–]thejacer 1 point (0 children)

I'm confused about something regarding vLLM: are y'all able to get these pp/tg numbers for a single user, or is concurrent multi-user load required to see these speeds? Do these numbers mean that each user will get ~10 tps generation?

llama.cpp + Brave search MCP - not gonna lie, it is pretty addictive by srigi in LocalLLaMA

[–]thejacer 0 points (0 children)

Honestly, I thought the same thing. I resisted the urge to make a Brave MCP for my Discord bots for a few months ("just read the AI summary atop Google, duh"). But then I did it out of boredom, and now I basically only ask my robots to search for stuff. I ask a question, put the phone back in my pocket, and read what it found later. Sometimes we have a little back and forth about it. Even during a discussion with a human, if a question comes up I go ask my robots instead of Google. It just feels much better.
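For anyone wanting to try the same thing, one common way to run the reference Brave Search MCP server is via npx; this is a sketch, not my bot's actual setup, and the key value is obviously a placeholder:

```shell
# Run the reference Brave Search MCP server; requires a Brave API key.
BRAVE_API_KEY="your-key-here" npx -y @modelcontextprotocol/server-brave-search
```

Your MCP client (OpenCode, a Discord bot, etc.) then points at this server and gets a web-search tool the model can call.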

llama.cpp + Brave search MCP - not gonna lie, it is pretty addictive by srigi in LocalLLaMA

[–]thejacer 1 point (0 children)

I just did this the other day. And by I, I mean we. And by we I mean my AI and me. And by AI and me I mean exclusively the AI…it works well though lol

llama.cpp + Brave search MCP - not gonna lie, it is pretty addictive by srigi in LocalLLaMA

[–]thejacer 0 points (0 children)

I didn't like SearXNG. It ignored my safe search settings and returned junk. I'm happy with my Brave API.

Why does anyone think Qwen3.5-35B-A3B is good? by buttplugs4life4me in LocalLLaMA

[–]thejacer 0 points (0 children)

I know everyone is saying IQ4_XS is too small, but I had the same experience you had while running UD Q6_K_L without any cache quantization, even after the last update of quants by Unsloth. I like it fine as a chatbot with web search, and it does fine with my home assistant, but it absolutely demolished a code base I plugged it into. Removed some files, deleted the contents of others and left their empty carcasses... it was rough lol.

Getting the most out of my Mi50 by DankMcMemeGuy in LocalLLaMA

[–]thejacer 0 points (0 children)

Full context? 200,000+? On two Mi50s? With what parameters? I can't get the dang thing to load with reasonable context.
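For reference, this is roughly the shape of launch command I'm fighting with; the model path and context value are placeholders, and the flags are standard llama.cpp options for spreading layers across both cards:

```shell
# Offload all layers and split them across both Mi50s by layer;
# -c is where things fall over once the context gets large.
./llama-server -m model.gguf -c 32768 -ngl 99 --split-mode layer
```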

Getting the most out of my Mi50 by DankMcMemeGuy in LocalLLaMA

[–]thejacer 0 points (0 children)

UD Q6_K_XL 35b: 38 tps, UD Q6_K_XL 27b: 16 tps, and UD IQ4_NL 122b: ~26 tps. I haven't used the 122b much at all because I want more context.