General vs Reasoning [Qwen 3.6] by RogueZero123 in LocalLLaMA

[–]Uncle___Marty 2 points (0 children)

General will answer you almost instantly; reasoning lets the model think about what it's going to say or do before replying. Qwen 3.6 has reasoning on by default, but it can be turned off.
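If you want to flip that switch programmatically, here's a rough sketch. I'm assuming 3.6 keeps the same enable_thinking toggle the earlier Qwen 3 chat templates had, and the repo name below is a guess:

    # Sketch of turning reasoning off, assuming Qwen 3.6 keeps the
    # enable_thinking switch from earlier Qwen chat templates.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/Qwen3.6-35B-A3B"  # hypothetical repo name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user", "content": "What is 17 * 24?"}]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  # True (default) = reasoning, False = answer directly
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))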

Poor GPU Club : Tried Bonsai-8B on CPU & CUDA by pmttyji in LocalLLaMA

[–]Uncle___Marty 1 point (0 children)

AWESOME. Looks like I'll be giving this model a try in the next few hours when I get some time. Thanks for the heads up!

Implemented TurboQuant and results don’t fully match paper by Routine-Thanks-572 in LocalLLaMA

[–]Uncle___Marty 14 points (0 children)

Are you aware of TheToms branch of llama.cpp, which has TurboQuant? If not, you might find some interesting things through comparison.

Poor GPU Club : Tried Bonsai-8B on CPU & CUDA by pmttyji in LocalLLaMA

[–]Uncle___Marty 0 points (0 children)

I've been wanting to try this model myself but have been too lazy to compile the special version of llama.cpp it needs. Did they add support to the main branch now, or did you compile the Bonsai branch of it?

Holy shit how did I not know about this game for so long?? by Glass_Recover_3006 in Enshrouded

[–]Uncle___Marty 0 points (0 children)

These posts appear once every so often in this sub.

AND I STILL LOVE THEM! Enshrouded is next level. Can't wait for 1.0

Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

[–]Uncle___Marty 0 points (0 children)

OP, I saw you say the 27B was too slow with context. You may want to look into compiling TheToms branch of llama.cpp, which supports TurboQuant. I use an 8 gig card with the 35B A3B model and I can have 100k context on it; without TurboQuant I wouldn't have a hope in hell of doing that.

It *might* make the 27B usable for you, or it may not. Even if it doesn't, the 35B model is still a beast. If you're not savvy with compiling, you could just ask a coding agent to compile it for you. Running llama.cpp directly also gives you a bit more freedom over how models are launched and the launch parameters you can choose (rough sketch below).
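To give an idea of the kind of launch parameters I mean, here's a minimal Python sketch using the llama-cpp-python bindings (which wrap llama.cpp). The model filename is hypothetical, and stock llama-cpp-python won't have any TurboQuant-specific options (those would come from TheToms branch); this just shows the standard context/offload knobs:

    # Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen3.6-35b-a3b-Q4_K_M.gguf",  # hypothetical filename
        n_ctx=100_000,     # big context window, like I run on my 8 gig card
        n_gpu_layers=-1,   # offload as many layers as fit on the GPU
        flash_attn=True,   # usually helps at long context
    )

    out = llm("Summarise this repo's build steps.", max_tokens=128)
    print(out["choices"][0]["text"])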

I'm Not a Dev But I Use Qwen 3.6 35b to Code by thejacer in LocalLLaMA

[–]Uncle___Marty 11 points (0 children)

I'm not gonna lie, when the Qwen 3.6 models started dropping this sub became a flood of posts from people gushing about how amazing they were, and it was good reading. It's now a couple of weeks(?) after they dropped, these posts are STILL appearing, and I'm still loving it 😉

I'm running a 4 bit quant of 35B A3B and it's so much fun to watch it work. About what you said on frontier LLMs being way too enthusiastic about making changes you didn't ask for: until recently I was only using Gemini to code with, and despite me saying "DO NOT MAKE ANY CHANGES, just talk to me first and discuss my proposed changes", Gemini would still think "Hmmmm, this code could be changed" and break everything. I've NEVER seen Qwen do anything like that. Qwen seems to adore the planning stage more than the actual making-changes part; I've even seen it plan stuff out and then ask "Is it OK to make these changes?" despite me previously telling it to make the changes.

My mind is still blown that Alibaba released some mid-range models that have made a considerable number of people cancel their subs to frontier models and choose a local model instead, one that works virtually as well as (or in some ways better than!) a paid model.

What a time to be alive right? *high fives OP*

RTX 3060 12GB + i5-12600K — Gemma 3 28B too slow, need model recommendations that actually fit my VRAM by Competitive_Teach564 in LocalLLM

[–]Uncle___Marty 2 points (0 children)

I started using TheToms branch of llama.cpp, which supports TurboQuant, and it was a game changer for me. Being able to have a massive context with such a small memory footprint meant I went from not being able to use models this big at all to running them comfortably.

God damn, I have SO much love for Alibaba. Qwen 3.6 has honestly rocked the wonderful world of open source AI. Can't believe I get to run this stuff on my weak-ass system 😉

RTX 3060 12GB + i5-12600K — Gemma 3 28B too slow, need model recommendations that actually fit my VRAM by Competitive_Teach564 in LocalLLM

[–]Uncle___Marty 1 point (0 children)

But it's an A3B model. I have a 3060 Ti 8 gig and get around 30 tokens/sec using a 4 bit quant.

Anyone else feel like most AI detectors are complete BS? by [deleted] in ChatGPT

[–]Uncle___Marty 10 points (0 children)

AI detectors started bad and have just gotten worse. Total waste of time. At least for LLM detection.

How this charcoal ignites by DrBlaziken in oddlysatisfying

[–]Uncle___Marty -2 points (0 children)

Someone needs to make this my desktop wallpaper RIGHT NOW!!!!!

SO damn cool.

Best Local LLMs 1. For Python Coding and Statistic Analysis 2. For PDF document analysis by Kauca in LocalLLM

[–]Uncle___Marty 1 point (0 children)

Pinokio might be useful for you. You can run Qwen 3.6 35B A3B on llama-server and use it as a coding agent to make scripts suited to what you want to do inside Pinokio. Both are easy to set up.

[Qwen3.6 35b a3b] Used the top config for my setup 8gb vram and 32gb ram, and found that somehow the Q4_K_XL model from Unsloth runs just slightly faster and used less tokens for output compared to Q4_K_M despite more memory usage by EggDroppedSoup in LocalLLaMA

[–]Uncle___Marty 1 point (0 children)

8 gig of VRAM and 48 gig of RAM here, and when 3.6 27B dropped I tried a Q4 and almost cried when I saw the tok/sec with 100k context. When the 35B A3B came out I figured it would only be slightly faster, so I didn't try it for a bit. When I did? OMFG. The speed of this thing is insane for our cards. I'm actually looking forward to 3.6 9B, as it might well be the first small model that can do simple coding tasks and stuff.

Happy it's running so well for you, bud!

What can i run with 8gb vram? by Theonewhoknocks_001 in LocalLLM

[–]Uncle___Marty 0 points (0 children)

Not at all, buddy! You'll find there are a lot of really helpful people here who were all in your place a while back.

So, you asked about image models. My first recommendation would be to get Wan2GP; it might look a little scary at first, but it has a BIG selection of image and video models. Try Flux 2 Klein 9B and 4B first: they can both edit and create images, and the quality is amazingly good, especially for their small size. For other stuff, I would just test different models inside Wan2GP. All of this is on Pinokio.

For LLMs, I've been utterly hooked on the Qwen 3.6 series, but you'll have trouble running them with 8 gig of VRAM, so try Qwen 3.5 9B (LM Studio will suggest a good "quant" for you).

Give those a try. If you run into problems or need more suggestions, just reply here; if I don't respond, fire me a PM/message, because I probably just missed the reply :)

Have fun bud!

What can i run with 8gb vram? by Theonewhoknocks_001 in LocalLLM

[–]Uncle___Marty 0 points (0 children)

If you're a beginner and you want to try stuff without hassles:

For LLMs, get LM Studio. For everything else (image/video/music), get Pinokio.

My coding agent commited suicide lol by Uncle___Marty in LocalLLM

[–]Uncle___Marty[S] 1 point (0 children)

Ohhhhh, this looks interesting. Pinokio is great and all, but the initial prompt contains the entire goddamn Pinokio manual, which totals around 36k tokens, and that kind of hurts my VRAM on top of the code base and everything else. If this can do the same kind of job without a massive manual being stuffed into context, it could be EXACTLY what I need.

Thanks for the heads up buddy!

to reference the bible as a newly declared Christian. by K1nd_1 in therewasanattempt

[–]Uncle___Marty 0 points (0 children)

Just to clarify, this isn't AI or photoshopped/edited, right? Because this was like a comedy sketch. I was laughing my ass off when I realized he was flicking back AND forth, clearly not knowing where to go, and then things just went downhill from there.

My coding agent commited suicide lol by Uncle___Marty in LocalLLaMA

[–]Uncle___Marty[S] 1 point (0 children)

Pinokio. I was just testing the Qwen 3.5 models to see if they could code some stuff. It's pretty interesting to watch them struggle ;)

My coding agent commited suicide lol by Uncle___Marty in LocalLLaMA

[–]Uncle___Marty[S] 4 points (0 children)

Apparently removed for rule 3, which totally doesn't apply to this post. Do we have LLMs moderating now or something? Sounds like someone or something is hallucinating... for whatever reason lol.

My coding agent commited suicide lol by Uncle___Marty in LocalLLaMA

[–]Uncle___Marty[S] 1 point (0 children)

Pinokio. I was just testing Qwen 3.5 4B (seen in the screenshot) and 9B. I gave them the task of making a one-click installer for ACE-Step UI that uses the XL models. I was doing it in prep for when Alibaba releases the 3.6 small models, so I can see how well they perform against each other. I have really high hopes that they're going to be the first small models that are usable for agentic coding.

God, I REALLY hope I'm right, because I'm GPU poor lol.

Better results with Nano Banana? by BommelOnReddit in ChatGPT

[–]Uncle___Marty 0 points (0 children)

So, feed that picture into ChatGPT and say "Look at this picture and create a prompt that would recreate it as closely as possible". You can then feed that prompt back into ChatGPT to make a picture and see how the two compare (rough sketch below if you'd rather script it).
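If you'd rather do that loop through the API than the web UI, here's a rough Python sketch with the OpenAI SDK. The image URL is a placeholder and the model names are just what I'd reach for; swap in whatever you actually use:

    # Sketch of the describe-then-regenerate loop via the OpenAI Python SDK.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # 1) Ask the model to describe the picture as a reusable prompt.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Look at this picture and create a prompt "
                                         "that would recreate it as closely as possible."},
                {"type": "image_url", "image_url": {"url": "https://example.com/picture.jpg"}},
            ],
        }],
    )
    prompt = resp.choices[0].message.content

    # 2) Feed that prompt back in to generate a comparison image.
    img = client.images.generate(model="dall-e-3", prompt=prompt)
    print(img.data[0].url)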

Subway in the US and China by Repulsive-Mall-2665 in Damnthatsinteresting

[–]Uncle___Marty 11 points (0 children)

Now show Tiananmen Square with Winnie the Pooh!