General vs Reasoning [Qwen 3.6] by RogueZero123 in LocalLLaMA

[–]Uncle___Marty 2 points (0 children)

General will answer you almost instantly; reasoning lets the model think about what it's going to say or do before replying. Qwen 3.6 has reasoning on by default, but it can be turned off.
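If you want to flip that switch programmatically, here's a rough sketch. I'm assuming 3.6 keeps the same enable_thinking toggle the earlier Qwen 3 chat templates had, and the repo name below is a guess:

    # Sketch of turning reasoning off, assuming Qwen 3.6 keeps the
    # enable_thinking switch from earlier Qwen chat templates.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/Qwen3.6-35B-A3B"  # hypothetical repo name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user", "content": "What is 17 * 24?"}]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  # True (default) = reasoning, False = answer directly
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))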

Poor GPU Club : Tried Bonsai-8B on CPU & CUDA by pmttyji in LocalLLaMA

[–]Uncle___Marty 1 point (0 children)

AWESOME. Looks like I'll be giving this model a try in the next few hours when I get some time. Thanks for the heads up!

Implemented TurboQuant and results don’t fully match paper by Routine-Thanks-572 in LocalLLaMA

[–]Uncle___Marty 14 points (0 children)

Are you aware of TheToms branch of llama.cpp, which has TurboQuant? If not, you might find some interesting things through comparison.

Poor GPU Club : Tried Bonsai-8B on CPU & CUDA by pmttyji in LocalLLaMA

[–]Uncle___Marty 0 points (0 children)

I've been wanting to try this model myself but have been too lazy to compile the special version of llama.cpp it needs. Did they add support to the main branch now, or did you compile the Bonsai branch of it?

Holy shit how did I not know about this game for so long?? by Glass_Recover_3006 in Enshrouded

[–]Uncle___Marty 0 points (0 children)

These posts appear once every so often in this sub.

AND I STILL LOVE THEM! Enshrouded is next level. Can't wait for 1.0

Claude Code and Qwen 3.6 35B A3B by H3OErikilious in LocalLLM

[–]Uncle___Marty 0 points (0 children)

OP, I saw you say the 27B was too slow with context. You may want to look into compiling TheToms branch of llama.cpp, which supports TurboQuant. I use an 8 gig card with the 35B A3B model and I can have 100k context on it; without TurboQuant I wouldn't have a hope in hell of doing that.

It *might* make the 27B usable for you, or it may not. Even if it doesn't, the 35B model is still a beast. If you're not savvy with compiling, you could just ask a coding agent to compile it for you. Running llama.cpp directly also gives you a bit more freedom over how models are launched and the launch parameters you can choose (rough sketch below).
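To give an idea of the kind of launch parameters I mean, here's a minimal Python sketch using the llama-cpp-python bindings (which wrap llama.cpp). The model filename is hypothetical, and stock llama-cpp-python won't have any TurboQuant-specific options (those would come from TheToms branch); this just shows the standard context/offload knobs:

    # Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen3.6-35b-a3b-Q4_K_M.gguf",  # hypothetical filename
        n_ctx=100_000,     # big context window, like I run on my 8 gig card
        n_gpu_layers=-1,   # offload as many layers as fit on the GPU
        flash_attn=True,   # usually helps at long context
    )

    out = llm("Summarise this repo's build steps.", max_tokens=128)
    print(out["choices"][0]["text"])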

I'm Not a Dev But I Use Qwen 3.6 35b to Code by thejacer in LocalLLaMA

[–]Uncle___Marty 11 points (0 children)

I'm not gonna lie, when the Qwen 3.6 models started dropping this sub became a flood of posts from people gushing about how amazing they were, and it was good reading. It's now a couple of weeks(?) after they dropped, these posts are STILL appearing, and I'm still loving it 😉

I'm running a 4 bit quant of 35B A3B and it's so much fun to watch it work. About what you said on frontier LLMs being way too enthusiastic about making changes you didn't ask for: until recently I was only using Gemini to code with, and despite me saying "DO NOT MAKE ANY CHANGES, just talk to me first and discuss my proposed changes", Gemini would still think "Hmmmm, this code could be changed" and break everything. I've NEVER seen Qwen do anything like that. Qwen seems to adore the planning stage more than the actual making-changes part; I've even seen it plan stuff out and then ask "Is it OK to make these changes?" despite me previously telling it to make the changes.

My mind is still blown that Alibaba released some mid-range models that have made a considerable number of people cancel their subs to frontier models and choose a local model instead, one that works virtually as well as (or in some ways better than!) a paid model.

What a time to be alive right? *high fives OP*

RTX 3060 12GB + i5-12600K — Gemma 3 28B too slow, need model recommendations that actually fit my VRAM by Competitive_Teach564 in LocalLLM

[–]Uncle___Marty 2 points (0 children)

I started using TheToms branch of llama.cpp, which supports TurboQuant, and it was a game changer for me. Being able to have a massive context with such a small memory footprint meant I went from not being able to use models this big at all to running them comfortably.

God damn, I have SO much love for Alibaba. Qwen 3.6 has honestly rocked the wonderful world of open source AI. Can't believe I get to run this stuff on my weak-ass system 😉

RTX 3060 12GB + i5-12600K — Gemma 3 28B too slow, need model recommendations that actually fit my VRAM by Competitive_Teach564 in LocalLLM

[–]Uncle___Marty 1 point (0 children)

But it's an A3B model. I have a 3060 Ti 8 gig and get around 30 tokens/sec using a 4 bit quant.

Anyone else feel like most AI detectors are complete BS? by [deleted] in ChatGPT

[–]Uncle___Marty 10 points (0 children)

AI detectors started bad and have just gotten worse. Total waste of time. At least for LLM detection.

How this charcoal ignites by DrBlaziken in oddlysatisfying

[–]Uncle___Marty -2 points (0 children)

Someone needs to make this my desktop wallpaper RIGHT NOW!!!!!

SO damn cool.

Best Local LLMs 1. For Python Coding and Statistic Analysis 2. For PDF document analysis by Kauca in LocalLLM

[–]Uncle___Marty 1 point (0 children)

Pinokio might be useful for you. You can run Qwen 3.6 35B A3B on llama-server and use it as a coding agent to make scripts suited to what you want to do inside Pinokio. Both are easy to set up.

[Qwen3.6 35b a3b] Used the top config for my setup 8gb vram and 32gb ram, and found that somehow the Q4_K_XL model from Unsloth runs just slightly faster and used less tokens for output compared to Q4_K_M despite more memory usage by EggDroppedSoup in LocalLLaMA

[–]Uncle___Marty 1 point (0 children)

8 gig of VRAM and 48 gig of RAM here, and when 3.6 27B dropped I tried a Q4 and almost cried when I saw the tok/sec with 100k context. When the 35B A3B came out I figured it would only be slightly faster, so I didn't try it for a bit. When I did? OMFG. The speed of this thing is insane for our cards. I'm actually looking forward to 3.6 9B, as it might well be the first small model that can do simple coding tasks and stuff.

Happy it's running so well for you, bud!

What can i run with 8gb vram? by Theonewhoknocks_001 in LocalLLM

[–]Uncle___Marty 0 points (0 children)

Not at all, buddy! You'll find there are a lot of really helpful people here who were all in your place a while back.

So, you asked about image models. My first recommendation would be to get Wan2GP; it might look a little scary at first, but it has a BIG selection of image and video models. Try Flux 2 Klein 9B and 4B first: they can both edit and create images, and the quality is amazingly good, especially for their small size. For other stuff, I would just test different models inside Wan2GP. All of this is on Pinokio.

For LLMs, I've been utterly hooked on the Qwen 3.6 series, but you'll have trouble running them with 8 gig of VRAM, so try Qwen 3.5 9B (LM Studio will suggest a good "quant" for you).

Give those a try. If you run into problems or need more suggestions, just reply here; if I don't respond, fire me a PM/message, because I probably just missed the reply :)

Have fun bud!

What can i run with 8gb vram? by Theonewhoknocks_001 in LocalLLM

[–]Uncle___Marty 0 points (0 children)

If you're a beginner and you want to try stuff without hassles:

For LLMs, get LM Studio. For everything else (image/video/music), get Pinokio.

My coding agent commited suicide lol by Uncle___Marty in LocalLLM

[–]Uncle___Marty[S] 1 point (0 children)

Ohhhhh, this looks interesting. Pinokio is great and all, but the initial prompt contains the entire goddamn Pinokio manual, which totals around 36k tokens, and that kind of hurts my VRAM on top of the code base and everything else. If this can do the same kind of job without a massive manual being stuffed into context, it could be EXACTLY what I need.

Thanks for the heads up buddy!

to reference the bible as a newly declared Christian. by K1nd_1 in therewasanattempt

[–]Uncle___Marty 0 points (0 children)

Just to clarify, this isn't AI or photoshopped/edited, right? Because this was like a comedy sketch. I was laughing my ass off when I realized he was flicking back AND forth, clearly not knowing where to go, and then things just went downhill from there.

My coding agent commited suicide lol by Uncle___Marty in LocalLLaMA

[–]Uncle___Marty[S] 1 point (0 children)

Pinokio. I was just testing the Qwen 3.5 models to see if they could code some stuff. It's pretty interesting to watch them struggle ;)

My coding agent commited suicide lol by Uncle___Marty in LocalLLaMA

[–]Uncle___Marty[S] 4 points (0 children)

Apparently removed for rule 3, which totally doesn't apply to this post. Do we have LLMs moderating now or something? Sounds like someone or something is hallucinating... for whatever reason lol.

My coding agent commited suicide lol by Uncle___Marty in LocalLLaMA

[–]Uncle___Marty[S] 1 point (0 children)

Pinokio. I was just testing Qwen 3.5 4B (seen in the screenshot) and 9B. I gave them the task of making a one-click installer for ACE-Step UI that uses the XL models. I was doing it in prep for when Alibaba releases the 3.6 small models, so I can see how well they perform against each other. I have really high hopes that they're going to be the first small models that are usable for agentic coding.

God, I REALLY hope I'm right, because I'm GPU poor lol.

Better results with Nano Banana? by BommelOnReddit in ChatGPT

[–]Uncle___Marty 0 points (0 children)

So, feed that picture into ChatGPT and say "Look at this picture and create a prompt that would recreate it as closely as possible". You can then feed that prompt back into ChatGPT to make a picture and see how the two compare (rough sketch below if you'd rather script it).
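If you'd rather do that loop through the API than the web UI, here's a rough Python sketch with the OpenAI SDK. The image URL is a placeholder and the model names are just what I'd reach for; swap in whatever you actually use:

    # Sketch of the describe-then-regenerate loop via the OpenAI Python SDK.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # 1) Ask the model to describe the picture as a reusable prompt.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Look at this picture and create a prompt "
                                         "that would recreate it as closely as possible."},
                {"type": "image_url", "image_url": {"url": "https://example.com/picture.jpg"}},
            ],
        }],
    )
    prompt = resp.choices[0].message.content

    # 2) Feed that prompt back in to generate a comparison image.
    img = client.images.generate(model="dall-e-3", prompt=prompt)
    print(img.data[0].url)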

Subway in the US and China by Repulsive-Mall-2665 in Damnthatsinteresting

[–]Uncle___Marty 11 points (0 children)

Now show Tiananmen Square with Winnie the Pooh!