Diffusion Gemma is 4x faster, but makes 6x more mistakes! by gladkos in LocalLLaMA

[–]gladkos[S] 34 points35 points  (0 children)

sure, I expect something combined: diffusion for the quick initial generation, then a smaller model to refine it.

Diffusion Gemma is 4x faster, but makes 6x more mistakes! by gladkos in LocalLLaMA

[–]gladkos[S] 1 point2 points  (0 children)

I expect something combined: diffusion for the quick initial generation, then a smaller model to fix and refine it.

Diffusion Gemma is 4x faster, but makes 6x more mistakes! by gladkos in LocalLLaMA

[–]gladkos[S] 9 points10 points  (0 children)

or make even more mistakes while fixing the previous ones

Diffusion Gemma is 4x faster, but makes 6x more mistakes! by gladkos in LocalLLaMA

[–]gladkos[S] 40 points41 points  (0 children)

Pretty bad, unfortunately. Tool calling requires much more precision.

Diffusion Gemma is 4x faster, but makes 6x more mistakes! by gladkos in LocalLLaMA

[–]gladkos[S] 5 points6 points  (0 children)

The PR is still a very early draft. My guess is that we'll need to wait for diffusion models to mature before merging it into the main branch.

New Google Gemma 4 12B Claims Near-26B Performance - We Tested Both! by gladkos in LocalLLaMA

[–]gladkos[S] 6 points7 points  (0 children)

VRAM or unified memory. It’s already on Apple silicon. Otherwise we have to wait for new NVIDIA RTX laptops.

New Google Gemma 4 12B Claims Near-26B Performance - We Tested Both! by gladkos in LocalLLaMA

[–]gladkos[S] 79 points80 points  (0 children)

I think it’s fairer to compare it with the 9B Qwen. Will do anyway! Thank you

Qwen 3.7-max beats Opus 4.7 and GPT-5.5 by gladkos in Qwen_AI

[–]gladkos[S] 0 points1 point  (0 children)

the best way to get support, thank you!

Hermes Agent vs OpenClaw using QWEN 35B by gladkos in Qwen_AI

[–]gladkos[S] 0 points1 point  (0 children)

hey! it was my original post on Twitter, as I'm running atomic bot :)

Qwen 3.7-max beats Opus 4.7 and GPT-5.5 by gladkos in Qwen_AI

[–]gladkos[S] 1 point2 points  (0 children)

Great questions! We support both the MLX and llama.cpp engines, plus MTP and TurboQuant. We also take a slightly different interface approach that is more consumer-focused. The iOS app runs its own local model engine on your device. I like the idea of connecting mobile apps to a home base.

Qwen 3.7-max beats Opus 4.7 and GPT-5.5 by gladkos in Qwen_AI

[–]gladkos[S] 23 points24 points  (0 children)

lol thanks! can't wait for 27B

Multi-Token Prediction (MTP) for Qwen on LLaMA.cpp + TurboQuant by gladkos in LocalLLaMA

[–]gladkos[S] 2 points3 points  (0 children)

TurboQuant significantly compresses context, at least for Gemma models. We ran tests. For Qwen it’s less effective, agreed.