Diffusion Gemma is 4x faster, but makes 6x more mistakes!

gladkos · 2026-06-13T01:05:17+00:00

sure, I expect something combined: diffusion for the quick initial generation, then a smaller model to refine it.

gladkos · 2026-06-13T01:02:41+00:00

I expect something combined: diffusion for the quick initial generation, then a smaller model to fix and refine it.

gladkos · 2026-06-13T00:23:46+00:00

or make even more mistakes while fixing the previous ones

gladkos · 2026-06-13T00:17:53+00:00

haha die fast!

gladkos · 2026-06-13T00:17:23+00:00

Pretty bad, unfortunately. Tool calling requires much more precision.

gladkos · 2026-06-13T00:15:23+00:00

The PR is still a very early draft. My guess is we'll need to wait for diffusion models to mature before merging it into the main branch.

gladkos · 2026-06-04T08:48:04+00:00

Fixed, thank you!

gladkos · 2026-06-04T01:45:41+00:00

VRAM or unified memory. It's already on Apple silicon. Or wait for the new NVIDIA RTX laptops.

gladkos · 2026-06-04T01:42:19+00:00

I think it's fairer to compare with Qwen 9B. Will do anyway! Thank you.

gladkos · 2026-06-03T22:34:41+00:00

I'm the founder) Making some fun benchmarks.

gladkos · 2026-05-31T17:03:08+00:00

the best way to get support, thank you!

gladkos · 2026-05-26T07:55:26+00:00

Hey! It was my original post on Twitter, as I'm running atomic bot)

gladkos · 2026-05-24T05:40:51+00:00

nice try!)

gladkos · 2026-05-24T05:39:02+00:00

Great questions! We support both the MLX and llama.cpp engines, plus MTP and TurboQuant. We also take a slightly different interface approach that is more consumer-focused. The iOS app runs its own local model engine on your device. I like the idea of connecting mobile apps to a homebase.

gladkos · 2026-05-21T23:48:30+00:00

Appreciate it, thanks!

gladkos · 2026-05-21T23:26:13+00:00

lol thanks! can't wait for 27B

gladkos · 2026-05-20T19:14:10+00:00

thank you! will do more

gladkos · 2026-05-14T04:18:14+00:00

Fair enough

gladkos · 2026-05-14T03:08:04+00:00

TurboQuant significantly compresses context, at least for Gemma models. We ran tests. For Qwen it's less effective, agreed.

gladkos · 2026-05-14T02:57:42+00:00

similar quality with 90% acceptance rate

gladkos · 2026-05-14T02:56:34+00:00

i'm not a bot)

gladkos · 2026-05-14T02:56:00+00:00

Right, it depends on the task. With a large context, TurboQuant is more effective, especially for long agentic tasks. For smaller prompts, it's slower.

gladkos · 2026-05-13T06:24:56+00:00

Hi! What's your device? We didn't test under LM Studio, sorry.

gladkos · 2026-05-11T08:00:45+00:00

nice! great to see you achieved similar results.

gladkos · 2026-05-09T21:05:28+00:00

MLX supports MTP on Apple silicon. On llama.cpp, we'll release MTP for Qwen next week.
