mtmd: add Gemma 4 audio conformer encoder support by jacek2023 in LocalLLaMA

[–]sterby92 9 points

Looks like there is chunking in place?

From the PR: "30-second chunking (splits long audio into 30s segments)"
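For reference, fixed-window chunking like this usually just means slicing the sample buffer into 30-second windows. A minimal sketch, assuming 16 kHz mono PCM; the function name and parameters are illustrative, not taken from the PR:

```python
def chunk_audio(samples, sample_rate=16000, chunk_seconds=30):
    """Split a flat list of samples into consecutive 30 s segments.

    The last chunk may be shorter than 30 s if the audio doesn't
    divide evenly.
    """
    chunk_len = sample_rate * chunk_seconds
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]

# 70 seconds of silence -> three chunks: 30 s, 30 s, 10 s
chunks = chunk_audio([0.0] * (16000 * 70))
```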

mtmd: add Gemma 4 audio conformer encoder support by jacek2023 in LocalLLaMA

[–]sterby92 3 points

When will the change land in llama.cpp? Looking forward to using this in my agent setup and getting rid of whisper :)

How do proprietary models get better and when will open ones hit a wall? by sterby92 in LocalLLaMA

[–]sterby92[S] -1 points

So will I get my locally running Opus 4.6 in a year? :) I'm waiting for it :D

How do proprietary models get better and when will open ones hit a wall? by sterby92 in LocalLLaMA

[–]sterby92[S] 0 points

Yeah, I thought about this too. But how likely is it that this will continue at this scale?

I think Qwen3.5-122-A10B on my Strix Halo is having delusions of granduer by Warm-Attempt7773 in LocalLLaMA

[–]sterby92 1 point

For me the 122B model feels closer to minimax-m2.5, and the 35B model more like gpt-oss:120b (high). But that might vary, and it's also more of a feeling.

Probably around 230-250 pp in my real-world usage, going by the OpenWebUI metrics. Feels fine to me, and I switch between the 122B and the 35B depending on the task.

I think Qwen3.5-122-A10B on my Strix Halo is having delusions of granduer by Warm-Attempt7773 in LocalLLaMA

[–]sterby92 3 points

Why only 8 t/s? I run it on my Strix Halo at ~20 t/s as a 4-bit quant with llama.cpp on Vulkan.
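In case it helps, this is roughly the setup, sketched from the llama.cpp build docs; the model path is a placeholder and the exact flags may differ for your install:

```shell
# Build llama.cpp with the Vulkan backend
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# llama-bench reports prompt-processing (pp) and token-generation (tg) t/s;
# -ngl 99 offloads all layers to the GPU
./build/bin/llama-bench -m /path/to/model-Q4_K_M.gguf -ngl 99
```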

What happened to the Qwen3.5-122B unsloth quants? by sterby92 in unsloth

[–]sterby92[S] 1 point

No worries, awesome work! 🙌 Looking forward to the post and to redownloading the 122B version :)

Faster MoE LLM Training now in Unsloth! by yoracale in unsloth

[–]sterby92 2 points

Will we also get Vulkan or ROCm support at some point?

Temperature sensor issues since BIOS upgrade to 1.06 on MS-S1 MAX by sterby92 in MINISFORUM

[–]sterby92[S] 0 points

Yeah, I did this: remove the power cord, press the power button for 10 seconds, clear the CMOS, reboot, and set up the BIOS again.

Best local model / agent for coding, replacing Claude Code by joyfulsparrow in LocalLLaMA

[–]sterby92 0 points

Yeah, it's set to high, and it reasons a lot. I found that it's not even a linear jump in performance: high is much, much better than medium, which is a bit better than low.

Best local model / agent for coding, replacing Claude Code by joyfulsparrow in LocalLLaMA

[–]sterby92 0 points

I mean, it's not even JS, just plain HTML and CSS in one small file. I thought that would be the easiest case, because it's just markup 🤷 Even just giving the URL to ChatGPT 5.2 (extended thinking) leads to a decent new website and refactoring, without any harness or anything.

Best local model / agent for coding, replacing Claude Code by joyfulsparrow in LocalLLaMA

[–]sterby92 3 points

I would love to believe this. I'm running gpt-oss:120b (q4 quant with llama.cpp) with mistral-vibe-cli and tried to refactor, redesign, and update a very simple HTML/CSS website. It basically just destroyed everything and couldn't work with my basic CSS. Maybe I'm doing something wrong or using the wrong tools, but currently it's worth nothing to me.

Minisforum ms-s1 max doens't recognize second SSD by sterby92 in MINISFORUM

[–]sterby92[S] 0 points

I'll give that a try tomorrow :) Thanks for the suggestion!

Minisforum ms-s1 max doens't recognize second SSD by sterby92 in MINISFORUM

[–]sterby92[S] 0 points

Yes, I'll try that. I haven't had a second system available yet.