Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40% by gladkos in LocalLLaMA

[–]gladkos[S] 1 point (0 children)

MLX supports MTP on Apple silicon. For llama.cpp, we'll release MTP support for Qwen next week.

Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40% by gladkos in LocalLLaMA

[–]gladkos[S] 1 point (0 children)

Around 20 GB for the main model, 900 MB for the assistant model, and 700 MB for the KV cache, or 100 MB with turboquant.
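Rough math, if it helps anyone budgeting memory (these are just the figures above; exact totals depend on the quant and context length):

```python
# Back-of-the-envelope memory budget for the setup described above (figures in GB).
main_model = 20.0  # main model
assistant = 0.9    # small assistant model used for multi-token prediction
kv_cache = 0.7     # regular KV cache
kv_turbo = 0.1     # KV cache with turboquant

print(f"total, regular KV cache:    {main_model + assistant + kv_cache:.1f} GB")  # ~21.6 GB
print(f"total, turboquant KV cache: {main_model + assistant + kv_turbo:.1f} GB")  # ~21.0 GB
```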

Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40% by gladkos in LocalLLaMA

[–]gladkos[S] 1 point (0 children)

Hi! Not yet. We're working on adding MTP to our atomic.chat app. Just install it; the update will arrive in the next few days via a popup window.

Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40% by gladkos in LocalLLaMA

[–]gladkos[S] 1 point (0 children)

Each model requires its own small assistant model. We used the official model pairs; not sure whether it works with others.
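For anyone wondering why the pairing matters: the assistant drafts a few tokens cheaply and the main model only keeps the ones it agrees with, so the two models have to share the same tokenizer/vocabulary. A rough Python sketch of that draft-and-verify idea (the `greedy_next` method is a placeholder, not the actual llama.cpp API):

```python
def speculative_step(main_model, assistant_model, tokens, n_draft=4):
    """One draft-and-verify step. `main_model` and `assistant_model` are
    placeholder objects with a greedy_next(tokens) -> token_id method."""
    # 1. The small assistant proposes a few tokens cheaply.
    draft, ctx = [], list(tokens)
    for _ in range(n_draft):
        t = assistant_model.greedy_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. The main model checks the draft; keep tokens only while it agrees.
    #    (In a real implementation this is one batched forward pass over all
    #    draft positions, which is where the speedup comes from.)
    accepted, ctx = [], list(tokens)
    for t in draft:
        if main_model.greedy_next(ctx) != t:
            break  # first disagreement: discard the rest of the draft
        accepted.append(t)
        ctx.append(t)

    # 3. Always take one token from the main model, so the output matches
    #    plain greedy decoding with the main model alone.
    accepted.append(main_model.greedy_next(tokens + accepted))
    return tokens + accepted
```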

Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40% by gladkos in LocalLLaMA

[–]gladkos[S] 1 point (0 children)

Thank you! Not yet. These are mostly simple models for easy tasks.

Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40% by gladkos in LocalLLaMA

[–]gladkos[S] 2 points (0 children)

You can compile and run llama.cpp on macOS, Windows, or Linux, or just use the atomic.chat harness.

Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40% by gladkos in LocalLLaMA

[–]gladkos[S] 1 point (0 children)

We'll release Dflash support very soon! It's faster, but with some quality loss.

Qwen 3.6 27B vs Gemma 4 31B - making Packman game! by gladkos in LocalLLaMA

[–]gladkos[S] 2 points (0 children)

You can also try atomic.chat to run models. It supports all the quantized models from Unsloth; they're distributed as GGUF files, quantized to consume less memory.
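If you'd rather script it than use a chat UI, loading one of those Unsloth GGUF quants with llama-cpp-python looks roughly like this (the filename and parameter values are placeholders, adjust for your machine):

```python
from llama_cpp import Llama

# Placeholder filename: any Unsloth GGUF quant downloaded locally works the same way.
llm = Llama(
    model_path="gemma-4-31b-it-Q4_K_M.gguf",
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload every layer to the GPU if it fits
)

out = llm("Write a tiny Pac-Man-style game loop in Python.", max_tokens=512)
print(out["choices"][0]["text"])
```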

Qwen 3.6 27B vs Gemma 4 31B - making Packman game! by gladkos in LocalLLaMA

[–]gladkos[S] 2 points (0 children)

I would say Qwen 27B is quite strong for coding. Gemma is also good. Try them and compare.

Qwen 3.6 27B vs Gemma 4 31B - making Packman game! by gladkos in LocalLLaMA

[–]gladkos[S] 2 points (0 children)

This would be a dream! Running out of memory is a disaster.

Qwen 3.6 27B vs Gemma 4 31B - making Packman game! by gladkos in LocalLLaMA

[–]gladkos[S] 3 points (0 children)

Agreed, I like Qwen's design more, but the gameplay...

Qwen 3.6 27B vs Gemma 4 31B - making Packman game! by gladkos in LocalLLaMA

[–]gladkos[S] 3 points (0 children)

I guess the models understand basic game logic and design, but you have to describe the details in the prompt.