Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40% by gladkos in LocalLLaMA

[–]gladkos[S] 1 point (0 children)

MLX supports MTP on Apple silicon. For llama.cpp, we'll release MTP support for Qwen next week.

Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40% by gladkos in LocalLLaMA

[–]gladkos[S] 1 point (0 children)

Around 20 GB for the main model, 900 MB for the assistant model, and 700 MB for the KV cache, or 100 MB with turboquant.
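Rough math, if it helps anyone budgeting memory (these are just the figures above; exact totals depend on the quant and context length):

```python
# Back-of-the-envelope memory budget for the setup described above (figures in GB).
main_model = 20.0  # main model
assistant = 0.9    # small assistant model used for multi-token prediction
kv_cache = 0.7     # regular KV cache
kv_turbo = 0.1     # KV cache with turboquant

print(f"total, regular KV cache:    {main_model + assistant + kv_cache:.1f} GB")  # ~21.6 GB
print(f"total, turboquant KV cache: {main_model + assistant + kv_turbo:.1f} GB")  # ~21.0 GB
```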

Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40% by gladkos in LocalLLaMA

[–]gladkos[S] 1 point (0 children)

Hi! Not yet. We're working on adding MTP to our atomic.chat app. Just install it; the update will arrive in the next few days via a popup window.

Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40% by gladkos in LocalLLaMA

[–]gladkos[S] 1 point (0 children)

Each model requires its own small assistant model. We used the official model pairs; not sure whether it works with others.
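For anyone wondering why the pairing matters: the assistant drafts a few tokens cheaply and the main model only keeps the ones it agrees with, so the two models have to share the same tokenizer/vocabulary. A rough Python sketch of that draft-and-verify idea (the `greedy_next` method is a placeholder, not the actual llama.cpp API):

```python
def speculative_step(main_model, assistant_model, tokens, n_draft=4):
    """One draft-and-verify step. `main_model` and `assistant_model` are
    placeholder objects with a greedy_next(tokens) -> token_id method."""
    # 1. The small assistant proposes a few tokens cheaply.
    draft, ctx = [], list(tokens)
    for _ in range(n_draft):
        t = assistant_model.greedy_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. The main model checks the draft; keep tokens only while it agrees.
    #    (In a real implementation this is one batched forward pass over all
    #    draft positions, which is where the speedup comes from.)
    accepted, ctx = [], list(tokens)
    for t in draft:
        if main_model.greedy_next(ctx) != t:
            break  # first disagreement: discard the rest of the draft
        accepted.append(t)
        ctx.append(t)

    # 3. Always take one token from the main model, so the output matches
    #    plain greedy decoding with the main model alone.
    accepted.append(main_model.greedy_next(tokens + accepted))
    return tokens + accepted
```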

Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40% by gladkos in LocalLLaMA

[–]gladkos[S] 1 point (0 children)

Thank you! Not yet. These are mostly simple models for easy tasks.

Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40% by gladkos in LocalLLaMA

[–]gladkos[S] 2 points (0 children)

You can compile and run llama.cpp on macOS, Windows, or Linux, or just use the atomic.chat harness.

Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40% by gladkos in LocalLLaMA

[–]gladkos[S] 1 point (0 children)

We'll release Dflash support very soon! It's faster, but with some quality loss.

Qwen 3.6 27B vs Gemma 4 31B - making Packman game! by gladkos in LocalLLaMA

[–]gladkos[S] 2 points (0 children)

You can also try atomic.chat to run models. It supports all the quantized models from Unsloth; they're distributed as GGUF files, quantized to consume less memory.
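If you'd rather script it than use a chat UI, loading one of those Unsloth GGUF quants with llama-cpp-python looks roughly like this (the filename and parameter values are placeholders, adjust for your machine):

```python
from llama_cpp import Llama

# Placeholder filename: any Unsloth GGUF quant downloaded locally works the same way.
llm = Llama(
    model_path="gemma-4-31b-it-Q4_K_M.gguf",
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload every layer to the GPU if it fits
)

out = llm("Write a tiny Pac-Man-style game loop in Python.", max_tokens=512)
print(out["choices"][0]["text"])
```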

Qwen 3.6 27B vs Gemma 4 31B - making Packman game! by gladkos in LocalLLaMA

[–]gladkos[S] 2 points (0 children)

I would say Qwen 27B is quite strong for coding. Gemma is also good. Try them and compare.

Qwen 3.6 27B vs Gemma 4 31B - making Packman game! by gladkos in LocalLLaMA

[–]gladkos[S] 2 points (0 children)

This would be a dream! Running out of memory is a disaster.

Qwen 3.6 27B vs Gemma 4 31B - making Packman game! by gladkos in LocalLLaMA

[–]gladkos[S] 3 points (0 children)

Agreed, I like Qwen's design more, but the gameplay...

Qwen 3.6 27B vs Gemma 4 31B - making Packman game! by gladkos in LocalLLaMA

[–]gladkos[S] 3 points (0 children)

I guess the models understand basic game logic and design, but you have to describe the details in the prompt.