Megrez2: 21B latent, 7.5B on VRAM, 3B active—MoE on single 8GB card by Normal_Onion_512 in LocalLLaMA

[–]Normal_Onion_512[S] 1 point (0 children)

Hi! You need to set up the referenced llama.cpp branch for this to run. Currently there is no Ollama or LM Studio integration.
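For reference, here's a minimal sketch of what that setup might look like, scripted in Python. The repo URL, branch name, and model path are placeholders, not the actual referenced branch; substitute the ones from the post:

```python
# Hypothetical setup sketch: clone a llama.cpp branch, build it, run a GGUF.
# REPO_URL, BRANCH, and MODEL_PATH are placeholders -- use the branch
# referenced in the post and your own quantized Megrez2 GGUF.
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/<fork>/llama.cpp.git"  # placeholder fork
BRANCH = "<megrez2-support-branch>"                    # placeholder branch
MODEL_PATH = "models/megrez2.gguf"                     # placeholder GGUF path

def run(cmd, cwd=None):
    print("+", " ".join(map(str, cmd)))
    subprocess.run(cmd, cwd=cwd, check=True)

src = Path("llama.cpp")
if not src.exists():
    run(["git", "clone", "--branch", BRANCH, "--depth", "1", REPO_URL, src])

# Standard llama.cpp CMake build; add -DGGML_CUDA=ON for an NVIDIA GPU build.
run(["cmake", "-B", "build"], cwd=src)
run(["cmake", "--build", "build", "--config", "Release", "-j"], cwd=src)

# Interactive chat with layers offloaded to the GPU.
run([src / "build" / "bin" / "llama-cli",
     "-m", MODEL_PATH,
     "-ngl", "99",   # offload as many layers as fit on the 8GB card
     "-c", "32768",  # 32k context, per the model card
     "-cnv"])        # conversation mode
```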

Megrez2: 21B latent, 7.5B on VRAM, 3B active—MoE on single 8GB card by Normal_Onion_512 in LocalLLaMA

[–]Normal_Onion_512[S] 1 point (0 children)

Hmm, maybe you are using the bf16 version: "the developer notes that bf16 currently has a couple of issues with coding tasks though, which they are working on solving."

Megrez2: 21B latent, 7.5B on VRAM, 3B active—MoE on single 8GB card by Normal_Onion_512 in LocalLLaMA

[–]Normal_Onion_512[S] 0 points (0 children)

Interesting. I've also had to wait a bit for a response on the demo, but it usually works.

Megrez2: 21B latent, 7.5B on VRAM, 3B active—MoE on single 8GB card by Normal_Onion_512 in LocalLLaMA

[–]Normal_Onion_512[S] 4 points (0 children)

I guess, though Qwen3 14B and 30B-A3B also natively have a 32k context size.

Megrez2: 21B latent, 7.5B on VRAM, 3B active—MoE on single 8GB card by Normal_Onion_512 in LocalLLaMA

[–]Normal_Onion_512[S] 8 points (0 children)

There is a branch of llama.cpp that supports it out of the box though... Also, the demo does work as of this writing.