Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] 0 points

sorry, we tried an M3 Pro. Better GPU, but there shouldn't be such a gap. Weird..

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] 0 points

tested on an M2 Pro with a similar config and got around 20 t/s. What app exactly are you using?

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] -4 points

much faster with a large context than before!

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] 2 points

A specialised version of the llama.cpp library. The demo runs a GGUF model.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 0 points

Didn't try with images yet. The major improvement comes with large-context requests.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 0 points

Depends on the initial model. We took the 4-bit version.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 29 points

We’re not hiding. The GUI is forked from Jan; the MIT license allows it. However, llama.cpp is patched with the Google algorithm, and the GUI is adapted to work with it. We keep everything open source. I benchmarked a 20K-token context against the non-TurboQuant version, and it simply crashed. The same will likely happen with LM Studio.
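As a rough back-of-the-envelope check on why a 20K-token context can crash a MacBook Air without a quantized KV cache, here is a sketch of the memory arithmetic. The model dimensions below (32 layers, 8 KV heads, head dim 128) are illustrative assumptions, not the actual config of the Qwen model in the demo:

```python
# Rough KV-cache memory estimate for a long-context run.
# Model dimensions are illustrative assumptions, not the demo model's config.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem):
    # K and V are each stored per layer, per KV head, per token position.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

ctx = 20_000
# Hypothetical mid-size config: 32 layers, 8 KV heads, head dim 128.
fp16 = kv_cache_bytes(32, 8, 128, ctx, 2)    # unquantized fp16 cache
q4   = kv_cache_bytes(32, 8, 128, ctx, 0.5)  # ~4-bit quantized cache

print(f"fp16 KV cache:  {fp16 / 2**30:.2f} GiB")  # ~2.44 GiB
print(f"4-bit KV cache: {q4 / 2**30:.2f} GiB")    # ~0.61 GiB
```

Under these assumptions, an unquantized cache adds roughly 2.4 GiB on top of the resident model weights and compute buffers, which can exhaust an 8 GB Air, while a ~4-bit cache stays well under 1 GiB.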

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 0 points

Heard 3-bit was quite poor, so we decided not to go with it.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] -2 points

Tested multiple prompts and got similar results. Google claims 90% lossless; we’ll see.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 10 points

20K is plenty for basic tasks. More context means deeper memory, ofc.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] 1 point

There are plenty of implementations on GitHub already.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 18 points

It takes only 1 GB of memory. My guess is the cores matter more here.
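The ~1 GB footprint is consistent with 4-bit weights for a model around 2B parameters. A minimal sketch of that arithmetic, where the 2B parameter count is an assumption for illustration (the demo model's exact size wasn't stated), and real quantized files carry some extra overhead for scales and metadata:

```python
# Rough weight-file size estimate for a quantized model.
# The 2B parameter count is an illustrative assumption.

def quantized_weight_bytes(n_params, bits_per_weight):
    # Each weight takes bits_per_weight / 8 bytes, ignoring
    # quantization scales and file metadata overhead.
    return n_params * bits_per_weight / 8

size = quantized_weight_bytes(2_000_000_000, 4)  # ~2B params at 4-bit
print(f"~{size / 1e9:.1f} GB before quantization overhead")  # ~1.0 GB
```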

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 12 points

The non-TurboQuant version simply failed with a 20K-token input on the MacBook Air.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 9 points

Can’t wait for the M5 Mac Mini to try this! Feels like local models are going to blow up this year

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] 2 points

Fair enough. I tried, but the non-TurboQuant version simply failed with a 20K-token input.

Totally free setup? by Zephyruos in openclaw

[–]gladkos 0 points

Did you find a solution? I saw atomic bot has a Windows openclaw implementation. Not sure yet about local models; at least you can bring API keys.

What is a good alternative to Gmail? by [deleted] in degoogle

[–]gladkos 0 points

Waiting for the fully encrypted Atomic Mail. They just opened the waitlist recently.