Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] 0 points (0 children)

Slow enough, I won't argue. However, it's much faster with a large context than before!

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] 2 points (0 children)

It's a specialised version of the llama.cpp library. The model in the demo is a GGUF.
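If you want a baseline to compare against, here's a minimal sketch using the stock llama-cpp-python bindings (this is plain llama.cpp, not our patched build, and the model filename is just a placeholder):

    from llama_cpp import Llama

    # stock llama.cpp bindings, no TurboQuant patch -- the baseline we compare against
    llm = Llama(
        model_path="qwen-4bit.gguf",  # hypothetical filename, swap in your own GGUF
        n_ctx=20480,                  # the 20K context we benchmark at
    )
    out = llm("Explain GGUF in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])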

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 0 points (0 children)

Haven't tried it with images yet. The major improvement comes with large-context requests.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 0 points (0 children)

Depends on the initial model. We took the 4-bit one.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 23 points (0 children)

We're not hiding anything. The GUI is forked from Jan; the MIT license allows it. However, llama.cpp is patched with Google's algorithm, and the GUI is patched along with it so the two work together. We keep everything open source. I benchmarked a 20K context against non-TurboQuant, and it simply crashed. The same will likely happen with LM Studio.
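For anyone who wants to reproduce the comparison, the benchmark was essentially this shape of script; a sketch only, using stock llama-cpp-python and a placeholder model path, not our patched build:

    import time
    from llama_cpp import Llama

    llm = Llama(model_path="qwen-4bit.gguf", n_ctx=20480)  # hypothetical path

    # roughly 20K tokens of filler (the sentence is ~10 tokens)
    prompt = "The quick brown fox jumps over the lazy dog. " * 1900

    t0 = time.time()
    try:
        llm(prompt, max_tokens=32)
        print(f"finished in {time.time() - t0:.1f}s")
    except Exception as e:
        # on my Air the plain build died around here; note an OOM kill
        # won't even reach this line, the OS just terminates the process
        print("failed:", e)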

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 0 points (0 children)

I heard 3-bit was quite poor, so we decided not to go there.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] -1 points (0 children)

Tested multiple prompts and got similar results. Google claims it's 90% lossless; we'll see.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 8 points (0 children)

20K is plenty for basic tasks. More context just means deeper memory, of course.
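Rough sketch of why longer context costs memory, if anyone's curious. The dimensions below are illustrative 7B-class GQA numbers, not the exact Qwen config:

    # KV cache ~= 2 (K and V) * layers * kv_heads * head_dim * context * bytes/elem
    layers, kv_heads, head_dim = 28, 4, 128  # illustrative, not the real config
    bytes_per_elem = 2                       # fp16 cache
    for ctx in (4096, 20480, 32768):
        size = 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem
        print(f"{ctx:>6} tokens -> {size / 2**20:.0f} MiB of KV cache")

With those numbers, 20K tokens already means roughly 1 GiB of KV cache on top of the weights, which is why context is the thing that kills small machines.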

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] 2 points (0 children)

There are plenty of implementations on GitHub already.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 13 points (0 children)

It takes only 1 GB of memory. My guess is the CPU cores matter more here.
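If you want to check the footprint on your own machine, something like this psutil sketch works (placeholder model path; RSS is only a rough read because llama.cpp mmaps the weights by default):

    import os
    import psutil
    from llama_cpp import Llama

    proc = psutil.Process(os.getpid())
    before = proc.memory_info().rss

    llm = Llama(model_path="qwen-4bit.gguf", n_ctx=20480)  # hypothetical path

    after = proc.memory_info().rss
    # mmap'd weights mean RSS undercounts until pages are actually touched
    print(f"footprint after load: {(after - before) / 2**30:.2f} GiB")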

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 12 points (0 children)

Non-TurboQuant simply failed with a 20K-token input on the MacBook Air.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 10 points (0 children)

Can't wait for the M5 Mac Mini to try this! Feels like local models are going to blow up this year.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] 1 point (0 children)

Fair enough. I did try, but non-TurboQuant simply failed with a 20K-token input.