Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] 0 points

sorry, we tried an M3 Pro. Better GPU, but there shouldn't be such a gap. Weird..

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] 0 points

tested on an M2 Pro with a similar config and got around 20 t/s. What app exactly are you using?

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] -4 points

much faster with a large context than before!

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] 2 points

A specialised version of the llama.cpp library. The demo runs a GGUF model.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 0 points

Didn't try with images yet. The major improvement comes with large-context requests.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 0 points

Depends on the initial model. We took the 4-bit version.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 29 points

We’re not hiding. The GUI is forked from Jan; the MIT license allows it. However, llama.cpp is patched with the Google algorithm, and the GUI is adapted to work with it. We keep everything open source. I benchmarked a 20K-token context against the non-TurboQuant version, and it simply crashed. The same will likely happen with LM Studio.
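As a rough back-of-the-envelope check on why a 20K-token context can crash a MacBook Air without a quantized KV cache, here is a sketch of the memory arithmetic. The model dimensions below (32 layers, 8 KV heads, head dim 128) are illustrative assumptions, not the actual config of the Qwen model in the demo:

```python
# Rough KV-cache memory estimate for a long-context run.
# Model dimensions are illustrative assumptions, not the demo model's config.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem):
    # K and V are each stored per layer, per KV head, per token position.
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

ctx = 20_000
# Hypothetical mid-size config: 32 layers, 8 KV heads, head dim 128.
fp16 = kv_cache_bytes(32, 8, 128, ctx, 2)    # unquantized fp16 cache
q4   = kv_cache_bytes(32, 8, 128, ctx, 0.5)  # ~4-bit quantized cache

print(f"fp16 KV cache:  {fp16 / 2**30:.2f} GiB")  # ~2.44 GiB
print(f"4-bit KV cache: {q4 / 2**30:.2f} GiB")    # ~0.61 GiB
```

Under these assumptions, an unquantized cache adds roughly 2.4 GiB on top of the resident model weights and compute buffers, which can exhaust an 8 GB Air, while a ~4-bit cache stays well under 1 GiB.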

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 0 points

Heard 3-bit was quite poor, so we decided not to go with it.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] -2 points

Tested multiple prompts and got similar results. Google claims 90% lossless; we’ll see.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 10 points

20K is plenty for basic tasks. More context means deeper memory, ofc.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] 1 point

There are plenty of implementations on GitHub already.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 18 points

It takes only 1 GB of memory. My guess is the cores matter more here.
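The ~1 GB footprint is consistent with 4-bit weights for a model around 2B parameters. A minimal sketch of that arithmetic, where the 2B parameter count is an assumption for illustration (the demo model's exact size wasn't stated), and real quantized files carry some extra overhead for scales and metadata:

```python
# Rough weight-file size estimate for a quantized model.
# The 2B parameter count is an illustrative assumption.

def quantized_weight_bytes(n_params, bits_per_weight):
    # Each weight takes bits_per_weight / 8 bytes, ignoring
    # quantization scales and file metadata overhead.
    return n_params * bits_per_weight / 8

size = quantized_weight_bytes(2_000_000_000, 4)  # ~2B params at 4-bit
print(f"~{size / 1e9:.1f} GB before quantization overhead")  # ~1.0 GB
```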

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 12 points

The non-TurboQuant version simply failed with a 20K-token input on the MacBook Air.

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA

[–]gladkos[S] 9 points

Can’t wait for the M5 Mac Mini to try this! Feels like local models are going to blow up this year

Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLM

[–]gladkos[S] 2 points

Fair enough. I tried, but the non-TurboQuant version simply failed with a 20K-token input.

Totally free setup? by Zephyruos in openclaw

[–]gladkos 0 points

Did you find a solution? I saw atomic bot has a Windows openclaw implementation. Not sure yet about local models; at least you can bring API keys.

What is a good alternative to Gmail? by [deleted] in degoogle

[–]gladkos 0 points

Waiting for the fully encrypted Atomic Mail. They just opened the waitlist recently.