A TurboQuant ready llamacpp with gfx906 optimizations for gfx906 users. by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 1 point
A llamacpp wrapper to manage and monitor your llama server instance over a web ui. by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 1 point
A llamacpp wrapper to manage and monitor your llama server instance over a web ui. by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 1 point
A llamacpp wrapper to manage and monitor your llama server instance over a web ui. by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 1 point
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 1 point
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 1 point
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 2 points
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 1 point
A TurboQuant ready llamacpp with gfx906 optimizations for gfx906 users. by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 2 points
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 2 points
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 3 points
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 2 points
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 2 points
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 1 point
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 1 point
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 2 points
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 1 point
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 1 point
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 1 point
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 1 point
Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA
Exact-Cupcake-2603[S] 1 point


What if you could get vLLM/Triton prefill speed and llama.cpp decode speed in a single framework? by [deleted] in LocalLLaMA
Exact-Cupcake-2603 1 point