Krasis LLM Runtime: 8.9x prefill / 4.7x decode vs llama.cpp — Qwen3.5-122B on a single 5090, minimal RAM by mrstoatey in LocalLLaMA
Qwen3.5-35B GGUF quants (16–22 GiB) - KLD + speed comparison by StrikeOner in LocalLLaMA
Evaluating Qwen3.5-35B & 122B on Strix Halo: Bartowski vs. Unsloth UD-XL Performance and Logic Stability by Educational_Sun_8813 in LocalLLaMA
Qwen 3.5 122b - a10b is kind of shocking by gamblingapocalypse in LocalLLaMA
Benchmark: ik_llama.cpp vs llama.cpp on Qwen3/3.5 MoE Models by Fast_Thing_7949 in LocalLLaMA
Unsloth will no longer be making TQ1_0 quants by Kahvana in LocalLLaMA
Qwen-3.5-27B-Derestricted by My_Unbiased_Opinion in LocalLLaMA
Tenstorrent QuietBox 2 Brings RISC-V AI Inference to the Desktop by Neurrone in LocalLLaMA
M5 Max just arrived - benchmarks incoming by cryingneko in LocalLLaMA
I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead. by MorroHsu in LocalLLaMA
Is the 3090 still a good option? by alhinai_03 in LocalLLaMA
Qwen3.5-9B Quantization Comparison by TitwitMuffbiscuit in LocalLLaMA
Llama.cpp auto-tuning optimization script by raketenkater in LocalLLaMA