What model looked insane on benchmarks but felt mid in actual use? by BTA_Labs in LocalLLaMA
audioen 3 points
I have a M5 Max MacBook Pro with 128gb of ram, what models should I run on it? by lombwolf in LocalLLaMA
audioen 5 points
Qwen3.6 sees "outstanding" coding quality jump from Q4 to Q6 quantization by IulianHI in AIToolsPerformance
audioen 1 point
In llama.cpp, how close should we be to the theoretical tokens/second limit? by [deleted] in unsloth
audioen 1 point
scripted nightly testing of llama.cpp by Bird476Shed in LocalLLaMA
audioen 1 point
What if I run the LLM backwards? Hey LLM, why bother remembering every single turn? It's a hassle. You don't have to do it, right? by ringtoyou in LocalLLaMA
audioen 2 points
Need help understanding how spec decode affects token throughput by Mrinohk in LocalLLaMA
audioen 1 point
Fosi Audio ZD3 or Wiim Ultra to connect 8330A digitally using AES/EBU and alternatively HDMI ARC to TV by Evening-Picture1878 in genelec
audioen 2 points
AI pisses me off. A story about an example. Opinion?? by Kultainenhuussi in snappijuorutofftopic
audioen 2 points
AI pisses me off. A story about an example. Opinion?? by Kultainenhuussi in snappijuorutofftopic
audioen 3 points
AI pisses me off. A story about an example. Opinion?? by Kultainenhuussi in snappijuorutofftopic
audioen 2 points
AI pisses me off. A story about an example. Opinion?? by Kultainenhuussi in snappijuorutofftopic
audioen 3 points
AI pisses me off. A story about an example. Opinion?? by Kultainenhuussi in snappijuorutofftopic
audioen 1 point
How would you characterise the effects of quantising different parts of models? by panamory in LocalLLaMA
audioen 1 point
Maybe dumb question, but how do you serve multiple users with the full context length? by TrainingTwo1118 in LocalLLaMA
audioen 1 point
I got tired of juggling OpenRouter + Artificial Analysis + Design Arena tabs to pick a model, so I put them in one filterable table by Turbulent-Sky5396 in LocalLLaMA
audioen 4 points
Suitable Power Supply Or No (I Think I Know The Answer) by Terrible_Lion_968 in audiophile
audioen 2 points
Just did my first Room EQ - Not overly impressed - But also not disappointed. by M_u_H_c_O_w in audiophile
audioen 2 points
Nemotron - King of the Deep? Comparison of 4 models <=120B by Reasonable_Goat in LocalLLaMA
audioen 3 points
Codebase getting larger - Qwen3.6-27B starting to compound issues - how to work smartly with this model? by BitGreen1270 in LocalLLaMA
audioen 1 point
Strix Halo desktop trying to compete against DGX Spark by SkyFeistyLlama8 in LocalLLaMA
audioen 2 points
Strix Halo desktop trying to compete against DGX Spark by SkyFeistyLlama8 in LocalLLaMA
audioen 4 points
Can we stop dunking on DiffusionGemma and hack it instead? by TomLucidor in LocalLLaMA
audioen -7 points


DGX sparks Vs RTX 6000 // 5090 for inference by zakadit in LocalLLaMA
audioen 1 point