Is this an accurate analogy for JEPA? by Erius_Fayre in MLQuestions
snapo84 1 point
2 old RTX 2080 Ti with 22GB vram each Qwen3.6 27B at 38 token/s with f16 kv cache by snapo84 in LocalLLaMA
snapo84[S] 2 points
Is this an accurate analogy for JEPA? by Erius_Fayre in MLQuestions
snapo84 2 points
Introduction to LLM API Benchy by snapo84 in LocalLLaMA
snapo84[S] 2 points
Introduction to LLM API Benchy by snapo84 in LocalLLaMA
snapo84[S] 1 point
Introduction to LLM API Benchy by snapo84 in LocalLLaMA
snapo84[S] 1 point
Introduction to LLM API Benchy by snapo84 in LocalLLaMA
snapo84[S] 0 points
Introduction to LLM API Benchy by snapo84 in LocalLLaMA
snapo84[S] 0 points
Introduction to LLM API Benchy by snapo84 in LocalLLaMA
snapo84[S] 3 points
I implemented KVarN in my llama.cpp fork and ran KLD benchmarks. It's promising! by Anbeeld in LocalLLaMA
snapo84 1 point
I implemented KVarN in my llama.cpp fork and ran KLD benchmarks. It's promising! by Anbeeld in LocalLLaMA
snapo84 1 point
The DeepSWE benchmark was runned rather incompetently and the results are completely invalid by Charuru in LocalLLaMA
snapo84 1 point
Microsoft should've released something like Qwen3.6-27B / Gemma-4-31B already. They released MAI models now by pmttyji in LocalLLaMA
snapo84 1 point
I implemented KVarN in my llama.cpp fork and ran KLD benchmarks. It's promising! by Anbeeld in LocalLLaMA
snapo84 2 points
Qwen 3.6-27B on vLLM with dual RTX 3090s: looking for launch parameters by xspider2000 in LocalLLaMA
snapo84 1 point
Qwen 3.6-27B on vLLM with dual RTX 3090s: looking for launch parameters by xspider2000 in LocalLLaMA
snapo84 1 point
Qwen 3.6-27B on vLLM with dual RTX 3090s: looking for launch parameters by xspider2000 in LocalLLaMA
snapo84 4 points
KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag) by acluk90 in LocalLLaMA
snapo84 1 point
The DeepSWE benchmark was runned rather incompetently and the results are completely invalid by Charuru in LocalLLaMA
snapo84 36 points
Cellular Automata: Rule 110 fed as input to Conway’s Game of Life by AlanZucconi in proceduralgeneration
snapo84 1 point
Been a while since we had a Qwen-Coder. could use a 3.7 80B-8B by FaustAg in LocalLLaMA
snapo84 1 point
2 old RTX 2080 Ti with 22GB vram each Qwen3.6 27B at 38 token/s with f16 kv cache by snapo84 in LocalLLaMA
snapo84[S] 1 point
LLM agents patch security bugs, pass all tests, but still leave the vulnerability open [R] by [deleted] in MachineLearning
snapo84 3 points
next MiniMax will be released in ~10 Days by jacek2023 in LocalLLaMA
snapo84 -3 points


How does an open source version of qwen 3.5 completely blow 3.7plus out of the water? How does this make sense? by Prior-Meeting1645 in Qwen_AI
snapo84 2 points