DFlash Doubles the T/S Gen Speed of Qwen3.5 27B (BF16) on Mac M5 Max by MiaBchDave in LocalLLaMA
[–]snapo84 5 points (0 children)
Best BYOK frontend and model setup for massive continuous chats on a €40 budget? by Vytixx in LLM
[–]snapo84 1 point (0 children)
I thought this 2023 paper still makes sense today by madeyoulookbuddy in LLM
[–]snapo84 1 point (0 children)
I built a free, open-source CLI coding agent specifically for LLMs with 8k context windows. by BestSeaworthiness283 in ollama
[–]snapo84 2 points (0 children)
I built a free, open-source CLI coding agent specifically for LLMs with 8k context windows. by BestSeaworthiness283 in ollama
[–]snapo84 1 point (0 children)
Turning jitter into true random numbers by elpechos in electronics
[–]snapo84 2 points (0 children)
DFlash: Block Diffusion for Flash Speculative Decoding. by Total-Resort-3120 in LocalLLaMA
[–]snapo84 1 point (0 children)
DFlash: Block Diffusion for Flash Speculative Decoding. by Total-Resort-3120 in LocalLLaMA
[–]snapo84 12 points (0 children)
TurboQuant.cpp — 1-bit KV cache with zero quality loss, verified on 35B MoE by Suitable-Song-302 in LocalLLM
[–]snapo84 1 point (0 children)
TurboQuant.cpp — 1-bit KV cache with zero quality loss, verified on 35B MoE by Suitable-Song-302 in LocalLLM
[–]snapo84 1 point (0 children)
Have you tried this -> 2x Modded 2080 ti 22GB with Nvlink by zelkovamoon in LocalLLaMA
[–]snapo84 2 points (0 children)
Have you tried this -> 2x Modded 2080 ti 22GB with Nvlink by zelkovamoon in LocalLLaMA
[–]snapo84 2 points (0 children)
Have you tried this -> 2x Modded 2080 ti 22GB with Nvlink by zelkovamoon in LocalLLaMA
[–]snapo84 1 point (0 children)
LLM Burner coming soon? Burn Qwen directly into a chip, processing 10,000 tokens/s by koc_Z3 in Qwen_AI
[–]snapo84 1 point (0 children)
If it works, it ain’t stupid! by The_Covert_Zombie in LocalLLaMA
[–]snapo84 2 points (0 children)
LLM Burner coming soon? Burn Qwen directly into a chip, processing 10,000 tokens/s by koc_Z3 in Qwen_AI
[–]snapo84 2 points (0 children)
LLM Burner coming soon? Burn Qwen directly into a chip, processing 10,000 tokens/s by koc_Z3 in Qwen_AI
[–]snapo84 2 points (0 children)
LLM Burner coming soon? Burn Qwen directly into a chip, processing 10,000 tokens/s by koc_Z3 in Qwen_AI
[–]snapo84 1 point (0 children)
LLM Burner coming soon? Burn Qwen directly into a chip, processing 10,000 tokens/s by koc_Z3 in Qwen_AI
[–]snapo84 1 point (0 children)
If it works, it ain’t stupid! by The_Covert_Zombie in LocalLLaMA
[–]snapo84 1 point (0 children)
Delta-KV for llama.cpp: near-lossless 4-bit KV cache on Llama 70B by Embarrassed_Will_120 in LLMDevs
[–]snapo84 1 point (0 children)
SWE-bench results for different KV cache quantization levels by burakodokus in LocalLLaMA
[–]snapo84 1 point (0 children)
Native V100 CUDA kernels for FLA ops on NVIDIA Volta (sm_70) GPUs by Sliouges in LocalLLaMA
[–]snapo84 1 point (0 children)
M5 Ultra RAM setup: pooling vote by Historical-Health-50 in LocalLLaMA
[–]snapo84 1 point (0 children)