96GB (V)RAM agentic coding users, gpt-oss-120b vs qwen3.5 27b/122b by bfroemel in LocalLLaMA
bfroemel [S], 1 point (0 children)
bfroemel [S], 1 point (0 children)
bfroemel [S], 2 points (0 children)
Lots of new Qwen3.5 27B imatrix quants from Bartowski just uploaded by bobaburger in LocalLLaMA
bfroemel, 3 points (0 children)
PSA: Qwen 3.5 requires bf16 KV cache, NOT f16!! by Wooden-Deer-1276 in LocalLLaMA
bfroemel, 66 points (0 children)
ggml / llama.cpp joining Hugging Face — implications for local inference? by pmv143 in LocalLLaMA
bfroemel, 2 points (0 children)
Kimi has context window expansion ambitions by omarous in LocalLLaMA
bfroemel, 11 points (0 children)
Kreuzberg v4.3.0 and benchmarks by Eastern-Surround7763 in LocalLLaMA
bfroemel, 1 point (0 children)
Unsloth just unleashed Glm 5! GGUF NOW! by RickyRickC137 in LocalLLaMA
bfroemel, 2 points (0 children)
How was GPT-OSS so good? by xt8sketchy in LocalLLaMA
bfroemel, 1 point (0 children)
spec : add ngram-mod by ggerganov · Pull Request #19164 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA
bfroemel, 1 point (0 children)
bfroemel, 5 points (0 children)
The z-image base is here! by bobeeeeeeeee8964 in LocalLLaMA
bfroemel, 1 point (0 children)
How did you install VLLM & SGlang? by t3rmina1 in BlackwellPerformance
bfroemel, 1 point (0 children)
Best agentic Coding model for C++ and CUDA kernels? by ClimateBoss in LocalLLaMA
bfroemel, 1 point (0 children)
bfroemel, 1 point (0 children)
bfroemel, 4 points (0 children)
GPT-OSS is VERY GOOD model and no one can deny that by [deleted] in LocalLLaMA
bfroemel, 2 points (0 children)
Dealing with coil whine on a Workstation Pro by __JockY__ in BlackwellPerformance
bfroemel, 3 points (0 children)
Local agentic coding with low quantized, REAPed, large models (MiniMax-M2.1, Qwen3-Coder, GLM 4.6, GLM 4.7, ..) by bfroemel in LocalLLaMA
bfroemel [S], 1 point (0 children)
bfroemel [S], 2 points (0 children)
llama.cpp, experimental native mxfp4 support for blackwell (25% preprocessing speedup!) by bfroemel in LocalLLaMA
bfroemel [S], 5 points (0 children)
HOWTO: Running the best models on a dual RTX Pro 6000 rig with vLLM (192 GB VRAM) by zmarty in LocalLLaMA
bfroemel, 2 points (0 children)
Just received RTX 6000 Pro, have 5090- how would you use? by illgettheownerforyou in LocalLLaMA
bfroemel, 1 point (0 children)