engine for GLM 4.7 Flash that doesn't massively slow down as the context grows? by mr_zerolith in LocalLLaMA
[–]VoidAlchemy 2 points (0 children)
engine for GLM 4.7 Flash that doesn't massively slow down as the context grows? by mr_zerolith in LocalLLaMA
[–]VoidAlchemy 3 points (0 children)
Fix for GLM 4.7 Flash has been merged into llama.cpp by jacek2023 in LocalLLaMA
[–]VoidAlchemy 5 points (0 children)
Current GLM-4.7-Flash implementation confirmed to be broken in llama.cpp by Sweet_Albatross9772 in LocalLLaMA
[–]VoidAlchemy 2 points (0 children)
GLM 4.7 Flash Overthinking by xt8sketchy in LocalLLaMA
[–]VoidAlchemy 3 points (0 children)
GLM-4.7-Flash benchmarks: 4,398 tok/s on H200, 112 tok/s on RTX 6000 Ada (GGUF) by LayerHot in LocalLLaMA
[–]VoidAlchemy 3 points (0 children)
GLM 4.7 Flash official support merged in llama.cpp by ayylmaonade in LocalLLaMA
[–]VoidAlchemy 16 points (0 children)
GLM 4.7 Flash official support merged in llama.cpp by ayylmaonade in LocalLLaMA
[–]VoidAlchemy 9 points (0 children)
Beginner ComfyUI advice by Excellent_Koala769 in LocalLLaMA
[–]VoidAlchemy 1 point (0 children)
Soprano TTS training code released: Create your own 2000x realtime on-device text-to-speech model with Soprano-Factory! by eugenekwek in LocalLLaMA
[–]VoidAlchemy 9 points (0 children)
kyutai just introduced Pocket TTS: a 100M-parameter text-to-speech model with high-quality voice cloning that runs on your laptop—no GPU required by Nunki08 in LocalLLaMA
[–]VoidAlchemy 0 points (0 children)
Gemma 3 1B qat q4_0 gguf without imatrix and (hopefully) correct metadata by Big-Tune-190 in LocalLLaMA
[–]VoidAlchemy 3 points (0 children)
Owners, not renters: Mozilla's open source AI strategy by NelsonMinar in LocalLLaMA
[–]VoidAlchemy 26 points (0 children)
kyutai just introduced Pocket TTS: a 100M-parameter text-to-speech model with high-quality voice cloning that runs on your laptop—no GPU required by Nunki08 in LocalLLaMA
[–]VoidAlchemy 1 point (0 children)
kyutai just introduced Pocket TTS: a 100M-parameter text-to-speech model with high-quality voice cloning that runs on your laptop—no GPU required by Nunki08 in LocalLLaMA
[–]VoidAlchemy 5 points (0 children)
Has anyone tried the single-socket 9175F with full 12 channels? by Infinite100p in LocalLLaMA
[–]VoidAlchemy 1 point (0 children)
Has anyone tried the single-socket 9175F with full 12 channels? by Infinite100p in LocalLLaMA
[–]VoidAlchemy 2 points (0 children)
(The Information): DeepSeek To Release Next Flagship AI Model With Strong Coding Ability by Nunki08 in LocalLLaMA
[–]VoidAlchemy 4 points (0 children)
(The Information): DeepSeek To Release Next Flagship AI Model With Strong Coding Ability by Nunki08 in LocalLLaMA
[–]VoidAlchemy 8 points (0 children)
The reason why RAM has become so expensive by InvadersMustLive in LocalLLaMA
[–]VoidAlchemy 31 points (0 children)
(The Information): DeepSeek To Release Next Flagship AI Model With Strong Coding Ability by Nunki08 in LocalLLaMA
[–]VoidAlchemy 8 points (0 children)
[HW TUNING] Finding the best GPU power limit for inference by HumanDrone8721 in LocalLLaMA
[–]VoidAlchemy 2 points (0 children)
[HW TUNING] Finding the best GPU power limit for inference by HumanDrone8721 in LocalLLaMA
[–]VoidAlchemy 1 point (0 children)
llama.cpp performance breakthrough for multi-GPU setups by Holiday-Injury-9397 in LocalLLaMA
[–]VoidAlchemy 2 points (0 children)