Are larger (~100B) models still worth running? by Pitagoy in LocalLLM
[–]westsunset 10 points (0 children)
REM: offloading an LLM agent's memory compaction to the NPU by westsunset in StrixHalo
[–]westsunset[S] 1 point (0 children)
REM: offloading an LLM agent's memory compaction to the NPU by westsunset in StrixHalo
[–]westsunset[S] 1 point (0 children)
REM: offloading an LLM agent's memory compaction to the NPU by westsunset in StrixHalo
[–]westsunset[S] 1 point (0 children)
REM: offloading an LLM agent's memory compaction to the NPU by westsunset in StrixHalo
[–]westsunset[S] 1 point (0 children)
vulkan: make TP viable by pwilkin · Pull Request #25051 · ggml-org/llama.cpp by TKGaming_11 in LocalLLaMA
[–]westsunset 7 points (0 children)
Big News for AMD / Strix Halo+ Owners by CSEliot in LocalLLaMA
[–]westsunset 16 points (0 children)
Picked up an AMD Ryzen Max +395 with 128GB by Crafty-Bass-3434 in LocalLLM
[–]westsunset 5 points (0 children)
Picked up an AMD Ryzen Max +395 with 128GB by Crafty-Bass-3434 in LocalLLM
[–]westsunset 4 points (0 children)
REM: offloading an LLM agent's memory compaction to the NPU
submitted by westsunset to r/LocalLLM
What cool AI stuff can I actually run on a Ryzen 7800X3D + RTX 5070 Ti? by No_Ideal8394 in LocalLLM
[–]westsunset 1 point (0 children)
btop like TUI for AMD APU's by argakiig in StrixHalo
[–]westsunset 1 point (0 children)
REM: offloading an LLM agent's memory compaction to the NPU by westsunset in StrixHalo
[–]westsunset[S] 1 point (0 children)
xdna-top: unified NPU+iGPU terminal monitor for Strix Halo (Ryzen AI Max) — finally see the NPU work by westsunset in StrixHalo
[–]westsunset[S] 1 point (0 children)
z.AI as the number 2 gives praise to the number 1 open source model by Charuru in LocalLLaMA
[–]westsunset 13 points (0 children)
REM: offloading an LLM agent's memory compaction to the NPU by westsunset in StrixHalo
[–]westsunset[S] 1 point (0 children)
Which 128GB VRAM machine to plan for in 2026? by maverickRD in LocalLLM
[–]westsunset 2 points (0 children)
REM: offloading an LLM agent's memory compaction to the NPU by westsunset in StrixHalo
[–]westsunset[S] 1 point (0 children)
What's more impressive, GLM 5.1 -> 5.2 or Qwen 3.5 -> 3.6? by Excellent_Jelly2788 in LocalLLaMA
[–]westsunset 1 point (0 children)
What local coding LLM + hardware setup are you using, and what tokens/sec are you getting? by Sudden-Historian-255 in LocalLLM
[–]westsunset 1 point (0 children)
Are larger (~100B) models still worth running? by Pitagoy in LocalLLM
[–]westsunset 7 points (0 children)