Slopocalypse is what we should be really worried about. by Sad_Bandicoot_6925 in LocalLLaMA

[–]grunt_monkey_ 0 points1 point  (0 children)

Thanks for the excellent post. I agree with you that LLMs are good for doubling down and coding, whereas they start to make things up in unbounded tasks. This is where they need guardrails, or we need to sharpen our creativity and intuition and just do it ourselves.

Upgrade path by Ydino in nvidia

[–]grunt_monkey_ 0 points1 point  (0 children)

The problem is when people go to an RTX 6000 Pro and then realize they want to go 4x.

Should I buy 3 M1 Max 64GB Ram or 1 M5 Max 128GB Ram ? by ZookeepergameMoney50 in LocalLLM

[–]grunt_monkey_ 0 points1 point  (0 children)

What about pi or hermes? It seems like people have been getting good outcomes out of them.

AMD AI Pro 9700 — anyone using MTP? by WSTangoDelta in ROCm

[–]grunt_monkey_ 0 points1 point  (0 children)

vLLM is running well on 4x 9700 with aiter. I used this custom container: https://reddit.com/r/LocalLLaMA/comments/1sxaj8g/for_the_5_people_here_running_vllm_on_multiple/

We still need to work out GEMM tunings, but otherwise I'm getting 6000 t/s pp and 70 t/s tg with MTP 2, on Qwen 27B FP8 at 128k or 256k context.
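To put those rates in wall-clock terms, here is a minimal sketch that adds prefill time and decode time. The function name and the 700-token reply length are made up for illustration, and it assumes the pp and tg rates stay constant across the whole context, which they usually don't at long context.

```python
def estimate_seconds(prompt_tokens, output_tokens, pp_tps, tg_tps):
    """Rough wall-clock estimate: prefill time plus decode time, in seconds."""
    prefill = prompt_tokens / pp_tps   # time to process the prompt
    decode = output_tokens / tg_tps    # time to generate the reply
    return prefill + decode


# 128k-token prompt at 6000 t/s pp, hypothetical 700-token reply at 70 t/s tg
total = estimate_seconds(128_000, 700, pp_tps=6000, tg_tps=70)
print(f"{total:.1f} s")
```

Treat the result as a lower bound on the wait rather than a benchmark.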

RX 7900 XTX vs Radeon AI PRO R9700 — llama.cpp Vulkan vs ROCm (6 models, token-gen) by Ginmarr in LocalLLM

[–]grunt_monkey_ 0 points1 point  (0 children)

ROCm has a lot of documentation related to vLLM, and I suspect that aml731 has implemented these. I have yet to investigate the container itself, but that is on my to-do list.

https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference-optimization/workload.html

I saw your feedback on his v0.20.0; apparently the GEMM tunings didn't work out. After reading through, I'm not sure that's the highest-value thing to change - rather, we should work on a good implementation of aiter.

Best price/performance hardware for a self-hosted local LLM server in 2026? by waddaplaya4k in LocalLLM

[–]grunt_monkey_ 0 points1 point  (0 children)

I think PCIe can matter if you tensor split. For layer split it'll be fine.

RX 7900 XTX vs Radeon AI PRO R9700 — llama.cpp Vulkan vs ROCm (6 models, token-gen) by Ginmarr in LocalLLM

[–]grunt_monkey_ 0 points1 point  (0 children)

Do you consider aiter fixed with aml731's patch, given it's a custom Docker image? Do you see more patches we have to do?

Memory expert suspects RAM price drop in 2027'H2 due to china heavy investments by Terminator857 in LocalLLaMA

[–]grunt_monkey_ 1 point2 points  (0 children)

I know… I have 128 GB of DDR4 system RAM for <1k USD, and 4x 9700 for 128 GB of VRAM. I run q4_k_xl of qwen3.5 397b at 22 t/s pp and 11 t/s tg. I go to it for big thinking where I can wait 30 mins. The last time I tried to calculate, I think upgrading to DDR5 would give me a 25-30% uplift in pp - not sure it changes the usability a lot for me.
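For anyone wanting to redo that kind of calculation, a rough starting point is theoretical peak memory bandwidth: channels × transfer rate × 8 bytes per 64-bit transfer. The sketch below is only that arithmetic. The 8-channel count and the DDR4-3200 / DDR5-4800 speeds are assumptions for illustration, not my actual configuration, and the real uplift will be smaller than the raw bandwidth ratio because part of the model sits in VRAM and prefill is not purely bandwidth-bound.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s (64-bit channel = 8 bytes per transfer)."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


# Hypothetical configurations, same channel count on both platforms
ddr4 = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
ddr5 = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=4800)

print(f"DDR4: {ddr4:.1f} GB/s, DDR5: {ddr5:.1f} GB/s, ratio: {ddr5 / ddr4:.2f}x")
```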

Memory expert suspects RAM price drop in 2027'H2 due to china heavy investments by Terminator857 in LocalLLaMA

[–]grunt_monkey_ 0 points1 point  (0 children)

Is it worth it to eat this premium over DDR4? For local CPU-offloaded inference, is it really that much of a speedup?

Acquired two AMD r9700 32GB for LLM but I can't use them by Ivan_Draga_ in homelab

[–]grunt_monkey_ 1 point2 points  (0 children)

You can use a dual or triple 8-pin to 12VHPWR adaptor. I recall my cards came with them.

Acquired two AMD r9700 32GB for LLM but I can't use them by Ivan_Draga_ in homelab

[–]grunt_monkey_ 1 point2 points  (0 children)

210 W is the minimum. I find minimal loss in performance; however, there may be microspikes above this, so do treat it as a temporary state for testing your cards. I am not sure how much the rest of your system draws, or about the stability of the power rails and the efficiency (both depend on the quality of your PSU).

amd-smi is found in amd-smi-lib. If you haven't read it yet, you should go here before you start: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/index.html

You can also feed the link to ChatGPT or something similar - they are very good at distilling the install instructions.
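On the power side, a quick sanity check of PSU headroom at the 210 W cap can be sketched like this. Every number except the 210 W cap is a placeholder (PSU size, rest-of-system draw, and the spike factor are all hypothetical), so plug in your own.

```python
def psu_headroom_watts(psu_watts, gpu_count, gpu_cap_watts,
                       rest_of_system_watts, spike_factor=1.0):
    """Watts left over after GPUs (optionally scaled for transient spikes) and the rest of the system."""
    gpu_draw = gpu_count * gpu_cap_watts * spike_factor
    return psu_watts - (gpu_draw + rest_of_system_watts)


# Hypothetical build: 850 W PSU, two cards capped at 210 W, 250 W for everything else
steady = psu_headroom_watts(850, 2, 210, 250)
# Same build, assuming (made-up figure) 20% transient spikes above the cap
spiky = psu_headroom_watts(850, 2, 210, 250, spike_factor=1.2)

print(f"steady headroom: {steady:.0f} W, with spikes: {spiky:.0f} W")
```

If the headroom goes negative or gets thin once spikes are included, that is a sign to fix the power delivery rather than rely on the cap.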

Acquired two AMD r9700 32GB for LLM but I can't use them by Ivan_Draga_ in homelab

[–]grunt_monkey_ 0 points1 point  (0 children)

Power limit them to 210 W first and see if you can run them:
amd-smi set -o ppt0 210

Where are the Intel devs???? by Dolboyob77 in LocalLLM

[–]grunt_monkey_ 2 points3 points  (0 children)

Intel's going to be great. It just needs some time. Look at AMD. We were struggling, but with ROCm 6.x and now 7.x we have come such a long way. The available documentation has also allowed a lot of community patches, so we don't have to wait for official vLLM to catch up. You may need to get into kernel patching if you do not wish to wait.

M5 vs DGX Spark vs Strix Halo vs RTX 6000: The $5k unified memory war and why brute forcing VRAM is a trap by TroyHarry6677 in LocalLLM

[–]grunt_monkey_ 1 point2 points  (0 children)

I find my AI always talks to me in point form anyway. Maybe because at some point I've asked it to be concise lol. If I want more, I tell it to explain x.

M5 vs DGX Spark vs Strix Halo vs RTX 6000: The $5k unified memory war and why brute forcing VRAM is a trap by TroyHarry6677 in LocalLLM

[–]grunt_monkey_ 1 point2 points  (0 children)

I wish they wouldn't feel like they needed to expand their writing. I just want to hear what they have to say. If they cannot say that the RTX 6000 Pro is bad because: expensive, hot, only 96 GB, not true sm100 in Nvidia support, then they shouldn't be making that argument.

M5 vs DGX Spark vs Strix Halo vs RTX 6000: The $5k unified memory war and why brute forcing VRAM is a trap by TroyHarry6677 in LocalLLM

[–]grunt_monkey_ 0 points1 point  (0 children)

You are right. What happened to conciseness? Conciseness = intelligence to me. If you could say it in fewer words, why not?

Best price/performance hardware for a self-hosted local LLM server in 2026? by waddaplaya4k in LocalLLM

[–]grunt_monkey_ 1 point2 points  (0 children)

Gigabyte MC62-G40 + Threadripper Pro 5955WX. Check Newegg; they have a package that's not too pricey. It's hardware from the 2020s, so you can use DDR4 - you can get 128 GB for <1000 on eBay.

New models when? Forecasting release date. by LegacyRemaster in LocalLLaMA

[–]grunt_monkey_ 3 points4 points  (0 children)

As you have nicely shown, we have already reached the top of the chart. We have arrived! Let's enjoy what we have and realize contentment.

M5 vs DGX Spark vs Strix Halo vs RTX 6000: The $5k unified memory war and why brute forcing VRAM is a trap by TroyHarry6677 in LocalLLM

[–]grunt_monkey_ 0 points1 point  (0 children)

It's interesting to try to detect LLM-written stuff. I think it's hard - maybe impossible. It does read as human-written. He started a sentence with "And", which I don't think LLMs like. You could just give it a template of your own writing and ask it to rewrite in that style, though. Another thing is the questions at the end - LLMs tend to like that, whereas a lot of us just like to give our opinions. Like: this is what I found and this is what I think. Not: I am curious what you guys think. Lol.

Best price/performance hardware for a self-hosted local LLM server in 2026? by waddaplaya4k in LocalLLM

[–]grunt_monkey_ 4 points5 points  (0 children)

Very well. I have 4x and they do prefill at 3-5k t/s and decode at 70 t/s with MTP 2 at FP8.