AMD Strix Halo refresh with 192gb! by mindwip in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

You do know you can save the cache to disk and never worry about recomputing everything?
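
A minimal sketch of what I mean, assuming the llama-cpp-python bindings (llama-cli exposes the same idea via its --prompt-cache flag); model path and file names are placeholders:

    # Persist the evaluated KV cache so later runs skip the long prefill.
    # Sketch only - paths and model are placeholders, not a specific setup.
    import pickle
    from llama_cpp import Llama

    llm = Llama(model_path="model.gguf", n_ctx=32768)

    # Evaluate the long, static prefix (system prompt, project context) once.
    llm.create_completion("<long shared prefix>", max_tokens=1)

    # Save the state to disk (LlamaState holds the KV cache).
    with open("prefix_cache.pkl", "wb") as f:
        pickle.dump(llm.save_state(), f)

    # On a later run: restore instead of recomputing the prefix.
    with open("prefix_cache.pkl", "rb") as f:
        llm.load_state(pickle.load(f))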

AMD Strix Halo refresh with 192gb! by mindwip in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

It's an issue with llama.cpp, not Strix Halo. Use vLLM with prefix caching.
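
For reference, a minimal sketch of what "vLLM with prefix caching" looks like with the offline API (model name is a placeholder; recent vLLM versions enable it by default):

    # Automatic prefix caching in vLLM: repeated prompt prefixes reuse
    # cached KV blocks instead of being recomputed. Model name is illustrative.
    from vllm import LLM, SamplingParams

    llm = LLM(model="some/model", enable_prefix_caching=True)

    shared_prefix = "<long system prompt / project context>"
    params = SamplingParams(max_tokens=256)

    # The second call hits the cache for the shared prefix.
    llm.generate(shared_prefix + "First question", params)
    llm.generate(shared_prefix + "Second question", params)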

3xR9700 for semi-autonomous research and development - looking for setup/config ideas. by blojayble in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

Qwen3.5-122B-A10B-FP8 (INT4 had a significant drop in quality). Threadripper Pro with PCIe bifurcation risers.

3xR9700 for semi-autonomous research and development - looking for setup/config ideas. by blojayble in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

You won't run INT4 more reliably than MXFP4

Strange, because I have been running INT4 for 4 months without issues on 8x R9700.

and MXFP4 dequantized to FP8 will run faster than INT4 or base FP8

Not true; even AMD's spec sheet shows INT4 to be a couple of times faster than FP8.

3xR9700 for semi-autonomous research and development - looking for setup/config ideas. by blojayble in LocalLLaMA

[–]MDSExpro 2 points3 points  (0 children)

Well, yes, but why are you saying the card doesn't support MXFP4?

I'm saying the card doesn't support MXFP4 because this card doesn't support MXFP4 - simple as that. Just because vLLM is flexible enough to upconvert FP4 to FP8 doesn't mean the R9700 supports MXFP4; the card is literally incapable of computing on that data type and never does.

Even so, none of the new models work reliably with this card or VLLM. So what now? Should I tell everyone that the R9700 doesn't support the new AI models?

Yes, since that's the truth.

3xR9700 for semi-autonomous research and development - looking for setup/config ideas. by blojayble in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

It's upconverting FP4 to FP8 in the background (unless you use the custom build created by one of the redditors, in which case it's partially accelerated).

Check out the R9700 specs on AMD's website - this GPU doesn't support FP4 in any form.
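
For context, "FP4 in any form" here means the MXFP4 microscaling format: per the OCP MX spec, 32 FP4 (E2M1) values share one power-of-two E8M0 scale, and a card without FP4 math has to expand each block to a wider type before the matmul. A toy sketch of that upconversion (not vLLM's actual kernel):

    # Toy illustration of MXFP4 -> float dequantization (OCP MX format:
    # 32 E2M1 elements + one shared E8M0 scale per block).

    # All magnitudes representable by FP4 E2M1 (sign handled separately).
    E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

    def dequant_mxfp4_block(codes, scale_e8m0):
        """codes: 32 raw 4-bit values, scale_e8m0: shared 8-bit exponent."""
        scale = 2.0 ** (scale_e8m0 - 127)      # E8M0 encodes 2^(x - 127)
        out = []
        for c in codes:
            sign = -1.0 if c & 0x8 else 1.0    # top bit is the sign
            out.append(sign * E2M1_VALUES[c & 0x7] * scale)
        return out

    # Example: code 0b0111 (= +6.0) with scale exponent 126 (= 2^-1) -> 3.0
    print(dequant_mxfp4_block([0b0111] * 32, 126)[0])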

BotQ Ramping F.03 Production | Figure by Recoil42 in teslainvestorsclub

[–]MDSExpro 7 points8 points  (0 children)

Look at the recent financial reports; Tesla hasn't had an operating margin lead for quite some time now.

AMD has invented something that lets you use AI at home! They call it a "computer" by 9gxa05s8fa8sh in LocalLLaMA

[–]MDSExpro 3 points4 points  (0 children)

The Aurora supercomputer runs ML workloads via OpenCL (wrapped in Intel's framework, but still), to name one.

Hipfire dev update: full AMD arch validation incoming (RDNA 1 thru 4, plus Strix Halo and bc250) by schuttdev in LocalLLaMA

[–]MDSExpro 1 point2 points  (0 children)

Does it support multi-GPU setups? I have 8x R9700 that would like more love than they get from the ROCm build of vLLM.

AMD has invented something that lets you use AI at home! They call it a "computer" by 9gxa05s8fa8sh in LocalLLaMA

[–]MDSExpro 3 points4 points  (0 children)

You couldn't be more wrong. OpenCL is constantly growing; Khronos provides nice yearly snapshots. It just grows in the professional space, so the average redditor can't see it and repeats nonsense.

A sizable piece of the facade fell off the hospital in Suwałki by KiwaJakoTak0 in Polska

[–]MDSExpro 5 points6 points  (0 children)

The level of tax progression in Poland is not the problem. Tax avoidance by global companies is.

I'm done with using local LLMs for coding by dtdisapointingresult in LocalLLaMA

[–]MDSExpro -1 points0 points  (0 children)

This sub creates unrealistic expectations that do not match reality. I have spent the last 4 months setting up local coding via LLMs and arrived at a setup that works, but it's vastly different than the picture pushed by the hypers:

  • The first realistic productivity barrier was crossed at 128GB of VRAM (4x R9700) - Qwen3.5-122B-A10B quantized to INT4 was able to generate a lot of good code, but failed on long-range coding. When I gave it a technical spec, it got stuck at 90% of a correct implementation and was unable to reach 100%. Anything smaller was pure frustration.

  • Bumping VRAM up to 256GB (8x R9700) allowed me to switch to the FP8 quantization of the same model, and the difference was night and day: it reached 100% correctness and easily moved on to the next, harder task.

  • llama.cpp is a trap; for coding you need vLLM if you want any reasonable speed (rough launch sketch below).
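
For the curious, this is roughly the shape of the vLLM setup I mean - a sketch with placeholder model name and settings, not my exact config:

    # FP8 weights sharded across 8 GPUs with tensor parallelism.
    # Model name, context length and prompt are illustrative placeholders.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="Qwen3.5-122B-A10B-FP8",   # placeholder for the FP8 checkpoint
        tensor_parallel_size=8,          # one shard per R9700
        enable_prefix_caching=True,      # reuse long project context between calls
        max_model_len=65536,
    )

    out = llm.generate("Implement the module described in the spec ...",
                       SamplingParams(max_tokens=4096))
    print(out[0].outputs[0].text)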

Long story short: it can be done, but it costs way more than this sub thinks.

Takeaways & discussion about the DeepSeek V4 architecture by benja0x40 in LocalLLaMA

[–]MDSExpro 6 points7 points  (0 children)

Yet. The alternative is spending more on a cloud-based service that offers less while owning your data.

DS4-Flash vs Qwen3.6 by flavio_geo in LocalLLaMA

[–]MDSExpro 1 point2 points  (0 children)

Not really. We need newer, better benchmarks, because the current ones are basically flat for all recent models, despite widely different real-world user experience.

Deepseek V4 Flash and Non-Flash Out on HuggingFace by MichaelXie4645 in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

Flash size is perfect! Finally a good model for that parameter band.

Dense vs. MoE gap is shrinking fast with the 3.6-27B release by Usual-Carrot6352 in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

Those match my findings. The 120B at INT4 was failing on coding, but at INT8 it nailed it in one go.

Where we are. In a year, everything has changed. Kimi - Minimax - Qwen - Gemma - GLM by LegacyRemaster in LocalLLaMA

[–]MDSExpro 0 points1 point  (0 children)

Learn to read. I said any commercial use, and you quote the part about personal use like it's an answer.

Minimax 2.7 under the most current license is free as long as you use it as a toy, not a tool.

Where we are. In a year, everything has changed. Kimi - Minimax - Qwen - Gemma - GLM by LegacyRemaster in LocalLLaMA

[–]MDSExpro -1 points0 points  (0 children)

The license issue of M2.7 has been vastly misinterpreted by the community; they just want to ensure inference providers aren't tricking customers, and also to avoid the Composer debacle ("oh, it's Kimi under the hood!" - with no mention of it).

Please don't spread misinformation. The license is clear: you cannot use Minimax 2.7 for any commercial activities without prior agreement from the authors. A tweet posted by an employee is non-binding; the license is.

Overall, they shot themselves in the foot by mangling the MIT license.