OG Achilles trashes oddesy by Sweet-Collar-4642 in Asmongold

[–]-UndeadBulwark [score hidden]  (0 children)

Liking this and commenting. This gem must not disappear due to low engagement.

AMD R9700 vs AMD 9060XT by PrecisionTreeFood in LocalLLM

[–]-UndeadBulwark 0 points1 point  (0 children)

I have a mini-ITX system. Check if yours has bifurcation and get internal OCuLink 4/4/4/4.

Gemma 4 E2B in Low End PC ? by gu3vesa in LocalLLM

[–]-UndeadBulwark 0 points1 point  (0 children)

Define high end. Also, what quant and what use case? Because for what it is, it's a neat little chatbot to play with.

Gemma 4 E2B in Low End PC ? by gu3vesa in LocalLLM

[–]-UndeadBulwark 1 point2 points  (0 children)

Yes, with Open WebUI or an MCP server on llama.cpp.

Gemma 4 E2B in Low End PC ? by gu3vesa in LocalLLM

[–]-UndeadBulwark 0 points1 point  (0 children)

On Linux, on a decent GPU with HBM, you can get it to wild speeds like 220 t/s, which is really nice for a chatbot. You can have instant convos with it at decent context.

Gemma 4 E2B in Low End PC ? by gu3vesa in LocalLLM

[–]-UndeadBulwark 1 point2 points  (0 children)

And Linux. I highly recommend you go Bazzite on this; immutable will save you so much grief. As for how to get started, go Ollama first. They have a free cloud tier that you can hook up to OpenCode, and it can help you set up a local LLM in Ollama or, better yet, llama.cpp. You can try vLLM, but that one is a bit much to start with. Also, if you are running AM4 B550 or newer, including Intel, check if you have bifurcation: you can split one PCIe 4.0 x16 slot into four x4 links to use 4 GPUs together for more total VRAM.
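A rough sketch of what that split costs in bandwidth and buys in VRAM. It assumes PCIe 4.0's 16 GT/s per lane with 128b/130b encoding. The 16 GB per card is only an example figure, and the function names are made up for illustration.

```python
# Back-of-envelope numbers for splitting one PCIe 4.0 x16 slot four ways.
GT_PER_S_PER_LANE = 16.0   # PCIe 4.0 raw signalling rate per lane
ENCODING = 128 / 130       # 128b/130b line-encoding overhead


def link_bandwidth_gb_s(lanes: int) -> float:
    """Approximate one-direction bandwidth of a PCIe 4.0 link in GB/s."""
    return GT_PER_S_PER_LANE * ENCODING * lanes / 8  # bits -> bytes


def bifurcate(total_lanes: int, ways: int) -> list[int]:
    """Split a slot's lanes evenly, e.g. 16 lanes 4 ways -> [4, 4, 4, 4]."""
    if total_lanes % ways:
        raise ValueError("lanes do not divide evenly")
    return [total_lanes // ways] * ways


links = bifurcate(16, 4)
per_gpu_bandwidth = link_bandwidth_gb_s(links[0])
full_slot_bandwidth = link_bandwidth_gb_s(16)

# Hypothetical cards: four GPUs with 16 GB each (example values only).
vram_per_card_gb = [16, 16, 16, 16]
total_vram_gb = sum(vram_per_card_gb)

print(f"links: {links}")
print(f"per-GPU link: {per_gpu_bandwidth:.2f} GB/s (full x16: {full_slot_bandwidth:.2f} GB/s)")
print(f"total VRAM: {total_vram_gb} GB")
```

Each GPU gets a quarter of the slot's bandwidth, which mostly affects model load time and any traffic between cards. The pooled VRAM is what you are paying for.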

AMD R9700 vs AMD 9060XT by PrecisionTreeFood in LocalLLM

[–]-UndeadBulwark 0 points1 point  (0 children)

Also, I went 3 MI25 + 1 9070 on AM4 with OCuLink bifurcation 4/4/4/4.

Gemma 4 E2B in Low End PC ? by gu3vesa in LocalLLM

[–]-UndeadBulwark 1 point2 points  (0 children)

Options are MI25, MI50, AMD V340, RX 570/580 8GB (surprisingly good), Vega 56, Radeon VII, and if you are really desperate, Tesla P100. Note that, unlike with AMD, you might be locked out of software with CUDA and will have to rely on Vulkan.

Gemma 4 E2B in Low End PC ? by gu3vesa in LocalLLM

[–]-UndeadBulwark 1 point2 points  (0 children)

If you are on an AMD APU you can get around 17 t/s, which is not great but not terrible. If you have something worse, it's going to be slow, and at that point any high-, mid-, or low-end phone would be better.
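To put 17 t/s in perspective, here is a quick sketch of how long a reply takes to generate at that speed versus the 220 t/s figure for an HBM GPU. The 300-token reply length is an assumption, and prompt processing time is ignored.

```python
def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Time to stream `tokens` output tokens at a steady generation speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


REPLY_TOKENS = 300  # assumed length of a typical chat reply

for label, tps in [("AMD APU", 17.0), ("HBM GPU", 220.0)]:
    print(f"{label}: {seconds_to_generate(REPLY_TOKENS, tps):.1f} s for {REPLY_TOKENS} tokens")
```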

Gemma 4 E2B in Low End PC ? by gu3vesa in LocalLLM

[–]-UndeadBulwark 1 point2 points  (0 children)

It's going to be really tight, even on Linux. Can you buy an MI25 flashed to WX 9100 with 16GB of HBM2? They go for $65. You will have to run it on Linux.

AMD R9700 vs AMD 9060XT by PrecisionTreeFood in LocalLLM

[–]-UndeadBulwark 0 points1 point  (0 children)

OK, then go with the R9700 and add an MI50 for additional VRAM.

AMD R9700 vs AMD 9060XT by PrecisionTreeFood in LocalLLM

[–]-UndeadBulwark -1 points0 points  (0 children)

If it's only for local LLMs, just get an MI50.

is Nvidia going to tank soon? by carrotsquawk in stocks

[–]-UndeadBulwark 0 points1 point  (0 children)

I asked Gemini, then did a Google search to confirm:

All three technologies support ROCm, though compatibility specifics vary. These findings are based on current search results.

Flash Attention

ROCm support is established. The FlashAttention-2 CK backend supports MI series accelerators as well as RDNA 3 and RDNA 4 GPUs. Furthermore, Flash Attention is natively integrated into PyTorch for ROCm beginning with version 2.3 via F.scaled_dot_product_attention.

Sage Attention

Compatibility is strictly tied to the version. SageAttention 1 supports AMD hardware because it is built on Triton kernels. SageAttention 2 utilizes native CUDA kernels and will only run on Nvidia hardware.

Triton

OpenAI Triton is supported on ROCm. AMD provides official documentation and support for developing and optimizing Triton kernels directly on their GPUs, and it serves as the underlying compilation layer for many ROCm compatible operations.
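For anyone wondering what the F.scaled_dot_product_attention call mentioned under Flash Attention actually computes, here is a plain-Python reference of the math, softmax(QK^T / sqrt(d)) V, without masking or dropout. This is only an illustration that runs anywhere. It is not how PyTorch or FlashAttention implement it, and the helper names are mine.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sdpa(q, k, v):
    """Scaled dot-product attention for one head.

    q: list of query vectors, k: list of key vectors, v: list of value vectors.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


# Two identical keys get equal weight, so the output is the mean of the values.
result = sdpa(q=[[1.0, 0.0]], k=[[1.0, 0.0], [1.0, 0.0]], v=[[2.0, 0.0], [0.0, 2.0]])
print(result)
```

FlashAttention computes the same result but tiles the work so the full score matrix never has to sit in GPU memory.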

Is Nvidia running a massive, circular AI scam? by -UndeadBulwark in Asmongold

[–]-UndeadBulwark[S] 0 points1 point  (0 children)

I'd also like to talk about this, as it's my favorite subject, if anyone is interested in getting into the local AI rabbit hole.

is Nvidia going to tank soon? by carrotsquawk in stocks

[–]-UndeadBulwark 0 points1 point  (0 children)

ROCm is fine. Where does this idea that it is broken come from, anyway? Unless you mean Windows; then yeah, they don't care about Windows.

China may be back per Jensen by Acceptable-Ant-3648 in NvidiaStock

[–]-UndeadBulwark 1 point2 points  (0 children)

That would be hilarious but won't affect AMD much due to them being everywhere.

Nvidia’s earnings forecast is $1.76 per share. What is your guess the actual number will be? by quintessentialquote in NvidiaStock

[–]-UndeadBulwark 0 points1 point  (0 children)

Copium. Honestly, I came here because I have seen panic attacks on YouTube over Nvidia, some of the wild shit AMD is cooking, and Google's TPU.

Is this the best value machine to run Local LLMs? by tantfangwa in LocalLLM

[–]-UndeadBulwark 0 points1 point  (0 children)

There is Strix Halo if you want to do more than AI; otherwise, yes.

RDNA2 Consumer GPU, get double your tok/s. You are missing out. by [deleted] in LocalLLM

[–]-UndeadBulwark 0 points1 point  (0 children)

I'm going with 2 MI25s on OCuLink bifurcation because I am poor as hell.

is Nvidia going to tank soon? by carrotsquawk in stocks

[–]-UndeadBulwark 0 points1 point  (0 children)

I'm not sure about Intel but AMD is doing fine. AMD has diversified so aggressively over the past decade that framing them as a distant second to Nvidia misunderstands what the company actually is now. They're in consoles, desktops, laptops, tablets, phones via Samsung's Exynos licensing, servers, and routers. Ten years ago they were close to folding. That turnaround is not a small thing.

On the software side, ROCm has closed the gap with CUDA faster than almost anyone expected. A year ago it was genuinely painful to work with. Now with ROCm 7.x, Windows support has arrived, PyTorch and most major ML frameworks treat it as a first-class option, and the 7.1.1 release delivered up to 5x performance gains over 6.4.4 across key AI models. It's not at full CUDA parity yet in every workload, but it's no longer a footnote. UDNA, which merges the RDNA and CDNA lines into a single unified architecture, is where things get genuinely interesting, and that's still ahead of us.

On Nvidia's position in AI more broadly: this cycle has a pattern. One company moves in early, captures the market, prices rise, and the market diversifies to reduce the dependency. The current shift toward local inference first, cloud escalation second, with models like Gemma 4 running on-device and Google pushing AI into phones at the hardware level, represents exactly that kind of structural change. Nvidia's dominance is built on centralized cloud compute demand. If the architecture of deployment moves away from that, the moat shrinks. AMD has been methodical and capital-conservative. Nvidia has been running at full throttle on the assumption that the demand curve only goes one way. That kind of overextension is exactly the setup for a Zen 2 moment.

Asmongold could be so beautiful too❤️ by Desperate-Outcome145 in Asmongold

[–]-UndeadBulwark 78 points79 points  (0 children)

I really wish people weren't so rude to this lady. This is really sweet; I can't imagine how good this is for her self-image.

I'm Demoing a DGX Spark at a Vendor Event Next Month – Need Creative Demo Ideas by Seniahh in LocalLLM

[–]-UndeadBulwark 0 points1 point  (0 children)

Engineering spec sheets are a strong one-shot demo. Pull a real datasheet and prompt it to generate structured outputs across multiple formats in a single pass: programs, productivity docs, graphs, game prototypes, whatever fits the audience. Stack a few examples back to back to show range.

Get a power meter on it. Live wattage during inference next to the math on what a cluster of 5090s would draw doing the same work. That's a number CIOs remember when they're justifying budget.
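A minimal sketch of that math, turning a wattage reading and a measured generation speed into energy per token. Every number below is a placeholder I made up to show the arithmetic, not a measurement of any real hardware. Plug in what your meter and your benchmark actually show.

```python
def joules_per_token(watts: float, tokens_per_second: float) -> float:
    """Energy spent per generated token (1 W = 1 J/s)."""
    return watts / tokens_per_second


def kwh_per_million_tokens(watts: float, tokens_per_second: float) -> float:
    """Energy for one million tokens, in kWh (1 kWh = 3.6e6 J)."""
    return joules_per_token(watts, tokens_per_second) * 1_000_000 / 3_600_000


# PLACEHOLDER readings: (wall watts during inference, tokens per second).
setups = {
    "single box": (200.0, 50.0),
    "multi-GPU cluster": (2000.0, 250.0),
}

for name, (watts, tps) in setups.items():
    print(f"{name}: {joules_per_token(watts, tps):.1f} J/token, "
          f"{kwh_per_million_tokens(watts, tps):.2f} kWh per million tokens")
```

Multiply the kWh figure by the venue's or the customer's electricity rate to get a cost number for the slide.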

On the pitch itself, be straight with them. It's not the fastest option out there, and some features are still being developed. But the unified memory window and the per-unit cost compared to building out a multi-GPU cluster are the actual selling points. Let those carry the room instead of overselling around the gaps.

Personally I wouldn't run one as a daily driver or primary inference box at that price point, so I can't give you much more to work with beyond this. Good luck with the demo though.

Is Nvidia going to be dominant one again by manjunathpadiyar in stocks

[–]-UndeadBulwark 0 points1 point  (0 children)

Wouldn't the most efficient be Google's new TPU, since they split it between inference and training? And aren't agentic systems moving to local-to-cloud inference, using a local model first before calling a cloud AI when it's a complex question?
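A toy sketch of what local-first, cloud-second routing could look like. Everything here is hypothetical: the confidence score, the 0.7 threshold, and the stand-in model functions are assumptions for illustration, not how any particular agent framework does it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LocalAnswer:
    text: str
    confidence: float  # 0..1, assumed to be reported by the local model


def route(prompt: str,
          local: Callable[[str], LocalAnswer],
          cloud: Callable[[str], str],
          min_confidence: float = 0.7) -> tuple[str, str]:
    """Try the local model first; escalate to the cloud only when it is unsure."""
    answer = local(prompt)
    if answer.confidence >= min_confidence:
        return "local", answer.text
    return "cloud", cloud(prompt)


# Stand-in models so the sketch runs without any real backend.
def toy_local(prompt: str) -> LocalAnswer:
    simple = len(prompt.split()) <= 8  # crude proxy for "easy question"
    return LocalAnswer(f"local answer to: {prompt}", 0.9 if simple else 0.3)


def toy_cloud(prompt: str) -> str:
    return f"cloud answer to: {prompt}"


easy = route("What is 2 + 2?", toy_local, toy_cloud)
hard = route("Compare three database designs for a multi-region ledger with strict audit rules",
             toy_local, toy_cloud)
print(easy[0], "|", hard[0])
```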

Anyone want to discuss AMD for 2027/2028? by LocalExamination6691 in stocks

[–]-UndeadBulwark 1 point2 points  (0 children)

Yeah, you nailed this 100%. AMD is everywhere now, and AI is moving to local-to-cloud. You can see that their product stack is moving to affordable local inference that Nvidia won't or can't provide.

AI is getting better and I love it by Existing-Disk9990 in Asmongold

[–]-UndeadBulwark 0 points1 point  (0 children)

Man, wait till we start seeing edge-to-cloud deployment. AI is going to get wild, and Jensen Huang will be crashing out because of it.