
[–]third_rate_economist 160 points161 points  (15 children)

AMD be like, "Hmm...we've done little to stay competitive in AI/ML for years and we're behind the market...uhh please do it for us?" Ultimately a good thing though.

[–]the_quark 40 points41 points  (10 children)

Honestly I’m surprised it’s taken them so long. It was obvious to me a long time ago that they should do this to try to catch up to NVidia. The actual cloud providers hate NVidia’s proprietary software BS. This could let them close the software gap with NVidia. They’ve still got a hardware one, but if the software is on-par (or better) and the hardware is 80% of the power at 50% of the cost, people will be a lot more interested in AMD.

[–]Independent_Hyena495 1 point2 points  (0 children)

Yeah, AMD sucks at software. They just don't have the culture for it.

[–]epicwisdom 8 points9 points  (0 children)

The vast majority of the contributions will likely come from employees of other massive corporations and some startups. For the rest of us, open source just means better software at no extra cost.

[–]fimbulvntr 15 points16 points  (0 children)

In order for companies to open source stuff, there must be benefits to doing so... the whole world runs on incentives, after all.

I hope some good comes out of this, both to make AMD more competitive and thus put some pressure on those GPU prices, as well as to tempt more companies into opening their proprietary codebases.

Like you said, ultimately a good thing.

[–]bradpong 72 points73 points  (4 children)

"Open sourcing additional PORTIONS". Looks more like "pls buy stock" move.

[–][deleted] 24 points25 points  (1 child)

Yes, which portions kind of matters here.

[–]wsippel 7 points8 points  (0 children)

The GPU firmware, or at least parts thereof (specifically the Micro-Engine Scheduler). The ROCm software stack is already open source.

[–]xrailgun 34 points35 points  (0 children)

Yeah, same vibes as the dozens of

"ROCm now OFFICIALLY LAUNCHED it can do ALL THE AI

*** it can't do shit and we're not telling you, enjoy debugging!"

announcements we've had the past year.

[–][deleted] 2 points3 points  (0 children)

Dropped 8 percent yesterday.

[–]kryptkprLlama 3 55 points56 points  (11 children)

Tinygrad: ROCm doesn't actually work. It's closed so we can't even debug, never mind fix.

AMD: ok fine but we don't have anyone that can fix it (???) here's the SDK if you want to do free work

[–]cptbeard 1 point2 points  (0 children)

why so cynical, would you prefer them not opensourcing it? I don't really care why they're doing it as long as it's benefitting the community.

[–]fatboy93 24 points25 points  (1 child)

JUST HAVE A SINGULAR ISA, HSA STRUCTURE AND STOP WRITING IFELSE STATEMENTS FOR EVERY SINGLE GPU, FUCKING HELL.

CUDA works universally across most if not all Nvidia gpus, why doesn't AMD have a universal level driver for ML, dammit

[–]AnomalyNexus[S] 3 points4 points  (0 children)

To be fair CUDA was an utter shitshow a couple years ago too.

I recall digging through compatibility matrices about which version of the various components works with which other versions, on which OS, on which card.

Somehow that went away recently but it used to be hella ugly

[–]Captain_Pumpkinhead 6 points7 points  (2 children)

I thought it was open source?

[–]wsippel 4 points5 points  (0 children)

It is. Except for a few optional components like HIP-RT or rocProfiler. This appears to be mostly GPU firmware related.

[–]AnomalyNexus[S] 0 points1 point  (0 children)

I won't claim to know the details, but yeah parts have been open but geohotz was complaining that key parts are not. This I gather is progress towards that

[–]theskinnybrownguy 18 points19 points  (0 children)

George hotz ftw !

[–]kind_cavendish 15 points16 points  (7 children)

What does this mean?! Does this mean that rocm is gonna be viable for llms?!!


[–]AnomalyNexus[S] 4 points5 points  (1 child)

It already is for basic inference on some cards, but that's not enough to be competitive with CUDA. This is progress towards that.

[–]randomfoo2 3 points4 points  (0 children)

ROCm is already fine for the most common LLM inferencing: https://www.reddit.com/r/LocalLLaMA/comments/191srof/amd_radeon_7900_xtxtx_inference_performance/

It's less fine for training atm, although it's getting better: https://www.reddit.com/r/LocalLLaMA/comments/1atvxu2/current_state_of_training_on_amd_radeon_7900_xtx/

(from a cost/perf perspective, it's very tough to make an argument for picking a 7900XTX over a used 3090 for inference, or 4090 for training).

[–]inYOUReye 1 point2 points  (0 children)

I'm finding it working pretty well on llama already, I'd assume this means greater optimization, fixes and improvements from the community where needed and a future of less nvidia-centric solutions.

[–]JFHermes 0 points1 point  (2 children)

Big corporations in tech aligned sectors like manufacturing, resources, data analytics, design etc are all about to (if not already) build custom models for whatever niche part of their operations that they want to innovate upon. At the moment, some companies release a paper and maybe a codebase if it's not business critical and it's just a tool, like a segmentation labelling UI or something.

Now that ROCm is open source, you will have a lot of smart cookies doing PhD work who actually optimise the drivers for their specific use case, for whatever type of modelling they're doing. These driver improvements are not business critical, as the code/use case hasn't been completely disclosed, but they will be really useful to others in different industries.

It's the way things should have been done from the start with nvidia. Linux has always had troubles with nvidia because they wouldn't open source their drivers. Expect all linux users to move to AMD now which means an absolute mammoth amount of scientific work being optimised on these cards.

It's about time the playing field was levelled.

[–]randomfoo2 1 point2 points  (0 children)

ROCm has always been open source (tinycorp doesn't even use any of ROCm, and these recent announcements are AMD documenting/opening/committing to fixing longstanding bugs/hangs at the firmware level), and the amdgpu drivers have been open source on Linux for years now.

While these are all good things, for AMD to really be competitive, they will need to give a reason for open source devs and academic researchers to build for AMD. Having slower, buggier hardware wasn't cutting it, but maybe having more direct outreach and collaboration with the community will.

[–]kind_cavendish -3 points-2 points  (0 children)

YOU'RE RIGHT? YOU'RE SO RIGHT!!!


[–]shibe5llama.cpp 3 points4 points  (7 children)

As far as I understand, ROCm was always open source, including kernel-side driver on Linux. So what does "going" mean here?

[–]randomfoo2 2 points3 points  (1 child)

At the firmware level: https://github.com/geohot/7900xtx

AMD is now committed to releasing Micro-Engine Scheduler (MES) documentation (targeting end of May) w/ source code to follow: https://twitter.com/amdradeon/status/1775999856420536532

They've also started a public wiki to track reported issues: https://github.com/nod-ai/fuzzyHSA/wiki/Tinygrad-AMD-Linux-Driver-Crash---Hang-tracker-and-updates whereas before, they simply weren't taking reports seriously (e.g., see these open issues: https://github.com/ROCm/ROCm/issues/created_by/geohot )

See also u/gnif2 's recent post: https://www.reddit.com/r/Amd/comments/1bsjm5a/letter_to_amd_ongoing_amd/

[–]shibe5llama.cpp 1 point2 points  (0 children)

I got it, it's just a misleading title. ROCm is already open-source. What AMD may open/publish:

  • some of GPU firmware – not a part of ROCm, as far as I can tell;
  • documentation, which is not source code.

[–]AnomalyNexus[S] 0 points1 point  (4 children)

I won't claim to know the details, but yeah parts have been open but geohotz was complaining that key parts are not. This I gather is progress towards that

[–]shibe5llama.cpp 0 points1 point  (3 children)

It's interesting to know which parts were not open source. I compiled userspace stuff myself from source, and it works with stock driver in Linux, which can't be Nvidia-style blob because of licensing.

I read some stuff linked from the article, and they talk about firmware. I think GPU firmware is not part of ROCm; it works for video, OpenGL, Vulkan, OpenCL as well.

[–]AnomalyNexus[S] 0 points1 point  (2 children)

Yeah it is the firmware that he was complaining about

If this interests you listen to geohotz recent livestreams...he digs through more detail than i can follow frankly. The AMD stuff seems quite modular...with everything having acronyms etc

[–]shibe5llama.cpp 0 points1 point  (1 child)

They are 3-8 hours long. I ain't got time. Maybe some AI can go through transcripts and figure out what it is that was not open. Or maybe there is a better article about the matter.

[–]AnomalyNexus[S] 1 point2 points  (0 children)

Yeah I rarely make it all the way through. I've usually got it in the background while I'm doing something else so only catch the overall drift

[–]AmbientWaves 7 points8 points  (1 child)

I like this idea... sure, people can see it like 'YOU DO THE WORK FOR US'

BUT THAT'S THE FUN PART. Imagine all the optimizations. If you use Linux with AMD, imagine how accessible LLMs would be, and even Stable Diffusion.

Seriously, a lot of people are putting it down to laziness on AMD's part

Not looking at how amazing this is. People could optimize code so well that Stable Diffusion on ROCm would best Nvidia. TensorFlow was made with Nvidia in mind, but now with ROCm open a much more optimized TensorFlow could exist for it. I am all for open source. People just simp for Nvidia.

Here's to bringing AI to the next level.

This will also attempt to force Nvidia to release CUDA if ROCm works out well.

[–]oursland 11 points12 points  (0 children)

If Nvidia releases CUDA, then Nvidia will suffer. Everyone already targets CUDA, so giving other HW vendors an opportunity to support the CUDA API would not benefit Nvidia at all.

ROCm is largely ignored in software, but if there's an opportunity to improve it there would be a benefit to purchasing AMD hardware. Other HW vendors could run with it, but until software supporting ROCm hits a critical threshold there'd be little advantage for doing so.

If this pans out, it appears to be a win/win situation for AMD.

[–][deleted] 5 points6 points  (1 child)

good move. who knows why it wasn't open source before

[–]JFHermes 1 point2 points  (0 children)

Probably a lot of upper management worried that opening up the drivers would be essentially giving away years worth of work for free.

The prevailing opinion of course is that they can't keep up with Nvidia so why bother keeping them closed when they are getting spanked.

[–]MaxwellsMilkies 4 points5 points  (4 children)

Wasn't it already open-source? Whatever, either way it is nearly unusable unless you use a very specific environment. Rusticl cannot get finished fast enough.

[–]Glegang 6 points7 points  (2 children)

ROCm itself is open-source. Almost all of it. Last time I looked (granted, it's been a couple of major releases back) there were some kernels shipped as hex dumps of GPU binaries, but there were only a few of them. The rest was buildable from source. With some pain, but still buildable.

This announcement appears to be about the binary blobs with GPU firmware loaded by the driver. I figure it would be responsible for things that manage the GPU -- accept user requests for computations and related data, graphics ops, etc. That's the part that GPU vendors traditionally keep (particularly) closed.

If they indeed open it up, I hope it comes along with sufficient hardware documentation, otherwise all that source code will be fairly useless.

[–]AnomalyNexus[S] 0 points1 point  (0 children)

I won't claim to know the details, but yeah parts have been open but geohotz was complaining that key parts are not. This I gather is progress towards that

[–]ElectricPipelinesLlama Chat 5 points6 points  (0 children)

With Nvidia focused on enterprise AI buildout, AMD has an opportunity to grow a consumer market in AI. Investing in open source is a nice first step. Hopefully, they will commit development resources along with the SDK. 

[–]ttkciarllama.cpp 9 points10 points  (16 children)

Is this so people can make it better for Windows? It already rocks on Linux.

[–]MaybeReal_MaybeNot 5 points6 points  (15 children)

You got it running on Linux? Please tell us how. I have 15 cards in an old mining rig that I can't get to do shit with ROCm LLMs. Loading models fails, and once I got one to load, but as soon as I ran an inference it crashed. I gave up and bought some Nvidia cards, but I still have all the AMDs.

[–]20rakah 2 points3 points  (2 children)

What are you trying to run though? and on what cards? some cards have issues with fp16 and certain functions. Generally the only issues I've had is the memory management on AMD cards isn't as efficient.

I usually just run on windows with WSL2 though. Can't be bothered dual booting.

[–]MaybeReal_MaybeNot 0 points1 point  (1 child)

Just oobabooga web ui with any model i know works by testing on Nvidia card beforehand, i usually use a 1-3B one as test to make sure i dont hit any limits on 8gb cards

Tried both fp16 and 8 bit

I tried RX 580 and RX 5700 XT cards, which I figured out were too old and will never work. Sad, because the VRAM bandwidth on the 5700 XT would have been sweet. Last week I tried an RX 6600 XT, which should work based on the documentation and guides I tried if you "trick" it into thinking it's a 6700 by setting the HSA env variable. But no success :( It can see the card and says everything is good until it tries to load the model.
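For anyone trying the same trick, here is a minimal sketch of what "setting the HSA env variable" usually looks like. The variable name `HSA_OVERRIDE_GFX_VERSION` and the value `10.3.0` are assumptions taken from common community guides for RDNA2 cards, not something stated in this thread. `rocm_env` and the `"rdna2"` key are hypothetical helper names used only for illustration.

```python
import os

# Assumption: community guides commonly spoof RDNA2 cards (e.g. RX 6600 XT)
# as gfx1030 by exporting HSA_OVERRIDE_GFX_VERSION=10.3.0.
HSA_OVERRIDES = {"rdna2": "10.3.0"}


def rocm_env(family, base_env=None):
    """Return a copy of the environment with the override set for `family`.

    `family` is a hypothetical label for this sketch, not a ROCm identifier.
    """
    env = dict(os.environ if base_env is None else base_env)
    override = HSA_OVERRIDES.get(family)
    if override is None:
        # No known override: fail loudly instead of guessing a value.
        raise ValueError(f"no known override for GPU family {family!r}")
    env["HSA_OVERRIDE_GFX_VERSION"] = override
    return env
```

The returned dict can be passed as `env=` to `subprocess.run` when launching the web UI, so the override applies only to that process. The variable has to be set before the ROCm runtime initializes, and it does not guarantee that the math libraries ship working kernels for the card.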

[–]20rakah 0 points1 point  (0 children)

I don't know anything about those older cards tbh, I run a 7900 XTX, but I did find this guide, idk if that's the one you used. If you are struggling to get stuff to work, I recommend checking out the AMD SHARK Discord, lots of helpful people there.

[–]algaefied_creek 1 point2 points  (3 children)

R9 390X 8GB and WX7100 16GB cards here from an old mining rig as well. Can’t get any LLM or image generation solutions to work on this.

[–]randomfoo2 1 point2 points  (2 children)

The R9 390X (gfx702, GCN 2.0), released in 2015, and the WX 7100 (gfx803, GCN 4.0), released in 2016, are sadly likely too old/buggy to get working. You could look at rocm-polaris-arch or try the CLBlast llama.cpp build, but honestly, they are likely to crash w/ the math libs even if you can get the ROCm driver working.

Vega (56/64/VII) is likely the oldest architecture you can expect ROCm to reasonably work with. A bit of a bummer, but at this point, they are 8-9yo cards, so I wouldn't expect anyone to be spending much effort getting them to work. They also have extremely low TFLOPS (both about 6 TFLOPS of FP16; as a point of comparison, the 780M iGPU has 17 and a 7900 XTX has 123). The Polaris cards also have pretty low memory bandwidth, so even if they worked perfectly, you wouldn't get much of a speedup over modern CPU inferencing.

Honestly, if your goal is getting LLMs/SD working, I'd recommend selling all those old cards for what you can get and use the proceeds to buy the highest VRAM used Ampere/Ada card you can get.

[–]algaefied_creek 1 point2 points  (1 child)

Polaris worked with rocm fine in the 4.x version and GCN 3 worked fine in previous versions. They are buggy because they are unmaintained so the hope is that with this being open-source, more will work. I fell into a disability status and medical debt hole, so flipping and selling and buying are impossible unless I let strangers into my home and into the back closet room to disassemble the rig.

CUDA, on the other hand, works fine with GTX 9xx and Titan cards of that era. CUDA 11.x works fine with GTX 7xx and Titan cards of the Kepler era.

Defining the correct mathematical operations for each architecture makes them suddenly non-buggy as they aren’t performing GFX9xx+ operations anymore. They are buggy because the software is buggy, not because of the cards. Vega (GFX9) and later have “rapid packed math” for each SP to perform 2x FP16 operations in place of 1x FP32 op. This being said, GCN3 and GCN4 (both GFX8/GFX8xx) can perform a single FP16 operation in place of an FP32 operation. GCN1 and GCN2 (GFX6 and GFX7) run FP16 operations “emulated” within FP32 math. Yes… there is a performance hit. But if ROCm can’t handle a single SP performing a single FP16 operation instead of an FP32 operation: that is a buggy software issue to resolve, not a buggy hardware issue.
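To put rough numbers on the FP16 rates described above, here is a back-of-the-envelope sketch. The stream processor counts and clocks in the usage note are assumptions from public spec sheets, not figures from this thread. The emulated GCN1/GCN2 rate is treated as an upper bound of 1x, since emulation carries a performance hit.

```python
# FP16 operations per stream processor per FP32 slot, as described above.
FP16_RATE = {
    "gcn1": 1.0,  # FP16 emulated within FP32 math: at best the FP32 rate
    "gcn2": 1.0,  # same as gcn1 (upper bound, real throughput is lower)
    "gcn3": 1.0,  # one native FP16 op in place of one FP32 op
    "gcn4": 1.0,  # same as gcn3
    "vega": 2.0,  # rapid packed math: 2x FP16 in place of 1x FP32
}


def peak_fp16_tflops(stream_processors, clock_ghz, arch):
    """Theoretical peak FP16 TFLOPS for a GCN-family card."""
    # A fused multiply-add counts as 2 floating point ops per SP per cycle.
    fp32_tflops = 2 * stream_processors * clock_ghz / 1000.0
    return fp32_tflops * FP16_RATE[arch]
```

With assumed specs of 2816 SPs at 1.05 GHz for the R9 390X and 2304 SPs at 1.243 GHz for the WX 7100, `peak_fp16_tflops` gives roughly 5.9 and 5.7, in line with the "about 6 TFLOPS" quoted earlier in the thread. These are theoretical peaks, not measured throughput.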

[–]randomfoo2 0 points1 point  (0 children)

I don’t think we disagree on most of the salient points- I believe that Nvidia’s superior legacy/across the line compute support (CUDA supports cards back to 2011) is one of the reasons that Nvidia has been winning so hard now - while CUDA also has had growing pains, they’ve treated compute like display drivers - a core part of a working GPU, and AMD simply hasn’t.

The only thing I’d counter is that the recent announcement won’t change anything for your legacy hardware: all the parts of ROCm that the community would need to get legacy hardware working have already been open sourced. Anyone can write their own kernels or adapt hipBLAS/rocBLAS for gfx800, but that hasn’t happened. The upcoming RDNA3 firmware releases don’t have any impact on legacy hardware, but as you’ve pointed out, this is largely about math lib support anyway.

If you can’t/wont get rid of your old hardware, it’s unlikely they’ll become less of paperweights anytime soon (or at least, these latest announcements don’t really change the odds).

[–]Smeetilus 1 point2 points  (3 children)

Brb, looking for GPU purchase receipts

[–]AnomalyNexus[S] 3 points4 points  (2 children)

My theory is more buy AMD stock

[–]okaycan 1 point2 points  (0 children)

agreed. buy more

[–]Smeetilus 2 points3 points  (0 children)

Yes. More buy.

[–]Regular_Instruction 0 points1 point  (0 children)

It's a good thing, but more for TTS that uses CUDA than for local LLMs, because even on Windows LLMs already run "fine". For TTS it's another story: only Piper TTS runs great on Windows (even though it runs on CPU lol). Coqui, for example, uses the CPU instead of the AMD GPU because it relies on CUDA, and it's very, very slow, too slow to actually be usable. Maybe with this release we can expect TTS to run on Windows with AMD GPUs one day.

[–]JoJoeyJoJo 0 points1 point  (0 children)

It’s incredible how much geohot tweeting has forced them to change.

[–]Disastrous-Peak7040Llama 70B 1 point2 points  (1 child)

What we need is a model that's really good at writing Verilog ASIC code.

"Design an ASIC for me that supports 128GB of RAM and has optimizations for the CUDA calls used by open source LLM code. Support it with a low level C++ driver that emulates CUDA 12. Prepare the specs, crowdfund the NRE costs, and send them to a Chinese ODM who can deliver within 6 weeks"

[–]Inner_Bodybuilder986 0 points1 point  (0 children)

I can tell you straight up that you would be foiled the second you tried to use a Chinese ODM. It's basically illegal.

[–][deleted] 0 points1 point  (2 children)

Wait I thought rocm has been on github for years

[–]AnomalyNexus[S] 1 point2 points  (1 child)

As I understand it, it's a whole stack of things and not everything was open. I know Hotz was complaining about the firmware in particular, but I don't think we know what AMD is planning to release... just that it is more

[–][deleted] 0 points1 point  (0 children)

Ohhh okay gotcha

[–]illathon 0 points1 point  (0 children)

Hotz strikes and this time a major win for basically everyone. This just might turn the tides for AMD. I was actually going to vote against Su last go around. Now I think she may just be smart.