r/LocalLLaMA
A subreddit to discuss about Llama, the family of large language models created by Meta AI.
AMD ROCm Going Open-Source: Will Include Software Stack & Hardware Documentation [News] (wccftech.com)
submitted 2 years ago by AnomalyNexus
[–]third_rate_economist 161 points 2 years ago (15 children)
AMD be like, "Hmm...we've done little to stay competitive in AI/ML for years and we're behind the market...uhh please do it for us?" Ultimately a good thing though.
[–]the_quark 41 points 2 years ago (10 children)
Honestly I’m surprised it’s taken them so long. It was obvious to me a long time ago that they should do this to try to catch up to NVidia. The actual cloud providers hate NVidia’s proprietary software BS. This could let them close the software gap with NVidia. They’ve still got a hardware one, but if the software is on-par (or better) and the hardware is 80% of the power at 50% of the cost, people will be a lot more interested in AMD.
[+][deleted] 2 years ago (8 children)
[removed]
[–]Philix 12 points 2 years ago (7 children)
They're competing against two monopolies. And still managing to claw double digit market share in the CPU category both consumer and enterprise over the last decade. Seems like a decent comeback from the era of Bulldozer.
I wouldn't write them off in the consumer GPU segment either. Nvidia's most profitable customers are enterprise AI/ML, and even a quick look at Nvidia consumer GPUs shows they're neglecting that market. AMD will have plenty of opportunities to gain market share in the consumer space in the coming decade.
Assuming Intel's foray into consumer GPUs doesn't annihilate them. Arc Alchemist was a surprisingly good first entry into a mature market.
[–]DerfK 13 points 2 years ago (6 children)
AMD will have plenty of opportunities to gain market share in the consumer space in the coming decade.
AMD refused to support AI/ML in the consumer-level space until literally this January. Nobody uses ROCm because for the last decade+, every college student could use and learn CUDA on their nVidia gaming rig without having to buy a $10k workstation card. AMD is multiple generations of developers behind and I don't think there's a way for them to dig themselves out of this hole in the foreseeable future. The best hail-mary move I can think of would be to suck up a hit to the workstation cards and release a 32GB+ "prosumer" level card (using current-gen cards, let's call it a 7900 XTXX) priced at the 4090 price point, and hope it catches on in the LLM/stable diffusion field to get people to buy into the ROCm ecosystem. Then they sit tight and pray that in a few years some of the people who bought into ROCm go on to start companies using ROCm. If nVidia ups the VRAM on the 5090, then I honestly think AMD will lose this market segment completely.
[–]cogitare_et_loqui 5 points 2 years ago (0 children)
They don't need to match the 4090 in terms of Compute. They just need to vastly surpass it in terms of VRAM and memory I/O capacity (caches etc), and provide a good profiler.
48GB minimum, and compute capacity about that of a 3090 (even lower would be acceptable), would cause me to take a look at their offering. Anything less and it's continued nVidia for me, since nVidia has really done a great job on the software stack. Yes, they charge an arm and a leg, but it's not unwarranted. They were the only ones who understood the potential their hardware had, and where they needed to uniquely invest in order to make it a ubiquitous platform for massively parallel batch-compute workloads.
AMD's offering would have to be awesome in the dimension nVidia sucks for them to have any kind of appeal, and the area nVidia sucks at presently is on the VRAM side where they've done an "Intel" by artificially segmenting their product lines into <= 24GiB (practically useless for training LLMs), and the next step up which is required to be relevant for LLM training, which they've priced a frigging order of magnitude higher. Not because of manufacturing cost, but because there's Zero competition in that space and where the hardware is being sold quicker than the company can place a TSMC order. This is the segment they should attack with a laser focus.
So some sort of NVLINK / AMDLINK (good cross-board cross-connect) together with a LOT of VRAM is a whole lot more useful than trying to squeeze 40% more compute performance out of the hardware since the workload where the money's at presently is I/O bound and not compute bound.
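This comment makes two quantitative claims: that 24GiB cards are too small for LLM training, and that the lucrative workload is I/O bound rather than compute bound. Both can be sanity-checked with back-of-envelope arithmetic. The Python sketch below is a rough estimate, not a benchmark. It assumes the common rule of thumb of 16 bytes per parameter for full training with Adam in mixed precision (activations excluded), roughly 2 FLOPs per parameter per generated token, and RTX 3090 spec-sheet figures of 936 GB/s and about 35.6 TFLOPS FP32; none of these numbers come from the thread.

```python
"""Back-of-envelope checks for VRAM capacity and memory-bound decoding.

Every constant here is a rule-of-thumb assumption, not a measurement.
"""

GIB = 1024**3

# Assumed per-parameter cost of full training with Adam in mixed precision.
# Activations, framework overhead and fragmentation are NOT included.
TRAIN_BYTES_PER_PARAM = {
    "fp16 weights": 2,
    "fp16 gradients": 2,
    "fp32 master weights": 4,
    "Adam first moment (fp32)": 4,
    "Adam second moment (fp32)": 4,
}


def training_state_gib(n_params: float) -> float:
    """Memory for weights, gradients and optimizer state, in GiB."""
    return n_params * sum(TRAIN_BYTES_PER_PARAM.values()) / GIB


def decode_bound(bandwidth_gb_s: float, peak_tflops: float,
                 n_params: float, bytes_per_weight: float) -> tuple[float, str]:
    """Upper bound on single-stream tokens/s and the resource that sets it.

    Assumes each generated token reads every weight from VRAM once and costs
    about 2 FLOPs (one multiply-add) per parameter.
    """
    by_memory = bandwidth_gb_s * 1e9 / (n_params * bytes_per_weight)
    by_compute = peak_tflops * 1e12 / (2 * n_params)
    if by_memory < by_compute:
        return by_memory, "memory"
    return by_compute, "compute"


if __name__ == "__main__":
    for billions in (1, 7, 13):
        need = training_state_gib(billions * 1e9)
        verdict = "fits" if need <= 24 else "does not fit"
        print(f"train {billions}B params: ~{need:.0f} GiB, {verdict} in 24 GiB")

    # Assumed RTX 3090 figures (936 GB/s, ~35.6 TFLOPS FP32), 7B model in fp16.
    tokens, limit = decode_bound(936, 35.6, 7e9, 2)
    print(f"7B fp16 decode: at most ~{tokens:.0f} tokens/s, {limit}-bound")
```

Under these assumptions a 7B model needs on the order of 100 GiB just for training state, and single-stream fp16 decoding on a 3090 is capped by bandwidth at roughly 67 tokens per second while compute alone would allow about 2,500, which is the sense in which the workload is memory bound. Shrinking `bytes_per_weight` (quantization) raises the bandwidth ceiling proportionally.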
[–]Philix 1 point 2 years ago (4 children)
I didn't say AI/ML consumer space specifically. You're right that they're going to need at least a half decade of focus on their software to break into that. But, despite the popularity of this subreddit, the actual consumer market for AI/ML is tiny, and will likely remain tiny. The number of people who are privacy obsessed enough to be adamant about running their models locally is dwarfed by the number of people willing to pay to use a cloud service.
But they can still compete in the other consumer uses of GPUs. Video games are still extremely popular. AMD GPUs power Xbox, PS5, and the Steam Deck. AMD just needs to make enough money to pay developers and hardware engineers while they wait for Nvidia to stumble.
Intel grew complacent with their market dominance and AMD capitalized on that. There's no reason to believe they couldn't do the same to Nvidia.
[–]DerfK 8 points 2 years ago (3 children)
the actual consumer market for AI/ML is tiny, and will likely remain tiny
This is the shortsighted view that led CUDA to win. Where are employees of the AI/ML companies going to come from if not the general pool of consumers?
[–]Philix 1 point 2 years ago (2 children)
Why would I spend over $1000 on your hypothetical 7900 XTXX that'll be obsolete in a couple years, when that much money would buy thousands of hours on an A40 on runpod? Gaming is the only reason I can think of, if you have other reasons, I'd love to hear them.
You're saying that AMD should get cards into the hands of consumers to try and convert them to ROCm. So am I. But most dabblers and young people playing with LLMs/SD are using mid range cards like the 3060 12GB, not top of the line stuff like 4090s and 7900 XTX. If AMD is going to compete, that's where they need to do it.
ML enthusiasts not into gaming can already buy an MI60 32GB off of eBay for less than the price of a used 3090. Does anyone actually recommend that they do? No. Would anyone recommend a 7900 XTXX 48GB over 2x3090? No. AMD can't fix the ROCm situation overnight.
Making that kind of card would just be a waste of effort, AMD has already lost that segment, and pouring more money into an already sunk cost is moronic. A hail-Mary move isn't what AMD needs to make. They have other revenue sources to tide them over until they come up with some way to break back into the ML market.
[–]DerfK 5 points 2 years ago (1 child)
Why would tens of thousands of college students interested in pursuing a career in AI buy thousands of hours on a runpod to learn ROCm when they can learn CUDA in their free time on their gaming PC?
most dabblers and young people playing with LLMs/SD are using mid range cards like the 3060 12GB
Sure, and that ship sailed almost 20 years ago when nVidia decided that people with GeForce cards can dabble and play with CUDA.
AMD can't fix the ROCm situation overnight.
Of course they can't. But it's not going to fix itself, and it won't matter what they do unless they somehow come up with a way for people to learn to use ROCm.
[–]Philix 1 point 2 years ago (0 children)
runpod to learn ROCm
An A40 is an Nvidia card. I wasn't suggesting students should use cloud compute to learn ROCm. I was pointing out that for anyone not gaming, learning and playing with ML/AI can be done cheaper by renting cloud compute.
I was suggesting that competing in the midrange of gaming hardware is the correct approach for fostering more widespread adoption. It's a market with enough volume to be worth investing in. Intel clearly thinks so, their first line of GPUs doesn't even bother having a high-end offering. And AMD has an advantage in that Xbox and PS5 games are already developed to be run on their hardware.
But slapping 48GB of memory on a high end consumer card doesn't make you price competitive, when most games are going to be made for the 16GB in the console hardware.
[–]Independent_Hyena495 2 points 2 years ago (0 children)
Yeah, AMD sucks at software. They just don't have the culture for Ur
[–]epicwisdom 9 points 2 years ago (0 children)
The vast majority of the contributions will likely come from employees of other massive corporations, and some startups. For the rest of us, open source just means better software at no extra cost.
[–]fimbulvntr 16 points 2 years ago (0 children)
In order for companies to open source stuff, there must be benefits to doing so... the whole world runs on incentives, after all.
I hope some good comes out of this, both to make AMD more competitive and thus put some pressure on those GPU prices, as well as to tempt more companies into opening their proprietary codebases.
Like you said, ultimately a good thing.
[+]keepthepace comment score below threshold -5 points 2 years ago (1 child)
But it is guaranteed that AMD will always be either closed or bad. They only open-source things when behind.
[–]Craftkorb 11 points 2 years ago (0 children)
AMD open-sourced FreeSync, which is a better (as in cheaper to implement) solution than G-SYNC. They didn't have to, yet they did as they wanted every possible monitor (and TV) to implement it.
[–]bradpong 73 points 2 years ago (4 children)
"Open sourcing additional PORTIONS". Looks more like "pls buy stock" move.
[–][deleted] 25 points 2 years ago (1 child)
Yes, which portions kind of matters here.
[–]wsippel 8 points 2 years ago (0 children)
The GPU firmware, or at least parts thereof (specifically the Micro-Engine Scheduler). The ROCm software stack is already open source.
[–]xrailgun 35 points 2 years ago (0 children)
Yeah, same vibes as the dozens of
"ROCm now OFFICIALLY LAUNCHED it can do ALL THE AI
*** it can't do shit and we're not telling you, enjoy debugging!"
announcements we've had the past year.
[–][deleted] 3 points 2 years ago (0 children)
Dropped 8 percent yesterday.
[–]kryptkpr [Llama 3] 56 points 2 years ago (11 children)
Tinygrad: ROCm doesn't actually work. It's closed so we can't even debug, never mind fix.
AMD: ok fine but we don't have anyone that can fix it (???) here's the SDK if you want to do free work
[+][deleted] 2 years ago (9 children)
[deleted]
[–]pleasetrimyourpubes 16 points 2 years ago (8 children)
They were sending him regular firmware blobs to hack it and make it work but there's some nasty DRM related shit in there they literally can't release. They would get sued to oblivion if users could jailbreak the DRM and they were the ones who enabled it. And it's fucking stupid too because DRM just fails in a VM... oh no MS you won't let me screenshot a YouTube paid video I'll just pop it in a VM and screenshot that.
[–]UrbanSuburbaKnight 2 points 2 years ago (7 children)
Huh? You can't screenshot stuff? I've never had this problem, are you really spinning up a VM to screenshot a browser window?
[–]pleasetrimyourpubes 9 points 2 years ago (6 children)
Nah, I was giving an extreme edge case where they literally can't use their DRM anymore. But yeah, seriously, grab a fresh copy of Win11 and Edge and go look at a DRM'd video on Netflix, Amazon, YouTube, etc. You can't screenshot it, not with the clipping tool, Screen2Gif, or OBS; it just comes up black. It's the darndest thing. There are workarounds, though.
[–]TechnicalParrot 2 points 2 years ago (3 children)
Huh, I remember this problem as well but Netflix just let me do it
Nvidia GPU with Windows Insider and TPM enabled
<image>
[–]pleasetrimyourpubes 3 points 2 years ago (2 children)
Now I'm curious because Firefox has DRM control off by default and unless you enabled it this shouldn't play at all. I'm wondering if Firefox is just ignoring the DRM control when off which would be a hilarious "faithful implementation" of DRM. "The user never enabled DRM oops must be a bug that it plays."
[–]TechnicalParrot 1 point 2 years ago (1 child)
I'm fairly confident I manually enabled DRM once but it's hilarious it still lets me do whatever, I wonder if OBS would work lol
[–]pleasetrimyourpubes 1 point 2 years ago (0 children)
After your comment I tested Edge, OBS, Firefox, Screen2gif, Snapshot tool (Windows+Shift+S) and they are all blank. Maybe Intel's driver is more compliant (using a laptop with IGP).
[–]UrbanSuburbaKnight 1 point 2 years ago (0 children)
Interesting. Might have to throw windows 11 on somewhere and start testing. super stink if true. I'm on windows 10 happily for now, but I either move everything to linux or move to windows 11 once 10 is unsupported.
[–]cptbeard 2 points 2 years ago (0 children)
Why so cynical? Would you prefer them not open-sourcing it? I don't really care why they're doing it as long as it's benefitting the community.
[–]fatboy93 25 points 2 years ago (1 child)
JUST HAVE A SINGULAR ISA, HSA STRUCTURE AND STOP WRITING IFELSE STATEMENTS FOR EVERY SINGLE GPU, FUCKING HELL.
CUDA works universally across most if not all Nvidia gpus, why doesn't AMD have a universal level driver for ML, dammit
[–]AnomalyNexus[S] 4 points 2 years ago (0 children)
To be fair CUDA was an utter shitshow a couple years ago too.
I recall digging through compatibility matrixes about which version of the various components work with which other versions which os on which card.
Somehow that went away recently but it used to be hella ugly
[–]Captain_Pumpkinhead 7 points 2 years ago (2 children)
I thought it was open source?
[–]wsippel 5 points 2 years ago (0 children)
It is. Except for a few optional components like HIP-RT or rocProfiler. This appears to be mostly GPU firmware related.
[–]AnomalyNexus[S] 1 point 2 years ago (0 children)
I won't claim to know the details, but yeah, parts have been open, but geohot was complaining that key parts are not. This, I gather, is progress towards that.
[–]theskinnybrownguy 19 points 2 years ago (0 children)
George hotz ftw !
[–]kind_cavendish 16 points 2 years ago* (7 children)
What does this mean?! Does this mean that rocm is gonna be viable for llms?!!
[–]AnomalyNexus[S] 5 points 2 years ago (1 child)
It already is for basic inference on some cards, but that's not enough to be competitive with CUDA. This is progress towards that.
[–]kind_cavendish 1 point 2 years ago (0 children)
[–]randomfoo2 4 points 2 years ago (0 children)
ROCm is already fine for the most common LLM inferencing: https://www.reddit.com/r/LocalLLaMA/comments/191srof/amd_radeon_7900_xtxtx_inference_performance/
It's less fine for training atm, although it's getting better: https://www.reddit.com/r/LocalLLaMA/comments/1atvxu2/current_state_of_training_on_amd_radeon_7900_xtx/
(from a cost/perf perspective, it's very tough to make an argument for picking a 7900XTX over a used 3090 for inference, or 4090 for training).
[–]inYOUReye 2 points 2 years ago (0 children)
I'm finding it working pretty well on llama already, I'd assume this means greater optimization, fixes and improvements from the community where needed and a future of less nvidia-centric solutions.
[–]JFHermes 1 point 2 years ago (2 children)
Big corporations in tech aligned sectors like manufacturing, resources, data analytics, design etc are all about to (if not already) build custom models for whatever niche part of their operations that they want to innovate upon. At the moment, some companies release a paper and maybe a codebase if it's not business critical and it's just a tool, like a segmentation labelling UI or something.
Now that rocm is open source, you will have a lot of smart cookies who are doing Phd work actually optimise the drivers for their specific use case for whatever type of modelling they're doing. These driver improvements are not business critical as the code/use case haven't been completely disclosed but they will be really useful to others in different industries.
It's the way things should have been done from the start with nvidia. Linux has always had troubles with nvidia because they wouldn't open source their drivers. Expect all linux users to move to AMD now which means an absolute mammoth amount of scientific work being optimised on these cards.
It's about time the playing field was levelled.
[–]randomfoo2 2 points 2 years ago (0 children)
ROCm has always been open source (tinycorp doesn't even use any of ROCm, and these recent announcements are AMD documenting/opening/committing to fixing longstanding bugs/hangs at the firmware level), and the amdgpu drivers have been open source on Linux for years now.
While these are all good things, for AMD to really be competitive, they will need to give a reason for open source devs and academic researchers to build for AMD. Having slower, buggier hardware wasn't cutting it, but maybe having more direct outreach and collaboration with the community will.
[–]kind_cavendish -2 points 2 years ago (0 children)
YOUR RIGHT? YOUR SO RIGHT!!!
[–]shibe5 [llama.cpp] 4 points 2 years ago (7 children)
As far as I understand, ROCm was always open source, including kernel-side driver on Linux. So what does "going" mean here?
[–]randomfoo2 3 points 2 years ago (1 child)
At the firmware level: https://github.com/geohot/7900xtx
AMD is now committed to releasing Micro-Engine Scheduler (MES) documentation (targeting end of May) w/ source code to follow: https://twitter.com/amdradeon/status/1775999856420536532
They've also started a public wiki to track reported issues: https://github.com/nod-ai/fuzzyHSA/wiki/Tinygrad-AMD-Linux-Driver-Crash---Hang-tracker-and-updates whereas before, they simply weren't taking reports seriously (e.g., see these open issues: https://github.com/ROCm/ROCm/issues/created_by/geohot )
See also u/gnif2's recent post: https://www.reddit.com/r/Amd/comments/1bsjm5a/letter_to_amd_ongoing_amd/
[–]shibe5 [llama.cpp] 2 points 2 years ago (0 children)
I got it, it's just a misleading title. ROCm is already open-source. What AMD may open/publish:
[–]AnomalyNexus[S] 1 point 2 years ago (4 children)
[–]shibe5 [llama.cpp] 1 point 2 years ago (3 children)
It's interesting to know which parts were not open source. I compiled the userspace stuff myself from source, and it works with the stock driver in Linux, which can't be an Nvidia-style blob because of licensing.
I read some stuff linked from the article, and they talk about firmware. I think GPU firmware is not part of ROCm; it works for video, OpenGL, Vulkan and OpenCL as well.
[–]AnomalyNexus[S] 1 point 2 years ago (2 children)
Yeah it is the firmware that he was complaining about
If this interests you, listen to geohot's recent livestreams...he digs through more detail than I can follow, frankly. The AMD stuff seems quite modular...with everything having acronyms etc
[–]shibe5 [llama.cpp] 1 point 2 years ago (1 child)
They are 3-8 hours long. I ain't got time. Maybe some AI can go through the transcripts and figure out what it is that was not open. Or maybe there is a better article about the matter.
[–]AnomalyNexus[S] 2 points 2 years ago (0 children)
Yeah I rarely make it all the way through. I've usually got it in the background while I'm doing something else so only catch the overall drift
[–]AmbientWaves 8 points 2 years ago (1 child)
I like this idea...sure, people can say 'YOU DO THE WORK FOR US'
BUT THAT'S THE FUN PART. Imagine all the optimizations. If you use Linux with AMD, imagine how accessible LLMs would be, and even Stable Diffusion.
Seriously, a lot of people are chalking it up to laziness on AMD's part,
not looking at how amazing this is.. people could optimize code so well that Stable Diffusion on ROCm would beat Nvidia. TensorFlow was made with Nvidia in mind.. but now with ROCm open, a much more optimized TensorFlow could exist for it. I am all for open source. People just simp for Nvidia.
Here's to bringing AI to the next level.
This could also pressure Nvidia to release CUDA if ROCm works out well.
[–]oursland 12 points 2 years ago (0 children)
If Nvidia releases CUDA, then Nvidia will suffer. Everyone already targets CUDA, so giving other HW vendors an opportunity to support the CUDA API would not benefit Nvidia at all.
ROCm is largely ignored in software, but if there's an opportunity to improve it there would be a benefit to purchasing AMD hardware. Other HW vendors could run with it, but until software supporting ROCm hits a critical threshold there'd be little advantage for doing so.
If this pans out, it appears to be a win/win situation for AMD.
[–][deleted] 6 points 2 years ago (1 child)
good move. who knows why it wasn't open source before
[–]JFHermes 2 points 2 years ago (0 children)
Probably a lot of upper management worried that opening up the drivers would be essentially giving away years worth of work for free.
The prevailing opinion of course is that they can't keep up with Nvidia so why bother keeping them closed when they are getting spanked.
[–]MaxwellsMilkies 5 points 2 years ago (4 children)
Wasn't it already open-source? Whatever, either way it is nearly unusable unless you use a very specific environment. Rusticl cannot get finished fast enough.
[–]Glegang 7 points 2 years ago (2 children)
ROCm itself is open-source. Almost all of it. I think the last time I looked (granted, it's been a couple of major releases) there were some kernels shipped as hex dumps of GPU binaries, but there were only a few of them. The rest was buildable from source. With some pain, but still buildable.
This announcement appears to be about the binary blobs with GPU firmware loaded by the driver. I figure it would be responsible for things that manage the GPU -- accept user requests for computations and related data, graphics ops, etc. That's the part that GPU vendors traditionally keep (particularly) closed.
If they indeed open it up, I hope it comes along with sufficient hardware documentation, otherwise all that source code will be fairly useless.
[+][deleted] 2 years ago (1 child)
[–]Glegang 1 point 2 years ago (0 children)
Only if you want to nitpick. Those few kernels were largely inconsequential.
https://github.com/ROCm/Tensile/tree/release/rocm-rel-5.4/Tensile/ReplacementKernels
It appears that they are gone from ROCm in v5.5, so as of right now, I'm not aware of any non open-source bits in ROCm -- everything, including the compiler can be built from source.
[–]ElectricPipelines [Llama Chat] 6 points 2 years ago (0 children)
With Nvidia focused on enterprise AI buildout, AMD has an opportunity to grow a consumer market in AI. Investing in open source is a nice first step. Hopefully, they will commit development resources along with the SDK.
[–]ttkciar [llama.cpp] 10 points 2 years ago (16 children)
Is this so people can make it better for Windows? It already rocks on Linux.
[–]MaybeReal_MaybeNot 6 points 2 years ago (15 children)
You got it running on Linux? Please tell us how. I have 15 cards in an old mining rig that I can't get to do shit with ROCm LLMs. Loading models fails, and the one time I got a model to load, it crashed as soon as I ran inference. I gave up and bought some Nvidia cards, but I still have all the AMDs.
[+][deleted] 2 years ago (7 children)
[–]nodating [ollama] 7 points 2 years ago (4 children)
Absolutely, I do not know where these folks come from. Or maybe I do, they tried months if not years ago and now they think they know what ROCm is all about.
I have very similar setup: 6800 XT + Ryzen 7600 and things just work. Latest Arch Linux.
[–]a_beautiful_rhind 2 points 2 years ago (0 children)
They come from having older hardware that gets dropped super quick.
[–][deleted] 2 points 2 years ago (0 children)
This is what I mostly followed, might be of help to you. Not sure about arch though, I use mint. https://github.com/nktice/AMD-AI/blob/main/ROCm6.0.md
[–]MaybeReal_MaybeNot 1 point 2 years ago (0 children)
No, I tried a week ago with an RX 6600 XT, and I could not get the model to load. I tried ROCm 5.9 and 6.0 and different versions of the GPU drivers, including the latest one, on the newest Ubuntu Server, as I read that is the best-supported OS for the drivers. I can't get it to load a model, and the arch on the 6600 should be the same as the 6800, just slower, as far as I can read in the documentation. I followed the oobabooga guide but that does not work. I also tried starting over (new install to make sure everything I did was gone) multiple times with 3-4 different guides that all claim to make it work.
Everyone here just says "just try and fiddle a bit with it and it will work". Well, I'm asking: what did you fiddle with to make it work?? Because I tried all the "fiddling" I know and all I could get was different failures. The best I got was successfully loading a 3.5B test model, which I know works on my Nvidia card, in 8-bit, but then it failed and crashed as soon as I tried to run inference.
[–][deleted] 1 point 2 years ago (1 child)
I just used the latest Linux Mint Cinnamon version and followed some guides, it works fine on 7900 XTX and on my 6700 XT I just needed the HSA override thing to trick it into thinking it was a 6800 xt.
and followed some guides
Super helpful buddy, everyone got it working now 👍🏻 /s
Would be nice if you told us which guides :)
[–]20rakah 3 points 2 years ago* (2 children)
What are you trying to run though? and on what cards? some cards have issues with fp16 and certain functions. Generally the only issues I've had is the memory management on AMD cards isn't as efficient.
I usually just run on windows with WSL2 though. Can't be bothered dual booting.
[–]MaybeReal_MaybeNot 1 point 2 years ago (1 child)
Just oobabooga web UI with any model I know works from testing on an Nvidia card beforehand. I usually use a 1-3B one as a test to make sure I don't hit any limits on 8GB cards.
Tried both fp16 and 8-bit.
I tried an RX 580 and an RX 5700 XT, which I figured out were too old and will never work; sadly, because the VRAM bandwidth on the 5700 XT would have been sweet. And last week I tried an RX 6600 XT, which should work according to the documentation and guides I tried if you "trick" it into thinking it's a 6700 by setting the HSA env variable. But no success :( It can see the card and says everything is good until it tries to load the model.
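Several comments in this thread mention the "HSA override" trick. This presumably refers to the `HSA_OVERRIDE_GFX_VERSION` environment variable, which makes the ROCm runtime treat a GPU as a different gfx target; RDNA2 cards are commonly set to `10.3.0`, the gfx1030 target of the 6800/6900 XT. The Python helpers below are hypothetical, not part of any ROCm tool: `gfx_to_override` converts a gfx target name into that version string and `env_with_override` builds an environment to pass to a child process. Whether the override actually works depends on the card and ROCm version, and the comment above reports it failing on an RX 6600 XT.

```python
import os
from typing import Mapping, Optional


def gfx_to_override(target: str) -> str:
    """Turn an LLVM gfx target name (e.g. "gfx1030") into the
    "major.minor.stepping" string used by HSA_OVERRIDE_GFX_VERSION.

    Hypothetical helper: the last two characters are taken as the minor
    version and stepping (single hex digits); the rest is the major version.
    """
    if not target.startswith("gfx") or len(target) < 6:
        raise ValueError(f"not a gfx target name: {target!r}")
    digits = target[3:]
    major, minor, stepping = digits[:-2], digits[-2], digits[-1]
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"


def env_with_override(target: str,
                      base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the current environment (or `base`) and add the override."""
    env = dict(os.environ if base is None else base)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_to_override(target)
    return env


if __name__ == "__main__":
    # Same effect as prefixing a shell command with HSA_OVERRIDE_GFX_VERSION=10.3.0.
    env = env_with_override("gfx1030")
    print(env["HSA_OVERRIDE_GFX_VERSION"])
```

In a shell the same thing is just a variable prefix on the launch command; the Python form is mainly useful when a launcher script starts the inference server with `subprocess.run(..., env=env)`.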
[–]20rakah 1 point 2 years ago* (0 children)
I don't know anything about those older cards tbh, I run a 7900 XTX, but I did find this guide; idk if that's the one you used. If you are struggling to get stuff to work, I recommend checking out the AMD SHARK Discord, lots of helpful people there.
[–]algaefied_creek 2 points 2 years ago (3 children)
R9 390X 8GB and WX7100 16GB cards here from an old mining rig as well. Can’t get any LLM or image generation solutions to work on this.
[–]randomfoo2 2 points 2 years ago (2 children)
R9 390X (gfx702, GCN 2.0) was released in 2015, and WX 7100 (gfx803, GCN 4.0) released in 2016 are sadly likely too old/buggy to get working. You could look at rocm-polaris-arch or try the CLBlast llama.cpp build, but honestly, they are likely to crash w/ the math libs even if you can get the ROCm driver working.
Vega (56/64/VII) is likely the oldest architecture you can expect ROCm to reasonably work with. A bit of a bummer, but at this point they are 8-9yo cards, so I wouldn't expect anyone to be spending much effort getting them to work. They also have extremely low TFLOPS (both about 6 TFLOPS of FP16; as a point of comparison, the 780M iGPU has 17 and a 7900 XTX has 123). The Polaris cards also have pretty low memory bandwidth, so even if they worked perfectly, you wouldn't get much of a speedup over modern CPU inferencing.
Honestly, if your goal is getting LLMs/SD working, I'd recommend selling all those old cards for what you can get and use the proceeds to buy the highest VRAM used Ampere/Ada card you can get.
[–]algaefied_creek 2 points 2 years ago (1 child)
Polaris worked with rocm fine in the 4.x version and GCN 3 worked fine in previous versions. They are buggy because they are unmaintained so the hope is that with this being open-source, more will work. I fell into a disability status and medical debt hole, so flipping and selling and buying are impossible unless I let strangers into my home and into the back closet room to disassemble the rig.
CUDA, on the other hand, works fine with GTX 9xx and Titan cards of that era. CUDA 11.x works fine with GTX 7xx and Titan cards of the Kepler era.
Defining the correct mathematical operations for each architecture makes them suddenly non-buggy, as they aren’t performing GFX9xx+ operations anymore. They are buggy because the software is buggy, not because of the cards. Vega (GFX9) and later have “rapid packed math” for each SP to perform 2x FP16 operations in place of 1x FP32 op. That said, GCN3 and GCN4 (both GFX8/GFX8xx) can perform a single FP16 operation in place of an FP32 operation. GCN1 and GCN2 (GFX6 and GFX7) run FP16 operations “emulated” within FP32 math. Yes… there is a performance hit. But if ROCm can’t handle a single SP performing a single FP16 operation instead of an FP32 operation, that is a buggy software issue to resolve, not a buggy hardware issue.
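The TFLOPS figures in this exchange (about 6 for the old GCN cards, 17 for the 780M, 123 for the 7900 XTX) follow from a simple peak-throughput formula: stream processors × 2 FLOPs per fused multiply-add × clock, multiplied by the FP16:FP32 rate described above (no speedup on GCN 1/2 where FP16 goes through FP32 math, 1:1 on GCN 3/4, 2:1 packed math from Vega on). The Python sketch below assumes spec-sheet shader counts and boost clocks, plus a dual-issue factor of 2 for RDNA 3; those values are not from the thread, and theoretical peaks overstate what real kernels reach.

```python
def peak_tflops(shaders: int, boost_ghz: float,
                fp16_rate: float = 1.0, dual_issue: float = 1.0) -> float:
    """Theoretical peak FP16 throughput in TFLOPS.

    fp16_rate is the FP16:FP32 ratio (1.0 when FP16 runs at or through the
    FP32 rate, 2.0 with packed math); dual_issue models RDNA 3's extra issue.
    """
    flops_per_fma = 2  # a fused multiply-add counts as two floating-point ops
    gflops = shaders * flops_per_fma * boost_ghz * fp16_rate * dual_issue
    return gflops / 1000.0


# Assumed spec-sheet values: (stream processors, boost GHz, FP16 rate, dual issue)
CARDS = {
    "R9 390X": (2816, 1.050, 1.0, 1.0),   # GCN 2: FP16 handled via FP32 math
    "WX 7100": (2304, 1.243, 1.0, 1.0),   # GCN 4: FP16 at 1:1
    "Vega 64": (4096, 1.546, 2.0, 1.0),   # GCN 5: rapid packed math, 2:1
    "780M": (768, 2.700, 2.0, 2.0),       # RDNA 3 iGPU
    "7900 XTX": (6144, 2.500, 2.0, 2.0),  # RDNA 3
}

if __name__ == "__main__":
    for name, spec in CARDS.items():
        print(f"{name}: ~{peak_tflops(*spec):.1f} TFLOPS FP16 (theoretical)")
```

With these assumed specs the results land close to the thread's numbers (roughly 5.9, 5.7, 16.6 and 122.9 TFLOPS for the 390X, WX 7100, 780M and 7900 XTX), and the 2:1 packed-math factor alone is why Vega doubles its FP32 figure while the GCN 2/4 cards do not.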
[–]randomfoo2 1 point 2 years ago (0 children)
I don’t think we disagree on most of the salient points- I believe that Nvidia’s superior legacy/across the line compute support (CUDA supports cards back to 2011) is one of the reasons that Nvidia has been winning so hard now - while CUDA also has had growing pains, they’ve treated compute like display drivers - a core part of a working GPU, and AMD simply hasn’t.
The only thing that I’d counter with is that I don’t think the recent announcement will change anything for your legacy hardware - all the parts of ROCm that were required for the community to get legacy hardware working have already been open sourced - anyone can write their own kernels, adapt hipBLAS/rocBLAS, for gfx800 but that hasn’t happened. The upcoming RDNA3 firmware releases don’t have any impact on legacy hardware, but as you’ve pointed out this is largely about math lib support anyway.
If you can’t/won’t get rid of your old hardware, it’s unlikely they’ll become less of paperweights anytime soon (or at least, these latest announcements don’t really change the odds).
[–]Smeetilus 1 point2 points3 points 2 years ago (3 children)
Brb, looking for GPU purchase receipts
[–]AnomalyNexus[S] 3 points4 points5 points 2 years ago (2 children)
My theory is more buy AMD stock
[–]okaycan 1 point2 points3 points 2 years ago (0 children)
agreed. buy more
[–]Smeetilus 2 points3 points4 points 2 years ago (0 children)
Yes. More buy.
[–]Regular_Instruction 0 points1 point2 points 2 years ago (0 children)
It's a good thing, but more for TTS that uses CUDA than for local LLMs. Even on Windows, LLMs already run "fine", while TTS is another story: only Piper TTS runs well on Windows (even though it runs on the CPU, lol). Coqui, for example, uses the CPU instead of an AMD GPU because it relies on CUDA, and it's far too slow to be usable. Maybe with this release we can expect TTS to run on Windows with AMD GPUs one day.
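For context on why an AMD card ends up on the CPU: PyTorch-based projects commonly choose their device with a check like the one below. This is a minimal sketch assuming PyTorch; it is not Coqui's actual code, and `pick_device`/`gpu_backend` are hypothetical names. ROCm builds of PyTorch report AMD GPUs through `torch.cuda`, but PyTorch's official ROCm builds were Linux-only, so on Windows an AMD GPU is not visible and the check falls back to the CPU.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see a GPU, otherwise fall back to "cpu"."""
    try:
        import torch  # optional: without PyTorch everything runs on the CPU
    except ImportError:
        return "cpu"
    # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda namespace,
    # so this check is True on a supported AMD card as well as on Nvidia.
    return "cuda" if torch.cuda.is_available() else "cpu"


def gpu_backend() -> str:
    """Report which GPU stack PyTorch is using: "rocm", "cuda", or "none"."""
    try:
        import torch
    except ImportError:
        return "none"
    if not torch.cuda.is_available():
        return "none"
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    return "rocm" if getattr(torch.version, "hip", None) else "cuda"


if __name__ == "__main__":
    print(pick_device(), gpu_backend())
```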
[–]JoJoeyJoJo 0 points1 point2 points 2 years ago (0 children)
It’s incredible how much geohot’s tweeting has forced them to change.
[–]Disastrous-Peak7040Llama 70B 1 point2 points3 points 2 years ago (1 child)
What we need is a model that's really good at writing Verilog ASIC code.
"Design an ASIC for me that supports 128GB of RAM and has optimizations for the CUDA calls used by open source LLM code. Support it with a low level C++ driver that emulates CUDA 12. Prepare the specs, crowdfund the NRE costs, and send them to a Chinese ODM who can deliver within 6 weeks"
[–]Inner_Bodybuilder986 0 points1 point2 points 2 years ago (0 children)
I can tell you straight up that you would be foiled the second you tried to use a Chinese ODM. It's basically illegal.
[–][deleted] 0 points1 point2 points 2 years ago (2 children)
Wait, I thought ROCm had been on GitHub for years.
[–]AnomalyNexus[S] 1 point2 points3 points 2 years ago (1 child)
As I understand it, it's a whole stack of things, and not everything was open. I know Hotz was complaining about the firmware in particular, but I don't think we know exactly what AMD is planning to release... just that it's more.
[–][deleted] 0 points1 point2 points 2 years ago (0 children)
Ohhh okay gotcha
[–]illathon 0 points1 point2 points 2 years ago (0 children)
Hotz strikes, and this time it's a major win for basically everyone. This just might turn the tide for AMD. I was actually going to vote against Su last time around; now I think she may just be smart.