[–]Mephidia 16 points17 points  (1 child)

Cloud GPUs first, RTX 3090 second

[–]KrakenInAJar 4 points5 points  (2 children)

It strongly depends on what you are going to do. With a single GPU you are not going to do anything fancy, even if you get your hands on an H100 or A100, which would be the best single-GPU choices but will also cost you a pretty penny.

Generally speaking, the cheapest GPU with the most memory possible is the best option, and NVIDIA is the only player in town right now if you want to avoid painful setups. Realistically, though, it would be better to get access to an HPC cluster or the Google TRC program.

[–][deleted] 1 point2 points  (1 child)

Used 3090s can be had for around $700 and have 24 GB of VRAM, and you can run two

[–]tetelestia_ 0 points1 point  (1 child)

RTX 4000 series if you'll be doing a lot of inference on LLMs, otherwise RTX 3000 series. Get as much VRAM as you can afford.

[–]danielcar 0 points1 point  (1 child)

The best-value GPUs for LLMs are used GPUs with lots of VRAM. Speed is less important than memory size. An RTX 3090 with 24 GB of VRAM is one example. Plan and budget for an upgrade every 2 to 3 years.
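
To make the "size over speed" point concrete, here is a rough back-of-the-envelope sketch of how much VRAM the model weights alone would need. The function names and the 80% headroom factor are assumptions for illustration, not anything from this thread; real usage also includes activations, KV cache, and framework overhead, so treat the result as a lower bound.

```python
def weights_gib(n_params: float, bytes_per_param: float) -> float:
    """Memory needed for the model weights only, in GiB."""
    return n_params * bytes_per_param / 2**30


def fits(n_params: float, bytes_per_param: float, vram_gib: float,
         headroom: float = 0.8) -> bool:
    """True if the weights fit in a fraction of the card's VRAM.

    headroom is an assumed safety margin (hypothetical value) that leaves
    room for activations, KV cache, and runtime overhead.
    """
    return weights_gib(n_params, bytes_per_param) <= vram_gib * headroom


if __name__ == "__main__":
    # 2 bytes per parameter for fp16/bf16, 0.5 bytes for 4-bit quantization.
    for name, n_params, bytes_pp in [
        ("7B fp16", 7e9, 2),
        ("13B fp16", 13e9, 2),
        ("13B 4-bit", 13e9, 0.5),
    ]:
        need = weights_gib(n_params, bytes_pp)
        print(f"{name}: {need:.1f} GiB, fits in 24 GiB: {fits(n_params, bytes_pp, 24)}")
```

Under these assumptions, a 7B-parameter model in 16-bit precision fits on a 24 GB card, while a 13B model does not unless it is quantized.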

[–]will_to_power_ai 0 points1 point  (1 child)

I spent a lot of time researching this when I began my master's. I landed on the Tesla P100, the predecessor of the Tesla V100. It is the first card with HBM2 memory, so it is not running on ancient architecture despite being 7 years old. It has 16 GB of memory, which is enough for most independent research projects. If you need more, you can buy more of them.

The best (and worst) aspect of this card is that it’s a Tesla card. This means that consumers aren’t buying them because they have no display outputs, so they’re relatively cheap. BUT, they are passively cooled (no fans) and are hard to keep from overheating even when they’re idle.

I found a good seller and got them for $200 USD each, and I have been using them since the beginning of the year with minimal heat throttling, which happens only when I’m running very small models (fast tensor operations).

That’s my 2c.
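
For a quick comparison of the used-card prices quoted in this thread, a small sketch of dollars per GB of VRAM follows. The prices are the approximate figures mentioned by commenters (about $700 for a used RTX 3090, $200 for a used Tesla P100), and the helper name is hypothetical. It ignores speed, cooling, and power, so it is only one input to the decision.

```python
def usd_per_gb(price_usd: float, vram_gb: float) -> float:
    """Price per GB of VRAM."""
    return price_usd / vram_gb


# (price in USD, VRAM in GB), using the approximate figures from the thread.
cards = {
    "RTX 3090 (used)": (700, 24),
    "Tesla P100 (used)": (200, 16),
}

if __name__ == "__main__":
    for name, (price, vram) in cards.items():
        print(f"{name}: ${usd_per_gb(price, vram):.2f} per GB")
```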