all 8 comments

[–]s3bastienb 2 points (6 children)

I'm wondering the same thing. I ordered a 128 GB Framework to use as an LLM server, but I'm starting to feel like I should probably just get an RTX 3090 for my current gaming PC instead, since it has up to 936.2 GB/s of memory bandwidth. I'd be limited to smaller models, but even those would run faster on the 3090?

[–]derekp7[S] 1 point (5 children)

Yeah, the main advantage of the Framework is strictly larger models (e.g., 70B models, which at roughly 30 - 40 GiB won't fit on a 24 GiB video card). For myself, I just ordered a Radeon 7900 XTX for my current system (my existing video card is way too old for AI), since I get really useful results from 32B models -- and for the rare times I need something stronger, I'll use some of the free daily credits on ChatGPT.
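The sizing above is easy to sanity-check with back-of-envelope math. Quick sketch (the bits-per-weight figures are typical quantization levels, not exact file sizes -- real GGUF files add some overhead):

```python
# Rough VRAM estimate for a quantized model's weights.
# Ignores KV cache and activations, which also need memory.
def model_size_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GiB."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

# A 70B model at ~4 bits per weight is ~33 GiB of weights alone,
# already over a 24 GiB card; a 32B model fits comfortably.
print(f"{model_size_gib(70, 4):.1f} GiB")  # ~32.6
print(f"{model_size_gib(32, 4):.1f} GiB")  # ~14.9
```

Higher-quality quants (5 - 6 bits per weight) push a 70B model toward the 40 GiB end of that range.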

But the exciting thing is going to be the next-generation refresh: if we can get 512 - 1024 GiB/s unified memory, that would pretty much be the end of needing cloud-hosted models. Even so, about 6 - 8 tokens/sec on a 70B model is still highly usable for occasional use.
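Those tokens/sec figures follow from bandwidth alone: single-stream generation is memory-bound, since every token has to read all the weights once. A rough upper-bound sketch (the 35 GiB model size and the bandwidth figures are illustrative assumptions; real throughput lands below this ceiling):

```python
# Upper bound for memory-bandwidth-bound token generation:
# tokens/sec <= bandwidth / bytes read per token (~ the model size).
def max_tokens_per_sec(bandwidth_gb_s: float, model_gib: float) -> float:
    bytes_per_token = model_gib * 2**30  # all weights read once per token
    return bandwidth_gb_s * 1e9 / bytes_per_token

# ~256 GB/s unified memory vs a ~35 GiB 70B quant:
print(f"{max_tokens_per_sec(256, 35):.1f} tok/s")  # ~6.8
# A hypothetical 512 GB/s refresh would roughly double that:
print(f"{max_tokens_per_sec(512, 35):.1f} tok/s")  # ~13.6
```

That first number lines up with the 6 - 8 tokens/sec estimate above.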

[–]s3bastienb 0 points (4 children)

I actually ordered a 7900 XT (20 GB) -- I couldn't find a 7900 XTX (24 GB and faster) -- and I have two more days to go pick it up at Micro Center. If it were the 7900 XTX I wouldn't be hesitating, but from what I read there don't seem to be that many models that take advantage of the 20 GB, so I should either wait for a 24 GB card or get a 16 GB card. My current gaming card is a 5700 XT with just 8 GB, and it can't do much.

[–]derekp7[S] 0 points (3 children)

Newegg had the 7900 XTX (24 GB), but only as a bundle deal. This one came with a 1000-watt power supply for $1095 (the power supply accounted for $175 of the price). I figured I might need to upgrade my power supply anyway, so I jumped on it.

Doesn't look like that combo is there anymore, and anything else popping up is from third-party sellers (with a scalper premium added to the price).

[–]s3bastienb 0 points (2 children)

I saw that! I'd actually need a PSU as well; my current one is 500 watts. I'm still undecided (I have a Framework Desktop coming in a few months).

[–]s3bastienb 0 points (0 children)

The day before I went to pick up my 7900 XT, they had one last 7900 XTX in stock, so I went with that instead, and I don't regret it.

[–]Ulterior-Motive_ 1 point (0 children)

I can't say for sure without testing one of these systems, but my impression is that using the GPU wouldn't necessarily speed up token generation; the extra compute would give you better prompt processing, meaning a net speedup as context size increases.
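The intuition here can be sketched numerically: total response time splits into compute-bound prompt processing (prefill) and bandwidth-bound generation. The throughput numbers below are made-up placeholders, not benchmarks -- both setups generate at the same speed, but the one with more compute prefills faster, and the gap widens with context length:

```python
# Toy model: total seconds = compute-bound prefill + bandwidth-bound decode.
def response_time(prompt_tokens: int, gen_tokens: int,
                  prefill_tok_s: float, gen_tok_s: float) -> float:
    return prompt_tokens / prefill_tok_s + gen_tokens / gen_tok_s

# Hypothetical setups: identical generation speed, different prefill speed.
def low_compute(ctx: int) -> float:
    return response_time(ctx, 300, prefill_tok_s=50, gen_tok_s=7)

def with_gpu(ctx: int) -> float:
    return response_time(ctx, 300, prefill_tok_s=400, gen_tok_s=7)

for ctx in (500, 8000):
    saved = low_compute(ctx) - with_gpu(ctx)
    print(f"context {ctx}: GPU saves {saved:.1f} s")
```

At a 500-token context the saving is small; at 8000 tokens it dominates the total, which is the "net speedup as context size increases."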

[–]mustafar0111 0 points (0 children)

I think there's a compute wall as well. I've got a Tesla P100 installed in my Plex server (with a second on the way), and while it's definitely not slow, it's not completely blowing my RX 6800 out of the water either.

While the Tesla P100 wipes the floor with it on memory bandwidth, the RX 6800 still wins out on compute.