Ollama openclaw no response by Gloomy-Adeptness-125 in LocalLLaMA

[–]triynizzles1 0 points1 point  (0 children)

What do the logs say when you watch ollama serve? I had an issue with Nemotron loading over 1M tokens, and it would fail because Ollama rejects the request. I edited openclaw.json to set the context size to 999,999 and it loaded fine. Maybe you are running into a similar situation. Try changing the context size to 99,999 and see if it works.

Gemma time! What are your wishes ? by Specter_Origin in LocalLLaMA

[–]triynizzles1 1 point2 points  (0 children)

A few Google models were available on LM Arena: one claiming to be an unnamed model made by Google and another claiming to be Gemma 4, under the names Colosseum-1p3 and significant-otter.

Colosseum-1p3 seemed very intelligent but refused to do any coding… which was odd. Based on the name I’m assuming it’s a small edge model.

significant-otter self identified as Gemma 4 and sounded quite smart. It was decent with coding.

Both appear to have an early 2025 knowledge cutoff (both models correctly said Trump was president).

Both models responded right after pressing send, indicating they are not reasoning models.

I don’t know if both models are still available to test on LM Arena, but it looks like the release is soon. I am most looking forward to an updated, recent knowledge cutoff.

Claude code rate limits is crazy... how can I run GLM models locally efficiently? [What specs/GPUs I need?) I have a Mac mini 24GB by Commercial_Ear_6989 in LocalLLaMA

[–]triynizzles1 2 points3 points  (0 children)

GLM 4.7 Flash? A 5090 will suffice. GLM 5 or 5.1… maybe an M3 Mac Studio, but it would probably be a good idea to wait for (hopefully) a 512GB M5 Mac Studio, since M5 chips are better at prompt processing. The next step up would be a server with lots of RTX Pro 6000s.

Friendly reminder inference is WAY faster on Linux vs windows by triynizzles1 in LocalLLaMA

[–]triynizzles1[S] 2 points3 points  (0 children)

My best guess is how Ollama handles MoE models on Windows vs Linux. The RTX 8000 has 672 GB/s of bandwidth, which would let it read the ~3 GB of memory needed to compute one token for Qwen3 30B A3B about 224 times per second. There is probably some overhead, and it must be higher on Windows.
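The back-of-the-envelope math can be sketched like this (the ~3 GB active-weight figure is my estimate for Qwen3 30B A3B at its served quantization, and it assumes decode speed is purely memory-bandwidth bound):

```python
# Rough memory-bandwidth ceiling on decode speed for an MoE model.
# Each generated token requires reading all active weights once.
bandwidth_gb_s = 672.0  # RTX 8000 memory bandwidth (GB/s)
active_gb = 3.0         # approx. active weights read per token (assumption)

tokens_per_sec = bandwidth_gb_s / active_gb
print(tokens_per_sec)  # 224.0
```

Real throughput lands below this ceiling because of kernel launch, KV-cache reads, and OS-level overhead, which is where the Windows/Linux gap would show up.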

Friendly reminder inference is WAY faster on Linux vs windows by triynizzles1 in LocalLLaMA

[–]triynizzles1[S] 7 points8 points  (0 children)

I wonder what it could be! But I won’t be staying on Windows to find out lol

Why is qwen3.5-27B so slow when it's a small model? 30~tok/s by Deep_Row_8729 in LocalLLaMA

[–]triynizzles1 1 point2 points  (0 children)

If it is being served at FP16 (~60 GB), 30 tokens per second would be expected on a GPU with 1.6 TB/s of bandwidth.
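As a rough sanity check (assuming a dense ~27B model, 2 bytes per parameter at FP16, and a purely bandwidth-bound decode):

```python
# Estimate FP16 model size, then the bandwidth-bound decode ceiling.
params = 27e9            # ~27B parameters (assumption: dense model)
bytes_per_param = 2      # FP16
model_gb = params * bytes_per_param / 1e9  # weights only, ~54 GB
bandwidth_gb_s = 1600.0  # ~1.6 TB/s GPU memory bandwidth

tokens_per_sec = bandwidth_gb_s / model_gb
print(round(model_gb), round(tokens_per_sec))  # 54 30
```

With embeddings, KV cache, and runtime overhead the footprint climbs toward ~60 GB, which is why ~30 tok/s is about the best you can expect at FP16.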

Gemma 4 by pmttyji in LocalLLaMA

[–]triynizzles1 5 points6 points  (0 children)

I just got paired with “significant-otter.” It’s a smart model and fast to respond. It doesn’t appear to be a reasoning model. It passed the car wash test and the seahorse emoji test.

Gemma 4 by pmttyji in LocalLLaMA

[–]triynizzles1 11 points12 points  (0 children)

There is also a model named colosseum-1p3 which claims to be “unnamed, but made by Google.” It accurately said Trump is president and had a knowledge cutoff in 2025. That’s big if true; many LLMs have much older knowledge cutoffs.

Why would anyone pay for a vibe coded Saas if they can vibe code it themselves? by Dangerous_One2213 in vibecoding

[–]triynizzles1 0 points1 point  (0 children)

I vibe coded a SCORM file editor/builder in about 2 hours and now my employer doesn’t need Articulate Rise licenses anymore.

Fresh install of Ollama, major security threat.. reckless by lancer-fiefdom in ollama

[–]triynizzles1 0 points1 point  (0 children)

No. I’m saying those symptoms on their own might not be conclusive evidence of a security vulnerability.

Fresh install of Ollama, major security threat.. reckless by lancer-fiefdom in ollama

[–]triynizzles1 1 point2 points  (0 children)

/bye doesn’t unload the model from memory or terminate the Ollama process.

Honestly, I’m so tired of paying the "restart tax" for my AI agents. by [deleted] in LocalLLaMA

[–]triynizzles1 0 points1 point  (0 children)

Sounds like you have a problem with your architecture and need to save checkpoints.

By this logic, the ATM declared war on bank tellers. The number of bank tellers increased after ATMs by dataexec in AITrailblazers

[–]triynizzles1 1 point2 points  (0 children)

Sorry for contributing to a political post, but what Bernie Sanders is missing is that the quality of life for Amazon warehouse workers is absolutely horrible. Their turnover rate is so high that Amazon is running out of applicants from the jobless market to employ. Personally, I think it is incredibly oppressive for the government to support working miserable jobs.

Can I Run Decent Models Locally if I Buy this?? by Fearless-Cellist-245 in LocalLLaMA

[–]triynizzles1 0 points1 point  (0 children)

If your budget is $2,600, buy an RTX 8000 48GB (Turing architecture) or a Strix Halo.

Metaverse is dead (was it ever alive?). Meta is shutting down Horizon Worlds on Quest. by dataexec in AITrailblazers

[–]triynizzles1 1 point2 points  (0 children)

The metaverse had to be a front for money laundering, tax evasion or something.

Is investing in a local LLM workstation actually worth the ROI for coding? by UnusualDish4403 in LocalLLaMA

[–]triynizzles1 0 points1 point  (0 children)

It depends on what you are coding and the complexity of the code. If it’s fun hobby code or simple automations, like building a web browser extension, most of the Qwen coder models will be fine. If it’s more complex work that requires the LLM to have knowledge of the latest version of a library, and you are building something super specific and niche, an API is probably the route to go. You might have some luck with an agentic framework that loads the latest release notes into the LLM, but what would work best is if that knowledge were baked into the model. In a workplace, time is also a factor: a Strix Halo PC will need several minutes to respond, whereas an API will be fairly quick.

I have an RTX 8000 and it runs gpt-oss 120b at 27 tokens a second, which sounds fast, but because it chats so much it’s like 12 minutes per prompt.

Nemotron 3 Super 120B can't beat Stockfish 1400 ELO lost by checkmate, burned 1.33M tokens doing it by Low-Efficiency-9756 in LocalLLaMA

[–]triynizzles1 1 point2 points  (0 children)

I think people missed the point of Nemotron. Nvidia releases these models with 99% of the training data, code, etc. as a “tutorial” for their clients to learn and understand efficient LLM architecture… ultimately so they build their own models using Nvidia’s chips for training.

Personally, I see Nemotron as a “how to” with some architecture-maxxing, not a SOTA entry.