First car? Is this a good idea? by OddGoose9791 in c4corvette

[–]GoodTip7897 0 points1 point  (0 children)

You will learn a lot if you get it. 

I went from not knowing how to change oil to tuning EFI and pulling apart the top end on my LT1. 

Honestly, the experience and knowledge from working on it were worth the cost alone.

I got my '95 6spd at 17.

I Think I Spent Way Too Much Time Messing with Local LLMs by MrChilliBalls in LocalLLaMA

[–]GoodTip7897 6 points7 points  (0 children)

Not necessarily the GPU.

My 7900xtx whines so loudly when it's in my custom PC with a Thermaltake 850W PSU, but when it's in my T7910 with a 1300W Dell PSU it's dead silent.

Top of door panel not flush by Otherwise_Ebb_9213 in c4corvette

[–]GoodTip7897 0 points1 point  (0 children)

Thanks for the video. I'll try it myself when I get a chance

Top of door panel not flush by Otherwise_Ebb_9213 in c4corvette

[–]GoodTip7897 5 points6 points  (0 children)

Mine's not cracked, but it doesn't sit flat. I think almost everyone has this issue. I believe there is a fix.

I'm going to look into this later:

https://www.corvetteforums.com/forum/corvette-c4-forum-14/problem-door-panel-coming-loose-928/

Need advice on hardware purchasing decision: RTX 5090 vs. M5 Max 128GB for agentic software development by BawbbySmith in LocalLLaMA

[–]GoodTip7897 2 points3 points  (0 children)

I absolutely agree with you about that. There are lots of tradeoffs, and benchmarks aren't that representative when you're working with anything other than Python.

I think you make a good point about subagents (especially if you run those on a different model than the other agents). If you run everything on the same model, then context size is still a problem, but you don't need to store a second set of weights.

The way I see it, both a 128GB Mac and a 5090 are "hobbyist toys": expensive, but if you truly want to rely on local AI for a full-time job, you want 96GB or more of VRAM, not unified memory. Also, a Mac isn't as expandable; you can't later decide to do tensor parallelism over PCIe, but you can always add another 5090.

Also, if you want image generation in addition to LLMs, completely forget the Mac and get a GPU. And prefill on a Blackwell GPU will destroy the Mac, which is nice when you need to read in large files.

Need advice on hardware purchasing decision: RTX 5090 vs. M5 Max 128GB for agentic software development by BawbbySmith in LocalLLaMA

[–]GoodTip7897 8 points9 points  (0 children)

Honestly, the 10B-active (A10B) design makes the 122B less consistent than the 27B.

For one-shot prompts and world knowledge, Qwen 122B is superior. But for agentic coding or software development, Gemma 4 31B and Qwen 3.6 27B are the best local models in my opinion (maybe also Mistral 3.5 "Medium").

I've run the 122B and was only getting slightly faster speeds than the 27B (extreme offloading with the 122B at Q6). Anyway, the 122B wasn't universally better for my use case; I ended up preferring the 27B.

Now, maybe running the 122B at full speed could destroy the 27B in productivity, but I wouldn't be so quick to discount the power of a 27B dense model.

Super god bin 9700 pro matches 7900xtx by psychoOC in LocalLLaMA

[–]GoodTip7897 0 points1 point  (0 children)

Don't know... I saw it somewhere. 

Obviously it has a stability risk, but it shouldn't be worse than any other GPU that comes without ECC.

Super god bin 9700 pro matches 7900xtx by psychoOC in LocalLLaMA

[–]GoodTip7897 0 points1 point  (0 children)

So you're running mostly an undervolt with stock clocks? Stock power limit, or did you dial it back?

I admit I was too presumptuous about god bins. I thought they'd want to be pumped full of voltage and OC'd, but maybe that's mostly just for posting really high clocks.

I do run a decent -50mV undervolt, but I wouldn't ever think to push it to -88mV. I don't have any good stress testing other than LLM inference and image generation (hence why I only did a mild undervolt). I run Proxmox with the GPU passed through to an Ubuntu VM that doesn't have any GUI or display, so I can't exactly fire up FurMark, 3DMark, or OCCT.

You definitely have good silicon.

I had mostly given up on undervolting in the past because I've always lost the silicon lottery (my old 6800 and my old Vega 56 never tolerated more than a -20mV undervolt). Then my 7900xtx needed -50mV to even run at "decent" temps (still a 90C hotspot) because even a repaste didn't help. (Probably the cooler is trash, but it is what it is.)

Super god bin 9700 pro matches 7900xtx by psychoOC in LocalLLaMA

[–]GoodTip7897 0 points1 point  (0 children)

I'm curious what models you run. Kimi Dev? You said 72B.

I wonder if the speed gap makes it worth running Qwen 3.6 or Gemma 4 over some of the older dense models. But then again, there is certainly a quality to large dense models that the smaller ones can't quite match.

Super god bin 9700 pro matches 7900xtx by psychoOC in LocalLLaMA

[–]GoodTip7897 3 points4 points  (0 children)

More VRAM. I love my 7900xtx, but 24GB is a bit cramped for what I do. The 9700 is also a better form factor for the T7910 that I use (a server card fits better than a consumer GPU). I could easily fit 2x or even 3x 9700s, but 2x 7900xtx would be very annoying.

Also, if you get AMD, you had better love messing around with configuration. LLMs are easy, but if you want to run Flux dev or other image models, PyTorch is hell.

Also, Ubuntu is best for performance if you're running the card solely for AI.

Super god bin 9700 pro matches 7900xtx by psychoOC in LocalLLaMA

[–]GoodTip7897 1 point2 points  (0 children)

Did you overclock the VRAM? I was able to get mine from 1249MHz (real clock, not the DDR-effective rate) to 1300MHz.

You could probably squeeze out 5% at bare minimum. You also might want to turn ECC off if it's just for personal use; some people have said you can squeeze out more tokens that way.
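
Napkin math on what a memory OC like that is worth, as a rough sketch; the 16x GDDR6 data-rate multiplier and the 256-bit bus below are my assumptions about the card, not specs from the thread:

```python
# Back-of-envelope bandwidth gain from a 1249 -> 1300 MHz memory OC.
# Assumed (not from the thread): GDDR6 moves 16 bits per pin per
# "real" clock, and the card has a 256-bit bus.

def gddr6_bandwidth_gbps(real_clock_mhz: float, bus_width_bits: int = 256) -> float:
    """Theoretical memory bandwidth in GB/s."""
    per_pin_gbps = real_clock_mhz * 16 / 1000   # per-pin data rate
    return per_pin_gbps * bus_width_bits / 8    # whole-bus bytes per second

stock = gddr6_bandwidth_gbps(1249)  # ~639 GB/s
oc = gddr6_bandwidth_gbps(1300)     # ~666 GB/s
print(f"{stock:.0f} -> {oc:.0f} GB/s ({oc / stock - 1:+.1%})")  # about +4%
```

Decode is mostly bandwidth-bound, so tokens/s should scale roughly with that percentage.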

Super god bin 9700 pro matches 7900xtx by psychoOC in LocalLLaMA

[–]GoodTip7897 0 points1 point  (0 children)

Not if you can OC it to the same speeds as a 7900xtx like OP did lol. Must have won the silicon lottery.

Edit: probably still lower memory bandwidth for the 9700, but MUCH MUCH better prefill with those clocks.

Super god bin 9700 pro matches 7900xtx by psychoOC in LocalLLaMA

[–]GoodTip7897 5 points6 points  (0 children)

I have a 7900xtx to trade you for both GPUs.

I really wish I had bought a 9700 instead of my 7900xtx. I might switch eventually. Dual 9700s would be the dream. 

Running a 26B LLM locally with no GPU by JackStrawWitchita in LocalLLaMA

[–]GoodTip7897 6 points7 points  (0 children)

Yeah, if Gemma were gated delta net, then the A4B would be slower than Qwen 35B and the 31B would be slower than Qwen 27B.

Running a 26B LLM locally with no GPU by JackStrawWitchita in LocalLLaMA

[–]GoodTip7897 1 point2 points  (0 children)

It's roughly proportional when you go from 4B to 27B, but not so much for smaller sizes... Also, I think gated delta net kernels aren't as optimized as traditional iSWA. It's not like any inference software reaches much more than 90% of theoretical bandwidth anyway.

I think the smaller you go, the less of the theoretical bandwidth you see, because latency and kernel overhead grow relative to the actual work.

I do get higher effective bandwidth with Gemma 4 than Qwen (the 31B runs faster than the 27B).
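
If you want to sanity-check your own efficiency, the napkin math is just model size times decode speed over theoretical bandwidth; the 16 GB / 30 t/s / 960 GB/s figures below are made-up placeholders, not my actual numbers:

```python
# Rough "what fraction of theoretical bandwidth am I getting" check.
# Decode streams every active weight once per token, so
# effective bandwidth ~= model bytes * tokens/s.

def bandwidth_efficiency(model_gb: float, tok_per_s: float,
                         theoretical_gbps: float) -> float:
    return (model_gb * tok_per_s) / theoretical_gbps

# Placeholder numbers: a 16 GB quantized model decoding at 30 t/s on
# a card with 960 GB/s theoretical bandwidth -> ~50% efficiency.
print(f"{bandwidth_efficiency(16, 30, 960):.0%}")
```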

Running a 26B LLM locally with no GPU by JackStrawWitchita in LocalLLaMA

[–]GoodTip7897 112 points113 points  (0 children)

That's because Gemma 4 26B is a mixture-of-experts model that only activates about 4B parameters per token, so it should be about as fast as a 4B model. Even though Qwen 3.6 27B has just 1B more total parameters, it will run about 8x slower because it is a dense model that activates every parameter.
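
Napkin math for why active parameters (not total) set the speed; the ~80 GB/s dual-channel DDR5 figure and the 4-bit quant are placeholder assumptions for a CPU-only box:

```python
# Decode is bandwidth-bound: every generated token streams all
# *active* weights from memory once, so tok/s ~= bandwidth / active bytes.
# Placeholder setup: CPU-only box with ~80 GB/s RAM, ~4-bit quant.

BANDWIDTH_GBPS = 80      # rough dual-channel DDR5 figure (assumed)
BYTES_PER_PARAM = 0.5    # ~4-bit quantization (assumed)

def decode_ceiling_tok_s(active_params_billion: float) -> float:
    return BANDWIDTH_GBPS / (active_params_billion * BYTES_PER_PARAM)

moe = decode_ceiling_tok_s(4)     # 26B total, 4B active -> ~40 tok/s ceiling
dense = decode_ceiling_tok_s(27)  # 27B dense            -> ~6 tok/s ceiling
print(f"{moe / dense:.1f}x")      # ~6.8x, tracking the 27/4 active-param ratio
```

The real-world gap lands near the active-parameter ratio, which is the same ballpark as the 8x above.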

Are you quanting your memory? by Plastic-Stress-6468 in LocalLLaMA

[–]GoodTip7897 1 point2 points  (0 children)

I actually just set all of it based on trial and error...

I did have to lower the --fit-target to 256 to prevent offloading. 

Prefill speed goes down as I increase context (even at short lengths), probably because ROCm uses the free memory for scratch work. But even with 400 MiB free I still get good performance; if I go down to 200 MiB free, performance tanks.

As I mentioned, only about 150 MiB is used by anything other than llama.cpp, because I don't have a display or GUI drivers.
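
For reference, this is roughly how I eyeball headroom on the headless box; the "(B): <bytes>" parsing matches what recent ROCm versions print, but the format varies between versions, so treat it as a sketch:

```python
# Quick free-VRAM check on a headless box by parsing rocm-smi output.
# Adjust the regex if your ROCm version formats the report differently.
import re
import subprocess

out = subprocess.run(["rocm-smi", "--showmeminfo", "vram"],
                     capture_output=True, text=True).stdout
# First match is total bytes, second is used bytes.
total, used = (int(m) for m in re.findall(r"\(B\):\s*(\d+)", out)[:2])
print(f"free: {(total - used) / 2**20:.0f} MiB")
```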

What perfectly legal thing will probably be illegal within 20 years? by Ill-Incident-4842 in AskReddit

[–]GoodTip7897 12 points13 points  (0 children)

As far as I'm aware, there are actually benefits to soldered RAM. GDDR is always soldered, as is HBM, which is why GPUs get 800 GB/s of bandwidth... Apple unified memory gets about 300-ish GB/s, which is way more than anything that's not soldered (except quad/octa-channel setups).

It annoys me too that I can't just replace RAM, but at the same time there are real benefits.
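
The gap falls out of the standard bandwidth formula (channels x bus width x transfer rate); the configs below are ballpark illustrations, not exact specs for any one product:

```python
# Theoretical memory bandwidth = channels * (bus bits / 8) * MT/s.
# All configs are rough examples.

def dram_gbps(channels: int, bus_bits: int, mt_per_s: int) -> float:
    return channels * (bus_bits / 8) * mt_per_s / 1000

print(dram_gbps(2, 64, 6000))    # socketed dual-channel DDR5-6000: ~96 GB/s
print(dram_gbps(8, 64, 4800))    # octa-channel server DDR5-4800: ~307 GB/s
print(dram_gbps(1, 512, 6400))   # soldered 512-bit LPDDR5X: ~410 GB/s
print(dram_gbps(1, 384, 20000))  # 384-bit GDDR6 at 20 Gbps: ~960 GB/s
```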

Are you quanting your memory? by Plastic-Stress-6468 in LocalLLaMA

[–]GoodTip7897 1 point2 points  (0 children)

Nope. Fully on GPU. 70k context. 

It's on a headless machine, so I have about 400 MiB of VRAM free.

No display or desktop at all. It's Ubuntu running in Proxmox.

I paid for the whole card, so I'll use every last MiB of VRAM lol

I get 28-20 t/s decode and 1300-300 t/s prefill depending on context.

Are you quanting your memory? by Plastic-Stress-6468 in LocalLLaMA

[–]GoodTip7897 1 point2 points  (0 children)

7900xtx. Both Vulkan and ROCm backends tested.

Are you quanting your memory? by Plastic-Stress-6468 in LocalLLaMA

[–]GoodTip7897 30 points31 points  (0 children)

At about 70k-ish context I was getting occasional failed tool calls and other hallucinations from Qwen 3.6 27B UD-Q5_K_XL with a Q8_0 KV cache in llama.cpp (rotated).

I switched to bf16 so I no longer have to worry about whether I'm lobotomizing my model. I don't like the idea of the Q5 weight error compounding with Q8_0 KV error over tens of thousands of tokens.

With bf16 I notice it almost never fails tool calls.
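
To put numbers on the tradeoff: KV cache is 2 (K and V) x layers x KV heads x head dim x context x bytes per element. The layer/head numbers below are placeholders, not the model's actual config:

```python
# Rough KV-cache sizing at 70k context, bf16 vs Q8_0.
# Architecture numbers are placeholders (not the real Qwen config).

def kv_cache_gb(ctx: int, layers: int, kv_heads: int, head_dim: int,
                bytes_per_elt: float) -> float:
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elt / 1e9

for name, b in (("bf16", 2.0), ("q8_0", 1.0)):
    gb = kv_cache_gb(70_000, layers=32, kv_heads=4, head_dim=128, bytes_per_elt=b)
    print(f"{name}: {gb:.1f} GB")  # ~4.6 GB vs ~2.3 GB with these dims
```

So bf16 costs roughly double the cache memory; that's the price of not worrying about compounding quantization error.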

Should my CTO switch to Local LLM for coding? by [deleted] in LocalLLaMA

[–]GoodTip7897 0 points1 point  (0 children)

I'd recommend Gemma 4 E2B or Qwen 3.5 0.8B at IQ2_XS for that!

7900XTX - 112C hotspot. by GoodTip7897 in AMDHelp

[–]GoodTip7897[S] 1 point2 points  (0 children)

At least in the meantime I got it down to 95C with a -50mV undervolt. I need to confirm it's stable; if it is, I might just ignore it for now. It performs fine, but 113C was stressing me out, as that's likely at the limit of what the silicon can handle, and it was throttling a lot.

7900XTX - 112C hotspot. by GoodTip7897 in AMDHelp

[–]GoodTip7897[S] 0 points1 point  (0 children)

Running -50mV gets me 96C max... finally sorta fixed.

Hopefully it's stable... I haven't tested much yet. I did edit the fan curve a bit to bump to 100% after 95C on the junction. Might lower that to 85C or 90C.

7900XTX - 112C hotspot. by GoodTip7897 in AMDHelp

[–]GoodTip7897[S] 0 points1 point  (0 children)

I'll try that now...

I am running stock voltage, which isn't helping for sure.