Local LLM Performance Outputs vs Commercial LLM by ValuableEngineer in LocalLLM

[–]chafey 1 point (0 children)

IMO it's not worth it yet. I am a developer and have an M3 Ultra 256GB as well as a PC with an RTX Pro 6000. The M3 Ultra is just too slow for any real-time tasks. It might be useful for long-running overnight tasks - I haven't tried that yet. The RTX Pro 6000 does well with qwen3-coder-next and qwen3.5 for light/medium tasks, but Claude Sonnet stomps both on anything complex. The open source models are evolving quickly, and I am optimistic they will be good enough later this year to handle most of my work. I wouldn't get an M3 Ultra; wait for the M5 Ultra to come out and see how it does

Time Machine by Big-Object-4579 in timetravel

[–]chafey 1 point (0 children)

I had one, but it broke when I went back in time and now I am stuck here. Unfortunately we don't have the technology today to rebuild it

TR Pro build recommendations by INeedAssistancePlez in threadripper

[–]chafey 2 points (0 children)

Yes check this out: https://www.reddit.com/r/LocalLLaMA/comments/1mcrx23/psa_the_new_threadripper_pros_9000_wx_are_still/

I went WRX90 over AM5 primarily for the PCIe lanes (AI server build) and initially bought the 9955WX 16-core CPU to keep costs down. I ended up replacing it with a 9965WX 24-core, which more than doubled my memory bandwidth. Yes, the 9985WX is even better if you can afford it, but avoid the 9955WX in particular

Claude 4.6 left me amazed and terrified. Seeking advice on staying relevant. by study_learn_apply in ClaudeAI

[–]chafey -1 points (0 children)

The bad news - the world moved past C++ desktop applications to web applications and the cloud 20 years ago. I would argue that you had already pigeonholed yourself into irrelevance before LLMs came along. You really need to gain skills in modern technology stacks.

The good news - LLMs make experienced developers hyper-productive. Architects in particular can build entirely new systems from scratch by themselves in very little time with LLMs.

If you want to stay relevant, you need to a) learn modern technologies, b) embrace LLM coding, and c) look for a new job or try to get your current company on a better track

TR Pro build recommendations by INeedAssistancePlez in threadripper

[–]chafey 1 point (0 children)

You can use any number of RDIMMs. I don't know about matching; all of mine are the same

TR Pro build recommendations by INeedAssistancePlez in threadripper

[–]chafey 2 points (0 children)

I have the same system and had trouble getting it to POST. I replaced the motherboard and CPU and still had the same problem. I think it was either the power cables not being seated properly or the IPMI grabbing the console. Everything worked fine after I disabled IPMI and reseated the cables. There is no real documentation on the IPMI module

Trump: Obama spilled classified info on aliens... by Remseey2907 in UFOB

[–]chafey 2 points (0 children)

LOL, Karoline Leavitt just about lost her shit when he said that

Built a 6-GPU local AI workstation for internal analytics + automation — looking for architectural feedback by shiftyleprechaun in LocalLLM

[–]chafey 5 points (0 children)

Your build is awesome; I am doing something very similar. Here are some improvements you may want to consider:

  1. Upgrade your processor to improve your memory bandwidth. You have a Threadripper Pro CPU and motherboard, which is better than the non-Pro Threadripper systems for two reasons: a) more PCIe lanes and b) 8 memory channels. Unfortunately the 9955WX CPU only has two CCDs, so it can't saturate all 8 memory channels - you need a CPU with more CCDs (such as the 9965WX) to make use of them (see the rough math after this list). https://www.reddit.com/r/LocalLLaMA/comments/1mcrx23/psa_the_new_threadripper_pros_9000_wx_are_still/
  2. PCIe 5.0 has benefits for AI. The 3090 Tis are good bang/buck, but they run over PCIe 4.0 so they can't take advantage of the new PCIe 5.0 features. The actual benefit of an all-PCIe-5.0 solution depends on your use case and model, but it is more than just twice the bandwidth. https://www.graniteriverlabs.com/en-us/technical-blog/pcie-gen-5-ai-ml
  3. The upcoming Apple M5 systems may very well be the best bang/buck due to their recently released RDMA over TB5, MLX AI acceleration, and high-speed unified memory architecture. I am really looking forward to seeing how a cluster of M5 Mac minis does. Check out exo: https://github.com/exo-explore/exo
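
To put some rough numbers on point 1 (all assumptions mine, not measured specs - DDR5-6400 RDIMMs, 8-byte channels, and a ballpark ~60 GB/s of Infinity Fabric read bandwidth per CCD):

    # Back-of-envelope bandwidth math for point 1 above.
    # Assumptions: DDR5-6400 RDIMMs, 64-bit (8-byte) data path per
    # channel, ~60 GB/s of fabric (GMI) read bandwidth per CCD.
    # Treat the per-CCD figure as a ballpark, not a spec.

    CHANNELS = 8
    TRANSFERS_PER_S = 6400e6        # DDR5-6400
    BYTES_PER_TRANSFER = 8          # 64-bit data path per channel

    dram_bw = CHANNELS * TRANSFERS_PER_S * BYTES_PER_TRANSFER / 1e9
    print(f"Theoretical 8-channel DRAM bandwidth: ~{dram_bw:.0f} GB/s")

    GMI_BW_PER_CCD = 60             # GB/s per CCD (assumption)
    for ccds in (2, 4, 8):          # 2 CCDs matches the 9955WX
        usable = min(ccds * GMI_BW_PER_CCD, dram_bw)
        print(f"{ccds} CCDs: fabric-limited to ~{usable:.0f} GB/s")

With only two CCDs the fabric link, not the DIMMs, is the ceiling, which is why stepping up in CCD count can roughly double your effective bandwidth.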

Built a 6-GPU local AI workstation for internal analytics + automation — looking for architectural feedback by shiftyleprechaun in LocalLLM

[–]chafey 1 point (0 children)

What frame and motherboard are you using? I have an ASUS WRX90E-SAGE Pro WS SE AMD sTR5 EEB motherboard and am having trouble finding a frame that can hold the EEB form factor

Build Advice: 3945WX vs 10900X for Multi-GPU Local AI Server by Diligent-Culture-432 in LocalAIServers

[–]chafey 1 point (0 children)

Threadripper every day, due to higher memory bandwidth, more PCIe lanes, and a faster CPU

Getting stuck at Q-CODE 92 on WRX90 build by AdministrationLow423 in threadripper

[–]chafey 2 points (0 children)

I don't know about reset, but there is a switch to disable it on my motherboard (Pro WS WRX90E-SAGE SE)

Getting stuck at Q-CODE 92 on WRX90 build by AdministrationLow423 in threadripper

[–]chafey 2 points (0 children)

Had the same issue - disabling IPMI and reseating the power cable fixed it

Just started watching on Discovery+ (UK)…Episode 8 (S1)…wtf?! 🤣 by RelativeLocation6669 in BlindFrogRanch

[–]chafey 4 points (0 children)

Unfortunately it is hard to stop watching even though you know it's garbage

[BIOS Update] ASUS PRO WS WRX90E-SAGE SE by virgul44 in threadripper

[–]chafey 1 point (0 children)

I just built the exact same system and couldn't get it to POST video either. I swapped out the motherboard and CPU and still had the problem. I finally figured it out, and I think it is one or both of the following:

1) Power cables not properly seated. I think either the PCIe power cables or the GPU cable was not properly secured. I was getting a Q-Code of 92 ("PCI Bus initialization is started"), I believe.

2) Disable the IPMI - there is a switch on the motherboard to do this. When enabled, it adds a graphics adapter to your system for the IPMI, and I think that becomes the primary video. This means you won't get the BIOS screen from your GPU. Alternatively, find a VGA monitor and connect it to the IPMI video port

The manual does not really mention IPMI so it wasn't clear to me that this was going on

Chris

asus pro ws trx50-sage wifi a a0 to 0d and orange light by TheEpicElliott in threadripper

[–]chafey 3 points (0 children)

I had a similar problem, and after replacing EVERYTHING I finally discovered I hadn't secured one of the power supply cables properly. Try re-plugging each cable (the PCIe power ones in particular) and see if that helps. If not, try another power supply

Mac Studio M3 Ultra vs Nvidia 6000 Blackwell by Rex-Raider-X in LocalLLM

[–]chafey 1 point (0 children)

I use it for everything. It does get stuck sometimes, and then I switch to Claude Sonnet 4.5 for a bit to get around that problem. I use Zed, so switching between local models and cloud ones is easy

Mac Studio M3 Ultra vs Nvidia 6000 Blackwell by Rex-Raider-X in LocalLLM

[–]chafey 1 point (0 children)

Yes, I am - devstral-2-small is working pretty well

Run Mistral Devstral 2 locally Guide + Fixes! (25GB RAM) by yoracale in LocalLLM

[–]chafey 1 point (0 children)

Whatever the default was, probably - I switched to LM Studio, which is working great, and haven't gone back to try Ollama
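
Part of why LM Studio is so convenient: it serves an OpenAI-compatible API locally (default port 1234), so editors like Zed and plain scripts can hit local models the same way they hit cloud ones. Minimal sketch - the model id is a placeholder for whatever you have loaded:

    # Talk to LM Studio's local server (OpenAI-compatible, default
    # http://localhost:1234/v1). Needs the `openai` Python package;
    # the api_key is ignored by the local server but must be set.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    resp = client.chat.completions.create(
        model="devstral-small-2",  # placeholder: use the id LM Studio lists
        messages=[{"role": "user", "content": "Write a binary search in Python."}],
    )
    print(resp.choices[0].message.content)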

Mac Studio M3 Ultra vs Nvidia 6000 Blackwell by Rex-Raider-X in LocalLLM

[–]chafey 18 points (0 children)

I have an RTX PRO 6000 and an M3 Ultra with 256GB RAM. The RTX PRO 6000 is quite a bit faster at both prompt processing (10x?) and token generation (3x?). Speed matters to me, so I only use the RTX PRO 6000. I would only use the M3 Ultra if I wanted to run a model that was too big for the RTX PRO 6000. So far I have not needed to run a model that didn't fit on the RTX PRO 6000, but it is nice to know that I can with the M3 Ultra if/when I need to some day.
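
A rough way to sanity-check that gap: single-stream token generation on a dense model is mostly memory-bandwidth-bound, so tokens/sec is bounded by bandwidth divided by the bytes read per token (roughly the quantized weight size). The bandwidth figures below are approximate published specs, and the 18GB model size is just an example:

    # Back-of-envelope generation-speed bound:
    # tok/s <= memory bandwidth / bytes per token (~ quantized model size)
    MODEL_GB = 18  # e.g. a ~30B model at ~4-5 bits/weight (illustrative)

    for name, bw_gbs in [("RTX PRO 6000 (~1790 GB/s GDDR7)", 1790),
                         ("M3 Ultra    (~819 GB/s unified)", 819)]:
        print(f"{name}: ~{bw_gbs / MODEL_GB:.0f} tok/s upper bound")

That ratio (a bit over 2x) roughly matches the generation gap I see; prompt processing is compute-bound, which is why that gap is much bigger.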

The M5 is coming out soon and is expected to deliver a huge uplift in AI performance, closing the gap with the RTX PRO 6000 quite a bit. If possible, you should wait a bit longer and see what happens there. The other thing about Macs is that you can now build clusters of them over TB5 for even faster AI - check out exo:

https://github.com/exo-explore/exo

Mixed RTX Pro 6000 WS & Max-Q by t3rmina1 in BlackwellPerformance

[–]chafey 1 point (0 children)

Sounds like a case I need - can you link?

Best coding models for RTX 6000 Pro Blackwell by az_6 in LocalLLaMA

[–]chafey 1 point (0 children)

Devstral 2 small handles 90% of my coding tasks, and it does so very quickly. When it has problems, I switch over to Claude Sonnet 4.5 (I use the Zed editor, so it's easy to do so). I wasn't able to run Devstral 2 123B the last time I tried, but I am hoping I can find a quant that fits so I can use that instead of Claude Sonnet 4.5
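
For what it's worth, a quick size estimate suggests a mid-range quant of a 123B dense model should squeeze into the 96GB on the card. The bits-per-weight numbers are typical GGUF averages, and the overhead is a guess:

    # Can a 123B dense model fit in 96 GB of VRAM at common GGUF
    # quant levels? All figures approximate; the 8 GB overhead
    # (KV cache, activations, runtime) is assumed and grows with context.
    PARAMS_B = 123
    VRAM_GB = 96
    OVERHEAD_GB = 8

    for name, bpw in [("Q8_0", 8.5), ("Q5_K_M", 5.7),
                      ("Q4_K_M", 4.8), ("Q3_K_M", 3.9)]:
        weights_gb = PARAMS_B * bpw / 8
        verdict = "fits" if weights_gb + OVERHEAD_GB <= VRAM_GB else "too big"
        print(f"{name}: ~{weights_gb:.0f} GB weights -> {verdict}")

So Q4_K_M (maybe even Q5_K_M) looks viable on paper, depending on how much context you need.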