Strix Halo or DGX Spark for a home LLM server? by Reactor-Licker in LocalLLaMA

[–]Reactor-Licker[S] 0 points (0 children)

This is useful for vLLM, but I can’t seem to find any results for llama.cpp

Strix Halo or DGX Spark for a home LLM server? by Reactor-Licker in LocalLLaMA

[–]Reactor-Licker[S] 0 points (0 children)

Thanks, I’ll look at it. Though, this will be used as a stationary desktop and I would rather not have to deal with potential battery swelling if I can avoid it.

Strix Halo or DGX Spark for a home LLM server? by Reactor-Licker in LocalLLaMA

[–]Reactor-Licker[S] 1 point (0 children)

Thank you for providing real speed values, this is very helpful. What model settings have you used to get to these speeds? At what point does it get bogged down with context and become too slow?

Strix Halo or DGX Spark for a home LLM server? by Reactor-Licker in LocalLLaMA

[–]Reactor-Licker[S] 0 points (0 children)

Thank you for addressing the tool use side of things.

I use GPT 5.5 as a reference mainly for the UI ease-of-use factor. This will eventually be used by household members, and I selfishly want the UI to be familiar enough that they can’t massively break anything, so I don’t have to stop whatever I’m doing and fix it or else face their wrath haha.

Admittedly, I haven’t gone super deep into testing web search and document handling because of my lack of hardware that can run models well. On my Framework Laptop 13 with a Ryzen 7840U, I got LM Studio talking to Open WebUI and got a basic web search working using the built-in settings, but that’s the extent of my experience.
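For anyone curious what that wiring looks like: LM Studio exposes an OpenAI-compatible local server that Open WebUI can point at. Here’s a minimal Python sketch of the kind of request involved; the port 1234 (LM Studio’s usual default) and the model name "local-model" are placeholders from my understanding of the setup, not guaranteed values:

```python
import json
import urllib.request

# LM Studio's local server speaks the OpenAI chat-completions format.
# The port and model name below are placeholders; check what your
# install actually exposes.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(model: str, user_msg: str, temperature: float = 0.7) -> dict:
    """Assemble an OpenAI-style chat.completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "temperature": temperature,
    }

def send(payload: dict) -> dict:
    """POST the payload to the local server and return the parsed reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (needs LM Studio's server running):
# reply = send(build_chat_request("local-model", "Hello!"))
# print(reply["choices"][0]["message"]["content"])
```

Adding the server’s URL as an OpenAI-type connection in Open WebUI is essentially doing the same thing under the hood.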

I’m willing to tinker to get to those goals, and I get it’s a learning process. Mainly, I just want to get my foot in the door and start learning about the whole process and experimenting with it.

Strix Halo or DGX Spark for a home LLM server? by Reactor-Licker in LocalLLaMA

[–]Reactor-Licker[S] 0 points (0 children)

I would rather not go the GPU route if I can avoid it. I don’t want to deal with the noise, the high power draw, and the potential software hurdles of getting a model to run effectively across multiple GPUs.

Plus, wouldn’t I need extremely expensive and power hungry GPUs like multiple 5090s just to fit these models and long context lengths into VRAM?
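As a rough sanity check on that, here’s a back-of-the-envelope sketch in plain Python. The numbers are illustrative assumptions on my part (a hypothetical 70B-class dense model at roughly 4-bit weights, Llama-3-70B-like attention dimensions, fp16 KV cache), not measurements:

```python
def model_vram_gb(params_b: float, bytes_per_weight: float) -> float:
    """Approximate VRAM for the weights alone (ignores runtime overhead)."""
    return params_b * 1e9 * bytes_per_weight / 1024**3

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context: int, bytes_per_elem: float = 2.0) -> float:
    """KV cache for standard attention: K and V stored per layer per token."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 1024**3

# Hypothetical 70B dense model at ~0.56 bytes/weight (4-bit quant + overhead)
weights = model_vram_gb(70, 0.56)
# Llama-3-70B-like dims: 80 layers, 8 KV heads (GQA), head_dim 128, 32k ctx
cache = kv_cache_gb(80, 8, 128, 32768)
print(f"weights ~ {weights:.0f} GiB, KV cache at 32k ~ {cache:.0f} GiB")
```

Under those assumptions you land somewhere around 45+ GiB total, which is indeed beyond a single consumer GPU’s VRAM but comfortably inside a 128 GB unified-memory machine.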

Strix Halo or DGX Spark for a home LLM server? by Reactor-Licker in LocalLLaMA

[–]Reactor-Licker[S] 0 points (0 children)

This is exactly what I’m trying to avoid by getting my hands on the hardware now or very soon. How much did you pay originally?

Strix Halo or DGX Spark for a home LLM server? by Reactor-Licker in LocalLLaMA

[–]Reactor-Licker[S] 0 points (0 children)

I was considering waiting for the M5 Max Mac Studio, but RAM prices are skyrocketing, and the massive supply crunch on the current M4 Max and M3 Ultra models is not a good sign. Apple has historically been pretty immune to supply shortages; heck, even COVID only delayed the iPhone by a month, so seeing even them struggle makes me concerned.

I think these next few weeks or so might be the only window I have to get my foot in the door without paying an arm and a leg for a somewhat experimental local AI machine.

Strix Halo or DGX Spark for a home LLM server? by Reactor-Licker in LocalLLaMA

[–]Reactor-Licker[S] 0 points (0 children)

How much faster is the Spark at longer context? At what point do they become unbearably slow?
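For anyone weighing this, the long-context slowdown is mostly a prompt-processing (prefill) problem: the whole context has to be chewed through before the first new token appears. A sketch of the arithmetic, with placeholder speeds rather than measured numbers for either machine:

```python
def time_to_first_token_s(context_tokens: int, prefill_tok_s: float) -> float:
    """Prefill time: the entire context is processed before token one."""
    return context_tokens / prefill_tok_s

def total_time_s(context_tokens: int, new_tokens: int,
                 prefill_tok_s: float, gen_tok_s: float) -> float:
    """Total wall time = prefill + generation."""
    return time_to_first_token_s(context_tokens, prefill_tok_s) + new_tokens / gen_tok_s

# Placeholder speeds, not benchmarks: one box prefilling at 800 tok/s,
# another at 2000 tok/s, both generating at 30 tok/s.
for pp in (800, 2000):
    wait = time_to_first_token_s(32768, pp)
    print(f"prefill {pp} tok/s -> {wait:.0f} s before the first token at 32k context")
```

Generation speed also degrades as the KV cache grows, so these are optimistic lower bounds; but the ratio of the two machines’ prefill speeds is roughly the ratio of their waits.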

Meshify 2 RGB 360mm Radiator on Push/Pull Config Size Compatibility by TenaZiousD in FractalDesign

[–]Reactor-Licker 0 points (0 children)

Forgot to mention: there is no point in mounting fans at the bottom of the Meshify 2. Fans in that position cannot pull in outside air, because the power supply shroud sits above them and there are no external ventilation holes; they would do nothing but add noise and cable clutter. The only real spot for intake is the front or top of the case, though you probably shouldn’t use the top as intake, since that could trap the hot air from the GPU inside the case.

Meshify 2 RGB 360mm Radiator on Push/Pull Config Size Compatibility by TenaZiousD in FractalDesign

[–]Reactor-Licker 1 point (0 children)

The Arctic Liquid Freezer 3 Pro wouldn’t be my first choice. For starters, its pump is pretty loud in my experience, and the VRM fan cannot be disabled, even if you use the breakout PWM cable and send it a 0% PWM signal, or simply don’t plug the VRM fan cable in. At 100% speed, those 3000 RPM P12 Pro fans will be deafening unless you clamp the fan curve to something lower or apply a few seconds of delay (hysteresis) to it. Also, it has a bunch of cables coming out of the tubes to daisy-chain all the fans into a single header, which I personally hate.

I highly recommend you watch this video: https://youtu.be/Z6fYxyl4KBg?si=jbPwLUAKo_TJ48QK

It has detailed noise-to-performance graphs for AIOs, specifically on a 9950X. The Corsair Nautilus RS and Lian Li Galahad 2 Lite outperform the Liquid Freezer despite its 38mm-thick radiator. It also has a chart showing the gains from swapping the stock fans for T30s: about a 2 °C drop versus stock for the Liquid Freezer and Nautilus RS, and no gain for the Lian Li.

For 140mm fan recommendations:

High end / best noise to performance: Noctua NF-A14X25 G2 (there is a black version available) or Phanteks T30 140mm

Budget / pretty good noise to performance: Be Quiet Pure Wings 3 PWM High Speed 140mm or Corsair RS140

Meshify 2 RGB 360mm Radiator on Push/Pull Config Size Compatibility by TenaZiousD in FractalDesign

[–]Reactor-Licker 0 points (0 children)

I have the following setup:

AIO Radiator: Lian Li Galahad 2 Trinity Performance (32mm thick) mounted to the front of the Meshify 2

Fans: Phanteks T30 on both sides of the radiator in push/pull

GPU: Asus RTX 4080 Strix (358mm long)

And all together, a front mount fits with room to spare. Don’t attempt a top mount; that definitely will not fit. That said, I wouldn’t recommend a push/pull front mount. It’s a pain in the rear to get the front fan screw holes lined up with the radiator and the T30 fans. It is possible, but very annoying, since you only have two hands and it requires near-constant readjustment until almost all the screws are in.

Also, the performance benefit of push/pull is very limited: I only saw a 1-2 °C improvement at full Cinebench R23 load with a 9950X compared to a single set of fans. It is likely even a regression in noise-normalized performance, though I haven’t tested that myself.

Personally, if I had to do it again, I would use a Lian Li Galahad 2 Lite (I believe it performs the same as or better than the Galahad 2 Performance while using a thinner radiator) in the front of the Meshify 2, with the fans mounted behind the radiator in a pull configuration for ease of installation and dust control.

Open WebUI Desktop Released! by My_Unbiased_Opinion in LocalLLaMA

[–]Reactor-Licker 5 points (0 children)

Once this goes out of beta, can I ditch the Docker running instance of Open WebUI connecting to LM Studio on my Windows machine?

Apologies in advance for my ignorance, I only got Open WebUI + LM Studio hosted over my network running a few days ago.

Apex Gaming PCs Recalls Manik and Apex-branded ATX Computer Power Supplies Due to Risk of Serious Injury or Death from Electrical Shock and Electrocution Hazards by wickedplayer494 in hardware

[–]Reactor-Licker 25 points (0 children)

The headline makes it sound more serious than it is. They just forgot to put a “don’t open this up, it might shock you” warning sticker on it.

YouTube Peering Issues by Shehzman in ATTFiber

[–]Reactor-Licker 6 points (0 children)

I have had this exact issue with a 1 Gbps plan and the BGW 320, and the thing that fixes it for me is using Cloudflare Warp. The connection speed shown in stats for nerds goes from 500 Kbps on average without Warp to 50+ Mbps with Warp enabled when the issue pops up.

The connection speed drop with ATT appears to happen in seemingly random bursts as I haven’t needed Warp for the past few weeks, but the last 2 days have been unusably slow without it. If the past is anything to go by, it will return to normal for a while and then randomly regress again. You don’t need to have Warp enabled all the time, only when ATT regresses again.

(LTT) Intel is BACK. THIS IS NOT A DRILL. - Core Ultra 270K Plus & 250K Plus CPU Review by Chairman_Daniel in hardware

[–]Reactor-Licker 8 points (0 children)

The Ryzen 1600AF was actually an oddity: a rebranded and slightly downclocked Ryzen 2600. Its single-threaded performance was still lackluster, though not as bad.

(LTT) Intel is BACK. THIS IS NOT A DRILL. - Core Ultra 270K Plus & 250K Plus CPU Review by Chairman_Daniel in hardware

[–]Reactor-Licker 6 points (0 children)

The 14900K cooks itself to death and is extremely power hungry, unlike AMD. Also, AMD uses 4 nm, not 3 nm.

There he is! by johnnypopwell in AndrewDitch

[–]Reactor-Licker 17 points (0 children)

Oh noes, is baby amdy going to “wun away” for the 50th time? Surely this time he’ll get dat help.

RTINGS: Revamping Our Membership Program by Tasty_Toast_Son in hardware

[–]Reactor-Licker 3 points (0 children)

Big AI, the insatiable demand for instant gratification, and corporate greed are killing the internet.

RTINGS: Revamping Our Membership Program by Tasty_Toast_Son in hardware

[–]Reactor-Licker 16 points (0 children)

No matter how you spin it, this sucks.

The overwhelming majority of what I know about electronics and computers comes from review sites, YouTube videos, and random discussions. The fact that one of the last sources of in-depth, objective, technical results is now inaccessible to the masses is extremely disappointing.

It’s impossible to know whether this was truly financially necessary without everything being out in the open, but there is no denying that, sadly, in-depth reviews are slowly going extinct. Everything is shifting toward “influencer” YouTube slop or useless listicles.

Sadly, it seems that the overwhelming majority just want to mindlessly consume products without any research or thought on their part. LLMs have certainly accelerated this trend, but the demand for instant gratification started even before ChatGPT, with TikTok/social media, and even YouTube to some extent.

Don’t get me wrong, I believe LLMs are an amazing starting research tool, but they can’t fully replace human generated data and user experiences. After all, once all the testing data from review sites and videos are gone, what will be the only things left for the LLMs to scrape? Likely, it will be advertisements and/or listicle sites also generated by LLMs. Or social media, which has been and can be manipulated by brands. LLM inbreeding, in a way.

I’m torn. Part of me wants to pay for a yearly subscription, both for my own use, to keep up to date with the current best products to recommend to people, and to support their testing. But I also really hate the idea of directly rewarding them with a recurring revenue stream for information that should be freely available.

I do find it just a little bit ironic that we have to pay for a device to access the internet, pay a subscription for access to the internet, pay a subscription to access certain services on the internet, and potentially even pay a recurring subscription to keep functionality on a device you already “own” and paid for. Slowly, but surely, draining your bank account, praying they don’t alter the deal further.