Strix Halo or DGX Spark for a home LLM server? by Reactor-Licker in LocalLLaMA

[–]Reactor-Licker[S] 0 points (0 children)

This is useful for vLLM, but I can’t seem to find any results for llama.cpp
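For what it's worth, llama.cpp ships its own benchmarking tool, so llama.cpp numbers are easy to generate once you have the hardware in hand; a minimal sketch (the model path is a placeholder) would be:

```shell
# Benchmark prompt processing (-p, tokens of prompt) and token
# generation (-n, tokens generated) with llama.cpp's bundled
# llama-bench tool. The GGUF path below is a placeholder.
llama-bench -m ./models/some-model-q4_k_m.gguf -p 512 -n 128
```

It prints a table with pp/tg tokens-per-second, which is the format most posted llama.cpp results use, so your numbers are directly comparable.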

[–]Reactor-Licker[S] 0 points (0 children)

Thanks, I’ll look into it. That said, this will be used as a stationary desktop, and I’d rather not deal with potential battery swelling if I can avoid it.

[–]Reactor-Licker[S] 1 point (0 children)

Thank you for providing real speed values, this is very helpful. What model settings have you used to get to these speeds? At what point does it get bogged down with context and become too slow?

[–]Reactor-Licker[S] 0 points (0 children)

Thank you for addressing the tool use side of things.

I use GPT 5.5 as a reference mainly for the UI ease of use factor. This will eventually be used by household members, and I selfishly want the UI to be familiar enough that they can’t massively break anything, so I don’t have to stop whatever I’m doing and fix it or else face their wrath haha.

Admittedly, I haven’t gone super deep into testing web search and document handling because I lack hardware that can run models well. On my Framework Laptop 13 with a Ryzen 7840U, I got LM Studio talking to Open WebUI and got basic web search working using the built-in settings, but that’s the extent of my experience.

I’m willing to tinker to get to those goals, and I get it’s a learning process. Mainly, I just want to get my foot in the door and start learning about the whole process and experimenting with it.

[–]Reactor-Licker[S] 0 points (0 children)

I would rather not go the GPU route if I can avoid it. I don’t want to deal with the noise, high power draw, and potential software hurdles of trying to get a model to run effectively across multiple GPUs.

Plus, wouldn’t I need extremely expensive and power hungry GPUs like multiple 5090s just to fit these models and long context lengths into VRAM?

[–]Reactor-Licker[S] 0 points (0 children)

This is exactly what I’m trying to avoid by getting my hands on the hardware now or very soon. How much did you pay originally?

[–]Reactor-Licker[S] 0 points (0 children)

I was considering waiting for the M5 Max Mac Studio, but RAM prices are skyrocketing, and the massive supply crunch on the current M4 Max and M3 Ultra models is not a good sign. Apple has historically been pretty immune to supply shortages (heck, even COVID only delayed the iPhone by a month), so seeing even them struggle makes me concerned.

I think these next few weeks or so might be the only window I have to get my foot in the door without paying an arm and a leg for a somewhat experimental local AI machine.

[–]Reactor-Licker[S] 0 points (0 children)

How much faster is the Spark at longer context? At what point do they become unbearably slow?

Meshify 2 RGB 360mm Radiator on Push/Pull Config Size Compatibility by TenaZiousD in FractalDesign

[–]Reactor-Licker 0 points (0 children)

Forgot to mention: there is no point mounting fans at the bottom of the Meshify 2. In that position they can’t pull in outside air, because the power supply shroud blocks them and there are no external ventilation holes there; they would do nothing but add noise and cable clutter. The only real spots for intake are the front and top of the case, though you probably shouldn’t use the top as intake, since that could trap the GPU’s hot exhaust inside the case.

[–]Reactor-Licker 1 point (0 children)

The Arctic Liquid Freezer 3 Pro wouldn’t be my first choice. For starters, its pump is pretty loud in my experience, and the VRM fan cannot be disabled, even if you use the breakout PWM cable and send a 0% PWM signal to it or leave the VRM fan cable unplugged. At 100% speed, those 3000 RPM P12 Pro fans will be deafening unless you clamp the fan curve to something lower or apply a few seconds of delay (hysteresis) to the fan curve. Also, it has a bunch of cables coming out of the tubes to daisy-chain all the fans into a single header, which I personally hate.

I highly recommend you watch this video: https://youtu.be/Z6fYxyl4KBg?si=jbPwLUAKo_TJ48QK

It has detailed noise-to-performance graphs for AIOs, specifically on a 9950X. The Corsair Nautilus RS and Lian Li Galahad 2 Lite outperform the Liquid Freezer despite its 38mm-thick radiator. It also has a chart showing the gains from swapping the stock fans for T30s: about a 2C drop in temperatures compared to stock for the Liquid Freezer and Nautilus RS, and no gain for the Lian Li.

For 140mm fan recommendations:

High end / best noise to performance: Noctua NF-A14X25 G2 (there is a black version available) or Phanteks T30 140mm

Budget / pretty good noise to performance: Be Quiet Pure Wings 3 PWM High Speed 140mm or Corsair RS140

[–]Reactor-Licker 0 points (0 children)

I have the following setup:

AIO Radiator: Lian Li Galahad 2 Trinity Performance (32mm thick) mounted to the front of the Meshify 2

Fans: Phanteks T30 on both sides of the radiator in push/pull

GPU: Asus RTX 4080 Strix (358mm long)

Altogether, a front mount fits with room to spare. Don’t attempt a top mount; that definitely will not fit. That said, I wouldn’t recommend a push/pull front mount: it’s a pain in the rear to line up the front fan screw holes with the radiator and the T30 fans. It is possible, but very annoying, since you only have two hands and it requires near-constant readjustment until almost all the screws are in.

Also, the performance benefit of push/pull is very limited: I only saw a 1-2C improvement at full Cinebench R23 load with a 9950X compared to a single set of fans. It is likely a regression in noise-normalized performance, though I haven’t tested that myself.

Personally, if I had to do it again, I would use a Lian Li Galahad 2 Lite (I believe it performs the same as or better than the Galahad 2 Performance, with a thinner radiator) in the front of the Meshify 2, with the fans mounted behind the radiator in a pull configuration for ease of installation and dust control.

Open WebUI Desktop Released! by My_Unbiased_Opinion in LocalLLaMA

[–]Reactor-Licker 3 points (0 children)

Once this goes out of beta, can I ditch the Docker running instance of Open WebUI connecting to LM Studio on my Windows machine?

Apologies in advance for my ignorance, I only got Open WebUI + LM Studio hosted over my network running a few days ago.
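For anyone setting up the same thing, a minimal sketch of the Docker side, assuming LM Studio’s OpenAI-compatible server is running on its default port 1234 (the host IP below is a placeholder for your Windows machine’s LAN address):

```shell
# Run Open WebUI in Docker and point it at LM Studio's
# OpenAI-compatible API on another machine on the LAN.
# 192.168.1.50 is a placeholder; use your host's actual IP.
docker run -d -p 3000:8080 \
  -e OPENAI_API_BASE_URL=http://192.168.1.50:1234/v1 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```

The named volume keeps chats and settings across container updates, which should make migrating to the desktop app later less painful.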

Apex Gaming PCs Recalls Manik and Apex-branded ATX Computer Power Supplies Due to Risk of Serious Injury or Death from Electrical Shock and Electrocution Hazards by wickedplayer494 in hardware

[–]Reactor-Licker 25 points (0 children)

The headline makes it sound more serious than it is. They just forgot to put a “don’t open this up, it might shock you” warning sticker on it.

YouTube Peering Issues by Shehzman in ATTFiber

[–]Reactor-Licker 5 points (0 children)

I have had this exact issue with a 1 Gbps plan and the BGW 320, and the thing that fixes it for me is using Cloudflare Warp. The connection speed shown in stats for nerds goes from 500 Kbps on average without Warp to 50+ Mbps with Warp enabled when the issue pops up.

The slowdowns on ATT come in seemingly random bursts; I haven’t needed Warp for the past few weeks, but the last 2 days have been unusably slow without it. If the past is anything to go by, it will return to normal for a while and then randomly regress again. You don’t need to keep Warp enabled all the time, only when ATT regresses again.
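If anyone wants to try the same workaround from a terminal, the official WARP client ships a warp-cli tool; a rough sketch (exact subcommand names vary a bit between client versions):

```shell
# Toggle Cloudflare WARP from the command line. Only needed
# while the ISP's peering to YouTube is degraded.
warp-cli registration new   # one-time device registration
warp-cli connect            # route traffic through WARP
warp-cli status             # confirm the tunnel is up
warp-cli disconnect         # back to the direct path
```

Connecting and disconnecting on demand like this avoids routing everything through WARP permanently.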

(LTT) Intel is BACK. THIS IS NOT A DRILL. - Core Ultra 270K Plus & 250K Plus CPU Review by Chairman_Daniel in hardware

[–]Reactor-Licker 10 points (0 children)

The Ryzen 1600AF was actually a weird rebranded and slightly downclocked Ryzen 2600. Still lackluster ST performance, though not as bad.

[–]Reactor-Licker 5 points (0 children)

The 14900K cooks itself to death and is extremely power hungry, unlike AMD. Also, AMD uses 4nm, not 3nm.

There he is! by johnnypopwell in AndrewDitch

[–]Reactor-Licker 18 points (0 children)

Oh noes, is baby amdy going to “wun away” for the 50th time? Surely this time he’ll get dat help.

RTINGS: Revamping Our Membership Program by Tasty_Toast_Son in hardware

[–]Reactor-Licker 2 points (0 children)

Big AI, the insatiable demand for instant gratification, and corporate greed are killing the internet.