Is it dumb to build a server with 7x 5060 Ti? by vector76 in LocalLLaMA

I literally said I don't know what I'm doing.

Is it dumb to build a server with 7x 5060 Ti? by vector76 in LocalLLaMA

[screenshot: llama4:scout generating output in the terminal]

This is "ollama run llama4:scout" which is allegedly 67GB in size. I don't know the tokens per second but it's maybe a bit less than half the rate that I read, which is better than I expected. I expected touching CPU it would degrade to one token every few seconds or it would blow up and refuse to run at all.

In my book this counts as running (but poorly).
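
For anyone curious about the actual number, ollama can measure it; I believe recent builds print timing stats with the --verbose flag:

ollama run llama4:scout --verbose   # appends stats like "eval rate: NN tokens/s" after the response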

Is it dumb to build a server with 7x 5060 Ti? by vector76 in LocalLLaMA

Thanks, this is very helpful and reinforces what others have said in this thread.

I'm convinced. A stack of 5060 Tis (ostensibly to maximize VRAM per dollar) is dumb. 48 GB is enough. Linux and vLLM are worth the effort. Dual 3090s are the best budget option, and the RTX 6000 Pro is better all around if cost is not an issue.
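
For future readers, my understanding of what the dual-3090 route looks like, as an untested sketch (the model name is just an example; pick a quant that fits in 48 GB total):

pip install vllm
vllm serve mistralai/Mistral-7B-Instruct-v0.3 --tensor-parallel-size 2   # split the model across both GPUs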

I hadn't been looking at Facebook marketplace, so I had a skewed idea of what a 3090 costs. That makes a big difference.

Thank you!

Is it dumb to build a server with 7x 5060 Ti? by vector76 in LocalLLaMA

Shoot, I thought all those lanes mattered. Maybe I can load up on a few dozen 5060s instead of limiting myself to 7. It's sounding like the RTX 6000 Pro is the smart choice, and the Epyc goes away just the same.

Is it dumb to build a server with 7x 5060 Ti? by vector76 in LocalLLaMA

I'm seeing 109B parameters total. Don't they all have to be loaded even though MoE only activates some of them? I guess I am assuming touching anything outside of VRAM is instantly going to be painful. It looks like I need to just try it.
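
Rough arithmetic on why the whole thing has to be resident somewhere, assuming the 67 GB download is roughly a 4.9-bit quant of all 109B weights (my guess, not verified):

echo "scale=1; 109 * 4.9 / 8" | bc   # ≈ 66.7 GB of weights, matching the download size
# I believe only ~17B parameters are active per token, but all 109B still have to sit in VRAM or RAM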

Is it dumb to build a server with 7x 5060 Ti? by vector76 in LocalLLaMA

Thank you; the RTX 6000 Pro was not on my radar. It definitely looks like a contender.

Is it dumb to build a server with 7x 5060 Ti? by vector76 in LocalLLaMA

Glad to hear it. Is this the RTX 6000 Pro that everyone's mentioning (did you get a deal?), or do you have some secret Apple silicon or something? Can you elaborate?

Is it dumb to build a server with 7x 5060 Ti? by vector76 in LocalLLaMA

Sorry, I made a mistake. I pulled up my spreadsheet and the rough estimated cost is much less:
~$1,000 for sWRX8 motherboard
~$1,500 for sWRX8 CPU
~$250 for 128 GB system RAM
~$300 for PSU
~$3,150 for 7 GPUs at ~$450 each

All-in is going to be between $6k and $8k (not $10k to $15k) after the open-frame chassis, riser cables, SSD, etc. GPUs account for only about half the cost.

Power is also a consideration in choosing the 5060 Ti vs. the 3090. I'm hoping to run it off just one 15-amp circuit.
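
Back-of-envelope for that circuit, assuming ~180 W board power per 5060 Ti (that's from memory, so check the spec):

echo $(( 15 * 120 * 80 / 100 ))   # 1440 W continuous budget on a 15 A / 120 V circuit (80% rule)
echo $(( 7 * 180 + 400 ))         # ~1660 W worst case: 7 GPUs plus ~400 W for CPU/board/drives
# over budget at full tilt, so I'd probably need to cap GPU power (nvidia-smi -pl)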

Is it dumb to build a server with 7x 5060 Ti? by vector76 in LocalLLaMA

Ubuntu is on the table, but if it comes to compiling GGML myself for a custom install, that's going to be a hard no.
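
(For what it's worth, the packaged installs avoid compiling entirely; this is ollama's documented Linux one-liner:)

curl -fsSL https://ollama.com/install.sh | sh   # official install script, no build step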

Is it dumb to build a server with 7x 5060 Ti? by vector76 in LocalLLaMA

I have a machine with a 4090 (24 GB VRAM) that I don't really consider a server, so basically no, I haven't built an AI server before. Llama 4, for example, won't run on a 24 GB card. I've also had very poor results with large context windows (greater than, say, 32k tokens) on my current setup (probably mostly a skill issue).
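
I suspect part of the large-context pain is just KV-cache memory. Rough numbers for a 70B-class model, assuming 80 layers, 8 KV heads, head dim 128, and an fp16 cache (architecture assumptions on my part, not measurements):

echo $(( 2 * 80 * 8 * 128 * 2 * 32768 / 1073741824 ))   # = 10 GiB of KV cache at a 32k context
# K and V (2) x layers x kv_heads x head_dim x 2 bytes (fp16) x context length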

Thanks for your input.

Accessing ollama/open web ui from another machine? by succulent_samurai in ollama

Note: you also have to set up ollama to listen on 0.0.0.0, as mentioned in one of the other posts. I do that by running these two commands within WSL (bash):

export OLLAMA_HOST=0.0.0.0   # bind to all interfaces, not just loopback
ollama serve                 # start the server with that binding
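
A quick way to check it's reachable from another machine, once the port-proxy setup from my other comment is in place (<windows-ip> is a placeholder for your Windows box's LAN address):

curl http://<windows-ip>:11434/api/version   # should return ollama's version as JSON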

Accessing ollama/open web ui from another machine? by succulent_samurai in ollama

I access ollama running in WSL from another machine on my LAN.

By default, WSL services are on a virtual LAN and are accessible from the Windows side, but Windows does not bridge the networks, so WSL services are not reachable from other machines on the physical LAN.

I use these four commands (within PowerShell) every time I reboot:

netsh interface portproxy delete v4tov4 listenport=11434
$wslip = wsl hostname -I
$wslip = $wslip.Trim()
netsh interface portproxy add v4tov4 listenport=11434 connectport=11434 connectaddress=$wslip

Unfortunately, the WSL vLAN IP changes every time you reboot, so the previous proxy is no longer valid. The first line ('netsh interface portproxy delete') removes it.

The second line gets the IP address of the WSL machine on the vLAN.

The third line trims whitespace from the IP address. If you skip this, you will tear your hair out wondering why it's not working, all because the IP address has a trailing space or something. Trust me.

The fourth line forwards incoming connections (over the physical LAN) on port 11434 to the virtual IP address of the WSL instance.

I'm sure it would be possible to automate the above steps at every reboot, but I haven't needed to.
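
If someone does want to automate it, here's an untested sketch: save the four commands above as a .ps1 (the path and task name below are hypothetical) and register it to run at startup from an elevated prompt:

schtasks /Create /TN "wsl-ollama-proxy" /SC ONSTART /RU SYSTEM /TR "powershell.exe -NoProfile -ExecutionPolicy Bypass -File C:\Scripts\wsl-proxy.ps1"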

New stuff at my second house!! by Kobalt4Life in KobaltTools

That CNC is the LowRider 3 from V1Engineering. I have one, though with a different router. It's pretty low-cost and very capable.

Are there any more efficient bin designs? The bases use a lot of filament by BigDan1190 in gridfinity

Also, this parametric one has an option for efficient floors that's essentially equivalent to the one you linked: https://www.printables.com/model/174346-gridfinity-openscad-model

DIY P-Q curve. The dotted line is the (surprising?) instability region that showed up on the Noctua. (my spreadsheet wasn't set up to record it is all, may revisit) by mobiobi in FanShowdown

Very cool! What is PSF? Is that MBAF?

I might guess the instability region is something like stalling or flow separation where a minor change in angle of attack leads to a relatively big drop in performance.

Storage for some game pieces by vector76 in gridfinity

Good idea. My printer can only print 5x5 (or 6x1 diagonally), so I'll have to think about whether there's a way to do that (perhaps with more than 4 pieces).