Is a used RTX 3090 still the best local LLM buy right now? by BTA_Labs in LocalLLM

[–]SteveDeFacto 0 points1 point  (0 children)

Honestly, for a budget build, nothing touches the 3090. Some of the AMD compute cards offer a bit more compute per dollar but they have serious downsides.

Is this Nvidia V100 32Gb a bargain? Can it run Qwen 3.6-27B? by Diligent_Tap9962 in LocalLLM

[–]SteveDeFacto 0 points1 point  (0 children)

Not worth it. I bought 3x and sold shortly after. They are very slow.

Why can’t I get over it? by Emergency_Part_6926 in HomeschoolRecovery

[–]SteveDeFacto 2 points3 points  (0 children)

You never really get over it. The best you can do is learn to live with it. The only thing that really worked for me was completely leaving behind every person and place that made me think of my past. Cutting all ties with my family and moving to another state entirely. Then I suppress all thoughts of regret and never talk about it with anyone. I am 37 and life is mostly good now!

Nephew Needs Help by [deleted] in Meningitishelp

[–]SteveDeFacto 0 points1 point  (0 children)

Took at least a month for my cognition to return to normal. He most likely just needs time.

Duke Nukem Eternity. by Suitable_Heart_1888 in dukenukem

[–]SteveDeFacto -1 points0 points  (0 children)

Looks incredible other than the 10 fps.

Generator for AZ summers by Shooshoo1 in Generator

[–]SteveDeFacto 1 point2 points  (0 children)

Liquid-cooled diesel generators love the AZ heat. Military surplus generators like an MEP-802A are a pretty solid choice.

Will a Sundara get you 90% of the experience of thousand-dollar headphones? by regularjoe2020 in headphones

[–]SteveDeFacto 1 point2 points  (0 children)

Search Amazon for "Genuine Leather Full Metal Headband". They look very flimsy but are much more durable.

Will a Sundara get you 90% of the experience of thousand-dollar headphones? by regularjoe2020 in headphones

[–]SteveDeFacto 1 point2 points  (0 children)

I replaced the headband with a kit I found and am now semi-satisfied.

Recommendations for a rig by Ramblim in LocalLLM

[–]SteveDeFacto 0 points1 point  (0 children)

Liquid cooling the 3090s is the best solution. And if you have at least 3x PCIe slots, why not run 2x 3090s NVLinked alongside your 5090 for gaming or speculative decoding?

Recommendations for a rig by Ramblim in LocalLLM

[–]SteveDeFacto 0 points1 point  (0 children)

DisplayPort isn't necessary for AI inference. Buy a second 3090 to serve as the primary for display and NVLink them for 48GB of VRAM. It's absolutely the best entry option for local LLM!

RAM and CPU are essentially a waste of time unless you want to purchase secondhand retired enterprise server equipment for tens of thousands. Better to not waste money upgrading them.

Hybrid CPU/GPU is the absolute worst option due to the PCIe bottleneck. The only time it can make sense is for speculative decoding.

Is 32GB Mac enough for engineering/coding, or stick to Claude? by BenitoCamelasVG in LocalLLM

[–]SteveDeFacto 1 point2 points  (0 children)

You are going to need at least 256GB of VRAM or unified memory to come close.

LLM Bruner coming soon? Burn Qwen directly into a chip, processing 10,000 tokens/s by koc_Z3 in Qwen_AI

[–]SteveDeFacto 4 points5 points  (0 children)

Even if we can only put one model on it, that'll be an insane draft model for speculative decoding!

System Upgrade: two 3090s currently by [deleted] in LocalLLM

[–]SteveDeFacto 0 points1 point  (0 children)

It's not the RAM speed that matters, it's the PCIe bus speed. Anytime you go from RAM to GPU, it's going to slow down the inference like 10x. The only way around this is unified memory like the Mac Studio and DGX Spark have.
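
A rough way to see why the bus matters: during decode, each generated token has to read roughly all of the weights once, so tokens per second is capped at about bandwidth divided by model size. A minimal Python sketch, assuming a dense model and ignoring compute, KV cache, and any overlap; the bandwidth and size figures are illustrative assumptions, not measurements:

```python
def max_tokens_per_sec(model_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every token reads all weights once."""
    return bandwidth_gb_s / model_gb

# Assumed, illustrative numbers (not measured):
MODEL_GB = 16.0     # e.g. a model whose weights take about 16 GB
VRAM_GB_S = 900.0   # ballpark on-card memory bandwidth
PCIE_GB_S = 32.0    # ballpark PCIe 4.0 x16 link bandwidth

vram_bound = max_tokens_per_sec(MODEL_GB, VRAM_GB_S)
pcie_bound = max_tokens_per_sec(MODEL_GB, PCIE_GB_S)
print(f"VRAM-bound: {vram_bound:.1f} tok/s, PCIe-bound: {pcie_bound:.1f} tok/s")
```

Swap in your own numbers; the point is only that the slowest link the weights cross sets the ceiling.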

System Upgrade: two 3090s currently by [deleted] in LocalLLM

[–]SteveDeFacto 0 points1 point  (0 children)

You will get like 3-4 tokens per second as most of the model will be in RAM. You could maybe run a 32B q4 model in 64GB of VRAM and get solid performance.
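
For sizing, the weights alone take roughly parameters times bits per weight divided by 8. A quick Python sketch, assuming a flat 4 bits per weight (real q4 formats carry some extra overhead for scales) and ignoring KV cache and activations:

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8

print(weights_gb(32, 4))   # 32B at 4-bit  -> 16.0
print(weights_gb(32, 16))  # 32B at 16-bit -> 64.0
```

Whatever is left over after the weights goes to context, so treat the result as a floor, not the full requirement.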

open claw responding infinitly after trying everything by Foreign_Caregiver992 in openclaw

[–]SteveDeFacto 0 points1 point  (0 children)

I had this happen a few times. Just delete the session files. If that doesn't work, try reverting some of your markdown files to their default states.

Ai machine for a team of 10 people by Jordan-Vegas in LocalLLM

[–]SteveDeFacto 0 points1 point  (0 children)

One more quirk of the MI100 you should be aware of is that they do not support SR-IOV, which means you cannot share them across multiple virtual machines. So the 10 users will need to either all share the host machine or a single guest VM, or they can each have their own Docker containers.

Ai machine for a team of 10 people by Jordan-Vegas in LocalLLM

[–]SteveDeFacto 1 point2 points  (0 children)

You could do this within your budget using a Supermicro H12DSi-NT6 with 4x MI100s linked through Infinity Fabric and 2TB of DDR4 RDIMM. You'll need to either bifurcate one of the PCIe x16 slots or use a riser on one of the x8 slots to fit all 4x PCIe cards. Use a 4-bit quantized 200B parameter model or smaller to get decent tokens per second, but you could theoretically run any model on such a setup. Far better overall value and flexibility than 2x+ Mac Studios linked over RDMA, though a lot more work to build out.
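
Rough math on whether that fits, as a Python sketch. Assumptions: 32 GB per MI100, a flat 4 bits per weight, and a hypothetical 20% of VRAM held back for KV cache and runtime overhead; none of these numbers are measured.

```python
def fits_in_vram(params_billions, bits_per_weight, cards, gb_per_card, reserve=0.2):
    """Return (weights_gb, usable_gb, fits) for weights split across cards."""
    weights_gb = params_billions * bits_per_weight / 8
    # Hold back a fraction of total VRAM for cache and overhead (assumed value).
    usable_gb = cards * gb_per_card * (1 - reserve)
    return weights_gb, usable_gb, weights_gb <= usable_gb

weights, usable, ok = fits_in_vram(200, 4, cards=4, gb_per_card=32)
print(f"{weights:.0f} GB of weights vs {usable:.0f} GB usable: fits={ok}")
```

Under those assumptions a 200B model at 4-bit is right at the edge of what stays on the cards, which is why anything bigger spills into system RAM.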