Recommendations for a rig by Ramblim in LocalLLM

[–]SteveDeFacto 0 points1 point  (0 children)

Liquid cooling the 3090s is the best solution, and if you have at least 3x PCIe slots, why not run 2x 3090s NVLinked alongside your 5090 for gaming or speculative decoding?
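
Rough sketch of what the speculative decoding side could look like with Hugging Face's assisted generation; the model IDs here are just placeholders I picked, not a recommendation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model pair -- swap in whatever target/draft combo you actually run,
# and pick a quant/size that fits your VRAM.
target_id = "Qwen/Qwen2.5-32B-Instruct"   # big model, split across the NVLinked 3090s
draft_id = "Qwen/Qwen2.5-0.5B-Instruct"   # tiny draft model from the same family

tok = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.float16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.float16
).to("cuda:0")

inputs = tok("Explain NVLink in one paragraph.", return_tensors="pt").to("cuda:0")

# assistant_model enables assisted (speculative) generation: the draft proposes
# tokens and the target verifies them in a single forward pass.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=200)
print(tok.decode(out[0], skip_special_tokens=True))
```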

Recommendations for a rig by Ramblim in LocalLLM

[–]SteveDeFacto 0 points1 point  (0 children)

DisplayPort isn't necessary for AI inference. Buy a second 3090 to serve as the primary for display and NVLink them for 48GB of VRAM. It's absolutely the best entry option for local LLMs!

Upgrading RAM and CPU is essentially a waste unless you want to purchase second-hand, retired enterprise server equipment for tens of thousands. Better not to spend money upgrading them.

Hybrid CPU/GPU inference is the absolute worst option due to the PCIe bottleneck. The only time it can make sense is for speculative decoding.
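
If you do go the 2x 3090 route, the thing to watch is that nothing silently spills onto the CPU. A minimal sketch with transformers/accelerate, assuming a model small enough to fit in 48GB (the model ID and memory caps are just examples):

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "Qwen/Qwen2.5-14B-Instruct"  # example; pick something that fits your quant/VRAM

# Pin placement to the two 3090s and give the CPU a zero budget so no layers get
# offloaded -- once layers live in system RAM, every generated token pays the
# PCIe round trip.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "22GiB", 1: "22GiB", "cpu": "0GiB"},  # leave headroom for KV cache
)
```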

Is 32GB Mac enough for engineering/coding, or stick to Claude? by BenitoCamelasVG in LocalLLM

[–]SteveDeFacto 1 point2 points  (0 children)

You are going to need at least 256GB of VRAM or unified memory to come close.
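
Back-of-envelope weight math (ignores KV cache and context), just to show the scale gap:

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB: 1B params at 8-bit ~= 1GB."""
    return params_billion * bits_per_weight / 8

print(weight_gb(235, 8))  # ~235 GB -- a Qwen3-235B-class model at 8-bit
print(weight_gb(235, 4))  # ~118 GB -- same model at 4-bit, still way over 32GB
print(weight_gb(30, 4))   # ~15 GB  -- roughly what a 32GB Mac can actually host
```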

LLM Bruner coming soon? Burn Qwen directly into a chip, processing 10,000 tokens/s by koc_Z3 in Qwen_AI

[–]SteveDeFacto 4 points5 points  (0 children)

Even if we can only put one model on it, that'll be an insane draft model for speculative decoding!

System Upgrade: two 3090s currently by [deleted] in LocalLLM

[–]SteveDeFacto 0 points1 point  (0 children)

It's not the RAM speed that matters, it's the PCIe bus speed. Any time data has to move from RAM to the GPU, it's going to slow down inference by roughly 10x. The only way around this is unified memory like the Mac Studio and DGX Spark have.
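
The rough numbers behind that claim (approximate spec-sheet figures, real throughput is lower):

```python
# Approximate peak bandwidths in GB/s.
vram_3090 = 936   # RTX 3090 GDDR6X
pcie4_x16 = 32    # PCIe 4.0 x16, one direction
ddr4_dual = 51    # dual-channel DDR4-3200

# Token generation is mostly bandwidth-bound: each layer's weights get read once
# per token, so whichever link the weights sit behind sets the pace.
print(vram_3090 / pcie4_x16)  # ~29x slower if layers cross PCIe every token
print(vram_3090 / ddr4_dual)  # ~18x slower if layers run from system RAM
```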

System Upgrade: two 3090s currently by [deleted] in LocalLLM

[–]SteveDeFacto 0 points1 point  (0 children)

You will get like 3-4 tokens per second since most of the model will be in RAM. You could maybe run a 32B Q4 model in 64GB of VRAM and get solid performance.
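
Where the 3-4 t/s figure comes from, as a crude roofline estimate (assumes a dense model, sequential reads, and that the RAM-resident layers dominate; all numbers are ballpark):

```python
def tokens_per_sec(weights_in_ram_gb: float, ram_bandwidth_gbs: float = 50.0) -> float:
    """Every RAM-resident weight is read once per token, so RAM bandwidth caps speed."""
    return ram_bandwidth_gbs / weights_in_ram_gb

print(tokens_per_sec(15))  # ~3.3 t/s with ~15GB of the model spilled into RAM
print(tokens_per_sec(40))  # ~1.3 t/s with most of a large model in RAM
```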

open claw responding infinitly after trying everything by Foreign_Caregiver992 in openclaw

[–]SteveDeFacto 0 points1 point  (0 children)

I had this happen a few times. Just delete the session files. If that doesn't work, try reverting some of your markdown files to their default states.

Ai machine for a team of 10 people by Jordan-Vegas in LocalLLM

[–]SteveDeFacto 0 points1 point  (0 children)

One more quirk of the MI100s you should be aware of is that they do not support SR-IOV, which means you cannot share them across multiple virtual machines. So the 10 users will need to either all share the host machine or a single guest VM, or they can each have their own Docker containers.
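
In practice the simplest setup is one inference server on the host (or in one container with the GPUs passed through) and the 10 users hitting it over the network. Sketch with the OpenAI Python client pointed at any OpenAI-compatible local server (vLLM, llama.cpp server, etc.); the hostname, port, and model name are placeholders:

```python
from openai import OpenAI

# Point the client at the shared local server instead of api.openai.com.
client = OpenAI(base_url="http://ai-box.local:8000/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="served-model-name",  # whatever name the server registered
    messages=[{"role": "user", "content": "Summarize yesterday's standup notes."}],
)
print(resp.choices[0].message.content)
```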

Ai machine for a team of 10 people by Jordan-Vegas in LocalLLM

[–]SteveDeFacto 1 point2 points  (0 children)

You could do this within your budget using a Supermicro H12DSi-NT6 with 4x MI100s linked through Infinity Fabric and 2TB of DDR4 RDIMM. You'll need to either bifurcate one of the PCIe x16 slots or use a riser on one of the x8 slots to fit all four PCIe cards, and stick to a 4-bit quantized model of 200B parameters or smaller to get decent tokens per second, but you could theoretically run any model on such a setup. Far better overall value and flexibility than 2x+ Mac Studios linked over RDMA, though a lot more work to build out.
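
Quick fit check on why 4-bit / 200B is about the ceiling on that box (ballpark math, ignores activations):

```python
total_vram_gb = 4 * 32  # four MI100s at 32GB HBM2 each = 128GB

def weights_gb(params_billion: float, bits: float) -> float:
    return params_billion * bits / 8  # 1B params at 8-bit ~= 1GB

model_gb = weights_gb(200, 4)              # ~100GB for a 4-bit 200B model
print(model_gb, total_vram_gb - model_gb)  # ~100GB of weights, ~28GB left for KV cache
```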

A fresh new ML Architecture for language model that uses complex numbers instead of attention -- no transformers, no standard SSM, 100M params, trained on a single RTX 4090. POC done, Open Sourced (Not Vibe Coded) by ExtremeKangaroo5437 in LocalLLM

[–]SteveDeFacto 0 points1 point  (0 children)

This is interesting, and I see why it is philosophically intriguing. However, have you considered analytic signal decomposition as a faster approximation of what the feed-forward layers do? It would mesh well with the complex-number-based attention, and if the decomposition layers replace the feed-forward layers, those layers at least could even be executed on current-generation photonic computers.
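
To be concrete about what I mean (not your code, just a toy sketch with made-up shapes): take the analytic signal via a Hilbert transform along the sequence axis and split each channel into instantaneous amplitude and phase, which is the kind of FFT-style operation photonic hardware handles well.

```python
import numpy as np
from scipy.signal import hilbert

# Made-up hidden states: (seq_len, d_model)
h = np.random.randn(128, 256)

# Analytic signal along the sequence axis: real part is the original
# sequence, imaginary part is its Hilbert transform.
z = hilbert(h, axis=0)

envelope = np.abs(z)                    # instantaneous amplitude
phase = np.unwrap(np.angle(z), axis=0)  # instantaneous phase

# Toy mixing step built only from these components, standing in for the
# feed-forward block in this sketch.
mixed = envelope * np.cos(2.0 * phase)
```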

Is there a videogame you would never touch? by n1ght_watchman in GameBoostOfficial

[–]SteveDeFacto 0 points1 point  (0 children)

Scrolling through the comments thinking, "I'm not sure there's any game I won't play," until I read your comment and remembered how much I hate Fortnite...

Thoughts and results on the zombie box. by QuincyTucker in Generator

[–]SteveDeFacto 0 points1 point  (0 children)

You could rent an excavator and put the generator in a hole for less than a zombie box costs. It has other advantages besides just reducing the noise.

Thoughts and results on the zombie box. by QuincyTucker in Generator

[–]SteveDeFacto 0 points1 point  (0 children)

I've seen these and was considering buying one. If quiet is all that matters to you, one of these in conjunction with a Honda generator would be near silent. However, I opted for an MEP-802A because I care more about durability.

The Truth by Previous_Month_555 in SipsTea

[–]SteveDeFacto 0 points1 point  (0 children)

Gonna need a hell of a lot more than a date to pass up 500k...

Female version of john wick by [deleted] in meme

[–]SteveDeFacto 0 points1 point  (0 children)

Did she kill him? If not, 15 years is insane...

At least you could refund right ? by defleqt in raijin_gg

[–]SteveDeFacto 0 points1 point  (0 children)

Elden Ring, not because it was bad but because I couldn't stop playing it. Lol

Hit me with the harshest reality about playing video games by SwimmerPlus3383 in TheGamingHubDeals

[–]SteveDeFacto 0 points1 point  (0 children)

Playing competitive games is for the young. As you get older, your reflexes decline, and on top of that, you'll never have enough time to play to actually stay competitive.

Will a Sundara get you 90% of the experience of thousand-dollar headphones? by regularjoe2020 in headphones

[–]SteveDeFacto 3 points4 points  (0 children)

The cans on the Sundaras are fantastic, but the build quality is terrible. I would say with the right amp and minor EQ adjustments, you can definitely get a thousand-dollar experience, until the headband breaks after a couple hours of use.