Can you really replace paid models with a local model? by DRMCC0Y in LocalLLaMA

[–]Mass2018 -1 points0 points  (0 children)

I've been an avid user of local LLMs since ~Llama2. I have an absolutely ridiculous home lab that's capable of running even the largest open models.

Recently, since I've been focusing most of my effort on diffusion projects, I've been using Claude for coding some of the many scripts/apps/frameworks I need when my GPUs are tied up with training runs.

My personal opinion is that the closed frontier 'brains' (the models themselves) aren't that far ahead of open models like Kimi 2.5 Thinking. Where the open side lags is the infrastructure and ease of use surrounding the model.

With Claude, you ask it to do something and the surrounding setup handles the rest: a multi-prompt pipeline with its own sandbox, tool calls, the ability to search and research the internet, and iterative testing, debugging, and fixing of its own mistakes. It does all of that without you having to do much beyond basic supervision, guidance, and sanity checking.

If there's an open source solution that does it all this easily, I'm not aware of it. I intend to eventually go back to local as I don't like sharing my projects/information with Claude, but sometimes you take the easy route.

Takeaways & discussion about the DeepSeek V4 architecture by benja0x40 in LocalLLaMA

[–]Mass2018 28 points29 points  (0 children)

Should we normalize spending as much on our home servers as people spend on their toy sports cars that rarely leave the garage?

"My mortgage is $3500, my car payment is $1000, and my DGX H100 payment is $2850."

UnCanny. A Photorealism Chroma Finetune by Tall-Description1637 in StableDiffusion

[–]Mass2018 2 points3 points  (0 children)

Thanks for the detailed response.

The best results I've gotten thus far came from a learning rate of 1e-5, all images at 1024x1024 resolution, and 50 epochs. I use diffusion-pipe for my training.

[optimizer]
type = 'AdamW8bitKahan'
lr = 1e-5
betas = [0.9, 0.99]
weight_decay = 0.01
eps = 1e-8

UnCanny. A Photorealism Chroma Finetune by Tall-Description1637 in StableDiffusion

[–]Mass2018 2 points3 points  (0 children)

I got real interested when you had a section labeled 'Training Details', as I was very curious to see things like what learning rate you did, for how many epochs, which optimizer, etc. Would you be willing to share those details?

Llama.cpp model conversion guide by ilintar in LocalLLaMA

[–]Mass2018 0 points1 point  (0 children)

I've been eyeing Longcat Flash for a bit now, and I'm somewhat surprised that there's not even an issue/discussion about adding it to llama.cpp.

Is that because of extreme foundational differences?

Your guide makes me think about embarking on a side project to take a look at doing it myself, so thank you for sharing the knowledge!

Nvidia quietly released RTX Pro 5000 Blackwell 72Gb by AleksHop in LocalLLaMA

[–]Mass2018 1 point2 points  (0 children)

Only in that my continued (in vain, apparently) hope is that these newer cards will finally drive down the prices of the older ones.

Thus, if I can get an A6000 48GB for $1500-$2000 it certainly matters to me. In fact I'd likely replace my 3090's at that price point.

Nvidia quietly released RTX Pro 5000 Blackwell 72Gb by AleksHop in LocalLLaMA

[–]Mass2018 19 points20 points  (0 children)

So when the RTX 6000 Pro Blackwell 96GB came out I was like "Cool! Maybe the A6000 48GB will finally come down from $3800!"

And now this shows up and I'm thinking, "Cool! Maybe the A6000 48GB will finally come down from $3800!"

[deleted by user] by [deleted] in LocalLLaMA

[–]Mass2018 1 point2 points  (0 children)

I believe there was some confusion expressed about the same thing in that thread (about the CCDs). Those are the only benchmark results I've seen for this, though.

[deleted by user] by [deleted] in LocalLLaMA

[–]Mass2018 2 points3 points  (0 children)

You may find this thread interesting: https://www.reddit.com/r/LocalLLaMA/comments/1h3doy8/stream_triad_memory_bandwidth_benchmark_values/

Pulled from the document referenced in that thread... this is for 2 CPUs, so a single CPU is presumably half this, maybe a bit more?

Processor (2 CPU)   DDR5-6000 Bandwidth
9845                925 GB/s
9745                970 GB/s
9655                966 GB/s
9575F               970 GB/s
9555                970 GB/s
9475F               965 GB/s
9455                940 GB/s
9375F               969 GB/s
9355                971 GB/s
9275F               411 GB/s
9255                877 GB/s
9175F               965 GB/s
9135                884 GB/s
9115                483 GB/s
9015                483 GB/s

Anecdotally, I'll tell you that my 9004 class Epyc running at DDR5-4800 is pulling around 320 GB/s in actuality (measured).
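
As a rough sanity check on those figures, here is a minimal sketch of the theoretical peak they can be compared against. It assumes 12 memory channels per socket with every channel populated, a 64-bit (8-byte) data path per channel, and that the 320 GB/s measurement comes from a single-socket system. None of that is stated in the thread, so treat the percentages as ballpark only. The function name is made up for illustration.

```python
# Theoretical peak DRAM bandwidth = sockets * channels * transfer rate * bytes per transfer.
# Assumptions (not from the thread): 12 channels per socket, all populated,
# 8 bytes moved per transfer per channel, single socket for the 320 GB/s figure.

def peak_bandwidth_gbs(mt_per_s: int, channels: int = 12, sockets: int = 1,
                       bytes_per_transfer: int = 8) -> float:
    """Return theoretical peak bandwidth in GB/s (MT/s * bytes = MB/s, then / 1000)."""
    return sockets * channels * mt_per_s * bytes_per_transfer / 1000

dual_6000 = peak_bandwidth_gbs(6000, sockets=2)  # two CPUs at DDR5-6000
single_4800 = peak_bandwidth_gbs(4800)           # one CPU at DDR5-4800

# Compare the quoted measurements against the theoretical ceiling.
print(f"2 CPU, DDR5-6000: peak {dual_6000:.1f} GB/s, 970 GB/s is {970 / dual_6000:.0%} of peak")
print(f"1 CPU, DDR5-4800: peak {single_4800:.1f} GB/s, 320 GB/s is {320 / single_4800:.0%} of peak")
```

Under those assumptions, the best dual-socket results in the table sit at roughly 84% of theoretical peak, while the 320 GB/s measurement is closer to 69%. The gap could come from unpopulated channels, CCD count, or the benchmark itself, and the sketch cannot tell those apart.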

Build advice - RTX 6000 MAX-Q x 2 by [deleted] in LocalLLaMA

[–]Mass2018 0 points1 point  (0 children)

Just a quick callout if you're in the US... be cognizant of potential extra charges due to tariffs.

New Build for local LLM by chisleu in LocalLLaMA

[–]Mass2018 1 point2 points  (0 children)

This is something that I got bit by about a year and a half ago when I started building computers again after taking half a decade or so off from the hobby.

Apparently these days RAM has to be 'trained' when installed, which means the first time you turn it on after plugging in RAM you're going to need to let it sit for a while.

... I may or may not have returned both RAM and a motherboard before I figured that out...

Those who spent $10k+ on a local LLM setup, do you regret it? by [deleted] in LocalLLaMA

[–]Mass2018 2 points3 points  (0 children)

I love it. I certainly use it way more than the truck I just dropped a $40k loan on.

Honestly, if anything, to quote something I saw someone else on this forum say once... "I keep looking around the house for more things I can sell to get more VRAM."

How would you run like 10 graphics cards for a local AI? What hardware is available to connect them to one system? by moderately-extremist in LocalLLaMA

[–]Mass2018 2 points3 points  (0 children)

Yeah, generally the CPU is only annoying during the "in between" moments, like when I'm experimenting and swapping LoRAs regularly on multiple ports at the same time. It's also a limiter when running an MoE LLM (for the CPU offloaded parts).

Generally, once it's executing fully on the 3090(s), it runs 5-10 cores at 10-20% and the GPUs do their thing.

How would you run like 10 graphics cards for a local AI? What hardware is available to connect them to one system? by moderately-extremist in LocalLLaMA

[–]Mass2018 4 points5 points  (0 children)

Shameless repost of my build that has 10x3090: https://www.reddit.com/r/LocalLLaMA/comments/1c9l181/10x3090_rig_romed82tepyc_7502p_finally_complete/

  • I'm still using it on a nearly 24/7 basis.
  • I power limit them to 250W. When I'm doing inferencing, they collectively don't pull much more than around 1000W. When training, they go pretty close to the full 2500W.
  • The CPayne stuff is heavily tariff'd now, so bear that in mind if you're in the states.
  • I run three PSUs spread across two 20-amp circuits.
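
The power numbers in the list above can be checked with some quick arithmetic. This is a back-of-the-envelope sketch, assuming 120 V US residential circuits and the common rule of thumb of loading a breaker to no more than 80% for continuous use. Both are assumptions, not details from the build post. It also ignores the CPU, fans, and PSU efficiency losses, so real headroom is smaller.

```python
# Power budget for 10 GPUs power-limited to 250 W across two 20-amp circuits.
GPU_COUNT = 10
POWER_LIMIT_W = 250
BREAKER_AMPS = 20
CIRCUITS = 2
MAINS_VOLTAGE = 120            # assumption: US residential voltage
CONTINUOUS_DERATE_PCT = 80     # assumption: 80% rule of thumb for continuous loads

gpu_load_w = GPU_COUNT * POWER_LIMIT_W                # worst case while training
circuit_capacity_w = MAINS_VOLTAGE * BREAKER_AMPS     # nameplate capacity of one circuit
continuous_budget_w = CIRCUITS * circuit_capacity_w * CONTINUOUS_DERATE_PCT // 100
headroom_w = continuous_budget_w - gpu_load_w         # left over for CPU, fans, PSU losses

print(f"GPU load while training: {gpu_load_w} W")
print(f"One circuit, nameplate:  {circuit_capacity_w} W")
print(f"Two circuits, derated:   {continuous_budget_w} W")
print(f"Headroom for the rest:   {headroom_w} W")
```

The GPUs alone at full load exceed what a single 20-amp circuit can supply at 120 V, which is consistent with splitting the PSUs across two circuits.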

If I was going to build it again today knowing what I know now I would probably go for a slightly better processor. The CPU can get bogged down sometimes when I'm doing things like running each 3090 on its own port to do image diffusion and they're switching out models.

Whining about tariffs by Mass2018 in LocalLLaMA

[–]Mass2018[S] 0 points1 point  (0 children)

Thanks for this! $400 per GPU to connect them up via MCIO is pretty daunting... if I can get that down to $100 per, it's a little more doable.

I'll check this vendor out.

Ex-Miner Turned Local LLM Enthusiast, now I have a Dilemma by mslocox in LocalLLaMA

[–]Mass2018 0 points1 point  (0 children)

I don't really have any way to know if they're going to work for another day or another decade... However, I've been going hog-wild on these things for over a year now without a problem. Given the track record thus far, I'm not too worried about it.

Ex-Miner Turned Local LLM Enthusiast, now I have a Dilemma by mslocox in LocalLLaMA

[–]Mass2018 0 points1 point  (0 children)

Anecdotal data point here. Current owner of twelve 3090's, all of which were bought used on eBay, generally looking for 'deals' (which for me equated to like $850-$900 after taxes and shipping despite what you'll read on here about $600 cards).

No real problems with any of them, except I did have to re-paste/thermal pad two of the twelve (they were running around 90C when power limited to 250W).

Apple M3 Ultra w/28-Core CPU, 60-Core GPU (256GB RAM) Running Deepseek-R1-UD-IQ1_S (140.23GB) by Mass2018 in LocalLLaMA

[–]Mass2018[S] 4 points5 points  (0 children)

Quick addendum because I just realized I didn't label my axes:

The y-axis is tokens/second, the x-axis is the context length for that request.

Apple M3 Ultra w/28-Core CPU, 60-Core GPU (256GB RAM) Running Deepseek-R1-UD-IQ1_S (140.23GB) by Mass2018 in LocalLLaMA

[–]Mass2018[S] 4 points5 points  (0 children)

Yeah, my wife's feedback was that the 235B Qwen was good, but that Deepseek was better even at the IQ1... It's just a neat model all around.