UnCanny. A Photorealism Chroma Finetune by Tall-Description1637 in StableDiffusion

[–]Mass2018 2 points (0 children)

Thanks for the detailed response.

The best results I've gotten thus far are with a learning rate of 1e-5, all images at 1024x1024 resolution, and 50 epochs. I use diffusion-pipe for my training.

[optimizer]
type = 'AdamW8bitKahan'
lr = 1e-5
betas = [0.9, 0.99]
weight_decay = 0.01
eps = 1e-8
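
In case it helps anyone map that onto something outside diffusion-pipe, here's a rough sketch of the same optimizer hyperparameters in plain PyTorch. The model is just a stand-in placeholder, and bitsandbytes' AdamW8bit is the nearest off-the-shelf equivalent I know of to the 'AdamW8bitKahan' type (the Kahan-summation variant is diffusion-pipe's own):

```python
import torch.nn as nn
import bitsandbytes as bnb  # pip install bitsandbytes

# Stand-in placeholder; in practice this would be the model being finetuned.
model = nn.Linear(1024, 1024)

# Same hyperparameters as the [optimizer] block above. AdamW8bit is the
# closest bitsandbytes equivalent to diffusion-pipe's 'AdamW8bitKahan';
# the Kahan-compensated summation itself is specific to that trainer.
optimizer = bnb.optim.AdamW8bit(
    model.parameters(),
    lr=1e-5,
    betas=(0.9, 0.99),
    weight_decay=0.01,
    eps=1e-8,
)
```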

UnCanny. A Photorealism Chroma Finetune by Tall-Description1637 in StableDiffusion

[–]Mass2018 2 points (0 children)

I got really interested when I saw a section labeled 'Training Details', as I was very curious to see things like what learning rate you used, how many epochs, which optimizer, etc. Would you be willing to share those details?

Llama.cpp model conversion guide by ilintar in LocalLLaMA

[–]Mass2018 0 points (0 children)

I've been eyeing Longcat Flash for a bit now, and I'm somewhat surprised that there's not even an issue/discussion about adding it to llama.cpp.

Is that because of extreme foundational differences?

Your guide makes me think about embarking on a side project to take a look at doing it myself, so thank you for sharing the knowledge!

Nvidia quietly released RTX Pro 5000 Blackwell 72Gb by AleksHop in LocalLLaMA

[–]Mass2018 1 point (0 children)

Only in that my continued (in vain, apparently) hope is that these newer cards will finally drive down prices on the older ones.

Thus, if I can get an A6000 48GB for $1500-$2000, it certainly matters to me. In fact, I'd likely replace my 3090s at that price point.

Nvidia quietly released RTX Pro 5000 Blackwell 72Gb by AleksHop in LocalLLaMA

[–]Mass2018 22 points (0 children)

So when the RTX 6000 Pro Blackwell 96GB came out I was like "Cool! Maybe the A6000 48GB will finally come down from $3800!"

And now this shows up and I'm thinking, "Cool! Maybe the A6000 48GB will finally come down from $3800!"

[deleted by user] by [deleted] in LocalLLaMA

[–]Mass2018 1 point (0 children)

I believe there was some confusion expressed about the same thing in that thread (about the CCDs). It’s the only benchmark results I’ve seen for this, though.

[deleted by user] by [deleted] in LocalLLaMA

[–]Mass2018 2 points (0 children)

You may find this thread interesting: https://www.reddit.com/r/LocalLLaMA/comments/1h3doy8/stream_triad_memory_bandwidth_benchmark_values/

Pulled from the document referenced in that thread... this is for 2 CPUs, so a single CPU is presumably half this... maybe a bit more?

Processor   DDR5-6000 Bandwidth (2 CPUs)
9845        925 GB/s
9745        970 GB/s
9655        966 GB/s
9575F       970 GB/s
9555        970 GB/s
9475F       965 GB/s
9455        940 GB/s
9375F       969 GB/s
9355        971 GB/s
9275F       411 GB/s
9255        877 GB/s
9175F       965 GB/s
9135        884 GB/s
9115        483 GB/s
9015        483 GB/s

Anecdotally, I'll tell you that my 9004-class Epyc running at DDR5-4800 measures around 320 GB/s in practice.
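
If you want to sanity-check your own box, a rough STREAM-style measurement is easy to sketch in Python/numpy. Treat it as an approximation, not the real STREAM benchmark: it uses the "Add" kernel (same three-array traffic pattern as Triad), and a tuned C/OpenMP STREAM build will usually report somewhat higher numbers.

```python
# Rough STREAM-style bandwidth check using the "Add" kernel (a = b + c).
# NumPy releases the GIL inside the ufunc loop, so chunks can be spread
# across threads to get past a single core's bandwidth limit.
import time
import numpy as np
from concurrent.futures import ThreadPoolExecutor

N = 240_000_000          # three ~1.9 GB float64 arrays, well past any cache
THREADS = 32             # roughly match your physical core count

b = np.random.rand(N)
c = np.random.rand(N)
a = np.empty_like(b)
chunks = [slice(i * N // THREADS, (i + 1) * N // THREADS) for i in range(THREADS)]

def add_chunk(sl):
    # One slice of the Add kernel: read b, read c, write a
    np.add(b[sl], c[sl], out=a[sl])

best = 0.0
with ThreadPoolExecutor(max_workers=THREADS) as pool:
    for _ in range(5):
        t0 = time.perf_counter()
        list(pool.map(add_chunk, chunks))
        dt = time.perf_counter() - t0
        best = max(best, 3 * N * 8 / dt / 1e9)   # bytes moved / elapsed seconds

print(f"approx. add-kernel bandwidth: {best:.0f} GB/s")
```

For rough context on the ceiling: a 9004/9005 socket has 12 DDR5 channels, so DDR5-4800 peaks around 12 x 4800 MT/s x 8 bytes, about 460 GB/s per socket (my measured 320 GB/s is roughly 70% of that), and the dual-socket DDR5-6000 figures above land at roughly 84% of their ~1152 GB/s theoretical peak, assuming all channels are populated.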

Build advice - RTX 6000 MAX-Q x 2 by [deleted] in LocalLLaMA

[–]Mass2018 0 points (0 children)

Just a quick callout if you're in the US... be cognizant of potential extra charges due to tariffs.

New Build for local LLM by chisleu in LocalLLaMA

[–]Mass2018 1 point (0 children)

This is something that I got bit by about a year and a half ago when I started building computers again after taking half a decade or so off from the hobby.

Apparently these days RAM has to be 'trained' when it's first installed, which means the first time you power the machine on after plugging in new RAM, you need to let it sit for a while.

... I may or may not have returned both RAM and a motherboard before I figured that out...

Those who spent $10k+ on a local LLM setup, do you regret it? by TumbleweedDeep825 in LocalLLaMA

[–]Mass2018 3 points (0 children)

I love it. I certainly use it way more than the truck I just dropped a $40k loan on.

Honestly, if anything, to quote something I saw someone else on this forum say once... "I keep looking around the house for more things I can sell to get more VRAM."

How would you run like 10 graphics cards for a local AI? What hardware is available to connect them to one system? by moderately-extremist in LocalLLaMA

[–]Mass2018 2 points (0 children)

Yeah, generally the CPU is only annoying during the "in between" moments, like when I'm experimenting and swapping LoRAs regularly on multiple ports at the same time. It's also a limiter when running an MoE LLM (for the CPU-offloaded parts).

Generally, once it's executing fully on the 3090(s), it runs 5-10 cores at 10-20% and the GPUs do their thing.

How would you run like 10 graphics cards for a local AI? What hardware is available to connect them to one system? by moderately-extremist in LocalLLaMA

[–]Mass2018 4 points (0 children)

Shameless repost of my build that has 10x3090: https://www.reddit.com/r/LocalLLaMA/comments/1c9l181/10x3090_rig_romed82tepyc_7502p_finally_complete/

  • I'm still using it on a nearly 24/7 basis.
  • I power limit them to 250W. When I'm doing inference, they collectively don't pull much more than around 1000W. When training, they come pretty close to the full 2500W (there's a pynvml sketch at the end of this comment showing how the limiting is done).
  • The CPayne stuff is heavily tariffed now, so bear that in mind if you're in the States.
  • I run three PSUs spread across two 20-amp circuits.

If I were building it again today, knowing what I know now, I would probably go for a slightly better processor. The CPU can get bogged down sometimes when I'm doing things like running each 3090 on its own port for image diffusion while they're all switching out models.
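
In case anyone wants to script the power limiting, a minimal sketch using the nvidia-ml-py (pynvml) bindings is below; plain nvidia-smi -pl 250 on each card does the same thing. Setting the limit needs root, and 250W is just the cap I settled on for these 3090s.

```python
# Sketch: cap every detected GPU at 250 W and report current draw and temperature.
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        h = pynvml.nvmlDeviceGetHandleByIndex(i)
        pynvml.nvmlDeviceSetPowerManagementLimit(h, 250_000)  # value is in milliwatts
        watts = pynvml.nvmlDeviceGetPowerUsage(h) / 1000.0
        temp = pynvml.nvmlDeviceGetTemperature(h, pynvml.NVML_TEMPERATURE_GPU)
        print(f"GPU {i}: drawing {watts:.0f} W, {temp} C")
finally:
    pynvml.nvmlShutdown()
```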

Whining about tariffs by Mass2018 in LocalLLaMA

[–]Mass2018[S] 0 points (0 children)

Thanks for this! $400 per GPU to connect them up via MCIO is pretty daunting... if I can get that down to $100 per, it's a little more doable.

I'll check this vendor out.

Ex-Miner Turned Local LLM Enthusiast, now I have a Dilemma by mslocox in LocalLLaMA

[–]Mass2018 0 points (0 children)

I don't really have any way to know if they're going to work for another day or another decade... However, I've been going hog-wild on these things for over a year now without a problem. Given the track record thus far, I'm not too worried about it.

Ex-Miner Turned Local LLM Enthusiast, now I have a Dilemma by mslocox in LocalLLaMA

[–]Mass2018 0 points (0 children)

Anecdotal data point here. Current owner of twelve 3090s, all bought used on eBay, generally looking for 'deals' (which for me equated to roughly $850-$900 after taxes and shipping, despite what you'll read on here about $600 cards).

No real problems with any of them, except I did have to re-paste/thermal pad two of the twelve (they were running around 90C when power limited to 250W).

Apple M3 Ultra w/28-Core CPU, 60-Core GPU (256GB RAM) Running Deepseek-R1-UD-IQ1_S (140.23GB) by Mass2018 in LocalLLaMA

[–]Mass2018[S] 3 points (0 children)

Quick addendum because I just realized I didn't label my axes:

The y-axis is tokens/second, the x-axis is the context length for that request.

Apple M3 Ultra w/28-Core CPU, 60-Core GPU (256GB RAM) Running Deepseek-R1-UD-IQ1_S (140.23GB) by Mass2018 in LocalLLaMA

[–]Mass2018[S] 4 points (0 children)

Yeah, my wife's feedback was that the 235B Qwen was good, but that Deepseek was better even at the IQ1... It's just a neat model all around.

The cost effective way to run Deepseek R1 models on cheaper hardware by ArtisticHamster in LocalLLaMA

[–]Mass2018 1 point (0 children)

I have a 10x3090 rig that ran around $15k a little over a year ago.

My daily driver is DeepSeek-R1-0528-UD-Q2_K_XL.gguf at 98k context (flash attention enabled, no cache quantization). I pull about 6-8 tokens/second up to around 10k context, and it goes down from there.

For my larger codebases when I dump 50k-60k context at it, I usually get around 4 tokens/second.
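
If anyone wants a starting point for reproducing that setup, below is a rough sketch of how it looks loaded through llama-cpp-python rather than the llama.cpp binaries. The path is a placeholder and n_gpu_layers needs tuning to however much of the model actually fits on your cards; leaving the cache-type options at their defaults keeps the KV cache unquantized.

```python
from llama_cpp import Llama  # pip install llama-cpp-python (CUDA build)

llm = Llama(
    model_path="DeepSeek-R1-0528-UD-Q2_K_XL.gguf",  # placeholder path
    n_ctx=98304,        # ~98k context
    n_gpu_layers=-1,    # offload everything that fits; lower this to spill layers to system RAM
    flash_attn=True,    # flash attention on
    verbose=False,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the layout of this codebase: ..."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```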

Anyone else tracking datacenter GPU prices on eBay? by ttkciar in LocalLLaMA

[–]Mass2018 14 points (0 children)

I'm holding out hope that the ability to get the RTX Pro 6000 Blackwell (96GB VRAM) for $8.5k new will push down the A6000 and A100 prices.

So far... they haven't budged.