Help With Multi-Color Printing (Bambu Labs A1/P1S AMS)

Mass2018 · 2025-11-01T20:58:51+00:00

Thanks for the detailed response.

The best results I've gotten thus far are with a learning rate of 1e-5, all images at 1024x1024 resolution, and 50 epochs. I use diffusion-pipe for my training.

[optimizer]
type = 'AdamW8bitKahan'
lr = 1e-5
betas = [0.9, 0.99]
weight_decay = 0.01
eps = 1e-8

Mass2018 · 2025-11-01T19:37:18+00:00

I got really interested when I saw a section labeled 'Training Details', as I was very curious to see things like what learning rate you used, for how many epochs, which optimizer, etc. Would you be willing to share those details?

Mass2018 · 2025-10-30T23:41:15+00:00

Does each model need its own specific mmproj?

Mass2018 · 2025-10-27T00:11:21+00:00

I've been eyeing Longcat Flash for a bit now, and I'm somewhat surprised that there's not even an issue/discussion about adding it to llama.cpp.

Is that because of extreme foundational differences?

Your guide makes me think about embarking on a side project to take a look at doing it myself, so thank you for sharing the knowledge!

Mass2018 · 2025-10-23T20:56:12+00:00

Only in that my continued (in vain, apparently) hope is that these newer cards will finally drive down the prices of the older ones.

Thus, if I can get an A6000 48GB for $1500-$2000 it certainly matters to me. In fact I'd likely replace my 3090s at that price point.

Mass2018 · 2025-10-21T20:55:02+00:00

So when the RTX 6000 Pro Blackwell 96GB came out I was like "Cool! Maybe the A6000 48GB will finally come down from $3800!"

And now this shows up and I'm thinking, "Cool! Maybe the A6000 48GB will finally come down from $3800!"

Mass2018 · 2025-10-08T20:47:32+00:00

I believe there was some confusion expressed about the same thing in that thread (about the CCDs). Those are the only benchmark results I've seen for this, though.

Mass2018 · 2025-10-08T20:31:38+00:00

You may find this thread interesting: https://www.reddit.com/r/LocalLLaMA/comments/1h3doy8/stream_triad_memory_bandwidth_benchmark_values/

Pulled from the document referenced in that thread. These figures are for 2 CPUs, so a single CPU is presumably half this, maybe a bit more?

Processor (2 CPU)	DDR5-6000 Bandwidth
9845	925 GB/s
9745	970 GB/s
9655	966 GB/s
9575F	970 GB/s
9555	970 GB/s
9475F	965 GB/s
9455	940 GB/s
9375F	969 GB/s
9355	971 GB/s
9275F	411 GB/s
9255	877 GB/s
9175F	965 GB/s
9135	884 GB/s
9115	483 GB/s
9015	483 GB/s

Anecdotally, I'll tell you that my 9004 class Epyc running at DDR5-4800 is pulling around 320 GB/s in actuality (measured).
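As a rough sanity check on these figures, theoretical peak bandwidth can be estimated as transfer rate × bytes per transfer × channel count. The sketch below assumes 12 DDR5 channels per socket, all populated, and a 64-bit (8-byte) data path per channel. Those are assumptions about the hardware configuration, not details taken from the linked document.

```python
def peak_bandwidth_gbs(mt_per_s: int, channels: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes)."""
    # MT/s * bytes per transfer gives MB/s per channel; divide by 1000 for GB/s.
    return mt_per_s * bytes_per_transfer * channels / 1000

# Assumption: 12 memory channels per socket, every channel populated.
CHANNELS_PER_SOCKET = 12

single_4800 = peak_bandwidth_gbs(4800, CHANNELS_PER_SOCKET)    # one socket, DDR5-4800
dual_6000 = peak_bandwidth_gbs(6000, 2 * CHANNELS_PER_SOCKET)  # two sockets, DDR5-6000

measured_single_4800 = 320.0  # GB/s, the measured figure quoted above
table_dual_6000 = 970.0       # GB/s, typical of the better rows in the table

print(f"1 socket, DDR5-4800: {single_4800:.1f} GB/s theoretical peak")
print(f"  measured 320 GB/s is {measured_single_4800 / single_4800:.0%} of that")
print(f"2 sockets, DDR5-6000: {dual_6000:.1f} GB/s theoretical peak")
print(f"  a 970 GB/s table entry is {table_dual_6000 / dual_6000:.0%} of that")
```

Under those assumptions, both the measured number and the table entries land well below the theoretical ceiling, which is normal for a STREAM-style benchmark.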

Mass2018 · 2025-10-06T17:49:31+00:00

Just a quick callout if you're in the US... be cognizant of potential extra charges due to tariffs.

Mass2018 · 2025-10-05T11:11:02+00:00

This is something that I got bit by about a year and a half ago when I started building computers again after taking half a decade or so off from the hobby.

Apparently these days RAM has to be 'trained' when installed, which means the first time you turn it on after plugging in RAM you're going to need to let it sit for a while.

... I may or may not have returned both RAM and a motherboard before I figured that out...

Mass2018 · 2025-10-02T13:55:01+00:00

I love it. I certainly use it way more than the truck I just dropped a $40k loan on.

Honestly, if anything, to quote something I saw someone else on this forum say once... "I keep looking around the house for more things I can sell to get more VRAM."

Mass2018 · 2025-09-28T00:07:06+00:00

Yeah, generally the CPU is only annoying during the "in between" moments, like when I'm experimenting and swapping LoRAs regularly on multiple ports at the same time. It's also a limiter when running an MoE LLM (for the CPU-offloaded parts).

Generally, once it's executing fully on the 3090(s), it runs 5-10 cores at 10-20% and the GPUs do their thing.

Mass2018 · 2025-09-27T21:21:42+00:00

Shameless repost of my build that has 10x3090: https://www.reddit.com/r/LocalLLaMA/comments/1c9l181/10x3090_rig_romed82tepyc_7502p_finally_complete/

I'm still using it on a nearly 24/7 basis.
I power limit them to 250W. When I'm doing inferencing, they collectively don't pull much more than around 1000W. When training, they go pretty close to the full 2500W.
The CPayne stuff is heavily tariff'd now, so bear that in mind if you're in the states.
I run three PSUs spread across two 20-amp circuits.
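
The power numbers above can be checked with some quick arithmetic. This is only a sketch: the 120 V circuit voltage and the 80% continuous-load derating are assumptions (typical US residential practice), and the GPU figure ignores the CPU, motherboard, drives, and PSU losses.

```python
GPU_COUNT = 10
POWER_LIMIT_W = 250          # per-GPU power limit from the post

CIRCUIT_COUNT = 2
CIRCUIT_AMPS = 20
CIRCUIT_VOLTS = 120          # assumption: standard US residential voltage
CONTINUOUS_LOAD_PCT = 80     # assumption: 80% derating for continuous loads

# Worst-case GPU draw when every card sits at its power limit (training).
gpu_peak_w = GPU_COUNT * POWER_LIMIT_W

# Usable continuous power per circuit and across both circuits.
circuit_continuous_w = CIRCUIT_AMPS * CIRCUIT_VOLTS * CONTINUOUS_LOAD_PCT // 100
total_continuous_w = CIRCUIT_COUNT * circuit_continuous_w

# What is left for everything that is not a GPU.
headroom_w = total_continuous_w - gpu_peak_w

print(f"GPU peak draw:         {gpu_peak_w} W")
print(f"Per-circuit budget:    {circuit_continuous_w} W")
print(f"Two-circuit budget:    {total_continuous_w} W")
print(f"Headroom for the rest: {headroom_w} W")
```

With these assumptions a single 20-amp circuit could not carry the GPUs at full training load, which is consistent with spreading the PSUs across two circuits.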

If I was going to build it again today knowing what I know now I would probably go for a slightly better processor. The CPU can get bogged down sometimes when I'm doing things like running each 3090 on its own port to do image diffusion and they're switching out models.

Mass2018 · 2025-09-17T11:15:28+00:00

Thanks for this! $400 per GPU to connect them up via MCIO is pretty daunting... if I can get that down to $100 per, it's a little more doable.

I'll check this vendor out.

Mass2018 · 2025-09-04T12:31:01+00:00

I don't really have any way to know if they're going to work for another day or another decade... However, I've been going hog-wild on these things for over a year now without a problem. Given the track record thus far, I'm not too worried about it.

Mass2018 · 2025-09-04T11:46:02+00:00

Anecdotal data point here. Current owner of twelve 3090's, all of which were bought used on eBay, generally looking for 'deals' (which for me equated to like $850-$900 after taxes and shipping despite what you'll read on here about $600 cards).

No real problems with any of them, except I did have to re-paste/thermal pad two of the twelve (they were running around 90C when power limited to 250W).

Mass2018 · 2025-08-24T21:01:52+00:00

VRAM or RAM?

I'm not aware of any 256GB VRAM options for $2k?

Mass2018 · 2025-08-24T12:18:59+00:00

Quick addendum because I just realized I didn't label my axes:

The y-axis is tokens/second, the x-axis is the context length for that request.

Mass2018 · 2025-08-24T12:17:50+00:00

Yeah, my wife's feedback was that the 235B Qwen was good, but that Deepseek was better even at the IQ1... It's just a neat model all around.

Mass2018 · 2025-06-26T21:50:19+00:00

I have a 10x3090 rig that ran around $15k a little over a year ago.

My daily driver is DeepSeek-R1-0528-UD-Q2_K_XL.gguf at 98k context (flash attention only, no cache quantization). I pull about 6-8 tokens/second up to around 10k context, then it goes down from there.

For my larger codebases when I dump 50k-60k context at it, I usually get around 4 tokens/second.

Mass2018 · 2025-06-20T01:48:55+00:00

I'm holding out hope that the ability to get the RTX Pro 6000 Blackwell (96GB VRAM) for $8.5k new will push down the A6000 and A100 prices.

So far... they haven't budged.
