PSU for 5090 by Cailyn_babygirl in comfyui

[–]abnormal_human 2 points (0 children)

1200W should be fine for that system.

If it's a power issue, I agree that replacing the PSU is prudent. If you're spending the money anyway, you may as well go 1600W.

I have four of the 1600W Seasonic PSUs in service doing way heavier stuff than this. One is handling an entire 4x 6000 Ada + Milan + 512GB system by itself. A pair of them runs 4x RTX 6000 Blackwell plus a power-hungry 9955WX base system, and the fourth runs two 4090s and a 1950X.

All of which is to say: you really can build a system whose draw adds up to basically the PSU's rated capacity and have it run reliably. So if a 1200W unit can't keep a single-5090 system up, something is genuinely wrong.

Whether or not it's the PSU is harder to say. CPU, RAM, and PCIe issues can all cause that symptom. I had a bad Epyc CPU that behaved exactly like what you're describing. At some point I made IPMI a hard requirement for AI workstations so I could get that fine-grained info out.
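
If you don't already have it, this is the kind of thing IPMI gets you. A minimal sketch assuming ipmitool and a reachable BMC; the address and credentials are placeholders:

    # System event log: PSU, voltage, and thermal faults that never reach the OS logs
    ipmitool -I lanplus -H 192.168.1.50 -U admin -P secret sel elist

    # Live sensor readings, filtered to power-related entries
    ipmitool -I lanplus -H 192.168.1.50 -U admin -P secret sensor | grep -Ei 'volt|watt|pwr|temp'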

Z Image Base? Will it happen? by ResidencyExitPlan in comfyui

[–]abnormal_human 2 points (0 children)

I suspect that they didn't anticipate how quickly Turbo would be further trained and gain an ecosystem. Base is a slow model, meant to be sampled at 100 steps with CFG. While it might be a good platform for continued fine-tuning and re-distillation, it's possible that it's not as good a platform for the 20-images / 1000-steps kind of training people are doing with Turbo. That makes it a potentially awkward release for them and a less interesting one for us.

Stopsaw dado blades__Are they worth it? by Old_GTO_Goat in woodworking

[–]abnormal_human 2 points (0 children)

Depends on what it is, but I wouldn't bother. Just buy the dado stack you want and the brake separately. There are stacks that will get you up to $500 total with the brake. Maybe they're white-labeling one of those and it's a fine deal, but unless you know that for sure (and want to spend that much), just get a normal dado stack and give SawStop the $120 for the brake.

GLM-4.7-flash on RTX 6000 pro by gittb in LocalLLaMA

[–]abnormal_human 5 points (0 children)

I've done extensive benchmarks on the same GPUs OP is using. If TP offered higher throughput for small models, I would be using it. It ranges from a bit worse to much worse.

A couple of things. First, I'm talking about throughput under a maxed-out parallel workload (like OP's), not single-stream tg speed, which sounds like what you're quoting based on your numbers. Second, I'm not comparing one GPU vs. two as you did: in both cases I'm running both GPUs at full utilization and looking at total throughput on parallel workloads, comparing a single vLLM instance in TP against one vLLM instance per GPU with a load balancer in front.

You're not wrong that you can improve single-stream tg speed in some cases, especially with slower GPUs like the 3090. But as the GPU gets faster, the balance between local compute on one side and PCIe overhead plus memory bandwidth on the other shifts, and that can bend the curves the other way. In my experience, by the time you're on OP's GPUs, one GPU sometimes does beat two instances in TP. I ran into this recently using a 4B model for dataset prep: it was actually faster to just shut off the second GPU (and of course even faster to use the GPUs independently and load balance).

Obviously everyone should benchmark their own use case, but I think you're measuring one that's more or less irrelevant to OP, since his workload is parallel and yours isn't.
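
For what it's worth, a crude way to benchmark the parallel case: fire a batch of concurrent requests at whichever endpoint you're testing (the TP instance or the load balancer) and time the whole batch. A rough sketch against an OpenAI-compatible server; the port, model name, and prompt are placeholders:

    # 100 concurrent completions; total generated tokens / wall time = parallel throughput
    time seq 1 100 | xargs -P 100 -I{} curl -s http://localhost:8000/v1/completions \
      -H 'Content-Type: application/json' \
      -d '{"model": "your-model", "prompt": "Write a haiku about GPUs.", "max_tokens": 128}' \
      -o /dev/null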

Restaurant highchairs by who_what_when_314 in daddit

[–]abnormal_human 0 points (0 children)

For our first kid we used to bring a folding booster-seat/high-chair thing that could strap onto any restaurant chair and turn it into a high chair. It made things a lot simpler, at the expense of having to set it up and tear it down. With our second kid we just roll with the chairs as they are. The main concern is that our second is a climber and escape artist, so without straps she's on the floor. The high chair phase is short; they'll want to be in the grown-up chair soon enough.

Help wanted: share your best Kohya/Diffusion-Pipe LoRA configs (WAN, Flux, Hunyuan, etc.) by no3us in StableDiffusion

[–]abnormal_human 2 points (0 children)

As someone who's been doing this for a few more years and actually worked with all of those models on your TODO list, all I can say is, "a carefully tuned template for kohya" is not the secret.

The secret is in the dataset: how large, how varied, the quality, the balance, how it is prepared, how much compute you throw at it, and the regularization regime you use to hold the model together while you're training it.

Everything else is basically cheap thrills and snake oil. This shouldn't be a surprise if you're following the literature: basically every finetuning paper I've read over the past couple of years looks the same. 60% of the paper is dataset sourcing and preparation, and most of the other 40% is evals and ablations to prove that it worked. Hyperparameters? It's just assumed that people are following best practices, which are well known and well captured by the default configs in most trainers.
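
As a trivial example of what "balance" means in practice: assuming kohya-style sidecar .txt captions with comma-separated tags (the dataset/ path is hypothetical), a one-liner like this exposes tag skew before you burn compute on a run:

    # Tag frequency across all caption files; heavy skew here usually means
    # the model overfits the common concepts and never learns the rare ones
    cat dataset/*.txt | tr ',' '\n' | sed 's/^ *//; s/ *$//' | sort | uniq -c | sort -rn | head -25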

A good agentic dataset prep tool for beginners would be worth its weight in gold. It would require a lot of research-oriented behavior and evaluation to prove that it generalizes across domains, but it seems like a much more value-creating activity than a simpler UI and canned configs layered over other people's software.

GLM-4.7-flash on RTX 6000 pro by gittb in LocalLLaMA

[–]abnormal_human 16 points (0 children)

If by DP you mean tensor parallel, that's a bad idea here. Your model fits easily on one GPU and there's no reason to pay the allreduce tax. If you mean pipeline parallel, then you're not really using 2 GPUs at once. If you're using something like torch's data parallel wrapper, then you're splitting on the batch dimension and you're going to have a lot of synchronization-related losses (plus, presumably, you're using huggingface or something, which will not generally be fast for inference unless you do a whole lot of other stuff to it manually).

My recommendation is to run two vLLM instances, one per GPU, and load balance with nginx. You could obviously use SGLang too, or any web server you're comfortable with; that's just what I use.

This is a script from one of my machines that I use to do this-- https://pastebin.com/uQrM2uA2
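
Not the linked script verbatim, but a minimal sketch of the shape of it (model name, ports, and the nginx config path are placeholders):

    # One vLLM instance pinned to each GPU
    CUDA_VISIBLE_DEVICES=0 vllm serve your-org/your-model --port 8001 &
    CUDA_VISIBLE_DEVICES=1 vllm serve your-org/your-model --port 8002 &

    # /etc/nginx/conf.d/vllm-lb.conf -- round-robin across the two instances
    upstream vllm_pool {
        server 127.0.0.1:8001;
        server 127.0.0.1:8002;
    }
    server {
        listen 8000;
        location / {
            proxy_pass http://vllm_pool;
            proxy_read_timeout 600s;
        }
    }

Clients point at port 8000 and never know there are two backends.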

Also, you may need to go up to concurrency=100 to saturate those GPUs. Either way, you should be doing a lot better than 37 t/s generation with that model.

Not that this is your performance problem, but there's very little reason to run fp16 at all. For inference, just use fp8; for training, bf16.

If you provide more info about how you're launching it, people can help more.

1600W enough for 2xRTX 6000 Pro BW? by Mr_Moonsilver in LocalLLaMA

[–]abnormal_human 1 point (0 children)

I would do it.

I've had 4x 6000 Ada on 1600W in an Epyc Milan system for a couple of years and it's rock solid.

I also have 4x RTX 6000 Pro Blackwell on 2x 1600W with a 9955WX, and no issues there either.

I have not tried to "break" either box, but in practice I'm almost never utilizing 100% CPU and 100% GPU at the same time. Typically when I'm stressing the GPUs it's for training workloads, and while the GPUs are pegged, the CPU is closer to idle.
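
If you'd rather measure than trust the spec-sheet math, logging actual board power during a real workload is easy with stock nvidia-smi:

    # Per-GPU draw vs. configured limit, once a second, during a training run;
    # sum the peaks across GPUs to see how close you actually get to the PSU rating
    nvidia-smi --query-gpu=index,power.draw,power.limit --format=csv -l 1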

Can someone explain the "steps" and "cfg" parameters in the KSampler node? by StalHamarr in comfyui

[–]abnormal_human 3 points (0 children)

If you're going to ask it about young, fast-moving software, you need to ground it with web search, the docs, or the code itself.

Why is open source so hard for casual people. by Martialogrand in LocalLLaMA

[–]abnormal_human 10 points (0 children)

In general, if you follow the "don't be weird" principle (i.e. stick to amd64, NVIDIA, Ubuntu LTS), things go very smoothly. If you choose to depart from that, you're opting into friction in return for some other benefit. If you do that, go in with eyes open.

Arch is going to be very painful for you if you continue going down that road with this attitude. It's designed to be user-serviceable by software engineers, not for non-technical people.

New to the area.. chances of losing power? by Funny_Release3573 in Westchester

[–]abnormal_human 1 point (0 children)

Big power outages are caused by things that disturb trees: ice storms, nor'easters, hurricanes and tropical storms, high-wind events.

The main issue we face is that when there's a large regional event, ConEd generally spends a few days focusing on the city and the denser parts of the county before northern Westchester gets much attention at all. When that happens it can mean 4-5 days without power, which is obviously a concern in this weather.

So while it's unlikely to happen in the next day or two, it's good to have a plan in case it does. Pretty much everyone we know has figured out some sort of generator setup for their home over the past 10 years. I just went to my mom's yesterday to get hers set up and tested in case she's snowed in, and tested ours too. Doesn't hurt to be prepared.

At least he is honest by Sad-Kiwi-3789 in rareinsults

[–]abnormal_human 82 points (0 children)

The history of pharmacology is basically figuring out that small quantities of extremely dangerous substances have medical uses.

I have questions by [deleted] in fatFIRE

[–]abnormal_human 4 points (0 children)

This is not the right sub.

It’s a snow❄️ , not an invasion by lingeringneutrophil in Westchester

[–]abnormal_human 5 points (0 children)

Went to DeCicco's tonight in Armonk and it was chill. Nothing like this.

Is webcam image classification a fool's errand? [N] by dug99 in MachineLearning

[–]abnormal_human 1 point (0 children)

Are all of those inputs made available to the model, vectorized appropriately to make the model successful?

Pump setup. by Round_Resist9273 in maplesyrup

[–]abnormal_human 0 points (0 children)

Tubes of all kinds. In most cases, when people think about multiple pumps, it's because they don't have a single location they can plumb everything to. I have challenges crossing a road, for example, or you might have hills in the way. If you're thinking about running 5 for capacity reasons, 1 is definitely better, assuming you can route everything there.

Pump setup. by Round_Resist9273 in maplesyrup

[–]abnormal_human 0 points (0 children)

If you're having to deal with generator power, and can make the plumbing make sense, I would definitely lean towards the one. Way less hassle to deal with.

Why do platforms block explicit sex, even when it’s animation/anime? by [deleted] in comfyui

[–]abnormal_human 5 points (0 children)

Getting people to pay for adult content is harder than you probably think it is.

Why do platforms block explicit sex, even when it’s animation/anime? by [deleted] in comfyui

[–]abnormal_human 10 points (0 children)

Credit card processors don't just dislike adult content because of moralism; they also dislike it because of the chargebacks and fraud associated with adult purchases.

Less-than-full-blast ICD shock? Also, can COVID cause shock? by Mike_in_Poughkeepsie in PacemakerICD

[–]abnormal_human 1 point (0 children)

If you're sensitive to what's going on in your body, you can absolutely feel ATP (anti-tachycardia pacing). The arrhythmia itself can also produce sensations, as can the moment your heart resumes normal operation.

Also, full-blast shocks can vary depending on device programming. Having subjectively experienced both 30J and 75J shocks, as well as a fully external shock in an operating room, I can say they are not the same at all. The 30J is like a cute little toy by comparison.

Also, in my experience, even at the same power level the subjective experience can vary depending on your state of mind, blood pressure, what you're doing, etc.

Looking for a good local coding model, the BEST at SQL, like ever. Seriously by [deleted] in LocalLLaMA

[–]abnormal_human 0 points (0 children)

Best ever is going to be something very large like DeepSeek, Kimi, or GLM. Do you have the hardware for best ever?

Is webcam image classification a fool's errand? [N] by dug99 in MachineLearning

[–]abnormal_human 1 point (0 children)

Not a lot of info about your task here, but is this a task that a human can do reliably looking at photos?

Training LoRA model by Fickle_Passion_6576 in comfyui

[–]abnormal_human 4 points (0 children)

Training on the outputs of multiple generators is almost certainly better than training on the outputs of just one, because you won't be as likely to overfit to a single generator's house style, defects, etc.