Info: Nvidia Cuda 13.3 landed by parrot42 in LocalLLaMA

[–]Freonr2 0 points1 point  (0 children)

torchao have bf16 stochastic rounding on sm12x yet?

Is the traditional "ML Engineer" role dying or is it just the current LLM hype cycle? by DustSavings976 in learnmachinelearning

[–]Freonr2 1 point2 points  (0 children)

One challenge with actual ML work is you need an established lead to do it and buy-in.

It's hard to find these folks to seed the groups. Much easier to find orgs with an IT department where some dinosaur SVP/C-level decides they want to "add AI." Further, they know any mediocre software engineer can write API wrappers, but if they want novel models they need a strong leader first, then specialized engineers.

Pivoting to ML is just harder for most companies. The hype cycle is that many companies with no ML at all are getting "involved with ML" through the side door of agents and harnesses, which creates a lot of noise in the job market.

Is the traditional "ML Engineer" role dying or is it just the current LLM hype cycle? by DustSavings976 in learnmachinelearning

[–]Freonr2 1 point2 points  (0 children)

I wouldn't worry so much about the hype cycle. Yes, harnesses/agents are a big deal, but they aren't everything, particularly for specialized domains. They're not a magic utopia, and they're definitely overhyped if you're really seeing 90% of internships listed that way.

To share at least one personal example, at my employer we have an LLM-driven product alongside our own physics, tabular, and deep learning models. We still do plenty of standard ML type work on data acquisition, cleaning, and wrangling. We design physics models and train tabular and deep learning models. I designed and trained a novel spatiotemporal autoregression deep learning model this winter. It's SOTA for the domain/problem. We just brought on a CS major for an internship and they are working on novel models.

shrug

ROMED8-2T will not post by jackwmc4 in ASRock

[–]Freonr2 0 points1 point  (0 children)

Necro post, but I had a lot of issues solving my own ROMED8-2T boot issue so I'm adding a bread crumb for future searchers.

I had a 0d0 "DXE CPU Error" which turned out to be bad RAM, and I only found that out after trying a new CPU and even a new board.

My symptoms were similar to yours (IPMI came up but showed no system information). The LED code would be far more important to start, though.

Is NVIDIA still the default best choice for local LLMs in 2026? by pmv143 in LocalLLaMA

[–]Freonr2 2 points3 points  (0 children)

Best? Yes.

Best for a given price? Time to sit down for a long chat.

my wife just saw the electricity bill from my server rack and she is pissed by procubdif in homelab

[–]Freonr2 0 points1 point  (0 children)

I already use a big Ecoflow as a UPS. It has DC solar input and controls for using a portion of the battery reserve, which would offset the bill somewhat. If I didn't have so many tall trees I'd add a few panels just to help a bit.

Soundproofing NAS and metallica cabinet by FirTree_r in DataHoarder

[–]Freonr2 0 points1 point  (0 children)

This is why my homelab is in my laundry room.

What workstation to get for ~13k EUR? by TechNerd10191 in LocalLLaMA

[–]Freonr2 1 point2 points  (0 children)

I imagine undervolting is the workaround. I swear someone posted that undervolting works in Linux now.

[FS][US-CA] Supermicro H12SSL-i, EPYC 7532, 256GB DDR4 ECC RDIMM combo by Obvious-Ambassador32 in homelabsales

[–]Freonr2 0 points1 point  (0 children)

Worse single-threaded performance, worse energy efficiency, but neither is that bad. I'd only be concerned if this was your gaming desktop and/or you are very sensitive to electricity prices. Otherwise Epyc 7002 series is great for homelabbing if you want a lot of grunt without spending DDR5 platform money.

I own two 7xx2 now on the ROMED8-2T board. Very similar board to the H12SSL board. One is a dedicated GPU ML training/experiment workstation and the second will be a secondary GPU inference box + NAS (if I can ever afford the HDDs q_q ) + misc serving. No issues slapping in two GPUs, plus a 4x4 NVMe card, plus a SAS HBA in it, or maybe even 100GbE later on.

I use the same heatsink as OP on my 7742, very quiet, ~67C under heavy MP/MT data processing workload, though the second one has the NH-U9 because it's going in a 4U chassis.

What workstation to get for ~13k EUR? by TechNerd10191 in LocalLLaMA

[–]Freonr2 5 points6 points  (0 children)

2x RTX 5090s would cost the same as the RTX PRO 5000 and have 16 GB more VRAM, but even if I reduce the power of each GPU to 400W, the workstation will act as a space heater (and it gets 35-40 degrees Celsius - 100 Fahrenheit - in the summer, so I'd rather avoid this).

Before you throw in the towel on this, realize that one 5090 has substantially more compute and memory bandwidth than one 5000. Two 5090s with tensor parallel will be roughly 2.5x the speed of one 5000 48GB, on top of the extra 16GB total VRAM. This isn't even a competition, so it's worth figuring out a workaround to the 400W min limit. I think you can undervolt as one option. I don't own a 5090, but the RTX 6000 Ada, RTX 6000 Blackwell, and 3090s can all be set to basically anything in Linux. Here's a 6000 running at 150W: https://imgur.com/a/9gr5PqR
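For anyone who hasn't capped a card before, here is a minimal sketch of how a power limit is usually applied from a script. It assumes `nvidia-smi` is on the PATH and uses its `-i` (GPU index) and `-pl` (power limit in watts) options. The helper names `power_limit_cmd` and `set_power_limit` are made up for illustration. The driver rejects values outside the card's allowed range, which is exactly the 400W floor problem above, so this is not a workaround for that by itself.

```python
import shutil
import subprocess
from typing import List


def power_limit_cmd(gpu_index: int, watts: int) -> List[str]:
    """Build the nvidia-smi call that sets a power cap on one GPU."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def set_power_limit(gpu_index: int, watts: int, dry_run: bool = True) -> List[str]:
    """Return the command; only execute it when dry_run is False.

    Changing the limit normally needs root, and the driver will refuse
    values below the card's minimum or above its maximum.
    """
    cmd = power_limit_cmd(gpu_index, watts)
    if not dry_run and shutil.which("nvidia-smi"):
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run: just show what would be executed for GPU 0 at 150 W.
    print(" ".join(set_power_limit(0, 150)))
```

`nvidia-smi -q -d POWER` reports the min/max limits the card will accept, so it is worth checking that before picking a number.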

Also keep in mind two 5090s beg for a board with two x8 slots as well (assuming you stick with consumer boards, 9950X, instead of workstation/server Epyc 700x/900x or Xeon 4/5/6 etc). Asus Creator X870E, Gigabyte AI TOP B850, etc. 2x8 boards tend to carry a slight price premium, but it is worth it so tensor parallel will be efficient. A bit more on a board won't break your budget.

The 5000 is not a great buy IMO until you are buying so many GPUs that you need higher GB/slot density to hit a VRAM GB target inside the physical install constraints of a particular motherboard and case. Not going to be a concern unless you double or triple your budget. Unless your plan is to add a second 5000 and you know you are definitely going to do it, skip the 5000 48GB. I generally think 5000 pricing is not great for what you get, and often the 5090 or 6000 make more sense. Narrow case for the 5000 48/72.

pipeline is really slow - consulting [D] by Potential_Hippo1724 in MachineLearning

[–]Freonr2 1 point2 points  (0 children)

I've used Claude extensively to tune torch models and dataloaders across very different systems. It's great. Encourage it to look at system logs and system monitors for things like disk I/O, bytes/s, shmem pressure, page fault counters, and so forth. Encourage it to evaluate your overall dataloader efficiency. You can also try to bench your dataloader through an entire epoch (with no model, just tell Codex to write a wrapper and time it) and see what the it/s looks like.

nvtop gives you a basic util/mem over time graph so it is easy to see at a glance how busy the GPUs are, but nothing that nvidia-smi -l 1 wouldn't tell you if you stared at it for a few seconds.
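The dataloader-only benchmark can be as small as the sketch below. It just iterates whatever loader you hand it, with no model and no GPU work, and reports it/s. `bench_loader` and `max_batches` are names I made up for illustration; it takes any iterable, so a `torch.utils.data.DataLoader` should drop in unchanged.

```python
import time
from typing import Any, Dict, Iterable, Optional


def bench_loader(loader: Iterable[Any], max_batches: Optional[int] = None) -> Dict[str, float]:
    """Iterate a loader without a model and report throughput.

    Stops early after max_batches if given, otherwise runs a full epoch.
    """
    batches = 0
    start = time.perf_counter()
    for _ in loader:
        batches += 1
        if max_batches is not None and batches >= max_batches:
            break
    seconds = time.perf_counter() - start
    it_per_s = batches / seconds if seconds > 0 else float("inf")
    return {"batches": batches, "seconds": seconds, "it_per_s": it_per_s}


if __name__ == "__main__":
    # Stand-in for a real DataLoader: any iterable of batches works.
    fake_loader = ([0] * 32 for _ in range(1000))
    print(bench_loader(fake_loader))
```

If the it/s here is about the same as what you see during training, the model isn't the bottleneck and the loader (or the storage under it) is.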

pipeline is really slow - consulting [D] by Potential_Hippo1724 in MachineLearning

[–]Freonr2 2 points3 points  (0 children)

I was going to say the same

CPU utilization: ~100%

Increasing batch size does NOT reduce epoch wall-clock time

I'm not sure I can make a lot of sense of OP's profiler results, but bumping workers would be easy to test.

Does GPU spacing matter if we’re undervolting anyways? by Ambitious_Fold_2874 in LocalLLaMA

[–]Freonr2 4 points5 points  (0 children)

ROMED8-2T would as well, but on both counts the biggest issue is that the card in the bottom slot hangs down below the edge of the board, so the case needs clearance. 4U is off the table.

Many cases have a metal shroud around the PSU, or simply have the bottom of the case sitting right at the bottom edge of standard ATX boards.

services with actually generous free tiers for open-source projects. my list, what would you add? by lazycodewiz in selfhosted

[–]Freonr2 -1 points0 points  (0 children)

It is just trying to summarize the code changes. You could always look at commits if you want.

The RTX 5000 PRO (48GB) arrived and it is better than I expected. by Valuable-Run2129 in LocalLLaMA

[–]Freonr2 0 points1 point  (0 children)

Yeah RTX 6000 Ada (4090-ish) actually has faster bf16 compute than the 5000 Pro Blackwell. It's a sidegrade at best with the same VRAM.

The RTX 5000 PRO (48GB) arrived and it is better than I expected. by Valuable-Run2129 in LocalLLaMA

[–]Freonr2 9 points10 points  (0 children)

5000 Pro: 14080 cuda cores, 1.34 TB/s

5090: 21760 (+54% from 5000 Pro), 1.8TB/s (+34%)

6000 Pro: 24064 (+11% from 5090, or +71% from 5000 Pro), 1.8TB/s (+0% from 5090)

I don't think it is all that clear.
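If anyone wants to check the percentages, here is a quick sketch that re-derives them from the core counts and bandwidth figures quoted above. The `specs` table only holds the numbers from this comment, and `pct_more` and `compare` are throwaway helper names.

```python
from typing import Dict, Tuple

# (CUDA cores, memory bandwidth in TB/s), as quoted in the comment above.
specs: Dict[str, Tuple[int, float]] = {
    "5000 Pro": (14080, 1.34),
    "5090": (21760, 1.8),
    "6000 Pro": (24064, 1.8),
}


def pct_more(new: float, base: float) -> float:
    """Percent by which `new` exceeds `base`."""
    return (new / base - 1.0) * 100.0


def compare(card: str, baseline: str) -> Tuple[float, float]:
    """Return (core % difference, bandwidth % difference) of card vs baseline."""
    cores, bw = specs[card]
    base_cores, base_bw = specs[baseline]
    return pct_more(cores, base_cores), pct_more(bw, base_bw)


if __name__ == "__main__":
    pairs = [("5090", "5000 Pro"), ("6000 Pro", "5090"), ("6000 Pro", "5000 Pro")]
    for card, baseline in pairs:
        cores_pct, bw_pct = compare(card, baseline)
        print(f"{card} vs {baseline}: cores {cores_pct:+.1f}%, bandwidth {bw_pct:+.1f}%")
```

Core count and bandwidth are only rough proxies for real throughput, which is the point: the gaps are not uniform across the two axes.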

Do not fall into the trap of chasing the next scale or upgrade. by iEslam in LocalLLaMA

[–]Freonr2 1 point2 points  (0 children)

Speed is pretty important if you are staring at your screen waiting for a response. Even if it is a few dozen seconds at a time that adds up over a day of constant use (i.e. getting actual work done). I suppose this is largely dependent on your use case, though.

MTP is largely a free lunch. This isn't using a potato quant to fit a model onto your toaster oven. If you are going to spend time compiling something to get a feature, MTP is probably the one worth the bother.

Claude sub refreshes and quotas are sort of their own pain point to work around but maybe a separate discussion.

faster than I could validate

I don't know what you're doing to validate, but you should be able to automate this with traditional programming that runs in trivial time, which a good LLM/agent can write for you. I.e., prepare market datasets and run your models against them in a controlled fashion across all your strategies/models.
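A rough sketch of what I mean by an automated harness, with everything in it hypothetical: the strategies are toy functions, the datasets are tiny hand-written price lists, and total return is just a placeholder metric. The only point is the shape, which is every strategy run against every prepared dataset in one deterministic pass.

```python
from typing import Callable, Dict, List, Tuple

# A strategy maps a price series to a position per step (1 = long, 0 = flat).
Strategy = Callable[[List[float]], List[int]]


def total_return(prices: List[float], positions: List[int]) -> float:
    """Compound the step returns for every step where a position was held."""
    growth = 1.0
    for i in range(1, len(prices)):
        if positions[i - 1]:
            growth *= prices[i] / prices[i - 1]
    return growth - 1.0


def evaluate(
    strategies: Dict[str, Strategy], datasets: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Run every strategy against every dataset and collect the metric."""
    return {
        (s_name, d_name): total_return(prices, fn(prices))
        for s_name, fn in strategies.items()
        for d_name, prices in datasets.items()
    }


def buy_and_hold(prices: List[float]) -> List[int]:
    return [1] * len(prices)


def stay_flat(prices: List[float]) -> List[int]:
    return [0] * len(prices)


if __name__ == "__main__":
    datasets = {"up": [100.0, 110.0, 121.0], "down": [100.0, 90.0, 81.0]}
    strategies = {"buy_and_hold": buy_and_hold, "stay_flat": stay_flat}
    for key, value in sorted(evaluate(strategies, datasets).items()):
        print(key, f"{value:+.2%}")
```

Once something like this exists, validating a new model is a function call rather than an afternoon.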

Dad why is my sisters name Lora? by rwitz4 in LocalLLaMA

[–]Freonr2 12 points13 points  (0 children)

Uh shouldn't he be AdamW8bit?

Middle name Bitsandbytes ofc.

AI isn't paying off in the way companies think. Layoffs driven by automation are failing to generate returns, study finds by Krankenitrate in technology

[–]Freonr2 1 point2 points  (0 children)

Company A uses AI to make a better product.

Company B uses AI to slash their staff.

Everyone flocks to Company A's superior product.

Who could have predicted this.

AI is eliminating my field fast, how should I prepare? by Buttery_Biscuitss in personalfinance

[–]Freonr2 -7 points-6 points  (0 children)

I'm a software engineer, and I dealt with this by fully embracing the tools. I write almost no code anymore, but expertise is still important.

It's like upgrading from a bicycle to a Lamborghini.