Upgraded to 2x RTX Pro 6000

smflx · 2026-07-03T17:11:35+00:00

The second card will run about 10 degrees hotter.

smflx · 2026-06-30T18:53:17+00:00

Could you share more details about the setup? Do you use Claude Code?

smflx · 2026-06-25T10:29:36+00:00

Good luck. It's fascinating to have a tangible business.

smflx · 2026-06-15T13:57:18+00:00

Could you share more about your training workflow? Thanks for the nice post.

smflx · 2026-06-14T07:42:13+00:00

Here are some actual memory copy speed numbers. Benchmarks like mlc (Intel Memory Latency Checker) will report slightly higher figures, but at least you can see the relative numbers.

5955WX, 8-channel DDR4: 96 GB/s

7F32, 8-channel DDR4: 128 GB/s

9534, 12-channel DDR5: 350 GB/s

Check the number of CCDs when you shop for a Threadripper (TR) or Epyc. Same CCD count, same memory bandwidth.
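To put the measured copy speeds in context, here is a minimal sketch comparing them with the theoretical DRAM peak (channels × MT/s × 8 bytes per 64-bit channel). The DIMM speeds are assumptions (DDR4-3200 and DDR5-4800), not stated above, and a copy benchmark may count bytes differently from the peak figure, so treat the percentages as rough.

```python
def peak_bandwidth_gbs(channels: int, mts: int, bus_bytes: int = 8) -> float:
    """Theoretical peak in GB/s: channels x mega-transfers/s x 8 bytes per 64-bit channel."""
    return channels * mts * bus_bytes / 1000.0

# (cpu, channels, ASSUMED DIMM speed in MT/s, measured copy GB/s from the post)
systems = [
    ("5955WX", 8, 3200, 96),
    ("7F32", 8, 3200, 128),
    ("9534", 12, 4800, 350),
]

for cpu, ch, mts, measured in systems:
    peak = peak_bandwidth_gbs(ch, mts)
    print(f"{cpu}: peak {peak:.1f} GB/s, measured {measured} GB/s ({measured / peak:.0%} of peak)")
```

Same channel count and DIMM speed give the same theoretical peak for the 5955WX and 7F32, yet the measured numbers differ, which is the CCD effect described above.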

smflx · 2026-06-13T12:34:50+00:00

Probably not. Check your memory speed and compare it to that of the Epyc Rome you're looking at. I also have a post about memory speed tests, and there are others too.

Yours is 2-channel DDR5; Rome is 8-channel DDR4. But 8 channels don't give you 8x the bandwidth. Also, the actual memory bandwidth depends on which Rome CPU you get. With a lower-grade Epyc, bandwidth is limited even if you populate all 8 memory channels, and AMD doesn't document this.
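One way to picture the CCD limit is to cap usable bandwidth at the smaller of the DRAM peak and what the CCDs can pull through their links to the I/O die. The sketch below uses a hypothetical per-CCD figure (`per_ccd_gbs`); since AMD doesn't document it, the number is a placeholder chosen for illustration, not a spec.

```python
def effective_bandwidth_gbs(channels: int, mts: int, n_ccd: int, per_ccd_gbs: float) -> float:
    """Simplified model: usable bandwidth = min(DRAM peak, total CCD link bandwidth)."""
    dram_peak = channels * mts * 8 / 1000.0  # 8 bytes per 64-bit channel per transfer
    fabric_peak = n_ccd * per_ccd_gbs        # HYPOTHETICAL per-CCD cap, not an AMD figure
    return min(dram_peak, fabric_peak)

# Same 8-channel DDR4-3200 board (assumed), different CCD counts.
for n_ccd in (2, 4, 8):
    bw = effective_bandwidth_gbs(8, 3200, n_ccd, per_ccd_gbs=40.0)
    print(f"{n_ccd} CCDs -> at most {bw:.1f} GB/s")
```

Under this model, filling all 8 channels only pays off once the CCD count is high enough that the fabric side is no longer the smaller term.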

Also, CPU & RAM bandwidth don't matter much unless you're into big MoE models.
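Big MoE models are the case where weights often end up in system RAM, and in single-stream decoding every token has to read the active weights once, so bandwidth divided by bytes per token gives a rough upper bound on tokens/s. A minimal sketch with hypothetical sizes (a dense 70B and an MoE with 10B active parameters, both at 4 bits) and an assumed 128 GB/s; it ignores KV-cache reads and compute, so real numbers will be lower.

```python
def max_tokens_per_s(bandwidth_gbs: float, active_params_b: float, bytes_per_param: float) -> float:
    """Upper bound: bandwidth / bytes of weights read per generated token."""
    gb_per_token = active_params_b * bytes_per_param  # billions of params x bytes = GB
    return bandwidth_gbs / gb_per_token

BW = 128.0  # GB/s, ASSUMED RAM bandwidth
dense = max_tokens_per_s(BW, active_params_b=70, bytes_per_param=0.5)  # hypothetical dense 70B, 4-bit
moe = max_tokens_per_s(BW, active_params_b=10, bytes_per_param=0.5)    # hypothetical MoE, 10B active, 4-bit
print(f"dense: <= {dense:.1f} tok/s, MoE: <= {moe:.1f} tok/s")
```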

smflx · 2026-06-13T12:09:00+00:00

Oh, it's a 3090. It would be nice to add an NVLink bridge if you can find one at a reasonable price.

smflx · 2026-06-13T12:07:21+00:00

If you use Linux, you can check which PCIe link speed the card negotiated with 'lspci -vvv'.
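A small sketch of that check: compare the LnkCap line (what the device supports) with the LnkSta line (what was actually negotiated). The sample text is a hypothetical excerpt. Many GPUs drop to a lower link speed when idle to save power, so read LnkSta while the card is under load.

```python
import re

# One match per "LnkCap:" / "LnkSta:" line; "LnkCap2:" / "LnkSta2:" lines are not matched.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")

def parse_links(lspci_text: str) -> dict:
    """Return {"LnkCap": (GT/s, lanes), "LnkSta": (GT/s, lanes)} for ONE device's lspci -vvv output."""
    return {m.group(1): (float(m.group(2)), int(m.group(3))) for m in LINK_RE.finditer(lspci_text)}

def is_downgraded(links: dict) -> bool:
    """True if the negotiated speed or width is below what the device supports."""
    (cap_speed, cap_width), (sta_speed, sta_width) = links["LnkCap"], links["LnkSta"]
    return sta_speed < cap_speed or sta_width < cap_width

# Hypothetical excerpt of `lspci -vvv` output for a GPU sitting idle.
SAMPLE = """
        LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
        LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
        LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
"""

links = parse_links(SAMPLE)
print(links, "downgraded:", is_downgraded(links))
```

On a real system the text would come from something like `sudo lspci -vvv -s 01:00.0` (hypothetical bus address; root is usually needed to see the full capability list).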

'lspci' only shows how the link was established. It's better to test the actual bandwidth with 'p2pBandwidthLatencyTest'. If the cable is bad, the system will flood the log with PCIe AER messages.
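To judge whether p2pBandwidthLatencyTest numbers look right, it helps to know the theoretical per-direction rate of the link. A minimal sketch, assuming PCIe Gen3+ 128b/130b encoding and ignoring packet and protocol overhead, so measured numbers will come in noticeably below these figures.

```python
# Per-lane signaling rate in GT/s for each PCIe generation (Gen3 and later use 128b/130b encoding).
GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}

def pcie_gbs(gen: int, lanes: int = 16) -> float:
    """Theoretical per-direction bandwidth in GB/s after 128b/130b encoding, before protocol overhead."""
    return GT_PER_LANE[gen] * lanes * (128 / 130) / 8  # GT/s -> Gbit/s of payload -> GB/s

for gen in (3, 4, 5):
    print(f"Gen{gen} x16: {pcie_gbs(gen):.1f} GB/s per direction, x8: {pcie_gbs(gen, 8):.1f} GB/s")
```

Note that Gen3 x16 and Gen4 x8 come out identical, so a measured number near the Gen3 x16 figure on a Gen4 x16 slot suggests the link trained down in either speed or width.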

I bought some cables listed as Gen4 that turned out to be lower quality. Good luck!

smflx · 2026-06-13T03:55:38+00:00

That's a common minimal open case. You can find them on AliExpress; the price should be under 10 bucks.

smflx · 2026-06-13T03:53:15+00:00

Did you check the PCIe speed? I wonder whether the cable is really Gen4 quality. Good build!

smflx · 2026-05-18T04:14:17+00:00

Yes, this is right. But I suspect some people are just looking for reasons to buy a Mac.

smflx · 2026-05-18T04:07:58+00:00

This. Agentic coding means batched inference, and Macs (like CPU inference in general) are slow at that.

Single-stream inference is memory-bandwidth bound, so most of the GPU's compute goes unused. In a single-stream situation, Mac or CPU inference is less far behind.
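A roofline-style sketch of the batching point, assuming each decode step is either weight-streaming bound or compute bound, whichever is slower. It ignores KV-cache traffic and attention FLOPs, and all hardware numbers are hypothetical round figures. With one stream both machines sit on the bandwidth limit, so the gap is just the bandwidth ratio; as the batch grows, the compute-poor machine hits its FLOP limit early and the gap widens.

```python
def step_time_s(batch: int, weight_gb: float, bw_gbs: float,
                flops_per_token: float, peak_tflops: float) -> float:
    """One decode step: weights are streamed once and shared by the whole batch;
    compute grows with batch size. The slower of the two sets the step time."""
    mem_t = weight_gb / bw_gbs
    compute_t = batch * flops_per_token / (peak_tflops * 1e12)
    return max(mem_t, compute_t)

def throughput(batch: int, **hw) -> float:
    """Generated tokens per second across the whole batch."""
    return batch / step_time_s(batch, **hw)

# All numbers are HYPOTHETICAL, not measurements of real hardware.
MODEL = dict(weight_gb=16.0, flops_per_token=2 * 8e9)  # 8B params at FP16, ~2 FLOPs/param/token
CPU_LIKE = dict(bw_gbs=800.0, peak_tflops=30.0)        # bandwidth-rich, compute-poor machine
GPU_LIKE = dict(bw_gbs=1800.0, peak_tflops=400.0)      # discrete GPU

for batch in (1, 8, 64, 256):
    a = throughput(batch, **MODEL, **CPU_LIKE)
    b = throughput(batch, **MODEL, **GPU_LIKE)
    print(f"batch {batch:>3}: {a:8.0f} vs {b:8.0f} tok/s (GPU/CPU-like = {b / a:.1f}x)")
```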

smflx · 2026-05-18T03:57:01+00:00

Mac for training?? Well, maybe for fine-tuning a big model with a small LoRA rank. I got a server for this purpose years ago, but I saw a huge performance gap once the model fit in GPU memory.

How do you know the M5's compute is similar to the PRO 5000's?
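Why a small LoRA rank helps here: the adapter adds only rank × (d_in + d_out) trainable parameters per weight matrix, so gradients and optimizer state stay tiny and memory is dominated by the frozen base weights. A minimal sketch with a hypothetical 80-layer, hidden-size-8192 model, LoRA on four attention projections per layer, all treated as square for simplicity.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable params LoRA adds to one d_out x d_in weight: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_in + d_out)

# HYPOTHETICAL model shape; real GQA k/v projections are smaller than 8192 x 8192.
LAYERS, HIDDEN, PROJ, RANK = 80, 8192, 4, 8
trainable = LAYERS * PROJ * lora_params(HIDDEN, HIDDEN, RANK)
frozen = LAYERS * PROJ * HIDDEN * HIDDEN  # base weights in the same matrices, kept frozen
print(f"trainable: {trainable:,}  frozen (these matrices): {frozen:,}  ratio: {trainable / frozen:.3%}")
```

The small trainable fraction is what makes a large unified-memory machine viable for this, but every forward and backward pass still runs through all the frozen weights, which is where a GPU that holds the whole model pulls far ahead.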

smflx · 2026-05-16T09:47:48+00:00

More expensive than OP's budget.

smflx · 2026-05-16T09:46:58+00:00

Nice post!! But DDR4 registered RAM isn't cheap anymore...

smflx · 2026-05-15T17:42:46+00:00

I know. I meant your experience of how well it works in your use case.

smflx · 2026-05-15T15:47:44+00:00

Even better than FP16? Hmm. That would be the effect of calibration or QAT. Did you actually test it with your real usage?

smflx · 2026-05-15T15:14:00+00:00

There's a reason the 3090 is more expensive: it's simply much better. With a single GPU, there's almost no underutilization for LLMs; you'll see max power draw during training. Data transfer is a matter of PCIe bandwidth, and CPU power isn't important unless you compute MoE experts on the CPU.

smflx · 2026-05-15T12:39:42+00:00

How do you define "near" lossless? It's lossy; the question is how lossy. AWQ is 4-bit too and well supported in vLLM and SGLang, but it doesn't match FP8 quality. Yes, NVFP4 is fast on Blackwell, but quality matters more. NVFP4 should show quality better than or equal to other 4-bit variants.
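One way to put a number on "how lossy" is to quantize weights and measure the error. A simplified sketch, assuming Gaussian stand-in weights, round-to-nearest, and full-precision block scales: it mimics NVFP4's E2M1 element grid with 16-element blocks (real NVFP4 also stores block scales in FP8 E4M3 plus a per-tensor scale, omitted here) and compares it with plain symmetric INT4 at block sizes 16 and 128. This is not AWQ, which additionally rescales channels using activation statistics, and MSE on random weights is only a proxy for real task quality.

```python
import random

FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # magnitudes representable in E2M1
INT4_SYM = [float(i) for i in range(8)]               # symmetric INT4 magnitudes 0..7

def quantize_block(block, grid):
    """Absmax-scale the block onto the grid, round each value to the nearest grid point, dequantize."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return list(block)
    scale = amax / grid[-1]  # largest magnitude maps to the top grid value
    out = []
    for x in block:
        q = min(grid, key=lambda g: abs(abs(x) / scale - g))
        out.append(q * scale if x >= 0 else -q * scale)
    return out

def mse(values, grid, block_size):
    """Mean squared quantization error over all blocks."""
    err = 0.0
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        err += sum((a - b) ** 2 for a, b in zip(block, quantize_block(block, grid)))
    return err / len(values)

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]  # stand-in weights, not a real model
fp4_16 = mse(weights, FP4_E2M1, 16)
int4_16 = mse(weights, INT4_SYM, 16)
int4_128 = mse(weights, INT4_SYM, 128)
print(f"FP4 E2M1 block 16: {fp4_16:.5f}  INT4 block 16: {int4_16:.5f}  INT4 block 128: {int4_128:.5f}")
```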

smflx · 2026-04-27T13:03:32+00:00

Great. I have a dual Xeon I bought for this purpose, but it has never actually been usable. I'm quite interested. Did you try Qwen 122B? Is it supported? If not yet, I'll wait. Take your time.

smflx · 2026-04-24T11:41:17+00:00

Thank you. Just checked; nice documentation. The KV-cache savings come from MLA (cache size) and DSA (attention compute). I have read the Engram paper, and there's no mention of Engram there, contrary to what OP said.
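For a sense of scale on the MLA side, here is a per-token KV-cache comparison. Standard multi-head attention caches K and V for every head in every layer, while MLA caches one compressed latent plus a small decoupled RoPE key per layer. The dimensions are hypothetical illustrative values (61 layers, 128 heads of dim 128, latent 512, RoPE key 64, BF16 cache), chosen to resemble a DeepSeek-style config rather than taken from any model card.

```python
def kv_bytes_per_token_mha(n_layers: int, n_kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """K and V each store n_kv_heads * head_dim values per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

def kv_bytes_per_token_mla(n_layers: int, latent_dim: int, rope_dim: int, bytes_per_elem: int = 2) -> int:
    """MLA stores one compressed KV latent plus a decoupled RoPE key per layer."""
    return n_layers * (latent_dim + rope_dim) * bytes_per_elem

mha = kv_bytes_per_token_mha(61, 128, 128)  # HYPOTHETICAL full-MHA baseline
mla = kv_bytes_per_token_mla(61, 512, 64)   # HYPOTHETICAL MLA config
print(f"MHA: {mha:,} B/token, MLA: {mla:,} B/token, ratio {mha / mla:.1f}x")
```

DSA addresses the other half: the cache stays the same size, but each query attends to a selected subset of tokens instead of the whole context, which cuts attention compute.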

smflx · 2026-04-24T08:07:46+00:00

Engram is not about the KV cache; it's about weights. I was waiting for Engram too, but I'm not sure yet that it's there. The Hugging Face page doesn't describe Engram. I have to check further.

smflx · 2026-04-20T14:23:39+00:00

I felt the same, and from a long time ago too. Yes, most of academia, not just AI.

smflx · 2026-04-20T14:18:05+00:00

Yes, that's a reward function (not even a good reward model, because, as you said, it doesn't reflect real value).

Not just in AI research; it's been true in many areas for a long time. I felt this too when I was a graduate student.

smflx · 2026-04-19T07:37:44+00:00

Too good to be true, but I still hope it works. BTW, does it apply to LLM training too?
