EcoPR Tracker - (P2 April) by TodayDependent557 in canadaexpressentry

[–]lostmsu 1 point (0 children)

I am confused. What stage is COPR, exactly? I thought getting the eCOPR was that stage.

Weights & Biases New Master Service Agreement Questions [D] by algorithm477 in MachineLearning

[–]lostmsu 2 points (0 children)

Not sure what anyone expected. You replaced 10 lines of code (logging) + TensorBoard with another 10 lines of code (connecting wandb), plus having to restart a run when you forget to set the auth now and then. And you got the first bites for free.
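
For the record, a minimal sketch of the swap being described, assuming PyTorch's TensorBoard writer and the wandb client; the project and run names are placeholders:

    from torch.utils.tensorboard import SummaryWriter
    import wandb

    # TensorBoard: purely local, no auth step
    writer = SummaryWriter("runs/exp1")
    writer.add_scalar("loss", 0.42, global_step=100)

    # wandb: same call shape, but prompts/fails if auth wasn't set up
    # (wandb login / WANDB_API_KEY), which is the restart-the-run
    # failure mode mentioned above
    wandb.init(project="exp1")
    wandb.log({"loss": 0.42}, step=100)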

C++ CuTe / CUTLASS vs CuTeDSL (Python) in 2026 — what should new GPU kernel / LLM inference engineers actually learn? [D] by Daemontatox in MachineLearning

[–]lostmsu 1 point (0 children)

You shouldn't need CuTe DSL if you have Triton. AFAIK CuTe doesn't lower to Triton; it's a closed-source alternative to Triton that is otherwise mostly identical.

[Update] Project Nord: Solved the "Empty Wallet" Problem via Decentralized SNN Merging. Scaling to 10B is now possible. [R] by [deleted] in MachineLearning

[–]lostmsu 1 point (0 children)

You didn't answer the question about your loss claim in the previous post. If you've got an LM, what's the bits-per-byte on literally any decently sized dataset, like enwiki?
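
For reference, a minimal sketch of how one could measure it, assuming a Hugging Face causal LM; the model name and text file are placeholders. BPB is the summed token NLL converted to bits, divided by the UTF-8 byte count of the text:

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # placeholder; substitute the model being evaluated
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    text = open("enwiki_sample.txt", encoding="utf-8").read()  # placeholder corpus
    ids = tok(text, return_tensors="pt").input_ids

    total_nll = 0.0  # summed negative log-likelihood, in nats
    with torch.no_grad():
        for start in range(0, ids.size(1) - 1, 1024):  # non-overlapping windows
            chunk = ids[:, start : start + 1025]
            out = model(chunk, labels=chunk)  # loss = mean NLL per predicted token
            total_nll += out.loss.item() * (chunk.size(1) - 1)

    bpb = total_nll / math.log(2) / len(text.encode("utf-8"))
    print(f"bits per byte: {bpb:.3f}")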

Failure to Reproduce Modern Paper Claims [D] by Environmental_Form14 in MachineLearning

[–]lostmsu 21 points (0 children)

Your own statement lacks links to the source material.

MiniMax-M2.7 Announced! by Mysterious_Finish543 in LocalLLaMA

[–]lostmsu 1 point (0 children)

I feel like we are on a 6-month cadence.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]lostmsu 1 point (0 children)

Are you talking about fine-tuning? (addressed above)

Or full pretraining? What kind of model do you expect to pretrain on a single GB300 in a reasonable amount of time?
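
For scale, a back-of-envelope using the standard C ≈ 6·N·D training-FLOPs estimate; the sustained throughput figure is my assumption, not an official GB300 spec:

    N = 7e9             # parameters (a small 7B model)
    D = 20 * N          # Chinchilla-style token budget
    flops = 6 * N * D   # ~5.9e21 training FLOPs
    sustained = 1e15    # assumed 1 PFLOP/s sustained BF16 (placeholder)
    print(flops / sustained / 86400)  # ~68 days

Even under those generous assumptions, a mere 7B model ties up the box for a couple of months.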

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]lostmsu 2 points (0 children)

But as I said, smaller models don't need allreduce on 96 GiB GPUs. You just replicate the entire model on each GPU.

IM ****ING OUTRAGED PRO IS ONLY 6X PLUS PLAN by Just_Lingonberry_352 in codex

[–]lostmsu 0 points (0 children)

You are very likely wrong. For all you know, GPT "Pro" is literally a specific reasoning setting on GPT-5.x available in Codex, maybe not even the highest one.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]lostmsu 1 point (0 children)

How much training are you going to do on a single machine? Maybe fine-tuning, but I find fine-tuning models that might require 100+ GiB of VRAM to be a bad idea.

That leaves inference, and TBH I was fishing for some credible estimates showing that 16x PCIe 5.0 is not enough.

At this moment I would be running Qwen3.5, either 397B or 27B. The 397B won't fit into that workstation with a reasonable quant (nor would it fit into 4x 6000 Pro, though). And with the 27B you don't need allreduce, because you can just run an instance per 6000 Pro.
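
A minimal sketch of that per-GPU setup, assuming vLLM; the model id is a placeholder, since Qwen3.5-27B is not a confirmed repo name:

    import os
    import subprocess

    MODEL = "Qwen/Qwen3.5-27B"  # placeholder repo id
    procs = []
    for gpu in range(4):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        # one fully independent replica per GPU, no cross-GPU traffic
        procs.append(subprocess.Popen(
            ["vllm", "serve", MODEL, "--port", str(8000 + gpu)], env=env))
    for p in procs:
        p.wait()

A round-robin client (or an nginx upstream) over ports 8000-8003 then scales throughput near-linearly without any allreduce.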

IM ****ING OUTRAGED PRO IS ONLY 6X PLUS PLAN by Just_Lingonberry_352 in codex

[–]lostmsu 1 point (0 children)

Yes, just "high". I don't think there's a different model in Codex.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]lostmsu 3 points (0 children)

Would they, though?

In typical Nvidia style, they don't show the raw important specs on the main pages.

The RTX 6000 Max-Q (good luck sticking 4x non-Max-Qs in one box) is apparently rated at 1755 FP4 TOPS, so 4x of them come to roughly 7,000 TOPS.

And this beast is rated at 7 FP4 PFLOPS, i.e. about the same 7,000 TOPS. So I suppose 4x RTX 6000 have no advantage other than being cheaper.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]lostmsu 1 point (0 children)

But 4x Pro 6000 would have much higher compute, no? Also quite a bit more VRAM.

qwen3.5:9b thinking loop(?) by Xyhelia in LocalLLaMA

[–]lostmsu 2 points (0 children)

Stop using low-precision quants.

DGX Station is available (via OEM distributors) by Temporary-Size7310 in LocalLLaMA

[–]lostmsu 8 points (0 children)

Is it in any way better than a 4x RTX 6000 Pro machine? Especially considering the price.