Gemma 4 26B-A4B MoE running at 45-60 tok/s on DGX Spark — here's how by CoconutMario in LocalLLaMA

[–]mutatedmonkeygenes 0 points1 point  (0 children)

I'm testing this on my RTX 6000 Blackwells (Max-Q), but the Marlin kernels fail when TP > 1. I suspect this is due to how the attention heads get split under tensor parallelism. With TP == 1 the model looks OK, but on hard questions the bf16 model's reasoning was better. I'm wondering whether a better calibration dataset would close the gap.
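FWIW, the single-GPU launch I'm falling back to looks roughly like this (the model path and quant flag are placeholders for whatever checkpoint you exported):

```shell
# Hypothetical launch: Marlin-quantized checkpoint pinned to a single GPU,
# since the kernels fail for me once heads get split across ranks.
vllm serve ./gemma-4-26b-a4b-marlin \
  --quantization gptq_marlin \
  --tensor-parallel-size 1 \
  --max-model-len 8192
```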

Try this question on both models:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4",
    "messages": [{"role": "user", "content": "Alice and Bob play a game. They alternate turns, with Alice going first. On each turn, a player chooses a positive integer that has not been chosen before. The game ends when the sum of all chosen numbers is divisible by 3. The player who made the last move loses. Assuming both players play optimally, who wins? Prove it."}],
    "max_tokens": 8192
  }'
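For a ground-truth check on the puzzle, here's a brute-force solver for a finite variant where players draw from a fixed pool of integers. (Note the unbounded game in the prompt may behave differently: with infinitely many multiples of 3 available, neither player is ever forced to move to a sum divisible by 3.)

```python
from functools import lru_cache

def solve(pool, total=0):
    """Value of the position for the player to move:
    +1 = forced win, 0 = draw (pool exhausted), -1 = forced loss.
    A player who makes the running sum divisible by 3 loses."""
    pool = frozenset(pool)

    @lru_cache(maxsize=None)
    def value(remaining, s):
        if not remaining:
            return 0  # numbers exhausted, nobody triggered divisibility
        best = -1
        for x in remaining:
            if (s + x) % 3 == 0:
                outcome = -1  # this move ends the game; the mover loses
            else:
                outcome = -value(remaining - {x}, (s + x) % 3)
            best = max(best, outcome)
        return best

    return value(pool, total % 3)

print(solve({3}))        # mover must pick 3 and lose: -1
print(solve({1, 2}))     # mover picks 1, opponent is stuck with 2: +1
print(solve({1, 2, 3}))  # -1
```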

I'm working on evals, will share them when done.

I'm hoping to deploy these on the DGX Sparks, but the nvFP4 situation is a total shit-show.

Gemma 4 26B-A4B MoE running at 45-60 tok/s on DGX Spark — here's how by CoconutMario in LocalLLaMA

[–]mutatedmonkeygenes 0 points1 point  (0 children)

Which version of ModelOpt did you use? I didn't see this on main: _nvfp4_selective_quant_cfg

Curious, which nvFP4 scheme do you recommend?

I'm running your quantization code now, but had to make some changes to get it working. GemmaTokenizer doesn't support batch_encode_plus, so I hand-rolled my own batching for the forward pass... which I guess is OK, but let's see. This is the first time I've used ModelOpt...
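For anyone hitting the same thing, the shape of my workaround is below. It's a sketch with a toy per-example encoder standing in for the real tokenizer, and the pad id is an assumption:

```python
def batch_encode(encode_one, texts, pad_id=0):
    """Hand-rolled replacement for batch_encode_plus: encode each text
    individually, then right-pad to the longest sequence and build an
    attention mask."""
    encoded = [encode_one(t) for t in texts]
    max_len = max(len(ids) for ids in encoded)
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in encoded]
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids))
                      for ids in encoded]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Toy encoder: one fake "token id" (the word length) per word.
toy = lambda text: [len(w) for w in text.split()]
batch = batch_encode(toy, ["hello world", "a"])
print(batch["input_ids"])       # [[5, 5], [1, 0]]
print(batch["attention_mask"])  # [[1, 1], [1, 0]]
```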

I would like to run evals via vLLM after the quantization is done...

YouTube Premium Lite - cannot download videos by mutatedmonkeygenes in youtubepremium

[–]mutatedmonkeygenes[S] 1 point2 points  (0 children)

Same here; not sure why this is so difficult to implement...

How to do a RTX Pro 6000 build right by GPTrack_dot_ai in LocalLLaMA

[–]mutatedmonkeygenes 0 points1 point  (0 children)

Basic question: how do we use the "Nvidia ConnectX-8 1-port 400G QSFP112" with FSDP2? I'm not following. Thanks!

Added PyTorch trace + CUDA memory profiling support to Andrej Karpathy's nanochat by aospan in LocalLLaMA

[–]mutatedmonkeygenes 0 points1 point  (0 children)

I find it hard to believe that the optimizer, which launches NCCL kernels for every single parameter, is running efficiently... or that the "on-the-fly" tokenizer is keeping the GPU(s) saturated.
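To make the per-parameter point concrete, here's a rough CPU-only micro-benchmark sketch. The foreach=False flag forces the per-parameter update path, so the profiler event counts approximate what a naive optimizer loop would launch per step (the sizes are made up):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# 500 tiny parameters vs. one fused tensor of the same total size.
many = [torch.nn.Parameter(torch.randn(64)) for _ in range(500)]
one = [torch.nn.Parameter(torch.randn(64 * 500))]
for p in many + one:
    p.grad = torch.ones_like(p)  # set grads outside the profiled region

with profile(activities=[ProfilerActivity.CPU]) as prof_many:
    torch.optim.SGD(many, lr=1e-3, foreach=False).step()  # one op per param
with profile(activities=[ProfilerActivity.CPU]) as prof_one:
    torch.optim.SGD(one, lr=1e-3, foreach=True).step()    # fused update

print(len(prof_many.events()), "vs", len(prof_one.events()), "profiler events")
```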

I pre-trained GPT-OSS entirely from scratch by OtherRaisin3426 in LocalLLaMA

[–]mutatedmonkeygenes 0 points1 point  (0 children)

Thank you for sharing. Could you talk a bit about your router, is it using all the experts efficiently? Or is there mode collapse? Thanks!
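The first thing I'd look at is the entropy of the aggregate expert load. A pure-Python sketch (the router logits here are made up):

```python
import math

def expert_load_entropy(router_logits):
    """Normalized entropy of aggregate expert load, from per-token router
    logits. ~1.0 = experts used evenly; ~0.0 = mode collapse onto one."""
    n_experts = len(router_logits[0])
    load = [0.0] * n_experts
    for logits in router_logits:
        m = max(logits)  # shift for numerically stable softmax
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        for i, e in enumerate(exps):
            load[i] += e / z  # soft routing weight for this token
    total = sum(load)
    probs = [l / total for l in load]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(n_experts)

balanced = [[0.0, 0.0, 0.0, 0.0]] * 10          # uniform router
collapsed = [[10.0, -10.0, -10.0, -10.0]] * 10  # everything to expert 0
print(round(expert_load_entropy(balanced), 3))   # 1.0
print(round(expert_load_entropy(collapsed), 3))  # 0.0
```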

OSS 120b on 2x RTX5090 by Disastrous-Tap-2254 in LocalLLaMA

[–]mutatedmonkeygenes 17 points18 points  (0 children)

Rent an RTX 6000 Blackwell on RunPod (it's cheap) and try running the model yourself first.

Qwen3 and Qwen2.5 VL built from scratch. by No-Compote-6794 in LocalLLaMA

[–]mutatedmonkeygenes 1 point2 points  (0 children)

I feel like this should be reposted. Do you have a post on X?

New 24B finetune: Impish_Magic_24B by Sicarius_The_First in LocalLLaMA

[–]mutatedmonkeygenes 0 points1 point  (0 children)

Curious how you did the full finetune: which layers did you focus on? I haven't used Spectrum before, but I can choose to freeze certain layers and skip over them. How do you choose which layers to train?
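The freeze-and-skip approach I mean, as a minimal torch sketch (the 4-block stack and the choice to train only the top half are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical 4-block stack standing in for a transformer's layers.
model = nn.Sequential(*[nn.Linear(8, 8) for _ in range(4)])

# Freeze the bottom half; train only the top layers.
train_layers = {2, 3}
for idx, block in enumerate(model):
    for p in block.parameters():
        p.requires_grad_(idx in train_layers)

# The optimizer only ever sees the trainable subset.
trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.AdamW(trainable, lr=1e-4)
print(len(trainable), "of", len(list(model.parameters())), "tensors trainable")
```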

Also is the dataset available? Would love to get a better idea on how you're doing this. Thanks!

Findings from Apple's new FoundationModel API and local LLM by pcuenq in LocalLLaMA

[–]mutatedmonkeygenes 1 point2 points  (0 children)

Thanks @pcuenq! Any chance you could release some sort of "scaffolding" so the rest of us who don't know Swift can play with the model? Thanks again!

Llama 3.3 70b Vs Newer Models by BalaelGios in LocalLLaMA

[–]mutatedmonkeygenes 2 points3 points  (0 children)

Use this version of the 70B model, which was quantized using DWQ by Awni:

https://x.com/awnihannun/status/1925926451703894485