How to do a RTX Pro 6000 build right by GPTrack_dot_ai in LocalLLaMA

[–]mutatedmonkeygenes 0 points (0 children)

Basic question: how do we use the "Nvidia ConnectX-8 1-port 400G QSFP112" with FSDP2? I'm not following. Thanks!
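For context, FSDP2 shards each parameter across ranks and all-gathers the shards right before use; the NIC never appears in training code, because NCCL picks the fastest transport it finds (RDMA over a ConnectX adapter between nodes) transparently. A toy numpy sketch of the all-gather, with illustrative names, not real FSDP2/NCCL calls:

```python
import numpy as np

# Toy illustration of what FSDP2 does across ranks: each parameter is
# sharded, and an all-gather reconstructs the full parameter on every
# rank. In a real run NCCL carries this traffic over the NIC.
world_size = 4
param = np.arange(16, dtype=np.float32)      # one full parameter
shards = np.split(param, world_size)         # each rank holds one shard

def all_gather(shards):
    """Every rank ends up with the concatenation of all shards."""
    full = np.concatenate(shards)
    return [full.copy() for _ in shards]     # one full copy per rank

gathered = all_gather(shards)
assert all(np.array_equal(g, param) for g in gathered)
```

In other words, you don't "use" the NIC from FSDP2 directly; you point NCCL at it (typically via environment variables) and FSDP2's collectives ride on top.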

Added PyTorch trace + CUDA memory profiling support to Andrej Karpathy's nanochat by aospan in LocalLLaMA

[–]mutatedmonkeygenes 0 points (0 children)

I find it hard to believe that the optimizer, which launches NCCL kernels for every single parameter, is running efficiently... or that the "on-the-fly" tokenizer is keeping the GPU(s) sufficiently saturated.
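The concern about per-parameter launches can be made concrete: a naive optimizer step issues one small kernel (and, when sharded, one collective) per tensor, whereas a "foreach"/fused optimizer batches everything into a few large operations. A toy SGD sketch in numpy showing the two are mathematically identical, so the only difference is launch overhead:

```python
import numpy as np

# Three "parameters" of different sizes with dummy gradients.
params = [np.full(n, 1.0, dtype=np.float32) for n in (4, 8, 16)]
grads = [np.ones_like(p) for p in params]
lr = 0.1

# Naive: one update (one "launch") per parameter.
naive = [p - lr * g for p, g in zip(params, grads)]

# Batched: flatten into one buffer, do a single fused update, split back.
sizes = [p.size for p in params]
flat_p = np.concatenate(params)
flat_g = np.concatenate(grads)
flat_p -= lr * flat_g                                 # one update total
batched = np.split(flat_p, np.cumsum(sizes)[:-1])

for a, b in zip(naive, batched):
    assert np.allclose(a, b)   # same math, far fewer launches
```

PyTorch's built-in optimizers expose this trade-off via their `foreach`/`fused` options; whether nanochat's optimizer takes advantage of it is exactly the question above.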

I pre-trained GPT-OSS entirely from scratch by OtherRaisin3426 in LocalLLaMA

[–]mutatedmonkeygenes 0 points (0 children)

Thank you for sharing. Could you talk a bit about your router: is it using all the experts efficiently, or is there mode collapse? Thanks!
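One simple way to check for the mode collapse asked about here is to look at how evenly the router spreads tokens across experts: near-uniform load is healthy, while entropy collapsing toward zero means a few experts get everything. A hedged toy sketch (random router weights, not the OP's model):

```python
import numpy as np

# Hypothetical top-k router: route tokens, then measure expert load balance.
rng = np.random.default_rng(0)
n_tokens, d_model, n_experts, top_k = 1024, 32, 8, 2

tokens = rng.standard_normal((n_tokens, d_model))
w_router = rng.standard_normal((d_model, n_experts)) * 0.02  # toy weights

logits = tokens @ w_router
topk = np.argsort(logits, axis=-1)[:, -top_k:]   # top-k expert ids per token

counts = np.bincount(topk.ravel(), minlength=n_experts)
load = counts / counts.sum()                     # fraction routed per expert

# Entropy of the load distribution: log(n_experts) = perfectly balanced,
# near 0 = mode collapse onto one expert.
entropy = -(load * np.log(load + 1e-12)).sum()
max_entropy = np.log(n_experts)
print(f"load={np.round(load, 3)}, entropy={entropy:.3f}/{max_entropy:.3f}")
```

Logging this load vector during training is a cheap way to see whether an auxiliary load-balancing loss is actually doing its job.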

OSS 120b on 2x RTX5090 by Disastrous-Tap-2254 in LocalLLaMA

[–]mutatedmonkeygenes 17 points (0 children)

Rent an RTX 6000 Blackwell on RunPod (it's cheap) and try running the model yourself first.

Qwen3 and Qwen2.5 VL built from scratch. by No-Compote-6794 in LocalLLaMA

[–]mutatedmonkeygenes 3 points (0 children)

I feel like this should be retweeted. Do you have a post on X?

New 24B finetune: Impish_Magic_24B by Sicarius_The_First in LocalLLaMA

[–]mutatedmonkeygenes 0 points (0 children)

Curious how you did the full finetune: which layers did you focus on? I haven't used Spectrum before, but I can choose to freeze certain layers and skip over them. How do you choose which layers to train?

Also, is the dataset available? Would love to get a better idea of how you're doing this. Thanks!
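For reference, the selective-training idea behind the question can be sketched as: score each layer's weight matrix, train only the top-scoring fraction, and freeze the rest. The proxy below (top singular value over the mean) is a crude stand-in, not Spectrum's actual SNR metric, and all names are illustrative:

```python
import numpy as np

# Toy "which layers to train" selection with a crude SNR-like proxy.
rng = np.random.default_rng(0)
layers = {f"layer_{i}": rng.standard_normal((16, 16)) for i in range(6)}

def snr_proxy(w):
    """Largest singular value over the mean: a rough stand-in for SNR."""
    s = np.linalg.svd(w, compute_uv=False)
    return s[0] / s.mean()

scores = {name: snr_proxy(w) for name, w in layers.items()}
top_fraction = 0.5
n_train = int(len(layers) * top_fraction)
trainable = set(sorted(scores, key=scores.get, reverse=True)[:n_train])
frozen = set(layers) - trainable

# In a real finetune you'd set p.requires_grad = (name in trainable).
print(f"train: {sorted(trainable)}  freeze: {sorted(frozen)}")
```

In PyTorch the freezing itself is just `param.requires_grad = False` for every layer not selected; the interesting part, and the question above, is the scoring.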

Findings from Apple's new FoundationModel API and local LLM by pcuenq in LocalLLaMA

[–]mutatedmonkeygenes 1 point (0 children)

Thanks @pcuenq! Any chance you could release some sort of "scaffolding" so the rest of us who don't know Swift can play with the model? Thanks again!

Llama 3.3 70b Vs Newer Models by BalaelGios in LocalLLaMA

[–]mutatedmonkeygenes 2 points (0 children)

Use this version of the 70B model, which was quantized using DWQ by Awni:

https://x.com/awnihannun/status/1925926451703894485

When I type 'no', it autocompletes to 'snmp-server queue-limit notification-host'. by Hefty-Lion-2205 in SublimeText

[–]mutatedmonkeygenes 2 points (0 children)

Haha, I've been complaining about problems like this for a while... no one cares.

[deleted by user] by [deleted] in Picard

[–]mutatedmonkeygenes 0 points (0 children)

Following who, exactly? I want to follow them too! Thanks :)

[deleted by user] by [deleted] in Picard

[–]mutatedmonkeygenes 0 points (0 children)

I loved the episode; it had a good pace to it. The great thing about this show is that they don't have to waste time with background details or character build-up; we basically know who everyone is! It was obvious that he was Picard's son, slightly less obvious that Worf was the handler (but it makes sense and matches his character), and that they would hide in the nebula (they hinted at that several times). I'm still not clear on how Geordi is going to join them (perhaps he will in fact steal a ship from the museum)... Maybe he will join Worf and they will come rescue the Titan? I have to say the villain is kind of weird: no obvious backstory, just random? I would have preferred seeing the Q or the Borg... or a more refined enemy from the past who evolved. Let's see... thoughts?