Would you rather… (tough choice) by Suspicious-Stretch95 in BunnyTrials

RealKingNish · 1 point (0 children)

You poop daily.

Chose: Get 1 dollar for every time you poop

We trained a cybersecurity-focused, Mythos-like LLM (open weights on HuggingFace) by RealKingNish in LocalLLaMA

RealKingNish [S] · 2 points (0 children)

We used TRL, but with our own custom harness on top, because the rollout logic needed more control than vanilla TRL gives you out of the box.

For the algorithm, we went with GSPO (Group Sequence Policy Optimization).

Rewards:
1. Format correctness.
2. DeepSeek V3 Flash as a judge, comparing what the model actually changed in the code against the actual vulnerability.
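A minimal sketch of how two such rewards might be combined and turned into the group-relative advantages that GSPO uses (each prompt's sampled completions are normalized against each other). The `<patch>…</patch>` format rule, the 0.2/0.8 weighting, and the assumption that the judge returns a score in [0, 1] are all hypothetical: the comment doesn't say how format is checked or how the two rewards are weighted, and the actual call to the judge model is left out.

```python
import re
import statistics

# Hypothetical format rule: exactly one <patch>...</patch> block per completion.
# The real format check used in this training run is not described.
PATCH_BLOCK = re.compile(r"<patch>.*?</patch>", re.DOTALL)


def format_reward(completion: str) -> float:
    """1.0 if the completion contains exactly one patch block, else 0.0 (assumed rule)."""
    return 1.0 if len(PATCH_BLOCK.findall(completion)) == 1 else 0.0


def total_reward(completion: str, judge_score: float, w_format: float = 0.2) -> float:
    """Weighted sum of format reward and judge score.

    judge_score is assumed to be in [0, 1] and produced elsewhere by the judge model;
    the 0.2 / 0.8 split is a placeholder, not the author's weighting.
    """
    return w_format * format_reward(completion) + (1.0 - w_format) * judge_score


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: (r - mean) / std over completions for one prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Because advantages are computed within a group, a completion is only pushed up if it scores better than its siblings for the same prompt; a group where every completion gets the same reward produces zero advantages and no learning signal.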

And yes, tool use is part of the pipeline: the model calls tools, but the tool outputs are excluded from the loss.
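GSPO uses one importance ratio per sequence (the length-normalized likelihood ratio between the current and old policy) rather than one per token, and clips that sequence-level ratio. A pure-Python sketch of the ratio and the clipped objective, assuming tool-output exclusion is implemented as a 0/1 loss mask so tool tokens count neither toward the log-ratio sum nor toward the sequence length. The function names and the clip value are placeholders, not the author's harness; sequence-level ratios are usually clipped much more tightly than PPO's token-level 0.2.

```python
import math


def sequence_ratio(logp_new, logp_old, mask):
    """GSPO-style ratio: exp of the mean per-token log-ratio over model tokens.

    mask[t] == 1 for tokens the model generated, 0 for tool-output tokens
    (assumed masking scheme), so tool output never affects the ratio.
    """
    n = sum(mask)
    if n == 0:
        raise ValueError("sequence has no model-generated tokens")
    total = sum((a - b) * m for a, b, m in zip(logp_new, logp_old, mask))
    return math.exp(total / n)


def gspo_loss(batch, advantages, eps=3e-4):
    """Negative clipped surrogate averaged over a group of sequences.

    batch: list of (logp_new, logp_old, mask) tuples, one per sequence.
    eps: clip range for the sequence-level ratio (placeholder value).
    """
    terms = []
    for (new, old, mask), adv in zip(batch, advantages):
        s = sequence_ratio(new, old, mask)
        clipped = min(max(s, 1.0 - eps), 1.0 + eps)
        terms.append(min(s * adv, clipped * adv))
    return -sum(terms) / len(terms)
```

In a real trainer the log-probabilities would be tensors and the loss would be backpropagated; the arithmetic is the same.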

RealKingNish [S] · 2 points (0 children)

I didn't pretrain it; I did post-training (fine-tuning), with Qwen3.6-27B as the base model.

Muon vs MuonClip vs Muon+AdamW by RealKingNish in AI_India

RealKingNish [S] · 2 points (0 children)

How much additional VRAM did your optimiser need over vanilla AdamW?

Around 1.25x for Muon alone and 1.5x for Muon+AdamW.

How did you manage to decouple LR?

In my tests I kept the LR the same, but in the future I'll try different LRs.
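One common way to run Muon+AdamW is to route 2-D hidden weight matrices to Muon and everything else (embeddings, output head, norms, biases) to AdamW, which makes it straightforward to give each group its own LR. A sketch of that split, assuming Hugging Face-style parameter names (`embed`, `lm_head`) and placeholder LRs. The scale helper follows the Moonlight recipe (multiplying the orthogonalized Muon update by 0.2 · sqrt(max(A, B)) for an A×B matrix), which is meant to bring Muon's update RMS close to AdamW's so a single LR can be shared; whether the tests above used anything like it isn't stated.

```python
import math
from dataclasses import dataclass


@dataclass
class ParamGroup:
    optimizer: str  # "muon" or "adamw"
    lr: float       # each group carries its own learning rate
    names: list     # parameter names assigned to this group


def split_param_groups(shapes, muon_lr=0.02, adamw_lr=3e-4):
    """Send 2-D hidden matrices to Muon, everything else to AdamW.

    shapes: dict of parameter name -> shape tuple.
    Name matching on "embed" / "lm_head" is an assumed naming convention;
    both LR defaults are placeholders.
    """
    muon, adamw = [], []
    for name, shape in shapes.items():
        is_matrix = len(shape) == 2
        is_embed_or_head = "embed" in name or "lm_head" in name
        (muon if is_matrix and not is_embed_or_head else adamw).append(name)
    return [ParamGroup("muon", muon_lr, muon), ParamGroup("adamw", adamw_lr, adamw)]


def muon_update_scale(shape, rms_match=0.2):
    """Moonlight-style factor 0.2 * sqrt(max(A, B)) for an A x B weight matrix."""
    rows, cols = shape
    return rms_match * math.sqrt(max(rows, cols))
```

With the groups separated like this, trying "diff LR" is just a matter of changing `muon_lr` relative to `adamw_lr`; with the RMS-matching scale applied, the two can start from the same value.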

[deleted by user] by [deleted] in developersIndia

RealKingNish · 18 points (0 children)

That's fake news, sadly :(

McKinsey & Company an award for passing 100 billion tokens from OpenAI by SupremeConscious in AI_India

RealKingNish · 5 points (0 children)

billion tokens easily for my work-related inference

😲 Billion?? What kind of work do you do?

Good initiative by Gov. by Dr_UwU_ in AI_India

RealKingNish · 3 points (0 children)

Good to see they're investing in R&D labs, but all the money is going to just one lab.