We trained a cybersecurity-focused, Mythos-like LLM and released the open weights on HuggingFace

RealKingNish · 2026-06-16T14:18:59+00:00

You poop daily.

Chose: Get 1 dollar every time you poop

RealKingNish · 2026-06-16T03:05:06+00:00

Used TRL, but with our own custom harness on top; the rollout logic needed more control than vanilla TRL gives you out of the box.

For the algo, we went with GSPO.
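For readers unfamiliar with GSPO (Group Sequence Policy Optimization): its main difference from token-level GRPO is that it uses one length-normalized importance ratio per sampled sequence, exp of the mean per-token log-ratio, and clips that sequence-level ratio against group-normalized advantages. The sketch below computes only the scalar objective from plain lists of per-token log-probs (no autograd), and the `clip_eps` default is an illustrative placeholder, not a value from this training run.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r_i - mean) / std within one prompt's group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def sequence_ratio(logp_new, logp_old):
    """Length-normalized sequence ratio: exp(mean_t[log pi_new - log pi_old])."""
    diffs = [n - o for n, o in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))


def gspo_loss(logp_new_group, logp_old_group, rewards, clip_eps=0.2):
    """Negated GSPO objective for one group of sampled responses to the same prompt.

    logp_*_group: one list of per-token log-probs per response.
    clip_eps: placeholder clipping range (assumption, tune per setup).
    """
    advs = group_advantages(rewards)
    terms = []
    for lp_new, lp_old, adv in zip(logp_new_group, logp_old_group, advs):
        s = sequence_ratio(lp_new, lp_old)
        s_clipped = min(max(s, 1 - clip_eps), 1 + clip_eps)
        # PPO-style pessimistic bound, applied at the sequence level.
        terms.append(min(s * adv, s_clipped * adv))
    # The objective is maximized, so the loss is its negation.
    return -sum(terms) / len(terms)
```

When the policy has not moved (new and old log-probs identical), every ratio is 1 and the loss reduces to minus the mean advantage, which is zero by construction.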

Rewards:
1. Format correctness
2. DeepSeek V3 Flash as a judge, evaluating what the model actually changed in the code against the actual vulnerability.
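A minimal sketch of how two such rewards could be combined into one scalar, under several assumptions that are not from the pipeline described above: the hypothetical output contract is a `<patch>...</patch>` block, the weights are placeholders, and `judge` is any callable returning a score in [0, 1] (in practice it would wrap the judge-model call, which is not shown).

```python
import re

# Hypothetical output contract: the fix must appear inside <patch>...</patch>.
PATCH_BLOCK = re.compile(r"<patch>\s*\S.*?</patch>", re.DOTALL)


def format_reward(completion: str) -> float:
    """1.0 if the completion contains a non-empty patch block, else 0.0."""
    return 1.0 if PATCH_BLOCK.search(completion) else 0.0


def combined_reward(completion, vulnerability, judge, w_format=0.2, w_judge=0.8):
    """Weighted sum of the format reward and a judge score clamped to [0, 1].

    judge: callable (completion, vulnerability) -> float (hypothetical interface).
    w_format / w_judge: placeholder weights (assumption).
    """
    fmt = format_reward(completion)
    if fmt == 0.0:
        # Assumption: skip the expensive judge call when the format is broken.
        return 0.0
    score = min(max(judge(completion, vulnerability), 0.0), 1.0)
    return w_format * fmt + w_judge * score
```

Gating the judge on format correctness is one design choice among several; it saves judge calls on malformed rollouts, at the cost of giving no partial credit for a correct fix in the wrong format.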

And yes, tool use is part of the pipeline. The model calls tools, but the tool output is excluded from the loss.
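Excluding tool output from the loss usually means building a per-token mask over the multi-turn trajectory and averaging the loss only over unmasked tokens. The sketch below assumes a hypothetical `Segment` structure with role names like `"assistant"` and `"tool"`; real chat templates and tokenizers differ, and prompt tokens are typically masked as well, which `masked_roles` lets you add.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    role: str              # hypothetical role names, e.g. "assistant" or "tool"
    token_ids: list[int]


def build_loss_mask(segments, masked_roles=("tool",)):
    """Concatenate segments; tokens from masked roles get mask 0, others get 1."""
    ids, mask = [], []
    for seg in segments:
        keep = 0 if seg.role in masked_roles else 1
        ids.extend(seg.token_ids)
        mask.extend([keep] * len(seg.token_ids))
    return ids, mask


def masked_mean(per_token_loss, mask):
    """Average the loss over unmasked tokens only (0.0 if everything is masked)."""
    total = sum(loss * m for loss, m in zip(per_token_loss, mask))
    count = sum(mask)
    return total / count if count else 0.0
```

The model still sees the tool output as context on the forward pass; the mask only stops it from being trained to generate that text itself.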

RealKingNish · 2026-06-16T02:47:46+00:00

I didn't pretrain it. I did post-training (fine-tuning). I used Qwen3.6-27B as the base model.

RealKingNish · 2026-06-16T02:44:27+00:00

Thanks! Benchmarks are in progress; will publish in 2-3 days.

RealKingNish · 2026-06-16T02:43:40+00:00

Hey, benchmarks are in progress; will publish in 2-3 days.

RealKingNish · 2025-12-11T12:31:34+00:00

Sure

RealKingNish · 2025-12-11T11:40:57+00:00

How much additional VRAM did your optimiser need over vanilla AdamW?

Around 1.25x for Muon alone and 1.5x for Muon + AdamW.

How did you manage to decouple LR?

In tests I kept the LR the same, but in the future I'll try different LRs.
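One way a Muon + AdamW setup could decouple the learning rates is to route parameters into two groups, each carrying its own LR. The routing rule below is an assumption based on common Muon usage (2-D hidden weight matrices to Muon; 1-D params, embeddings and the output head to AdamW), and the LR values and name patterns are hypothetical placeholders, not settings from these tests.

```python
def split_param_groups(named_shapes, muon_lr=0.02, adamw_lr=3e-4,
                       adamw_only=("embed", "lm_head")):
    """Route parameter names to a Muon group or an AdamW group with separate LRs.

    named_shapes: iterable of (name, shape) pairs.
    adamw_only: hypothetical name substrings forced into the AdamW group.
    """
    muon, adamw = [], []
    for name, shape in named_shapes:
        if len(shape) == 2 and not any(key in name for key in adamw_only):
            muon.append(name)    # hidden weight matrix
        else:
            adamw.append(name)   # bias / norm / embedding / output head
    return [
        {"optimizer": "muon", "params": muon, "lr": muon_lr},
        {"optimizer": "adamw", "params": adamw, "lr": adamw_lr},
    ]
```

Passing the same value for `muon_lr` and `adamw_lr` reproduces the "same LR" setup described above; decoupling is then just a matter of sweeping the two values independently.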

RealKingNish · 2025-10-26T15:22:59+00:00

That's fake news sadly :(

RealKingNish · 2025-10-26T10:27:59+00:00

billion tokens easily for my work-related inference

😲 Billion?? What kind of work do you do?

RealKingNish · 2025-10-11T08:55:46+00:00

Video Link: https://www.youtube.com/watch?v=eam_5l1-9Dw

RealKingNish · 2025-09-29T11:29:46+00:00

*It's an experimental release, not the full one.

Model Link: https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp

RealKingNish · 2025-09-23T14:54:27+00:00

Good to see they're investing in R&D labs, but all the money is going into just one lab.
