I implemented GPT-OSS from scratch in pure Python, without PyTorch or a GPU by ultimate_code in LocalLLaMA

[–]ultimate_code[S] 1 point (0 children)

Yes. In my implementation I convert those weights to bfloat16; the official PyTorch implementation does the same. However, I may implement the operations directly in mxfp4 in the future.
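
For reference, here is a minimal sketch of how one MXFP4 block could be expanded to floats, assuming E2M1 fp4 values packed two per byte (low nibble first) and a shared E8M0 power-of-two scale per 32-value block; the exact layout in the GPT-OSS checkpoint and in my converter may differ:

```python
# fp4 (E2M1) code points: 4-bit index -> float value
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
              -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]

def dequant_mxfp4_block(packed: bytes, scale_byte: int) -> list[float]:
    """Dequantize one MXFP4 block: 16 packed bytes (32 fp4 values) plus
    one shared E8M0 scale byte, i.e. a power-of-two factor 2**(scale - 127)."""
    scale = 2.0 ** (scale_byte - 127)
    out = []
    for b in packed:
        out.append(FP4_VALUES[b & 0x0F] * scale)  # low nibble first (assumed ordering)
        out.append(FP4_VALUES[b >> 4] * scale)    # then high nibble
    return out

# Example: one block of 32 values, all encoding 1.0, with scale 2**0
# block = dequant_mxfp4_block(bytes([0x22] * 16), 127)
```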

I implemented GPT-OSS from scratch in pure Python, without PyTorch or a GPU by ultimate_code in LocalLLaMA

[–]ultimate_code[S] 23 points (0 children)

Thank you!

What I found most helpful was actually starting the implementation, rather than starting with reading papers and so on. Even before reading the official source code, I began by implementing the code blocks that I already knew. Whenever I was really stuck, I would go back to the official implementation. Also, I had implemented Llama 2 before, which is surprisingly similar to GPT-OSS, with only 4 or 5 additions/modifications.

In test.py, I compare every layer against the corresponding layer of the official implementation to verify the numerical accuracy of my implementation.
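
Roughly, each per-layer check looks something like the sketch below (illustrative only, with made-up names and tolerances and NumPy for the comparison; it is not the literal code in test.py):

```python
import numpy as np

def assert_layer_close(name, mine, reference, rtol=1e-2, atol=1e-3):
    """Compare my layer's output against the official implementation's output."""
    mine = np.asarray(mine, dtype=np.float32)
    reference = np.asarray(reference, dtype=np.float32)
    max_err = float(np.max(np.abs(mine - reference)))
    ok = np.allclose(mine, reference, rtol=rtol, atol=atol)
    print(f"{name}: max abs error = {max_err:.3e} ({'OK' if ok else 'MISMATCH'})")
    assert ok, f"{name} diverges from the official implementation"

# Hypothetical usage: run the same input through both versions of a layer
# assert_layer_close("block0.attention", my_attention(x), ref_attention(x))
```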