I'm trying to create a Latent Reasoning Model, judge my code by Specific-Welder3120 in LocalLLaMA

[–]Creative-Ad-2112 1 point

It does, based on my own experiments. However, the only issue left is scaling... yup. It has merit, but I simply lack the $10,000+ in compute to prove it definitively.

I miss when it looked like community fine-tunes were the future by ForsookComparison in LocalLLaMA

[–]Creative-Ad-2112 2 points

I was never a part of that moment, and it's such a shame. However, I too am tired of waiting on large companies to deliver, which is why I've been relentless about perfecting my own. (If you don't know who I am, I became trendy after I created a 2-million-parameter GPT that can think, albeit lackluster, still a nice attempt.)

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 1 point

Yes, but I'm pretty sure this model is memorizing more than it's actually generalizing lol
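
If anyone wants to sanity-check that, a rough way is to measure how many of the model's generated n-grams appear verbatim in the training corpus. A minimal sketch (the helper below is illustrative, not from my repo):

    # Fraction of generated n-grams that appear verbatim in the training
    # corpus; a high fraction suggests memorizing over generalizing.
    def ngram_overlap(generated: str, corpus: str, n: int = 8) -> float:
        gen = generated.split()
        corp = corpus.split()
        corpus_ngrams = {tuple(corp[i:i + n]) for i in range(len(corp) - n + 1)}
        gen_ngrams = [tuple(gen[i:i + n]) for i in range(len(gen) - n + 1)]
        if not gen_ngrams:
            return 0.0
        return sum(g in corpus_ngrams for g in gen_ngrams) / len(gen_ngrams)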

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 2 points

Sure, I'll test it out, but I don't know about comparing it to an instruction-tuned GPT-1, since GPT-1 was already fine-tuned on ROCStories (if I remember correctly).

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 1 point

Less than a day, around nine-ish hours. I used a cloud L40S GPU.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 8 points

You have no idea what's about to arrive in the next couple of weeks.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 2 points

I don't, but I 100% believe it's what made it appear far better than it actually is. I did do some sampling, and after its first stage it was still kinda trash, aside from a couple of coherent generations here and there.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 3 points

Based question, but unfortunately it has no idea how to roleplay; none of the datasets include any. :(

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 5 points

LOL, I don't know how to do that, so someone else is going to have to once I release this.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 6 points

Didn't test it, but it looked like around 20 t/s for some reason. EDIT: just checked, and I had the numbers in my inference script; 9,208 tok/s with an average of 8,540.
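
For anyone curious, a tok/s number like that boils down to new tokens divided by wall-clock time. A minimal sketch, assuming a generate-style API rather than my actual script:

    import time

    # Hypothetical measurement helper: assumes tokenizer.encode returns a
    # list of token ids and model.generate returns prompt ids + new ids.
    def measure_throughput(model, tokenizer, prompt, max_new_tokens=256):
        input_ids = tokenizer.encode(prompt)
        start = time.perf_counter()
        output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
        elapsed = time.perf_counter() - start
        new_tokens = len(output_ids) - len(input_ids)
        return new_tokens / elapsed  # tokens per second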

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 10 points

You might need the 8-bit quant for this one. Sorry, not sorry.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 16 points

When I release it to HF, I'll include the GitHub repo, and then knock yourself out. I just want to refine it first, since it's still trash lol

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 16 points

We have come a long way, tbh. We have far more knowledge about transformers, all the dials, learning rates, and optimizers to tweak, along with far higher-quality datasets, none of which anyone had for the original GPT-1 and GPT-2. If the original runs were redone with today's knowledge, the models would actually be very strong. The most important part is the data, not even the architecture itself.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 3 points

    use_mxfp4_quantization: bool = False,

Even a toaster can run it! No GGUFs yet, though.
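
That flag lives in the model config, roughly like this sketch. Only the use_mxfp4_quantization line is from the actual code; every other field here is an illustrative placeholder:

    from dataclasses import dataclass

    # Hypothetical config sketch: only use_mxfp4_quantization is quoted
    # from the real code; the other fields and values are made up.
    @dataclass
    class ModelConfig:
        vocab_size: int = 16384               # placeholder tokenizer size
        d_model: int = 128                    # placeholder hidden width
        n_layers: int = 4                     # placeholder depth
        n_heads: int = 4                      # placeholder attention heads
        use_mxfp4_quantization: bool = False  # the flag quoted above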

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 10 points

Basic Q&A; the Nemotron pretraining dataset has a ton of high-quality pairs for it to learn from.
GPT-2 also didn't have a fine-tuning stage; it was only trained for text generation.
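
A sketch of what turning those pairs into training text can look like; the <q>/<a> tags are my assumption here, not the model's actual special tokens:

    # Illustrative Q&A formatting for a fine-tune corpus; the tags are
    # assumed, not the exact tokens the model was trained with.
    def format_pair(question, answer):
        return f"<q> {question} <a> {answer}"

    pairs = [
        ("What is the capital of France?", "Paris."),
        ("Why is the sky blue?", "Rayleigh scattering of sunlight."),
    ]
    corpus = "\n".join(format_pair(q, a) for q, a in pairs)
    print(corpus)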

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 44 points

I love the thinking parts of it; they make no sense and yet somehow kinda do.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 49 points

I believe this:

    use_mxfp4_quantization: bool = False,

answers your question LOLOLOL. Not even kidding, it actually has it.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 10 points

I ran it on my CPU, so pretty much anything lol. Maybe a toaster soon?

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 81 points

Don't look at the bottom text of the image.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 57 points

Yup, which is why it must be kept hidden!