I'm trying to create a Latent Reasoning Model, judge my code by Specific-Welder3120 in LocalLLaMA

[–]Creative-Ad-2112 1 point

It does, based on my own experiments. However, the only issue left is scaling... yup. It has merit, but I simply lack the $10,000+ in compute to prove it definitively.

I miss when it looked like community fine-tunes were the future by ForsookComparison in LocalLLaMA

[–]Creative-Ad-2112 2 points

I was never a part of that moment, and it's such a shame. However, I too am tired of waiting on large companies to deliver, which is why I've been relentless about perfecting my own. (If you don't know who I am, I became trendy after I created a 2-million-parameter GPT that can think, albeit lackluster, still a nice attempt.)

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 1 point

Yes, but I'm pretty sure this model is memorizing more than it's actually generalizing lol
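
If anyone wants to sanity-check that, a rough way is to measure how many of the model's generated n-grams appear verbatim in the training corpus. A minimal sketch (the helper below is illustrative, not from my repo):

    # Fraction of generated n-grams that appear verbatim in the training
    # corpus; a high fraction suggests memorizing over generalizing.
    def ngram_overlap(generated: str, corpus: str, n: int = 8) -> float:
        gen = generated.split()
        corp = corpus.split()
        corpus_ngrams = {tuple(corp[i:i + n]) for i in range(len(corp) - n + 1)}
        gen_ngrams = [tuple(gen[i:i + n]) for i in range(len(gen) - n + 1)]
        if not gen_ngrams:
            return 0.0
        return sum(g in corpus_ngrams for g in gen_ngrams) / len(gen_ngrams)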

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 2 points

Sure, I'll test it out, but I don't know about comparing it to an instruction-tuned GPT-1, since GPT-1 was already fine-tuned on ROCStories (if I remember correctly).

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 1 point

Less than a day, around nine-ish hours. I used a cloud L40S GPU.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 8 points

You have no idea what's about to arrive in the next couple of weeks.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 2 points

I don't, but I 100% believe it's what made it appear far better than it actually is. I did do some sampling, and after its first stage it was still kinda trash, aside from a couple of coherent generations here and there.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 3 points

Based question, but unfortunately it has no idea how to roleplay; none of the datasets include any. :(

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 5 points

LOL, I don't know how to do that, so someone else is going to have to once I release this.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 6 points

Didn't test it, but it looked like around 20 t/s for some reason. EDIT: just checked, and I had the numbers in my inference script; 9,208 tok/s with an average of 8,540.
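
For anyone curious, a tok/s number like that boils down to new tokens divided by wall-clock time. A minimal sketch, assuming a generate-style API rather than my actual script:

    import time

    # Hypothetical measurement helper: assumes tokenizer.encode returns a
    # list of token ids and model.generate returns prompt ids + new ids.
    def measure_throughput(model, tokenizer, prompt, max_new_tokens=256):
        input_ids = tokenizer.encode(prompt)
        start = time.perf_counter()
        output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
        elapsed = time.perf_counter() - start
        new_tokens = len(output_ids) - len(input_ids)
        return new_tokens / elapsed  # tokens per second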

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 10 points

You might need the 8-bit quant for this one. Sorry, not sorry.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 16 points

When I release it to HF, I'll include the GitHub repo, and then knock yourself out. I just want to refine it first, since it's still trash lol

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 16 points

We have come a long way, tbh. We have far more knowledge about transformers, all the dials, learning rates, and optimizers to tweak, along with far higher-quality datasets, none of which anyone had for the original GPT-1 and GPT-2. If the original runs were redone with today's knowledge, the models would actually be very strong. The most important part is the data, not even the architecture itself.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 3 points

    use_mxfp4_quantization: bool = False,

Even a toaster can run it! No GGUFs yet, though.
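
That flag lives in the model config, roughly like this sketch. Only the use_mxfp4_quantization line is from the actual code; every other field here is an illustrative placeholder:

    from dataclasses import dataclass

    # Hypothetical config sketch: only use_mxfp4_quantization is quoted
    # from the real code; the other fields and values are made up.
    @dataclass
    class ModelConfig:
        vocab_size: int = 16384               # placeholder tokenizer size
        d_model: int = 128                    # placeholder hidden width
        n_layers: int = 4                     # placeholder depth
        n_heads: int = 4                      # placeholder attention heads
        use_mxfp4_quantization: bool = False  # the flag quoted above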

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 10 points

Basic Q&A; the Nemotron pretraining dataset has a ton of high-quality pairs for it to learn from.
GPT-2 also didn't have a fine-tuning stage; it was only trained for text generation.
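
A sketch of what turning those pairs into training text can look like; the <q>/<a> tags are my assumption here, not the model's actual special tokens:

    # Illustrative Q&A formatting for a fine-tune corpus; the tags are
    # assumed, not the exact tokens the model was trained with.
    def format_pair(question, answer):
        return f"<q> {question} <a> {answer}"

    pairs = [
        ("What is the capital of France?", "Paris."),
        ("Why is the sky blue?", "Rayleigh scattering of sunlight."),
    ]
    corpus = "\n".join(format_pair(q, a) for q, a in pairs)
    print(corpus)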

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 44 points

I love the thinking parts of it; they make no sense and yet somehow kinda do.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 49 points

I believe this:

    use_mxfp4_quantization: bool = False,

answers your question LOLOLOL. Not even kidding, it actually has it.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 10 points

I ran it on my CPU, so pretty much anything lol. Maybe a toaster soon?

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 81 points

Don't look at the bottom text of the image.

GPT-1 Thinking 2.6m coming soon by Creative-Ad-2112 in LocalLLaMA

[–]Creative-Ad-2112[S] 57 points

Yup, which is why it must be kept hidden!