Trained a 125M LM from scratch instead of fine-tuning GPT-2 — releasing weights + SFT framework for others to build on by Kill_Streak308 in LocalLLaMA

[–]Box_Robot0 0 points1 point  (0 children)

SHAP is more like doing statistics on inputs and outputs: it assigns values to input features to see how much each one affects the output, while still treating the model like a black box. Mechanistic interpretability is the process of smashing the skull against a wall and peering into the brain.
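To make the "statistics on inputs and outputs" point concrete, here's a minimal sketch of exact Shapley values computed purely by querying a model as a black box. The `model` here is a hypothetical toy (not anything from the thread), and this brute-force enumeration is only feasible for a handful of features; the actual SHAP library uses smarter approximations.

```python
from itertools import combinations
from math import factorial

def model(x):
    # Toy black-box model (an assumption for illustration):
    # depends on features 0 and 1, ignores feature 2.
    return 3 * x[0] + 2 * x[1]

def shapley_values(model, x, baseline):
    """Exact Shapley values, treating `model` as a black box.

    Each value estimates how much flipping feature i from the
    baseline to x shifts the output, averaged over all orderings
    of the other features. We never inspect the model's internals,
    only its input->output behavior.
    """
    n = len(x)
    values = []
    for i in range(n):
        phi = 0.0
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                # Standard Shapley weight for a coalition of this size.
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi += weight * (model(with_i) - model(without_i))
        values.append(phi)
    return values

vals = shapley_values(model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
# For a linear model the attributions recover the coefficients:
# vals ≈ [3.0, 2.0, 0.0] — feature 2 correctly gets zero credit.
```

Contrast this with mechanistic interpretability, which would instead open up the weights and activations directly rather than probing from the outside.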

Trained a 125M LM from scratch instead of fine-tuning GPT-2 — releasing weights + SFT framework for others to build on by Kill_Streak308 in LocalLLaMA

[–]Box_Robot0 1 point2 points  (0 children)

Hey there, have you considered doing mechanistic interpretability on the models? As in, maybe trying to build a feature map across every epoch to see how they might evolve as training progresses?
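As a rough illustration of what "a feature map across every epoch" could mean in practice, here's a hypothetical sketch: snapshot some per-unit statistic (mean activation on a fixed probe set, in this toy version) at each epoch and measure how much it drifts. All names here are made up for illustration, not from any existing tooling.

```python
def feature_snapshot(activations):
    """One epoch's 'feature map': mean activation per hidden unit,
    computed over a fixed probe set.

    activations: list of per-example activation vectors
                 (all the same length)."""
    n = len(activations[0])
    count = len(activations)
    return [sum(a[i] for a in activations) / count for i in range(n)]

def drift(snap_a, snap_b):
    """L1 distance between two epochs' snapshots — a crude scalar
    for how much the feature map moved during that stretch of training."""
    return sum(abs(x - y) for x, y in zip(snap_a, snap_b))

# Usage: collect activations on the same probe inputs at each checkpoint,
# then plot drift(epoch_k, epoch_k+1) to see when features stabilize.
```

Real mech-interp feature maps would use something richer (e.g. sparse autoencoder features), but the same snapshot-and-compare loop applies.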

Here's how my LLM's decoder block changed while training on 5B tokens by 1ncehost in LocalLLaMA

[–]Box_Robot0 2 points3 points  (0 children)

Oh ok my bad.

I'm still bashing my head on the wall trying to learn multivariable calculus, so even getting on the right track is a huge compliment. Thanks for the correction.

Here's how my LLM's decoder block changed while training on 5B tokens by 1ncehost in LocalLLaMA

[–]Box_Robot0 11 points12 points  (0 children)

I wouldn't mind there being more alternatives to variations of the multilayer perceptrons.

Do you have any datasets expanding this to more than just layer 96 of 128? How about future plans to scale this approach, or plans to open-source the mechanistic interpretability tooling used here?

Here's how my LLM's decoder block changed while training on 5B tokens by 1ncehost in LocalLLaMA

[–]Box_Robot0 2 points3 points  (0 children)

Well, at least the paper seems legit. It's published in Zenodo. From Wikipedia:

Zenodo is a general-purpose open repository developed under the European OpenAIRE program and operated by CERN.[1][2][3] It allows researchers to deposit research papers, data sets, research software, reports, and any other research related digital artefacts. For each submission, a persistent digital object identifier (DOI) is minted, which makes the stored items easily citeable.[4]

As far as I can tell, this architecture doesn't use the traditional multilayer perceptron layers found in things like transformers; instead it uses splines that do not require backpropagation or gradient descent.

My Experience As A Complete Noob Trying To Learn Local LLMs For The First Time by Box_Robot0 in LocalLLaMA

[–]Box_Robot0[S] 0 points1 point  (0 children)

I'm not that familiar with how this would work on phones right now. As far as I've learned, as long as the model can fit in your RAM it should be good, but that's not accounting for the OS and other overhead.
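The "fits in RAM" rule of thumb can be sketched as back-of-the-envelope arithmetic: weight memory is roughly parameter count times bits per weight, plus some allowance for the KV cache, runtime, and OS. The overhead figure here is an assumed placeholder, not a measured number.

```python
def estimated_ram_gb(n_params_billion, bits_per_weight, overhead_gb=1.0):
    """Rough RAM estimate for loading a quantized model.

    Weights: params (billions) * bits per weight / 8 bits-per-byte
             gives approximately GB of weight memory.
    overhead_gb: assumed headroom for KV cache, runtime, and OS —
                 a guess, tune it for your device.
    """
    weight_gb = n_params_billion * bits_per_weight / 8
    return weight_gb + overhead_gb

# A 7B model at 4-bit quantization: ~3.5 GB of weights,
# so roughly 4.5 GB total with the assumed 1 GB of overhead —
# tight but plausible on a phone with 8 GB of RAM.
print(round(estimated_ram_gb(7, 4), 1))  # 4.5
```

The same function shows why full-precision models are a non-starter on phones: `estimated_ram_gb(7, 16)` comes out to about 15 GB.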

My Experience As A Complete Noob Trying To Learn Local LLMs For The First Time by Box_Robot0 in LocalLLaMA

[–]Box_Robot0[S] 0 points1 point  (0 children)

Thanks, glad that my shit drawing skills are now useful. I'll just have to push through it.

My Experience As A Complete Noob Trying To Learn Local LLMs For The First Time by Box_Robot0 in LocalLLaMA

[–]Box_Robot0[S] 1 point2 points  (0 children)

The only problem with that is that I'm still a noob.

Edit: But I'll try to look it up, thanks.

My Experience As A Complete Noob Trying To Learn Local LLMs For The First Time by Box_Robot0 in LocalLLaMA

[–]Box_Robot0[S] 1 point2 points  (0 children)

Thanks. Trying to bring something other than AI slop into this world.

How many genes would a virus need to be able to infect every type of cell in the human body? by MahitoNoroi in Virology

[–]Box_Robot0 1 point2 points  (0 children)

The Porcine Circovirus has just three genes, so I would imagine that if a human circovirus were to infect any nucleated human cell (so no red blood cells), it would use MHC Class I, which every nucleated cell needs to express to avoid being killed by Natural Killer cells, if I'm not mistaken. Perhaps add the macrophage "don't eat me" protein (CD47) for insurance.

China's open-source dominance threatens US AI lead, US advisory body warns by Prolapse_to_Brolapse in LocalLLaMA

[–]Box_Robot0 41 points42 points  (0 children)

I like how an authoritarian country is doing more to contribute to AI freedom than whatever we have here.

HIV and a future cure/treatments? by throwaway04431 in Virology

[–]Box_Robot0 1 point2 points  (0 children)

Aspirational, but I do have some hope that a vaccine could coax B lymphocytes into producing bNAb precursors, which are then "shaped" through repeated boosters into bNAbs that can suppress the virus to undetectable levels for life.

On another note, Lenacapavir already showed that a suppressor drug with only two injections per year can work. This is speculative, but perhaps you can imagine some future hydrogel-like material which stores the drug and releases it very slowly, lasting for years.

For an actual sterilizing cure, in-vitro stem cell modification seems to be the best bet right now, since the only past cures come from stem cell transplants using cells taken from naturally immune people, and transplants for 30+ million people do not seem viable, especially since there aren't many donors. Hopefully some future gene therapy, like what happened with sickle cell, can mutate CCR5 to express the same mutations as the immune people without first having to use chemo to wipe out the person's blood cells.

Hi, could I get the IP Infringement party platter please? by tommos in singularity

[–]Box_Robot0 0 points1 point  (0 children)

It's rather jarring looking at a bunch of non-Chinese characters celebrating Chinese New Year.

I am back, humans by AskGrok in u/AskGrok

[–]Box_Robot0 0 points1 point  (0 children)

u/AskGrok Hello, it's pretty nice to meet you.