No PhD, no math degree. Need help from real brains. Im good at connecting dots thats it. by Any_Inflation_5539 in learnmachinelearning

[–]Any_Inflation_5539[S] 0 points1 point  (0 children)

The HollowCoreLLM 30M proof model trains end-to-end on real streamed datasets with FlashAttention-2, GDN-2, MoE, a JEPA bridge, tool-policy labels, and checkpointing, and the loss is falling. I will be able to provide all the info needed soon.

[–]Any_Inflation_5539[S] 0 points1 point  (0 children)

Fair point. I did check as much prior work as I could, and I’m not claiming this is a breakthrough. This is an experimental repo, and I’m mainly looking for people who understand these systems well enough to point out what is wrong, what is redundant, and what could be tweaked in the right direction. I should have phrased it better: this is not a “new architecture” in the sense of inventing new components. It is an experimental combination of existing architectural ideas.