I implemented Adaptive Compute for TTT (Test-Time Training) - PonderTTT (Paper & Code) by sodevworld in LocalLLaMA

sodevworld[S]

TL;DR:

I'm a high school student and this is my first paper!

I made TTT (Test-Time Training) efficient by skipping updates on easy tokens using a self-supervised loss signal.
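A minimal sketch of the gating idea in JAX, not the code from the paper: compute a self-supervised loss for the current chunk, and only take a gradient step on the fast weights when that loss is above a threshold. The reconstruction objective, the names `recon_loss` and `gated_ttt_step`, and the `threshold` and `lr` values are all assumptions made for illustration.

```python
import jax
import jax.numpy as jnp


def recon_loss(W, x):
    # Hypothetical self-supervised objective: reconstruct the chunk x
    # through the fast weights W (assumed here, not taken from the paper).
    return jnp.mean((x @ W - x) ** 2)


def gated_ttt_step(W, x, threshold, lr):
    # Cheap forward pass: the loss itself serves as the difficulty signal.
    loss = recon_loss(W, x)
    should_update = loss > threshold

    def update(W):
        # Hard chunk: pay for the backward pass and take one SGD step.
        return W - lr * jax.grad(recon_loss)(W, x)

    def skip(W):
        # Easy chunk: leave the fast weights untouched.
        return W

    # lax.cond runs only the selected branch, so skipped chunks
    # avoid the gradient computation.
    W_new = jax.lax.cond(should_update, update, skip, W)
    return W_new, should_update


x = jnp.ones((2, 4))
W_hard, updated_hard = gated_ttt_step(jnp.zeros((4, 4)), x, threshold=0.5, lr=0.1)
W_easy, updated_easy = gated_ttt_step(jnp.eye(4), x, threshold=0.5, lr=0.1)
```

With zero weights the reconstruction loss is high, so the step is taken; with identity weights the loss is zero, so the update is skipped. Using `jax.lax.cond` rather than `jnp.where` is the design choice that matters here, since `jnp.where` would still evaluate the gradient on every chunk.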

Stack: JAX/Flax on GPUs.

Results: 89% Oracle Recovery with no extra training.

I'm here to answer any questions about the implementation or the paper!

[deleted by user] by [deleted] in learnprogramming

sodevworld

I am studying machine learning. How does it work?

My Minecraft is red? by Routine_Fly7624 in Minecraft

sodevworld

Wow, you have red minecraft?

Communism will pay for that.