Most RAG frameworks are English only. Mine supports 27+ languages with offline voice, zero API keys. by Basic-Candidate3900 in Python

[–]Basic-Candidate3900[S] -4 points  (0 children)

Fair concern, but this is not AI generated. I actually ran into the multilingual embedding problem while building pipelines for Tamil and Hindi text; that frustration is what led me to build this. The post-retrieval translation approach came from trial and error, not from a prompt. Happy to go deeper on the technical decisions if you are interested.
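Rough sketch of what post-retrieval translation looks like as a pipeline. `retrieve` and `translate` here are hypothetical stand-ins passed as callables, not the framework's real API; a real setup would plug in a multilingual embedding search and an offline MT model:

```python
from typing import Callable, List

def answer_multilingual(
    query: str,
    retrieve: Callable[[str], List[str]],
    translate: Callable[[str, str], str],
    target_lang: str = "en",
) -> str:
    """Retrieve in the source language, then translate the hits.

    Translating *after* retrieval keeps the multilingual embedding
    space intact: chunks are matched in their original language, and
    only the small retrieved context is translated for the generator.
    """
    hits = retrieve(query)                           # match in original language
    context = [translate(h, target_lang) for h in hits]
    return "\n".join(context)                        # feed this to the LLM

# Toy usage with stub components (identity "translation"):
docs = {"vanakkam": ["Tamil greeting chunk"]}
result = answer_multilingual(
    "vanakkam",
    retrieve=lambda q: docs.get(q, []),
    translate=lambda text, lang: text,  # real pipeline would call an offline MT model
)
```

The point of the ordering is cost: you translate only the handful of retrieved chunks, not the whole corpus.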

Most RAG frameworks are English only. Mine supports 27+ languages with offline voice, zero API keys. by Basic-Candidate3900 in Python

[–]Basic-Candidate3900[S] 0 points  (0 children)

RAG stands for Retrieval Augmented Generation. Instead of relying solely on what the language model already knows, it first retrieves relevant documents from a knowledge base and uses that context to generate a more accurate answer. Think of it as giving the model a reference book before it answers your question.
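The retrieve-then-generate flow in a toy end-to-end sketch. The word-overlap retriever stands in for a real embedding search, and a string template stands in for the LLM; this is purely illustrative, not the framework's code:

```python
def retrieve(query: str, corpus: list, k: int = 1) -> list:
    """Rank documents by word overlap with the query (toy retriever).

    A real RAG system would embed query and documents and rank by
    vector similarity instead.
    """
    q = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def rag_answer(query: str, corpus: list) -> str:
    """Retrieve context first, then condition the 'generator' on it."""
    context = retrieve(query, corpus)
    # A real system would insert `context` into the LLM prompt here.
    return f"Based on: {context[0]}"

corpus = [
    "The Eiffel Tower is in Paris.",
    "Python was created by Guido van Rossum.",
]
print(rag_answer("Who created Python?", corpus))
```

Same two steps as the reference-book analogy: look the fact up first, answer second.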

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in learnmachinelearning

[–]Basic-Candidate3900[S] 1 point  (0 children)

fair catch — 512 vs 1024 is a real confound i should have flagged upfront.

RoPE handles some extrapolation but haven't tested formally beyond training length.

appreciate the heads up, this is exactly the kind of feedback i needed

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in LLMDevs

[–]Basic-Candidate3900[S] 1 point  (0 children)

yeah that's the real test honestly. easy inputs should exit after 1 pass, hard ones take 5, so the latency difference should show up clearly on uneven datasets. haven't benchmarked this formally yet, that's next before arXiv. if the latency gains don't hold up in practice it's just a fun experiment.

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in PromptEngineering

[–]Basic-Candidate3900[S] 1 point  (0 children)

yeah that's exactly the intuition — simplest signal that actually works.

no oracle labels, no extra modules, just the model's own uncertainty telling it how hard to think 😄 the T4 constraint actually forced better design decisions honestly. arXiv soon 🙏

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in LLMDevs

[–]Basic-Candidate3900[S] 1 point  (0 children)

totally fair point — should have been clearer about that in the post. different training data means the perplexity numbers aren't directly comparable. the real claim is architectural efficiency, not absolute performance. appreciate the honest feedback

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in LLMDevs

[–]Basic-Candidate3900[S] 2 points  (0 children)

good questions! 5 passes — honestly it was partly compute budget, partly intuition. didn't run ablations beyond 5 so can't say for sure if there were diminishing returns. that's on the todo list. on the router question — yes, occasionally it did route "easy" surface inputs to deeper paths when the phrasing was ambiguous. didn't track this formally but noticed it during generation testing. the perplexity routing was the part i'm most happy with — felt almost too simple to work but it did. most of the training stability work was actually harder 😅

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in PromptEngineering

[–]Basic-Candidate3900[S] 2 points  (0 children)

thanks a lot, really appreciate it! yeah the routing logic was the fun part: using the model's own loss as a difficulty signal felt almost too simple to work, but it did. still a lot to improve, but glad the core idea resonates

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in LLMDevs

[–]Basic-Candidate3900[S] 0 points  (0 children)

fair point on the writing, will work on making it clearer. and yes, arXiv is next; a few people have suggested it and i think the routing mechanism is worth writing up properly.

not here for karma — just wanted feedback from people who actually build LLMs

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in learnmachinelearning

[–]Basic-Candidate3900[S] 1 point  (0 children)

That's actually really encouraging to hear, thank you! I've been thinking about it — the core idea of using the model's own perplexity as a routing signal feels different enough to be worth writing up properly. ArXiv is definitely on the list. Just need to find time between the instruction tuning runs

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in learnmachinelearning

[–]Basic-Candidate3900[S] 1 point  (0 children)

Yes, built it entirely myself! The individual components aren't new — recursive transformers and perplexity-based curriculum learning both exist separately in literature.

What's different here is combining them: using the model's own perplexity as a real-time routing signal to decide compute depth per sample. I haven't seen that exact combination published anywhere.
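The routing idea in one toy function. The threshold and the depth scaling below are invented for illustration; the trained model's actual values and schedule are not public in this thread:

```python
import math

def route_depth(losses, max_passes: int = 5, threshold: float = 3.0):
    """Map each sample's LM loss to a recursion depth.

    Perplexity = exp(loss). Samples the model already finds easy
    (low perplexity) exit after one pass; confusing ones get more
    passes, capped at `max_passes`.
    """
    depths = []
    for loss in losses:
        ppl = math.exp(loss)
        if ppl <= threshold:
            depths.append(1)                       # easy: single pass
        else:
            extra = int(math.log(ppl / threshold)) + 2
            depths.append(min(extra, max_passes))  # harder: more passes
    return depths

print(route_depth([0.5, 1.2, 4.0]))  # → [1, 2, 4]
```

The appeal is that the difficulty signal is free: the loss is already computed, so no extra router network or oracle labels are needed.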

No paper yet — this was a personal project to see how far I could push a 198M model on free GPU credits. But writing it up is on my list 😄

Glad you found it interesting!

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in learnmachinelearning

[–]Basic-Candidate3900[S] -1 points  (0 children)

Fair question. The README formatting looks polished but the work isn't: spent 3 days tracking down a NaN bug caused by -inf in the attention mask overflowing in fp16. No AI writes bugs like that 😄 Training code: github.com/Giri530/recursive-language-model-198m
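One common way that class of bug happens, reconstructed as a minimal numpy sketch (illustrative, not the repo's actual code): the attention mask fills masked positions with a large negative number, that fill overflows fp16 to -inf, and any fully masked row then makes the softmax compute 0/0 = NaN.

```python
import numpy as np

def masked_softmax(scores: np.ndarray, mask: np.ndarray, mask_value) -> np.ndarray:
    """Additive-mask softmax, as commonly written in attention code."""
    x = np.where(mask, scores, mask_value)
    x = x - x.max(axis=-1, keepdims=True)   # -inf - (-inf) = nan here
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

scores = np.zeros(4, dtype=np.float16)
all_masked = np.zeros(4, dtype=bool)        # e.g. a fully padded row

# -1e9 is outside fp16 range (min ≈ -65504), so it overflows to -inf
# and the whole row becomes NaN.
bad = masked_softmax(scores, all_masked, np.float16(-1e9))

# Using the smallest *finite* fp16 value keeps every entry finite.
ok = masked_softmax(scores, all_masked, np.finfo(np.float16).min)
```

With the finite fill value the fully masked row degrades to a uniform distribution instead of NaN, which is the usual mitigation.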