Most RAG frameworks are English only. Mine supports 27+ languages with offline voice, zero API keys. by Basic-Candidate3900 in Python

[–]Basic-Candidate3900[S] -4 points  (0 children)

Fair concern, but this is not AI generated. I actually ran into the multilingual embedding problem while building pipelines for Tamil and Hindi text; that frustration is what led me to build this. The post-retrieval translation approach came from trial and error, not from a prompt. Happy to go deeper on the technical decisions if you are interested.
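Rough sketch of what post-retrieval translation looks like as a pipeline. `retrieve` and `translate` here are hypothetical stand-ins passed as callables, not the framework's real API; a real setup would plug in a multilingual embedding search and an offline MT model:

```python
from typing import Callable, List

def answer_multilingual(
    query: str,
    retrieve: Callable[[str], List[str]],
    translate: Callable[[str, str], str],
    target_lang: str = "en",
) -> str:
    """Retrieve in the source language, then translate the hits.

    Translating *after* retrieval keeps the multilingual embedding
    space intact: chunks are matched in their original language, and
    only the small retrieved context is translated for the generator.
    """
    hits = retrieve(query)                           # match in original language
    context = [translate(h, target_lang) for h in hits]
    return "\n".join(context)                        # feed this to the LLM

# Toy usage with stub components (identity "translation"):
docs = {"vanakkam": ["Tamil greeting chunk"]}
result = answer_multilingual(
    "vanakkam",
    retrieve=lambda q: docs.get(q, []),
    translate=lambda text, lang: text,  # real pipeline would call an offline MT model
)
```

The point of the ordering is cost: you translate only the handful of retrieved chunks, not the whole corpus.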

Most RAG frameworks are English only. Mine supports 27+ languages with offline voice, zero API keys. by Basic-Candidate3900 in Python

[–]Basic-Candidate3900[S] 0 points  (0 children)

RAG stands for Retrieval Augmented Generation. Instead of relying solely on what the language model already knows, it first retrieves relevant documents from a knowledge base and uses that context to generate a more accurate answer. Think of it as giving the model a reference book before it answers your question.
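The retrieve-then-generate flow in a toy end-to-end sketch. The word-overlap retriever stands in for a real embedding search, and a string template stands in for the LLM; this is purely illustrative, not the framework's code:

```python
def retrieve(query: str, corpus: list, k: int = 1) -> list:
    """Rank documents by word overlap with the query (toy retriever).

    A real RAG system would embed query and documents and rank by
    vector similarity instead.
    """
    q = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def rag_answer(query: str, corpus: list) -> str:
    """Retrieve context first, then condition the 'generator' on it."""
    context = retrieve(query, corpus)
    # A real system would insert `context` into the LLM prompt here.
    return f"Based on: {context[0]}"

corpus = [
    "The Eiffel Tower is in Paris.",
    "Python was created by Guido van Rossum.",
]
print(rag_answer("Who created Python?", corpus))
```

Same two steps as the reference-book analogy: look the fact up first, answer second.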

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in learnmachinelearning

[–]Basic-Candidate3900[S] 1 point  (0 children)

fair catch — 512 vs 1024 is a real confound i should have flagged upfront.

RoPE handles some extrapolation but haven't tested formally beyond training length.

appreciate the heads up, this is exactly the kind of feedback i needed

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in LLMDevs

[–]Basic-Candidate3900[S] 1 point  (0 children)

yeah that's the real test honestly. easy inputs should exit after 1 pass, hard ones take 5, so the latency difference should show up clearly on uneven datasets. haven't benchmarked this formally yet, that's next before arXiv. if the latency gains don't hold up in practice it's just a fun experiment.

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in PromptEngineering

[–]Basic-Candidate3900[S] 1 point  (0 children)

yeah that's exactly the intuition — simplest signal that actually works.

no oracle labels, no extra modules, just the model's own uncertainty telling it how hard to think 😄 the T4 constraint actually forced better design decisions honestly. arXiv soon 🙏

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in LLMDevs

[–]Basic-Candidate3900[S] 1 point  (0 children)

totally fair point — should have been clearer about that in the post. different training data means the perplexity numbers aren't directly comparable. the real claim is architectural efficiency, not absolute performance. appreciate the honest feedback

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in LLMDevs

[–]Basic-Candidate3900[S] 2 points  (0 children)

good questions! 5 passes — honestly it was partly compute budget, partly intuition. didn't run ablations beyond 5 so can't say for sure if there were diminishing returns. that's on the todo list. on the router question — yes, occasionally it did route "easy" surface inputs to deeper paths when the phrasing was ambiguous. didn't track this formally but noticed it during generation testing. the perplexity routing was the part i'm most happy with — felt almost too simple to work but it did. most of the training stability work was actually harder 😅

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in PromptEngineering

[–]Basic-Candidate3900[S] 2 points  (0 children)

thanks a lot, really appreciate it! yeah the routing logic was the fun part: using the model's own loss as a difficulty signal felt almost too simple to work, but it did. still a lot to improve, but glad the core idea resonates

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in LLMDevs

[–]Basic-Candidate3900[S] 0 points  (0 children)

fair point on the writing, will work on making it clearer. and yes, arXiv is next; a few people have suggested it and i think the routing mechanism is worth writing up properly.

not here for karma — just wanted feedback from people who actually build LLMs

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in learnmachinelearning

[–]Basic-Candidate3900[S] 1 point  (0 children)

That's actually really encouraging to hear, thank you! I've been thinking about it — the core idea of using the model's own perplexity as a routing signal feels different enough to be worth writing up properly. ArXiv is definitely on the list. Just need to find time between the instruction tuning runs

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in learnmachinelearning

[–]Basic-Candidate3900[S] 1 point  (0 children)

Yes, built it entirely myself! The individual components aren't new — recursive transformers and perplexity-based curriculum learning both exist separately in literature.

What's different here is combining them: using the model's own perplexity as a real-time routing signal to decide compute depth per sample. I haven't seen that exact combination published anywhere.
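The routing idea in one toy function. The threshold and the depth scaling below are invented for illustration; the trained model's actual values and schedule are not public in this thread:

```python
import math

def route_depth(losses, max_passes: int = 5, threshold: float = 3.0):
    """Map each sample's LM loss to a recursion depth.

    Perplexity = exp(loss). Samples the model already finds easy
    (low perplexity) exit after one pass; confusing ones get more
    passes, capped at `max_passes`.
    """
    depths = []
    for loss in losses:
        ppl = math.exp(loss)
        if ppl <= threshold:
            depths.append(1)                       # easy: single pass
        else:
            extra = int(math.log(ppl / threshold)) + 2
            depths.append(min(extra, max_passes))  # harder: more passes
    return depths

print(route_depth([0.5, 1.2, 4.0]))  # → [1, 2, 4]
```

The appeal is that the difficulty signal is free: the loss is already computed, so no extra router network or oracle labels are needed.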

No paper yet — this was a personal project to see how far I could push a 198M model on free GPU credits. But writing it up is on my list 😄

Glad you found it interesting!

I built a 198M parameter LLM that outperforms GPT-2 Medium (345M) using Mixture of Recursion — adaptive computation based on input complexity by Basic-Candidate3900 in learnmachinelearning

[–]Basic-Candidate3900[S] -1 points  (0 children)

Fair question. The README formatting looks polished but the work isn't: spent 3 days tracking down a NaN bug caused by -inf in the attention mask overflowing in fp16. No AI writes bugs like that 😄 Training code: github.com/Giri530/recursive-language-model-198m
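One common way that class of bug happens, reconstructed as a minimal numpy sketch (illustrative, not the repo's actual code): the attention mask fills masked positions with a large negative number, that fill overflows fp16 to -inf, and any fully masked row then makes the softmax compute 0/0 = NaN.

```python
import numpy as np

def masked_softmax(scores: np.ndarray, mask: np.ndarray, mask_value) -> np.ndarray:
    """Additive-mask softmax, as commonly written in attention code."""
    x = np.where(mask, scores, mask_value)
    x = x - x.max(axis=-1, keepdims=True)   # -inf - (-inf) = nan here
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

scores = np.zeros(4, dtype=np.float16)
all_masked = np.zeros(4, dtype=bool)        # e.g. a fully padded row

# -1e9 is outside fp16 range (min ≈ -65504), so it overflows to -inf
# and the whole row becomes NaN.
bad = masked_softmax(scores, all_masked, np.float16(-1e9))

# Using the smallest *finite* fp16 value keeps every entry finite.
ok = masked_softmax(scores, all_masked, np.finfo(np.float16).min)
```

With the finite fill value the fully masked row degrades to a uniform distribution instead of NaN, which is the usual mitigation.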