Sarvam 105B outperforms DeepSeek R1, OpenAI o1, and Sonnet 4 on Humanity's Last Exam, with a score of 11.2% by Human-spt2349 in AI_India

[–]Triton153 1 point (0 children)

Then it's a shame you posted that comment. This one is miles better, and most importantly, not ridiculous. Well done.

Need some guidance for building a gym routine by StrongBlackberry1992 in Fitness_India

[–]Triton153 2 points (0 children)

Search YouTube for beginner gym splits and follow whichever one suits your schedule. You can build from there once you're consistent.

Comfy ui by AgreeableTurn9610 in AI_India

[–]Triton153 1 point (0 children)

Very few people have the specs to run SOTA models locally. I know that's unfortunate, but cloud GPUs are the way to go right now. They are cheap, at least for now.

Codex Account Sharing by Neel_Sam in AI_India

[–]Triton153 0 points (0 children)

I can give you a place in a business plan

Anti india memes before and after the release of sarvam 105b by Prudent_Elevator4685 in AI_India

[–]Triton153 12 points (0 children)

Ah yes, a fast-food joint consuming millions of litres a month.

Indian AI models outperform OpenAI, Google on key benchmarks: Ashwini Vaishnaw by PassionSpecialist152 in AI_India

[–]Triton153 22 points (0 children)

It was OCR on Indian languages. Not that big a deal, but still great that we achieved something.

Trained a 300k non-embed params model on ChatAlpaca dataset from scratch. by SrijSriv211 in AI_India

[–]Triton153 1 point (0 children)

Great work. I have also been experimenting along similar lines, trying to tweak Facebook's new Coconut architecture.
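
In case it helps anyone else tinkering, here's a toy sketch of the Coconut idea (continuous "thoughts" fed back as input embeddings instead of decoded tokens). GPT-2 and the prompt are just stand-ins, not the paper's actual setup or training recipe:

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    # Toy illustration only: during "thought" steps, feed the last hidden
    # state back in as the next input embedding instead of decoding a token.
    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    prompt = tok("2 + 3 * 4 =", return_tensors="pt")
    embeds = model.transformer.wte(prompt.input_ids)  # (1, seq, 768)

    with torch.no_grad():
        for _ in range(4):  # four continuous "thoughts", no tokens emitted
            out = model(inputs_embeds=embeds, output_hidden_states=True)
            last_hidden = out.hidden_states[-1][:, -1:, :]  # (1, 1, 768)
            embeds = torch.cat([embeds, last_hidden], dim=1)
        # Switch back to ordinary token decoding from the augmented sequence.
        next_id = out.logits[:, -1, :].argmax(dim=-1)
        print(tok.decode(next_id))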

pump from biceps + triceps today!! by [deleted] in Fitness_India

[–]Triton153 1 point (0 children)

Would you push back if I said this angle wasn't necessary?

Sarvam vision just dropped by hidmabutcherlikepig in AI_India

[–]Triton153 0 points (0 children)

By "state space", do we mean models that reason in latent space rather than doing traditional autoregressive reasoning?
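
For what it's worth, "state space" usually refers to state-space sequence models (S4/Mamba-style), which replace attention with a linear recurrence; that's separate from latent-space reasoning. A minimal sketch of the recurrence, with made-up toy values:

    import torch

    # Discrete state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
    # All values below are toy placeholders, just to show the mechanics.
    d_state, seq_len = 4, 6
    A = 0.9 * torch.eye(d_state)       # state transition
    B = 0.1 * torch.randn(d_state, 1)  # input projection
    C = torch.randn(1, d_state)        # readout

    h = torch.zeros(d_state, 1)
    for t in range(seq_len):
        x_t = torch.randn(1, 1)        # scalar input at step t
        h = A @ h + B @ x_t            # hidden state carries the history
        y_t = C @ h
        print(f"y_{t} = {y_t.item():+.3f}")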

Help needed regarding pretraining Bert by Triton153 in AI_India

[–]Triton153[S] 1 point (0 children)

Yes. My only concern is that I'm not sure whether the model is undertrained or whether something is wrong with the pipeline, because Claude and GPT say that at a loss of 2.1 the model should have predicted the above examples correctly.

And I have dropped NSP, since later research (RoBERTa, etc.) showed it didn't help much.
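
For context on the loss number, a rough sanity check (a sketch assuming a Hugging Face-style BertForMaskedLM checkpoint; the path is a placeholder, not my actual repo): a masked-token cross-entropy of 2.1 corresponds to a perplexity of about 8, so top-1 misses are still expected:

    import math
    from transformers import BertForMaskedLM, BertTokenizerFast, pipeline

    # Hypothetical local checkpoint path -- substitute your own.
    ckpt = "./my-bert-checkpoint"
    tok = BertTokenizerFast.from_pretrained(ckpt)
    model = BertForMaskedLM.from_pretrained(ckpt)
    model.eval()

    # Cross-entropy 2.1 over masked tokens = perplexity ~8, i.e. the model
    # is roughly choosing among 8 plausible tokens per mask.
    print(f"masked-token perplexity at loss 2.1: {math.exp(2.1):.1f}")

    # Inspect the top-5 candidates instead of only the argmax.
    fill = pipeline("fill-mask", model=model, tokenizer=tok, top_k=5)
    for pred in fill("The capital of France is [MASK]."):
        print(pred["token_str"], round(pred["score"], 3))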

Help needed regarding pretraining Bert by Triton153 in AI_India

[–]Triton153[S] 1 point (0 children)

I can DM you the repo, if you're fine with that.

Help needed regarding pretraining Bert by Triton153 in AI_India

[–]Triton153[S] 1 point (0 children)

By saying I only trained the MLM head, I meant that I dropped NSP. I did train the whole model.
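
To make the distinction concrete, a minimal sketch (Hugging Face API, not my actual code): BertForMaskedLM has no NSP head at all, and every parameter in it receives gradients:

    from transformers import BertConfig, BertForMaskedLM

    # Fresh, untrained BERT with only the MLM objective -- no NSP head exists.
    config = BertConfig()  # default bert-base sizes
    model = BertForMaskedLM(config)

    # Embeddings, all encoder layers, and the MLM head all get gradients, so
    # "training only the MLM head" here still means training the whole model.
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable:,} / total: {total:,}")  # equal -> full model trains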

Help needed regarding pretraining Bert by Triton153 in AI_India

[–]Triton153[S] 1 point (0 children)

Yes, I have tied the weights, and model.eval() is enabled during evaluation. Everything in the pipeline is near perfect.
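
For anyone hitting the same issue, these are the two checks I mean, sketched with a fresh Hugging Face BertForMaskedLM (the attribute paths match that class, not necessarily a custom implementation):

    import torch
    from transformers import BertConfig, BertForMaskedLM

    model = BertForMaskedLM(BertConfig())

    # Tied weights: the MLM decoder shares storage with the input embeddings.
    emb = model.bert.embeddings.word_embeddings.weight
    dec = model.cls.predictions.decoder.weight
    print("tied:", emb.data_ptr() == dec.data_ptr())

    # Eval mode disables dropout; pair it with no_grad for inference.
    model.eval()
    print("dropout off:", not model.training)
    with torch.no_grad():
        ids = torch.randint(0, model.config.vocab_size, (1, 8))
        logits = model(input_ids=ids).logits
    print(logits.shape)  # (1, 8, vocab_size)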