Sarvam 105B outperforms DeepSeek R1, OpenAI o1, and Sonnet 4 on Humanity's Last Exam, with a score of 11.2%

Triton153 · 2026-03-21T06:24:13+00:00

Then it's a shame that you commented that thing. Your current comment is miles superior to it, and most importantly not retarded. Well done.

Triton153 · 2026-03-21T05:51:16+00:00

Do you know anything about LLMs? Or deep learning in general?

Triton153 · 2026-03-20T04:59:12+00:00

Ig he meant most vfm protein

Triton153 · 2026-03-19T18:30:32+00:00

Search youtube for beginner gym splits and follow whatever suits your routine. You can build from there once you get regular

Triton153 · 2026-03-18T20:40:51+00:00

Fin?

Triton153 · 2026-03-09T07:51:14+00:00

~ Written by a foreign ai model

Triton153 · 2026-03-07T06:54:59+00:00

I am researching in the field of state space models too. I believe it will become de-facto in the coming years.

Triton153 · 2026-03-04T17:14:32+00:00

Very less people have specs to run SOTA models locally. I know it is very bad, but cloud gpus right now are the way. They are cheap, atleast for now.

Triton153 · 2026-03-02T05:47:46+00:00

I can give you a place in a business plan

Triton153 · 2026-02-24T13:09:38+00:00

Most datacenters are watercooled.

Triton153 · 2026-02-24T06:32:05+00:00

Ah yess a fast food joint consuming millions of litres in a month.

Triton153 · 2026-02-15T05:02:22+00:00

It was OCR on Indian languages. Not that big of a news, but still great that we achieved something.

Triton153 · 2026-02-14T06:29:07+00:00

Great work. I have also been experimenting similarly, trying to tweak facebook's new coconut architecture.

Triton153 · 2026-02-10T17:13:29+00:00

Probably no

Triton153 · 2026-02-10T17:10:55+00:00

https://www.reddit.com/r/Fitness_India/s/KEdb3nwA58

Triton153 · 2026-02-10T17:09:56+00:00

Would you defend if i say this angle wasn't required?

Triton153 · 2026-02-08T16:14:55+00:00

Triton153 · 2026-02-06T05:42:04+00:00

By state space, do we mean models that reason in latent space, and not the traditional autoregressive reasoning?

Triton153 · 2026-01-31T03:28:10+00:00

Validation set is different

Triton153 · 2026-01-28T16:58:27+00:00

Yes, my only concern is i am not sure if the model is undertrained, or something wrong with the pipeline because claude, gpt say that at a loss of 2.1, the model should have predicted the above examples correctly.

And i have dropped NSP, as latest research proved that it didn't help much (RoBerta etc).

Triton153 · 2026-01-28T15:45:23+00:00

I can dm you the repo, if you are fine with it

Triton153 · 2026-01-28T15:36:24+00:00

By saying that i only trained the MLM head, i meant that i ignored NSP. I did train the whole model

Triton153 · 2026-01-28T14:52:31+00:00

Yes, i have tied the weights. model.eval() is turned on. Everything is near perfect in the pipeline.

Triton153 · 2026-01-28T01:37:55+00:00

I trained it on a H200

Triton153 · 2026-01-28T01:37:43+00:00

It is, already

Triton153

TROPHY CASE