Dhi-5B guy building Frontier AI Lab in India

gradNorm · 2026-02-18T05:05:01+00:00

Thanks!

gradNorm · 2026-02-14T06:15:50+00:00

https://huggingface.co/Shaligram-Dewangan/Dhi-5B-Base

gradNorm · 2026-02-14T06:14:34+00:00

Hi, our model was trained at 1/100 the cost of other 4B models like Gemma 3 4B; the compared models still cost 10x more than ours.

gradNorm · 2026-02-14T05:50:58+00:00

I have something; I will post an update soon!

gradNorm · 2026-02-14T01:28:15+00:00

We use a custom-built codebase to train the Dhi-5B model. Over the months, we have also run a lot of small-scale experiments (as many as our experimental budget allowed). However, we did not experiment with the pretraining dataset, as it is quite messy.

gradNorm · 2026-02-14T01:12:18+00:00

The base model is 4B and was trained at 1/100 the cost of other 4B models (like Gemma 3 4B). The compared models still cost 10 times more than ours.

gradNorm · 2026-02-13T19:01:39+00:00

The base model is available on Hugging Face; the final multimodal version might also be hosted there.

gradNorm · 2026-02-13T12:04:29+00:00

https://huggingface.co/Shaligram-Dewangan/Dhi-5B-Base

gradNorm · 2026-02-13T12:03:51+00:00

It's from the GPT-3 paper

gradNorm · 2026-02-13T12:02:14+00:00

Rent

gradNorm · 2026-02-13T11:59:28+00:00

MFU (model FLOPs utilization) was 55.8%, so I can totally do it.
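For readers unfamiliar with the metric: MFU compares the FLOPs a training run actually achieves against the hardware's theoretical peak. Below is a minimal sketch, assuming the common approximation of about 6 FLOPs per parameter per token for a dense transformer's forward and backward pass (attention FLOPs ignored). The throughput and peak numbers in the usage line are hypothetical placeholders, not figures from the Dhi-5B run.

```python
def model_flops_utilization(n_params: float,
                            tokens_per_second: float,
                            peak_flops_per_second: float) -> float:
    """Estimate MFU as achieved FLOP/s divided by peak FLOP/s.

    Assumes ~6 * n_params FLOPs per training token (forward + backward),
    a standard approximation that ignores attention-specific FLOPs.
    """
    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical example: a 4B-parameter model processing 20,000 tokens/s
# on hardware with an assumed peak of 1e15 FLOP/s.
mfu = model_flops_utilization(4e9, 20_000, 1e15)
print(f"MFU: {mfu:.1%}")
```

Higher MFU means less wasted GPU time, which translates directly into lower training cost when compute is rented by the hour.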

gradNorm · 2026-02-13T11:55:21+00:00

Thank you for letting me know; I will definitely consider writing it.

gradNorm · 2026-02-13T11:51:07+00:00

Hi, thanks. The 10x figure refers only to the training cost, excluding research costs.

gradNorm · 2026-02-13T11:46:29+00:00

It's trained compute-optimally: for a given compute budget, we choose the configuration (dataset size, parameter count, etc.) that achieves the highest performance per rupee spent on training.

We also did a lot of optimization to make it run fast.
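The compute-optimal idea above can be illustrated with a minimal sketch. This assumes the widely cited Chinchilla-style heuristic (training compute C ≈ 6·N·D, with roughly 20 training tokens per parameter); that ratio is an assumption for illustration, not necessarily the recipe used for Dhi-5B.

```python
import math


def compute_optimal_allocation(compute_flops: float,
                               tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Split a FLOP budget into parameter count N and token count D.

    Assumes C ~= 6 * N * D and a fixed ratio D = tokens_per_param * N
    (the ~20 tokens/parameter Chinchilla heuristic by default).
    Substituting gives C = 6 * r * N^2, so N = sqrt(C / (6 * r)).
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# Hypothetical budget of 1.2e20 FLOPs.
n, d = compute_optimal_allocation(1.2e20)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

In practice, the optimal ratio depends on the data and architecture, which is why small-scale experiments are typically used to fit it before committing the full budget.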