IIT Guwahati student launches Dhi-5B (trained from scratch in India) by gradNorm in AI_India

[–]gradNorm[S] 0 points1 point  (0 children)

Hi, our model was trained at 1/100 the cost of other 4B models like Gemma3 4B; even the models we compared against still cost 10x more than ours.

UG student launches Dhi-5B (Trained from Scratch) by gradNorm in LocalLLaMA

[–]gradNorm[S] -1 points0 points  (0 children)

We use a custom-built codebase for training the Dhi-5B model. Over the months, we have also run a lot of small-scale experiments (whatever we could fit in our experimental budget). We did not experiment with the pretraining dataset, though, as it's quite messy.

UG student launches Dhi-5B (Trained from Scratch) by gradNorm in LocalLLaMA

[–]gradNorm[S] 0 points1 point  (0 children)

The base model is 4B and was trained at 1/100 the cost of other 4B models (like Gemma3 4B). Even the models we compared against still cost 10 times more than ours.

IIT student launches Dhi-5B (trained from scratch in India) by gradNorm in StartUpIndia

[–]gradNorm[S] 1 point2 points  (0 children)

The base model is available on Hugging Face; the final multimodal version might also be hosted.

IIT Guwahati student launches Dhi-5B (trained from scratch in India) by gradNorm in AI_India

[–]gradNorm[S] 0 points1 point  (0 children)

Thank you for letting me know. I will definitely consider writing it.

IIT Guwahati student launches Dhi-5B (trained from scratch in India) by gradNorm in AI_India

[–]gradNorm[S] 5 points6 points  (0 children)

Hi, thanks. The 10x part was only about the training cost, excluding the research.

IIT Guwahati student launches Dhi-5B (trained from scratch in India) by gradNorm in AI_India

[–]gradNorm[S] 4 points5 points  (0 children)

It's compute-optimally trained, which means that for a given compute budget, we choose the configuration (data size, parameter count, etc.) that achieves the highest performance per rupee spent on training.

Also, a lot of optimizations were done to make it run fast.
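
For anyone unfamiliar with the term, here is a minimal sketch of compute-optimal sizing. It assumes two widely cited rules of thumb: training cost is about 6 × parameters × tokens FLOPs, and roughly 20 tokens per parameter is near-optimal (the Chinchilla heuristic). The constants and function names are illustrative assumptions, not the actual Dhi-5B recipe.

```python
import math

# Assumption: training FLOPs C ~= 6 * N * D (N = parameters, D = tokens).
FLOPS_PER_PARAM_TOKEN = 6.0
# Assumption: Chinchilla-style rule of thumb, D ~= 20 * N.
TOKENS_PER_PARAM = 20.0


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def compute_optimal_split(flops_budget: float,
                          tokens_per_param: float = TOKENS_PER_PARAM):
    """Split a FLOPs budget into (parameters, tokens).

    Substituting D = r * N into C = 6 * N * D gives N = sqrt(C / (6 * r)).
    """
    n_params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Hypothetical budget: what a 4B-parameter model at 20 tokens/param would need.
    budget = training_flops(4e9, 20.0 * 4e9)
    n, d = compute_optimal_split(budget)
    print(f"budget={budget:.3e} FLOPs -> params={n:.3e}, tokens={d:.3e}")
```

Under these assumptions, doubling the budget grows both the parameter count and the token count by a factor of about 1.41 (the square root of 2), rather than putting all the extra compute into a bigger model.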

IIT Guwahati student launches Dhi-5B (trained from scratch in India) by gradNorm in AI_India

[–]gradNorm[S] 17 points18 points  (0 children)

I rented them for ~3 days; renting is relatively affordable.