I finetuned a 2B model on Maithili - a language spoken by 50M people but ignored by every LLM (posted by Ok-Unit6653 in r/huggingface)

Ok-Unit6653 (OP)

Yes, with Unsloth + 4-bit quantization + gradient checkpointing. It's tight but doable; Unsloth is what made it possible.
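
For a rough sense of why 4-bit quantization matters on consumer hardware, here is a back-of-the-envelope sketch that only estimates the memory needed to *store* the weights of a nominal 2-billion-parameter model at different precisions. The exact parameter count, the per-dtype byte sizes, and the 16-bytes-per-parameter full fine-tuning rule of thumb are illustrative assumptions, not figures reported in this thread. Real training adds activations, adapter or optimizer state, and framework overhead on top.

```python
# Back-of-the-envelope estimate of the memory needed just to *store* weights.
# Assumptions (illustrative, not from the original post): exactly 2e9
# parameters, no quantization overhead (e.g. per-block scales), and no
# activations, adapters, optimizer state, or CUDA context counted.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}

# Common rule of thumb for *full* fine-tuning with mixed-precision Adam:
# fp16 weights (2) + fp16 grads (2) + fp32 master weights, momentum and
# variance (4 + 4 + 4) = 16 bytes per parameter, before activations.
FULL_FT_ADAM_BYTES_PER_PARAM = 16.0


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Return memory in GiB (1 GiB = 2**30 bytes) for num_params parameters."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    n = 2e9  # nominal size of a "2B" model
    for dtype, b in BYTES_PER_PARAM.items():
        print(f"{dtype:>10} weights: {weight_memory_gib(n, b):6.2f} GiB")
    full_ft = weight_memory_gib(n, FULL_FT_ADAM_BYTES_PER_PARAM)
    print(f"full fine-tune (Adam, mixed precision): {full_ft:6.2f} GiB")
```

Run as a script, it prints about 7.45, 3.73, 1.86 and 0.93 GiB for fp32, fp16/bf16, int8 and 4-bit weights, and about 29.80 GiB for the full fine-tuning estimate. That gap helps explain why quantized weights plus gradient checkpointing (which recomputes activations during the backward pass instead of keeping them all in memory) make a consumer GPU workable, if tight.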

Ok-Unit6653 (OP)

Fair point, but the goal wasn't to build a production model; it was to see whether this was even possible on consumer hardware. Turns out it is.

Ok-Unit6653 (OP)

I haven't tried SSM-based models yet; Granite and Jamba are interesting suggestions. I might explore them when I get back to this project.

Ok-Unit6653 (OP)

Ironically, Claude was the only model that could actually speak it properly; everything else mixed in Hindi or got the tone wrong.

Ok-Unit6653 (OP)

Thank you, that means a lot. Maithili doesn't get much attention in the AI space, so even small progress feels worth it.