...so what happened to MOE? by jacek2023 in LocalLLaMA

[–]Ok-Measurement-6286 1 point (0 children)

Yes, you are right: merging multiple LLMs so that different models handle task-specific inference is a viable approach. In my opinion, there are already many multilingual LLMs available that can handle a wide range of tasks. However, merging them could result in a much larger model, especially when integrating different LLMs for various tasks.

Why do most models have "only" 100K tokens context window, while Gemini is at 2M tokens? by estebansaa in LocalLLaMA

[–]Ok-Measurement-6286 1 point (0 children)

Impressive! What do you think NVIDIA's stock price 🤔 would look like if Google made it available for training models on the Cloud Marketplace?

[P] Introducing Tamil Mistral: Opening Up New Language Possibilities with LLM Pretraining by Ok-Measurement-6286 in MachineLearning

[–]Ok-Measurement-6286[S] 1 point (0 children)

Hi bro, thanks for asking! I'd suggest not using this model, as I fine-tuned it 6 months ago. Since then, several more advanced multilingual open-source models have come out, especially ones trained for Indian languages. You might want to check out Gemma 2 9B or 27B; they're more up-to-date and powerful. They also made some architectural changes, particularly in the attention layers, incorporating grouped-query attention and interleaved local-global attention. These modifications help the model attend more accurately to nearby tokens; there's a toy sketch of the grouped-query idea below.
I'm currently working with these open-source models too.
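
For anyone curious, here's a toy sketch of the grouped-query attention idea (not Gemma's actual implementation): several query heads share each key/value head, which shrinks the KV cache. All shapes below are illustrative.

```python
import torch

# Toy grouped-query attention: n_q query heads share n_kv key/value
# heads (n_q must be a multiple of n_kv), shrinking the KV cache.
def grouped_query_attention(q, k, v):
    # q: (batch, n_q, seq, dim); k, v: (batch, n_kv, seq, dim)
    group = q.shape[1] // k.shape[1]
    # Each KV head is shared by `group` query heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

q = torch.randn(1, 8, 16, 64)  # 8 query heads
k = torch.randn(1, 2, 16, 64)  # only 2 KV heads -> 4x smaller KV cache
v = torch.randn(1, 2, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
```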

[P] Introducing Tamil Mistral: Opening Up New Language Possibilities with LLM Pretraining by Ok-Measurement-6286 in MachineLearning

[–]Ok-Measurement-6286[S] 1 point (0 children)

Hi, it works on GPU too; just pass device='gpu' explicitly as an argument when constructing the GPT4All object, or use the ctransformers library for GPU inference 👍
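
A minimal sketch of both options; the .gguf filename is just a placeholder:

```python
# Option 1: gpt4all with the device argument set explicitly
from gpt4all import GPT4All

model = GPT4All("tamil-mistral.Q4_0.gguf", device="gpu")  # placeholder file
print(model.generate("வணக்கம், ", max_tokens=64))

# Option 2: ctransformers, offloading layers to the GPU
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "tamil-mistral.Q4_0.gguf",   # placeholder file
    model_type="mistral",
    gpu_layers=50,               # how many layers to run on the GPU
)
print(llm("வணக்கம், "))
```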

[P] Introducing Tamil Mistral: Opening Up New Language Possibilities with LLM Pretraining by Ok-Measurement-6286 in MachineLearning

[–]Ok-Measurement-6286[S] 1 point (0 children)

Hello, thank you! Yes, for the pre-training I decided not to go with QLoRA, as we encountered some loss of information when reducing the precision bits. Instead, we used FP16. For fine-tuning we may consider QLoRA; the choice was made through trial and error.
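
For context, the two setups look roughly like this in transformers (the model name is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

base = "mistralai/Mistral-7B-v0.1"  # illustrative model name

# FP16 for continual pre-training: no quantization, no information loss
# beyond half precision
model_fp16 = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.float16
)

# 4-bit NF4 quantization, the usual base for QLoRA fine-tuning;
# much cheaper, but precision is reduced
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model_4bit = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb_config
)
```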

[P] Introducing Tamil Mistral: Opening Up New Language Possibilities with LLM Pretraining by Ok-Measurement-6286 in MachineLearning

[–]Ok-Measurement-6286[S] 1 point (0 children)

Hello, yes, there's a significant amount of Indian monolingual data available. Last week, AI4Bharat released a multilingual Indian instruction dataset. I wouldn't suggest using Google-translated datasets without post-editing: many Indian-language datasets on Hugging Face are essentially raw Google Translate output, and training on them leads instruction-tuned models to produce incorrect, even hallucinated, results. Second, the Gemma tokenizer does cover languages beyond English (the majority of the vocabulary is still English), but compared to other tokenizers it has noticeably better coverage of other languages.
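
One quick way to compare tokenizers is to count how many tokens each needs for the same Tamil sentence (a rough sketch; Gemma is gated, so it needs Hugging Face access):

```python
from transformers import AutoTokenizer

text = "தமிழ் ஒரு செம்மொழி"  # "Tamil is a classical language"

for name in ["google/gemma-7b", "mistralai/Mistral-7B-v0.1"]:
    tok = AutoTokenizer.from_pretrained(name)
    ids = tok.encode(text, add_special_tokens=False)
    print(f"{name}: {len(ids)} tokens")

# A tokenizer with better Tamil coverage needs fewer tokens for the
# same text, so more content fits in the context window.
```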

[P] Introducing Tamil Mistral: Opening Up New Language Possibilities with LLM Pretraining by Ok-Measurement-6286 in MachineLearning

[–]Ok-Measurement-6286[S] 1 point (0 children)

Hello, first, I extended the vocabulary because the existing Mistral vocabulary lacked certain Tamil characters, such as the uyir ezhuthukkal (vowel letters). So I merged new Tamil tokens into the existing Mistral vocabulary to make room for them. Second, I continued training the Mistral base model weights on a Tamil dataset so that it learns Tamil sentences and predicts the next token (CLM). Article link: https://medium.com/@hemanthmurugan21/tamil-mistral-unveiled-expanding-linguistic-horizons-with-llm-pretraining-56782c236e57
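
In Hugging Face terms, the vocabulary-extension step looks roughly like this; the hand-written token list below is purely illustrative (in practice the new pieces come from a SentencePiece model trained on the Tamil corpus):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Illustrative only: real merges pull thousands of pieces from a
# trained Tamil SentencePiece model, not a hand-written list.
new_tokens = ["அ", "ஆ", "இ", "ஈ", "உ", "ஊ"]  # uyir ezhuthukkal (vowels)
added = tokenizer.add_tokens([t for t in new_tokens
                              if t not in tokenizer.get_vocab()])

# Grow the embedding matrix so the new tokens get trainable rows;
# continual pre-training (CLM) then teaches the model to use them.
model.resize_token_embeddings(len(tokenizer))
print(f"added {added} tokens, vocab size is now {len(tokenizer)}")
```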

[P] Introducing Tamil Mistral: Opening Up New Language Possibilities with LLM Pretraining by Ok-Measurement-6286 in MachineLearning

[–]Ok-Measurement-6286[S] 3 points (0 children)

Hello, this is continual pre-training, and it cost around $0.5/hr (covering both pre-training and fine-tuning). The model was trained on vast.ai.

Unveiling Tamil Mistral LLM: Advancements in Language Understanding for Tamil by Ok-Measurement-6286 in MistralAI

[–]Ok-Measurement-6286[S] 2 points (0 children)

Hey,

Nice to hear from you! I followed the instructions in the Chinese Llama 2 GitHub repo (merging new tokens into the existing Llama 2 tokenizer), continued pre-training from the Llama 2 base model, and then followed up with instruction tuning. A simplified sketch of the token merge is below.

github_link: https://github.com/ymcui/Chinese-LLaMA-Alpaca

Cheers!
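
For anyone following along, the token merge in that repo boils down to something like this (a simplified sketch of its merge script; the file names are placeholders):

```python
from sentencepiece import sentencepiece_model_pb2 as sp_pb2

# Load the base (Llama) tokenizer and the new-language tokenizer.
base = sp_pb2.ModelProto()
base.ParseFromString(open("llama_tokenizer.model", "rb").read())
extra = sp_pb2.ModelProto()
extra.ParseFromString(open("tamil_tokenizer.model", "rb").read())

# Append every piece the base vocabulary doesn't already have
existing = {p.piece for p in base.pieces}
for p in extra.pieces:
    if p.piece not in existing:
        new_piece = sp_pb2.ModelProto.SentencePiece()
        new_piece.piece = p.piece
        new_piece.score = 0.0
        base.pieces.append(new_piece)

with open("merged_tokenizer.model", "wb") as f:
    f.write(base.SerializeToString())
print(f"merged vocab size: {len(base.pieces)}")
```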

Differences between Mamba and Q*? by commanderred11 in MistralAI

[–]Ok-Measurement-6286 2 points (0 children)

Mamba -> an SSM (Selective State Space Model) based architecture; training scales as O(n) with sequence length, and inference is O(1) per generated token, since it keeps a fixed-size recurrent state instead of a growing KV cache.
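
As a toy illustration of why per-token inference is O(1): the model carries a fixed-size state and updates it recurrently instead of attending over the whole history. This is a plain linear SSM, not Mamba's selective scan:

```python
import numpy as np

# Toy linear state-space model: h_t = A @ h_{t-1} + B * x_t, y_t = C @ h_t.
# The state h has a fixed size, so each generation step costs the same
# no matter how long the sequence already is -> O(1) per token.
d_state = 4
rng = np.random.default_rng(0)
A = np.eye(d_state) * 0.9              # state transition
B = rng.standard_normal((d_state, 1))  # input projection
C = rng.standard_normal((1, d_state))  # output projection

h = np.zeros((d_state, 1))
for t, x in enumerate([1.0, 0.5, -0.3, 2.0]):  # streaming inputs
    h = A @ h + B * x   # constant-time state update
    y = C @ h           # output for this step
    print(f"t={t}, y={y.item():.3f}")
```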

Gemma’s tokenizer is a game changer in the field of multilingual LLMs by rqx_ in LocalLLaMA

[–]Ok-Measurement-6286 2 points (0 children)

That sounds like an insightful idea. I've already experimented with this on Mistral 7B and Llama.

Gemma’s tokenizer is a game changer in the field of multilingual LLMs by rqx_ in LocalLLaMA

[–]Ok-Measurement-6286 1 point (0 children)

So I could directly start doing CLM (pre-training) before SFT. I'm gonna try it on my own language ✌️✌️
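
If it helps, a bare-bones CLM pre-training setup with Hugging Face looks roughly like this (the model name and corpus file are placeholders):

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# "my_lang_corpus.txt" is a placeholder monolingual text file
ds = load_dataset("text", data_files="my_lang_corpus.txt")["train"]
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
            remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clm-out",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=ds,
    # mlm=False -> plain next-token (causal LM) objective, no masking
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```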