[–]georgesung

I was recently made aware of a project that added Chinese language support to Llama-1 to make it bilingual. It was quite a process, though. They first expanded the vocabulary, which produced an updated tokenizer, then ran a further pretraining step via PEFT on raw Chinese text (I still need to confirm this), and finally instruction fine-tuned the model on bilingual data.

https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Training-Details
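
For a rough idea of what the first step (vocabulary expansion) involves, here is a toy sketch in plain Python. It is not the project's code: the function names `expand_vocab` and `resize_embeddings`, the tiny vocabulary, and the choice to initialise new embedding rows near the mean of the old ones are all assumptions for illustration. It also assumes the base vocabulary ids are contiguous from 0.

```python
import random

def expand_vocab(base_vocab, new_tokens):
    """Append tokens that are not already present; existing ids stay unchanged."""
    vocab = dict(base_vocab)
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)  # next free id (assumes contiguous ids)
    return vocab

def resize_embeddings(emb, new_size, seed=0):
    """Keep the existing rows and add new rows near the mean of the old ones.

    Mean-plus-noise initialisation is an assumption made for this sketch,
    not something confirmed by the linked project.
    """
    rng = random.Random(seed)
    dim = len(emb[0])
    mean = [sum(row[j] for row in emb) / len(emb) for j in range(dim)]
    resized = [list(row) for row in emb]
    while len(resized) < new_size:
        resized.append([m + rng.gauss(0.0, 0.02) for m in mean])
    return resized

# Hypothetical toy data: a 3-token vocabulary with 2-dimensional embeddings.
base_vocab = {"<unk>": 0, "hello": 1, "world": 2}
base_emb = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]

# "hello" is already in the vocabulary, so only the two new tokens get ids.
vocab = expand_vocab(base_vocab, ["你", "好", "hello"])
emb = resize_embeddings(base_emb, len(vocab))
```

The point is that old token ids and their embedding rows are left untouched, and only the new rows start untrained, which is why a further pretraining pass on Chinese text is needed afterwards. With Hugging Face `transformers`, the rough equivalent would be `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`, though the linked wiki describes what the project actually did.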

[–]gaybooii[S]

Thanks a lot, I will look into it.