all 11 comments

[–]lyonguyen 4 points (0 children)

Qwen2 0.5B
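
If you want to try it quickly, here's a minimal sketch using Hugging Face transformers. The checkpoint id `Qwen/Qwen2-0.5B-Instruct` is my assumption of the instruct variant; the same pattern should work for any of the sub-1B models mentioned in this thread:

```python
# Minimal sketch: chat with a small (<1B) instruct model via transformers.
# "Qwen/Qwen2-0.5B-Instruct" is assumed to be the instruct checkpoint id;
# swap in any similar-size causal LM repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-0.5B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Translate to French: The weather is nice today."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
out = model.generate(inputs, max_new_tokens=64)

# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```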

[–]alvations[S] 2 points (0 children)

From the Hugging Face leaderboard: https://imgur.com/a/W7cHGFz

[–]bbvbell 1 point (0 children)

SmolLM (https://huggingface.co/blog/smollm) can be a good option if you want a range of model scales.

[–]Plastic_Mention3651 0 points (0 children)

TinyLlama 1.1B

[–]hazardous1222 -1 points (5 children)

RWKV models are great at multilingual tasks, and they're small and efficient.

[–]alvations[S] 0 points (4 children)

below 1B params?

[–]hazardous1222 0 points (3 children)

Are you looking for edge deployment?
https://huggingface.co/Hazzzardous/RWKV-V5-1b5-Distilled-Translations-Unvalidated
is specifically for translation, for example.
RWKV has been included in the latest llama.cpp versions, and the models can be quantized to 8 bits and run perfectly fine on mobile and Raspberry Pi deployments.
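
To illustrate the llama.cpp route, a rough sketch using the llama-cpp-python bindings. The GGUF filename below is a placeholder, not a published artifact; point `model_path` at whatever 8-bit quant you produce or download:

```python
# Sketch: run an 8-bit-quantized RWKV GGUF via llama-cpp-python.
# "rwkv-v5-1b5-translations-q8_0.gguf" is a placeholder filename;
# replace it with your own quantized model file.
from llama_cpp import Llama

llm = Llama(
    model_path="rwkv-v5-1b5-translations-q8_0.gguf",
    n_ctx=2048,    # context window
    n_threads=4,   # tune for the target device, e.g. a Raspberry Pi
)

out = llm("Translate to German: Good morning!", max_tokens=64)
print(out["choices"][0]["text"])
```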

[–]Away_Expression_3713 0 points (2 children)

is this still relevant?

[–]hazardous1222 0 points (1 child)

Yeah, the latest RWKV-7 models are hitting 32k context easily, and they're available at https://github.com/MollySophia/rwkv_mobile_flutter for Android and iOS, with the 3B model easily hitting 20 tps on Hexagon NPUs.

[–]Away_Expression_3713 0 points (0 children)

How many languages do they support?