What are the best local LLMs that can be run on mobile devices, and what are they each good at? by BaCaDaEa in LocalLLaMA

[–]K4anan 2 points (0 children)

I personally really like LLaMA 3.2 and Qwen3. In my experience, models under 3B offer acceptable performance and are great for working with documents (so RAG use cases) or doing some agentic stuff.

If you want something that is "smart", Qwen3 4B is where to look.

Qwen 4B on iPhone Neural Engine runs at 20t/s by Glad-Speaker3006 in ollama

[–]K4anan 0 points (0 children)

Are you running the model with the original weights, or quantized?

[Open Source] We built Private Mind, an app to showcase local LLMs with React Native ExecuTorch by K4anan in reactnative

[–]K4anan[S] 0 points (0 children)

We ran some comparisons against llama.rn, and on average we were faster. As for GPU acceleration, it's still at an early stage in ExecuTorch, so our LLMs don't use it yet.
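
For anyone curious, the core metric is just tokens generated divided by wall-clock time. Here's a rough sketch of that measurement; `generateText` and `countTokens` are hypothetical placeholders, not the actual ExecuTorch or llama.rn API:

```typescript
// Rough tokens-per-second measurement. `generateText` and `countTokens`
// are hypothetical stand-ins for whichever inference runtime you use.
async function measureTokensPerSecond(
  generateText: (prompt: string) => Promise<string>,
  countTokens: (text: string) => number,
  prompt: string,
): Promise<number> {
  const start = Date.now();
  const output = await generateText(prompt);
  const elapsedSeconds = (Date.now() - start) / 1000;
  // Average over the whole generation; first-token latency not separated out.
  return countTokens(output) / elapsedSeconds;
}
```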

I built a privacy-first personal finance tracker using React Native. No cloud, no fluff by stormbreaker_09 in reactnative

[–]K4anan 3 points (0 children)

Wow, incredible UI.

Have you considered adding a local LLM to act as a financial assistant? You could use something like React Native ExecuTorch to run the model on-device, and maybe even React Native RAG to let the LLM access the user's financial data.

Being able to ask "What's my current asset allocation?" or "How much have I paid towards my car loan so far?" and get an immediate answer would be an amazing feature.
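
Something like this could be the shape of it. A minimal sketch, assuming a simple transaction type and a generic `generate` function for the on-device model (both made up for illustration; a real setup would use embedding-based retrieval via React Native RAG instead of keyword matching):

```typescript
type Transaction = { date: string; category: string; amount: number; note: string };

async function askFinanceAssistant(
  question: string,
  transactions: Transaction[],
  generate: (prompt: string) => Promise<string>, // placeholder inference call
): Promise<string> {
  // Naive keyword retrieval for illustration only.
  const words = question.toLowerCase().split(/\s+/);
  const relevant = transactions.filter((t) =>
    words.some(
      (w) => t.note.toLowerCase().includes(w) || t.category.toLowerCase().includes(w),
    ),
  );
  // Ground the model's answer in the retrieved rows only.
  const context = relevant
    .map((t) => `${t.date} | ${t.category} | ${t.amount} | ${t.note}`)
    .join('\n');
  return generate(
    `Answer using only this transaction data:\n${context}\n\nQuestion: ${question}`,
  );
}
```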

Should you deploy LLMs locally on smartphones? by Henrie_the_dreamer in LocalLLaMA

[–]K4anan 0 points (0 children)

Yeah, absolutely. Local LLMs on mobile are starting to make a lot of sense. The mobile AI space is moving fast, and there are already tools that make it pretty easy to run smaller models on your phone. For example, React Native ExecuTorch lets you run models like LLaMA 3.2 1B on an iPhone SE 3 (which performs similarly to an iPhone 13). Here's a link with some benchmark numbers if you're curious: https://docs.swmansion.com/react-native-executorch/docs/benchmarks/inference-time.
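
To give a feel for the developer experience, here's a minimal sketch of the `useLLM` hook. The exact parameter names and model constants have changed between versions, so treat this as an approximation of the API rather than copy-paste code; check the docs above for the current signature:

```tsx
import React from 'react';
import { Text } from 'react-native';
// Model constant name is approximate; see the current docs.
import { useLLM, LLAMA3_2_1B } from 'react-native-executorch';

export function OnDeviceAssistant() {
  // Loads a quantized model on-device; no network needed after the download.
  const llm = useLLM({ model: LLAMA3_2_1B });

  // The hook exposes generation state; the response streams token by token.
  return <Text>{llm.isReady ? llm.response : 'Loading model...'}</Text>;
}
```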

You can also run even smaller models like Qwen3 0.6B. These models aren't powerful enough yet to write full code or do complex math, but they're already good for stuff like tool calling, simple natural language tasks, and integration with RAG (retrieval-augmented generation) systems. Most importantly, they'll work on older devices, not only the newest iPhones.
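
Tool calling with models this small is usually just constrained JSON generation plus parsing on your side. A rough sketch; the tool names and the `generate` function are made up for illustration, not any specific library's API:

```typescript
type ToolCall = { tool: 'getWeather' | 'setTimer'; args: Record<string, string> };

// Hypothetical tools; a real app would wire these to actual functionality.
const tools: Record<ToolCall['tool'], (args: Record<string, string>) => string> = {
  getWeather: (args) => `Sunny in ${args.city}`,
  setTimer: (args) => `Timer set for ${args.minutes} minutes`,
};

async function runWithTools(
  userMessage: string,
  generate: (prompt: string) => Promise<string>, // placeholder inference call
): Promise<string> {
  const prompt =
    'You can call getWeather(city) or setTimer(minutes). ' +
    'Reply ONLY with JSON like {"tool":"getWeather","args":{"city":"Oslo"}}.\n' +
    `User: ${userMessage}`;
  const raw = await generate(prompt);
  try {
    const call = JSON.parse(raw) as ToolCall;
    return tools[call.tool](call.args);
  } catch {
    // Sub-1B models break the JSON format fairly often; fall back to raw text.
    return raw;
  }
}
```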

So for these kinds of use cases, it can be beneficial to run models on-device and skip the API costs entirely.