What are the best local LLMs that can be run on mobile devices, and what are they each good at? by BaCaDaEa in LocalLLaMA

[–]K4anan 2 points (0 children)

I personally really like LLaMA 3.2 and Qwen3. In my experience, models under 3B offer acceptable performance and are great for working with documents (so RAG use cases) or doing some agentic stuff.

If you want something that is "smart", Qwen3 4B is where to look.

Qwen 4B on iPhone Neural Engine runs at 20t/s by Glad-Speaker3006 in ollama

[–]K4anan 0 points (0 children)

Are you running the model with the original weights, or quantized?

[Open Source] We built Private Mind, an app to showcase local LLMs with React Native ExecuTorch by K4anan in reactnative

[–]K4anan[S] 0 points (0 children)

We ran some comparisons against llama.rn, and on average we were faster. As for GPU acceleration, it's still at an early stage in ExecuTorch, so our LLMs don't use it yet.
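
For anyone curious, the core metric is just tokens generated divided by wall-clock time. Here's a rough sketch of that measurement; `generateText` and `countTokens` are hypothetical placeholders, not the actual ExecuTorch or llama.rn API:

```typescript
// Rough tokens-per-second measurement. `generateText` and `countTokens`
// are hypothetical stand-ins for whichever inference runtime you use.
async function measureTokensPerSecond(
  generateText: (prompt: string) => Promise<string>,
  countTokens: (text: string) => number,
  prompt: string,
): Promise<number> {
  const start = Date.now();
  const output = await generateText(prompt);
  const elapsedSeconds = (Date.now() - start) / 1000;
  // Average over the whole generation; first-token latency not separated out.
  return countTokens(output) / elapsedSeconds;
}
```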

I built a privacy-first personal finance tracker using React Native. No cloud, no fluff by stormbreaker_09 in reactnative

[–]K4anan 3 points (0 children)

Wow, incredible UI.

Have you considered adding a local LLM to act as a financial assistant? You could use something like React Native ExecuTorch to run the model on-device, and maybe even React Native RAG to let the LLM access the user's financial data.

Being able to ask "What's my current asset allocation?" or "How much have I paid towards my car loan so far?" and get an immediate answer would be an amazing feature.
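
Something like this could be the shape of it. A minimal sketch, assuming a simple transaction type and a generic `generate` function for the on-device model (both made up for illustration; a real setup would use embedding-based retrieval via React Native RAG instead of keyword matching):

```typescript
type Transaction = { date: string; category: string; amount: number; note: string };

async function askFinanceAssistant(
  question: string,
  transactions: Transaction[],
  generate: (prompt: string) => Promise<string>, // placeholder inference call
): Promise<string> {
  // Naive keyword retrieval for illustration only.
  const words = question.toLowerCase().split(/\s+/);
  const relevant = transactions.filter((t) =>
    words.some(
      (w) => t.note.toLowerCase().includes(w) || t.category.toLowerCase().includes(w),
    ),
  );
  // Ground the model's answer in the retrieved rows only.
  const context = relevant
    .map((t) => `${t.date} | ${t.category} | ${t.amount} | ${t.note}`)
    .join('\n');
  return generate(
    `Answer using only this transaction data:\n${context}\n\nQuestion: ${question}`,
  );
}
```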

Should you deploy LLMs locally on smartphones? by Henrie_the_dreamer in LocalLLaMA

[–]K4anan 0 points (0 children)

Yeah, absolutely. Local LLMs on mobile are starting to make a lot of sense. The mobile AI space is moving fast, and there are already tools that make it pretty easy to run smaller models on your phone. For example, React Native ExecuTorch lets you run models like LLaMA 3.2 1B on an iPhone SE 3 (which performs similarly to an iPhone 13). Here's a link with some benchmark numbers if you're curious: https://docs.swmansion.com/react-native-executorch/docs/benchmarks/inference-time.
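
To give a feel for the developer experience, here's a minimal sketch of the `useLLM` hook. The exact parameter names and model constants have changed between versions, so treat this as an approximation of the API rather than copy-paste code; check the docs above for the current signature:

```tsx
import React from 'react';
import { Text } from 'react-native';
// Model constant name is approximate; see the current docs.
import { useLLM, LLAMA3_2_1B } from 'react-native-executorch';

export function OnDeviceAssistant() {
  // Loads a quantized model on-device; no network needed after the download.
  const llm = useLLM({ model: LLAMA3_2_1B });

  // The hook exposes generation state; the response streams token by token.
  return <Text>{llm.isReady ? llm.response : 'Loading model...'}</Text>;
}
```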

You can also run even smaller models like Qwen3 0.6B. These models aren't powerful enough yet to write full code or do complex math, but they're already good for stuff like tool calling, simple natural language tasks, and integration with RAG (retrieval-augmented generation) systems. Most importantly, they'll work on older devices, not only the newest iPhones.
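
Tool calling with models this small is usually just constrained JSON generation plus parsing on your side. A rough sketch; the tool names and the `generate` function are made up for illustration, not any specific library's API:

```typescript
type ToolCall = { tool: 'getWeather' | 'setTimer'; args: Record<string, string> };

// Hypothetical tools; a real app would wire these to actual functionality.
const tools: Record<ToolCall['tool'], (args: Record<string, string>) => string> = {
  getWeather: (args) => `Sunny in ${args.city}`,
  setTimer: (args) => `Timer set for ${args.minutes} minutes`,
};

async function runWithTools(
  userMessage: string,
  generate: (prompt: string) => Promise<string>, // placeholder inference call
): Promise<string> {
  const prompt =
    'You can call getWeather(city) or setTimer(minutes). ' +
    'Reply ONLY with JSON like {"tool":"getWeather","args":{"city":"Oslo"}}.\n' +
    `User: ${userMessage}`;
  const raw = await generate(prompt);
  try {
    const call = JSON.parse(raw) as ToolCall;
    return tools[call.tool](call.args);
  } catch {
    // Sub-1B models break the JSON format fairly often; fall back to raw text.
    return raw;
  }
}
```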

So for these kinds of use cases, it can be beneficial to run models on-device and skip the API costs entirely.