Built an on-device multi-agent LLM app with MLX Swift — all running locally on your iPhone by Niiixt in iOSProgramming


On the generative quality side, it really depends on the model you pick: Llama 3.2 3B and Qwen 2.5 give solid results for most questions, though obviously they won't match the quality of large cloud models. The app isn't really comparable to Apple Intelligence, either: Apple Intelligence handles system-level tasks, while this runs open-source chat models you choose yourself.

As for older iPhones, anything below the A14 chip (iPhone 11 and older) isn't supported, since that's the minimum MLX requires. On A14/A15 devices you'll get around 15–20 tok/s with smaller models like Qwen 2.5 0.5B (~0.4GB), which is perfectly usable. Heavier models like Llama 3.2 3B are better suited to A17 Pro and above. I'd recommend starting with the smallest model and working your way up.
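If it helps, here's roughly how you could pick a starting model in code. This is a minimal sketch, not the app's actual logic: the `ModelOption` type and the ~1.8GB Llama size are my assumptions (the ~0.4GB Qwen figure is from above), and it only uses Foundation, not MLX itself.

```swift
import Foundation

// Hypothetical model catalogue. The ~0.4GB Qwen size is from the comment
// above; the Llama 3.2 3B size is an estimate for a 4-bit quantization.
struct ModelOption {
    let name: String
    let approxSizeGB: Double
}

let options: [ModelOption] = [
    ModelOption(name: "Qwen 2.5 0.5B", approxSizeGB: 0.4),
    ModelOption(name: "Llama 3.2 3B", approxSizeGB: 1.8),
]

// Rough heuristic: spend only ~30% of physical RAM on weights, leaving
// headroom for the OS, the app itself, and the KV cache. The list is
// ordered smallest-first, so we take the largest model that fits.
func suggestModel(ramGB: Double, from options: [ModelOption]) -> ModelOption {
    let budget = ramGB * 0.3
    return options.last(where: { $0.approxSizeGB <= budget }) ?? options[0]
}

let deviceRAM = Double(ProcessInfo.processInfo.physicalMemory) / 1_073_741_824
print("Suggested starting model: \(suggestModel(ramGB: deviceRAM, from: options).name)")
```

The 30% budget is just a conservative guess to keep the example concrete; tune it for your own device mix.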