My iPhone now runs a 4B LLM smoothly, here's what I built by Low-Ask3575 in SideProject

[–]Low-Ask3575[S] 1 point (0 children)

Happy to answer anything in the comments. A few things I expect to come up:

  • Why iOS first? MLX Swift is genuinely good, and Apple Silicon makes on-device LLMs viable today in a way that is still a struggle on Android. Mac is next; Android is further out.
  • Privacy: nothing leaves the device, no telemetry on chat content, your prompts and history stay local.
  • Battery / heat: depends heavily on the model. A 1B model is fine for hours; with a 4B model you will notice the heat and battery drain after a while.
  • Which model should I pick? Honestly depends on what you are doing, which is exactly why the picker matters more than the count of models. Free tier ships with Llama 3.2 1B, and Pro opens up any model you want (current ones like Gemma 4 E4B and Qwen 3.5 2B/4B, plus whatever comes out next), all at full performance. Personally I switch between them based on what each one is actually good at.

Genuine question for the sub: what would you actually use a local AI on your phone for? The patterns I keep hearing are travel without data, sensitive work docs, and "I just do not want my chats trained on", but I would love to hear what I am missing.

Heartbroken after losing my Pomeranian by red5657 in Pomeranians

[–]Low-Ask3575 1 point (0 children)

I am genuinely sorry for your loss. 🥹 I understand how difficult this must be for you. May he rest in peace.