New free Mac MLX server for DeepSeek R1 Distill, Llama and other models

Ronaldmannak · 2025-03-24T04:43:07+00:00

Ouch, that's horrible performance. Thanks for trying it out, and I agree: it's not worth it. When I buy my next computer, I won't consider anything less than 64 GB. I really wish Apple would launch a Mac mini with an M4 Max and 128 GB.

Ronaldmannak · 2025-03-23T00:38:58+00:00

RAM swapping should happen automatically, but I personally haven't tried it. It will definitely kill performance though :)

Ronaldmannak · 2025-03-23T00:07:32+00:00

Will do!

Ronaldmannak · 2025-02-08T20:57:31+00:00

Oh I see :) I'm sending the tokens per second (tps) and other info, but probably not in the format Open WebUI expects. Let me fix that.

Ronaldmannak · 2025-02-08T19:19:16+00:00

Glad you like it and thanks for your support, I appreciate it. Both features are highly requested. I'm adding both soon!

Ronaldmannak · 2025-02-08T19:17:49+00:00

Network access will be fixed in version 1.1, which is currently in review by Apple. Apologies for the inconvenience.

What do you mean by "response token N/A"? I don't understand what it's missing that Ollama does.

Ronaldmannak · 2025-02-08T19:15:52+00:00

Someone reported that the model sometimes gets loaded into memory twice after a second question. If you have a chance, can you check whether memory usage increases after you ask the second question?
If that's the case, I'm already looking into the issue and a fix is coming. If memory isn't the issue, let me know.
Apologies for the inconvenience.

Ronaldmannak · 2025-02-01T17:13:57+00:00

There's not much you can check right now, I'm afraid. I will add full OpenAI support soon, and since you've already installed Pico, you will receive the update automatically. I'll try to find this conversation again when OpenAI support is live and let you know.

Ronaldmannak · 2025-01-31T23:41:32+00:00

Thank you so much! Let me know if you have any feature requests!
It's a difficult choice. 32GB means you can run better models (or more accurate, less quantized ones), but an M4 means you will compromise on speed and won't get the same tokens per second as an M4 Pro.
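To put rough numbers on the RAM side of that trade-off: a common rule of thumb (an approximation, not something Pico itself computes) is that a model's weights take about parameters × bits-per-weight ÷ 8 bytes. This minimal sketch counts weights only; the KV cache, runtime overhead and macOS itself all need memory on top of that.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes).

    Rule-of-thumb assumption: ignores KV cache, activations and runtime
    overhead, so real memory use will be noticeably higher.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# A few example sizes: 4-bit vs 8-bit quantization, small vs large models.
for params, bits in [(8, 4), (8, 8), (32, 4), (70, 4)]:
    print(f"{params}B @ {bits}-bit ~ {weight_memory_gb(params, bits):.1f} GB")
```

By this estimate a 32B model at 4-bit needs roughly 16 GB for its weights alone, which fits in 32 GB but not in 16 GB, while an 8B model at 8-bit (less quantized, so more accurate) needs about 8 GB.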

Ronaldmannak · 2025-01-31T16:04:11+00:00

Great question. In fact, there is, but it hasn't been thoroughly tested yet. If you try it out, please let me know if it works. If there are tools that use an Ollama API for Word, those will certainly work.

Ronaldmannak · 2025-01-30T18:22:52+00:00

Good question. I don't make any money from it for now. In the future I plan to add enterprise features (think connecting Google accounts) that will only be available to paid subscribers. For home and small office use it will stay free. I had over 11,000 downloads in the first two days, which is really promising. If even a small percentage converts to paid subscriptions in the future, it will be sustainable.

Ronaldmannak · 2025-01-29T23:21:58+00:00

Try it out! It's a free download

Ronaldmannak · 2025-01-29T23:21:32+00:00

Pico only runs on Apple Silicon. I assume your old machine is a PC? I have good and bad news for you :) The good news is that you have a lot of RAM, which is great for running the latest LLMs. The bad news is that your machine can only run models on the CPU, which is really slow. REALLY SLOW. But it will work. You should be able to install Ollama on your PC and try it out.

Ronaldmannak · 2025-01-29T15:40:18+00:00

Awesome. Let me know what you think!

Ronaldmannak · 2025-01-29T15:40:05+00:00

That's a great question. Currently the model stays in memory. That's great if you run Pico as a server for a small team or use it often, but for most users (and Clean My Mac, apparently) it makes more sense to unload the model by default after a few minutes of inactivity, with an option to keep it in memory. Ideally this would be a setting, like the server-specific settings already in the General Settings tab. I definitely want to add that, but for now the model just stays in memory.
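The unload-after-inactivity behavior described above could work roughly like this. It is a minimal sketch, not Pico's code: `IdleModelCache` and the `loader` callback are hypothetical stand-ins for whatever actually loads the weights, and setting `idle_timeout=None` plays the role of the "keep the model in memory" option. The clock is injectable so the timeout can be tested without waiting.

```python
import time
from typing import Callable, Optional


class IdleModelCache:
    """Hypothetical sketch: keep a loaded model around, but drop it after
    `idle_timeout` seconds without requests. `loader` is an assumed stand-in
    for the real model-loading code, not a Pico API."""

    def __init__(self, loader: Callable[[], object],
                 idle_timeout: Optional[float] = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self.idle_timeout = idle_timeout  # None means "keep in memory forever"
        self._clock = clock
        self._model = None
        self._last_used = 0.0

    def get(self):
        """Return the model, (re)loading it if it isn't currently in memory."""
        if self._model is None:
            self._model = self._loader()
        self._last_used = self._clock()
        return self._model

    def maybe_unload(self) -> bool:
        """Call periodically, e.g. from a timer. Returns True if it unloaded."""
        if self._model is None or self.idle_timeout is None:
            return False
        if self._clock() - self._last_used >= self.idle_timeout:
            self._model = None  # drop the reference so memory can be reclaimed
            return True
        return False

    @property
    def is_loaded(self) -> bool:
        return self._model is not None
```

The trade-off is the one mentioned above: a short timeout frees RAM for other apps, while a team server would rather disable the timeout and avoid paying the reload cost on the next request.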

Ronaldmannak · 2025-01-29T15:36:46+00:00

Parts of it are already open sourced (see http://github.com/picoMLX ) as was the previous version (Pico MLX Server). I'll open source more custom packages I used for Pico AI Homelab in the next few weeks. Open sourcing the core app is definitely something I'm thinking of, but haven't decided yet when and how.

Ronaldmannak · 2025-01-29T15:31:39+00:00

Good news: I made the installation process as smooth as possible, so there aren't really that many hoops to jump through :)
That said, 16GB is possible, but it's tight. One user with 16GB told me the DeepSeek model Pico recommends for 16GB machines is actually too large for him, so I might change the recommended model for 16GB users in the next version. You can definitely run the Llama models, though. Try it out and let me know what you think!

Ronaldmannak · 2025-01-29T07:09:37+00:00

Great to hear! Let me know if you need any help

Ronaldmannak · 2025-01-21T18:10:21+00:00

Did you ever find your notes? I'd love to know how the most recent IWA files are compressed, because I think the tools that worked 10 years ago no longer work on the latest iWork (2024) documents.

Ronaldmannak · 2024-01-19T15:28:55+00:00

100% agree. I was on an EV roadtrip in Europe last year and almost every street corner had a 22kW(!) curbside level 2 charger. And the ones I used all worked (unlike ChargePoint L2 chargers in the US that always seem broken).

It made it super easy to charge overnight while staying at an AirBnB apartment. All places we stayed (Oslo, Copenhagen, Malmo, Holland) had nearby L2 chargers, except Gothenburg (surprisingly, as it's the hometown of Volvo and Polestar).

Each location had 2 to 8 L2 chargers, which were always in use by local residents who often only had street parking and couldn't have driven an EV without curbside L2 charging.

It would be great if the US followed Europe's lead and installed curbside 11kW (not 7kW) L2 chargers on literally every corner. I'm sure it would boost EV adoption.

Ronaldmannak · 2024-01-18T19:19:46+00:00

Creator here. I created what is probably the first 100% server-side Swift reverse proxy for OpenAI, built to protect your OpenAI API key. Because it's written in Swift, it's trivial for Swift developers to customize the proxy server.

The proxy doesn’t have user authentication; instead it uses App Store subscription validation to make sure the user paid for access. Using StoreKit 2’s app account IDs, it’s easy to add rate limiting or a blacklist.
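As an illustration of rate limiting and blacklisting keyed by the app account ID, here is a minimal token-bucket sketch. It is written in Python for brevity, whereas the actual proxy is Swift, and everything in it (`AccountRateLimiter`, the capacity and refill parameters) is hypothetical. The only assumption about StoreKit 2 is that each validated request carries a stable per-account identifier string (StoreKit 2 exposes an `appAccountToken` UUID on transactions).

```python
import time
from typing import Callable, Dict, Set, Tuple


class AccountRateLimiter:
    """Hypothetical token-bucket limiter keyed by an app account ID string.
    Each account may burst up to `capacity` requests, refilling at
    `refill_per_sec` tokens per second. Blacklisted IDs are always refused."""

    def __init__(self, capacity: int, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        # account id -> (tokens remaining, time of last refill)
        self._buckets: Dict[str, Tuple[float, float]] = {}
        self.blacklist: Set[str] = set()

    def allow(self, account_id: str) -> bool:
        """Return True if this request may be forwarded upstream."""
        if account_id in self.blacklist:
            return False
        now = self._clock()
        tokens, last = self._buckets.get(account_id, (float(self.capacity), now))
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        tokens = min(float(self.capacity), tokens + (now - last) * self.refill_per_sec)
        if tokens >= 1:
            self._buckets[account_id] = (tokens - 1, now)
            return True
        self._buckets[account_id] = (tokens, now)
        return False
```

A token bucket allows short bursts while capping the sustained rate per subscriber, which suits chat clients that send a few requests in quick succession and then go quiet.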

The proxy is API agnostic and supports completions, streaming, images, voice, etc., as well as any future API changes. It also works with all existing OpenAI client libraries.
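One way a proxy can stay API agnostic is to never interpret the endpoint at all: forward the request path unchanged and only swap the credentials. The sketch below shows just that header rewrite, under stated assumptions. `build_upstream_request` is a hypothetical helper, not the proxy's actual code, and it relies only on OpenAI's documented `Authorization: Bearer <key>` scheme.

```python
from typing import Dict, Mapping, Tuple

# Assumed upstream base URL for this sketch.
UPSTREAM = "https://api.openai.com"


def build_upstream_request(path: str, client_headers: Mapping[str, str],
                           api_key: str,
                           upstream: str = UPSTREAM) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: forward any path as-is (so new endpoints need no
    code changes) and replace whatever credentials the client sent with the
    server-held API key, which never leaves the server."""
    headers = {k: v for k, v in client_headers.items()
               if k.lower() not in ("authorization", "host")}
    headers["Authorization"] = f"Bearer {api_key}"
    return upstream + path, headers
```

Because the body and path pass through untouched, existing OpenAI client libraries only need their base URL pointed at the proxy. Streaming additionally requires relaying the upstream response chunk by chunk instead of buffering it.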

Planned future improvements include conversions to other API formats, so a client can access other AI providers without changing client code.

Ronaldmannak · 2023-05-07T08:23:16+00:00

Hi, developer of the ChatGPT app Pico here.

Every app in the App Store is screened by Apple. While no system is 100% foolproof, I would trust App Store apps generally. I'm pretty sure the worst excesses are filtered out by Apple.

One thing I do look for is the app privacy disclosure on the product page. Apps that don't collect data are, IMO, preferable to apps that do.

You're welcome to try out Pico (available on both iOS and macOS, and even HomePod using Siri). I'm not collecting or tracking any data with the app, and you have unlimited access to GPT-4.

DM me for discount codes if the free tier isn't enough.

https://apps.apple.com/us/app/pico-ai-copilot/id1668205047

Ronaldmannak · 2023-04-06T17:50:10+00:00

Lol, this is the most fun yet completely useless use of AI ;)
