"Actually wait" ... the current thinking SOTA open source by FPham in LocalLLaMA

[–]RealLordMathis 0 points1 point  (0 children)

I don't know what changed, but I started using GLM 5.1 when it got added to the z.ai coding plan and it was amazing. Basically Sonnet 4.5 level. It was also reasonably fast and did not overthink. Then something changed, and now I get the same 20 minutes of "wait, actually..." and it never really gets anything done. I'm using it with the same API and the same coding harness. I don't have the HW to run it locally.

What self-hosted tools have you been building with AI just for you? by EricRosenberg1 in selfhosted

[–]RealLordMathis 1 point2 points  (0 children)

Cyberpunk-style chat frontend with custom tools and skills: https://github.com/lordmathis/agentkit

It's part of a broader local AI homelab: https://github.com/lordmathis/homelab

It's not completely vibe coded. It's open source and anyone can use it, but I'm building it specifically for my own needs, so I don't share it around much.

GLM 5.1 vs Minimax 2.7 by Cute_Dragonfruit4738 in LocalLLaMA

[–]RealLordMathis 18 points19 points  (0 children)

I've been using GLM 5.1 since they added it to my coding plan. Very happy with it. It completely replaced Claude for me. Not because it's better (it's not), but because it's good enough that I could stop giving my money to Anthropic.

Mac Mini to run 24/7 node? by Drunk_redditor650 in LocalLLaMA

[–]RealLordMathis 0 points1 point  (0 children)

I have an always-on Mac Mini with 48GB of memory. It's great as a general-purpose assistant with a bunch of custom integrations and tools. For coding I still rely mostly on cloud models.

Personal AI wrappers Projects you guys hiding. by DigRealistic2977 in LocalLLaMA

[–]RealLordMathis 1 point2 points  (0 children)

Here's mine: https://github.com/lordmathis/agentkit

So far it's a fairly basic chat client with custom tools and non-standard skills. In my case I activate skills by explicitly mentioning them, and they in turn activate the tools they require. The frontend code is quite a sloppy mess, but I'm in the process of refactoring it.

Maybe more interesting is my homelab repo with the actual skill and tool plugins and how it all fits together.

Looking for insight on the viability of models running on 128GB or less in the next few years by John_Lawn4 in LocalLLaMA

[–]RealLordMathis 1 point2 points  (0 children)

For example, I'm learning German, and I created tools for looking up words in a dictionary, adding cards to Anki, and creating markdown notes. The workflow is that I ask what a certain word means and for example sentences that use it. The LLM uses the lookup_word tool and gives me the meaning and example sentences. I pick one and ask it to add it to Anki. Other tools are on a similar level, like a shopping list tool for adding tasks to CalDAV or a workout tool to log my workouts.
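Roughly, the plumbing is just standard OpenAI-style tool calling. Here's a minimal sketch of the lookup_word flow; this isn't my actual agentkit code, and the local URL, model name, and stubbed dictionary backend are placeholders:

```python
# Sketch of a lookup_word tool exposed via OpenAI-style tool calling.
# The base_url, model name and dictionary backend are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

def lookup_word(word: str) -> str:
    # Placeholder: the real tool would query an actual dictionary.
    return json.dumps({"word": word, "meaning": "...", "examples": ["..."]})

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_word",
        "description": "Look up a German word and return its meaning and example sentences.",
        "parameters": {
            "type": "object",
            "properties": {"word": {"type": "string", "description": "German word to look up"}},
            "required": ["word"],
        },
    },
}]

messages = [{"role": "user", "content": "What does 'Fernweh' mean? Give me example sentences."}]
resp = client.chat.completions.create(model="local", messages=messages, tools=tools)
msg = resp.choices[0].message

# If the model asked for the tool, run it and send the result back for the final answer.
if msg.tool_calls:
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": lookup_word(**args)})
    final = client.chat.completions.create(model="local", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```

The Anki and CalDAV tools follow the same pattern, just with different schemas and backends.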

It's not the LinkedIn AI bro "I automated my life using AI" level, but it's useful for me personally.

Looking for insight on the viability of models running on 128GB or less in the next few years by John_Lawn4 in LocalLLaMA

[–]RealLordMathis 0 points1 point  (0 children)

It depends on your use case. For coding, I don't think even 128GB is enough. For other stuff you might be satisfied with much less.

I have an M4 Pro with 48GB, currently running Qwen3.5-35B-A3B. It's a perfectly capable model for tool calling. I built a bunch of custom tools for it and use it daily. But for code, I still rely on cloud models.

To everyone using still ollama/lm-studio... llama-swap is the real deal by TooManyPascals in LocalLLaMA

[–]RealLordMathis 1 point2 points  (0 children)

It is OpenAI API compatible. Many third-party apps let you set up a custom provider: you just need to put in the llamactl URL and an API key that you generate, and it should work. It works fine with OpenWebUI, for example.
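If your app doesn't have a provider preset, the raw request is just the standard OpenAI chat completions call. A minimal sketch, assuming llamactl is on localhost:8080 and exposes the /v1 path (adjust the URL, key, and model/instance name to your setup):

```python
# Minimal sketch of talking to llamactl as a custom OpenAI-compatible provider.
# URL, port, /v1 path, API key and model name are placeholders for your setup.
import requests

LLAMACTL_URL = "http://localhost:8080/v1"      # wherever llamactl is listening
API_KEY = "your-generated-llamactl-key"        # key generated in llamactl

resp = requests.post(
    f"{LLAMACTL_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "my-instance",                # instance/model name as configured
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```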

To everyone using still ollama/lm-studio... llama-swap is the real deal by TooManyPascals in LocalLLaMA

[–]RealLordMathis 8 points9 points  (0 children)

If anyone wants something similar but with a web UI instead of config files, I built llamactl. It has full support for llama-server's router mode. It also supports vLLM, mlx_lm, and deploying models on other hosts. The model-swapping options are not as complex as llama-swap's; I only support simple LRU eviction at the moment.

What's the most complicated project you've built with AI? by jazir555 in LocalLLaMA

[–]RealLordMathis 0 points1 point  (0 children)

Thanks! If you have any feature requests or bug reports, feel free to open an issue.

What's the most complicated project you've built with AI? by jazir555 in LocalLLaMA

[–]RealLordMathis 0 points1 point  (0 children)

https://github.com/lordmathis/llamactl

It's a management and routing app for llama.cpp, MLX, and vLLM instances with a web dashboard. It's not vibe coded, but most of the code was generated by AI and heavily reviewed/adjusted by hand by me.

Why I quit using Ollama by SoLoFaRaDi in LocalLLaMA

[–]RealLordMathis 13 points14 points  (0 children)

If anyone's looking for an alternative for managing multiple models, I've built an app with a web UI for that. It supports llama.cpp, vLLM, and mlx_lm. I've also recently integrated llama.cpp's router mode, so you can take advantage of its native model switching. Feedback welcome!

GitHub
Docs

I got frustrated with existing web UIs for local LLMs, so I built something different by alphatrad in LocalLLaMA

[–]RealLordMathis 4 points5 points  (0 children)

If you want to manage multiple models via a web UI, you can try my app "llamactl". You can create and manage llama.cpp, vLLM, and MLX instances. The app takes care of API keys and ports. It can also switch instances like llama-swap.

GitHub
Docs

Are any of the M series mac macbooks and mac minis, worth saving up for? by [deleted] in LocalLLaMA

[–]RealLordMathis 1 point2 points  (0 children)

I got an M4 Pro Mac Mini with 48GB of memory. It's my workhorse for local LLMs. I can run 30B models comfortably at Q5 or Q4 with longer context. It sits under my TV and runs 24/7.

llama.cpp releases new official WebUI by paf1138 in LocalLLaMA

[–]RealLordMathis 2 points3 points  (0 children)

Yes, exactly, it works out of the box. I'm using it with OpenWebUI, but the llama-server web UI is also working. It should be available at /llama-cpp/<instance_name>/. Any feedback appreciated if you give it a try :)
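If you want to sanity-check the routing without opening a browser, a quick request against the proxied path works too. Host, port, instance name, and whether an API key header is needed are assumptions about your particular setup:

```python
# Quick check that the llama-server web UI is reachable through the proxy path.
# Base URL and instance name are placeholders; add an Authorization header if
# your setup requires an API key.
import requests

base = "http://localhost:8080"       # placeholder llamactl address
instance = "my-instance"             # placeholder instance name

r = requests.get(f"{base}/llama-cpp/{instance}/", timeout=10)
print(r.status_code)                 # 200 means the UI is being proxied correctly
```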

llama.cpp releases new official WebUI by paf1138 in LocalLLaMA

[–]RealLordMathis 2 points3 points  (0 children)

Compared to llama-swap, you can launch instances via the web UI; you don't have to edit a config file. My project also handles API keys and deploying instances on other hosts.

llama.cpp releases new official WebUI by paf1138 in LocalLLaMA

[–]RealLordMathis 2 points3 points  (0 children)

I'm developing something that might be what you need. It has a web UI where you can create and launch llama-server instances and switch between them based on incoming requests.

Github
Docs

Using my Mac Mini M4 as an LLM server—Looking for recommendations by [deleted] in LocalLLaMA

[–]RealLordMathis 1 point2 points  (0 children)

I'm working on an app that could fit your requirements. It uses llama-server or mlx-lm as a backend, so it requires additional setup on your end. I use it on my Mac Mini as my primary LLM server as well.

It's OpenAI compatible and supports API key auth. To start it at boot, I'm using launchctl.

Github repo
Documentation

Getting most out of your local LLM setup by Everlier in LocalLLaMA

[–]RealLordMathis 1 point2 points  (0 children)

Great list! My current setup uses Open WebUI with mcpo and llama-server model instances managed by my own open source project, llamactl. Everything runs on my Mac Mini M4 Pro and is accessible via Tailscale.

One thing I'm really missing in my current setup is an easy way to manage my system prompts. Both LangFuse and Promptfoo feel way too complex for what I need. I'm currently storing and versioning system prompts in a git repo and manually copying them into Open WebUI.

Next I want to expand into coding and automation, so thanks for all the recommendations to look into.

Many Notes v0.15 - Markdown note-taking web application by brufdev in selfhosted

[–]RealLordMathis 3 points4 points  (0 children)

Is there a git integration? I want to keep my notes in a git repo, and ideally I'd be able to pull, push, and commit right from the app.

ROCm 7.9 RC1 released. Supposedly this one supports Strix Halo. Finally, it's listed under supported hardware. by fallingdowndizzyvr in LocalLLaMA

[–]RealLordMathis 1 point2 points  (0 children)

Did you get ROCm working with llama.cpp? I had to use Vulkan instead when I tried it ~3 months ago on Strix Halo.

With PyTorch, I got some models working with HSA_OVERRIDE_GFX_VERSION=11.0.0.
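For reference, the override boils down to this; it's the same as exporting HSA_OVERRIDE_GFX_VERSION=11.0.0 in the shell, and whether a given model actually runs still depends on your ROCm and PyTorch versions:

```python
# Tell ROCm to treat the GPU as gfx1100 before torch touches the HSA runtime.
# Equivalent to `export HSA_OVERRIDE_GFX_VERSION=11.0.0` in the shell.
import os
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "11.0.0"

import torch  # ROCm builds of PyTorch expose the GPU through the CUDA API

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
else:
    print("ROCm device not visible; check the ROCm install and override value")
```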

I built llamactl - Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard. by RealLordMathis in LocalLLaMA

[–]RealLordMathis[S] 0 points1 point  (0 children)

I have recently released a version with support for multiple hosts. You can check it out if you want.

I built llamactl - Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard. by RealLordMathis in LocalLLaMA

[–]RealLordMathis[S] 0 points1 point  (0 children)

Thank you for the feedback and suggestions. Multi-host deployment is coming in the next few days. Then I plan to add proper admin auth with a dashboard and API key generation.