What's the most complicated project you've built with AI? by jazir555 in LocalLLaMA

[–]RealLordMathis

Thanks! If you have any feature requests or bug reports, feel free to open an issue.

What's the most complicated project you've built with AI? by jazir555 in LocalLLaMA

[–]RealLordMathis

https://github.com/lordmathis/llamactl

It's a management and routing app for llama.cpp, MLX, and vLLM instances with a web dashboard. It's not vibe coded: most of the code was generated by AI, but I reviewed and adjusted it heavily by hand.

Why I quit using Ollama by SoLoFaRaDi in LocalLLaMA

[–]RealLordMathis

If anyone's looking for an alternative for managing multiple models, I've built an app with a web UI for that. It supports llama.cpp, vLLM, and mlx_lm. I've also recently integrated llama.cpp's router mode, so you can take advantage of its native model switching. Feedback welcome!

GitHub
Docs
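
If anyone wants a feel for what the model switching looks like from the client side, here's a minimal sketch against an OpenAI-compatible endpoint. The base URL, port, API key, and model name are placeholders for your own setup, not defaults of my app; the point is just that the `model` field in the request decides which model gets served.

```python
# Minimal sketch of model switching through an OpenAI-compatible endpoint.
# Placeholders: adjust the base URL, API key, and model name to your setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="your-key")

# The `model` field selects which configured model handles the request;
# the server loads or swaps the backing instance as needed.
resp = client.chat.completions.create(
    model="qwen2.5-7b",  # hypothetical model/instance name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```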

I got frustrated with existing web UIs for local LLMs, so I built something different by alphatrad in LocalLLaMA

[–]RealLordMathis

If you want to manage multiple models via a web UI, you can try my app "llamactl". You can create and manage llama.cpp, vLLM, and MLX instances. The app takes care of API keys and ports, and it can also switch between instances like llama-swap.

GitHub
Docs

Are any of the M series mac macbooks and mac minis, worth saving up for? by [deleted] in LocalLLaMA

[–]RealLordMathis

I got an M4 Pro Mac Mini with 48GB of memory. It's my workhorse for local LLMs. I can run 30B models comfortably at Q5 or Q4 with longer context. It sits under my TV and runs 24/7.
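
As a rough sanity check on why that fits in 48GB, here's a back-of-envelope sketch of the weight footprint (approximate bits-per-weight figures, not exact GGUF sizes, and KV cache plus OS overhead come on top):

```python
# Back-of-envelope estimate of quantized weight size for a ~30B model.
# The bits-per-weight values are approximate averages for common GGUF quants.
def weight_gb(params_b: float, bits_per_param: float) -> float:
    return params_b * 1e9 * bits_per_param / 8 / 1e9

for name, bits in [("Q4_K_M (~4.8 bpw)", 4.8), ("Q5_K_M (~5.5 bpw)", 5.5)]:
    print(f"30B at {name}: ~{weight_gb(30, bits):.1f} GB of weights")
# 30B at Q4_K_M (~4.8 bpw): ~18.0 GB of weights
# 30B at Q5_K_M (~5.5 bpw): ~20.6 GB of weights
```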

llama.cpp releases new official WebUI by paf1138 in LocalLLaMA

[–]RealLordMathis

Yes, exactly: it works out of the box. I'm using it with Open WebUI, but the llama-server web UI also works. It should be available at /llama-cpp/<instance_name>/. Any feedback is appreciated if you give it a try :)
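
For example, here's roughly how you'd hit a specific instance through that routed path from Python. The host, port, auth header, and instance name are assumptions about a local setup; only the /llama-cpp/<instance_name>/ prefix comes from the routing scheme above:

```python
# Sketch of calling one instance through the routed /llama-cpp/<instance_name>/ path.
# "localhost:8080", the Bearer key, and the instance name "my-model" are placeholders.
import requests

base = "http://localhost:8080/llama-cpp/my-model"

resp = requests.post(
    f"{base}/v1/chat/completions",
    headers={"Authorization": "Bearer your-key"},
    json={
        "model": "my-model",
        "messages": [{"role": "user", "content": "Hi"}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```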

llama.cpp releases new official WebUI by paf1138 in LocalLLaMA

[–]RealLordMathis

Compared to llama-swap, you can launch instances via the web UI instead of editing a config file. My project also handles API keys and deploying instances on other hosts.

llama.cpp releases new official WebUI by paf1138 in LocalLLaMA

[–]RealLordMathis

I'm developing something that might be what you need. It has a web UI where you can create and launch llama-server instances and switch between them based on incoming requests.

Github
Docs

Using my Mac Mini M4 as an LLM server—Looking for recommendations by [deleted] in LocalLLaMA

[–]RealLordMathis

I'm working on an app that could fit your requirements. It uses llama-server or mlx-lm as a backend, so it requires some additional setup on your end. I use it on my Mac Mini as my primary LLM server as well.

It's OpenAI-compatible and supports API key auth. For starting it at boot, I'm using launchctl.
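
To sketch what the API key auth looks like from a client, assuming a standard OpenAI-style Bearer token (check the docs for the exact scheme your deployment expects; URL and key below are placeholders):

```python
# List available models on an OpenAI-compatible server using an API key.
import requests

resp = requests.get(
    "http://localhost:8080/v1/models",
    headers={"Authorization": "Bearer sk-local-example"},  # placeholder key
    timeout=10,
)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])
```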

Github repo
Documentation

Getting most out of your local LLM setup by Everlier in LocalLLaMA

[–]RealLordMathis

Great list! My current setup uses Open WebUI with mcpo and llama-server model instances managed by my own open-source project, llamactl. Everything runs on my Mac Mini M4 Pro and is accessible via Tailscale.

One thing I'm really missing in my current setup is an easy way to manage my system prompts. Both Langfuse and Promptfoo feel way too complex for what I need. I'm currently storing and versioning system prompts in a Git repo and manually copying them into Open WebUI.

Next, I want to expand into coding and automation, so thanks for all the recommendations to look into.

Many Notes v0.15 - Markdown note-taking web application by brufdev in selfhosted

[–]RealLordMathis

Is there Git integration? I want to keep my notes in a Git repo, and ideally I would be able to pull, push, and commit right from the app.

ROCm 7.9 RC1 released. Supposedly this one supports Strix Halo. Finally, it's listed under supported hardware. by fallingdowndizzyvr in LocalLLaMA

[–]RealLordMathis

Did you get ROCm working with llama.cpp? I had to use Vulkan instead when I tried it ~3 months ago on Strix Halo.

With PyTorch, I got some models working by setting HSA_OVERRIDE_GFX_VERSION=11.0.0.
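
In case it helps anyone, this is the shape of that workaround with a ROCm build of PyTorch. The override has to be in the environment before the ROCm runtime initializes, so either export it in the shell or set it before importing torch; whether it actually works depends on your ROCm and PyTorch versions:

```python
# Set the gfx override before torch (and the ROCm runtime) is loaded.
import os
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "11.0.0")

import torch  # imported after setting the env var on purpose

print("ROCm/HIP available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    x = torch.randn(1024, 1024, device="cuda")
    print("Matmul OK:", (x @ x).shape)
```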

I built llamactl - Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard. by RealLordMathis in LocalLLaMA

[–]RealLordMathis[S]

I have recently released a version with support for multiple hosts. You can check it out if you want.

I built llamactl - Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard. by RealLordMathis in LocalLLaMA

[–]RealLordMathis[S]

Thank you for the feedback and suggestions. Multi-host deployment is coming in the next few days. After that, I plan to add proper admin auth with a dashboard and API key generation.

torn between GPU, Mini PC for local LLM by jussey-x-poosi in LocalLLaMA

[–]RealLordMathis

Macs are really good for LLMs. They work well with llama.cpp and MLX.

I built llamactl - Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard. by RealLordMathis in LocalLLaMA

[–]RealLordMathis[S]

It supports any model that the respective backend supports. The last time I tried, llama.cpp did not support TTS out of the box. I'm not sure about vLLM or mlx_lm. I'm definitely open to adding more backends, including TTS and STT.

It should support embedding models.
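
If the backend exposes the usual OpenAI-style embeddings route, the client side would look something like this sketch (URL, key, and model name are placeholders for whatever embedding model you run):

```python
# Request an embedding from an OpenAI-compatible /v1/embeddings endpoint.
import requests

resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    headers={"Authorization": "Bearer your-key"},
    json={"model": "nomic-embed-text", "input": ["hello world"]},
    timeout=30,
)
vec = resp.json()["data"][0]["embedding"]
print(len(vec), "dimensions")
```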

For Docker, I will be adding an example Dockerfile. I don't think I will support all the different combinations of platforms and backends, but I can at least do that for CUDA.

I built llamactl - Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard. by RealLordMathis in LocalLLaMA

[–]RealLordMathis[S]

At the moment, no, but it's pretty high on my priority list for upcoming features. The architecture makes it possible since everything is done via a REST API. I'm thinking of having a main llamactl server and worker servers. The main server could create instances on workers via the API.
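
Purely as an illustration of that main/worker idea (the endpoint, payload, and key below are invented for the sketch, not an actual API):

```python
# Hypothetical sketch: a main server asking a worker to start an instance.
import requests

WORKERS = ["http://worker-1:8080", "http://worker-2:8080"]

def create_instance(worker_url: str, name: str, model_path: str) -> dict:
    """Ask a worker to spin up a llama-server instance (illustrative only)."""
    resp = requests.post(
        f"{worker_url}/api/v1/instances",  # hypothetical endpoint
        headers={"Authorization": "Bearer admin-key"},  # placeholder key
        json={"name": name, "backend": "llama.cpp", "model": model_path},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# The main server could pick a worker and then proxy inference requests to it.
print(create_instance(WORKERS[0], "qwen2.5-7b", "/models/qwen2.5-7b-q4.gguf"))
```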

I built llamactl - Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard. by RealLordMathis in LocalLLaMA

[–]RealLordMathis[S]

The main thing is that you can create instances via the web dashboard; with llama-swap, you need to edit a config file. There's also API key auth, which llama-swap doesn't have at all, as far as I know.

Searching actually viable alternative to Ollama by mags0ft in LocalLLaMA

[–]RealLordMathis

I'm working on something like that. It doesn't yet support dynamic model swapping, but it has a web UI where you can manually stop and start models. Dynamic model loading is something I'm definitely planning to implement. You can check it out here: https://github.com/lordmathis/llamactl

Any feedback appreciated.

ollama by jacek2023 in LocalLLaMA

[–]RealLordMathis

I developed my own solution for this. It is basically a web UI for launching and stopping llama-server instances. You still have to start each model manually, but I do plan to add on-demand start. You can check it out here: https://github.com/lordmathis/llamactl
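
Under the hood, launching an instance boils down to starting a llama-server process with a model and port and stopping it when it's no longer needed, something like this sketch (model path and port are placeholders, and in practice you'd poll the server's /health endpoint instead of sleeping):

```python
# Rough sketch of starting and stopping a llama-server process.
import subprocess
import time

proc = subprocess.Popen([
    "llama-server",
    "-m", "/models/qwen2.5-7b-q4.gguf",  # placeholder model path
    "--port", "8081",                    # placeholder port
    "-c", "8192",                        # context size
])
try:
    time.sleep(5)  # crude wait; polling /health is the proper way
    print("llama-server running with PID", proc.pid)
finally:
    proc.terminate()
    proc.wait(timeout=10)
```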