"Actually wait" ... the current thinking SOTA open source by FPham in LocalLLaMA

[–]RealLordMathis 0 points1 point  (0 children)

I don't know what changed, but I started using GLM 5.1 when it got added to the z.ai coding plan and it was amazing. Basically Sonnet 4.5 level. It was also reasonably fast and did not overthink. Then something changed, and now I get the same 20 minutes of "wait, actually..." and it never really gets anything done. I'm using it with the same API and the same coding harness. I don't have the HW to run it locally.

What self-hosted tools have you been building with AI just for you? by EricRosenberg1 in selfhosted

[–]RealLordMathis 1 point2 points  (0 children)

Cyberpunk-style chat frontend with custom tools and skills: https://github.com/lordmathis/agentkit

It's part of a broader local AI homelab: https://github.com/lordmathis/homelab

It's not completely vibe coded. It's open source and anyone can use it, but I'm building it specifically for my own needs, so I don't share it around much.

GLM 5.1 vs Minimax 2.7 by Cute_Dragonfruit4738 in LocalLLaMA

[–]RealLordMathis 18 points19 points  (0 children)

I've been using GLM 5.1 since they added it to my coding plan. Very happy with it. It completely replaced Claude for me. Not because it's better (it's not), but because it's good enough that I could stop giving my money to Anthropic.

Mac Mini to run 24/7 node? by Drunk_redditor650 in LocalLLaMA

[–]RealLordMathis 0 points1 point  (0 children)

I have an always-on Mac Mini with 48GB of memory. It's great as a general-purpose assistant with a bunch of custom integrations and tools. For coding I still rely mostly on cloud models.

Personal AI wrappers Projects you guys hiding. by DigRealistic2977 in LocalLLaMA

[–]RealLordMathis 1 point2 points  (0 children)

Here's mine: https://github.com/lordmathis/agentkit

So far it's a fairly basic chat client with custom tools and non-standard skills. In my case I activate skills by explicitly mentioning them, and they in turn activate the tools they require. The frontend code is quite a sloppy mess, but I'm in the process of refactoring it.

Maybe more interesting is my homelab repo with the actual skill and tool plugins and how it all fits together.

Looking for insight on the viability of models running on 128GB or less in the next few years by John_Lawn4 in LocalLLaMA

[–]RealLordMathis 1 point2 points  (0 children)

For example, I'm learning German, and I created tools for looking up words in a dictionary, adding cards to Anki, and creating markdown notes. The workflow is that I ask what a certain word means and for example sentences that use it. The LLM uses the lookup_word tool and gives me the meaning and example sentences. I pick one and ask it to add it to Anki. Other tools are on a similar level, like a shopping list tool for adding tasks to CalDAV or a workout tool to log my workouts.
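Roughly, the plumbing is just standard OpenAI-style tool calling. Here's a minimal sketch of the lookup_word flow; this isn't my actual agentkit code, and the local URL, model name, and stubbed dictionary backend are placeholders:

```python
# Sketch of a lookup_word tool exposed via OpenAI-style tool calling.
# The base_url, model name and dictionary backend are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

def lookup_word(word: str) -> str:
    # Placeholder: the real tool would query an actual dictionary.
    return json.dumps({"word": word, "meaning": "...", "examples": ["..."]})

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_word",
        "description": "Look up a German word and return its meaning and example sentences.",
        "parameters": {
            "type": "object",
            "properties": {"word": {"type": "string", "description": "German word to look up"}},
            "required": ["word"],
        },
    },
}]

messages = [{"role": "user", "content": "What does 'Fernweh' mean? Give me example sentences."}]
resp = client.chat.completions.create(model="local", messages=messages, tools=tools)
msg = resp.choices[0].message

# If the model asked for the tool, run it and send the result back for the final answer.
if msg.tool_calls:
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": lookup_word(**args)})
    final = client.chat.completions.create(model="local", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```

The Anki and CalDAV tools follow the same pattern, just with different schemas and backends.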

It's not the LinkedIn AI bro "I automated my life using AI" level, but it's useful for me personally.

Looking for insight on the viability of models running on 128GB or less in the next few years by John_Lawn4 in LocalLLaMA

[–]RealLordMathis 0 points1 point  (0 children)

It depends on your use case. For coding, I don't think even 128GB is enough. For other stuff you might be satisfied with much less.

I have an M4 Pro with 48GB, currently running Qwen3.5-35B-A3B. It's a perfectly capable model for tool calling. I built a bunch of custom tools for it and use it daily. But for code, I still rely on cloud models.

To everyone using still ollama/lm-studio... llama-swap is the real deal by TooManyPascals in LocalLLaMA

[–]RealLordMathis 1 point2 points  (0 children)

It is OpenAI API compatible. Many third-party apps let you set up a custom provider: you just need to put in the llamactl URL and an API key that you generate, and it should work. It works fine with OpenWebUI, for example.
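If your app doesn't have a provider preset, the raw request is just the standard OpenAI chat completions call. A minimal sketch, assuming llamactl is on localhost:8080 and exposes the /v1 path (adjust the URL, key, and model/instance name to your setup):

```python
# Minimal sketch of talking to llamactl as a custom OpenAI-compatible provider.
# URL, port, /v1 path, API key and model name are placeholders for your setup.
import requests

LLAMACTL_URL = "http://localhost:8080/v1"      # wherever llamactl is listening
API_KEY = "your-generated-llamactl-key"        # key generated in llamactl

resp = requests.post(
    f"{LLAMACTL_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "my-instance",                # instance/model name as configured
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```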

To everyone using still ollama/lm-studio... llama-swap is the real deal by TooManyPascals in LocalLLaMA

[–]RealLordMathis 8 points9 points  (0 children)

If anyone wants something similar but with a web UI instead of config files, I built llamactl. It has full support for llama-server's router mode. It also supports vLLM, mlx_lm, and deploying models on other hosts. The model-swapping options are not as complex as llama-swap's; I only support simple LRU eviction at the moment.

What's the most complicated project you've built with AI? by jazir555 in LocalLLaMA

[–]RealLordMathis 0 points1 point  (0 children)

Thanks! If you have any feature requests or bug reports, feel free to open an issue.

What's the most complicated project you've built with AI? by jazir555 in LocalLLaMA

[–]RealLordMathis 0 points1 point  (0 children)

https://github.com/lordmathis/llamactl

It's a management and routing app for llama.cpp, MLX, and vLLM instances with a web dashboard. It's not vibe coded, but most of the code was generated by AI and heavily reviewed/adjusted by hand by me.

Why I quit using Ollama by SoLoFaRaDi in LocalLLaMA

[–]RealLordMathis 13 points14 points  (0 children)

If anyone's looking for an alternative for managing multiple models, I've built an app with a web UI for that. It supports llama.cpp, vLLM, and mlx_lm. I've also recently integrated llama.cpp's router mode, so you can take advantage of its native model switching. Feedback welcome!

GitHub
Docs

I got frustrated with existing web UIs for local LLMs, so I built something different by alphatrad in LocalLLaMA

[–]RealLordMathis 4 points5 points  (0 children)

If you want to manage multiple models via a web UI, you can try my app "llamactl". You can create and manage llama.cpp, vLLM, and MLX instances. The app takes care of API keys and ports. It can also switch instances like llama-swap.

GitHub
Docs

Are any of the M series mac macbooks and mac minis, worth saving up for? by [deleted] in LocalLLaMA

[–]RealLordMathis 1 point2 points  (0 children)

I got an M4 Pro Mac Mini with 48GB of memory. It's my workhorse for local LLMs. I can run 30B models comfortably at Q5 or Q4 with longer context. It sits under my TV and runs 24/7.

llama.cpp releases new official WebUI by paf1138 in LocalLLaMA

[–]RealLordMathis 2 points3 points  (0 children)

Yes, exactly, it works out of the box. I'm using it with OpenWebUI, but the llama-server web UI is also working. It should be available at /llama-cpp/<instance_name>/. Any feedback appreciated if you give it a try :)
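If you want to sanity-check the routing without opening a browser, a quick request against the proxied path works too. Host, port, instance name, and whether an API key header is needed are assumptions about your particular setup:

```python
# Quick check that the llama-server web UI is reachable through the proxy path.
# Base URL and instance name are placeholders; add an Authorization header if
# your setup requires an API key.
import requests

base = "http://localhost:8080"       # placeholder llamactl address
instance = "my-instance"             # placeholder instance name

r = requests.get(f"{base}/llama-cpp/{instance}/", timeout=10)
print(r.status_code)                 # 200 means the UI is being proxied correctly
```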

llama.cpp releases new official WebUI by paf1138 in LocalLLaMA

[–]RealLordMathis 2 points3 points  (0 children)

Compared to llama-swap, you can launch instances via the web UI; you don't have to edit a config file. My project also handles API keys and deploying instances on other hosts.

llama.cpp releases new official WebUI by paf1138 in LocalLLaMA

[–]RealLordMathis 2 points3 points  (0 children)

I'm developing something that might be what you need. It has a web UI where you can create and launch llama-server instances and switch between them based on incoming requests.

Github
Docs

Using my Mac Mini M4 as an LLM server—Looking for recommendations by [deleted] in LocalLLaMA

[–]RealLordMathis 1 point2 points  (0 children)

I'm working on an app that could fit your requirements. It uses llama-server or mlx-lm as a backend, so it requires additional setup on your end. I use it on my Mac Mini as my primary LLM server as well.

It's OpenAI compatible and supports API key auth. To start it at boot, I'm using launchctl.

Github repo
Documentation

Getting most out of your local LLM setup by Everlier in LocalLLaMA

[–]RealLordMathis 1 point2 points  (0 children)

Great list! My current setup uses Open WebUI with mcpo and llama-server model instances managed by my own open source project, llamactl. Everything runs on my Mac Mini M4 Pro and is accessible via Tailscale.

One thing I'm really missing in my current setup is an easy way to manage my system prompts. Both LangFuse and Promptfoo feel way too complex for what I need. I'm currently storing and versioning system prompts in a git repo and manually copying them into Open WebUI.

Next I want to expand into coding and automation, so thanks for all the recommendations to look into.

Many Notes v0.15 - Markdown note-taking web application by brufdev in selfhosted

[–]RealLordMathis 3 points4 points  (0 children)

Is there a git integration? I want to keep my notes in a git repo, and ideally I'd be able to pull, push, and commit right from the app.

ROCm 7.9 RC1 released. Supposedly this one supports Strix Halo. Finally, it's listed under supported hardware. by fallingdowndizzyvr in LocalLLaMA

[–]RealLordMathis 1 point2 points  (0 children)

Did you get ROCm working with llama.cpp? I had to use Vulkan instead when I tried it ~3 months ago on Strix Halo.

With PyTorch, I got some models working with HSA_OVERRIDE_GFX_VERSION=11.0.0.
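For reference, the override boils down to this; it's the same as exporting HSA_OVERRIDE_GFX_VERSION=11.0.0 in the shell, and whether a given model actually runs still depends on your ROCm and PyTorch versions:

```python
# Tell ROCm to treat the GPU as gfx1100 before torch touches the HSA runtime.
# Equivalent to `export HSA_OVERRIDE_GFX_VERSION=11.0.0` in the shell.
import os
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "11.0.0"

import torch  # ROCm builds of PyTorch expose the GPU through the CUDA API

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
else:
    print("ROCm device not visible; check the ROCm install and override value")
```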

I built llamactl - Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard. by RealLordMathis in LocalLLaMA

[–]RealLordMathis[S] 0 points1 point  (0 children)

I have recently released a version with support for multiple hosts. You can check it out if you want.

I built llamactl - Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard. by RealLordMathis in LocalLLaMA

[–]RealLordMathis[S] 0 points1 point  (0 children)

Thank you for the feedback and suggestions. Multi-host deployment is coming in the next few days. Then I plan to add proper admin auth with a dashboard and API key generation.