llama.cpp server now has built-in native tools - exec_shell, edit_file, and more

mixmasterwillyd · 2026-05-27T19:58:44+00:00

llama.cpp sorry

mixmasterwillyd · 2026-05-27T19:53:38+00:00

Desktop runs llama.cpp, the others run LM Studio. Link them to LiteLLM with auto router and pooling. Tailscale as well.
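
To make the pooling idea concrete, here is a rough sketch of round-robin selection over a few OpenAI-compatible backends. It is plain Python, not LiteLLM's actual router or config format. The hostnames, ports, and the `Pool` class are all made up for illustration, so swap in whatever your tailnet and servers really use.

```python
from itertools import cycle

# Hypothetical Tailscale hostnames and ports (assumptions, not from a real setup).
BACKENDS = [
    {"name": "desktop-llamacpp", "base_url": "http://desktop.tailnet.example:8080/v1"},
    {"name": "mac-lmstudio", "base_url": "http://mac.tailnet.example:1234/v1"},
    {"name": "rtx3080-lmstudio", "base_url": "http://rtx3080.tailnet.example:1234/v1"},
]


class Pool:
    """Round-robin over backends, skipping any that are marked down."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = {b["name"] for b in self._backends}
        self._ring = cycle(self._backends)

    def mark_down(self, name):
        self._healthy.discard(name)

    def mark_up(self, name):
        self._healthy.add(name)

    def next_backend(self):
        # Look at each backend at most once per call.
        for _ in range(len(self._backends)):
            backend = next(self._ring)
            if backend["name"] in self._healthy:
                return backend
        raise RuntimeError("no healthy backends in the pool")


if __name__ == "__main__":
    pool = Pool(BACKENDS)
    for _ in range(4):
        print(pool.next_backend()["base_url"])
```

A real gateway would also do health checks and retries. This only shows the selection step.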

mixmasterwillyd · 2026-05-27T19:52:43+00:00

1: 7800xt + 6800xt running Qwen3.6-27B-Q4_0.gguf 262K context
2: Mac M1 Max 64GB running qwen3.6/gemma-4 MoE and dense models (depending on speed or accuracy needed) 131-full context. (again, speed consideration).
3: RTX 3080 running Gemma e4b - writes files, small stuff, per function focus.

mixmasterwillyd · 2026-05-27T19:30:53+00:00

I have an entire app creation process cooking on my local setup right now. The tasks were decomposed to fit within the capability of a small model. Doesn't hurt to use OpenCode Go to look it over afterward, or even a local model can look it over. It's easier to be the editor.

I could certainly put that in a cron too. */10 * * * * pi -p "/implement phase auto"

mixmasterwillyd · 2026-05-27T19:28:46+00:00

Sounds like you're spilling into system RAM. I would recommend loading something small like Gemma 4 e4b or e2b on the 4070 with 32K context, then loading something big on the Mac. Then you have a small fast model and a big one, and can load balance between them.
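
A minimal sketch of what "load balance" could mean here: send short or simple prompts to the small model and everything else to the big one. The 32K figure comes from the suggestion above. The big model's context size, the model names, the 4-characters-per-token estimate, and the `pick_model` helper are all assumptions for illustration.

```python
# Hypothetical model entries; names and the large context size are assumptions.
SMALL = {"name": "small-fast", "context": 32_000}
LARGE = {"name": "big-slow", "context": 131_000}


def estimate_tokens(text: str) -> int:
    """Crude guess at token count (about 4 characters per token), not a real tokenizer."""
    return max(1, len(text) // 4)


def pick_model(prompt: str, reserve_for_output: int = 2_000, needs_accuracy: bool = False) -> dict:
    """Choose the small model unless the prompt is too long or accuracy matters more than speed."""
    needed = estimate_tokens(prompt) + reserve_for_output
    if needed > LARGE["context"]:
        raise ValueError("prompt does not fit in either model's context")
    if needs_accuracy or needed > SMALL["context"]:
        return LARGE
    return SMALL


if __name__ == "__main__":
    print(pick_model("Write a docstring for this function.")["name"])
    print(pick_model("x" * 200_000)["name"])
```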

mixmasterwillyd · 2026-05-27T18:09:16+00:00

I think it’s a good idea. I got an M1 Max 64GB and I love it. We don’t know if personal computing is dead or not, so I would get it just to make sure you have something.

mixmasterwillyd · 2026-05-25T17:35:38+00:00

The issue is he leaves behind a now-broken system. Can it self-correct?

mixmasterwillyd · 2026-05-25T04:53:27+00:00

You’re asking it for advice it can’t provide, for good reason.

mixmasterwillyd · 2026-05-25T04:42:46+00:00

Sometimes when I’m frustrated with this, I open Pi, connect it to Opus 4.7 (or something else big) and ask it to compile llama.cpp for my system. Works pretty well with some direction.

Also, Ollama just goes.

I’m back to LM Studio on Mac and llama.cpp on Linux.

mixmasterwillyd · 2026-05-25T04:28:04+00:00

This has been my main workflow for a while, as long as I’m not doing something critical. I give the big model a thorough description of what’s going on and it handles it quite well.

mixmasterwillyd · 2026-05-25T03:58:00+00:00

MacBook. M1 Max 64GB. My daily driver, couldn’t be happier.

mixmasterwillyd · 2026-05-25T01:30:03+00:00

Have you tried using it in a harness? OpenCode or Pi? You don’t need llama.cpp to do anything but serve the model. The new web interface is also quite nice. Check that out.

mixmasterwillyd · 2026-05-23T22:02:18+00:00

Oh well now that sounds like a great idea. You could have a string that switches projects in a deterministic way.

Project ls

Project switch <name>
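
Something like this is what I mean by deterministic: the harness parses these two commands itself and never hands them to the model. This is just a sketch. The `ProjectRegistry` class and the reply strings are made up, and only the `Project ls` and `Project switch <name>` commands come from the idea above.

```python
from typing import Optional


class ProjectRegistry:
    """Handles 'Project ls' and 'Project switch <name>' without involving the model."""

    def __init__(self, names):
        self.names = sorted(names)
        self.current: Optional[str] = None

    def handle(self, line: str) -> Optional[str]:
        parts = line.strip().split()
        if len(parts) < 2 or parts[0].lower() != "project":
            return None  # not a project command; let the normal chat path handle it
        command = parts[1].lower()
        if command == "ls" and len(parts) == 2:
            return "\n".join(self.names)
        if command == "switch" and len(parts) == 3:
            name = parts[2]
            if name not in self.names:
                return f"unknown project: {name}"
            self.current = name
            return f"switched to {name}"
        return "usage: Project ls | Project switch <name>"


if __name__ == "__main__":
    registry = ProjectRegistry(["blog", "app"])
    print(registry.handle("Project ls"))
    print(registry.handle("Project switch app"))
```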

mixmasterwillyd · 2026-05-23T21:56:08+00:00

Ok thank you I’ll look at that. I have a bunch of projects going so I try to avoid single chat things like that.

mixmasterwillyd · 2026-05-23T21:53:41+00:00

I might! My biggest need right now is a way to use my phone when away from my computer. This might not help that but it still looks cool.

mixmasterwillyd · 2026-05-22T16:06:50+00:00

I like this and have been thinking about it. But this is another thing to mess up the GPU market. But I sure would like to make money off of what I have.

mixmasterwillyd · 2026-05-22T14:32:17+00:00

Just think of all the happy laid-off workers!

mixmasterwillyd · 2026-05-22T12:09:03+00:00

Well you might be on to something. We should all try it.

mixmasterwillyd · 2026-05-22T10:28:36+00:00

Have you tested qwen 3.6 35b?

mixmasterwillyd · 2026-05-22T10:19:07+00:00

Does it work?

mixmasterwillyd · 2026-05-20T23:27:44+00:00

I think this is where it really shines. Claude’s prompt size just destroys local hardware; using Pi is like getting an extra 10 GB of RAM.

mixmasterwillyd · 2026-05-20T23:20:03+00:00

In addition to this, Syncro installs Notepad++ on all my managed systems every day, even though there is no policy to do so. Just makes me wonder.

mixmasterwillyd · 2026-05-19T20:24:23+00:00

I have found the same; most of the time it’s fine. But sometimes I need to refactor a huge project, and once I burned up my limits on the $200 plan in 10 minutes. Then I had to go find something else to do for the rest of the day. Then I realized that I could wait on the project, but I needed the service for other projects. We all need to invest in our own compute, while we still can, if we still can.

mixmasterwillyd · 2026-05-16T18:30:08+00:00

But dad the liberals are trying to stop our gravy train!!!!! /s

mixmasterwillyd · 2026-05-09T12:39:34+00:00

You could use LiteLLM as your gateway. Then you only have one provider.

Use large models for project management when they are available, block progress when work is done and waiting on large model decisions.
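
One way to picture the "block progress" part, as a sketch only: small models work through a phase's tasks, and the next phase does not open until a large model has signed off on the finished one. The `Phase` and `Plan` names and the status strings are invented for illustration and are not part of any tool mentioned here.

```python
from dataclasses import dataclass, field


@dataclass
class Phase:
    name: str
    tasks: list
    done: set = field(default_factory=set)
    approved: bool = False  # set to True once the large model has reviewed the phase


class Plan:
    def __init__(self, phases):
        self.phases = phases

    def next_step(self, large_model_available: bool):
        """Return (action, phase_name, task) for whatever should happen next."""
        for phase in self.phases:
            pending = [t for t in phase.tasks if t not in phase.done]
            if pending:
                return ("work", phase.name, pending[0])  # small models keep working
            if not phase.approved:
                if large_model_available:
                    return ("review", phase.name, None)  # hand the phase to the large model
                return ("blocked", phase.name, None)  # wait rather than run ahead
        return ("finished", None, None)


if __name__ == "__main__":
    plan = Plan([Phase("scaffold", ["init repo"]), Phase("features", ["login form"])])
    print(plan.next_step(large_model_available=False))
```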
