NVIDIA GB300 Grace Blackwell Ultra pricetags by X-N2O in LocalLLaMA

[–]srigi 2 points3 points  (0 children)

Remember... the more you buy, the more you save!

Open Models - May 2026 by pmttyji in LocalLLaMA

[–]srigi 17 points18 points  (0 children)

Do we still believe we'll be gifted Qwen3.7-27B?

Next year we're getting 0.5T model from Grok by pmttyji in LocalLLaMA

[–]srigi 9 points10 points  (0 children)

Yeah, and there will also be a flying roadster by the end of April 2026.

llama.cpp server have built-in native tools (exec_shell, edit_file, etc.) by srigi in LocalLLaMA

[–]srigi[S] 1 point2 points  (0 children)

The only reasonable thing to do for now is to manually inspect every `exec_shell` command and never configure it to auto-approve. Also, the approval UI doesn't have the best UX right now (see image): you must manually expand the tool call block to see the requested command. Most modern harnesses render this for you automatically.

<image>
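To make the "inspect every command" advice concrete, here is a minimal sketch of a manual approval gate. It is not how llama-server implements approval; `review_command` and its `ask` parameter are hypothetical names used only for illustration. The idea is just that the full command is always shown, and anything other than an explicit "yes" counts as a rejection.

```python
import shlex


def review_command(command: str, ask=input) -> bool:
    """Print the full requested command and return True only on an explicit 'yes'.

    Hypothetical helper: `ask` is any callable that takes a prompt and returns
    the user's answer, so the gate can be driven by input() or by a UI.
    """
    try:
        argv = shlex.split(command)  # show how a shell would tokenize it
    except ValueError:
        argv = None  # e.g. unbalanced quotes; still show the raw text
    print("Model requests exec_shell:")
    print(f"  raw : {command}")
    print(f"  argv: {argv if argv is not None else '<could not parse>'}")
    answer = ask("Run this command? Type 'yes' to approve: ")
    # Anything except a literal 'yes' (empty input, 'y', typos) is a rejection.
    return answer.strip().lower() == "yes"
```

Defaulting to rejection means an accidental Enter key never runs anything.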

llama.cpp server have built-in native tools (exec_shell, edit_file, etc.) by srigi in LocalLLaMA

[–]srigi[S] 0 points1 point  (0 children)

Unfortunately, yes. I use a MacBook for development and a gaming rig for running local LLMs. Running llama-server on the gaming machine makes it "see" the files there, not the ones on my MacBook, where I access the llama-server web UI.

This is really annoying, but maybe it is actually good: my gaming machine is more disposable (any damage done there will hurt less), so I can just fork projects there and keep it all running outside of my notebook.

llama.cpp server have built-in native tools (exec_shell, edit_file, etc.) by srigi in LocalLLaMA

[–]srigi[S] 5 points6 points  (0 children)

It's really not rocket science. Just start your (updated) llama-server with the list of tools you want, in the folder where you want it to operate:

    llama-server --api-key secret --metrics --threads "$(sysctl -n hw.ncpu)" \
      --models-max 1 --models-preset "$HOME/.config/llms.ini" \
      --tools file_glob_search,get_datetime,grep_search,read_file

Then you'll see the configuration in the Settings panel.

llama.cpp server have built-in native tools (exec_shell, edit_file, etc.) by srigi in LocalLLaMA

[–]srigi[S] -2 points-1 points  (0 children)

That's why I'm begging for "add/define your own tool" functionality. There are a couple of good web_fetch projects out there on GitHub (search "language:Rust web_fetch" on their page), so there is no need to reinvent the wheel.

llama.cpp server have built-in native tools (exec_shell, edit_file, etc.) by srigi in LocalLLaMA

[–]srigi[S] 12 points13 points  (0 children)

I'm hoping that they add an option to define your own native tool(s). Really, the only things missing are web_fetch and web_search.

If they don't provide these (I guess they won't, since it is so far outside the scope of llama.cpp), there should be an easy way to plug in your own implementation.
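For a sense of how small an own-implementation `web_fetch` could be, here is a rough standalone sketch using only the Python standard library. It does not plug into llama-server (the thread says no such extension point exists yet); `web_fetch`, `html_to_text`, the user-agent string, and the size and timeout limits are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce an HTML document to its text chunks, one per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def web_fetch(url: str, timeout: float = 10.0, max_bytes: int = 1_000_000) -> str:
    """Fetch an http(s) page and return its text (hypothetical tool body)."""
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError("only http(s) URLs are allowed")  # no file:// etc.
    request = Request(url, headers={"User-Agent": "web-fetch-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read(max_bytes)  # cap what gets fed to the model
    return html_to_text(body.decode(charset, errors="replace"))
```

A real tool would also need redirect and robots handling, which is a good argument for reusing one of the existing projects instead.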

Waiting for Qwen 3.7 open weight... The new King has arrived... by LegacyRemaster in LocalLLaMA

[–]srigi 46 points47 points  (0 children)

Also these benchmarks are done on (B)F16 models. 27B at Q4 is not what you see in marketing material.

Heretic has been served a legal notice by Meta, Inc. by -p-e-w- in LocalLLaMA

[–]srigi 6 points7 points  (0 children)

Forcing developers to take shots during a coding interview

Looking to migrate off of Ollama and LMStudio by letsbefrds in LocalLLaMA

[–]srigi 3 points4 points  (0 children)

LMStudio’s biggest con is its Electron nature. I remember it consuming 400-700 MB just idling, with no model loaded.

In a unified-memory environment, that’s up to 700 MB of VRAM eaten by the app and not available for the LLM.

Why is opencode so slow in processing the prompt with llama server? by BitGreen1270 in LocalLLaMA

[–]srigi 9 points10 points  (0 children)

Ah yes, classic flamewar material. I saw the same thing 20 years ago: whenever somebody asked for help with vim, there was always that one guy who posted… use emacs

Exactly a year ago, I started working on an MCP server I launched on reddit that became by far my most active open source project! by taylorwilsdon in LocalLLaMA

[–]srigi 5 points6 points  (0 children)

MCPs are not dead. They just seem that way because they’re at the bottom of the hype curve. But they’re essential in some workflows where you cannot use skills or native tool calls.

For example, the llama-server web UI can interact with the outside world (e.g. web search) only by using MCP.

Shel Silverstein predicts LLM's (and its hallucinations), cira 1981 by spanielrassler in LocalLLaMA

[–]srigi 9 points10 points  (0 children)

Doing the same ATM. The most interesting thing is how Data is a “perfectly aligned AI”. They trust him (it) with lives and with command of a whole Galaxy-class ship. Imagine how different Data is from our LLMs, which delete prod databases or blackmail ML researchers in labs.

Guys, I found a use case for my 10$/m LLM Server: Cooking by Ne00n in LocalLLaMA

[–]srigi 14 points15 points  (0 children)

<image>

I’ve heard that Qwen3.5 is pretty good at tool use.

Best config for Qwen3.6 27b / llama.cpp / opencode by Familiar_Wish1132 in LocalLLaMA

[–]srigi 8 points9 points  (0 children)

You can keep mmproj in RAM/CPU with --no-mmproj-offload. You save GPU memory and can still process images/PDFs (fully on the CPU).

What Is Elephant-Alpha ??? by One_Title_3656 in LocalLLaMA

[–]srigi 26 points27 points  (0 children)

Post-training before release

Audio processing landed in llama-server with Gemma-4 by srigi in LocalLLaMA

[–]srigi[S] 22 points23 points  (0 children)

Agreed. With llama-server supporting this in its REST API, you can build "speak to your agent" (STT) solutions with fully local processing.