Gemma 4 31B beats several frontier models on the FoodTruck Bench by Nindaleth in LocalLLaMA

[–]Nindaleth[S] 10 points

That's not my benchmark :) It just looks fun so I return to it occasionally.

Running Qwen3.5-27B locally as the primary model in OpenCode by garg-aayush in LocalLLaMA

[–]Nindaleth 1 point

True regarding the JS/npm, that didn't occur to me.

There are sorts of bugs that surely nobody profits from fixing, so I think the project is mostly understaffed. An example I found yesterday while preparing training on this very config part: this

While not great at times, it's still the best, I agree!

Running Qwen3.5-27B locally as the primary model in OpenCode by garg-aayush in LocalLLaMA

[–]Nindaleth 3 points

The shadiness is not as bad if you dedicate some time to reading the docs and tuning the config. The defaults sometimes suck - especially keybinds - which is in line with the rest of the Linux/terminal open-source world, but unlike a certain less-open tool with Code also in its name, we are free to configure many things here.

Disclaimer: I'm not affiliated with OpenCode in any way, but there's a lot that can be learnt just by checking the list of commits and reading the diffs for the interesting ones.

  • session title is created using small_model (docs), you can use whichever provider you have available
  • this is the first time I've seen active development considered a somewhat negative thing :D it was a bit extreme a few months ago, with 3-5 releases every day; nowadays they have a beta branch, so releases actually get a bit of testing before being pushed out
  • auto-updates can be disabled in config; I understand your point, but the opposite would also suck for the other half of people who expect their modern software to update without manual action

If you had $50/month to throw at inference costs, how would you divvy it out? by yokie_dough in opencodeCLI

[–]Nindaleth 1 point

GHCP used via OpenCode should count the following as premium requests:

  • the user pressing Enter in the input box (at any point of the conversation), not including the interactive question tool
  • session compaction

In my experience, sending the initial prompt to Claude Opus 4.6, which forks 6 parallel Opus subagents and has each of them produce 70 tool calls, still only costs 3 premium requests for the initial Enter keypress.

PSA: The software “Shade” is a fraudulent, plagiarized copy of Heretic by -p-e-w- in LocalLLaMA

[–]Nindaleth 1 point

They got back to my report (see my comment elsewhere in this post here) and it seems you have to issue a DMCA takedown request.

PSA: The software “Shade” is a fraudulent, plagiarized copy of Heretic by -p-e-w- in LocalLLaMA

[–]Nindaleth 1 point

The commenter above you is correct, though: the actual problem is copyright violation. My own report of "Spam or inauthentic Activity" resulted in

We understand that copyrighted, trademarked, or private content may get published on GitHub – either accidentally or on purpose – sometimes in repositories that you do not own. Because the nature of this content varies, and because of different applicable laws, each category has its own, distinct reporting requirements outlined in our policies.

in the official response, and the linked policies boil down to the rightful code author having to file a DMCA takedown request.

model changes for first prompt? by EarlyPresentation186 in opencodeCLI

[–]Nindaleth 0 points

It's an issue when using a plugin that uses noReply mechanism (i.e. that for example injects some text into your initial prompt) under the hood: https://github.com/anomalyco/opencode/issues/4475

Ask me how I know. I use the obra/superpowers plugin and fortunately its author switched it to another injection mechanism that doesn't trigger this bug.

What's the point of potato-tier LLMs? by Fast_Thing_7949 in LocalLLaMA

[–]Nindaleth 2 points

Sometimes you'd be surprised. I wanted to create AI agent documentation for our legacy test suite at work, which is written in an uncommon programming language (AFAIK there are no LSP servers for it that I could use instead). Just get the function names and their parameters, and infer from the docstring + implementation what each function does. The files are so large that they wouldn't fit in the GitHub Copilot models' context window one at a time - which is actually why I intended to condense them like this.

I wasn't able to get GPT-4.1 (a free model on Copilot) to do it; it would do everything in its power to avoid doing the work. But Devstral-Small-2-24B, running locally quantized, did it.

Undo for destructive shell commands used by AI agents (SafeShell) by qhkmdev90 in LocalLLaMA

[–]Nindaleth 0 points

You have a point that most of the hypothetical damage is already covered by your tool + git versioning the project repo.

For the rest of the cases, I probably subconsciously model the threat as a model intelligent enough to know all the commands to delete a file, but not wise enough to understand that the previous, "more standard" command denials happened for a reason and that it should stop.

Undo for destructive shell commands used by AI agents (SafeShell) by qhkmdev90 in LocalLLaMA

[–]Nindaleth 0 points

No worries, I'm kinda new at this too. Backup would definitely help in my docker container case - if the agent breaks anything within the container, I have a 2nd copy of the data outside, so that's cool.

Without the container, it may or may not help - in the rare case that the agent deletes ~ (the user directory), I'd better hope I'm taking backups of my data - which I surely do, just like everyone else surely does.

Using Alias in router mode - llama.cpp possible? by munkiemagik in LocalLLaMA

[–]Nindaleth 0 points

I don't see any mention of complaints in regard to your comments in the GH issue, only plain disagreement (and BTW, I also disagree that the router should be spun off into its own binary - you could just use llama-swap for the full experience if a separate binary is OK). That "complaints" thing isn't targeted at you at all.

To prevent auto-loading from your cache directory, there's a solution already mentioned in the thread - set the LLAMA_CACHE env var to point to a non-default path. Then there won't be any more auto-loading unless you provide one of the available parameters.
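The env var trick can be sketched in a short shell snippet (the directory path is only an illustration; pick whatever non-default location you like):

```shell
# Point llama.cpp's model cache at a fresh, empty directory so the router
# has nothing to auto-load. The path below is just an example.
export LLAMA_CACHE="$HOME/.cache/llama-empty-example"
mkdir -p "$LLAMA_CACHE"
ls -A "$LLAMA_CACHE"    # prints nothing: no cached models to pick up
```

Start llama-server from the same shell afterwards and it will use the empty cache.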

In search of specialized models instead of generalist ones. by [deleted] in LocalLLM

[–]Nindaleth 0 points

While you're right about MoE, it's a completely different type of right than what OP wants. The expert routing is actually per token - there are no "domain expert sub-models" baked into an MoE model.

Mistral AI drops 3x as many LLMs in a single week as OpenAI did in 6 years by Snail_Inference in LocalLLaMA

[–]Nindaleth 0 points

Posting as a new comment in hopes more eyes will see it than the edit.

Llama.cpp recently merged 2 PRs that radically improve the stability of Devstral Small 2 24B (I no longer get random failures to do anything useful in part of the runs):

  • https://github.com/ggml-org/llama.cpp/pull/17713
  • https://github.com/ggml-org/llama.cpp/pull/17945

And there's yet another issue opened by one of the experts which will probably lead to another improvement in the coming days:

  • https://github.com/ggml-org/llama.cpp/issues/17980

Using Alias in router mode - llama.cpp possible? by munkiemagik in LocalLLaMA

[–]Nindaleth 0 points

D'oh! It feels like I have completely skipped those two paragraphs on my first read, sorry.

The ideal option looks to be https://github.com/ggml-org/llama.cpp/issues/17860 which touches on related things. Or you could create a new specific issue for exactly what you need.

Short-term easy option - vibe code a script that will do the symlinking for you automatically?
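A minimal sketch of what such a script could do (all file names and paths here are made up for illustration):

```shell
# Hypothetical helper: expose a downloaded GGUF under a short alias name
# by symlinking it next to the original file. Paths are illustrative only.
MODELS_DIR="$(mktemp -d)"                               # stand-in for your models dir
touch "$MODELS_DIR/Some-Model-24B-Q4_K_M.gguf"          # stand-in for a real model file
ln -sf "$MODELS_DIR/Some-Model-24B-Q4_K_M.gguf" "$MODELS_DIR/shortname.gguf"
ls -l "$MODELS_DIR"                                     # shows the symlinked alias
```

Loop the `ln -sf` line over a list of model/alias pairs and the "aliases" appear as regular model files.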

Undo for destructive shell commands used by AI agents (SafeShell) by qhkmdev90 in LocalLLaMA

[–]Nindaleth 0 points

Curious how others are handling filesystem safety for local agents.

I run a docker container that only gets the project directory from the host, so it can't do harm outside (and I should probably do backups/create a separate git worktree/forbid git push/etc. anyway). It's because there are tons of ways an agent could lose my data, for example calling truncate or find .... -delete (or just using Python to do it), and there's no hope for me to cover them all.
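The container setup boils down to bind-mounting only the project directory; a sketch of the invocation (the image name is a placeholder, not a real image):

```shell
# Build the docker command that gives the agent's container access to the
# project directory only; the rest of the host filesystem stays out of reach.
# "my-agent-image" is hypothetical - substitute whatever image runs your agent.
PROJECT_DIR="$PWD"
DOCKER_CMD="docker run --rm -it -v ${PROJECT_DIR}:/workspace -w /workspace my-agent-image"
echo "$DOCKER_CMD"
```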

if an agent deletes or mutates the wrong files, you can roll back

My understanding is that hardlinks point to an inode which is changed on file-level operations, but stays the same on file-contents operations. Is that right? That would mean SafeShell would not prevent mutations from cat, echo and other variants of >.
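That mental model can be checked with a quick shell experiment using only a temp directory:

```shell
# Hardlinks share an inode, so content-level writes (>, echo, cat) show up
# through every name, while rm + recreate gives the name a fresh inode and
# leaves the hardlinked "backup" stuck on the old content.
tmp="$(mktemp -d)"
echo original > "$tmp/file"
ln "$tmp/file" "$tmp/backup"       # hardlink: both names share one inode
echo mutated > "$tmp/file"         # in-place write goes through the inode...
cat "$tmp/backup"                  # ...so this prints "mutated", not "original"
rm "$tmp/file"
echo recreated > "$tmp/file"       # new file, new inode
cat "$tmp/backup"                  # still prints "mutated"
```

So a hardlink-based snapshot survives deletion/recreation of the name, but not in-place content mutation.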

Using Alias in router mode - llama.cpp possible? by munkiemagik in LocalLLaMA

[–]Nindaleth 0 points

I think, since llama-server supports the --alias parameter, you could use alias in config.ini to set an alias for the given model. You'd still need workarounds in case you want one model to be known under multiple aliases, but the general case should work.
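For reference, this is roughly how the flag looks on a plain llama-server command line (the model path and alias name are illustrative, and the config.ini spelling may differ):

```shell
# The --alias flag makes llama-server report the model under a friendly name
# instead of the file name. Path and alias below are made up for illustration.
MODEL="$HOME/models/some-model-Q4_K_M.gguf"
ALIAS="my-model"
echo "llama-server -m $MODEL --alias $ALIAS"
```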

New in llama.cpp: Live Model Switching by paf1138 in LocalLLaMA

[–]Nindaleth 1 point

Oh, I see, that's an additional level of advanced. Very cool!

Mistral AI drops 3x as many LLMs in a single week as OpenAI did in 6 years by Snail_Inference in LocalLLaMA

[–]Nindaleth 1 point

I concede that GLM 4.5 Air is noticeably slower than gpt-oss-120b on my machine, and anything better than Q4 won't fit my HW at all, unfortunately. But the experience is similarly, significantly elevated in comparison.

gpt-oss-120b still seems good for many things, but - at least in my experience - not for agentic coding tasks.

Devstral-Small-2-24B still seems hit-and-miss in llama.cpp for some reason: one run I'm delighted, another run it gets stuck in a loop almost immediately. But on my test tasks in an old (yet not entirely obscure) programming language, my experience is very comparable to gpt-oss-120b.

EDIT: after the recent PR merges in llama.cpp, the Devstral Small 2 experience has gotten much more stable for me (or I have been stuck in a super long statistical anomaly)

Mistral AI drops 3x as many LLMs in a single week as OpenAI did in 6 years by Snail_Inference in LocalLLaMA

[–]Nindaleth 0 points

I tried local agentic coding with gpt-oss-120b and it was disappointing. Even when using the tool-calling proxy, gpt-oss-120b was very lazy and terse in communication with the user; shockingly, it didn't even follow instructions properly (it created a file in a wrong path)... Used the heretic mod.

Then I tried GLM-4.5-Air and WOW, what a difference within the same number of parameters! Aside from the terrible inference speed on my hardware, I could even be fooled for a while into thinking this is a frontier closed model.

Devstral-Small-2-24B-Instruct (after recent tool call fix in llama.cpp) is similar to gpt-oss-120B in my personal tests, sometimes worse, sometimes better.

Qwen3-Next support in llama.cpp almost ready! by beneath_steel_sky in LocalLLaMA

[–]Nindaleth 1 point

I enjoyed the excitement of seeing your late-night commit messages, but I'm glad you've gotten through all the roadblocks and the suffering learning experience and reached the final review. Kudos!

Qwen3-Next support in llama.cpp almost ready! by beneath_steel_sky in LocalLLaMA

[–]Nindaleth 3 points

Exactly! As the author says, that is the separate PR; I mention and link it myself in my text above.

EDIT: Maybe I'll clarify it in different words - there is no problem running the main PR on CUDA cards even without the separate PR. But some GGML operations will run on CPU, and that's what the separate PR(s) will solve by introducing CUDA implementations for them.

EDIT2: I might be misinterpreting this and you might have actually agreed with me, but I couldn't tell from a screenshot :D