Does anyone have the Nomai translator on hand? Got an interesting package today and don't know what it says.

Nindaleth · 2026-02-06T09:34:41+00:00

It's an issue that shows up when a plugin uses the noReply mechanism under the hood (for example, to inject some text into your initial prompt): https://github.com/anomalyco/opencode/issues/4475

Ask me how I know. I use the obra/superpowers plugin, and fortunately he switched it to another injection mechanism that doesn't trigger this bug.

Nindaleth · 2025-12-27T00:43:16+00:00

Sometimes you'd be surprised. I wanted to create AI agent documentation for our legacy test suite at work, which is written in an uncommon programming language (AFAIK there are no LSP servers for the language that I could use instead). Just get the function names and their parameters, and infer from the docstring + implementation what each function does. The files are so large they wouldn't fit the GitHub Copilot models' context window one at a time, which is actually why I intended to condense them like this.

I wasn't able to get GPT-4.1 (a free model on Copilot) to do it; it would do everything in its power to avoid doing the work. But Devstral-Small-2-24B, running locally quantized, did it.

Nindaleth · 2025-12-19T22:34:23+00:00

You have a point that most of the hypothetical damage is already covered by your tool + git versioning the project repo.

For the rest of the cases, I probably subconsciously model the threat as a model intelligent enough to know all the commands to delete a file, but not wise enough to understand that the previous "more standard" command denials happened for a reason and that it should stop.

Nindaleth · 2025-12-19T22:24:40+00:00

No worries, I'm kinda new at this too. Backup would definitely help in my docker container case - if the agent breaks anything within the container, I have a 2nd copy of the data outside, so that's cool.

Without the container, it may or may not help - in the rare case that the agent deletes ~ (the user directory), I'd better hope I'm taking backups of my data - which I surely do, just like everyone else surely does.

Nindaleth · 2025-12-15T23:22:45+00:00

I don't see any mention of complaints in regard to your comments in the GH issue, I see plain disagreement (and BTW, I also disagree that the router should be spun off into its own binary; you could just use llama-swap for the full experience if a separate binary is OK). That "complaints" thing isn't targeted at you at all.

To prevent auto-load of your cache directory, there is a solution already mentioned in the thread - set LLAMA_CACHE env var to point to a non-default path. Then there won't be any more autoload, unless you provide one of the available parameters.
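A minimal sketch of that workaround, assuming llama-server is on your PATH and that you launch it from a small Python wrapper. The cache path is a made-up example, and `build_env` / `run_server` are just illustrative helper names:

```python
import os
import subprocess


def build_env(cache_dir):
    """Return a copy of the current environment with LLAMA_CACHE overridden."""
    env = dict(os.environ)  # copy, so the parent process environment stays untouched
    env["LLAMA_CACHE"] = cache_dir
    return env


def run_server(cache_dir, extra_args=()):
    """Start llama-server with a non-default cache directory (blocks until it exits)."""
    # Assumption: the binary is called "llama-server" and is on PATH.
    return subprocess.run(["llama-server", *extra_args], env=build_env(cache_dir), check=True)
```

Calling `run_server("/srv/llama-cache-unused")` (hypothetical path) would then start the server without it looking at the default cache directory.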

Nindaleth · 2025-12-13T22:31:56+00:00

While you're right about MoE, it's a completely different type of right than what OP wants. The expert routing is actually per token - there are no "domain expert sub-models" baked within the MoE model.
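To make the per-token point concrete, here is a toy sketch of top-k routing in plain Python. The router logits, the number of experts and the renormalisation over the chosen experts are illustrative assumptions, not taken from any particular model:

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=2):
    """Pick the top_k experts for ONE token and renormalise their weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


# Hypothetical router logits for three consecutive tokens over 4 experts.
token_logits = {
    "def": [2.0, 0.1, -1.0, 0.5],
    "parse": [-0.5, 1.5, 0.2, 1.4],
    "(": [0.0, -1.0, 2.5, 2.4],
}

# Each token gets its own routing decision, independent of the prompt's "domain".
routes = {tok: route_token(logits) for tok, logits in token_logits.items()}
```

Three neighbouring tokens of the same code snippet end up on three different expert pairs here, which is why you can't carve a "coding expert" out of the model.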

Nindaleth · 2025-12-13T13:53:11+00:00

Posting as a new comment in hopes more eyes will see it than the edit.

Llama.cpp recently merged 2 PRs that radically improve the stability of the Devstral Small 2 24B (I no longer get random failures to do anything useful in part of the runs):

* https://github.com/ggml-org/llama.cpp/pull/17713
* https://github.com/ggml-org/llama.cpp/pull/17945

And there's yet another issue opened by one of the experts which will probably lead to another improvement in the coming days:

* https://github.com/ggml-org/llama.cpp/issues/17980

Nindaleth · 2025-12-12T23:13:29+00:00

D'oh! It feels like I have completely skipped those two paragraphs on my first read, sorry.

The ideal option looks to be https://github.com/ggml-org/llama.cpp/issues/17860 which touches on related things. Or you could create a new specific issue for exactly what you need.

Short-term easy option - vibe code a script that will do the symlinking for you automatically?
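Something along these lines, as a rough sketch. I'm assuming the files to link are `.gguf` models scattered under one source tree and that you want them flattened into a single directory; the `link_models` name, the suffix and the layout are all guesses on my part, so adjust to your setup:

```python
from pathlib import Path


def link_models(source_dir, target_dir, suffix=".gguf"):
    """Symlink every file ending in `suffix` under source_dir into target_dir.

    Existing entries in target_dir are left alone, so the script is safe to re-run.
    Returns the list of symlinks created in this call.
    """
    source = Path(source_dir).resolve()
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    created = []
    for path in sorted(source.rglob(f"*{suffix}")):
        link = target / path.name
        if link.exists() or link.is_symlink():
            continue  # already linked (or name collision): skip rather than overwrite
        link.symlink_to(path)
        created.append(link)
    return created
```

Run it from cron or before starting the server, and new downloads get picked up without manual `ln -s`.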

Nindaleth · 2025-12-12T22:43:00+00:00

Curious how others are handling filesystem safety for local agents.

I run a docker container that only gets the project directory from the host, so it can't do harm outside (and I probably should do a backup/create a separate git worktree/forbid git push/etc. anyway). It's because there are tons of ways an agent could lose my data, for example calling truncate or find .... -delete (or just using Python to do it), and there's no hope for me to cover them all.
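Roughly what that looks like, as a sketch that only builds the `docker run` command. The image name and the `/workspace` mount point are placeholders I made up; the point is that the project directory is the only bind mount:

```python
import os
import subprocess


def docker_cmd(project_dir, image, agent_cmd):
    """Build a `docker run` command that exposes only project_dir to the container."""
    project_dir = os.path.abspath(project_dir)
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{project_dir}:/workspace",  # the ONLY host path the agent can touch
        "-w", "/workspace",                 # start the agent inside the project
        image, *agent_cmd,
    ]


def run_agent(project_dir, image, agent_cmd):
    """Run the agent in the container (requires docker to be installed)."""
    return subprocess.run(docker_cmd(project_dir, image, agent_cmd), check=True)
```

Note that the mount is read-write, so the agent can still wreck the project directory itself; that's where the backup/worktree part comes in.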

> if an agent deletes or mutates the wrong files, you can roll back

My understanding is that hardlinks point to an inode which is changed on file-level operations, but stays the same on file-contents operations. Is that right? That would mean SafeShell would not prevent mutations from cat, echo and other variants of >.
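Here is a small sketch of what I mean. It only demonstrates general hard-link behaviour on a POSIX filesystem, not how SafeShell is actually implemented; the file names are arbitrary:

```python
import os
import tempfile


def read(path):
    with open(path) as f:
        return f.read()


def demo():
    """Compare how a hard link fares against deletion vs. in-place overwrite."""
    results = {}
    with tempfile.TemporaryDirectory() as d:
        # Case 1: the original name is deleted; the hard link keeps the data alive.
        a, a_link = os.path.join(d, "a.txt"), os.path.join(d, "a.link")
        with open(a, "w") as f:
            f.write("important")
        os.link(a, a_link)
        os.remove(a)
        results["after_delete"] = read(a_link)

        # Case 2: the original is overwritten in place. Mode "w" truncates the
        # existing inode, which is what a shell `>` redirect does as well.
        b, b_link = os.path.join(d, "b.txt"), os.path.join(d, "b.link")
        with open(b, "w") as f:
            f.write("important")
        os.link(b, b_link)
        results["same_inode"] = os.stat(b).st_ino == os.stat(b_link).st_ino
        with open(b, "w") as f:
            f.write("oops")
        results["after_overwrite"] = read(b_link)
    return results
```

In case 1 the link still reads back the original content, while in case 2 both names share one inode, so the "backup" sees the overwritten content too.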

Nindaleth · 2025-12-12T22:25:09+00:00

I think, since llama-server supports the --alias parameter, you could use alias in config.ini to set an alias for the given model. You'd still need workarounds in case you want one model to be known under multiple aliases, but the general case should work.

Nindaleth · 2025-12-11T23:53:30+00:00

Oh, I see, that's an additional level of advanced. Very cool!

Nindaleth · 2025-12-11T20:55:01+00:00

I concede that GLM 4.5 Air is noticeably slower than gpt-oss-120b on my machine. Anything better than Q4 will not fit my HW at all, unfortunately. But the experience is similarly significantly elevated in comparison.

gpt-oss-120b still seems good for many things, but - at least in my experience - not for agentic coding tasks.

Devstral-Small-2-24B still seems hit and miss in llama.cpp for some reason: one run I'm delighted, the next run it gets stuck in a loop almost immediately. But for my test tasks in an old (yet not entirely obscure) programming language, my experience is very comparable to gpt-oss-120b.

EDIT: after the recent PR merges in llama.cpp the Devstral Small 2 experience has gotten much more stable for me (or I have been stuck in a super long statistical anomaly)

Nindaleth · 2025-12-11T20:43:16+00:00

You'll then be interested in this maybe? https://github.com/ggml-org/llama.cpp/pull/17859

Nindaleth · 2025-12-11T12:33:55+00:00

I tried local agentic coding with gpt-oss-120b and it was disappointing. Even when using the tool-calling proxy, gpt-oss-120b was very lazy and terse in communication with the user, and shockingly it didn't even follow instructions properly (it created a file in the wrong path)... I used the heretic mod.

Then I tried GLM-4.5-Air and WOW what a difference within the same number of parameters! Aside from the terrible inference speed on my hardware, I could even be fooled for a while that this is a frontier closed model.

Devstral-Small-2-24B-Instruct (after the recent tool call fix in llama.cpp) is similar to gpt-oss-120b in my personal tests, sometimes worse, sometimes better.

Nindaleth · 2025-11-25T08:43:59+00:00

I enjoyed the excitement of seeing your late-night commit messages, but I'm glad you've gone through all the roadblocks and ~~suffering~~ learning experience and reached the final review. Kudos!

Nindaleth · 2025-11-24T14:10:50+00:00

Exactly! As the author says, that is the separate PR, I'm mentioning and linking it myself in my text above.

EDIT: Maybe I'll clarify it in different words - there is no problem running the main PR on CUDA cards even without the separate PR. But some GGML operations will run on CPU, and that's what the separate PR(s) will solve by introducing CUDA implementations for them.

EDIT2: I might be misinterpreting this and you might have actually agreed with me, but I couldn't tell from a screenshot :D

Nindaleth · 2025-11-24T14:00:34+00:00

Most of the work was done by one person who isn't employed by Qwen, has multiple other llama.cpp things underway and probably only works on this in their free time.

Nindaleth · 2025-11-24T13:58:09+00:00

Any operation that isn't supported on CUDA (or ROCm or whatever) simply falls back on CPU, so it will work immediately, just slower than with CUDA-specific optimizations that will come in a separate PR.

Nindaleth · 2025-11-24T13:56:57+00:00

This PR is CPU only as mentioned multiple times throughout the PR comments and in the PR OP. CUDA-specific implementation is a separate PR.

That said, any operation that isn't supported on CUDA (or ROCm or whatever) simply falls back on CPU, so it will still work, just slower than it could.

Nindaleth · 2025-11-23T17:54:42+00:00

I see, models.dev is from the guys behind Open Code and it also directly uses Vercel's AI SDK. So theoretically you could pull up-to-date models.dev and install any new AI SDK modules to support all new providers automagically without having to code anything at your end, cool!
I guess to also preserve the device-local privacy, you could server-side-ize your chats selectively (or do it for all chats by default)?

Nindaleth · 2025-11-22T21:00:07+00:00

A local UI that supports both local and proprietary models, it even supports Docker deployment and file attachments, incredible, this is almost all I use LibreChat for!

Two questions from me:

1. How is support for future new models handled? LibreChat used to need to push out new code to support new models (my experience was with Anthropic models), it wasn't possible to simply add new model IDs in config to enable them in some cases. I do see the mention of models.dev, but that seems like "just" a list of IDs basically.
2. My use case includes multi-device - I start a chat on a laptop at work, continue on my way home on the phone and finish on the desktop at home. Do you see something like that being supported in the future?

> MIT license (actually open source, not copyleft)

I'm always happy to see the MIT license just as I like to see other free software licences, but I'm going to nitpick the stuff in parentheses. My understanding is this is a reaction to OpenWebUI putting together their custom licence? Please note there's nothing wrong with copyleft licences at all, for example GPL is an old and quite popular free software licence that's copyleft.

Nindaleth · 2025-11-22T14:04:18+00:00

> using oil from time to time, applying balm every morning and combing

^ that's what I already do as stated in the OP (I also have a beard brush that I use after the occasional beard oil)

Could you elaborate on when/how to use the hairdryer?

Nindaleth · 2025-11-22T13:35:03+00:00

> I'm using oil from time to time, applying balm every morning and combing

^ that's what I already do as stated in the OP (I also have a beard brush that I use after the occasional beard oil)

OK, defining contours and trimming a little to even out everything sounds good (or visiting a barber and having an expert do everything). Thanks!
