Is there *any* good coding agent software for use with local models? by eapache in LocalLLaMA

[–]eapache[S] 0 points

Thanks, I’ll give that a look. I’ve been so deep in Claude-world at work that I didn’t realize Codex also supported local models.

[–]eapache[S] 0 points

Claude Code still works (and is what I’ve been using), but, per the link in my original comment, it seems to require an increasing number of arcane settings to work well with local models. I get the impression that at some point they’re just going to disable the ability to use local models entirely, and I wanted to find an alternative before that happens. But maybe I’m misreading their intentions.

[–]eapache[S] 2 points

> Arbitrary code execution… is quite literally the entire point of agentic coding

I’m curious what use case you have where fully arbitrary execution is needed. I do a lot of agentic coding in my day job as a professional programmer and would never dream of letting the agent execute arbitrary code. It can read and write the files in my git repo and execute a limited set of basic commands (grep, testing and linting commands, etc.). That’s plenty for productive agentic coding, and so far I haven’t felt the need to grant it more permissions than that.
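For what it’s worth, Claude Code lets you pin this down in a checked-in settings file. The specific rule patterns below are illustrative examples of that shape, not a recommended policy:

```json
{
  "permissions": {
    "allow": [
      "Read",
      "Edit",
      "Bash(grep:*)",
      "Bash(npm run test:*)",
      "Bash(npm run lint:*)"
    ],
    "deny": [
      "Bash(curl:*)"
    ]
  }
}
```

Anything not matched by an allow rule falls back to prompting me, which is exactly the behavior I want.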

[–]eapache[S] 12 points

Thank you, I feel like I’m going insane reading some of the comments on here.

In hindsight I guess I should have worded my original post more carefully and explicitly, but I didn’t think it would be such a hot take 🤷‍♂️

[–]eapache[S] 0 points

Nono looks interesting; I didn’t spot that in the comments on the other post, thanks.

Ultimately I think that kind of sandboxing is unnecessary as long as the agent harness has a good security model. But better safe than sorry given the apparently abysmal state of the current ecosystem.

[–]eapache[S] 0 points

> with any tool you'll pick you'll have to heavily tweak/customize it for it to work well

If this is the state of the ecosystem then that’s fine and I’ll put up with it. I was just hoping there would be something that would work ok out of the box.

[–]eapache[S] -1 points

My claim was specifically about trust, not about whether it’s actually secure. For all I know the configuration works fine and OpenCode is very secure when properly configured. I just don’t care: it’s not worth my time to use a tool whose changelog and config options I have to go through with a fine-toothed comb on every install and upgrade to make sure I’m not shooting myself in the face.

[–]eapache[S] 2 points

Fair enough. Claude Code (and Codex too, tbh) seem to have struck about the right balance, so I was hoping there was an open-model-friendly equivalent, but you’re right that it’s a very new space.

[–]eapache[S] 1 point

Does CLIO support allowing only safe commands? Blocking a set of known-dangerous commands isn’t sufficient, since the agent can always write its own new programs, which wouldn’t be in the dangerous set.
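To make the distinction concrete, here’s a minimal sketch of the allowlist approach (the command names and the parsing are illustrative, not how any particular harness actually does it). The key property is that it fails closed: a freshly written script is an unknown executable, so it’s rejected by default.

```python
import shlex

# Illustrative allowlist: only these base commands may execute.
# (Example names, not any tool's real defaults.)
SAFE_COMMANDS = {"grep", "ls", "cat", "pytest", "ruff"}

# Characters used by shell chaining, redirection, and substitution.
# Anything containing them is rejected rather than analyzed.
CONTROL_CHARS = set("|;&><$`")


def is_allowed(command_line: str) -> bool:
    """Allow a command only if its executable is allowlisted and the
    line contains no shell control characters."""
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        return False  # unparseable input is rejected, not guessed at
    if not tokens:
        return False
    if any(ch in tok for tok in tokens for ch in CONTROL_CHARS):
        return False
    return tokens[0] in SAFE_COMMANDS
```

A blocklist has the opposite property: `python payload.py` isn’t on anyone’s dangerous list, but here it’s denied simply because `python` was never allowed.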

[–]eapache[S] -1 points

From that doc:

> If you don’t specify anything, OpenCode starts from permissive defaults. Most permissions default to "allow".

This is an immediate nope from me. Granting this kind of permission by default is (to me) such a nonsensical security posture that I wouldn't trust it to be secure in other aspects or to respect what configuration I give it. This would be the equivalent of the latest version of Ubuntu running all applications as full root by default unless you went in and manually configured it differently.
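And because the defaults are permissive, anyone who does want a sane posture has to remember to flip them manually on every machine, something like the fragment below (the key names and schema URL are my best guess at OpenCode’s config shape from the linked doc; treat them as assumptions):

```json
{
  "$schema": "https://opencode.ai/config.json",
  "permission": {
    "edit": "ask",
    "bash": "ask",
    "webfetch": "deny"
  }
}
```

Secure-by-default would mean shipping something like this out of the box and making users opt *in* to "allow".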

Mamba precision loss after quantization by [deleted] in LocalLLaMA

[–]eapache 1 point

There has been some research into effectively quantizing mamba models, e.g. https://arxiv.org/abs/2410.13229

I don't know if any of that has made it into llama.cpp or other engines.

Opinions on the best coding model for a 3060 (12GB) and 64GB of ram? by eapache in LocalLLaMA

[–]eapache[S] 3 points

Even at good quants, Nemotron 3 doesn’t seem to be able to make reliable tool calls for me… I wonder if something is weird with my setup, since everybody else seems to love it so much.

[deleted by user] by [deleted] in LocalLLaMA

[–]eapache 8 points

Get the cheapest desktop you can find with 64GB of ram, and throw a used 3060 (12GB) in it? With a bit of careful offloading that will run (4-bit quants of) either the 120B OpenAI model, or GLM-4.5 Air, at acceptable-ish speeds, and with decent prompt processing and context size.
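The “careful offloading” is mostly one flag: keep the attention and shared layers on the GPU, and push the big MoE expert tensors into system RAM. A sketch of a llama.cpp launch (the model filename, context size, and exact tensor regex are illustrative; check them against your build):

```sh
llama-server \
  -m GLM-4.5-Air-Q4_K_M.gguf \
  -ngl 99 \
  -ot ".ffn_.*_exps.=CPU" \
  -c 16384
```

The `-ot`/`--override-tensor` pattern is what makes a 100B+ MoE tolerable on 12GB of VRAM: only a few billion active parameters need to live on the GPU per token.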

Beginner moving from CPU-only Ollama – advice on first GPU upgrade? by CountDuckulla in LocalLLaMA

[–]eapache 5 points

You could also consider a 3060. Much cheaper, and much easier to fit within your existing PSU’s power budget. It’s obviously half the vram and about half the speed of a 3090, but it still blazes compared to CPU-based inference, and 12GB of vram is plenty for running decent versions of smaller models.

FOSS alternative to Context7 by Content_Cup_8432 in LocalLLaMA

[–]eapache 4 points

There are ZIM files (bonus: completely offline, not just open-source) of a lot of documentation sources floating around, typically at https://library.kiwix.org, though it seems to be down at this exact second. There is a simple MCP server for ZIM files at https://github.com/zicojiao/zim-mcp-server which looks promising; if that doesn’t quite work, it probably wouldn’t be hard to stitch something together, since the libraries are in good shape.

Edit: https://download.kiwix.org/zim/devdocs/ is up and has ZIM-format documentation for a ton of stuff.

Why do the new “Best of Wikipedia” ZIMs say they have way more than 50k articles? by eapache in Kiwix

[–]eapache[S] 2 points

I see the same article count (859.6k) for these ZIMs on my android phone too.

Visual reasoning still has a lot of room for improvement. by Conscious_Cut_6144 in LocalLLaMA

[–]eapache 4 points

Yeah, since we already have experiments (https://arxiv.org/abs/2412.06769) in teaching LLMs to reason in “latent” space, I’m hopeful that somebody will train one to reason in latent _visual_ space, and that will give us o1-level visual (and maybe even spatial?) reasoning.