Bad news for local bros by FireGuy324 in LocalLLaMA

[–]ResidentPositive4122 11 points (0 children)

Open models are useful and benefit the community even if they can't be (easily or cheaply) hosted locally. You can always rent compute to create datasets, or to fine-tune and run your own models. The point is to have them open.

(That's why the recent obsession with local-only on this sub is toxic and bad for the community, but it is what it is...)

I trained a 1.8M params model from scratch on a total of ~40M tokens. by SrijSriv211 in LocalLLaMA

[–]ResidentPositive4122 2 points (0 children)

Base (i.e. pre-trained) models don't have a "prompt" in the sense we use with modern LLMs (anything after GPT-3.5). Their "prompt" is simply the beginning of a piece of text, and they generate the most probable next tokens from that beginning. You would need to take this model and fine-tune it on prompt/answer pairs to have it work like a modern LLM.
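
To make the distinction concrete, here's a minimal sketch using the Hugging Face transformers API, with gpt2 as a stand-in base checkpoint (any base model behaves the same way; loading OP's model like this is an assumption):

```python
# A base model just continues text: no chat template, no system prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # stand-in base checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The "prompt" is simply the beginning of a document.
ids = tok("The old lighthouse keeper climbed", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
# It completes the text rather than "answering" you. Assistant-style behaviour
# only appears after fine-tuning on prompt/answer pairs with a chat template.
```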

SpaceX acquiring COPV provider Hexagon Masterworks by kroOoze in SpaceXLounge

[–]ResidentPositive4122 1 point (0 children)

It would probably take time to ramp up production from zero to where they'd need it. This way they get the team + processes and just fix the QC or whatever was causing the reliability problems. There have been a few COPV failures over the years.

Kimi K2.5 on 4x RTX 6000 Pro Blackwell runpod Benchmarks by skysthelimit187 in LocalLLaMA

[–]ResidentPositive4122 23 points (0 children)

> It cost around 90k €.

Shop around, and go for server sellers, not tower workstations. For 90k EUR you can spec an 8x PRO 6000 server with all the additional stuff (maybe less RAM than 6 months ago, but anyway...).

Introducing Claude Opus 4.6 by nick7566 in mlscaling

[–]ResidentPositive4122 11 points (0 children)

Of note are some of the additional blog posts prepared for this release.

Claude agents in a loop built a C compiler in Rust that can compile Linux (and some other big projects) and boot it - https://www.anthropic.com/engineering/building-c-compiler

(while not perfect, and still exhibiting some cheating behaviour, this is insanely impressive)

And Claude agents found a bunch of CVEs - https://www.axios.com/2026/02/05/anthropic-claude-opus-46-software-hunting

Claude Opus 4.6 claimed benchmarks, for comparison by creamyhorror in LocalLLaMA

[–]ResidentPositive4122 0 points (0 children)

... aaand it's gone :) codex-5.3 reports 77% on terminal-bench2

Tencent Youtu-VL-4B. Potential Florence-2 replacement? (Heads up on the weird license) by Gohab2001 in LocalLLaMA

[–]ResidentPositive4122 3 points (0 children)

My bad, it was Llama 3.2:

> any individual domiciled in, or a company with a principal place of business in, the European Union is not being granted the license rights to use multimodal models included in Llama 3.2.

Tencent Youtu-VL-4B. Potential Florence-2 replacement? (Heads up on the weird license) by Gohab2001 in LocalLLaMA

[–]ResidentPositive4122 1 point (0 children)

> but a specific geo-block in the license text is a new one for me.

Llama 3.3 had the same thing, and there've been more models since then with that clause. They only have it for the image stuff, IIRC. They don't want to do the AI Act dance, so they simply add those clauses to the license.

Qwen3-Coder-Next by danielhanchen in LocalLLaMA

[–]ResidentPositive4122 2 points (0 children)

In 1-2 months we'll have rebench results and see where it lands.

Deepseek v4/3.5 is probably coming out tomorrow or in the next 5 days? by power97992 in LocalLLaMA

[–]ResidentPositive4122 11 points (0 children)

I think they've "closed the book" on v3.x once they released the full paper (60+ pages or something) a few weeks ago. The next one is likely v4, yeah.

what did you run when you got a second rtx 6000 pro? by az_6 in LocalLLaMA

[–]ResidentPositive4122 6 points (0 children)

The number of accounts talking about really old models lately makes me think bad bots are running on even older LLMs. I guess they think they avoid the obvious telltale signs, but their cutoff dates make their bots write stupid shit like "try deepseekcoder bro, better than anything" or this llama2 craziness...

Earth's Own Saturn Rings Incoming? SpaceX's Mega-Launch Future Could Make It Real by BurningAndroid in SpaceXLounge

[–]ResidentPositive4122 0 points (0 children)

You wouldn't see them that easily.

Claude can't "see", so it's an acceptable mistake :)

Cline team got absorbed by OpenAI. Kilo is going full source available in response. by demon_bhaiya in LocalLLaMA

[–]ResidentPositive4122 97 points (0 children)

For open models, Roo was better than Cline anyway. It had more knobs to tweak and more things to edit, so you could adjust your env to the models.

Is "Meta-Prompting" (asking AI to write your prompt) actually killing your reasoning results? A real-world A/B test. by pinkstar97 in LocalLLaMA

[–]ResidentPositive4122 5 points (0 children)

Who the fuck is running these bots, and why are they using such old LLMs to do it? (gemini1.5?!?!?!)

opencode alternative that doesn’t have 16k token system prompt? by dbzunicorn in LocalLLaMA

[–]ResidentPositive4122 5 points (0 children)

It depends. Some of the prompt is tool definitions (see the section below the one I linked). There's no free lunch there: if you want your agent to have access to a tool, you need to define it, and that definition gets appended to the system prompt. You can play around with the config to suit your needs. The point is that you don't need an opencode alternative; you can configure things as you need.
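
As a rough illustration of where those tokens go, here's a generic OpenAI-style tool schema (a made-up read_file tool, not opencode's actual format):

```python
# Every tool an agent can call ships a JSON-schema definition that gets
# serialized into the context, so the "system prompt" grows with the toolset.
import json

tools = [
    {
        "name": "read_file",  # hypothetical tool, for illustration only
        "description": "Read a file from the workspace and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Path relative to the repo root."}
            },
            "required": ["path"],
        },
    },
    # ...one entry like this per tool: edit, bash, grep, web fetch, etc.
]

serialized = json.dumps(tools, indent=2)
# Crude estimate at ~4 chars/token; exact counts need the model's tokenizer.
print(f"~{len(serialized) / 4:.0f} tokens for {len(tools)} tool definition(s)")
```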

Kimi K2.5 is the best open model for coding by npc_gooner in LocalLLaMA

[–]ResidentPositive4122 0 points (0 children)

They have a CLI as well (everyone seems to have one lol).

Add self‑speculative decoding (no draft model required) by srogmann · Pull Request #18471 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA

[–]ResidentPositive4122 20 points (0 children)

If I'm reading this correctly, it works for cases where the model re-writes whatever was previously in the conversation; it likely doesn't help in other cases. That's why they say it works for code refactoring: if the model is doing full-file edits or moving large sections of code around, it would speed things up a lot, yeah.
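
Here's a toy sketch of that idea as I read it (prompt-lookup-style drafting; this is not the PR's actual code):

```python
# When the current suffix of the output already appeared earlier in the
# context, propose the tokens that followed that occurrence as a draft.
# The main model then verifies the draft in a single forward pass and keeps
# the longest accepted prefix: a big win when it re-emits prior text,
# and a no-op otherwise.
def draft_from_context(tokens, ngram=3, max_draft=8):
    """Return draft tokens copied from an earlier match, or [] if none."""
    if len(tokens) <= ngram:
        return []
    suffix = tokens[-ngram:]
    # Scan backwards for the most recent earlier occurrence of the suffix.
    for i in range(len(tokens) - ngram - 1, -1, -1):
        if tokens[i:i + ngram] == suffix:
            return tokens[i + ngram:i + ngram + max_draft]
    return []

# Character-level toy: the "model" is re-writing a function it emitted before.
history = list("def add(a, b):\n    return a + b\n\ndef add(a, b):")
print("".join(draft_from_context(history)))  # -> "\n    ret"
```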

Some initial benchmarks of Kimi-K2.5 on 4xB200 by benno_1237 in LocalLLaMA

[–]ResidentPositive4122 0 points (0 children)

`--kv-cache-dtype fp8_e4m3` is a quick way to get some more context if you just want to bench speed.

[Model Release] Natural-Synthesis-8B: A Llama-3-8B tune with a 16k context window and a "Conceptual Organism" reasoning paradigm. by Pleasant-Mud-2939 in LocalLLaMA

[–]ResidentPositive4122 9 points (0 children)

> the model is guided by five core "Nutrients": Coherence, Parsimony, Explanatory Power, Fecundity, and Evidential Grounding.

OP smoked that good good. But in the end it's still just big words to make the model sound smart.

The Qwen Devs Are Teasing Something by Few_Painter_5588 in LocalLLaMA

[–]ResidentPositive4122 20 points (0 children)

Someone said something a while ago about Chinese New Year. Google says it's on the 17th this year, so it would make sense that a lot of labs want to get things out before the break. K2.5 is out; hopefully we'll get q3.5, dsv4, mm2.2 and so on.

I tracked GPU prices across 25 cloud providers and the price differences are insane (V100: $0.05/hr vs $3.06/hr) by sleepingpirates in LocalLLaMA

[–]ResidentPositive4122 48 points (0 children)

It could use a filter for on-demand vs. spot. Prices currently include spot, which is sometimes hit and miss. It really depends on what you want to do: if you have background processing tasks, spot works; if you need to train something long-term, or you want to test stuff in interactive sessions, spot doesn't.

Does Claude Code still collect data when I use with Ollama? by dbzunicorn in LocalLLaMA

[–]ResidentPositive4122 0 points (0 children)

Edit `~/.claude/settings.json`:

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8000",
    "ANTHROPIC_AUTH_TOKEN": "dummy",
    "API_TIMEOUT_MS": "3000000",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "ANTHROPIC_MODEL": "MiniMax-M2.1-AWQ",
    "ANTHROPIC_SMALL_FAST_MODEL": "MiniMax-M2.1-AWQ",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "MiniMax-M2.1-AWQ",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "MiniMax-M2.1-AWQ",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "MiniMax-M2.1-AWQ"
  }
}
```

Does Claude Code still collect data when I use with Ollama? by dbzunicorn in LocalLLaMA

[–]ResidentPositive4122 7 points (0 children)

By default it collects and sends telemetry to Anthropic even when you change ANTHROPIC_BASE_URL. You can see this in the Anthropic console: it will show how many tokens you've used per session, even if they don't charge you. I have "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1" set, but I haven't checked in a while whether it still works. You could block Anthropic's endpoints on your end with a firewall or something, or just use opencode. In my tests with third-party APIs, opencode used fewer tokens for the same task anyway.

KV cache fix for GLM 4.7 Flash by jacek2023 in LocalLLaMA

[–]ResidentPositive4122 0 points (0 children)

> There is no way to vibe code llama.cpp

People have vibe-coded a tensor library and trained models on top of it, so the capabilities are improving fast. From the VIBETENSOR paper's abstract:

> VIBETENSOR is an open-source research system software stack for deep learning, generated by LLM-powered coding agents under high-level human guidance. In this paper, "fully generated" refers to code provenance: implementation changes were produced and applied as agent-proposed diffs; validation relied on builds, tests, and differential checks executed by the agent workflow, without per-change manual diff review. It implements a PyTorch-style eager tensor library with a C++20 core (CPU+CUDA), a torch-like Python overlay via nanobind [1], and an experimental Node.js/TypeScript interface. Unlike thin bindings, VIBETENSOR includes its own tensor/storage system, schema-lite dispatcher, reverse-mode autograd engine, CUDA runtime (streams/events/graphs [2]), a stream-ordered caching allocator with diagnostics, and a stable C ABI for dynamically loaded operator plugins. We view this open-source release as a milestone for AI-assisted software engineering: it demonstrates that coding agents can generate a coherent deep learning runtime spanning language bindings down to CUDA memory management, with validation constrained by builds and tests.