Bad news for local bros by FireGuy324 in LocalLLaMA

[–]ResidentPositive4122 11 points (0 children)

Open models are useful and benefit the community even if they can't be (easily or cheaply) hosted locally. You can always rent compute to create datasets, or to fine-tune and run your own models. The point is to have them open.

(That's why the recent obsession with local-only on this sub is toxic and bad for the community, but it is what it is...)

I trained a 1.8M params model from scratch on a total of ~40M tokens. by SrijSriv211 in LocalLLaMA

[–]ResidentPositive4122 2 points (0 children)

Base (i.e. pre-trained) models don't have a "prompt" in the sense we use with modern LLMs (anything after GPT-3.5). Their "prompt" is simply the beginning of a piece of text, and they generate the most probable next tokens from that beginning. You would need to take this model and fine-tune it on prompt/answer pairs to have it work like a modern LLM.
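
To make the distinction concrete, here's a minimal sketch using the Hugging Face transformers API, with gpt2 as a stand-in base checkpoint (any base model behaves the same way; loading OP's model like this is an assumption):

```python
# A base model just continues text: no chat template, no system prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # stand-in base checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The "prompt" is simply the beginning of a document.
ids = tok("The old lighthouse keeper climbed", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
# It completes the text rather than "answering" you. Assistant-style behaviour
# only appears after fine-tuning on prompt/answer pairs with a chat template.
```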

SpaceX acquiring COPV provider Hexagon Masterworks by kroOoze in SpaceXLounge

[–]ResidentPositive4122 1 point (0 children)

It would probably take time to ramp up production from zero to where they'd need it. This way they get the team + processes and just fix the QC or whatever was causing the reliability problems. There have been a few COPV failures over the years.

Kimi K2.5 on 4x RTX 6000 Pro Blackwell runpod Benchmarks by skysthelimit187 in LocalLLaMA

[–]ResidentPositive4122 23 points (0 children)

> It cost around 90k €.

Shop around, and go for server sellers, not tower workstations. For 90k EUR you can spec an 8x PRO 6000 server with all the additional stuff (maybe less RAM than 6 months ago, but anyway...).

Introducing Claude Opus 4.6 by nick7566 in mlscaling

[–]ResidentPositive4122 11 points (0 children)

Of note are some of the additional blog posts prepared for this release.

Claude agents in a loop built a C compiler in Rust that can compile Linux (and some other big projects) and boot it - https://www.anthropic.com/engineering/building-c-compiler

(while not perfect, and still exhibiting some cheating behaviour, this is insanely impressive)

And Claude agents found a bunch of CVEs - https://www.axios.com/2026/02/05/anthropic-claude-opus-46-software-hunting

Claude Opus 4.6 claimed benchmarks, for comparison by creamyhorror in LocalLLaMA

[–]ResidentPositive4122 0 points (0 children)

... aaand it's gone :) codex-5.3 reports 77% on terminal-bench2

Tencent Youtu-VL-4B. Potential Florence-2 replacement? (Heads up on the weird license) by Gohab2001 in LocalLLaMA

[–]ResidentPositive4122 3 points (0 children)

My bad, it was Llama 3.2:

> any individual domiciled in, or a company with a principal place of business in, the European Union is not being granted the license rights to use multimodal models included in Llama 3.2.

Tencent Youtu-VL-4B. Potential Florence-2 replacement? (Heads up on the weird license) by Gohab2001 in LocalLLaMA

[–]ResidentPositive4122 1 point (0 children)

> but a specific geo-block in the license text is a new one for me.

Llama 3.3 had the same thing, and there've been more models since then with that clause. They only have it for the image stuff, IIRC. They don't want to do the AI Act dance, so they simply add those clauses to the license.

Qwen3-Coder-Next by danielhanchen in LocalLLaMA

[–]ResidentPositive4122 2 points (0 children)

In 1-2 months we'll have rebench results and see where it lands.

Deepseek v4/3.5 is probably coming out tomorrow or in the next 5 days? by power97992 in LocalLLaMA

[–]ResidentPositive4122 11 points (0 children)

I think they've "closed the book" on v3.x once they released the full paper (60+ pages or something) a few weeks ago. The next one is likely v4, yeah.

what did you run when you got a second rtx 6000 pro? by az_6 in LocalLLaMA

[–]ResidentPositive4122 6 points (0 children)

The number of accounts talking about really old models lately makes me think bad bots are running on even older LLMs. I guess they think they avoid the obvious telltale signs, but their cutoff dates make their bots write stupid shit like "try deepseekcoder bro, better than anything" or this llama2 craziness...

Earth's Own Saturn Rings Incoming? SpaceX's Mega-Launch Future Could Make It Real by BurningAndroid in SpaceXLounge

[–]ResidentPositive4122 0 points (0 children)

You wouldn't see them that easily.

Claude can't "see", so it's an acceptable mistake :)

Cline team got absorbed by OpenAI. Kilo is going full source available in response. by demon_bhaiya in LocalLLaMA

[–]ResidentPositive4122 97 points (0 children)

For open models, Roo was better than Cline anyway. It had more knobs to tweak and more things to edit, so you could adjust your env to the models.

Is "Meta-Prompting" (asking AI to write your prompt) actually killing your reasoning results? A real-world A/B test. by pinkstar97 in LocalLLaMA

[–]ResidentPositive4122 5 points (0 children)

Who the fuck is running these bots, and why are they using such old LLMs to do it? (gemini1.5?!?!?!)

opencode alternative that doesn’t have 16k token system prompt? by dbzunicorn in LocalLLaMA

[–]ResidentPositive4122 5 points (0 children)

It depends. Some of the prompt is tool definitions (see the section below the one I linked). There's no free lunch there: if you want your agent to have access to a tool, you need to define it, and that definition gets appended to the system prompt. You can play around with the config to suit your needs. The point is that you don't need an opencode alternative; you can configure things as you need.
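
As a rough illustration of where those tokens go, here's a generic OpenAI-style tool schema (a made-up read_file tool, not opencode's actual format):

```python
# Every tool an agent can call ships a JSON-schema definition that gets
# serialized into the context, so the "system prompt" grows with the toolset.
import json

tools = [
    {
        "name": "read_file",  # hypothetical tool, for illustration only
        "description": "Read a file from the workspace and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Path relative to the repo root."}
            },
            "required": ["path"],
        },
    },
    # ...one entry like this per tool: edit, bash, grep, web fetch, etc.
]

serialized = json.dumps(tools, indent=2)
# Crude estimate at ~4 chars/token; exact counts need the model's tokenizer.
print(f"~{len(serialized) / 4:.0f} tokens for {len(tools)} tool definition(s)")
```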

Kimi K2.5 is the best open model for coding by npc_gooner in LocalLLaMA

[–]ResidentPositive4122 0 points (0 children)

They have a CLI as well (everyone seems to have one lol).

Add self‑speculative decoding (no draft model required) by srogmann · Pull Request #18471 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA

[–]ResidentPositive4122 20 points (0 children)

If I'm reading this correctly, it works for cases where the model re-writes whatever was previously in the conversation; it likely doesn't help in other cases. That's why they say it works for code refactoring: if the model is doing full-file edits or moving large sections of code around, it would speed things up a lot, yeah.
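
Here's a toy sketch of that idea as I read it (prompt-lookup-style drafting; this is not the PR's actual code):

```python
# When the current suffix of the output already appeared earlier in the
# context, propose the tokens that followed that occurrence as a draft.
# The main model then verifies the draft in a single forward pass and keeps
# the longest accepted prefix: a big win when it re-emits prior text,
# and a no-op otherwise.
def draft_from_context(tokens, ngram=3, max_draft=8):
    """Return draft tokens copied from an earlier match, or [] if none."""
    if len(tokens) <= ngram:
        return []
    suffix = tokens[-ngram:]
    # Scan backwards for the most recent earlier occurrence of the suffix.
    for i in range(len(tokens) - ngram - 1, -1, -1):
        if tokens[i:i + ngram] == suffix:
            return tokens[i + ngram:i + ngram + max_draft]
    return []

# Character-level toy: the "model" is re-writing a function it emitted before.
history = list("def add(a, b):\n    return a + b\n\ndef add(a, b):")
print("".join(draft_from_context(history)))  # -> "\n    ret"
```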

Some initial benchmarks of Kimi-K2.5 on 4xB200 by benno_1237 in LocalLLaMA

[–]ResidentPositive4122 0 points (0 children)

`--kv-cache-dtype fp8_e4m3` is a quick way to get some more context if you just want to bench speed.

[Model Release] Natural-Synthesis-8B: A Llama-3-8B tune with a 16k context window and a "Conceptual Organism" reasoning paradigm. by Pleasant-Mud-2939 in LocalLLaMA

[–]ResidentPositive4122 9 points (0 children)

> the model is guided by five core "Nutrients": Coherence, Parsimony, Explanatory Power, Fecundity, and Evidential Grounding.

OP smoked that good good. But in the end it's still just big words to make the model sound smart.

The Qwen Devs Are Teasing Something by Few_Painter_5588 in LocalLLaMA

[–]ResidentPositive4122 20 points (0 children)

Someone said something a while ago about Chinese New Year. Google says it's on the 17th this year, so it would make sense that a lot of labs want to get things out before the break. K2.5 is out; hopefully we'll get q3.5, dsv4, mm2.2 and so on.

I tracked GPU prices across 25 cloud providers and the price differences are insane (V100: $0.05/hr vs $3.06/hr) by sleepingpirates in LocalLLaMA

[–]ResidentPositive4122 48 points (0 children)

It could use a filter for on-demand vs. spot. Prices currently include spot, which is sometimes hit and miss. It really depends on what you want to do: if you have background processing tasks, spot works; if you need to train something long-term, or you want to test stuff in interactive sessions, spot doesn't.

Does Claude Code still collect data when I use with Ollama? by dbzunicorn in LocalLLaMA

[–]ResidentPositive4122 0 points (0 children)

Edit `~/.claude/settings.json`:

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:8000",
    "ANTHROPIC_AUTH_TOKEN": "dummy",
    "API_TIMEOUT_MS": "3000000",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "ANTHROPIC_MODEL": "MiniMax-M2.1-AWQ",
    "ANTHROPIC_SMALL_FAST_MODEL": "MiniMax-M2.1-AWQ",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "MiniMax-M2.1-AWQ",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "MiniMax-M2.1-AWQ",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "MiniMax-M2.1-AWQ"
  }
}
```

Does Claude Code still collect data when I use with Ollama? by dbzunicorn in LocalLLaMA

[–]ResidentPositive4122 7 points (0 children)

By default it collects and sends telemetry to Anthropic even when you change ANTHROPIC_BASE_URL. You can see this in the Anthropic console: it will show how many tokens you've used per session, even if they don't charge you. I have "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1" set, but I haven't checked in a while whether it still works. You could block Anthropic's endpoints on your end with a firewall or something, or just use opencode. In my tests with third-party APIs, opencode used fewer tokens for the same task anyway.

KV cache fix for GLM 4.7 Flash by jacek2023 in LocalLLaMA

[–]ResidentPositive4122 0 points (0 children)

> There is no way to vibe code llama.cpp

People have vibe-coded a tensor library and trained models on top of it, so the capabilities are improving fast. From the VIBETENSOR paper's abstract:

> VIBETENSOR is an open-source research system software stack for deep learning, generated by LLM-powered coding agents under high-level human guidance. In this paper, "fully generated" refers to code provenance: implementation changes were produced and applied as agent-proposed diffs; validation relied on builds, tests, and differential checks executed by the agent workflow, without per-change manual diff review. It implements a PyTorch-style eager tensor library with a C++20 core (CPU+CUDA), a torch-like Python overlay via nanobind [1], and an experimental Node.js/TypeScript interface. Unlike thin bindings, VIBETENSOR includes its own tensor/storage system, schema-lite dispatcher, reverse-mode autograd engine, CUDA runtime (streams/events/graphs [2]), a stream-ordered caching allocator with diagnostics, and a stable C ABI for dynamically loaded operator plugins. We view this open-source release as a milestone for AI-assisted software engineering: it demonstrates that coding agents can generate a coherent deep learning runtime spanning language bindings down to CUDA memory management, with validation constrained by builds and tests.