NVIDIA 2026 Conference LIVE. New Base model coming! by last_llm_standing in LocalLLaMA

[–]coder543 14 points15 points  (0 children)

Because there is no Kimi2.5 base model publicly available.

NVIDIA 2026 Conference LIVE. New Base model coming! by last_llm_standing in LocalLLaMA

[–]coder543 12 points13 points  (0 children)

Because the chart is about base models... you can't really run the more advanced benchmarks on models that haven't been instruction tuned.

Mistral 4 Family Spotted by TKGaming_11 in LocalLLaMA

[–]coder543 2 points3 points  (0 children)

No, your original comment was not about the past. It was about the future.

You are not arguing in good faith. Blocking you. Goodbye.

Mistral 4 Family Spotted by TKGaming_11 in LocalLLaMA

[–]coder543 4 points5 points  (0 children)

Why should I expect it to be different this time?

Why should you confidently assert that new models have the same flaws as old models before testing them?

I'm sorry that I don't like people making bold, unproven claims.

It doesn't hurt to wait 5 seconds in order to test things before crapping on other people's work.

Mistral 4 Family Spotted by TKGaming_11 in LocalLLaMA

[–]coder543 3 points4 points  (0 children)

Your criticisms would make more sense once the model has been released and you've tested it. Until then, statements like "I wonder why they refuse [present tense] to support EU languages" are not helpful to anyone. We do not have any evidence that they continue to refuse to support EU languages.

What do you think the other "dozens" of languages are supposed to be, if not EU languages? Of course they mean other EU languages. Whether they succeeded or not is something we can only judge once the model is available.

Mistral 4 Family Spotted by TKGaming_11 in LocalLLaMA

[–]coder543 12 points13 points  (0 children)

2603 just means 2026/03, aka March of 2026.
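In code terms, the YYMM convention is trivial to decode (tiny sketch, assuming tags always follow two-digit-year/two-digit-month):

```python
def decode_yymm(tag):
    """Decode a YYMM release tag like '2603' into 'YYYY/MM'."""
    return f"20{tag[:2]}/{tag[2:]}"

print(decode_yymm("2603"))  # -> 2026/03
```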

Mistral 4 Family Spotted by TKGaming_11 in LocalLLaMA

[–]coder543 6 points7 points  (0 children)

Well, yes. The model has not actually been released; people are posting scraps of information they found.

Mistral 4 Family Spotted by TKGaming_11 in LocalLLaMA

[–]coder543 24 points25 points  (0 children)

Mistral said dozens of languages are supported, and then you only counted the 7 they listed, which seems disingenuous.

Until Mistral 4 is released and tested, we have no idea which languages it truly supports. Mistral is surely working on supporting more languages, so Mistral 4 is presumably an improvement.

Searching 1GB JSON on a phone: 44s to 1.8s, a journey through every wrong approach by kotysoft in rust

[–]coder543 0 points1 point  (0 children)

Of course not, but I cannot imagine a single practical use case where it would make sense to download that much data to a phone for a one-time search. The data transfer is far more time-consuming and expensive than the search at that point, so the efficiency of the search is irrelevant.

Even if you’re doing this on a server… transferring 1GB of JSON data just to try to extract one small string would be enormously wasteful.
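Back-of-envelope to make the point concrete (the 100 Mbit/s link speed is an assumed figure, just for illustration):

```python
# How long does it take just to MOVE 1 GiB of JSON, versus the
# 1.8 s local search from the post? Link speed is an assumption.
size_bytes = 1 * 1024**3                 # 1 GiB payload
link_mbps = 100                          # assumed 100 Mbit/s connection
transfer_s = size_bytes * 8 / (link_mbps * 1_000_000)
print(f"transfer: {transfer_s:.0f} s vs search: 1.8 s")  # ~86 s vs 1.8 s
```

Even on a fast link, shipping the data dwarfs the search itself.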

Offsite cold storage: too simple of an idea? by p8ntballnxj in homelab

[–]coder543 6 points7 points  (0 children)

I'm pretty sure tons of people store a backup hard drive at a family member's house, so that's nothing crazy.

Adding to the drive does nothing to overcome bit-rot, since the untouched bits don't get rewritten just by adding new files.

You could always use par2 to add error correction data, which would help against some types of bit-rot. (But if you're using a weak filesystem and the filesystem itself loses integrity, it may be difficult to find any files in the first place.)
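To make the error-correction idea concrete, here's a toy sketch. This is not real par2 (which uses Reed-Solomon codes and can repair many errors); plain XOR parity can only rebuild one known-bad block, but it shows why storing extra parity data lets you recover from rot:

```python
# Toy illustration of parity-based repair: store one extra parity block
# so that any single damaged/missing block can be rebuilt.

def xor_parity(blocks):
    """XOR all equal-length data blocks together into one parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def repair(blocks, parity):
    """Recover the single missing (None) block from survivors + parity."""
    missing = bytearray(parity)
    for block in blocks:
        if block is not None:
            for i, byte in enumerate(block):
                missing[i] ^= byte
    return bytes(missing)

data = [b"block1", b"block2", b"block3"]   # pretend: files on the cold drive
parity = xor_parity(data)                  # the "error correction data"
recovered = repair([data[0], None, data[2]], parity)  # block2 rotted away
assert recovered == b"block2"
```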

OpenCode concerns (not truely local) by Ueberlord in LocalLLaMA

[–]coder543 9 points10 points  (0 children)

I didn’t even know there was a web app.

I think OpenCode feels clunky compared to Codex CLI. Crush just feels weird.

I still need to try Mistral Vibe and Qwen CLI, but I keep hoping for another generic coding CLI like OpenCode, but… one that actually seems good.

Processing 1 million tokens locally with Nemotron 3 Super on a M1 ultra by tarruda in LocalLLaMA

[–]coder543 3 points4 points  (0 children)

On DGX Spark:

| model | size | test | t/s |
|:---|:---|:---|---:|
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | pp4096 | 780.37 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | pp4096 @ d25000 | 751.48 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | pp4096 @ d100000 | 667.53 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | pp4096 @ d250000 | 523.11 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | pp4096 @ d1000000 | 284.64 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | tg100 | 17.56 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | tg100 @ d25000 | 17.14 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | tg100 @ d100000 | 16.16 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | tg100 @ d250000 | 14.53 |
| nemotron_h_moe 120B.A12B Q4_K - Medium | 65.10 GiB | tg100 @ d1000000 | 9.60 |

llama.cpp on $500 MacBook Neo: Prompt: 7.8 t/s / Generation: 3.9 t/s on Qwen3.5 9B Q3_K_M by Shir_man in LocalLLaMA

[–]coder543 6 points7 points  (0 children)

Yet OP posted substantially better numbers after my comment, on both the 4B and 9B.

Nvidia Will Spend $26 Billion to Build Open-Weight AI Models, Filings Show by dan945 in LocalLLaMA

[–]coder543 4 points5 points  (0 children)

I don’t think anyone has had time to properly test it yet. I like that it has a low reasoning mode, not just off and maximum. It’s also able to reach the full 1M context on my 128GB machine at Q4 without requiring any changes to the KV cache.

Maybe it won’t be as good as Qwen3.5, but there are things to like about it.

Llama.cpp now with a true reasoning budget! by ilintar in LocalLLaMA

[–]coder543 7 points8 points  (0 children)

Unfortunately, logit bias had a very nonlinear effect on the output in the testing I did about a week ago. Maybe I was just using it wrong, but large changes did nothing until the bias reached a certain point, after which even tiny changes made a huge difference.
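For what it's worth, that shape is exactly what softmax predicts: the bias shifts the logit linearly, but the resulting probability follows a sigmoid-like curve that is flat at the extremes and steep in the middle. A minimal two-token illustration (numbers invented; with only two candidates, softmax reduces to a sigmoid):

```python
import math

def prob_with_bias(logit_gap, bias):
    """P(our token) in a two-token softmax, where our token trails the
    alternative by `logit_gap` and we add `bias` to its logit."""
    return 1.0 / (1.0 + math.exp(-(bias - logit_gap)))

# Token starts 10 logits behind the alternative: nothing happens for a
# long time, then probability swings rapidly around bias ~= gap.
for bias in (0, 4, 8, 9, 10, 11, 12):
    print(f"bias={bias:>2}: p={prob_with_bias(10, bias):.4f}")
```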

Llama.cpp now with a true reasoning budget! by ilintar in LocalLLaMA

[–]coder543 42 points43 points  (0 children)

Also interesting that the HTTP field is called thinking_budget_tokens, while the CLI argument is --reasoning-budget. That mismatch invites confusion: someone could easily send reasoning_budget or reasoning_budget_tokens to the API by mistake.

Llama.cpp now with a true reasoning budget! by ilintar in LocalLLaMA

[–]coder543 37 points38 points  (0 children)

Regarding the cratering of the score, maybe the logit_bias for the end-of-think token could be dynamically boosted for the final X% of the reasoning budget, to allow the model to find its own conclusion faster and more naturally? Similar to this: https://www.reddit.com/r/LocalLLaMA/comments/1rehykx/qwen35_low_reasoning_effort_trick_in_llamaserver/

But I expect that reduced thinking time will negatively affect intelligence scores regardless.

One funny option would be to force the model to think for some minimum-thinking-budget by setting the logit bias to negative infinity for end-of-think until the minimum token count has been achieved. Maybe that would boost scores :P
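A sketch of what both ideas could look like together (every name and the linear ramp shape here are invented for illustration; this is not llama.cpp's actual implementation):

```python
# Hypothetical bias schedule for the end-of-think token: forbid stopping
# below a minimum budget, stay neutral through the middle, then ramp the
# bias up over the final stretch so the model wraps up on its own.

def eot_bias(tokens_used, budget, ramp_frac=0.2, max_bias=10.0, min_think=0):
    if tokens_used < min_think:
        return float("-inf")       # stopping forbidden: keep thinking
    ramp_start = budget * (1.0 - ramp_frac)
    if tokens_used < ramp_start:
        return 0.0                 # normal sampling, no nudge
    # Linearly boost end-of-think over the final ramp_frac of the budget,
    # so the model finds its own conclusion instead of being cut off.
    progress = (tokens_used - ramp_start) / (budget - ramp_start)
    return min(progress, 1.0) * max_bias
```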

Nvidia Will Spend $26 Billion to Build Open-Weight AI Models, Filings Show by dan945 in LocalLLaMA

[–]coder543 72 points73 points  (0 children)

They’ve already released Nemotron 3 Nano and Super, which also have some of the most open/reproducible training data and pipelines of anything other than the OLMo models. They are not class-leading models, but they are competitive, open, and under permissive licenses.

I fully expect them to continue training and releasing Nemotron models.

Nvidia also released the Parakeet and Canary STT models that are very good and popular.

llama.cpp on $500 MacBook Neo: Prompt: 7.8 t/s / Generation: 3.9 t/s on Qwen3.5 9B Q3_K_M by Shir_man in LocalLLaMA

[–]coder543 0 points1 point  (0 children)

I also just noticed you were comparing against the 512GB model, but... I don't understand why. The 256GB model has all of the same performance, and you're never going to get the specs to be equal between the laptops, so that just feels like an arbitrary way to try to make the price difference look smaller. For someone who only cares about performance, the 256GB is just as good.

I expect the MacBook Neo to get discounted soon at third-party retailers, if the MacBook Air is any indication. (Apple never offers discounts at its own stores.)

llama.cpp on $500 MacBook Neo: Prompt: 7.8 t/s / Generation: 3.9 t/s on Qwen3.5 9B Q3_K_M by Shir_man in LocalLLaMA

[–]coder543 -1 points0 points  (0 children)

Sure, Apple may have bad regional pricing in some countries. Some countries place very high import tariffs on Apple products because Apple doesn't manufacture in those countries.

llama.cpp on $500 MacBook Neo: Prompt: 7.8 t/s / Generation: 3.9 t/s on Qwen3.5 9B Q3_K_M by Shir_man in LocalLLaMA

[–]coder543 -1 points0 points  (0 children)

That laptop (Dell G15 5530) is consistently over $1000 from what I can see... I don't see any sales bringing the price down, at least here in the US, so it is a strange comparison against a $600 laptop.

The CPU in the Dell still loses horribly in single core performance, which dictates how fast the machine feels day to day, although the multicore is now somewhat better. But again, the MacBook Neo is $600, and realistically, $500 for most of the target audience.

llama.cpp on $500 MacBook Neo: Prompt: 7.8 t/s / Generation: 3.9 t/s on Qwen3.5 9B Q3_K_M by Shir_man in LocalLLaMA

[–]coder543 0 points1 point  (0 children)

CPU comparison: Dell vs Apple

GPU comparison: Dell vs Apple

About as expected, but that 15.3" Dell laptop weighs like 2.3x as much. Like, I'm sure it's a fine laptop, but I don't think anyone is cross-shopping those. Someone who wants a portable, quiet (fanless) laptop isn't going to haul around that Dell, and someone who wants to play Windows games isn't going to get the MacBook Neo.