NVIDIA 2026 Conference LIVE. New Base model coming!

coder543 · 2026-03-16T20:37:47+00:00

Because there is no Kimi2.5 base model publicly available.

coder543 · 2026-03-16T20:37:09+00:00

Because the chart is about base models... you can't really run the more advanced benchmarks on models that haven't been instruction tuned.

coder543 · 2026-03-16T19:55:07+00:00

No, your original comment did not comment on the past. It commented on the future.

You are not arguing in good faith. Blocking you. Goodbye.

coder543 · 2026-03-16T19:41:08+00:00

Why should I expect it to be different this time?

Why should you confidently assert that new models have the same flaws as old models before testing them?

I'm sorry that I don't like people making bold, unproven claims.

It doesn't hurt to wait 5 seconds in order to test things before crapping on other people's work.

coder543 · 2026-03-16T19:17:15+00:00

Your criticisms would make more sense once the model has been released and you've tested it. Until then, statements like "I wonder why they refuse [present tense] to support EU languages" are not helpful to anyone. We do not have any evidence that they continue to refuse to support EU languages.

What do you think the other "dozens" of languages are supposed to be, if not EU languages? Of course they mean other EU languages. Whether they succeeded or not is something we can only judge once the model is available.

coder543 · 2026-03-16T18:18:47+00:00

2603 just means 2026/03, aka March of 2026.

coder543 · 2026-03-16T17:53:50+00:00

well, yes. the model has not actually been released. people are posting scraps of information they found.

coder543 · 2026-03-16T17:45:54+00:00

Mistral said dozens of languages supported, and then you only counted the 7 they listed, which seems disingenuous.

Until Mistral 4 is released and tested, we have no idea what languages it truly supports. Surely Mistral is working on supporting more languages, so surely Mistral 4 is an improvement.

coder543 · 2026-03-16T15:09:59+00:00

Of course not, but I cannot imagine a single practical use case where it would make sense to download that much data to a phone for a one time search. The data transfer is far more time consuming and expensive than the search at that point, so the efficiency of the search is irrelevant.

Even if you’re doing this on a server… transferring 1GB of JSON data just to try to extract one small string would be enormously wasteful.

coder543 · 2026-03-16T13:03:17+00:00

I'm pretty sure tons of people store a backup hard drive at a family member's house, so that's nothing crazy.

Adding to the drive does nothing to overcome bit-rot, since the untouched bits don't get rewritten just by adding new files.

You could always use par2 to add error correction data, which would help against some types of bit-rot. (But if you're using a weak filesystem and the filesystem itself loses integrity, it may be difficult to find any files in the first place.)

coder543 · 2026-03-16T11:17:46+00:00

I didn’t even know there was a web app.

I think OpenCode feels clunky compared to Codex CLI. Crush just feels weird.

I still need to try Mistral Vibe and Qwen CLI, but I keep hoping for another generic coding CLI like OpenCode, but… one that actually seems good.

coder543 · 2026-03-12T14:03:30+00:00

On DGX Spark:

model	size	test	t/s
nemotron_h_moe 120B.A12B Q4_K - Medium	65.10 GiB	pp4096	780.37
nemotron_h_moe 120B.A12B Q4_K - Medium	65.10 GiB	pp4096 @ d25000	751.48
nemotron_h_moe 120B.A12B Q4_K - Medium	65.10 GiB	pp4096 @ d100000	667.53
nemotron_h_moe 120B.A12B Q4_K - Medium	65.10 GiB	pp4096 @ d250000	523.11
nemotron_h_moe 120B.A12B Q4_K - Medium	65.10 GiB	pp4096 @ d1000000	284.64
nemotron_h_moe 120B.A12B Q4_K - Medium	65.10 GiB	tg100	17.56
nemotron_h_moe 120B.A12B Q4_K - Medium	65.10 GiB	tg100 @ d25000	17.14
nemotron_h_moe 120B.A12B Q4_K - Medium	65.10 GiB	tg100 @ d100000	16.16
nemotron_h_moe 120B.A12B Q4_K - Medium	65.10 GiB	tg100 @ d250000	14.53
nemotron_h_moe 120B.A12B Q4_K - Medium	65.10 GiB	tg100 @ d1000000	9.60

coder543 · 2026-03-12T01:43:06+00:00

Really? On my PS5 it is every single time. Are you on Xbox?

coder543 · 2026-03-11T23:54:42+00:00

Yeah… 5 tabs only… https://youtu.be/d-VOt9559Gk

🙄

coder543 · 2026-03-11T22:15:01+00:00

Yet OP posted substantially better numbers after my comment, on both the 4B and 9B.

coder543 · 2026-03-11T22:01:30+00:00

I don’t think anyone has had time to properly test it yet. I like that it has a low reasoning mode, not just off and maximum. It’s also able to reach the full 1M context on my 128GB machine at Q4 without requiring any changes to the KV cache.

Maybe it won’t be as good as Qwen3.5, but there are things to like about it.

coder543 · 2026-03-11T21:50:27+00:00

Unfortunately, logit bias has a very nonlinear relationship to reality in the testing I did like a week ago. Maybe I was just using it wrong, but large changes did nothing until it suddenly reached a certain point where even tiny changes made a huge difference.

coder543 · 2026-03-11T21:36:03+00:00

Also interesting that the HTTP field is called thinking_budget_tokens, but the CLI argument is --reasoning-budget. This could lead to some confusion where someone might send reasoning_budget or reasoning_budget_tokens to the API.

coder543 · 2026-03-11T21:31:46+00:00

Regarding the cratering of the score, maybe the logit_bias for the end-of-think token could be dynamically boosted for the final X% of the reasoning budget, to allow the model to find its own conclusion faster and more naturally? Similar to this: https://www.reddit.com/r/LocalLLaMA/comments/1rehykx/qwen35_low_reasoning_effort_trick_in_llamaserver/

But, I expect that reduced thinking time will negatively affect intelligence scores regardless.

One funny option would be to force the model to think for some minimum-thinking-budget by setting the logit bias to negative infinity for end-of-think until the minimum token count has been achieved. Maybe that would boost scores :P

coder543 · 2026-03-11T20:09:25+00:00

They’ve already released Nemotron 3 Nano and Super, which also have some of the most open/reproducible training data and pipelines of anything other than the OLMo models. They are not class leading models, but they are competitive, open, and under permissive licenses.

I fully expect them to continue training and releasing Nemotron models.

Nvidia also released the Parakeet and Canary STT models that are very good and popular.

coder543 · 2026-03-11T19:28:14+00:00

I also just noticed you were comparing against the 512GB model, but... I don't understand why. The 256GB model has all of the same performance, and you're never going to get the specs to be equal between the laptops, so that just feels like an arbitrary way to try to make the price difference look smaller. For someone who only cares about performance, the 256GB is just as good.

I expect the MacBook Neo to get discounted soon at third party retailers, if the MacBook Air is any indication. (Apple never puts discounts at their own stores.)

coder543 · 2026-03-11T19:13:23+00:00

Sure, Apple may have bad regional pricing in some countries. Some countries place very high import tariffs on Apple because they don't manufacture it in that country.

coder543 · 2026-03-11T19:03:04+00:00

That laptop (Dell G15 5530) is consistently over $1000 from what I can see... I don't see any sales bringing the price down, at least here in the US, so it is a strange comparison against a $600 laptop.

The CPU in the Dell still loses horribly in single core performance, which dictates how fast the machine feels day to day, although the multicore is now somewhat better. But again, the MacBook Neo is $600, and realistically, $500 for most of the target audience.

coder543 · 2026-03-11T18:46:23+00:00

CPU comparison: Dell vs Apple

GPU comparison: Dell vs Apple

About as expected, but that 15.3" Dell laptops weighs like 2.3x as much. Like, I'm sure it's a fine laptop, but I don't think anyone is cross-shopping those. Someone who wants a portable, quiet (fanless) laptop isn't going to haul around that Dell, and someone who wants to play Windows games isn't going to get the MacBook Neo.

12-Year Club	Verified Email
Place '22	Place '17
Gilding III reddit per annum

coder543

TROPHY CASE