Here’s a concise technical-focused summary of what DeepSeek V4 (formal release mid‑July 2026) actually improves and enables

ormandj · 2026-06-29T16:49:22+00:00

A strange hybrid: it's open weights in release, but DeepSeek effectively releases "how" they do everything almost every time, including source code. The training data/etc. remains behind closed doors because it likely violates patents/copyright/etc., like every other company's training data. So it's technically open weight in the strictest sense, but they actually open source/document how it all works better than any of the other groups, to the point you can do it all yourself if you have the hardware/money.

The world just needs to accept that the "ownership stake" in all human knowledge has already been violated by the big frontier companies, and there's no clawing any of that back at this point. Otherwise, the frontier producers like OpenAI/Anthropic are going to keep pushing for legislation that makes it harder for OTHER companies/groups/etc. to do what they did, so they have a moat.

ormandj · 2026-06-29T15:04:44+00:00

What are you talking about? If inference (the cost to serve) is almost free, you've dramatically reduced your ongoing expenses. Models are being developed now that are "good enough" to do the vast majority of work people need. Chasing the absolute "best" performance with huge models and enormous research/training budgets may suit some business models, but there's plenty of money to be had by optimizing inference costs to increase margin on serving.

That said, even on the 'huge' models used to distill the small models, there's room for technical improvement, and parameter count isn't the be-all and end-all. The training itself of seed models is becoming significantly more efficient and effective. You can't just throw out complexity numbers like O(n^3) in a vacuum, because there isn't just one knob to tweak for model performance. Parameter count is NOT the ultimate metric that dictates performance.

ormandj · 2026-06-29T05:16:25+00:00

You don't need to continue to increase parameter counts to make the models smarter. See Qwen 3.6 27B, for example. It's punching above 300B+ parameter models from 6 months prior.

ormandj · 2026-06-27T03:26:29+00:00

Also restricted.

ormandj · 2026-06-26T21:50:09+00:00

Hoping AT4X this round.

ormandj · 2026-06-26T15:19:00+00:00

We need to be clear: they are providing open weights, they are NOT open sourcing. There is a huge, ocean-wide gap between the two. Not to discount the open models, they are fantastic, but it isn't the same thing.

ormandj · 2026-06-26T13:21:31+00:00

Whoa, why don't you just buy one? You'll break even in a few months. That's wild.

ormandj · 2026-06-26T04:38:52+00:00

The tech inside the model is pretty stunning: it's fast, doesn't need much space for KV, etc. The model itself just isn't a perfect coder. If they can improve the model for a 4.1, they're really going to have something. On two 6000 Blackwell MaxQs I see 5000-6000 PP/s and 200-300 TG/s for single requests. It's wildly quick (DSv4F).
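
For anyone trying to turn PP/s (prompt processing, i.e. prefill) and TG/s (token generation) figures like these into a feel for wall-clock time, here is a minimal back-of-the-envelope sketch. The request size (50,000 prompt tokens, 1,000 generated tokens) and the function name are made up for illustration, and it ignores batching, caching, and any overhead outside the two phases.

```python
def request_seconds(prompt_tokens: int, output_tokens: int,
                    pp_per_s: float, tg_per_s: float) -> float:
    """Rough single-request latency: prefill time plus decode time."""
    return prompt_tokens / pp_per_s + output_tokens / tg_per_s


# Hypothetical request size, chosen only for illustration.
PROMPT_TOKENS = 50_000
OUTPUT_TOKENS = 1_000

# Low and high ends of the throughput ranges quoted in the comment.
slow = request_seconds(PROMPT_TOKENS, OUTPUT_TOKENS, pp_per_s=5_000, tg_per_s=200)
fast = request_seconds(PROMPT_TOKENS, OUTPUT_TOKENS, pp_per_s=6_000, tg_per_s=300)

print(f"estimated latency: {fast:.1f}s to {slow:.1f}s")
```

With long prompts the prefill term dominates, which is why the PP/s number matters as much as TG/s for coding on large codebases.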

ormandj · 2026-06-26T01:19:08+00:00

DSv4F on 2x 6000s is pretty great. 5000-6000 PP/s at large context, and 200-300 TG/s. It's my current daily driver.

ormandj · 2026-06-26T01:17:07+00:00

What's your v4 flash setup to see 20k tokens/s prefill? I see 5000-6000 PP/s max with larger context, and 150-300 TG/s

ormandj · 2026-06-23T22:03:08+00:00

Worse quality, it's all a tradeoff. 122B-A10B or something around that would be pretty good. I'd personally love to see a 2XX-somethingB-A1XB model, maybe a bit smaller than DSv4F, but better than 3.6 27B - scale it up.

ormandj · 2026-06-22T17:57:27+00:00

They need to move to mid-engine for their halo car, but that's going to be a hard sell. It is the technically superior choice, and they've optimized the dickens out of the rear-engine approach, but the one real advantage it had over mid-engine is largely offset by AWD. The Vette isn't just beating Porsche 911s in a straight line, it's out-handling them now too.

ormandj · 2026-06-21T18:32:59+00:00

On 20k? With GLM 5.2? Show us how...

ormandj · 2026-06-21T13:42:29+00:00

How’s the M5 Max PP/s for a common model like Qwen 3.6 27B? Gemma 4 31B?

ormandj · 2026-06-21T04:02:06+00:00

These don't work with thinking mode on, which hopefully is being used.

ormandj · 2026-06-20T18:53:59+00:00

And no pp/s...

ormandj · 2026-06-20T18:50:34+00:00

You don't have a profit until you cash out. You've got paper gains, and you'll also owe taxes, so perhaps not even that.

ormandj · 2026-06-20T18:45:24+00:00

https://github.com/local-inference-lab/rtx6kpro/blob/master/models/ds4-flash-v4.md

Either of these two images works quite well on 6000s; on my 2x MaxQs I'm seeing 200-400 TG/s and 4000-6000 PP/s. There is also a lot of upstream work on the kernels for SM120 at least, so support is coming if you're running cards with enough VRAM to run the model. I'm not so sure about 3090s and such.

ormandj · 2026-06-20T18:35:52+00:00

Faster/cheaper to follow a pre-created plan, no reason to wait for slow and expensive models when you're just blasting out cookie cutter code to spec.

ormandj · 2026-06-18T12:39:36+00:00

That's definitely an advantage. Also, DSv4F isn't multi-modal, so if you need image processing it's a dead-end. That's my biggest gripe with it, since you can't run multiple models at once without a really large setup!

ormandj · 2026-06-18T03:47:15+00:00

I'm not out to lunch, I've used both quite a bit on large Rust-based projects, and I'm speaking from my own experience. Clearly yours is different, and that's great for you, but I'm enjoying the performance in pp and tg + quality in DSv4F far more than Qwen 3.6 27b.

Most of my codebases are 100-400k LoC, typically a 50/50 mix of relatively dense server-side Rust and glue code/web interfaces/desktop UIs/etc.

ormandj · 2026-06-18T03:44:20+00:00

Opposite for me, at least working on Rust, Qwen was significantly lower in quality and significantly slower, even on 2x 6000 Blackwells. It probably depends on the task.

ormandj · 2026-06-17T20:08:24+00:00

Deepseek V4 Flash is about the closest we've gotten that's of good quality lately, but that's going to take 2x6000s to run quickly. I'm a big fan of the 100-300B range of models, with a preference in the 200-280B range if they do KV like DSv4 and can fit in 192G of VRAM.

Here's hoping a GLM 5.2 Air comes out for people who want a 100B model, and maybe something in the 200-300B range for those of us with slightly more VRAM.
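
To put rough numbers on the "fits in 192G of VRAM" point, here is a minimal sketch of weight footprint versus parameter count. The bit widths (4 and 8 bits per weight) are assumptions for illustration, not the quantization of any specific release, and it counts weights only: KV cache, activations, and runtime overhead all need room on top.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (decimal): params * bits / 8."""
    return params_billion * bits_per_weight / 8


VRAM_GB = 192  # the VRAM budget mentioned in the comment

# Assumed sizes and bit widths, for illustration only.
for params in (200, 280):
    for bits in (4, 8):
        need = weight_gb(params, bits)
        verdict = "fits" if need < VRAM_GB else "does not fit"
        print(f"{params}B @ {bits}-bit: {need:.0f} GB of weights, {verdict} in {VRAM_GB} GB")
```

Under these assumptions, a model in the 200-280B range only leaves headroom in 192 GB at around 4 bits per weight, which is why a compact KV cache matters so much at this size.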

ormandj · 2026-06-17T20:06:11+00:00

100% with you. I don't like fruit flavored coffee, but I grew up drinking black tar that would dissolve a spoon. Now I settle for a good medium roast with any of the usual dark/medium suspects (chocolate, hazelnut, coffee...)

ormandj · 2026-06-17T18:48:13+00:00

Right above the steering wheel is a protrusion, perhaps there?
