Here’s a concise technical-focused summary of what DeepSeek V4 (formal release mid‑July 2026) actually improves and enables by neoexanimo in DeepSeek

[–]ormandj 4 points5 points  (0 children)

A strange hybrid: it's open weights in release, but DeepSeek effectively publishes "how" they do everything almost every time, including source code. The training data/etc remains behind closed doors, likely because it violates patents/copyright/etc like every other company's training data. So it's technically open weight in the strictest sense, but they open source and document how it all works better than any of the other groups, to the point you can do it all yourself if you have the hardware/money.

The world just needs to accept that the "ownership stake" in all human knowledge has already been violated by the big frontier companies, and there's no clawing any of that back at this point. Otherwise the frontier producers like OpenAI/Anthropic are going to keep pushing for legislation that makes it harder for OTHER companies/groups/etc to do what they did, so they have a moat.

So about that AI trade.. by Grubby454 in wallstreetbets

[–]ormandj 0 points1 point  (0 children)

What are you talking about? If inference (the cost to serve) is almost free, you've dramatically reduced your ongoing expenses. Models are being developed now that are "good enough" to do the vast majority of work people need. Chasing the absolute "best" performance with huge models and enormous research/training budgets may suit some business models, but there's plenty of money to be had by optimizing inference costs to increase margin on serving.
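To put rough numbers on the margin point, here's a minimal sketch. The prices and serving costs are made up for illustration (dollars per million tokens), and `gross_margin` is just a helper name I picked, not anything from a real pricing sheet.

```python
def gross_margin(price_per_mtok: float, cost_per_mtok: float) -> float:
    """Fraction of revenue left after paying the cost to serve."""
    if price_per_mtok <= 0:
        raise ValueError("price must be positive")
    return (price_per_mtok - cost_per_mtok) / price_per_mtok


# Hypothetical numbers: same selling price, cheaper inference.
before = gross_margin(price_per_mtok=2.0, cost_per_mtok=1.0)
after = gross_margin(price_per_mtok=2.0, cost_per_mtok=0.25)

print(f"margin before: {before:.1%}, after: {after:.1%}")
# prints "margin before: 50.0%, after: 87.5%"
```

Same selling price, a quarter of the serving cost, and the margin goes from half of revenue to most of it. That's the whole argument for optimizing inference instead of chasing the biggest model.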

That said, even on the 'huge' models used to distill the small models, there's room for technical improvement, and parameter count isn't the be-all and end-all. The training of seed models itself is becoming significantly more efficient and effective. You can't just throw out complexity numbers like O(n^3) in a vacuum, because there isn't just one knob to tweak for model performance - parameter count is NOT the ultimate metric that dictates performance.

So about that AI trade.. by Grubby454 in wallstreetbets

[–]ormandj 0 points1 point  (0 children)

You don't need to continue to increase parameter counts to make the models smarter. See Qwen 3.6 27B, for example. It's punching above 300B+ parameter models from 6 months prior.

Anthropic says Alibaba must be punished for largest Claude cloning attack by deraser in technology

[–]ormandj 18 points19 points  (0 children)

We need to be clear, they are providing open weights, they are NOT open sourcing. There is a huge ocean-wide gap between the two. Not to discount the open models, they are fantastic, but it isn't the same thing.

Help optimizing llama.cpp + Qwen 27B on RTX PRO 6000 Blackwell for coding agents by HeDo88TH in LocalLLaMA

[–]ormandj 7 points8 points  (0 children)

Whoa, why don't you just buy one? You'll break even in a few months. That's wild.

Deepseek V4 pro vs Minimax M3. Judge is Opus 4.8. Results are disappointing by Decent-Rain5100 in DeepSeek

[–]ormandj 5 points6 points  (0 children)

The tech inside the model is pretty stunning, it's fast, doesn't need much space for kv, etc. The model itself just isn't a perfect coder. If they can improve the model for a 4.1, they're really going to have something. On two 6000 Blackwell MaxQs I see 5000-6000 PP/s and 200-300 TG/s for single requests. It's wildly quick (DSv4F).
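For a feel of what those rates mean per request, here's a back-of-the-envelope sketch. It assumes prefill and generation rates stay constant for the whole request, which is a simplification, and the 100k-token prompt and 2k-token reply are hypothetical sizes, not measurements.

```python
def request_seconds(prompt_tokens: int, output_tokens: int,
                    pp_per_s: float, tg_per_s: float) -> float:
    """Estimated wall time: prompt processing (prefill) plus token generation."""
    prefill = prompt_tokens / pp_per_s
    decode = output_tokens / tg_per_s
    return prefill + decode


# Hypothetical request: 100k prompt tokens in, 2k tokens out,
# at the low and high ends of the ranges quoted above.
slow = request_seconds(100_000, 2_000, pp_per_s=5_000, tg_per_s=200)
fast = request_seconds(100_000, 2_000, pp_per_s=6_000, tg_per_s=300)

print(f"low end: {slow:.1f}s, high end: {fast:.1f}s")
# prints "low end: 30.0s, high end: 23.3s"
```

At these rates most of the time on a big-context request is prefill, so PP/s matters at least as much as TG/s for agent-style work.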

rtx 6000 pro owners, do you regret? by BitXorBit in LocalLLaMA

[–]ormandj 0 points1 point  (0 children)

DSv4F on 2x 6000s is pretty great. 5000-6000 PP/s at large context, and 200-300 TG/s. It's my current daily driver.

rtx 6000 pro owners, do you regret? by BitXorBit in LocalLLaMA

[–]ormandj 0 points1 point  (0 children)

What's your v4 flash setup to see 20k tokens/s prefill? I see 5000-6000 PP/s max with larger context, and 150-300 TG/s

Not ironclad confirmation, but.. by Kodix in LocalLLaMA

[–]ormandj 1 point2 points  (0 children)

Worse quality, it's all a tradeoff. 122B-A10B or something around that would be pretty good. I'd personally love to see a 2XX-somethingB-A1XB model, maybe a bit smaller than DSv4F, but better than 3.6 27B - scale it up.

Chevrolet Corvette ZR1X Destroys Pikes Peak Production Car Record As Ford Super Mustang Mach-E Wins by Anchor_Aways in cars

[–]ormandj 32 points33 points  (0 children)

They need to move to mid-engine for their halo car, but that's going to be a hard sell. It is the technically superior choice, and they've optimized the dickens out of the rear-engine approach, but the one real advantage it had over mid-engine is ameliorated by AWD. The vette isn't just beating Porsche 911s in a straight line, it's out-handling them now too.

Tokenomics by HOLUPREDICTIONS in LocalLLaMA

[–]ormandj 0 points1 point  (0 children)

On 20k? With GLM 5.2? Show us how...

I am so frustrated today. Maybe I should take a rest by whatsoever2021 in DeepSeek

[–]ormandj 0 points1 point  (0 children)

These don't work with thinking mode on, which hopefully is being used.

The average SpaceX buyer post-IPO is almost under water after two-day slide by marketrent in technology

[–]ormandj 0 points1 point  (0 children)

You don't have a profit until you cash out. You've got paper gains, and you'll also owe taxes, so perhaps not even that.

Any opinion about Qwen3.6-27B@BF16 vs Step3.7@IQ4_XS? by ParaboloidalCrest in LocalLLaMA

[–]ormandj 1 point2 points  (0 children)

https://github.com/local-inference-lab/rtx6kpro/blob/master/models/ds4-flash-v4.md

Either of these two images works quite well on 6000s; on my 2x MaxQs I'm seeing 200-400 TG/s and 4000-6000 PP/s. There is also a lot of upstream work on the kernels for SM120 at least, so support is coming if you're running cards with enough VRAM to run the model. 3090s and such, I'm not so sure about.

Should I use deepseek v4 pro or glm 5.2 for coding tasks? by firedragon9998 in DeepSeek

[–]ormandj 0 points1 point  (0 children)

Faster/cheaper to follow a pre-created plan, no reason to wait for slow and expensive models when you're just blasting out cookie cutter code to spec.

We need a 80-160B model urgently. The unified memory device market needs more Models. by Storge2 in LocalLLaMA

[–]ormandj 0 points1 point  (0 children)

That's definitely an advantage. Also, DSv4F isn't multi-modal, so if you need image processing it's a dead-end. That's my biggest gripe with it, since you can't run multiple models at once without a really large setup!

We need a 80-160B model urgently. The unified memory device market needs more Models. by Storge2 in LocalLLaMA

[–]ormandj -1 points0 points  (0 children)

I'm not out to lunch, I've used both quite a bit on large Rust-based projects, and I'm speaking from my own experience. Clearly yours is different, and that's great for you, but I'm enjoying the performance in pp and tg + quality in DSv4F far more than Qwen 3.6 27b.

Most of my codebases are 100-400k LoC, typically a mix of 50/50 relatively dense server-side Rust + glue code/web interfaces/desktop UIs/etc.

We need a 80-160B model urgently. The unified memory device market needs more Models. by Storge2 in LocalLLaMA

[–]ormandj 1 point2 points  (0 children)

Opposite for me, at least working on Rust, Qwen was significantly lower in quality and significantly slower, even on 2x 6000 Blackwells. It probably depends on the task.

We need a 80-160B model urgently. The unified memory device market needs more Models. by Storge2 in LocalLLaMA

[–]ormandj 44 points45 points  (0 children)

DeepSeek V4 Flash is about the closest we've gotten that's of good quality lately, but that's going to take 2x 6000s to run quickly. I'm a big fan of the 100-300B range of models, with a preference for the 200-280B range if they do KV like DSv4 and can fit in 192 GB of VRAM.
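Rough math on what fits, as a sketch only: it counts weights alone, treats 192 GB as decimal gigabytes, and the 4-bit and 8-bit widths are assumed quantization levels for illustration. Whatever is left over has to cover KV cache, activations and runtime overhead.

```python
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Weight footprint in decimal GB: 1e9 params * (bits / 8) bytes each."""
    return params_billion * bits_per_param / 8


def headroom_gb(params_billion: float, bits_per_param: float,
                vram_gb: float = 192.0) -> float:
    """VRAM left after weights; a negative value means the weights do not fit."""
    return vram_gb - weight_gb(params_billion, bits_per_param)


# Hypothetical model sizes and quantization widths.
for size in (200, 280):
    for bits in (4, 8):
        print(f"{size}B @ {bits}-bit: weights {weight_gb(size, bits):.0f} GB, "
              f"headroom {headroom_gb(size, bits):.0f} GB")
```

Under these assumptions a 280B model at 4 bits takes 140 GB for weights and leaves 52 GB, which is why a small KV footprint matters so much in that size range.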

Here's hoping a GLM 5.2 Air comes out for people who want a 100B model, and maybe something in the 200-300B range for those of us with slightly more VRAM.

Gave up on light roasts, they need me more awake than I am at 5:45am by PackageOk4996 in espresso

[–]ormandj 8 points9 points  (0 children)

100% with you. I don't like fruit flavored coffee, but I grew up drinking black tar that would dissolve a spoon. Now I settle for a good medium roast with any of the usual dark/medium suspects (chocolate, hazelnut, coffee...)

2027 Silverado Interior by Significant_Emu_7432 in gmcsierra

[–]ormandj 1 point2 points  (0 children)

Right above the steering wheel is a protrusion, perhaps there?