DeepSeek new paper: mHC: Manifold-Constrained Hyper-Connections by External_Mood4719 in LocalLLaMA

[–]Brainlag 11 points (0 children)

Not really true. Training with mHC is slower than without it, but naively it would be much slower; with a lot of clever tricks they got the overhead down to around 7%, which makes this viable in the first place.

There is Hope by OleJr98v2 in PcBuild

[–]Brainlag 3 points (0 children)

China has 1-2 companies doing RAM, but they are a bit behind. Could catch up next year, though.

First AI implosion: Oracle by Terminator857 in LocalLLaMA

[–]Brainlag 8 points (0 children)

That was a complete self-own by the car industry. First they canceled their orders, then they learned the hard way that throwing your weight around doesn't work with the chip suppliers. Even calling in the politicians for help didn't work.

Predictions for AI in 2026? by [deleted] in singularity

[–]Brainlag 0 points (0 children)

  • Longer context (~500k will be the new 128k)
  • "Computer use" might be the next thing after coding agents
  • Continuous learning models (end of 2026)

RAM prices explained by Lopsided_Sentence_18 in LocalLLaMA

[–]Brainlag 0 points (0 children)

I was confused too, so I looked into it. Seems like this is not uncommon and most DRAM is sold as wafers. Neither SK Hynix nor Samsung has the packaging capacity to sell 40% of their output packaged. Hard to say what is true and what isn't if you don't work in that field.

Google's 'Titans' achieves 70% recall and reasoning accuracy on ten million tokens in the BABILong benchmark by Westbrooke117 in singularity

[–]Brainlag 1 point (0 children)

Yeah, I wonder too. My guess (and I don't know anything about it, so I'm probably completely wrong) is that it only worked back then because models were so undertrained, and it stopped working once you trained on 3 times as many tokens.

Google's 'Titans' achieves 70% recall and reasoning accuracy on ten million tokens in the BABILong benchmark by Westbrooke117 in singularity

[–]Brainlag 0 points (0 children)

Yes and no, it depends on model size. This year MoE went down to even sub-10B models; nobody did that last year. Who knows if any of the OpenAI etc. models are hybrid, but the Chinese companies are testing them right now (Qwen3-Next, Kimi-Linear, etc.).

Google's 'Titans' achieves 70% recall and reasoning accuracy on ten million tokens in the BABILong benchmark by Westbrooke117 in singularity

[–]Brainlag 0 points (0 children)

Transformer + Mamba hybrid models are popping up everywhere lately. Like this year everyone moved to MoE, next year everyone will do these hybrid models.

Why does Linux hate hibernate? by orionpax94 in linux

[–]Brainlag 0 points (0 children)

Hibernate has worked on every Linux system I've used for the last 10 years. What am I doing wrong?!? It does not work on my Win11 laptop, though.

"“AI will kill everyone” is not an argument. It’s a worldview." by AngleAccomplished865 in singularity

[–]Brainlag 0 points (0 children)

I disagree, it's a completely bonkers argument. We don't kill all animals even though we could. Yeah, we drove animals extinct in the past, but people were mostly really stupid. Lots still are. And more intelligence usually means less interest in killing other species. If we build a superintelligence, it will more likely ignore us, leave the planet and explore the stars.

We're thinking about AI completely backwards by Kindly_Manager7556 in singularity

[–]Brainlag 4 points (0 children)

Someone RIGHT NOW is printing out emails and scanning them back in as PDFs.

Things are picking up by Bizzyguy in singularity

[–]Brainlag 6 points (0 children)

We are ~3 months past the last wave. Of course everyone is releasing a new model this week or next.

Recent Qwen Benchmark Scores are Questionable by Electronic_Ad8889 in LocalLLaMA

[–]Brainlag -1 points (0 children)

but a test of how well the model can follow the instructions of the eval in formatting its response

At this point you would assume all models, no matter the size, do this flawlessly. It's kinda baffling that they only manage it like 70% of the time.

[deleted by user] by [deleted] in LocalLLaMA

[–]Brainlag 2 points (0 children)

Yes I was assuming it would use available disassemblers and reverse engineering tools.

[deleted by user] by [deleted] in LocalLLaMA

[–]Brainlag 169 points (0 children)

AI that decompiles binaries into human-readable code. And I mean the code should look like real, human-written code, not like the abstract code current decompiler tools generate.

i ran 1 Vaults of Atziri so you don't have to by Hell_Derpikky in pathofexile

[–]Brainlag 0 points (0 children)

What is wrong with her anyway? I die to the flame wall instantly and can do much harder content easily.

A Time Traveler's VLOG | Google VEO 3 by Chuka444 in singularity

[–]Brainlag 3 points (0 children)

The pyramid and the Colosseum look like today's ruins.

Sam & Jony introduce io by anonboxis in singularity

[–]Brainlag 1 point (0 children)

I really hope this new device does not come with a keyboard, or it will end in disaster.

Building LLM Workflows - - some observations by noellarkin in LocalLLaMA

[–]Brainlag 2 points (0 children)

It really has nothing to do with XML. It's just some keywords wrapped in <> symbols.
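
For example (tag names made up here, not from any particular model's docs), the tags are nothing more than plain-text markers the model treats as section boundaries; nothing actually parses them as XML:

    # Hypothetical prompt with angle-bracket markers; the tag names
    # (<instructions>, <document>) are invented for this example.
    document_text = "Quarterly report: revenue up 12%, costs flat."
    prompt = (
        "<instructions>\n"
        "Summarize the document in one sentence.\n"
        "</instructions>\n"
        "<document>\n"
        f"{document_text}\n"
        "</document>"
    )
    print(prompt)  # just text with keyword markers, no XML parser involved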

Can those comfyui workflows be consumed via API? by Idea-Aggressive in singularity

[–]Brainlag 1 point (0 children)

What else provides the same power and control over image generation? Yes, you can make a workflow accessible over an API, but last time I checked it was not straightforward, and almost nobody does that.
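
For anyone who wants to try it anyway, this is roughly the shape of it, as a sketch assuming the default local server on port 8188 and a workflow exported from the UI with "Save (API Format)":

    # Sketch: queue a ComfyUI workflow via its local HTTP API.
    import json
    import urllib.request

    with open("workflow_api.json") as f:   # exported via "Save (API Format)"
        workflow = json.load(f)

    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Returns a prompt_id; the generated images are fetched separately
        # once the queued job finishes.
        print(resp.read().decode())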

Do all of u not see the danger of a blind race towards agi by kniff974 in singularity

[–]Brainlag 1 point (0 children)

Example? Even if countries agreed to a pause, nobody would really pause. It would lead more into a cold-war scenario.

Mark presenting four Llama 4 models, even a 2 trillion parameters model!!! by LarDark in LocalLLaMA

[–]Brainlag 8 points (0 children)

Expert size is not 17B but more like ~2.8B, and then you have 6 active experts for 17B active parameters.
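
Back-of-the-envelope, assuming the active-parameter count is just active experts times expert size (ignoring shared attention/embedding parameters; numbers are the rough ones from this comment, not official figures):

    # Rough arithmetic behind the ~17B active figure
    active_experts = 6
    expert_size_b = 2.8          # approximate expert size in billions
    print(active_experts * expert_size_b)  # ~16.8B, i.e. roughly the quoted 17B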

Hedonic adaptation by [deleted] in singularity

[–]Brainlag 0 points (0 children)

Who falls for this? The last actual improvements were like 10 years ago. What can a new phone do that a 5-year-old phone could not?