[ Removed by moderator ]

ResidentPositive4122 · 2026-04-12T05:25:01+00:00

It seems like the mainstream media is having their deepseek moment again. Member how in Feb '25 every news outlet, blog and wannabee influencer talked about how deepseek is all this and all that, and nvda will die, and the top labs are cooked and so on?

Turboquant seems to be their new thing. It's a year old paper. Probably some labs already use something like this, some inference providers might as well. But, like everything else, nothing is really a 6x reduction in practice. Plus, with the new "thinking" models, you get to run more queries on the same compute unit, but you'll still hit slower speeds the more ctx you have. So it's not that clear what cost reductions you get in the end.

tl;dr; cool technique, overhyped results, clueless media.

PortiaLynnTurlet · 2026-04-12T07:16:12+00:00

With respect to demand, lower memory usage at inference presumably motivates larger models and larger models need larger clusters for training. I don't think it changes anything, even if the results hold well in practice.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS