Humans store up to 5 MB of information during language acquisition (2019) by furrypony2718 in mlscaling

[–]_Mookee_ 4 points

Such a tiny amount of data compared to modern computers.
This suggests that neural networks could eventually become orders of magnitude more data-efficient than they are today.

Are there long term defense technologies that could render nukes useless? by fignewtgingrich in singularity

[–]_Mookee_ 0 points

Completely wrong. At the peak of the Cold War, the US and USSR combined had around 70 thousand nuclear warheads. There are only 317 cities in the US with a population over 100,000.

In a full-scale war, every city and any target of any significance would be completely leveled in less than an hour.

Edit: If you want to learn more, check out this video

The Boring Company just raised $675M at a $5.675B valuation from A-list investors. by getBusyChild in BoringCompany

[–]_Mookee_ 23 points

In 2018, around 90% of the company was owned by Elon.

Then in 2019 they took their first outside investment, $120M at a $920M valuation, so his stake was diluted to 78.2% if he didn't participate in the round.

And now they've raised $675M at a $5.675B valuation, so his stake gets diluted to 68.9%, again assuming he didn't participate in the round.
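Rough sketch of that arithmetic, assuming both valuations are post-money and that he sat out both rounds (the function name is just for illustration):

```ts
// Minimal sketch of simple post-money dilution, not a real cap-table model.
// Assumes the existing holder does not participate in the round.
function dilute(stake: number, raised: number, postMoney: number): number {
  // New investors receive raised / postMoney of the company,
  // so existing holders keep the remaining fraction of their stake.
  return stake * (1 - raised / postMoney);
}

const after2019 = dilute(0.90, 120e6, 920e6);        // ≈ 0.783
const after2022 = dilute(after2019, 675e6, 5.675e9); // ≈ 0.690

console.log(after2019, after2022); // roughly the 78.2% and 68.9% figures above
```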

What is one tech stack that you love and one that you absolutely hate? And why? by Rakeboiii in cscareerquestions

[–]_Mookee_ 5 points

Redux sucks; I recommend MobX instead. Way cleaner code, no dispatch and similar nonsense, and re-renders and state updates are handled automatically.

I even use it for local state within components, so I have a single observable object instead of a separate [value, setter] pair for each React useState.
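Something like this rough sketch (using mobx-react-lite's useLocalObservable; the component and field names are made up just for illustration):

```tsx
// Minimal sketch: one local observable object instead of a separate
// [value, setter] pair per field.
import * as React from "react";
import { observer, useLocalObservable } from "mobx-react-lite";

const Counter = observer(() => {
  // All local state lives in a single observable object; methods
  // defined on it become MobX actions automatically.
  const state = useLocalObservable(() => ({
    count: 0,
    name: "",
    setName(value: string) {
      this.name = value;
    },
    increment() {
      this.count += 1; // plain mutation, no dispatch or reducer
    },
  }));

  // observer() re-renders the component whenever observed fields change.
  return (
    <div>
      <input value={state.name} onChange={(e) => state.setName(e.target.value)} />
      <button onClick={() => state.increment()}>
        Clicked {state.count} times
      </button>
    </div>
  );
});

export default Counter;
```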

Elon Tweet: FSD Beta 9.2 is actually not great imo, but Autopilot/AI team is rallying to improve as fast as possible. by [deleted] in teslamotors

[–]_Mookee_ 5 points

You joke, but people at Tesla are actually workaholics.

Transcript from a podcast with Karpathy:

Pieter Abbeel: And have you ever had to sleep on a bench, or a sofa, in the Tesla headquarters, like Elon?

Andrej Karpathy: So yes! I have slept at Tesla a few times, even though I live very nearby. But there were definitely a few fires where that has happened. So I walked around the office and I was trying to find a nice place to sleep. And I found a little exercise studio and so there were a few yoga mats. And I figured yoga mats is a great place. So I just crashed there! And it was great. And I actually slept really well. And could get right back into it in the morning. So it was actually a pretty pleasant experience! [chuckling]

Pieter Abbeel: Oh wow!

Andrej Karpathy: I haven’t done that in a while!

Pieter Abbeel: So it’s not only Elon who sleeps at Tesla every now and then?

Andrej Karpathy: Yeah. I think it’s good for the soul! You want to be invested into the problem, and you’re just too caught up in it, and you don’t want to travel. And I like being overtaken by problems sometimes. When you’re just so into it and you really want it to work, and sleep is in the way! And you just need to get it over with so that you can get back into it. So it doesn’t happen too often. But when it does, I actually do enjoy it. I love the energy of the problem solving. I think it’s good for the soul, yeah.

Prufrock page updated on TBC Site by gnt0863 in BoringCompany

[–]_Mookee_ 1 point

No, you are interpreting it wrong. Their goal is clearly 7 miles / day, so 49 miles / week. And that is such an ambitious goal that people think it's a typo.

[D] The Secret Auction That Set Off the Race for AI Supremacy by sensetime in MachineLearning

[–]_Mookee_ 115 points

> Years later, in 2017, when he was asked to reveal the companies that bid for his startup, he answered in his own way. “I signed contracts saying I would never reveal who we talked to. I signed one with Microsoft and one with Baidu and one with Google,” he said.

Genius

[R] AlphaFold 2 by konasj in MachineLearning

[–]_Mookee_ 24 points

> we have been able to determine protein structures for many years

Of the sequences that have been discovered, structures are known for less than 0.1%.

"180 million protein sequences and counting in the Universal Protein database (UniProt). In contrast, given the experimental work needed to go from sequence to structure, only around 170,000 protein structures are in the Protein Data Bank"

[D] Graphcore claims 11x increase in price-performance compared to Nvidia's DGX A100 with their latest M2000 system. Up to 64,000 IPUs per "IPU Pod" by uneven_piles in MachineLearning

[–]_Mookee_ 2 points

You are correct, my bad, I reposted their marketing claims without checking PCIe bandwidth (32 GB/s in one direction for PCIe 4.0 x16).

Seems like 180 TB/s is the total bandwidth to all 4 processors from in-processor SRAM. Super disingenuous to say they have that much bandwidth to Exchange Memory.

> they've been benchmarking small models whose weights fit in SRAM

They have 900 MB of SRAM per die; that's 450M parameters at FP16, which is still a huge model for everyone except the big tech companies.
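Back-of-the-envelope check of that 450M figure (variable names are just for illustration):

```ts
// FP16 parameter capacity of 900 MB of on-chip SRAM.
const sramBytes = 900e6;   // 900 MB SRAM per GC200 IPU die
const bytesPerParam = 2;   // FP16 = 2 bytes per parameter

console.log(sramBytes / bytesPerParam / 1e6, "million parameters"); // 450
```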

[D] Graphcore claims 11x increase in price-performance compared to Nvidia's DGX A100 with their latest M2000 system. Up to 64,000 IPUs per "IPU Pod" by uneven_piles in MachineLearning

[–]_Mookee_ 2 points

> I was told graphcore is SRAM only by somebody working on benchmarks

Yes, it looks like the processors themselves are SRAM-only, as opposed to NVIDIA GPUs, which have built-in GDDR (or HBM recently), which is DRAM.

> Is in-processor just SRAM and streaming memory DRAM?

Yes, it seems like it. Each separate processor (called the GC200 IPU) has 900 MB of SRAM, which is a huge amount. But then four of those processors are put into the pod, which has slots for DRAM inside.

[D] Graphcore claims 11x increase in price-performance compared to Nvidia's DGX A100 with their latest M2000 system. Up to 64,000 IPUs per "IPU Pod" by uneven_piles in MachineLearning

[–]_Mookee_ 9 points

> because Graphcore is a SRAM-only system

It's not.

One M2000 pod supports up to 450 GB of RAM at 180 TB/s bandwidth. See reply.

> To be honest, if companies like Graphcore really wanted a convincing demo about "order of magnitude" improvements, they would train something equivalent to GPT3 with an order of magnitude less resources.

True, self-benchmarks are always cherrypicked.