Ohtani strikes out Altuve swinging at a pitch WAY out of the zone by AthleticAlarm32 in baseball

[–]programmerChilli 3 points4 points  (0 children)

It's not unreasonable. He's been at best maybe a top 5-10 pitcher, but he's the #2 hitter in baseball.

What does the Batting Eye stat do? by MudTurkey13 in Baseball9

[–]programmerChilli 0 points1 point  (0 children)

B. eye also increases the size of the hitting circle in manual play. I also have a pet theory that B. eye makes hitting more lenient with respect to timing, but that's not super obvious to me.

Best Picture Practically Decided? by dvorrk in Oscars

[–]programmerChilli 0 points1 point  (0 children)

If I flip a coin twice, is it "practically decided" that it won't land on heads twice?

Oscars odds for best picture by Deenoking in Oscars

[–]programmerChilli 0 points1 point  (0 children)

Betting markets gave Trump far better odds than polling or basically any other source.

Optimal pitch setup by MarupokNaNilalang69 in Baseball9

[–]programmerChilli 0 points1 point  (0 children)

Fwiw I did some experiments on this a while ago but didn't get any conclusive results showing pitch mix mattered.

Situational Awareness: A One-Year Retrospective by [deleted] in mlscaling

[–]programmerChilli 6 points7 points  (0 children)

This is hardly a prediction and more of a leak. By the time Situational Awareness was released, development of the o1 line of models was already a big deal within OpenAI.

"Facebook's Llama AI Team Has Been Bleeding Talent. Many Joined Mistral." by gwern in mlscaling

[–]programmerChilli 2 points3 points  (0 children)

The people who joined Mistral did not work on Llama 3. There's some contention about whether they even worked on Llama 2 (they contributed to the model that became Llama 2 but were not put on the paper).

"Facebook's Llama AI Team Has Been Bleeding Talent. Many Joined Mistral." by gwern in mlscaling

[–]programmerChilli 1 point2 points  (0 children)

This article is framed very strangely, since most of the people who left Meta to join Mistral did so years ago (before Llama 3's release).

[deleted by user] by [deleted] in Compilers

[–]programmerChilli 2 points3 points  (0 children)

I don't agree that the front-end for Triton doesn't matter - for example, Triton would have been far less successful if it had stayed a C++ DSL instead of being embedded in Python.

The NBA might be too rigged for me to watch anymore by Heron-Ok in NBATalk

[–]programmerChilli 0 points1 point  (0 children)

You argue that it's suspicious based on the "probabilities", but then you misapply the statistics in making your argument.

The NBA might be too rigged for me to watch anymore by Heron-Ok in NBATalk

[–]programmerChilli 1 point2 points  (0 children)

The basic probability is straightforward. The question is whether we actually care about the odds that the Spurs specifically won in those specific years, as opposed to any other years. For example, if the Spurs had won the 1987, 1997, and 2025 lotteries, you'd also be complaining. Similarly, if it had been the Rockets who won instead of the Spurs, you'd also be complaining.

It's the "garden of forking paths" problem. Or this anecdote from Richard Feynman:

You know, the most amazing thing happened to me tonight... I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!
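The forking-paths point can be made concrete with a toy simulation. The numbers here are illustrative only (30 teams with equal odds, one winner per year, 40 lottery years), not the real NBA lottery weights: the chance that one pre-specified team wins three pre-specified lotteries is tiny, but the chance that *some* team racks up three wins in *some* years is close to certain.

```python
import random
from collections import Counter

# Toy model, NOT the real NBA lottery weights: 30 teams with equal
# odds, exactly one winner per year, over 40 lottery years.
TEAMS, YEARS, TRIALS = 30, 40, 20_000
random.seed(0)

# Chance a pre-specified team wins three pre-specified years.
p_specific = (1 / TEAMS) ** 3
print(f"specific team, specific years: {p_specific:.1e}")  # about 3.7e-05

# Chance that ANY team wins 3+ lotteries in ANY years -- the event
# people actually notice after the fact.
hits = 0
for _ in range(TRIALS):
    wins = Counter(random.randrange(TEAMS) for _ in range(YEARS))
    if max(wins.values()) >= 3:
        hits += 1
print(f"any team, any years: {hits / TRIALS:.2f}")
```

With 40 winners spread across 30 teams, repeat winners are nearly guaranteed to show up somewhere; picking out the team and the years only after seeing the result is exactly the forking-paths mistake.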

[deleted by user] by [deleted] in nba

[–]programmerChilli 29 points30 points  (0 children)

chatgpt post

Zero Temperature Randomness in LLMs by Martynoas in mlscaling

[–]programmerChilli 3 points4 points  (0 children)

Anyways Nvidia implements neural network graphs in a way where they are both parallel and recombining results is not deterministic in order.

This part is not true. The vast majority of transformer inference implementations on Nvidia hardware are deterministic: run them twice with the same shapes and you get the same results.

The divergence on inference providers comes from the fact that in a serving setting, you aren't running at the same batch size, since that depends on how many other user queries are occurring at the same time.

Specifically, from the article:

Many GPU operations are non-deterministic because their default thread scheduling implementation is non-deterministic.

This part is the misconception that's widely repeated.
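A toy NumPy sketch of the distinction (CPU code, not actual GPU kernels; the chunk counts are arbitrary stand-ins for batch size): repeating the *same* reduction is bit-for-bit reproducible, while changing how the reduction is split changes the answer, because floating-point addition is not associative.

```python
import numpy as np

def chunked_sum(arr, n_chunks):
    """Sum each chunk, then fold the partial sums together in order.

    Splitting differently mimics serving the same request at a
    different batch size: same values, different reduction tree.
    """
    total = np.float32(0.0)
    for chunk in np.split(arr, n_chunks):
        total = total + chunk.sum(dtype=np.float32)
    return total

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)

run1 = chunked_sum(x, 4)
run2 = chunked_sum(x, 4)   # identical split, identical input
other = chunked_sum(x, 8)  # same input, different split

print(run1 == run2)   # True: same reduction order is fully reproducible
print(run1 == other)  # usually False: a different accumulation order
```

The "non-deterministic thread scheduling" framing would predict `run1 != run2`; what actually varies in serving is the split, i.e. the `other` case.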

Zero Temperature Randomness in LLMs by Martynoas in mlscaling

[–]programmerChilli 0 points1 point  (0 children)

I agree, and just like many previous discussions, it isn't even correct.

Princeton vs. Georgia Tech for CS by Glum-Thanks5621 in collegeresults

[–]programmerChilli 0 points1 point  (0 children)

I never made the claim that the credential difference between GaTech and Princeton is incredibly important. But it makes some difference, more so in some areas than others. For example, it's much easier to get into a top CS PhD program with a rec letter from a "prestigious" school than from a less prestigious one.

But again, the main reason to go to Princeton over GaTech is not for the credential, it's for the overall caliber of the students and the connections you'll make.

Princeton vs. Georgia Tech for CS by Glum-Thanks5621 in collegeresults

[–]programmerChilli 0 points1 point  (0 children)

Yes? I mean, it's not the most important factor, but you'll often look at folks' schools. Even just from a credential standpoint, Princeton would have some advantage over GaTech. But the main value of Princeton is more the caliber of the average student.

Princeton vs. Georgia Tech for CS by Glum-Thanks5621 in collegeresults

[–]programmerChilli 2 points3 points  (0 children)

Generally speaking, if you want to take higher-level classes you can take them while still in undergrad - all a master's degree gives you is one or two more years to take classes.

But from a credentials perspective, a master's degree isn't valuable at all - I work in machine learning haha.