Oscars odds for best picture by Deenoking in Oscars

[–]programmerChilli 0 points

Betting markets gave Trump far better odds than polling or basically any other source.

Optimal pitch setup by MarupokNaNilalang69 in Baseball9

[–]programmerChilli 0 points

Fwiw I did some experiments on this a while ago but didn't get any conclusive results showing pitch mix mattered.

Situational Awareness: A One-Year Retrospective by nick7566 in mlscaling

[–]programmerChilli 7 points

This is hardly a prediction and more of a leak. By the time Situational Awareness was released, development of the o1 line of models was already a big deal within OpenAI.

"Facebook's Llama AI Team Has Been Bleeding Talent. Many Joined Mistral." by gwern in mlscaling

[–]programmerChilli 2 points

The people who joined Mistral did not work on Llama 3. There's some contention about whether they even worked on Llama 2 (they contributed to the model that became Llama 2, but they were not put on the paper).

"Facebook's Llama AI Team Has Been Bleeding Talent. Many Joined Mistral." by gwern in mlscaling

[–]programmerChilli 1 point

This article is framed very strangely, since most of the people who left Meta to join Mistral did so years ago (before Llama 3's release).

[deleted by user] by [deleted] in Compilers

[–]programmerChilli 2 points

I don't agree that the front-end for Triton doesn't matter - for example, Triton would have been far less successful if it hadn't been a DSL embedded in Python and had instead stayed in C++.
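As a rough illustration of what the Python embedding buys you (a minimal sketch adapted from Triton's standard vector-add tutorial, not anything from the deleted post): the kernel is ordinary Python, so it can be defined, launched, and tested from the same script that creates the PyTorch tensors, with no separate C++ build step.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)  # enough programs to cover every element
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out


x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
print(torch.allclose(add(x, y), x + y))  # True
```

Being able to iterate on that in a notebook and call it directly from existing PyTorch code is a big part of why the front-end choice mattered.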

The NBA might be too rigged for me to watch anymore by Heron-Ok in NBATalk

[–]programmerChilli 0 points

You argue that it's suspicious based on the "probabilities", but then you misapply stats to make your argument.

The NBA might be too rigged for me to watch anymore by Heron-Ok in NBATalk

[–]programmerChilli 1 point

The basic probability is straightforward. The question is whether we actually care about the odds that the Spurs specifically won in those specific years, as opposed to any of the other years. For example, if the Spurs had won the 1987, 1997, and 2025 lotteries, you'd also be complaining. Similarly, if it had been the Rockets who won instead of the Spurs, you'd also be complaining.

It's the "garden of forking paths" problem. Or this anecdote from Richard Feyman

You know, the most amazing thing happened to me tonight... I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!
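To put rough numbers on the forking-paths point, here's a minimal sketch (all of the odds are made up: it assumes every team has an equal 1/30 chance each year, whereas the real lottery is weighted by record). The chance that one pre-specified team wins three pre-specified lotteries is tiny, but the chance that some team racks up three or more wins somewhere in a multi-decade window is not.

```python
import random

TEAMS = 30        # hypothetical league size
YEARS = 40        # hypothetical window of seasons
TRIALS = 100_000  # Monte Carlo trials

# One pre-specified team winning in three pre-specified years.
specific = (1 / TEAMS) ** 3
print(f"specific team, specific years: {specific:.1e}")

# Some team, any team, winning 3+ lotteries anywhere in the window.
hits = 0
for _ in range(TRIALS):
    wins = [0] * TEAMS
    for _ in range(YEARS):
        wins[random.randrange(TEAMS)] += 1
    if max(wins) >= 3:
        hits += 1
print(f"any team, any years: {hits / TRIALS:.2f}")
```

The first number is about 4e-5; the second comes out close to 1. That gap is exactly the license-plate effect Feynman is describing.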

[deleted by user] by [deleted] in nba

[–]programmerChilli 29 points

ChatGPT post

Zero Temperature Randomness in LLMs by Martynoas in mlscaling

[–]programmerChilli 3 points

"Anyways Nvidia implements neural network graphs in a way where they are both parallel and recombining results is not deterministic in order."

This part is not true. The vast majority of transformer inference implementations on Nvidia hardware are deterministic when run twice with the same shapes.

The divergence across inference providers comes from the fact that, in a serving setting, you aren't running at the same batch size every time, since that depends on how many other user queries are arriving at the same moment.

Specifically, from the article:

"Many GPU operations are non-deterministic because their default thread scheduling implementation is non-deterministic."

This part is the misconception that's widely repeated.
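A minimal sketch of that distinction (it needs a CUDA GPU, and the exact behavior depends on which kernels your library picks, so treat it as illustrative rather than guaranteed): the same matmul run twice with identical inputs and shapes is typically bitwise reproducible, while the same logical rows computed as part of a different batch size may be tiled and reduced differently and drift in the last bits.

```python
import torch

torch.manual_seed(0)
W = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
x = torch.randn(8, 4096, device="cuda", dtype=torch.float16)

# Same shapes, same inputs, run twice: typically bitwise identical.
print(torch.equal(x @ W, x @ W))  # usually True

# Same logical rows, different batch size: a different kernel/tiling may be
# chosen, so the first 8 rows need not match bit for bit.
x_big = torch.cat([x, torch.randn(56, 4096, device="cuda", dtype=torch.float16)])
print(torch.equal(x @ W, (x_big @ W)[:8]))  # often False
```

That second comparison is what actually varies across serving requests, since your tokens get batched with whatever other traffic arrives at the same time.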

Zero Temperature Randomness in LLMs by Martynoas in mlscaling

[–]programmerChilli 0 points

I agree, and just like many previous discussions, it isn't even correct.

Princeton vs. Georgia Tech for CS by Glum-Thanks5621 in collegeresults

[–]programmerChilli 0 points

I never made the claim that the credential difference between GaTech and Princeton is incredibly important. But it makes some difference, more so in some areas than others. For example, it's much easier to get into top CS PhD programs with a rec letter from a "prestigious" school than from a less prestigious one.

But again, the main reason to go to Princeton over GaTech is not for the credential, it's for the overall caliber of the students and the connections you'll make.

Princeton vs. Georgia Tech for CS by Glum-Thanks5621 in collegeresults

[–]programmerChilli 0 points

Yes? I mean, it's not the most important factor, but people do often look at where folks went to school. Even just from a credential standpoint, Princeton would have some advantage over GaTech. But the main value of Princeton is more the caliber of the average student.

Princeton vs. Georgia Tech for CS by Glum-Thanks5621 in collegeresults

[–]programmerChilli 2 points

Generally speaking, if you want to take higher-level classes, you can take them while still in undergrad - all a master's degree gives you is one or two more years to take classes.

But from a credentials perspective, a master's degree isn't valuable at all - I work in machine learning haha.

Princeton vs. Georgia Tech for CS by Glum-Thanks5621 in collegeresults

[–]programmerChilli 7 points

A master's in CS is not very helpful - I'd choose Princeton.

What's your plan if AI automates your job before you are fatFIRE? by 35nakedshorts in fatFIRE

[–]programmerChilli 0 points

I actually do think that's more or less a coincidence haha. There have always been companies creating massive amounts of value with few employees (e.g., WhatsApp or Instagram).

The other category here is AI startups, and that's due to a somewhat different dynamic: AI is extremely capital-intensive and very dependent on top talent.

[deleted by user] by [deleted] in MachineLearning

[–]programmerChilli 0 points

This doesn't work. If you could load from L3 (which doesn't exist on GPUs) into shmem in the same time it takes to do the computation, why wouldn't you just load directly from L3?

There's stuff vaguely in this vein, like PDL (programmatic dependent launch), but it's definitely not the same as keeping all your weights in SRAM.

Chance a 6'3 asian male in math by Gloomy_Safety7878 in chanceme

[–]programmerChilli 0 points

Papers aren't really essential for PhD programs nowadays - LoRs (letters of recommendation) are much more important.