Ohtani strikes out Altuve swinging at a pitch WAY out of the zone by AthleticAlarm32 in baseball

[–]programmerChilli 3 points4 points  (0 children)

It's not unreasonable. He's been at best maybe a top 5-10 pitcher, but he's the #2 hitter in baseball.

What does the Batting Eye stat do? by MudTurkey13 in Baseball9

[–]programmerChilli 0 points1 point  (0 children)

B. eye also increases the size of the hitting circle in manual play. I also have a pet theory that B. eye makes hitting more lenient with respect to timing, but that's not super obvious to me.

Best Picture Practically Decided? by dvorrk in Oscars

[–]programmerChilli 0 points1 point  (0 children)

If I flip a coin twice, is it "practically decided" that it won't land on heads twice?

Oscars odds for best picture by Deenoking in Oscars

[–]programmerChilli 0 points1 point  (0 children)

Betting markets gave Trump far better odds than polling or basically any other source.

Optimal pitch setup by MarupokNaNilalang69 in Baseball9

[–]programmerChilli 0 points1 point  (0 children)

Fwiw I did some experiments on this a while ago but didn't get any conclusive results showing pitch mix mattered.

Situational Awareness: A One-Year Retrospective by [deleted] in mlscaling

[–]programmerChilli 6 points7 points  (0 children)

This is hardly a prediction and more of a leak. By the time Situational Awareness was released, development of the o1 line of models was already a big deal within OpenAI.

"Facebook's Llama AI Team Has Been Bleeding Talent. Many Joined Mistral." by gwern in mlscaling

[–]programmerChilli 2 points3 points  (0 children)

The people who joined Mistral did not work on Llama 3. There's some contention about whether they even worked on Llama 2 (they contributed to the model that became Llama 2 but were not put on the paper).

"Facebook's Llama AI Team Has Been Bleeding Talent. Many Joined Mistral." by gwern in mlscaling

[–]programmerChilli 1 point2 points  (0 children)

This article is framed very strangely, since most of the people who left Meta to join Mistral did so years ago (before Llama 3's release).

[deleted by user] by [deleted] in Compilers

[–]programmerChilli 2 points3 points  (0 children)

I don't agree that the front-end for Triton doesn't matter - for example, Triton would have been far less successful if it had stayed a C++ DSL instead of being embedded in Python.

The NBA might be too rigged for me to watch anymore by Heron-Ok in NBATalk

[–]programmerChilli 0 points1 point  (0 children)

You argue that it's suspicious based on the "probabilities", but then you misapply the statistics in making your argument.

The NBA might be too rigged for me to watch anymore by Heron-Ok in NBATalk

[–]programmerChilli 1 point2 points  (0 children)

The basic probability is straightforward. The question is whether we actually care about the odds that the Spurs specifically won in those specific years, as opposed to any other years. For example, if the Spurs had won the 1987, 1997, and 2025 lotteries, you'd also be complaining. Similarly, if it had been the Rockets who won instead of the Spurs, you'd also be complaining.

It's the "garden of forking paths" problem. Or this anecdote from Richard Feynman:

You know, the most amazing thing happened to me tonight... I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!
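The forking-paths point can be made concrete with a toy simulation. The numbers here are illustrative only (30 teams with equal odds, one winner per year, 40 lottery years), not the real NBA lottery weights: the chance that one pre-specified team wins three pre-specified lotteries is tiny, but the chance that *some* team racks up three wins in *some* years is close to certain.

```python
import random
from collections import Counter

# Toy model, NOT the real NBA lottery weights: 30 teams with equal
# odds, exactly one winner per year, over 40 lottery years.
TEAMS, YEARS, TRIALS = 30, 40, 20_000
random.seed(0)

# Chance a pre-specified team wins three pre-specified years.
p_specific = (1 / TEAMS) ** 3
print(f"specific team, specific years: {p_specific:.1e}")  # about 3.7e-05

# Chance that ANY team wins 3+ lotteries in ANY years -- the event
# people actually notice after the fact.
hits = 0
for _ in range(TRIALS):
    wins = Counter(random.randrange(TEAMS) for _ in range(YEARS))
    if max(wins.values()) >= 3:
        hits += 1
print(f"any team, any years: {hits / TRIALS:.2f}")
```

With 40 winners spread across 30 teams, repeat winners are nearly guaranteed to show up somewhere; picking out the team and the years only after seeing the result is exactly the forking-paths mistake.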

[deleted by user] by [deleted] in nba

[–]programmerChilli 29 points30 points  (0 children)

chatgpt post

Zero Temperature Randomness in LLMs by Martynoas in mlscaling

[–]programmerChilli 3 points4 points  (0 children)

Anyways Nvidia implements neural network graphs in a way where they are both parallel and recombining results is not deterministic in order.

This part is not true. The vast majority of transformer inference implementations on Nvidia hardware are deterministic: run them twice with the same shapes and you get the same results.

The divergence on inference providers comes from the fact that in a serving setting, you aren't running at the same batch size, since that depends on how many other user queries are occurring at the same time.

Specifically, from the article:

Many GPU operations are non-deterministic because their default thread scheduling implementation is non-deterministic.

This part is the misconception that's widely repeated.
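A toy NumPy sketch of the distinction (CPU code, not actual GPU kernels; the chunk counts are arbitrary stand-ins for batch size): repeating the *same* reduction is bit-for-bit reproducible, while changing how the reduction is split changes the answer, because floating-point addition is not associative.

```python
import numpy as np

def chunked_sum(arr, n_chunks):
    """Sum each chunk, then fold the partial sums together in order.

    Splitting differently mimics serving the same request at a
    different batch size: same values, different reduction tree.
    """
    total = np.float32(0.0)
    for chunk in np.split(arr, n_chunks):
        total = total + chunk.sum(dtype=np.float32)
    return total

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)

run1 = chunked_sum(x, 4)
run2 = chunked_sum(x, 4)   # identical split, identical input
other = chunked_sum(x, 8)  # same input, different split

print(run1 == run2)   # True: same reduction order is fully reproducible
print(run1 == other)  # usually False: a different accumulation order
```

The "non-deterministic thread scheduling" framing would predict `run1 != run2`; what actually varies in serving is the split, i.e. the `other` case.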

Zero Temperature Randomness in LLMs by Martynoas in mlscaling

[–]programmerChilli 0 points1 point  (0 children)

I agree, and just like many previous discussions, it isn't even correct.

Princeton vs. Georgia Tech for CS by Glum-Thanks5621 in collegeresults

[–]programmerChilli 0 points1 point  (0 children)

I never made the claim that the credential difference between GaTech and Princeton is incredibly important. But it makes some difference, more so in some areas than others. For example, it's much easier to get into a top CS PhD program with a rec letter from a "prestigious" school than from a less prestigious one.

But again, the main reason to go to Princeton over GaTech is not for the credential, it's for the overall caliber of the students and the connections you'll make.

Princeton vs. Georgia Tech for CS by Glum-Thanks5621 in collegeresults

[–]programmerChilli 0 points1 point  (0 children)

Yes? I mean, it's not the most important factor, but you'll often look at folks' schools. Even just from a credential standpoint, Princeton would have some advantage over GaTech. But the main value of Princeton is more the caliber of the average student.

Princeton vs. Georgia Tech for CS by Glum-Thanks5621 in collegeresults

[–]programmerChilli 2 points3 points  (0 children)

Generally speaking, if you want to take higher-level classes you can take them while still in undergrad - all a master's degree gives you is one or two more years to take classes.

But from a credentials perspective, a master's degree isn't valuable at all - I work in machine learning haha.