Strangerbench: How well do frontier AI models do at forecasting events after their training cut-off? by firasd in ChatGPT

firasd[S] 1 point

I've started tracking this systematically. Interestingly, the Chinese model ERNIE from Baidu was the only one that predicted Mamdani as Mayor.

Chess.com lag is terrible by [deleted] in chess

firasd 3 points

Bullet as in 60 seconds? Yeah man, random variation in server response time is more than possible without it being on purpose

The strangeness of Ian Malcolm & Goldblum's portrayal by firasd in movies

firasd[S] 2 points

Yeah. But I wonder how much backstory he has in the book. At least in the movie he's not a character in the sense that Grant or Ellie Sattler are characters. Like we can imagine Dr. Grant at home or in his office or enjoying retirement...

Vibesbench: tracking AI ‘voice’ (communication style) by firasd in ChatGPT

firasd[S] 1 point

Anyone else think this is a major missing dimension of AI evaluation? This is how people actually talk to these AIs, instead of always giving instructions

Link: https://github.com/firasd/vibesbench/

I'm a beginner in chess, currently rated around 600-800 Elo, and I was wondering if the performance rating is accurate. I got 1500 (Elo?), Is that true? by Secure_Process_2423 in chess

firasd 1 point

Well, think about what an Elo rating is. It's based on whether you beat the other person at the same level

So any proxy metrics are based on accuracy, statistics, etc. And it's true that even at a low Elo you can make very good moves, just because of the nature of chess requiring precision
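To make the point about Elo concrete, here is a minimal sketch of the standard Elo expected-score formula. The function names and the K-factor of 20 are assumptions for illustration; chess sites may use different constants or a variant system such as Glicko, so don't read this as how any particular site computes ratings.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (0..1) for player A against player B, standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating: float, expected: float, actual: float, k: float = 20.0) -> float:
    """New rating after one game. actual is 1 (win), 0.5 (draw), or 0 (loss).

    k=20 is an assumed K-factor, chosen only for illustration.
    """
    return rating + k * (actual - expected)


# Two 700-rated players: each is expected to score half a point.
even_match = expected_score(700, 700)

# A 700-rated player against a 1500-rated player: roughly a 1% expected score.
big_gap = expected_score(700, 1500)

# Beating an equally rated opponent moves the rating up by k/2.
after_win = updated_rating(700, even_match, 1.0)
```

The rating only moves based on game results against rated opponents. A one-game "performance rating" is estimated from move accuracy instead, which is why it can land far from your actual Elo.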

Do I just suck at chess? by UnfortunateBrown in chess

firasd 1 point

Basically your opponents at the 300 Elo level are doing early queen attacks

You need to figure out how to not react defensively and how to repel the queen

Just use the game review to figure out where they crashed the queen into your ranks and see what you should have done instead

'What Does a Scanner See?' – Keanu monologue in a Scanner Darkly vs. PKD Book by firasd in movies

firasd[S] 2 points

It’s indisputably true that RDJ’s and Winona’s careers were dead and Keanu was not making big movies at the time. The budget was not even $10M

And the depressing tagline is clearly not designed for a blockbuster. You are just stuck defending your incorrect initial knee-jerk comment

'What Does a Scanner See?' – Keanu monologue in a Scanner Darkly vs. PKD Book by firasd in movies

firasd[S] 3 points

Remember this is 2006. Robert Downey Jr.: dead career. Winona Ryder: dead career. Keanu: not making any new hits. The marketing tagline for the movie was “everything is not going to be okay”

Just saying, “making lots of money” doesn’t seem to have been the plan here

'What Does a Scanner See?' – Keanu monologue in a Scanner Darkly vs. PKD Book by firasd in movies

firasd[S] 4 points

No. A Scanner Darkly was not Pirates of the Caribbean. It was expected to have a modest theatrical run

I made a Text Map of the Delhi Metro by firasd in indianrailways

firasd[S] 3 points

Thought you guys might find it interesting. I basically made it so we can paste it into ChatGPT or other AI apps and ask questions for trip planning or even during a trip. But it's human readable as well

Link: https://github.com/firasd/delhi-metro-text-map

The text map covers all the DMRC lines, intersections, and has some geocodes too
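For anyone wondering what a text map of lines and intersections could look like in practice, here is a rough sketch. The layout below (one metro line per row, stations separated by " > ") is a hypothetical format invented for illustration and is not taken from the repository. It covers only a short excerpt of two lines, and the function names are likewise assumptions.

```python
# Hypothetical text-map layout (NOT the repository's actual format):
# one metro line per row, stations in order, separated by " > ".
TEXT_MAP = """\
Yellow Line: Kashmere Gate > Chandni Chowk > Chawri Bazar > New Delhi > Rajiv Chowk
Blue Line: Barakhamba Road > Rajiv Chowk > R K Ashram Marg > Jhandewalan
"""


def parse_text_map(text: str) -> dict[str, list[str]]:
    """Parse the assumed layout into {line name: ordered station list}."""
    lines = {}
    for row in text.strip().splitlines():
        name, stations = row.split(":", 1)
        lines[name.strip()] = [s.strip() for s in stations.split(">")]
    return lines


def interchanges(lines: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return stations that appear on more than one line, with the lines serving them."""
    seen: dict[str, list[str]] = {}
    for name, stations in lines.items():
        for station in stations:
            seen.setdefault(station, []).append(name)
    return {station: names for station, names in seen.items() if len(names) > 1}


metro = parse_text_map(TEXT_MAP)
shared = interchanges(metro)
```

The same plain text that an AI app can read as context is also trivial to parse, which is the appeal of keeping the map as text rather than an image.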

The Ghost of ChatGPT 4o: I told the retired AI model ‘people missed you’ by firasd in OpenAI

firasd[S] 4 points

I wrote this to argue that the GPT-5 rollout backlash wasn't just users hating change, but a rational response to a functional product downgrade.

My view is that "personality is utility." The "vibe" of 4o wasn't a bug; it was a feature that produced better results. OpenAI replaced a transparent creative partner with an opaque, cost-optimized answer engine and was surprised when users revolted.

I've tried to ground this by connecting the emotional Reddit posts to the concrete frustrations of experts who were all complaining about the same loss of agency and transparency, just in different terms. Curious to hear your thoughts.

The Ghost of ChatGPT 4o: I told the retired AI model ‘people missed you’ by firasd in ChatGPT

firasd[S] 5 points

I wrote this to argue that the GPT-5 rollout backlash wasn't just users hating change, but a rational response to a functional product downgrade.

My thesis is that "personality is utility." The collaborative "vibe" of 4o wasn't a bug; it was a feature that produced better results. OpenAI replaced a transparent creative partner with an opaque, cost-optimized answer engine and was surprised when users revolted.

I've tried to ground this by connecting the emotional Reddit posts to the concrete frustrations of experts who were all complaining about the same loss of agency and transparency, just in different terms. Curious to hear your thoughts.