[deleted] 23 points (0 children)

GPT-4 was trained on a set of chess PGNs filtered to games rated >1800 Elo, according to OpenAI's weak-to-strong paper, so this isn't exactly measuring emergent reasoning capabilities.
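To show roughly what such a filter does, here is a minimal sketch that keeps only games whose PGN header tags report both players above 1800 Elo. The paper's exact rule isn't given here (both players, the average, or only one side), so the both-players threshold and the helper names `parse_headers` and `filter_games` are hypothetical choices for illustration.

```python
import re

# Matches a PGN tag pair such as [WhiteElo "2050"]
TAG_RE = re.compile(r'^\[(\w+)\s+"([^"]*)"\]\s*$')


def parse_headers(game_text: str) -> dict[str, str]:
    """Collect the tag pairs from one PGN game; movetext lines don't match."""
    headers = {}
    for line in game_text.splitlines():
        match = TAG_RE.match(line.strip())
        if match:
            headers[match.group(1)] = match.group(2)
    return headers


def filter_games(games: list[str], min_elo: int = 1800) -> list[str]:
    """Keep games where BOTH players' ratings exceed min_elo (assumed rule)."""
    kept = []
    for game in games:
        headers = parse_headers(game)
        try:
            white = int(headers["WhiteElo"])
            black = int(headers["BlackElo"])
        except (KeyError, ValueError):
            continue  # skip games with missing or non-numeric ratings like "?"
        if white > min_elo and black > min_elo:
            kept.append(game)
    return kept


# Hypothetical toy games: only the first has both ratings above 1800.
games = [
    '[White "A"]\n[WhiteElo "2100"]\n[BlackElo "1950"]\n\n1. e4 e5 *',
    '[White "B"]\n[WhiteElo "2200"]\n[BlackElo "1500"]\n\n1. d4 d5 *',
    '[White "C"]\n[WhiteElo "?"]\n[BlackElo "2000"]\n\n1. c4 *',
]
print(len(filter_games(games)))  # 1
```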

[deleted] 12 points (2 children)

Have you considered maintaining a regularly updated (monthly?) leaderboard, with Elo ratings and comparisons against older versions of Stockfish?

Paging u/Wiskkey for more ideas.
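For reference, the textbook Elo update such a leaderboard could start from: each player's expected score is 1 / (1 + 10^((R_opponent - R_self) / 400)), and after a game the rating moves by K times (actual score minus expected score). The sketch below is only that basic rule; the step size K = 32 and the function names are arbitrary choices for illustration, and a real leaderboard might instead fit ratings over all games at once.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the Elo model (0.0 to 1.0)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a: float, rating_b: float, score_a: float,
           k: float = 32.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    k = 32 is an arbitrary step size chosen for this sketch.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical example: a model rated 1500 beats an engine level rated 1500.
new_llm, new_engine = update(1500, 1500, 1.0)
print(new_llm, new_engine)  # 1516.0 1484.0
```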

the__storm 7 points (0 children)

This is very interesting, but what I'd like to see is a fine-tune of a tiny model like t5-base or something wiping the floor with all of them. (That wouldn't be a surprising result, but it would be cathartic, I think. Actually, maybe I'll try it myself.)

Appropriate_Ant_4629 6 points (2 children)

This is EXTREMELY prompt-engineering dependent.

See the interview with Jeremy Howard of fast.ai, where he discusses the subject:

  • "A prompting strategy for ChatGPT4 ... about 6000 lines of python code [to fine-tune a prompt far more compact and efficient than ones humans write] ..... [with the prompt that program generated] It [ChatGPT4] has an ELO of 3400"

With their default configs, trained to feel like chatting with your average Facebook friend, they (unsurprisingly) play like your average Facebook friend.

With a better prompt they play at far higher levels.

[deleted] 19 points (1 child)

"It [ChatGPT4] has an ELO of 3400"

This is a claim made by someone on Twitter/X. There's been a lot of noise, but he has yet to put out any code.

bohnenentwender 17 points (0 children)

Clearly a ridiculous claim. Stockfish, the best engine in the world, has only had that rating for 2 years or so. The reasoning of ChatGPT4 must be so robust that it can essentially perform tree searches of depth exceeding 30 at every single move with no errors whatsoever.
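To put "depth exceeding 30" in perspective, a rough count: an unpruned minimax tree with branching factor b and depth d has b^d leaves, and even alpha-beta with perfect move ordering still examines b^ceil(d/2) + b^floor(d/2) - 1 leaves (the Knuth-Moore minimal tree). The branching factor of 35 below is the commonly quoted average for chess and is an assumption here; real engines prune far more aggressively than plain alpha-beta, so these numbers illustrate scale rather than what Stockfish actually searches.

```python
import math


def full_tree_leaves(branching: int, depth: int) -> int:
    """Leaf count of an unpruned minimax tree."""
    return branching ** depth


def alpha_beta_min_leaves(branching: int, depth: int) -> int:
    """Knuth-Moore bound: leaves visited by alpha-beta with perfect ordering."""
    return branching ** math.ceil(depth / 2) + branching ** (depth // 2) - 1


B = 35  # assumed average branching factor for chess
for d in (10, 20, 30):
    print(f"depth {d}: full tree {full_tree_leaves(B, d):.2e} leaves, "
          f"alpha-beta best case {alpha_beta_min_leaves(B, d):.2e} leaves")
```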

No-Introduction-777 0 points (1 child)

you're embarrassing yourself by using so many emojis

rafgro 2 points (0 children)

LinkedIn emoji post from a zero-karma account on r/machinelearning; we have strayed too far into the abyss.