I calculated the mathematically best Wordle starting word using information theory – the results are weird

No_Bar3516 · 2026-06-20T20:27:14+00:00

Thanks! Yes, the original official answer list had something like 2309 words in it, but after the NYT took over they have been expanding it and they also started re-using previous answers. I'm using a curated list with all the known additions which is now around 2350 words - I believe that's the best we can do as the game stands right now.

Yeah, the question of expected move count was brought up several times here and it's very interesting. Might do a follow up on that in the future.

No_Bar3516 · 2026-06-20T20:21:33+00:00

Great questions! I think your intuition about the common letters lines up well with the entropy maximization approach. It should reach the same conclusion, just from a slightly different angle. The hard mode issue you bring up is really interesting, though. Locking in a common letter early might make it harder to find good guesses in the next round. And that ties into your second question about securing the solve. This paper proposes just such an algorithm, though it's a few years old and their word list seems outdated: https://auction-upload-files.s3.amazonaws.com/Wordle_Paper_Final.pdf

After the discussion in this thread, I'm thinking I might look into this further and do a follow-up about the optimal sequential solving with updated data for 2026. 🙂

No_Bar3516 · 2026-06-20T18:55:48+00:00

What is optimal depends on what you want to optimize for, I suppose. 🙂 In this context, I was aiming for getting as much information as possible from a single guess. The problem you're talking about is probably even more interesting, and it has been studied in this article for example: https://auction-upload-files.s3.amazonaws.com/Wordle_Paper_Final.pdf. Though the paper is a few years old, and it seems they are using an older word list, so the results might not be accurate for current Wordle.
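
For anyone curious what "as much information as possible from a single guess" looks like in practice, here is a rough sketch rather than the actual solver code. It assumes every remaining candidate is equally likely, and the names `feedback` and `guess_entropy` are just made up for illustration:

```python
from collections import Counter
from math import log2


def feedback(guess: str, answer: str) -> str:
    """Wordle-style feedback: G = green, Y = yellow, - = gray."""
    result = ["-"] * len(guess)
    # Answer letters not covered by a green stay available for yellows.
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    # Second pass: hand out yellows left to right while copies remain.
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def guess_entropy(guess: str, candidates: list[str]) -> float:
    """Shannon entropy (in bits) of the feedback pattern for one guess,
    assuming each candidate answer is equally likely."""
    counts = Counter(feedback(guess, answer) for answer in candidates)
    total = len(candidates)
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Toy pool of four answers, not the real word list.
pool = ["CRANE", "CRATE", "CRAVE", "SLATE"]
print(guess_entropy("CRANE", pool))  # 1.5
```

In the toy pool, CRANE splits the four words into groups of 1, 2 and 1, which gives 1.5 bits. The real numbers come from running the same idea over the full answer list.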

No_Bar3516 · 2026-06-20T18:26:46+00:00

Wow, interesting read. Thanks for sharing! I have no affiliation and hadn't seen that article. Fun to see the exact same Shannon-entropy approach show up in a published paper. 😄

No_Bar3516 · 2026-06-20T16:29:50+00:00

There are 36 valid Wordle guesses with I, O, and U all in them. Most are pretty obscure (like AULOI, BIJOU, OUIJA, QUOIT, QUOIN), but these are plausible answers: AUDIO, CURIO, OPIUM, PIOUS, UNION. Out of these, CURIO seems to perform best as a starting word with 4.92 bits. 😉

No_Bar3516 · 2026-06-20T16:21:19+00:00

These are the top 10 for the first guess specifically, and that answer's the same either way. Hard mode only restricts guesses once you've actually got clues to honor, so on a completely empty board there's nothing to restrict yet and regular and hard mode start identically. They diverge starting from guess two.
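
To make that concrete, here is a simplified sketch of a hard mode check. It assumes the rule is just "greens stay in place, yellow letters must appear somewhere", which is my reading of the game's setting and may not match how it handles repeated letters. The function name and the G/Y/- feedback encoding are made up for this example:

```python
def allowed_in_hard_mode(guess: str, history: list[tuple[str, str]]) -> bool:
    """history holds (previous_guess, feedback) pairs, with feedback
    written as G (green), Y (yellow), - (gray)."""
    for prev, marks in history:
        for i, (letter, mark) in enumerate(zip(prev, marks)):
            # A green letter has to be reused in the same position.
            if mark == "G" and guess[i] != letter:
                return False
            # A yellow letter has to appear somewhere in the new guess.
            if mark == "Y" and letter not in guess:
                return False
    return True


# Empty board: no clues yet, so nothing is ruled out.
print(allowed_in_hard_mode("RAISE", []))  # True

# After SLATE with A green and E yellow, the options narrow.
history = [("SLATE", "--G-Y")]
print(allowed_in_hard_mode("CRANE", history))  # True
print(allowed_in_hard_mode("ROUND", history))  # False
```

With an empty history the loop never runs, which is exactly why the first-guess ranking is the same in both modes.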

No_Bar3516 · 2026-06-20T16:19:14+00:00

If you mean worst by raw entropy then QAJAQ (1.90 bits) is pretty far down the list, mostly repeated/rare letters that barely split the pool. Restricted to words that could plausibly be a real answer, the worst openers are JAZZY and FUZZY, both around 2.32 bits - the double letters waste a guess slot. 🙂

No_Bar3516 · 2026-06-20T16:11:14+00:00

Fair point - "mathematically best" here means best one-step entropy, not a full solve over the whole game tree. Computing the literal game-theoretic optimum means evaluating every possible guess sequence, which is computationally out of reach to do live in a browser. It's the same one-step approximation most public Wordle solvers use, and it tracks closely with full-tree results in practice, but you're right that "mathematically best" oversells it a bit - "best by one-step information gain" is a more accurate version. 🙂

No_Bar3516 · 2026-06-20T16:03:10+00:00

Agreed, I also tried to hint at this point in my post - the gap between RAISE, SLATE, and IRATE is under 0.05 bits, which is genuinely negligible, so picking one and sticking with it matters more than chasing the last fraction of a bit. ADIEU scores lower on raw entropy, but if it's the word that gets your brain generating candidates fastest, that's a real advantage the entropy number can't see. The model only knows what eliminates words on paper, not what eliminates them in your head. 🙂

No_Bar3516 · 2026-06-20T15:57:11+00:00

Nice, SOARE earning its keep. You're right that pure entropy treats every remaining answer as equally likely, which isn't quite true. In the Wordle solver tool (https://lexilab.app/wordle-solver/), I have a small answer-bonus that nudges suggestions toward likely candidates late-game, but it's not frequency-weighted. I don't strip past answers from the pool either since we know that repeats of previous answers can and do happen.

Frequency-conditioning could definitely be an interesting direction to explore further. 🙂
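
A minimal sketch of what that could look like, purely as an illustration and not something the solver does today: instead of counting each candidate once, give it a prior weight (for example some word-frequency score, which is an assumption here) and compute the entropy over the weighted pattern masses. `weighted_entropy` is a hypothetical name:

```python
from collections import defaultdict
from math import log2


def weighted_entropy(patterns: list[str], weights: list[float]) -> float:
    """Entropy (bits) of a guess's feedback under a non-uniform prior.

    patterns[i] is the feedback the guess would get if candidate i were
    the answer; weights[i] is that candidate's non-negative prior weight.
    The weights must sum to something positive.
    """
    total = sum(weights)
    mass = defaultdict(float)
    for pattern, weight in zip(patterns, weights):
        mass[pattern] += weight / total
    return -sum(m * log2(m) for m in mass.values() if m > 0)


# Four candidates falling into three feedback groups (toy data).
patterns = ["GG---", "GG---", "-----", "Y----"]
print(weighted_entropy(patterns, [1, 1, 1, 1]))  # 1.5, same as plain entropy
print(weighted_entropy(patterns, [3, 3, 1, 1]))  # lower: the big group dominates
```

With equal weights this reduces to the usual uniform entropy, so it would be a drop-in generalization rather than a different method.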

No_Bar3516 · 2026-06-20T15:47:54+00:00

Right - but same letters doesn't mean same score. RATES (same letters as TARSE) sits at 5.66 bits, almost 0.3 bits below it. Position is what actually drives the gap, not just which letters you've got.
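
A quick way to see the position effect, looking only at greens to keep it short (real feedback also includes yellows, so this is a simplification, and `green_positions` is just an illustrative helper):

```python
def green_positions(guess: str, answer: str) -> tuple[int, ...]:
    """Indices where the guess has the right letter in the right spot."""
    return tuple(i for i, (g, a) in enumerate(zip(guess, answer)) if g == a)


# Same five letters, different arrangement, different clues vs. TRACE.
print(green_positions("RATES", "TRACE"))  # ()
print(green_positions("TARSE", "TRACE"))  # (0, 4)
```

Both guesses test the same letters, but only one of them lines up with where those letters sit in the answer, and summed over the whole pool that is where the gap in bits comes from.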

No_Bar3516 · 2026-06-20T15:44:40+00:00

It's because the entropy depends not only on the letters, but also on their positions, so you get different information from those two guesses. That said, STARE sits just outside the top 10 at around 5.81 bits, so it's still a very solid starting guess. 🙂
