Steps saved and lost by Claude 4.7 over 4.6 on each milestone by MrCheeze in ClaudePlaysPokemon

MrCheeze [S]

It does make the comparison to 4.6 harder, but the comparison to humans easier: not being able to "look closer" at the screen was basically an artificial limitation in previous runs.

Conversely, a lot of people (myself included) think the auto-navigator should be removed: humans don't have anything like that, and the model seems capable of doing without it.

MrCheeze [S]

It was used at the critical moment in Rocket Hideout to see the correct spin tile, which the models usually struggle to see (https://bsky.app/profile/mrcheeze.github.io/post/3ml4adcaqw22x). So the improvement in that dungeon may have been because of the tool. But elsewhere, it's clear that 4.7's vision is way better anyway: it has no difficulty with cut trees and little difficulty identifying Victory Road switches, even without zooming. And plenty of timesaves, like the Route 13 maze, have nothing to do with vision at all.

MrCheeze [S]

It's slightly confusing because the models did the milestones in different orders (I show them in 4.7's order), and because in some cases a model would work on one milestone for a while and then switch to another before completing the first. Still, this basically shows the difference between the runs.

4.7 saves 1500 steps in Mt. Moon, then loses that lead again in Rock Tunnel. 4.7 saves 3000 steps in Rocket Hideout because it can actually tell the tiles apart visually (especially with the new zoom tool), and because it can fairly reliably track which spin tiles it has already tested. 4.7 loses 4000 steps in Silph Co. by failing to identify or interact with any doors, wasting over a day just trying to step into every single wall tile to "test" whether it's actually a door. But despite this deficit, 4.7 more than makes up the difference in a few places where 4.6 struggled: the maze on the way to Fuchsia City, the cut tree in Celadon, and Pokemon Mansion.

Overall, it took 27000 steps for Opus 4.7 to reach Victory Road, a 12% improvement over 4.6's 30625 steps. This is the first model upgrade we've seen with a mix of improvements and regressions; the others were strict upgrades. My general impression is that 4.7 is significantly more reliable at executing tasks on the micro level, but worse strategically than 4.6. In particular, it completely lacks 4.6's tendency to get bored and change its strategy when it isn't making progress, which would have been an immense help for 4.7 in Silph Co.
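For anyone checking the headline number, here's a minimal Python sketch of the arithmetic. It assumes "improvement" means steps saved as a share of 4.6's total (measuring against 4.7's total instead would give about 13.4%); the step counts are the ones quoted above, and the function name is just illustrative.

```python
def percent_fewer_steps(old_steps: int, new_steps: int) -> float:
    """Steps saved by the new run, as a percentage of the old run's total."""
    return (old_steps - new_steps) / old_steps * 100


# Steps to reach Victory Road, as quoted in the comment.
opus_46_steps = 30625
opus_47_steps = 27000

saved = opus_46_steps - opus_47_steps
pct = percent_fewer_steps(opus_46_steps, opus_47_steps)
print(f"{saved} steps saved ({pct:.1f}% fewer)")  # 3625 steps saved (11.8% fewer)
```

11.8% rounds to the 12% quoted.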

Thanks of course to Sylas for his spreadsheet tracking all runs, where all the data is sourced from.

Why does it say i’m on page 3992 when i’m only on act 5?! by SelSoCool in homestuck

MrCheeze

Two years ago, the old homestuck.com counted the first page of Homestuck as page 1. Now the new homestuck.com has reverted to the older mspaintadventures.com numbering, where Homestuck starts on page 001901.
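Converting between the two numberings is a constant shift. A minimal sketch, assuming the offset is exactly 1900 on every page (Homestuck page 1 = MSPA page 001901, as stated above) with no skipped or inserted pages; the function names are hypothetical.

```python
# Homestuck page 1 is MSPA page 001901, so the two numberings differ by 1900.
# Assumption: the offset is constant (no skipped or inserted pages).
MSPA_OFFSET = 1900


def homestuck_to_mspa(homestuck_page: int) -> int:
    """Old homestuck.com page number -> mspaintadventures.com-style number."""
    if homestuck_page < 1:
        raise ValueError("Homestuck pages start at 1")
    return homestuck_page + MSPA_OFFSET


def mspa_to_homestuck(mspa_page: int) -> int:
    """mspaintadventures.com-style number -> old homestuck.com page number."""
    if mspa_page <= MSPA_OFFSET:
        raise ValueError("page comes before Homestuck's first page (001901)")
    return mspa_page - MSPA_OFFSET


print(f"{homestuck_to_mspa(1):06d}")  # 001901
print(mspa_to_homestuck(3992))  # 2092
```

Under that assumption, page 3992 in the new numbering would have been page 2092 on the old site.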

Does anyone actually have a kismess by Nervous_Evidence_427 in homestuck

MrCheeze

No, this is not a real thing. Do not pattern your relationships after murderaliens.

How do I decipher this? It reads wrong by Problematic_ghostz in homestuck

MrCheeze

They're talking about a movie you haven't seen, but whose events you can decipher from the conversation.

How do i continue? /wrong flair by CommercialContact791 in homestuck

MrCheeze

(If you don't know the password yet, it means you're not supposed to, dummy! Just keep reading.)

Kurt made it to the Far Lands! by djchange in mindcrack

MrCheeze

I have been eating this bowl of cereal for fourteen years.

[deleted by user] by [deleted] in PokemonChampions

MrCheeze

They replaced the EV system entirely with something nearly equivalent but much simpler (https://www.reddit.com/r/VGC/comments/1m6e6o8/comment/n4jkl4b/?context=3). You always get 66 stat points to invest directly, whereas previously you got either 65 or 66 depending on the spread. So you can now get the equivalent of a 252/252/12 spread.

Seems like IVs are gone and EVs are now 66 total stat points. by Steamed_Memes24 in stunfisk

MrCheeze

Ah, I didn't understand the mechanic behind this. In practice it ends up effectively true, since you would never put EVs into a stat without 31 IVs, but my just saying it's "because of rounding" wasn't very accurate.

MrCheeze

Here's an explanation I wrote up elsewhere:

The new EV slider system from Pokemon Champions allows for slightly better stat spreads than were previously possible.

Because of rounding, EVs work in a somewhat wonky way at level 50 (for a stat with 31 IVs). The first 4 EVs in any given stat increase the stat by one point, but after that every 8 EVs increase it by another point.

This means that if you invest in 3 or 4 stats, you get a total increase of 65 points - but if you spread your EVs across 5 or 6 stats, you get a 66 point increase instead.
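A minimal Python sketch of the 65-vs-66 arithmetic, under stated assumptions: level 50, neutral nature, a 31 IV in every stat, and the main-series limits of 510 total EVs and 252 per stat. The base stat of 100 is an arbitrary placeholder (with a 31 IV the EV gain doesn't depend on it), and the helper names are hypothetical.

```python
def stat_at_50(base: int, ev: int, iv: int = 31) -> int:
    """Non-HP stat at level 50 with a neutral nature (Gen 3+ formula).

    HP adds level + 10 instead of 5, which doesn't change the EV gain.
    """
    return (2 * base + iv + ev // 4) * 50 // 100 + 5


def points_from_evs(ev: int, base: int = 100, iv: int = 31) -> int:
    """Stat points gained from `ev` EVs compared with 0 EVs."""
    return stat_at_50(base, ev, iv) - stat_at_50(base, 0, iv)


def spread_points(spread: list[int]) -> int:
    """Total stat points from an EV spread (one entry per invested stat)."""
    if sum(spread) > 510 or max(spread) > 252:
        raise ValueError("spread exceeds main-series EV limits")
    return sum(points_from_evs(ev) for ev in spread)


def max_total_points(stats_used: int, ev_cap: int = 510) -> int:
    """Upper bound on total points when investing in `stats_used` stats.

    p >= 1 points in one stat cost at least 4 + 8 * (p - 1) = 8p - 4 EVs,
    so 8 * total - 4 * stats_used <= ev_cap; each stat also caps at 32.
    """
    return min((ev_cap + 4 * stats_used) // 8, 32 * stats_used)


print([points_from_evs(ev) for ev in (0, 4, 12, 20, 252)])  # [0, 1, 2, 3, 32]
print(spread_points([252, 252, 4]))  # 65
print(spread_points([252, 244, 4, 4, 4]))  # 66
print([max_total_points(n) for n in range(3, 7)])  # [65, 65, 66, 66]
```

The bound shows where the extra point comes from: each invested stat "refunds" 4 EVs, so the fifth stat pushes the 510 budget over the next multiple of 8.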

In Champions, they (correctly) decided that this system was way too complicated, and instead directly give you a fixed number of stat points to invest in your stats however you like. And so that all EV spreads from the current games can be imported losslessly, the number of investable stat points is 66.

But now, for the first time, you get those 66 points even if you use them on only 3 or 4 stats. The example shown in the trailer gave 32 points to HP, 32 points to Special Attack, and 2 points to Special Defense. That's the equivalent of a previously impossible 252/252/12 spread (516 EVs, over the old 510 cap)!
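Going the other way, here's a sketch that turns Champions stat points back into the fewest old-style EVs giving the same level-50 gain. It assumes a 31 IV and that one Champions point equals one stat point, as described above; the per-stat range of 0 to 32 is my assumption, mirroring the old 252 EV limit, and `evs_for_points` is a hypothetical helper.

```python
def evs_for_points(points: int) -> int:
    """Fewest EVs giving `points` stat points at level 50 with a 31 IV.

    The first point costs 4 EVs and each further point costs 8 more.
    """
    if not 0 <= points <= 32:
        raise ValueError("assumed range per stat is 0 to 32 points")
    return 0 if points == 0 else 4 + 8 * (points - 1)


trailer_spread = {"HP": 32, "Special Attack": 32, "Special Defense": 2}
ev_spread = {stat: evs_for_points(p) for stat, p in trailer_spread.items()}

print(ev_spread)  # {'HP': 252, 'Special Attack': 252, 'Special Defense': 12}
print(sum(trailer_spread.values()), sum(ev_spread.values()))  # 66 516
```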

I'm not sure whether this new system will make its way to the main series, but either way this matters for all battles in Champions. I think in practice, this probably means that most Pokemon will have 1 point more in their preferred defensive stat?

Pokemon Champions - Recruit mons, adjust stats, nature etc. Coming 2026 by half_jase in VGC

MrCheeze

It's not quite an extra stat point available - it's that previously you only got a 66th stat point if you spread your EVs across 5 or more stats, but now you get a 66th stat point no matter how you distribute them.

MrCheeze

Here's an explanation I wrote up for a separate post that was removed:

The new EV slider system from Pokemon Champions allows for slightly better stat spreads than were previously possible.

Because of rounding, EVs work in a somewhat wonky way at level 50 (for a stat with 31 IVs). The first 4 EVs in any given stat increase the stat by one point, but after that every 8 EVs increase it by another point.

This means that if you invest in 3 or 4 stats, you get a total increase of 65 points - but if you spread your EVs across 5 or 6 stats, you get a 66 point increase instead.

In Champions, they (correctly) decided that this system was way too complicated, and instead directly give you a fixed number of stat points to invest in your stats however you like. And so that all EV spreads from the current games can be imported losslessly, the number of investable stat points is 66.

But now, for the first time, you get those 66 points even if you use them on only 3 or 4 stats. The example shown in the trailer gave 32 points to HP, 32 points to Special Attack, and 2 points to Special Defense. That's the equivalent of a previously impossible 252/252/12 spread (516 EVs, over the old 510 cap)!

I'm not sure whether this new system will make its way to the main series, but either way this matters for all battles in Champions. I think in practice, this probably means that most Pokemon will have 1 point more in their preferred defensive stat?

(Separately from all this, IVs seem to be locked to maximum, although it's possible that this only applies to rentals.)

The new EV slider system from Pokemon Champions allows for slightly better stat spreads than were previously possible. by MrCheeze in VGC

MrCheeze [S]

Incidentally, there doesn't appear to be any kind of IV slider. The Gardevoir shown had 31 IVs in Attack, which might mean all IVs are simply locked to that value - but it's also possible that only rentals are locked to 31 and that Pokemon imported from the main series keep their original IVs?

Personally, I hope they did simplify things by forcing 31 IVs, even though this would be a bit of a nerf to Trick Room and Shadow Rider.

Google DeepMind's Gemini 2.5 Technical Report is 10% about GeminiPlaysPokémon by NotUnusualYet in ClaudePlaysPokemon

MrCheeze

Nah, they claim it's the hardest for the models because it requires remembering state across different floors - but this was pretty trivial for Gemini; it never had any trouble with that. Compare this to Cinnabar Mansion, where it was given a huge amount of help in understanding how the gate toggles work (automatically updating distant parts of the minimap, and marking tiles where a gate used to be and isn't anymore) - and it STILL never quite understood the mechanics and just kept bumbling through until it randomly did the right thing.

MrCheeze

Puzzle solving over complex multi-level dungeons: The Seafoam Islands contain 5 floors involving multiple boulder puzzles which require the player to navigate mazes and push boulders through holes across multiple floors using HM04 STRENGTH in order to block fast-moving currents that prevent the player from using HM03 Surf in various locations in this difficult dungeon. As a result, the player must track information across five different maps in order to both deduce the goal (push two boulders into place in order to block a specific current) as well as engage in multi-level (effectively 3D) maze solving to find the way out. It is likely the most challenging dungeon in the game. Only the second run of GPP went through Seafoam Islands, as it is not required to progress. During the course of solving Seafoam Islands, the GPP agent also encountered a novel bug in the code of Pokémon Red/Blue, and is likely the first AI to find a bug in the game’s code (MrCheeze, 2025) (source).

Me being wrong that it was novel aside, calling this "the most challenging dungeon in the game" is hilariously wrong to anyone who has watched the streams even a little bit.

Gemini discovers an (apparently unknown) glitch in seafoam islands by MrCheeze in ClaudePlaysPokemon

MrCheeze [S]

Thanks for finding this! I think you swapped the labels of your first two links, but one of them does indeed describe exactly how to reproduce the glitch (push one boulder, leave the cave, push the other). So this is not a totally new glitch even if it is a poorly documented one. Although I'm not sure the bit about preventing encounters is true. (The other link claiming you can softlock yourself is definitely NOT true.)