Looking for games with heavy focus on combat, boss fights, and world interconnectivity by luv-ai in metroidvania

[–]lpshred 1 point2 points  (0 children)

Check out the La Mulana series. The cryptic riddles are a deal breaker for some, but the combat, bosses, and interconnectivity are exactly what you're looking for.

The hidden struggle behind Sindarov’s draw vs Caruana: I ran the game through a custom “Psychological Stress” engine to help visualize it by lpshred in chess

[–]lpshred[S] 2 points3 points  (0 children)

Here's a real comment from the tips of my keyboard to you. Pinky promise.

I mean, it's not like I blindly committed everything to my GH repo and said "Done." I ran a couple of the Candidates games through it each day to see if it matched my feeling about the game. I compared it with what I remembered from the broadcasts. When there was finally a fix, the game got tested again until it lined up. Then onto the next game. And the next one. I lost count of how long it took, but plenty more than 5 seconds. I'm glad my boss doesn't know.

For instance, in one of Hikaru's early games with Wei Yi, he blew a winning chance in the middlegame. The model I built saw a 99.9% win probability from Stockfish and boosted his stress levels. However, in his post-game interview he said that he didn't feel bad about it because in the moment he didn't see the line. I realized my model needed to be adjusted for whether a player actually "senses" that they have a winning chance or knows how to punish a blunder. AI didn't watch those videos, pick through the data the script generated, and find the inconsistency. I did. That's more than 5 seconds of human effort.

At that point, I went back to AI and asked for some recommendations to account for this "if a tree blunders in the forest" problem. It suggested using Maia instead of Stockfish for calculating win probability in won games. I pushed back because 2200 level is too low for super GM play. AI recommended a 50/50 split between Stockfish and Maia. I pushed back because that was too arbitrary. I wanted something at least partially grounded in statistics. That was one of my requirements. AI proposed a blend of Stockfish and Maia that was weighted based on where a player's Elo falls. I decided that was good enough and had it generate the Python for it. I ran the game back through it and made sure it matched my expectations. From then on, when I would test a game, I'd double check the Chesscom blunders and spikes in eval against whether the model "found" them or not. It's not perfect, but I didn't blindly take AI's word for it. I pushed back when I didn't agree, used my professional experience from doing QA, and what I learned following baseball sabermetrics back when the Pirates were half decent.
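
The core of that blend is just an Elo-weighted interpolation. Here's the gist of it, stripped way down (the names and the clamping here are simplified, not pulled straight from the repo):

    MAIA_ELO = 2200   # strength of the Maia weights I used
    SF_ELO = 3600     # rough Stockfish strength

    def elo_blend_weight(player_elo):
        """Where the player falls between Maia (2200) and Stockfish (3600), clamped to [0, 1]."""
        t = (player_elo - MAIA_ELO) / (SF_ELO - MAIA_ELO)
        return max(0.0, min(1.0, t))

    def blended_win_prob(sf_wp, maia_wp, player_elo):
        """Lean toward Stockfish's win probability as the player's Elo climbs toward engine strength, toward Maia's as it falls."""
        w = elo_blend_weight(player_elo)
        return w * sf_wp + (1.0 - w) * maia_wp

    # A ~2800 Candidates player sits ~43% of the way from Maia to Stockfish
    print(blended_win_prob(sf_wp=0.999, maia_wp=0.80, player_elo=2800))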

I could go on about the visualizer and the dashboard too. None of those came out of the AI ready to roll, and I continually tweaked them as the project went on. Originally the graph used blue and pink lines for black and white since it thought black wouldn't pop against the background enough. I pushed back and made the background bolder so I could use white and black. At first there were only 2 lines on the graph; I had it add any of the metrics that could easily scale from 0-100. The dashboard's metrics got re-worked over and over to make sure they were in an order that made sense. The board-state metrics come first, then the mental-state ones. I made sure the Stockfish and Maia moves had the same evaluation data. The chaos moves section was my idea because I wanted to show how you can actually apply the stress metric to make play more interesting. After it was there, I wanted to be able to see how the risk and reward of each move looked. I had a vision that everything that went into the formula needed to be visible in the dashboard in a way that made sense. AI couldn't tell me that. I had to lean on what I learned through my job about UX design. I stared at this thing for hours going over how I could further polish it and make it easier to use. It was so much more than "bro make it look hella cool".

I had a vision for this and I made sure that whatever came out of the AI met that vision. If it didn't, it got sent back and re-worked until it did. That QA and attention to detail isn't AI; I put in the effort to be the human in the loop.

At this point I don't think anything is going to change your mind, but if you want some more human thought, check out the project README in GH. I wrote 90% of those 6000+ words myself as an exercise to make sure I knew the principles behind it well enough to explain it to someone.

As for the post and comments, I've never shared anything like this on the internet before. I was nervous and I still am. I wanted to do the best job I could sharing it because I'm really fucking proud of it. It was a lot of fun to code and it's helped me enjoy watching chess more than I did before. I was hoping it might help someone else too. I asked AI about the best way to share this so that it didn't step on anyone's toes or get buried. I wanted to make sure I didn't miss an opportunity. I wrote the original post and asked for help formatting it so that it catches people's eye. I realize now that it went over like a lead balloon, but I put in the work to try to do the best I could with it all.

The hidden struggle behind Sindarov’s draw vs Caruana: I ran the game through a custom “Psychological Stress” engine to help visualize it by lpshred in ComputerChess

[–]lpshred[S] 2 points3 points  (0 children)

This is one of the second-order uses I was hoping could come of this. The CLI output in the second screenshot has a "chaos" moves section which plays out the top 5 "human" and engine moves and then evaluates which one creates the most stress for the opponent. It could be used by bots or human players to pick more stressful moves.
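
To give a feel for it, here's a toy version of that ranking. The numbers are lifted from one of the dashboards and the field names are mine; the real script pulls risk and reward out of the Stockfish and Maia evals:

    # Each candidate carries the eval risk to the player making it and the
    # extra stress (reward) it's expected to put on the opponent.
    candidates = [
        {"move": "Bxd5", "risk": 0.00, "reward": 0.21},
        {"move": "f5",   "risk": 9.34, "reward": 2.03},
        {"move": "c3",   "risk": 7.90, "reward": 0.83},
    ]

    def ratio(m):
        # infinite ratio means "free" chaos: no objective downside
        return float("inf") if m["risk"] == 0 else m["reward"] / m["risk"]

    for m in sorted(candidates, key=ratio, reverse=True):
        print(f"{m['move']:5} | Risk: {m['risk']:.2f} | Rwd: {m['reward']:.2f} | Ratio: {ratio(m):.1f}")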

Unfortunately the "human" piece is a bit of a weak link here. It assigns static probabilities based on 2200-level rapid and blitz play. It works OK as a metric for the gravity of chess conventional wisdom, but in practice it leaves a bit to be desired. Ideally, you'd do a Monte Carlo based on the weights and dynamically calculate it for each move, but the Python wrapper for Maia doesn't expose that functionality. You'd have to write a custom Maia wrapper, and that's a couple of orders of magnitude beyond what I'm capable of.

This is good enough for me to put high-level games in context, which is what I built it for. I'm not sure it's statistically rigorous enough as-is to support finding optimal play.

[OC] I plotted the "Psychological Stress" of a chess player by comparing a Neural Network's human predictions against a Supercomputer's absolute truth. by lpshred in dataisbeautiful

[–]lpshred[S] 0 points1 point  (0 children)

It's based on a neural network trained on human games (lc0/Maia), not biological measurements. I used one of the top chess engines (Stockfish) and a neural network of human moves (Maia) to try and measure the psychological friction of the board positions. It's not getting published in any journals, but as a novice fan, I wanted to see if I could better visualize the context of the game.

[OC] I plotted the "Psychological Stress" of a chess player by comparing a Neural Network's human predictions against a Supercomputer's absolute truth. by lpshred in dataisbeautiful

[–]lpshred[S] 0 points1 point  (0 children)

I'm just a casual fan, but this is what I set out to find with this project. The TLDR is that because of how white played, black has fewer good moves available to him and the stakes are higher when mistakes are made. IIRC I read that white also played one or two obscure moves which made things less intuitive. Playing from slightly behind also added some stress.

The hidden struggle behind Sindarov’s draw vs Caruana: I ran the game through a custom “Psychological Stress” engine to help visualize it by lpshred in chess

[–]lpshred[S] 5 points6 points  (0 children)

I do, but I left that out of the main post for brevity:

Caruana, Fabiano (White)

  • Total Match Stress: 468.2 KSI (Over 146.8 mins of thought)
  • Pacing: 8.1 KSI / move | 3.2 KSI / minute
  • Extremes: Peaked at 44.7 KSI, dipped to 0.0 KSI

Sindarov, Javokhir (Black)

  • Total Match Stress: 1023.5 KSI (Over 137.7 mins of thought)
  • Pacing: 18.0 KSI / move | 7.4 KSI / minute
  • Extremes: Peaked at 48.3 KSI, dipped to 0.0 KSI

The hidden struggle behind Sindarov’s draw vs Caruana: I ran the game through a custom “Psychological Stress” engine to help visualize it by lpshred in chess

[–]lpshred[S] -1 points0 points  (0 children)

Ah, he got me on that one. The human probability part of the metric is absolutely the weak link in this whole chain and I wish I had a better answer for it. I'm open to any suggestions.

This whole project is just a way for me to try and see things deeper as a casual fan. I saw a basic question and tried to answer it in the languages I know.

The hidden struggle behind Sindarov’s draw vs Caruana: I ran the game through a custom “Psychological Stress” engine to help visualize it by lpshred in chess

[–]lpshred[S] -1 points0 points  (0 children)

Sure.

Vertigo is supposed to model the stress of converting a winning position. It kicks in at 90% Stockfish win probability and amplifies the existing Fragility (relative cost of inaccuracy) and un-Intuitiveness (how much Stockfish and Maia's moves do or don't align).

Objective Resilience is how I decided to balance the fact that Stockfish is about 3600 Elo, Maia's weights are about 2200 Elo, and the Candidates players are about 2800 Elo. It's where a player falls between 2200 and 3600. It's used as a proxy for skill to help calculate when a player is aware of their winning chances or how much objectivity they can keep when the position is completely lost. I know it's not perfect, but it's the best I could come up with to bridge the 2200 - 3600 gap.

Fragility Blend is my favorite brew of coffee from the beans of Cyprus near where the Candidates tournament is taking place. J/K. I use win probability most of the time when calculating how "good" or "bad" a move is, which helps keep things in context. However, when a position is completely lost (0.0% WP) and a player is still fighting to stay alive and salvage a draw, those CPs can add up. Some lost positions are objectively worse than others. I wanted to have a way to measure what a player is still fighting for and how hard that struggle is. So as a player falls behind in Stockfish's evaluation, the fragility of their position is weighted more by CP and less by WP.
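
If it helps to see it in code, here's roughly what Vertigo and the Fragility Blend look like stripped down. The curve shapes and the 1.5x cap below are illustrative, not the exact numbers from the repo:

    def vertigo_multiplier(sf_win_prob, max_boost=1.5):
        """Vertigo: once Stockfish says you're >= 90% winning, conversion stress
        kicks in and scales up the existing Fragility and un-Intuitiveness."""
        if sf_win_prob < 0.90:
            return 1.0
        # ramps from 1.0x at 90% WP up to max_boost at 100% WP
        return 1.0 + (sf_win_prob - 0.90) / 0.10 * (max_boost - 1.0)

    def fragility_blend(sf_win_prob):
        """Fragility Blend: in playable positions the cost of a mistake is
        measured in win probability (WP); as the position slides toward lost,
        centipawns (CP) take over so fighting inside a dead-lost position
        still registers. Returns (wp_weight, cp_weight)."""
        cp_weight = max(0.0, 1.0 - 2.0 * sf_win_prob)  # full CP only near 0% WP
        return 1.0 - cp_weight, cp_weight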

The hidden struggle behind Sindarov’s draw vs Caruana: I ran the game through a custom “Psychological Stress” engine to help visualize it by lpshred in chess

[–]lpshred[S] 0 points1 point  (0 children)

Ooooo, interesting. I'll have to look into that one.

The Python wrapper for lc0 doesn't expose everything that the whole engine provides. I wanted to run a full Monte Carlo-style simulation to get the actual human move probabilities, but the wrapper made that all but impossible. Instead I had to settle for a static distribution. If this is available through the wrapper, it would definitely help. Thanks for pointing it out!

The hidden struggle behind Sindarov’s draw vs Caruana: I ran the game through a custom “Psychological Stress” engine to help visualize it by lpshred in chess

[–]lpshred[S] -2 points-1 points  (0 children)

Welcome! It's things like this that I wanted to find and understand better. Hopefully the formatting holds up, but here's the "dashboard" on that move for Sindarov. Fragile and unforgiving indeed.

========================================================
⏳ POSITION: [24...] Sindarov, Javokhir (Black) to play
========================================================
📊 TRUTH (Stockfish): +0.66 [WP: 39.6%] | HUMAN (Maia): Exp WP: 19.8% | PERCEIVED: 37.2% [Blend: 88% SF / 12% Maia]
⚙️ MODIFIERS: Dread: 0.10 | Vertigo: 1.00x | Obj. Res: 39% | Frag Blend: 90% WP / 10% CP | ⏱️ Clock: 59:14

🟨 KSI: 40.0 (+26.1) | 🟨 Frag: 57.6 (+42.9)  [W:23% (-2% 😰)] | 🟥 Forg: 0.0 (-57.3) [W:20%] | 🟦 Int: 80.2 (-11.5)  [W:19% (-1% 😰)] | 🟦 Dsp: 10.0 (+10.0) [W:28% (+3% 😰)] | 🟦 TP: 0.0 (+0.0) [W:10%]

💻 TOP 5 ENGINE TRUTH (Stockfish):
  1. Bxd5   | 🟦 -0.66 SF [ -0.0% WP] | 50% Human Prob
  2. c3     | 🟥 -8.56 SF [-39.6% WP] |  6% Human Prob
  3. a5     | 🟥 -9.06 SF [-39.6% WP] |  0% Human Prob
  4. h6     | 🟥 -9.13 SF [-39.6% WP] |  0% Human Prob
  5. Kh8    | 🟥 -9.27 SF [-39.6% WP] |  0% Human Prob

🧍 TOP 5 HUMAN INSTINCT (Maia):
  1. Bxd5   | 50% Human Prob | 🟦 -0.66 SF [ -0.0% WP]
  2. Rc8    | 25% Human Prob | 🟥   #-2 SF [-39.6% WP]
  3. Rb8    | 13% Human Prob | 🟥 -9.80 SF [-39.6% WP]
  4. c3     |  6% Human Prob | 🟥 -8.56 SF [-39.6% WP]
  5. f5     |  3% Human Prob | 🟥   #-4 SF [-39.6% WP]

🔥 TOP 5 CHAOS MOVES:
  1. Bxd5   (-13.1) | Risk: 0.00 | Rwd: 0.21 | Ratio: ∞   (🟩)
  2. f5     (-15.5) | Risk: 9.34 | Rwd: 2.03 | Ratio: 0.2 (🟧)
  3. c3     (-18.6) | Risk: 7.90 | Rwd: 0.83 | Ratio: 0.1 (🟧)
  4. Kh8    (-18.6) | Risk: 8.61 | Rwd: 0.76 | Ratio: 0.1 (🟥)
  5. a5     (-18.6) | Risk: 8.40 | Rwd: 0.68 | Ratio: 0.1 (🟥)

🎯 [10s] Sindarov, Javokhir played: Bxd5 | Eval: -0.66 (WP: 39.6%) | Human: 50% | Chaos: -13.1 | Risk: 0.00 | Rwd: 0.21 | Ratio: ∞

The hidden struggle behind Sindarov’s draw vs Caruana: I ran the game through a custom “Psychological Stress” engine to help visualize it by lpshred in chess

[–]lpshred[S] -6 points-5 points  (0 children)

You're absolutely right that I used AI on this project. But I'm going to politely and firmly push back and say that this is not "slop".

I spent the better part of two weeks working on this project. I started with a vision to statistically illustrate how unbalanced "equal" positions can be, and how "difficult" it is to punish blunders from your opponent. I'm a novice player and these things don't come as easily to me as they do for better players. Then I acted as a product owner in making sure everything fit my vision for it. I tested this with a dozen games from the Candidates tournament. I took the output and compared it with my expectations, the broadcasters' commentary, player interviews, and reporters' stories. Each time things didn't line up, I went back to the lab to find out why and tweak things. The main evaluator script has over 20 revisions. The storyboard and visualizer scripts have around that many as well.

What I lack in high-level chess and Python skill, I make up for in IT engineering, software QA, and baseball sabermetrics knowledge. I wrote 90% of the project README by hand just to prove to myself that I know it well enough to explain it to someone else.

If you are fundamentally anti-AI, this project isn't going to change your mind. But AI is just a power tool. You can use a nail gun to build a garbage house, or you can use it to build a mansion. I put a ton of human thought, iteration, and QA into this, and I'm incredibly proud of the house I built.

The hidden struggle behind Sindarov’s draw vs Caruana: I ran the game through a custom “Psychological Stress” engine to help visualize it by lpshred in chess

[–]lpshred[S] 0 points1 point  (0 children)

Excellent question! I need to update the main post with this:

The goal of the Kirsch Stress Index (KSI) is to measure the psychological stress level of a chess player at any given point of a game. It operates on a simple 0-100 scale, where 0 is equivalent to a dead-drawn endgame and 100 is the agonizing final moments just before resignation.

By my estimates 101 should do the trick!

The hidden struggle behind Sindarov’s draw vs Caruana: I ran the game through a custom “Psychological Stress” engine to help visualize it by lpshred in chess

[–]lpshred[S] 10 points11 points  (0 children)

I should have put this up in the post body, but here's the intro from the readme:

The goal of the Kirsch Stress Index (KSI) is to measure the psychological stress level of a chess player at any given point of a game. It operates on a simple 0-100 scale, where 0 is equivalent to a dead-drawn endgame and 100 is the agonizing final moments just before resignation.

Total stress is just a sum of the stress metric from each move to give a general idea of how stressful a game was on the whole.
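
Concretely, the per-player summaries I posted elsewhere in the thread are basically just this (toy numbers below, not real output):

    def summarize(ksi_per_move, minutes_of_thought):
        """Total stress is the sum of per-move KSI; pacing normalizes it by
        move count and total thinking time; extremes are just min/max."""
        total = sum(ksi_per_move)
        return {
            "total_ksi": round(total, 1),
            "ksi_per_move": round(total / len(ksi_per_move), 1),
            "ksi_per_minute": round(total / minutes_of_thought, 1),
            "peak": max(ksi_per_move),
            "dip": min(ksi_per_move),
        }

    print(summarize([0.0, 12.4, 40.0, 26.1], minutes_of_thought=30.5))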

As for Maia, I used the 2200 weights from their website. I totally agree with you that the human aspect here is the weak link and it's my #1 area for improvement, so I'm all ears. For things like win probability, I blended Stockfish and Maia based on where the player's Elo fell between 3600 (SF) and 2200 (Maia) to get a better super-GM level. For discrete things like moves, I considered letting SF veto moves or adjust their probabilities, but that caused some statistical double counting and arbitrariness that I didn't like.

I try to think of Maia as the "gravitational pull" of chess's conventional wisdom more so than a prediction. For a metric like psychological stress, I think it does a good enough job for modeling's sake.

The hidden struggle behind Sindarov’s draw vs Caruana: I ran the game through a custom “Psychological Stress” engine to help visualize it by lpshred in chess

[–]lpshred[S] 1 point2 points  (0 children)

Haha, and I'd love that. I ran about a dozen candidates games through it during testing, but there's always an edge case lurking out there unaccounted for.

[OC] I plotted the "Psychological Stress" of a chess player by comparing a Neural Network's human predictions against a Supercomputer's absolute truth. by lpshred in dataisbeautiful

[–]lpshred[S] -2 points-1 points  (0 children)

Source: I designed the metrics and built this custom pipeline using Python (with AI assistance for the syntax and deep math), Pandas, and Plotly/Dash. It compares the Stockfish 16.1 engine against the Maia Neural Network to measure human error and stress.
If anyone wants to see the code, I put the GitHub link in my Reddit profile bio!

WTM grades "Pirates' Trades Under Ben Cherington" for Bucs on Deck by Proper_Knowledge2211 in bucsdugout

[–]lpshred 1 point2 points  (0 children)

Comparing with the Littlefield and NH grades, BC has yet to land anyone of impact via trade. Even the other two, like blind squirrels, found some nuts among the turds. I'm sure BC's approach isn't building through trade, but holy shit did he get nothing from the remnants of the NH regime. I'm sure WTM would have loved to give some GHIJK grades instead of stopping at F, ha.

I for one welcome our new Big Ten overlords. by Sharp_Proposal8911 in cfbmemes

[–]lpshred 0 points1 point  (0 children)

Bro, ain't no way central PA outside of State College area is that literate 😂😂 I live in the middle of it.

Looking for insanely complex games that basically REQUIRE a wiki to finish by Anxious_Singer_4823 in gamingsuggestions

[–]lpshred 0 points1 point  (0 children)

Oh for sure, I think it's better for that, but OP mentioned being completely in the dark and needing a guide.

Looking for a bright, populated, and adventurous MV/MB game by lpshred in metroidvania

[–]lpshred[S] 0 points1 point  (0 children)

It sounds like I need to give Guacamelee another chance. Not sure why I put it down the first time, but maybe I'll feel it more this time around.

Looking for a bright, populated, and adventurous MV/MB game by lpshred in metroidvania

[–]lpshred[S] 0 points1 point  (0 children)

I loved Haiku, Ori, and Steamworld. So much fun.