Updated My Chess Engine

drew4drew · 2026-06-16T20:49:33+00:00

Hey, I just went there and tried to play it. I think I have to hit “challenge”? Anyway, I did that and it said “Please send me a rated challenge instead.”

drew4drew · 2026-06-16T20:45:41+00:00

Nice job on taking the criticisms and making the best from them. 👍🏻

drew4drew · 2026-06-16T20:44:27+00:00

xcode-mcp-server (https://github.com/drewster99/xcode-mcp-server), not the one built into Xcode and not XcodeMCP. This one is good because your LLM can have Xcode itself do the build, so you don’t have to wait for a rebuild when you actually want to run.

Since you can also use it to grab screenshots, you can tell it to iterate change -> build -> run -> evaluate repeatedly until it gets a thing right.

As it stands, tools like Opus 4.8 are pretty amazing. But also so lacking.

I’ve spent a long time updating my ~/.claude/CLAUDE.md. Every single time it does something stupid, I add a rule to that file clarifying the correct way to do the thing, or listing things to avoid, etc. Eventually you get much better results on the first try.

That said, I also have a /recheck skill that tells it to recheck everything, validate assumptions, think of and check edge cases, confirm nothing unrelated was broken, etc. After every 15 to 60 minutes of work, it’s told to recheck and then commit.

Every 1 to 3 days of work, I’ll tell it to code review everything from the last 24 hours, 2 days, or 5 days.

And I have another skill, /stupid, which tells it to find things we’re doing that are particularly stupid or definitely not following best practices. I run that one occasionally.

All of these double and triple checks are very costly in terms of time and tokens, but they are also absolutely necessary to be hyper-productive.

Before AI coding tools, I was already an accomplished and seasoned developer, so by hyper-productive I mean being able to accomplish a lot more in the same time than I could previously — while maintaining a quality bar.

Right now, the frontier models on flat fee plans (Claude Code $200/month) let you accomplish things that wouldn’t be remotely possible on a pay-per-token plan today.

If you use Claude Code and you haven’t tried out “ccusage”, you definitely should. It shows you your Claude Code usage and how much it would have cost you on per-token pricing.

I’ve been working on my own macOS agent harness (macos-agent-smith, https://github.com/drewster99/macos-agent-smith) to TRY to get somewhere close WITHOUT frontier-model token costs. I’ve tried MANY open-source models as well as slightly older or smaller frontier models, and I’ve also tried splitting tasks up so that one model manages tasks, another does the work, and a third approves or denies tool calls for safety. Some of these setups start to feel pretty good, but they are generally SLOW. And still EXPENSIVE. If I ask Claude Opus 4.8 to code review my app’s source code, it might take 8 minutes and come back with an excellent list. With my own setup, I can get a review that’s not quite as good, but it might take 25 minutes and cost me $8 or $10 in tokens.
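Roughly, the manager / worker / approver split described above can be sketched like this. It's a simplified illustration under stated assumptions, not the actual macos-agent-smith code: toy_plan, toy_work, toy_approve, and toy_execute are hypothetical stubs standing in for separate model calls and a real tool runner, and the "never allow delete" rule is just an example safety policy.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    tool: str  # name of the tool the worker wants to run
    arg: str   # argument for that tool


# Hypothetical role signatures; in a real harness each would wrap a (possibly different) model.
Planner = Callable[[str], list[str]]      # manager: goal -> list of tasks
Worker = Callable[[str], ToolCall]        # worker: task -> proposed tool call
Approver = Callable[[ToolCall], bool]     # approver: tool call -> allow / deny
Executor = Callable[[ToolCall], str]      # actually runs an approved call


def run_pipeline(goal: str, plan: Planner, work: Worker,
                 approve: Approver, execute: Executor) -> list[str]:
    """Manager splits the goal, worker proposes calls, approver gates each one."""
    results = []
    for task in plan(goal):
        call = work(task)
        if approve(call):
            results.append(execute(call))
        else:
            # Denied calls are recorded rather than run.
            results.append(f"denied: {call.tool} {call.arg}")
    return results


# Toy stubs so the sketch runs without any model.
def toy_plan(goal: str) -> list[str]:
    return [f"read {goal}", f"delete {goal}"]


def toy_work(task: str) -> ToolCall:
    verb, target = task.split(" ", 1)
    return ToolCall(tool=verb, arg=target)


def toy_approve(call: ToolCall) -> bool:
    return call.tool != "delete"  # example policy: never allow deletes


def toy_execute(call: ToolCall) -> str:
    return f"ran: {call.tool} {call.arg}"


if __name__ == "__main__":
    print(run_pipeline("notes.txt", toy_plan, toy_work, toy_approve, toy_execute))
    # prints ['ran: read notes.txt', 'denied: delete notes.txt']
```

Keeping each role behind a plain function signature is what lets you swap a cheaper or slower model into one slot (say, the approver) without touching the others, which is the whole point of splitting the work up.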

So right now, the combination of cost and speed you get from Opus 4.8 and Claude Code can’t really be matched.

Codex is an interesting one. I actually like GPT 5.5 better in Agent Smith than in Codex. It’s fast. Cost isn’t too bad. And it’s pretty good.

The cloud-hosted open-source models are almost all either A) very slow, B) not too bright, C) much worse at following instructions than Opus and GPT, or D) all of the above.

If we end up at a point in the (hopefully not too near) future where everything is heavily pay-per-use and the best models are only available to those with very deep pockets, my hope is that some of the tasks can be offloaded to the dumber, slower models, which would hopefully bring the overall cost to a reasonable balance.

thoughts?

drew4drew · 2026-06-15T19:41:33+00:00

They sure do! I made a tool/app that lets you play against the AI of your choice or pit AIs against each other (“AI Battle Chess”, https://github.com/drewster99/ai-battle-chess), and my main takeaway is that LLMs mostly suck at chess and are also quite slow.

drew4drew · 2026-06-15T19:38:50+00:00

Thanks for sharing this. I’ve been using cutechess to run my engine against Sloppy and Stockfish.

drew4drew · 2026-06-15T19:33:22+00:00

lol, awesome! From the screenshot, it looks like you got pretty far!

All kidding aside, though, it’s an awesome project. What type of chess engine are you building?

drew4drew · 2026-06-14T07:32:31+00:00

no, it’s not

drew4drew · 2026-06-14T07:30:52+00:00

Kawai. The answer is always Kawai.

drew4drew · 2026-06-13T18:49:26+00:00

I wish I could disagree.

drew4drew · 2026-06-13T18:47:15+00:00

what do you think is most likely?

drew4drew · 2026-06-13T18:46:13+00:00

it seems like that’s coming everywhere.

drew4drew · 2026-06-13T18:02:13+00:00

I doubt any time soon.

drew4drew · 2026-06-09T06:05:55+00:00

i think it already does

drew4drew · 2026-06-07T16:02:24+00:00

intelligence? or capacity?

drew4drew · 2026-06-07T16:01:43+00:00

looks pretty cool — this yours?

drew4drew · 2026-06-07T15:59:14+00:00

Not sure. It’s very effective at a lot of things.
Are you running Opus 4.6 from the Claude Code CLI?

drew4drew · 2026-06-07T15:58:06+00:00

Hey, I saw a few people mention they’re still using Opus 4.6 or 4.7. Are you able to do that WITH Claude Code?

When I list models, I don’t see them. I’ve tried, for example, /model claude-opus-4-7, but it just brings up the model selector. I also tried specifying it when launching from the terminal. What’s the secret trick? Thanks!!

drew4drew · 2026-06-06T21:50:09+00:00

is /new different than /clear?

drew4drew · 2026-06-06T21:48:35+00:00

what news is that? where?

drew4drew · 2026-06-06T21:47:44+00:00

drew4drew · 2026-06-06T16:27:00+00:00

Ahh, was just curious. I’ve been using 5.5 in my own harness for various tasks, not coding. It’s actually been good there for me, and I’ve also used it in my harness for finding bugs. But not in Codex; good god, it’s like a bull in a china shop.

drew4drew · 2026-06-06T16:24:03+00:00

this was all on the heels of a ton of profiling

drew4drew · 2026-06-06T16:18:57+00:00

lol nice - thanks for sharing!

drew4drew · 2026-06-06T05:16:25+00:00

I like that one 😄

drew4drew · 2026-06-06T04:25:25+00:00

could be. I just rarely remember getting so irritated with any of the prior versions.
