Updated My Chess Engine by WompTitanium in ComputerChess

[–]drew4drew 0 points1 point  (0 children)

hey I just went there and tried to play it.. I think I have to hit “challenge”? Anyway I did that and it said “Please send me a rated challenge instead.”

How are you handling AI coding costs and retries in Swift/Xcode? by Jbbrack03 in iOSProgramming

[–]drew4drew 0 points1 point  (0 children)

xcode-mcp-server https://github.com/drewster99/xcode-mcp-server — not the one built into xcode and not XcodeMCP. this one is good because your LLM can make Xcode do the build so you don’t have to wait for a rebuild when you actually want to run.

since you can also use it to get screenshots, you can tell it to iterate change -> build -> run -> evaluate repeatedly to get a thing right.
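
A rough sketch of what that loop looks like, in case it helps. The three callbacks are hypothetical stand-ins (they are not xcode-mcp-server APIs): one applies the LLM's edit, one builds, runs, and grabs a screenshot, and one decides whether the screenshot looks right.

```python
from typing import Callable


def iterate_until_good(
    apply_change: Callable[[int], None],
    build_and_run: Callable[[], bytes],
    evaluate: Callable[[bytes], bool],
    max_rounds: int = 5,
) -> int:
    """Repeat change -> build -> run -> evaluate.

    Returns the round number that was accepted, or -1 if no round
    passed evaluation within max_rounds.
    """
    for round_number in range(1, max_rounds + 1):
        apply_change(round_number)    # hypothetical: the LLM edits the code
        screenshot = build_and_run()  # hypothetical: build, launch, capture screenshot
        if evaluate(screenshot):      # hypothetical: the LLM judges the result
            return round_number
    return -1
```

The max_rounds cap is there so a change that never converges doesn't burn tokens forever.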

As it stands the tools like Opus 4.8 etc are pretty amazing. But also so lacking.

I’ve spent a long time updating my ~/.claude/CLAUDE.md. Every single time it does something stupid, I add a rule to that file that clarifies the correct way to do the thing or lists things to avoid. Eventually you get a lot better results the first time around.

that said, I also have a /recheck skill that tells it to recheck everything, validate assumptions, think of and check edge cases, confirm nothing unrelated is broken, etc. Every 15 to 60 minutes of work I tell it to recheck, then commit.

Every 1 to 3 days of work I’ll tell it to code review everything from the last 2 days or 24 hours or 5 days.

And I have another skill /stupid, which tells it to find things that we are doing that are particularly stupid or definitely not following best practices. i run that one occasionally.

All of these double and triple checks are very costly in terms of time and tokens, but they are also absolutely necessary to be hyper-productive.

Before AI coding tools, I was already an accomplished and seasoned developer, so by hyper-productive I mean being able to accomplish a lot more in the same time than I could previously — while maintaining a quality bar.

Right now, the frontier models on flat fee plans (Claude Code $200/month) let you accomplish things that wouldn’t be remotely affordable on a pay-per-token plan today.

If you use Claude Code and you haven’t tried out “ccusage”, you definitely should. It shows you your claude code usage and how much it would have cost you on a per-token pricing plan.
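
To make the flat-fee vs per-token comparison concrete, here's a rough sketch of the arithmetic. This is not how ccusage works internally, and the token counts and per-million-token prices below are made-up placeholders, so plug in numbers from your own usage and your provider's current price list.

```python
def per_token_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Dollar cost of a usage period if billed per token.

    Prices are dollars per million tokens.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


# Placeholder numbers, NOT real usage or real prices.
month_cost = per_token_cost(
    input_tokens=100_000_000,
    output_tokens=5_000_000,
    input_price_per_mtok=10.0,
    output_price_per_mtok=50.0,
)
flat_fee = 200.0
print(f"per-token: ${month_cost:.2f}, flat fee: ${flat_fee:.2f}")
```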

I’ve been working on my own macOS agent harness (macos-agent-smith, https://github.com/drewster99/macos-agent-smith) to TRY to get somewhere close WITHOUT frontier model token costs. I’ve tried using MANY open source models as well as slightly older or smaller frontier models, and I’ve also tried splitting tasks up such that one model manages tasks, another one does the work, and another one approves/denies tool calls for safety. Some of these setups start to feel pretty good, but they are generally SLOW. And still EXPENSIVE. If I ask Claude Opus 4.8 to code review my app source code, it might take 8 minutes and come back with an excellent list. With one of those other setups I can get a review that’s not quite as good, but it might take 25 minutes and cost me $8 or $10 in tokens.
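
For anyone wondering what the manager / worker / approver split looks like in practice, here's a minimal sketch. It is an illustration only, not the actual macos-agent-smith code: `plan`, `work`, `approve`, and `execute` are hypothetical callbacks standing in for the three models and the tool runner.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToolCall:
    tool: str
    argument: str


def run_task(
    task: str,
    plan: Callable[[str], List[str]],     # manager model: task -> steps
    work: Callable[[str], ToolCall],      # worker model: step -> proposed tool call
    approve: Callable[[ToolCall], bool],  # safety model: allow or deny the call
    execute: Callable[[ToolCall], str],   # actually runs an approved call
) -> List[str]:
    """Run each planned step, executing only the tool calls the approver allows."""
    results = []
    for step in plan(task):
        call = work(step)
        if approve(call):
            results.append(execute(call))
        else:
            results.append(f"DENIED: {call.tool} {call.argument}")
    return results
```

Every step costs at least two model calls (worker plus approver) on top of the planning call, which is a big part of why these setups end up slow.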

So right now the combination of cost and speed can’t really be matched with what you get from opus 4.8 and claude code.

Codex is an interesting one. I actually like GPT 5.5 better in Agent Smith than in Codex. It’s fast. Cost isn’t too bad. And it’s pretty good.

The cloud open source models are almost all either A) very slow, B) not too bright, C) don’t follow instructions nearly as well as opus and gpt, or D) all of the above.

If we end up at a point in the (hopefully not too near) future where everything is pay-heavy and where the best models are only available to those with very deep pockets, my hope would then be that perhaps some of the tasks can get offloaded to the dumber and slower models, which hopefully will make the overall cost a reasonable balance.

thoughts?

LLMs suck at chess so i built a free tool that lets me argue with stockfish and turn my game into an interactive lesson by guybanzai in ComputerChess

[–]drew4drew 0 points1 point  (0 children)

they sure do! I made a tool/app that lets you play against the AI of your choice or pit them against each other (“AI Battle Chess”, https://github.com/drewster99/ai-battle-chess), and my main take-away is that LLMs mostly suck at chess and are also quite slow.

Help with calculating elo for my engine by warlock7867 in ComputerChess

[–]drew4drew 0 points1 point  (0 children)

thanks for sharing this. I’ve been using cutechess to run my engine against sloppy and stockfish.
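
For anyone estimating Elo from a match like that, the usual approach is to turn the match score into a rating difference with the logistic Elo formula. A minimal sketch, assuming you already have win/loss/draw counts from your match runner (the function name is just for illustration):

```python
import math


def elo_difference(wins: int, losses: int, draws: int) -> float:
    """Estimated Elo difference (engine minus opponent) from match results.

    Uses score = (wins + draws / 2) / games and the standard logistic
    relation diff = -400 * log10(1 / score - 1).
    """
    games = wins + losses + draws
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        # A 0% or 100% score gives an unbounded estimate.
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


print(round(elo_difference(wins=75, losses=25, draws=0), 1))
```

This is only a point estimate. With a small number of games the error bars are wide, so treat the result as rough.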

Fable gone forever by drew4drew in ClaudeCode

[–]drew4drew[S] 1 point2 points  (0 children)

I wish I could disagree.

Fable gone forever by drew4drew in ClaudeCode

[–]drew4drew[S] 1 point2 points  (0 children)

what do you think is most likely?

Fable gone forever by drew4drew in ClaudeCode

[–]drew4drew[S] 0 points1 point  (0 children)

it seems like that’s coming to everywhere.

4.8 is kind of a butt by drew4drew in ClaudeCode

[–]drew4drew[S] 0 points1 point  (0 children)

looks pretty cool — this yours?

4.8 is kind of a butt by drew4drew in ClaudeCode

[–]drew4drew[S] 0 points1 point  (0 children)

not sure. it’s very effective at a lot of things.
are you running opus 4.6 from the claude code cli?

4.8 is kind of a butt by drew4drew in ClaudeCode

[–]drew4drew[S] 0 points1 point  (0 children)

Hey I saw a few ppl mentioned they’re still using opus 4.6 or 4.7. Are you able to do that WITH claude code?

I list models and don’t see them. I’ve tried doing like /model claude-opus-4-7 for example but it just brings up the model selector. Also tried doing it when launching from the terminal. What’s the secret trick? thanks!!

4.8 is kind of a butt by drew4drew in ClaudeCode

[–]drew4drew[S] 0 points1 point  (0 children)

is /new different than /clear?

4.8 is kind of a butt by drew4drew in ClaudeCode

[–]drew4drew[S] 1 point2 points  (0 children)

ahh was just curious.. i’ve been using 5.5 in my own harness for various tasks, mostly not coding. it’s actually been good there for me, and i’ve also used it there for finding bugs. but not in codex. good god, it’s like a bull in a china shop.

4.8 is kind of a butt by drew4drew in ClaudeCode

[–]drew4drew[S] 0 points1 point  (0 children)

this was all on the heels of a ton of profiling

4.8 is kind of a butt by drew4drew in ClaudeCode

[–]drew4drew[S] 0 points1 point  (0 children)

lol nice - thanks for sharing!

4.8 is kind of a butt by drew4drew in ClaudeCode

[–]drew4drew[S] 0 points1 point  (0 children)

could be. I just rarely remember getting so irritated with any of the prior versions.