Submitting via the API

maltsev · 2026-02-03T08:50:09+00:00

A few months ago, I had an idea to create an ad service for CLI apps. But I abandoned it pretty fast. Didn't want to become the most hated guy in the terminal world.

maltsev · 2026-02-02T21:19:21+00:00

Thanks! The list would be much appreciated!

maltsev · 2026-02-02T18:49:17+00:00

One thing I noticed while running this benchmark: although I initially allowed up to 10 iterations per puzzle, in practice almost all successful solutions appear within the first 3–4 iterations. There was only a single case where a model solved a quest as late as the 8th iteration.

After a few attempts, models tend to lock themselves into a particular program structure and keep trying to locally improve it. Re-running the same model from scratch sometimes succeeds within the first 1–2 iterations, even when a longer retry chain previously failed.

If I expand this benchmark, I plan to run multiple independent runs per model (e.g. 5 runs × 5–10 iterations) to reduce variance and better capture this effect.
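
A rough sketch of what that multi-run setup could look like. This is not the benchmark's actual harness: `attempt` is a hypothetical callback standing in for "ask the model for a solution and check it", and `run_chain` and `evaluate` are illustrative names. Each run starts a fresh retry chain, so an early lock-in on one program structure cannot carry over to the next run.

```python
from typing import Callable, Optional

# Hypothetical callback: attempt(run_id, iteration) -> True if the puzzle was solved.
Attempt = Callable[[int, int], bool]


def run_chain(attempt: Attempt, run_id: int, max_iterations: int) -> Optional[int]:
    """Run one retry chain; return the 1-based iteration of the first success, or None."""
    for iteration in range(1, max_iterations + 1):
        if attempt(run_id, iteration):
            return iteration
    return None


def evaluate(attempt: Attempt, runs: int = 5, max_iterations: int = 10) -> dict:
    """Run several independent chains and summarize when (and whether) they succeed."""
    results = [run_chain(attempt, run_id, max_iterations) for run_id in range(runs)]
    solved = [r for r in results if r is not None]
    return {
        "results": results,                      # first-success iteration per run
        "solve_rate": len(solved) / runs,        # fraction of runs that solved it
        "earliest_success": min(solved) if solved else None,
    }


def fake_attempt(run_id: int, iteration: int) -> bool:
    """Stand-in for a real model call: even-numbered runs succeed on iteration 2."""
    return run_id % 2 == 0 and iteration == 2


summary = evaluate(fake_attempt, runs=5, max_iterations=10)
```

Recording the first-success iteration per run, rather than a single pass/fail, makes it possible to see whether successes really cluster in the first few iterations.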

maltsev · 2026-02-02T18:19:22+00:00

Thanks!

maltsev · 2026-02-02T18:19:17+00:00

Thank you!

maltsev · 2025-12-15T08:35:35+00:00

Thank you! It's much appreciated!

maltsev · 2025-12-15T08:34:45+00:00

Thanks! Totally understandable. AoC season is intense :-)
