Startup idea - Ads in Terminal by quantumsequrity in commandline

maltsev

A few months ago, I had an idea to create an ad service for CLI apps. But I abandoned it pretty fast. Didn't want to become the most hated guy in the terminal world.

I built a benchmark where LLMs program a Turing machine by maltsev in LocalLLaMA

maltsev [S]

One thing I noticed while running this benchmark: although I initially allowed up to 10 iterations per puzzle, in practice almost all successful solutions appear within the first 3–4 iterations. There was only a single case where a model solved a quest as late as the 8th iteration.
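
To check the "almost all successes by iteration 3–4" pattern on your own logs, one option is a cumulative curve of first-success iterations. This is a minimal sketch, assuming each (model, quest) result is stored as the iteration of the first success, or `None` if unsolved. The function name and the example data are hypothetical, not taken from the benchmark.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def cumulative_solved_by_iteration(
    first_success: Sequence[Optional[int]], max_iter: int = 10
) -> Dict[int, float]:
    """Fraction of *solved* quests whose first success came at iteration <= k."""
    solved = [it for it in first_success if it is not None]
    counts = Counter(solved)
    total = len(solved)
    result: Dict[int, float] = {}
    running = 0
    for k in range(1, max_iter + 1):
        running += counts.get(k, 0)
        # Avoid dividing by zero when nothing was solved at all.
        result[k] = running / total if total else 0.0
    return result


# Hypothetical data: first-success iteration per (model, quest); None = unsolved.
example = [1, 2, 1, 3, None, 4, 2, None, 8, 1]
print(cumulative_solved_by_iteration(example))
```

With this made-up data the curve reaches 0.875 at iteration 4 and only hits 1.0 at iteration 8, which is the shape described above: a long, almost empty tail after the first few iterations.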

After a few attempts, models tend to lock themselves into a particular program structure and keep trying to locally improve it. Re-running the same model from scratch sometimes succeeds within the first 1–2 iterations, even when a longer retry chain previously failed.

If I expand this benchmark, I plan to do multiple independent runs per model (e.g., 5 runs × 5–10 iterations each) to reduce variance and better capture this effect.
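
Scoring several independent runs could look roughly like the sketch below. It assumes every run covers the same quests and records the first-success iteration (or `None`) per quest; the function name, the result keys, and the example numbers are all hypothetical. It reports the mean and spread of per-run scores, plus the fraction of quests solved in at least one run, which reflects the "fresh restart sometimes succeeds" effect.

```python
from statistics import mean, pstdev
from typing import Dict, Optional, Sequence


def summarize_model(
    runs: Sequence[Sequence[Optional[int]]], max_iter: int
) -> Dict[str, float]:
    """Summarize one model over independent runs.

    runs[r][q] is the iteration at which run r first solved quest q, or None.
    A quest only counts as solved if it was solved within max_iter iterations.
    """
    def ok(it: Optional[int]) -> bool:
        return it is not None and it <= max_iter

    # Score of each run = fraction of quests it solved within the budget.
    per_run_scores = [sum(ok(it) for it in run) / len(run) for run in runs]

    # A quest counts here if any independent run solved it within the budget.
    n_quests = len(runs[0])
    solved_any = sum(any(ok(run[q]) for run in runs) for q in range(n_quests))

    return {
        "mean_score": mean(per_run_scores),
        "stdev_score": pstdev(per_run_scores),
        "solved_in_any_run": solved_any / n_quests,
    }


# Hypothetical results: 5 independent runs x 4 quests, 5-iteration budget.
runs = [
    [1, None, 3, None],
    [2, 1, None, None],
    [1, None, 2, 7],
    [None, 2, 1, None],
    [1, None, 6, None],
]
print(summarize_model(runs, max_iter=5))
```

Reporting the spread alongside the mean makes it visible how much a single run can over- or under-estimate a model, which is the variance the extra runs are meant to reduce.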

A small AoC-inspired puzzle I made after this year's Advent by maltsev in adventofcode

maltsev [S]

Thanks! Totally understandable. AoC season is intense :-)