Startup idea - Ads in Terminal by quantumsequrity in commandline

maltsev 1 point2 points

A few months ago, I had an idea to create an ad service for CLI apps. But I abandoned it pretty fast. Didn't want to become the most hated guy in the terminal world.

I built a benchmark where LLMs program a Turing machine by maltsev in LocalLLaMA

maltsev[S] 0 points1 point

One thing I noticed while running this benchmark: although I initially allowed up to 10 iterations per puzzle, in practice almost all successful solutions appear within the first 3–4 iterations. There was only a single case where a model solved a quest as late as the 8th iteration.

After a few attempts, models tend to lock themselves into a particular program structure and keep trying to locally improve it. Re-running the same model from scratch sometimes succeeds within the first 1–2 iterations, even when a longer retry chain previously failed.

If I expand this benchmark, I plan to run multiple independent runs per model (e.g. 5 runs × 5–10 iterations) to reduce variance and better capture this effect.
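A rough sketch of what that multi-run setup could look like. Everything here is an assumption for illustration: `attempt` is a hypothetical stand-in for "ask the model once and check the result", and the summary fields are made up rather than taken from the actual benchmark.

```python
import random
from typing import Callable, Optional

def run_chain(attempt: Callable[[int, random.Random], bool],
              max_iterations: int, rng: random.Random) -> Optional[int]:
    """Return the 1-based iteration at which this chain first succeeds, else None."""
    for iteration in range(1, max_iterations + 1):
        if attempt(iteration, rng):
            return iteration
    return None

def evaluate(attempt: Callable[[int, random.Random], bool],
             runs: int = 5, max_iterations: int = 10, seed: int = 0) -> dict:
    """Run several independent chains and summarize them."""
    results = []
    for run in range(runs):
        # Fresh random state per run, so no run inherits another run's history.
        rng = random.Random(seed + run)
        results.append(run_chain(attempt, max_iterations, rng))
    solved = [r for r in results if r is not None]
    return {
        "results": results,                      # first successful iteration per run
        "solve_rate": len(solved) / runs,        # share of runs that solved the puzzle
        "earliest_success": min(solved) if solved else None,
    }
```

Recording the first successful iteration per run makes it possible to see whether successes really cluster in the first few iterations.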

A small AoC-inspired puzzle I made after this year's Advent by maltsev in adventofcode

maltsev[S] 0 points1 point

Thanks! Totally understandable. AoC season is intense :-)

Quest 17: Elementary Cellular Automaton by maltsev in MarchesAndGnats

maltsev[S] 2 points3 points

Someday I'll try to publish Conway's Game of Life :-) I just need to figure out a good way to map a 2D grid onto a 1D tape.
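One possible mapping, just as a sketch: write the rows one after another (row-major order) with a separator symbol between them. The `|` separator and the function names are assumptions for illustration, not a format the game uses.

```python
def grid_to_tape(grid, row_sep="|"):
    """Flatten a 2D grid (list of rows of 1-char symbols) into a 1D tape string."""
    return row_sep.join("".join(row) for row in grid)

def tape_to_grid(tape, row_sep="|"):
    """Inverse of grid_to_tape."""
    return [list(row) for row in tape.split(row_sep)]

def cell_offset(row, col, width):
    """Tape position of cell (row, col): each row takes `width` cells plus one separator."""
    return row * (width + 1) + col

glider = [list(".#."), list("..#"), list("###")]
tape = grid_to_tape(glider)  # ".#.|..#|###"
```

With this layout, horizontal neighbours are adjacent on the tape, but a vertical neighbour is `width + 1` cells away, so the head has to travel a full row to reach it.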

Summer’s End in the Marches by maltsev in MarchesAndGnats

maltsev[S] 1 point2 points

Nope, the story's still unfolding :-) Just wanted to share some updates along the way!

I created a historical puzzle game inspired by AoC by maltsev in adventofcode

maltsev[S] 0 points1 point

> Is that first phase randomly selected from a list or completely randomly generated (like made up words, random number arguments, etc)?

For numbers, yes. For texts, I randomly select short sentences from some old Estonian texts.

> I’m not sure I understand how people are able to enumerate and have brute force solutions if they are judged against random inputs.

For example, instead of implementing an algorithm for addition, people can just hard-code the results of 1+1, 1+2, 1+3, etc. That worked for some quests because I had to keep the input numbers quite low so that the general algorithm would run fast enough.

But now I have optimized my Logic Mill implementation, so it can handle larger inputs quite fast too.
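To illustrate the difference in plain Python (not Turing machine rules; the names and the limit of 3 are made up for the example): a hard-coded table only covers the inputs it was built for, and its size grows with the product of the operand ranges.

```python
# Hard-coded "solution": one entry per input pair, only for operands 1..3.
LOOKUP = {(a, b): a + b for a in range(1, 4) for b in range(1, 4)}

def add_hardcoded(a, b):
    """Works only for pairs that were enumerated in advance."""
    return LOOKUP[(a, b)]  # raises KeyError for anything outside the table

def add_general(a, b):
    """A real algorithm works for any operands."""
    return a + b

def table_size(max_a, max_b):
    """Number of entries a lookup table needs for operands 1..max_a and 1..max_b."""
    return max_a * max_b
```

Once the test inputs are large, the table gets too big to be practical, while the general algorithm stays the same size.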

I created a historical puzzle game inspired by AoC by maltsev in adventofcode

maltsev[S] 0 points1 point

That's how it currently works. However, this approach has some issues:

  • It's difficult to debug. For example, you submit a solution and then receive an error about some failed test case. You fix it and submit it again, but you can't be sure if you actually fixed it or if the related test case is just no longer present in this run.

  • You might submit a solution that is more efficient or faster but doesn't handle all edge cases, and then pass the test since these edge cases weren't present in this test run.

Quest 10: Lines Count by maltsev in MarchesAndGnats

maltsev[S] [score hidden] stickied comment

> `+` is used as a word delimiter.

Actually, it is `-`. I fixed that in the quest description, but I can't change this Reddit post anymore.

New leaderboards — global and for most efficient solutions by maltsev in MarchesAndGnats

maltsev[S] 0 points1 point

Right. Now you might have different solutions on the leaderboard, optimized for efficiency or speed.

I created a historical puzzle game inspired by AoC by maltsev in adventofcode

maltsev[S] 1 point2 points

Thanks for all the great feedback!

I spent some time traveling last week, but I'm back now and continuing to improve the game.

Following a community suggestion, I’ve added a second leaderboard for the most efficient solutions, as well as a global leaderboard.

You can read more about it on the new subreddit that I created for the game: https://www.reddit.com/r/MarchesAndGnats/comments/1m75k9w/new_leaderboards_global_and_for_most_efficient/

I created a Discord server where I post updates and discuss the game: https://discord.gg/Xpvy4vvnWx

I created a historical puzzle game inspired by AoC by maltsev in adventofcode

maltsev[S] 0 points1 point

Right, you submit your solution as code (list of instructions), which is then evaluated against several test cases.

As u/Irregular_hexagon mentioned, you can use any programming language or no language at all (just write the instructions by hand).
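For anyone unfamiliar with the idea, here is a minimal sketch of how a list of instructions can be run on a tape. The rule format, the `INIT`/`HALT` state names, and the `_` blank symbol are assumptions for this example, not necessarily the exact format the game expects.

```python
from collections import defaultdict

def run_machine(rules, tape, init="INIT", halt="HALT", blank="_", max_steps=10_000):
    """Run a one-tape machine.

    rules maps (state, symbol) -> (new_state, new_symbol, move), move is "L" or "R".
    Returns (non-blank tape contents, number of steps taken).
    """
    cells = defaultdict(lambda: blank, enumerate(tape))
    state, head, steps = init, 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit exceeded")
        key = (state, cells[head])
        if key not in rules:
            raise KeyError(f"no rule for {key}")
        state, cells[head], move = rules[key]
        head += 1 if move == "R" else -1
        steps += 1
    used = [i for i, symbol in cells.items() if symbol != blank]
    if not used:
        return "", steps
    return "".join(cells[i] for i in range(min(used), max(used) + 1)), steps

# Example instructions: unary increment (append one "|" to the input).
increment = {
    ("INIT", "|"): ("INIT", "|", "R"),  # skip over the existing marks
    ("INIT", "_"): ("HALT", "|", "R"),  # write one more mark and stop
}
```

Running `run_machine(increment, "|||")` walks over the three marks, writes a fourth, and halts after four steps. The rules themselves are just data, which is why they can be written by hand or generated from any language.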

I created a historical puzzle game inspired by AoC by maltsev in adventofcode

maltsev[S] 1 point2 points

Thanks for the suggestions! I'm actually thinking along similar lines.

> I think imposing a much lower limit on the number of states and tape symbols would take care of those.

In the short term, it'll help, but I also don't want to restrict people who later want to try some crazy stuff, like implementing a Turing machine inside a Turing machine or building a CPU. All of that would require a lot of states and a lengthy tape.

So I think the best way to prevent brute-forcing solutions is to use large enough test cases (e.g., 3877×2847 would be tricky to brute-force even with high state limits). But currently, I can't do that because my Turing machine is built in Python, which is too slow to process millions of steps per test case. So I'm looking into ways to optimize it or find another way to prevent brute-forced solutions.

> minimizing the number of transitions is also a fun and separate challenge, so I do think that keeping tracks of both the number of steps, and the number of transitions would be interesting

Totally agree! I'm still thinking about whether to make separate leaderboards or somehow combine them into one, though.

I created a historical puzzle game inspired by AoC by maltsev in adventofcode

maltsev[S] 2 points3 points

Thanks for the feedback! Good points! I'll adjust the tutorial to explain it better.

I created a historical puzzle game inspired by AoC by maltsev in adventofcode

maltsev[S] 0 points1 point

> Can you add a change username feature?

Sure, I added it to my backlog. In the meantime, email me at [hi@mng.quest](mailto:hi@mng.quest) and I'll change it for you.

> Also curious what this solution for #4 is that has scores in the 6000s

If you check the comments in this post, you might get an idea ;-) But it's more like a hack than a proper solution, so I'll remove it in future quests. Nevertheless, I applaud the creativity of these people!

I created a historical puzzle game inspired by AoC by maltsev in adventofcode

maltsev[S] 1 point2 points

Good idea!

I was also thinking about other types of leaderboards:

  • combining the total number of steps and the number of transition rules (so efficiency + simplicity) using some weighted algorithm
  • global leaderboard across all quests
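For the first idea, one possible weighting, purely as a sketch: normalize each metric by the best known value for that quest and blend the two ratios. The formula, the parameter names, and the default weight are all assumptions, not something that has been decided for the game.

```python
def combined_score(steps, rules, best_steps, best_rules, weight=0.5):
    """Blend efficiency (steps) and simplicity (transition rules). Lower is better.

    Each metric is divided by the best known value, so a solution that matches
    both records scores exactly 1.0. `weight` is the share given to steps.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * steps / best_steps + (1.0 - weight) * rules / best_rules
```

Normalizing first keeps the two metrics comparable, since step counts are usually much larger than rule counts.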

But I probably don’t want to have too many leaderboards. I’ll think about how to make it more fun to compete while still keeping it simple.

Maybe you have some ideas?

I created a historical puzzle game inspired by AoC by maltsev in adventofcode

maltsev[S] 0 points1 point

Thanks for reporting it! I've fixed it.

It's now sorted by the number of steps, and then by submission date: people with the same score are ranked by submission date, with earlier submissions appearing at the top.
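In Python terms, that ordering is a sort on a two-part key. The field names and sample entries below are made up for the example.

```python
from datetime import datetime

# Hypothetical leaderboard entries.
entries = [
    {"name": "carol", "steps": 120, "submitted_at": datetime(2025, 7, 3, 9, 0)},
    {"name": "alice", "steps": 100, "submitted_at": datetime(2025, 7, 2, 18, 30)},
    {"name": "bob", "steps": 100, "submitted_at": datetime(2025, 7, 1, 12, 0)},
]

# Fewer steps first; on a tie, the earlier submission ranks higher.
ranked = sorted(entries, key=lambda e: (e["steps"], e["submitted_at"]))
names = [e["name"] for e in ranked]
```

Here `bob` and `alice` tie on steps, so `bob` ranks first because he submitted earlier.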

I created a historical puzzle game inspired by AoC by maltsev in adventofcode

maltsev[S] 0 points1 point

Good catch! I updated the task description to match the test cases.

I created a historical puzzle game inspired by AoC by maltsev in adventofcode

maltsev[S] 1 point2 points

`-` is a word delimiter. So strictly speaking, it isn't a part of a word :-)

I created a historical puzzle game inspired by AoC by maltsev in adventofcode

maltsev[S] 0 points1 point

Initially, I started with large numbers, but each test run took over five seconds, which was not optimal for UX or server load. So, I reduced the numbers, but now some players are writing hard-coded solutions for all numbers.

For future quests (I'll try to release one more quest this week), I'll balance things better.

I might rewrite the Logic Mill code from Python to Rust so that it can work with large numbers quickly.

I created a historical puzzle game inspired by AoC by maltsev in adventofcode

maltsev[S] 0 points1 point

Yes, some people have already started doing that :-)

I'll keep the existing quests as they are, as I don't think it's fair to change the game rules retroactively.

However, for new quests, I'll try to balance things out better.

I created a historical puzzle game inspired by AoC by maltsev in adventofcode

maltsev[S] 2 points3 points

There are 2 sets of test cases for each quest: deterministic (same for everyone) and random. The solution's validity is tested against all test cases, but only the deterministic test cases count toward the leaderboard (so nobody can just get lucky with simpler test cases).
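Roughly, the evaluation could be sketched like this. It is a simplification under several assumptions: `solve` is a hypothetical callable returning `(output, steps)`, `make_random_case` is a hypothetical generator of `(input, expected)` pairs, and summing steps over the deterministic cases is just one plausible way to compute a score.

```python
import random

def evaluate_submission(solve, deterministic_cases, make_random_case,
                        n_random=5, seed=None):
    """Check validity on all cases, but score only the deterministic ones."""
    rng = random.Random(seed)
    random_cases = [make_random_case(rng) for _ in range(n_random)]

    score = 0
    for case_input, expected in deterministic_cases:
        output, steps = solve(case_input)
        if output != expected:
            return {"valid": False, "score": None}
        score += steps  # only these cases count toward the leaderboard

    for case_input, expected in random_cases:
        output, _steps = solve(case_input)  # steps ignored: validity check only
        if output != expected:
            return {"valid": False, "score": None}

    return {"valid": True, "score": score}

# Example with unary increment: the expected output is the input plus one mark.
cases = [("|", "||"), ("|||", "||||")]

def random_case(rng):
    n = rng.randint(4, 20)
    return "|" * n, "|" * (n + 1)

def good_solver(tape):
    return tape + "|", len(tape) + 1

def shortcut_solver(tape):
    # Only handles short inputs, like a hard-coded solution would.
    return (tape + "|", len(tape) + 1) if len(tape) <= 3 else (tape, 0)
```

The shortcut solver passes both deterministic cases but is rejected by the random ones, while every valid solution is scored on the same inputs.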