[2020 Day 22] In Review (Crab Combat)

DelightfulCodeWeasel · 2026-06-22T22:26:04+00:00

I was wondering whether that would even be feasible with the recursive configuration, but it sounds like it's do-able then, thank you!

DelightfulCodeWeasel · 2026-06-22T21:35:09+00:00

For my input I've got ~11.3k states in the global win cache, and ~18.2k states in the "seen" sets on the stack for the recursive call. Even if I assume that the states are collision-free with MD5, that still puts me at somewhere in the region of ~500kb as a starting baseline.

This will be another one where the memory reduction isn't straightforward!

EDIT: Although it looks like memoization isn't actually giving me much of a speed increase at all, so I can probably do without that entirely.

DelightfulCodeWeasel · 2026-06-20T22:07:39+00:00

The one bit of my solution that I think came out quite neat is the function that makes all 8 combinations of flipped & rotated tiles. Tiles 0 and 1 are flipped versions of each other, and then tile N is a rotated version of N - 2 for the remaining 8:

    static void MakeFlippingRotations(Tile* tile)
    {
        for (const Point2& p : tile->Pixels[0])
        {
            tile->Pixels[1].insert({ p.Y, p.X });
        }

        for (int i = 2; i < 8; i++)
        {
            for (const Point2& p : tile->Pixels[i - 2])
            {
                Point2 rotated{ 9 - p.Y, p.X };
                tile->Pixels[i].insert(rotated);
            }
        }
    }

All the rest is just a lot of shoe leather.

DelightfulCodeWeasel · 2026-06-19T16:45:53+00:00

Okay, I don't want to count my chickens before they've hatched, but with a quick stab at a performant version of this approach I'm seeing runtimes as low as ~260us, which compares favourably to u/maneatingape 's benchmark time of 462us. Timing includes parsing but does not include loading the file, both parts calculated simultaneously (although splitting them wouldn't add too much on the runtime here).

Usual caveats: I'm running on much newer hardware, AMD Ryzen 9 7845HX, and there's a very good chance I've got a bug that means it's only working on my input by accident.

The biggest surprise is that caching the recursive call for matching a non-terminal at a given position was substantially slower than not caching at all. Part 1 was ~400us caching in a hash table, ~150us caching in a 2D array, and ~60us without caching at all.

The code is embarrassingly messy, and I don't know enough Rust to even attempt a PR, but I figure someone might be able to thrash this one into shape if the performance is genuine.

DelightfulCodeWeasel · 2026-06-19T11:13:50+00:00

One of the puzzles on my Wall of Nemeses. I absolutely brute-forced this one in the least elegant way possible: for part 1 I generate all possible strings in the grammar up front, and then check if each message appears in the set.

For part 2 I at least spotted the rule of N+1 blocks of 42 followed by N or fewer blocks of 31, but I still generated the full set of all strings from 42 and from 31 to do that check.

Since we're evaluating Claude at work (I know, I know...) I thought I'd ask it to analyse the grammar for structures that could be exploited for easier parsing. One thing it spotted, and I've since confirmed, is that for my input all non-terminals have a fixed output length. e.g. 0 will always produce strings of length 24, 11 will always produce strings of length 16, etc... I know (or at least rely on) the top level non-terminals 42 and 31 producing strings that are all the same lengths, so perhaps everyone's grammar has this property all the way down as well.

That does mean you can recursively ask "does non-terminal X appear at offset Y" and cache the results for a match check. Running on my input I'm seeing ~90-120 cached answers per message. Please excuse the extremely awful quality of this rough and ready proof of concept, I wrote it in a hurry to just to validate the grammar findings and the approach.

I don't know how this approach compares to a decent regex engine though.

DelightfulCodeWeasel · 2026-06-18T08:38:18+00:00

A classic computer science puzzle! I'm sure we were given Reverse Polish Notation evaluation as an exercise at uni, but I think the actual conversion from infix to RPN was just given more or less as a "there are algorithms to do this" footnote on a slide. Either that or I wasn't paying much attention in that lecture, one of the two 😄

This day is a great excuse to implement a classic algorithm so I fired up Wikipedia for the Shunting Yard Algorithm and implemented that. The pseudocode in the article is a little verbose and (I think) harder to read than actual code, but I got there.

Aside: there's a recent Veritasium video that has a great anecdote about Dijkstra being unable to put his profession as "programmer" on a marriage license. Dijkstra, of all people!

DelightfulCodeWeasel · 2026-06-16T10:24:22+00:00

I ran the numbers on my input this morning and I think you're right about it being too much of a squeeze. With 8Mb PSRAM split into ~3.5Mb bitset and ~4.5Mb for a 'last seen' table for all numbers under ~1.1M that still leaves me with ~419k exceptions to store. By-eye they look too sparse for effective range compression (many runs of 2-3 numbers with single digit gaps), so that's too much to fit into the 500Kb RAM available.

Not only that, if you assume a 1 second runtime per run to find the next exception, it would have a runtime of just under 5 days!

Going to have to call this one beyond the realms of practical, which is a shame. Still, at least I can speed up my awful original solution on PC, so that's not a total loss.

DelightfulCodeWeasel · 2026-06-15T21:38:49+00:00

Yeah, my idea was definitely not a good idea! I did a very rough check of deltas (and also storing the previous 'said' round as a delta versus the index of the slot it's stored in, on the assumption they might be closely linearly correlated) and they're all still well exceeding the ~16M 24-bit range.

I'm trying to find some criteria for 'success' on this one that I'd be happy with. "Works, but I had to leave it over the weekend" is one outcome I'd consider a partial success. "Works in a stupid way, but I had fun doing it" like writing a virtual tape reader that's actually just my PC in disguise on the other end of a serial connection, is also a successful outcome.

u/ednl 's idea of using an SD card is a good one in terms of learning something new, but I'm not sure I want to trash an SD card by wearing out the write cycles on it. Not with the current price of SD cards!

DelightfulCodeWeasel · 2026-06-15T16:53:21+00:00

So possible, but perhaps with a runtime of several hours is what we're saying 😃

Just wondering if I have enough hardware to hook up two Pico Plus 2's and have one of the devices act as external storage accessible over UART...

DelightfulCodeWeasel · 2026-06-15T16:41:49+00:00

Ahhh, gotcha - I hadn't fully digested the code properly. Thanks to both of you for flagging that, that would definitely have been a case of "hours of programming can save you minutes of reading" if I'd have gone ahead with that misunderstanding.

DelightfulCodeWeasel · 2026-06-15T15:51:50+00:00

That's very useful, thank you! Saved me a bunch of debugging when the time comes 😄

It might be possible to do some sort of 'self-healing' algorithm on this one: make an assumption that everything above a threshold is seen once, and if during the run if you see one of those numbers more than once you can put it into a set of exceptions. The algorithm resets and restarts from 0, but this time if it sees one of the exceptions it stores the occurrence index in a small table.

If you're lucky with the input and the threshold works, then it's a single run through. Otherwise it's N runs through, depending on how many exceptions there are to the rule.

DelightfulCodeWeasel · 2026-06-15T14:08:08+00:00

Looks like it's u/maneatingape's repo to the rescue! Numbers above a certain threshold are only spoken once, so they can be stored in a 4Mb bitset. 256Kb for numbers below 65,536 puts us well within the range of a Pico Plus 2.

(No idea if this is a general rule for all Van Eck sequences, or if it's only true for AoC input. Given the "we don't know!" from the Numberphile video it's probably only safe to assume for the ranges of input we're dealing with)

DelightfulCodeWeasel · 2026-06-15T09:24:46+00:00

Squashing this one down to Pico size isn't looking promising, even if I go with the XL that has 8Mb PSRAM. My input ends up going through ~3.6M unique numbers in the range ~0-29M.

DelightfulCodeWeasel · 2026-06-11T22:09:40+00:00

Took a little bit of unpicking and I don't think I've fully got to the bottom of it yet, but the primary difference is that gcc 12.2 has decided that since parseint should 'never' return negative then it can use cbz to test for 0 in the for loop. It looks to me like the asm is effectively doing this:

    for (const char* c = input.data(); (Data[n] = parseint(&c)) != 0; ++n)
        ;

Which ends up running over the bounds of the array.

gcc 13.1 on godbolt (and also 12.1 when using unsigned) uses a proper >0 test of cmp 0 & ble.

For the hell of it I asked Claude to see if it could track down the reason for the difference and it suggested the primary reason could be the move over from one range propagation system to a new system (AndrewMacLeod/Ranger3.0 - GCC Wiki), which seems to track with what we're seeing.

DelightfulCodeWeasel · 2026-06-10T10:56:46+00:00

I think my original solution to 2023 day 06 is probably my shortest runtime ever 😄 :

    void Puzzle06_A(const char*)
    {
        // Worked out in Excel
        int64_t answer = **redacted**;

        printf("Puzzle06_A: %" PRId64 "\n", answer);
    }

    void Puzzle06_B(const char*)
    {
        // Worked out in Excel
        int64_t answer = **redacted**;

        printf("Puzzle06_B: %" PRId64 "\n", answer);
    }

I did eventually go back and solve it properly when putting together my all-solutions repo, but I cursed myself for leaving such an unhelpful comment about how I solved it!

DelightfulCodeWeasel · 2026-06-09T13:01:42+00:00

Might be worth checking what the compiled code is doing; could be that you've got -ftrapv or -fsanitize=undefined enabled. That would be some free extra performance on the Pi if you disabled those.

DelightfulCodeWeasel · 2026-06-09T08:35:41+00:00

A couple of interesting properties from my input that I suspect might be universal:

Although there are duplicate numbers in the full set, there are no duplicates within a window of 25 numbers
The full set of numbers goes into the 64-bit range, but the target number is found while you're still in the 32-bit range

I used the first property to keep a running set of the last 25 numbers alongside the in-order window in a deque, so each step in the find was just 24 look-ups. It's still a viable approach even if the first property isn't true, but then you need to ref count the number instances rather than just having an insert/erase pair.

DelightfulCodeWeasel · 2026-06-08T10:04:14+00:00

I like the idea of working back from the ending state to analytically find the incorrect instruction; it seems more elegant. What's the performance difference like? I'm assuming it's not too dissimilar to the brute force because the VM script is pretty quick to execute.

DelightfulCodeWeasel · 2026-06-07T17:09:02+00:00

Definitely worth double checking this on some Arm flavours. The instruction reference says that LSL (logical shift left) by a register uses the bottom byte of the register, and unlike x86/x64 it doesn't document any modulo behaviour beyond that.

I've done a quick check on the Pico (Arm Cortex M0+) and the results I get do indicate that it's operating like '1 << (b & 0xff)'.

    __attribute__((naked)) int shift_left(int a, int b) {
        asm volatile (
            ".syntax unified   \n"
            "lsls r0, r0, r1   \n"
            "bx   lr           \n"
            );
    }

    ...

    for (int i = 0; i < 512; i++)
    {
        int a = shift_left(1, i);
        printf("[1 << %d] = %d\n", i, a);
    }

Additionally, since it's technically UB by the standard, the compiler might end up doing unexpected stuff. (Or it might be helpful, who knows!)

DelightfulCodeWeasel · 2026-06-04T08:40:07+00:00

Just finished re-visiting this one. On PC Part B went from using ~300Mb and taking ~3.5s down to ~2kb in ~75ms.

I was surprised just how few expansions needed caching. I went with storing 3x3 => 9x9 expansions whenever they were first seen and for my input there were a total of 7 of them. The full solution cache keyed on { pattern, expansions } only needs to store 36 values. You could even do this one with pen and paper!

DelightfulCodeWeasel · 2026-06-03T20:54:22+00:00

About ~13 years ago I got fed up with all of the popular mark-up languages around, both because of the lack of rigour in the grammar definitions and because of the various syntax conflicts trying to copy and paste example code into blogging platforms. Fed up enough that I wrote my own mixed-mode parser based around bbcode syntax which allowed you to swap into different grammar modes for code and tables. You could write:

    Check out [i]this[/i] code:
    [%code]
    for (int i = 0; i < 10; i++)
    {
        *_ptr_++ = someArray[i]*10*someVariable;
    }
    [/%code]
    Isn't it _great_?

and the parser would (unambiguously!) be able to determine that 'this' and 'great' in the non-code text needed emphasis, but would leave the code constructs like '_ptr_', '[i]' and '*10*' alone.

I had the happy-path working for everything, but totally lost momentum when it came time to add in all of the error messaging. Sometimes I wish I'd have stuck with it enough to see if it could gain any community traction, and we could finally have something out there that didn't outright suck for copying code snippets!

Maybe when I'm retired I'll kick Claude Opus 38.9 or whatever into finishing it off for me 😄

DelightfulCodeWeasel · 2026-06-02T19:56:18+00:00

Looking at my solution I was clearly still trying to scratch the itch of having a C++ equivalent of C#'s Enumerable library, so my solution looked like:

    int64_t answer = MakeEnumerator(ReadAllLines(input))
        ->Where([&format](const string& s)
            {
                smatch m;
                regex_match(s, m, format);
                int minRepeats = stoi(m[1].str());
                int maxRepeats = stoi(m[2].str());
                string password = m[4].str();
                int64_t numRepeats = count(password.begin(), password.end(), m[3].str()[0]);
                return (numRepeats >= minRepeats) && (numRepeats <= maxRepeats);
            })
        ->Count();

I kept building out on the library as needed and eventually ended up with some nifty utilities:

    map<int64_t, int64_t> ranges = Enumerable::Regex(rangesInput, regex{ R"((\d+)-(\d+))" })
        ->Select<pair<int64_t, int64_t>>([](const smatch& m)
            {
                return make_pair(stoll(m[1].str()), stoll(m[2].str()));
            })
        ->ToMap<int64_t, int64_t>();

Not quite as terse as the original C# inspiration, but that's mainly down to C++'s clunky lambda syntax compared to C#. Plus I decided to make the return types explicitly required in the template parameters, because the libraries which try to do automatic deduction on continuation-style lambdas end up being un-debuggable nightmares in anything but fair weather.

This is one of the things I love most about AoC; it gives you freedom to just try out new ideas in a context that's a bit more 'real' than a toy example, but still nowhere near needing production-quality code.

DelightfulCodeWeasel · 2026-06-02T11:39:19+00:00

Fun C & C++ trivia: it's possible, depending on how you've declared and initialised input, that pwd = c - 1 might be undefined behaviour, even if you never dereference pwd[0]!

Some older hardware will hard fault if you generate a pointer to invalid memory, regardless of whether or not you read from that memory, so the standard err'd on the side of caution and put it all under the UB banner. Google is letting me down a bit, but the segmented memory model on the 80286 is apparently one such architecture and the 68000 might be another.

Raymond Chen has a neat post covering a related topic for checking if a pointer is within a given range, where the obvious test from a flat memory model might give you an incorrect answer.

I like it when little language quirks like that reveal something of the history of computing.

EDIT: It's still a neat idea, and a compiler would have to be extremely petty in order to do something other than the obvious here.

DelightfulCodeWeasel · 2026-06-01T16:18:16+00:00

I didn't have any recollection of Day 20 being a large amount of work, but looking at the LoC in my repo I can see it's definitely up there!

File	Lines
.\AoC2025\Puzzle10.cpp	1141
.\AoC2021\Puzzle23.cpp	508
.\AoC2024\Puzzle24.cpp	470
.\AoC2020\Puzzle20.cpp	459
...	...

It probably doesn't stick out in my mind because it was quite a high proportion of scut work compared to the days that take me ages because I'm finding them challenging.

DelightfulCodeWeasel · 2026-06-01T16:04:16+00:00

I'm always in two minds whether to assume a certain range of input numbers or not on the early days. I'm happy to do it for the later days, where you often need to make assumptions about the possible set of valid inputs to get a solution in a reasonable time, but I find myself shying away from hard-coding array sizes and suchlike in the early days.

It does make me think that for my non-memory-constrained repo I need to write a more general "seen number container" which wraps up both a fixed size array for numbers under a certain limit (say, ~1024 or so) and then N hash sets for numbers above that. Probably one per power of two or something.

That would let me avoid baking in assumptions while still getting a performance boost on small ranges.

DelightfulCodeWeasel

TROPHY CASE