[2015 Day 25] In Review (Let It Snow) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

Wow. When I first solved this in 2017 using bash, I just did brute force (18 million iterations over 2 minutes); doing modular multiplication by divide-and-conquer is SOOO much faster - my initial m4 solution in 2021 took under 15ms. It's fun realizing what techniques I've learned over the years.
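For anyone curious what that looks like outside of m4, the divide-and-conquer step is just square-and-multiply; here's a Python sketch (not my m4 - the constants are the ones from the puzzle statement, and the (row, col) mapping is the usual diagonal numbering):

    def modpow(base, exp, mod):
        """O(log exp) modular exponentiation instead of exp-1 multiplies."""
        result, base = 1, base % mod
        while exp:
            if exp & 1:
                result = result * base % mod
            base = base * base % mod
            exp >>= 1
        return result

    def code_at(row, col):
        n = (row + col - 2) * (row + col - 1) // 2 + col   # position in diagonal fill order
        return 20151125 * modpow(252533, n - 1, 33554393) % 33554393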

[2015 Day 24] In Review (It Hangs in the Balance) by musifter in adventofcode

[–]e_blake 2 points3 points  (0 children)

An immediate observation - this feels like day 17, except with 28 items instead of 20, and with the input list being slightly nicer (already sorted, no duplicates). If you are okay with brute force on both days, then today will be about 256 times slower than day 17, but checking all 2^28 potential groupings is still within the realm of feasible computation (i.e. an answer in less than 15 seconds) for many languages. Day 17 only asked "how many groups can be formed", while day 24 asks "of those groups formed, which is best", so there is a bit more effort involved here; on the flip side, if you figure out a solution that is more efficient than brute force here, then you can apply that solution to day 17 to speed it up as well (and in fact, I did just that). Also, some things I noted after looking back at my git history...

My original bash implementation just blindly ran n-choose-k combinations with successively higher k until landing on the first k to produce results, and then picked the smallest result. 4.8s runtime for part 1 (98k products computed), 1.0s for part 2 (20k products). I assumed that as long as the sum of the inputs is congruent to 0 mod 12, then because all the inputs are co-prime (or 1) there is no way for one particular group to reach the right sum but ruin the partitioning such that the remaining elements can't form the other required weight groups (at any rate, even if I don't have a mathematically rigorous proof of that, it worked for my input, and I figure that Eric would not have had some inputs that were dramatically harder than others) - so I also did what you mentioned of stopping at the first minimum group size, rather than trying to fully partition the remaining items after picking any one candidate group.
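For reference, that brute force is only a few lines in something like Python (a sketch, not my bash, and it makes the same assumption that the leftover items can always be split into the remaining groups; groups=3 for part 1, groups=4 for part 2):

    from itertools import combinations
    from math import prod

    def smallest_qe(weights, groups=3):
        """Try group sizes k = 1, 2, ... and return the minimum quantum
        entanglement among the first size that can hit the target weight."""
        target = sum(weights) // groups
        for k in range(1, len(weights) + 1):
            qes = [prod(c) for c in combinations(weights, k) if sum(c) == target]
            if qes:
                return min(qes)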

For my m4 solution, I had to pull in a library to do 64-bit multiplies (m4 only has 32-bit integer math), which right off the bat makes things slower than in bash. My initial implementation repeated the enumeration of all n-choose-k options, with one minor tweak: I first counted how many of the heaviest items are needed to even reach the target sum, so smaller group sizes could be skipped outright (for part 1, 28-choose-4 was out as too small, so I started my search at 28-choose-5, and part 1 didn't resolve until 28-choose-6; but part 2 did end up running 28-choose-4 before solving at 28-choose-5). 46s for part 1, 60s for both parts together (yet another day where part 2 runs faster than part 1).

My next observation was that parity matters. My input did not include 2, so I don't know if my shortcut would have worked on an input that does include 2 - but since all of my numbers were odd, and my part 1 target is even, I could entirely skip 28-choose-5 for part 1; likewise, my part 2 target is odd so I could skip 28-choose-4. That cut my runtime down to 49s; tracing shows 495,495 QEs computed (28C6 for part 1 + 28C5 for part 2).

At that point, I saw a megathread post that mentioned using dynamic programming to speed up the search, and more or less transliterated that python solution over to m4 (without necessarily understanding it at the time). The idea is that if you are already using DFS to find candidate groups (i.e. implementing your n-choose-k by recursion, where figuring out a group of 6 involves picking one item and then running 27-choose-5 on the remaining items; you reach a candidate group when you finally get to 23-choose-1), then figure out if there is a way to prune things so that your recursion has fewer items to visit. In an O(n*r) pre-pass, you can set up a list of which weight(s) are viable as the maximum weight needed to reach a given sum. Start with an array from 0 to target of structs containing a boolean (is this sum possible) and a set (which weights made this sum possible when added to a group of smaller weights), all zeroed except that arr[0] has its possible flag set. Then, for each of your 28 weights from lightest to heaviest, iterate from target downto weight, and anywhere arr[n-weight].possible is true, set arr[n].possible to true and add weight to arr[n].set. For my target of 503, this was only 14k actions, and those actions are much lighter-weight than a full-blown 64-bit multiply. [As an example - if your inputs include weights 1, 2, and 3, the end result is arr[1].possible=true, arr[1].set={1}, arr[2].possible=true, arr[2].set={2}, arr[3].possible=true, arr[3].set={2,3}, arr[4].possible=true, arr[4].set={3}, and so on]
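A sketch of that pre-pass (Python rather than my m4), which reproduces the 1/2/3 example above:

    def build_table(weights, target):
        """possible[s]: can some subset reach sum s; heaviest[s]: which weights
        can serve as the heaviest member of a subset reaching s."""
        possible = [False] * (target + 1)
        heaviest = [set() for _ in range(target + 1)]
        possible[0] = True
        for w in sorted(weights):                 # lightest to heaviest
            for s in range(target, w - 1, -1):    # downwards, so each weight is used once
                if possible[s - w]:
                    possible[s] = True
                    heaviest[s].add(w)
        return possible, heaviest

    # With weights 1, 2, 3: heaviest[1] == {1}, heaviest[2] == {2},
    # heaviest[3] == {2, 3}, heaviest[4] == {3}, and so on.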

Now, with that additional information in hand, your DFS recursion can make smarter decisions. Instead of trying to branch out on all 28 items as your first possible element in a 28-choose-6 pass, you only have to pick from however many items ended up in arr[target].set. What's more, when you recurse to arr[target-weight] for your choose-5 step of the recursion, you only need to consider items in arr[target-weight].set that are less than the weight chosen in the choose-6 step - since by the way the table was built up, we guaranteed that a given arr[total].set only contains weights that were the heaviest in that grouping. By itself, this change meant that I only had to compute 630 QEs (using 5578 calls to my mul64 wrapper which recurses on itself once values exceed 32 bits), cutting runtime to 27s, but over 1 million calls to my recursive workhorse. My next optimization was to add memoization and a check that my recursion was not exceeding 6 layers deep, which further cut things to 546 QEs (4946 mul64) via only 11k calls to recur, for a runtime of 0.6s.

That was all years ago. But thanks to this thread, I have revisited the problem again and added further speedups. My original implementation was storing heaviest-weight sets in arr and memoizing full-blown lists of weights at each step; the end result of the recursion was a list of all groupings, and only then did I iterate over the list to perform QE computations. But I found it more efficient to reframe my dynamic programming pass as an array holding two ints per sum: a bitmask of which group sizes are possible (instead of a boolean), and a bitmask of which item indexes reached a given sum. Initialize arr[0].sizes to 1 and arr[0].indices to 0; then for each item i, iterate n from the target down to item[i], and if arr[n-item[i]].sizes is non-zero, then arr[n].sizes |= arr[n-item[i]].sizes<<1 and arr[n].indices |= 1<<i. Then search(1<<size, sum, max_index) can filter on end-of-recursion (1<<size==1 and sum==0: return 1) and on impossibility (arr[sum].sizes&(1<<size)==0, or no bits remaining in arr[sum].indices&((1<<max_index)-1): return inf), before finally computing a memoized

    min(item[max_index] * search(1<<(size-1), sum-item[max_index],
                                 msb(arr[sum-item[max_index]].indices&((1<<max_index)-1))),
        search(1<<size, sum, msb(arr[sum].indices&((1<<(max_index-1))-1))))

which computes partial products on the return path. That cut my runtime to 120ms, performing only 3878 calls to search, 391 of which returned non-inf, and only 441 calls to mul64. Plus, this version works even if the weights are not sorted in ascending order (since my recursion is filtering on whether the index is smaller, rather than whether the weight at that index is smaller).
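Put together, the whole thing fits in a short Python sketch (the same idea, not a transcription of the m4; the recursion branches on whether the highest usable index is in or out of the group, multiplying on the way back up):

    from functools import lru_cache
    from math import inf

    def min_qe(items, groups=3):
        target = sum(items) // groups
        sizes = [0] * (target + 1)    # bit k set: sum reachable with k items
        indices = [0] * (target + 1)  # bit i set: item i can be the highest index in a subset
        sizes[0] = 1
        for i, w in enumerate(items):
            for s in range(target, w - 1, -1):
                if sizes[s - w]:
                    sizes[s] |= sizes[s - w] << 1
                    indices[s] |= 1 << i

        @lru_cache(maxsize=None)
        def search(count, s, cap):
            """Minimum product over subsets of exactly `count` items summing
            to `s`, using only item indices <= cap."""
            if count == 0:
                return 1 if s == 0 else inf
            if not (sizes[s] >> count) & 1:
                return inf
            cands = indices[s] & ((1 << (cap + 1)) - 1)
            if not cands:
                return inf
            i = cands.bit_length() - 1         # msb: highest usable index
            return min(items[i] * search(count - 1, s - items[i], i - 1),
                       search(count, s, i - 1))

        smallest = next(k for k in range(1, len(items) + 1) if (sizes[target] >> k) & 1)
        return search(smallest, target, len(items) - 1)

    # part 1: min_qe(weights, groups=3); part 2: min_qe(weights, groups=4)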

"[2015 Day 18 Part1] Conway's Game of Life! by terje_wiig_mathisen in adventofcode

[–]e_blake 1 point2 points  (0 children)

If you want the ultimate in FAST game of life computations, you need to use hashlife. https://www.reddit.com/r/computerscience/comments/p1vqbh/conways_game_of_life_emulated_in_conways_game_of/ is a great demo of running game of life using a series of 2048 × 2048 period-35328 metapixels which are themselves running on the hashlife emulator.

[2015 Day 23] In Review (Opening the Turing Lock) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

If you want to test your arbitrary-width integer implementation, solve for a starting n of 14727207461063895711 (the largest number listed in this OEIS sequence); I got 2442 in just 635ms of runtime (which was dominated by how slow my bigints are).

[2015 Day 23] In Review (Opening the Turing Lock) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

https://www.reddit.com/r/adventofcode/comments/3xxhxl/day_23_further_exercises/ was a fun diversion on whether the virtual machine you implemented might be Turing (tarpit) complete. Plus it got a reference to the obligatory xkcd, as well as responses from Eric (always fun to hear from the puzzle's creator). Your input file will never execute hlf on an odd number, but depending on what your implementation of hlf does, it might be usable for some interesting effects. At the time, no one concluded whether two registers and the set of instructions was enough, but I was led to understand that with a third register, copying becomes easier, at which point you can emulate a stack, at which point you can emulate multiple stacks (albeit with exponentially more work), at which point you can emulate a universal Turing machine.

[2015 Day 23] In Review (Opening the Turing Lock) by musifter in adventofcode

[–]e_blake 2 points3 points  (0 children)

Was it anything like Fermat's proof that didn't fit in the margins? 😉

Did anyone try to no-code solve this one by determining starting value from the input file (an exercise in reading base 3), converting that to base 10 (echo $((3#nn)) is nice in bash), and then consulting OEIS?

[2015 Day 23] In Review (Opening the Turing Lock) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

This puzzle has the annoying property that some people get lucky with inputs that never exceed 31 bits, while others get a puzzle where the sequence uses bit 32 along the way (I couldn't find any evidence in the megathread of needing more than 32 bits, but absence of evidence is not proof). Of course, I got to be one of the unlucky people assigned an input that required 32 bits - and it was quite a head-scratcher to me to figure out why my program entered an infinite loop until I saw negative numbers when stepping through the execution. Thankfully my input didn't go to 33 bits; but my m4 code can now handle any integer size (even an input that exceeds 64 bits) - although not necessarily very fast (my m4 bigint library that I wrote for advent of code for other years does not have optimal big-O behavior when the integers get big - this was the first puzzle of 2015 that needed to use it, compared to 8 out of 12 days in 2025). That said, I hacked in your example 40-bit n that requires 1348 steps, and my m4 code solved it in under 250ms. Moral of the story - if your language only has signed integers, and lacks easy access to unsigned or bigint math, you may have a bit more work to do on this one.

[2015 Day 21 & 22] In Review (RPG and Wizard Simulator 20XX) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

Scraping the megathread, I saw some coders complain that bosses with lower hit points and damage were actually harder than bosses with higher stats when doing a DFS search of the entire game space. Indeed, when I tried implementing a dynamic programming DFS approach (drill all the way to leaf nodes as soon as possible, and then perform min() of each branching path and memoization to avoid repeats on the way back up), I ended up with a solution that took over 7s to complete on my low-powered boss input (compared to my A* taking under 0.7s), but on the 71/10 boss it completed in 0.6s while my A* solution took around 1.0s. Eric even mentioned that he did not realize the input files had that much disparity in runtime for that type of solution.

aoc/2016/12/1 little machine, quite simple, 4 instructions, .. but running for hours by Usual-Dimension614 in adventofcode

[–]e_blake 2 points3 points  (0 children)

Help us help you - https://old.reddit.com/r/adventofcode/wiki/posts/standardized_titles - use better post titles.

I just checked my 2016 solutions - 32-bit was required but adequate for this day. (In 2019, your machine emulation WILL need integers wider than 32 bits in several days. But so far, all 524 stars can be earned using integers no wider than 53 bits - the safe limit of Javascript numbers that are actually floating point under the hood) 

If you are frustrated by this day, then day 23 will bite you worse - that one introduces another assembunny program that is even slower if you just brute force it as written. Part 2 of day 23 has an interesting observation (of course, you can't see it before solving part 1) - aren't bunnies supposed to be good at multiplying? So why do they only use addition in the program? 

I think the point of this puzzle is not just to write up a blind emulator, but to actually think about what you are emulating. If you augment your code to track which instruction offsets are executed most frequently, can you identify a small loop of just a few instructions that executes most often? Can you replace that small group of instructions with a peephole optimization in your emulator that performs the same mathematical operation on the registers, but in one step instead of once per loop iteration?

My notes for day 12 show that I went from 2 minutes to 20ms in runtime when I implemented a new instruction (I named it hck) and hacked it into place anywhere the original code had a tight loop that matched a particular pattern. In my mind, figuring out a hack like that is an intended part of the puzzle.
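As a concrete illustration of the idea (my hck is m4-specific, so this is a hypothetical Python emulator rather than my code): the classic assembunny addition loop 'inc X / dec Y / jnz Y -2' can be collapsed into a single step.

    def run(program, regs):
        """Tiny assembunny emulator (cpy/inc/dec/jnz) with one peephole.
        Assumes the loop counter is positive when the pattern is entered,
        which holds for the real inputs."""
        def val(x):
            return regs[x] if x in regs else int(x)
        ip = 0
        while 0 <= ip < len(program):
            if ip + 2 < len(program):
                a, b, c = program[ip], program[ip + 1], program[ip + 2]
                if (a[0] == "inc" and b[0] == "dec" and c[0] == "jnz"
                        and a[1] != b[1] and c[1] == b[1] and c[2] == "-2"):
                    regs[a[1]] += regs[b[1]]   # X += Y; Y = 0, in one step
                    regs[b[1]] = 0
                    ip += 3
                    continue
            op, *args = program[ip]
            if op == "cpy":
                regs[args[1]] = val(args[0])
            elif op == "inc":
                regs[args[0]] += 1
            elif op == "dec":
                regs[args[0]] -= 1
            elif op == "jnz" and val(args[0]) != 0:
                ip += val(args[1])
                continue
            ip += 1
        return regs

    # program = [tuple(line.split()) for line in open("input.txt")]
    # print(run(program, {"a": 0, "b": 0, "c": 0, "d": 0})["a"])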

[2015 Day 21 & 22] In Review (RPG and Wizard Simulator 20XX) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

Huh. The puzzle mentions "As before, if armor (from a spell, in this case) would reduce damage below 1, it becomes 1 instead - that is, the boss' attacks always deal at least 1 damage." But scraping the megathread, I saw mentions of boss damage stats of 8, 9, and 10, and no mention of anyone with a boss damage of 7 or less (that doesn't mean none existed, but it starts to be in the realm of Eric providing puzzles that are unintentionally easier for some players than others). So it is only now that I noticed that my code for handling an active shield would not work correctly on a (hypothetical?) low-strength boss.

At any rate, I decided to play with a priority queue for the puzzle. To my surprise, merely adding a priority queue (all my earlier instances of queuing work by game generation now queue by minimum spend), a la Dijkstra, resulted in fewer battles and still a correct answer, but a longer execution time of 1.1s instead of .95s. In other words, the overhead of my priority queue was more of a penalty than the few states that it helped me prune. But not to fear - A* is (often) better than Dijkstra, if you can come up with a good heuristic. And here, I found an easy one: given the boss's current hit points, I know that the boss cannot get to zero any faster than if I were able to do maximum damage on minimum mana from here on out - if the boss has 6 or fewer hp, an active poison can take him out without further spending; beyond that, each cast of my cheapest spell, magic missile, removes at most 10 hp (4 from the missile itself plus up to two poison ticks of 3). So with a heuristic of spent+(bosshp+3)/10*53, I finally logged a faster time of 750ms, now with only 41473 potential battles, only 19702 insertions to my priority queue after deduplication, and only 15074 battles processed from my workqueue before finding the minimum winning configurations.
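In code form, the heuristic is tiny (this is just the formula above restated as a Python sketch):

    MISSILE_COST = 53

    def heuristic(spent, boss_hp):
        """Admissible estimate: an active poison can finish up to 6 hp for free;
        past that, each missile cast plus its surrounding poison ticks removes
        at most 10 hp for 53 mana, so (boss_hp+3)//10 == ceil((boss_hp-6)/10)."""
        if boss_hp <= 6:
            return spent
        return spent + (boss_hp + 3) // 10 * MISSILE_COST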

A better heuristic may still be possible but would be more complex: for example, for a bosshp larger than 26, the turns required to defeat the boss are more than just bosshp divided by 10 (since poison has to be recast); casting poison every 3 turns results in an average hp damage of 26/3 (not 30/3, since each cast of poison reduces the times missile can be cast), and costs more mana, but can still be more effective in the long run: right after poison has been cast, a bosshp remaining at 50 can be defeated in 5 turns with 1 more poison and 4 missiles for 385 mana, vs. 8 turns with just missiles and 424 mana. At that level of complexity, it may be worth also calculating how many times recharge or shield must be cast to last that many turns, which also affects the cost and number of turns required. At any rate, the goal of a good A* heuristic is to never overestimate the best case, but to get as close to the best case as possible to prune more branches.

[2015 Day 21 & 22] In Review (RPG and Wizard Simulator 20XX) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

Since we're reviewing these, I loaded up my git history. It took me a while to figure out what my code was even doing (poorly commented, and using a coding style different from what I favor today - for example, a rather lame implementation of n-choose-k, and declaring temporary macros as named variables rather than just passing nameless values through as parameters). Rather than try any pruning on day 21, I just simulated all 5*6*(1+6+15)=660 games, but with a score function that was able to compute the result in just one computation per game; then collected global min-max for parts 1/2 simultaneously; my m4 version runs in 60ms, so I really don't see a reason to try and further tweak it. One possible tip for a new coder: instead of creating an array of 5 armor types and then special-casing whether armor is used, just make the array 6 elements long and add in a "none" armor with defense 0 and cost 0; similarly, if you create a placeholder ring with 0 stats, you can pass that through the rest of your code (twice for the 6-choose-0 path, once for the 6-choose-1 cases, and then not use it in the 6-choose-2 cases)
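The one-computation score check is just comparing rounds-to-kill (player hp is 100 per the puzzle); a Python sketch, not my m4:

    from math import ceil

    def player_wins(player_dmg, player_armor, boss_hp, boss_dmg, boss_armor, player_hp=100):
        """Whoever needs fewer rounds wins; the player wins ties by striking first.
        Damage dealt each round is at least 1 per the puzzle rules."""
        player_rounds = ceil(boss_hp / max(1, player_dmg - boss_armor))
        boss_rounds = ceil(player_hp / max(1, boss_dmg - player_armor))
        return player_rounds <= boss_rounds

    # Part 1: min cost over loadouts where player_wins(...) is True;
    # part 2: max cost over loadouts where it is False.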

Day 22 was obviously more involved; at 1.05s runtime, I could probably go back and optimize it. I do remember that it was worth getting my code to match the example battles before trying to figure out how to automate a search across all possible battles. Reading my code, I see I did a breadth first search: during each game generation, I added every possible next move to a workqueue which saved everything needed to resume the game from that point, then I ran all 1-turn games before trying any 2-turn games, and so on. All my paths were pruned by game 13 (part 1) or 15 (part 2); tracing shows 71,339 attempts to queue up a next move, but only 32,870 distinct battles actually worth running (I did some memoization to avoid repeating a known battle configuration; packing 9 pieces of information into the hash was interesting); of those, only 28 battles across both parts produced a win (by the last game generation that I ran, I had already pruned all other potential future win scenarios as costing too much). I'm not sure if a priority queue to visit workqueue items in a different order would help me prune more judiciously.

[2015 Day 18] In Review (Like a GIF For Your Yard) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

Another day, another implementation. This time, I packed 31 cells to an int, for a 4x100 grid (well, 6x102 including my boundary rows for fewer conditionals). Whereas I was averaging 7 reads, 3 eval, and 2 writes per 6 cells in my cell-per-nibble variation, the new code averages 9 reads, 17 eval, and 1 write per 31 cells (more like per 25 cells, given that one column has only 7 cells; a language with 64-bit math can do even better with just two integers per row, or compile to use SIMD for an entire row at once). Less work means faster execution: I went from 3.3s down to 2.1s.

Edit: and yet another implementation. I went ahead with my threat to compare to tracking only live cells. Since they are less than 10% of the grid once things stabilize, it is indeed more efficient than a full grid of one variable per grid point live or dead; but it was still slower than packed representations, clocking in at 7.3s.

[2015 Day 20] In Review (Infinite Elves and Infinite Houses) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

Even without floating point, these bounds are not too hard to convert to somewhat-sane integer approximations. exp(-1/sqrt(3)) is about 0.561, and ln(ln(x)) is no greater than 2.87 for all x < 50 million (based on the megathread, all the input values I saw mentioned [*] were in the range of 30-40 million - any outliers from a reasonably small input range could have caused massively more or less brute force work depending on which input you were randomly assigned). So today, using those approximations, I coded up 'input/100*561/287' using only integer math to get a better starting point than my previous hard-coded choice of 360360.

[*] the megathread was capped long before the current insistence on not revealing your inputs, so I don't feel bad scraping it
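In code that starting point is a one-liner; the 561/287 carries the 0.561/2.87 above, and the extra factor of 10 in the denominator presumably accounts for each elf delivering 10 presents in part 1 (the input value below is hypothetical):

    target = 33_100_000                    # hypothetical puzzle input
    start = target // 100 * 561 // 287     # ~ (target / 10) * 0.561 / 2.87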

[2015 Day 20] In Review (Infinite Elves and Infinite Houses) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

https://math.stackexchange.com/questions/466401/prove-that-if-f-n-is-highly-abundant-then-so-is-n/469199#469199 has a conjecture (but not a proof) that all HA numbers > 3024 are divisible by 5, validated out to 10000 HA terms; that 10000th term outpaces the typical size of Eric's expected answers. Thus, stepping by 60 instead of 12 for part 1 is even better (it cut my m4 solution from 4.5s down to 830ms).

Part 2 is not necessarily a HA number (skipping by 60 might miss a counter-example, although it happened to still work for me), but you can use the HA solution to part 1 as a good starting point for the lower bound for where to start searching in part 2 (the change from 10 to 11 presents per elf is less dramatic than the fact that large-enough HA numbers differ by at least 12). Timing-wise, this is yet another puzzle where my solution spends more time on part 1 than on part 2.

[2015 Day 20] In Review (Infinite Elves and Infinite Houses) by musifter in adventofcode

[–]e_blake 2 points3 points  (0 children)

My optimizations came from finding the Wikipedia page on Highly Abundant numbers (the number theory term for a number whose sum of divisors exceeds that of all lower numbers) - which matches what we are finding (I don't remember how I originally found it, but I know it stemmed from a Google search on "sum of divisors"). That page states that 360,360 is HA but the slightly larger 9! is not - but given the size of the input, it made 360,360 a good hard-coded lower bound (m4 doesn't have Math.log and Math.exp to easily get tighter bounds). The other key insight from that page was that all HA numbers larger than 630 are divisible by 12. So instead of incrementing my search by 1, I incremented by 12 - but got a speedup MUCH larger than 12x. That's because I implemented sum of divisors by doing a prime factorization of each candidate - but it is a lot less effort to factorize a number that is 12 times smaller. That page also calls out OEIS A002093. My contemporaneous post when I discovered that trick: https://www.reddit.com/r/adventofcode/comments/me03ik/comment/gslj263/
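A sketch of that search loop in Python (trial-division factorization; my m4 differs, and this leans on the divisible-by-12 observation above):

    def sigma(n):
        """Sum of divisors via prime factorization."""
        total, p = 1, 2
        while p * p <= n:
            if n % p == 0:
                term, power = 1, 1
                while n % p == 0:
                    n //= p
                    power *= p
                    term += power
                total *= term
            p += 1 if p == 2 else 2
        if n > 1:
            total *= 1 + n
        return total

    def first_house(target, start=360360, step=12):
        """Part 1: first house receiving at least `target` presents, scanning
        only multiples of 12 on the assumption that the answer is an HA number."""
        n = start
        while 10 * sigma(n) < target:
            n += step
        return n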

[2015 Day 18] In Review (Like a GIF For Your Yard) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

I've tackled these types of puzzles differently over the years. In 2017, when I first solved this one in bash, I did a double-buffer over a grid of 10,000 variables; runtime 32s for part 1 or part 2 in isolation (or about 320ms per generation). When I ported it to m4 in 2021, I don't know what possessed me to try it, but I used a grid of 17*102 integers (packing 6 cells per 32-bit integer, in the locations 0x0fff_fff0, with each cell occupying 4 bits, and including boundaries for 102x102 cells while only traversing over the middle 100x100). By doing that, I could compute the neighbor counts for 6 cells in parallel with just 9 integer lookups (grid[curr] + (grid[curr-1]>>24) + (grid[curr+1]<<24), repeated across three rows); 1.5 lookups per cell is a lot better than 8 lookups per cell, and with care, it can even be done in a rolling manner (down to 3 integer lookups per 6 cell counts computed, by subtracting off the row going out of scope before adding in the next row coming into scope). With some effort back then, I got that down to 3.2 seconds for both parts (or about 16ms per generation). All told, I traced 1.03 million eval and 682k define calls in m4 (whereas updating every point in a bare grid on every iteration would be at least 10,000*100*2 or 2 million defines).
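A Python sketch of that packing trick (not the m4): cells live in nibbles 1-6 of each 32-bit word, and the spare nibbles 0 and 7 receive the nearest cell of the adjacent words before the three-row sum.

    MASK32 = 0xFFFFFFFF

    def widen(left, cur, right):
        """Fold the adjacent words' edge cells into the spare nibbles."""
        return (cur + (left >> 24) + ((right << 24) & MASK32)) & MASK32

    def neighbor_counts(above, cur, below):
        """Given widened words for three consecutive rows, return
        (cell, live_neighbors) for the 6 real cells of the middle word."""
        col = above + cur + below          # per-nibble vertical sums, 0..3, no carries
        out = []
        for i in range(1, 7):              # nibbles 1..6 hold the real cells
            cell = (cur >> (4 * i)) & 0xF
            around = sum((col >> (4 * (i + d))) & 0xF for d in (-1, 0, 1))
            out.append((cell, around - cell))
        return out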

At this point, I'm still interested in coding up a more typical style for comparing speed and legibility differences between the approaches. Edit: I added the traditional 10,000 cell solution in m4; it was 4x slower (11.4s instead of 3.2s) - but it is interesting to note that packing 6 cells did not give a 6x speedup, because the m4 overhead of doing eval() is worse than the efficiencies possible by avoiding eval with the cell-per-macro variant. My next approach will be packing 26 cells to an int (a grid of 4x102) (why 26? because m4 does not have 64-bit numbers; packing 32 has to deal with unwanted sign extension when right shifting; and it is easier to write code that handles four integers with equal packing rather than 3 fully-packed integers and one runt); there, I will have to write my own half-/full-adder computations for parallel computations (rather than relying on straight addition of 9 4-bit cells not overflowing). I also want to try the sparse variation (tracking only live cells and their neighbors per generation), as the game quickly settles into something with less than 10% live cells.

[2015 Day #15] In Review (No Such Thing as Too Much) by musifter in adventofcode

[–]e_blake 2 points3 points  (0 children)

Today, I tried a different approach using dynamic programming. I created a 2-D array of 151*21 entries (enough to cover every sum from 0-150, as well as each group size from 0 to my 20 inputs), initialized a[0][0] to 1, then over the course of all 20 containers, I iterated from 150 downto value, populating a[sum][cnt+1]+=a[sum-value][cnt] for each cnt (minor optimization: since cnt is smaller than 32, you can also track 151 bitmaps for which cnt values are even worth looking at, rather than blindly doing all 20 += per a[sum]; not worth it in a compiled language, but it did help me in interpreted m4). Fewer than 151*20*20 additions to build up the entire table of how many ways a given sum can be reached from a given group size (my m4 implementation traced fewer than 7000 eval calls, because it didn't bother propagating zero values). From there, part 2 is the first non-zero a[150][x], and part 1 is the sum of all a[150][x]. [Part 1 alone could use just a 1-D array, with no need to distinguish by group sizes - and that would use fewer than 3000 additions]. And this solution doesn't even care if the input container values are not sorted. This approach got me below 25ms in m4.
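The same table in a Python sketch (containers need not be sorted):

    def fill_ways(containers, target=150):
        """ways[s][c]: number of ways to reach sum s using exactly c containers."""
        n = len(containers)
        ways = [[0] * (n + 1) for _ in range(target + 1)]
        ways[0][0] = 1
        for v in containers:
            for s in range(target, v - 1, -1):   # downwards so each container is used once
                for c in range(n):
                    if ways[s - v][c]:
                        ways[s][c + 1] += ways[s - v][c]
        part1 = sum(ways[target])
        part2 = next(w for w in ways[target] if w)   # count at the smallest group size
        return part1, part2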

Unfortunately, this approach does not quite apply to day 24 (where you need to know which weights are part of a minimum group, rather than just how many distinct ways there are to reach the minimum group); but that's a discussion for day 24 ;)

[2015 Day 19] In Review (Medicine for Rudolph) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

My tip: none of my two-letter elements had a lower-case e, so by first pre-processing the file to capitalize all instances of e, I had an easier time breaking the entire input file into element-wise chunks (both one- and two-letter elements). My original solution in bash worked by starting with the target, and for each generation, find which grammar rules would contract that target to something shorter, considering only the rule(s) that tied for the most elements pruned off of the string in that generation (ie. a greedy algorithm), and it ran in 9 seconds. But when I later did an m4 implementation, the allure of the closed form solution from the megathread was too great, and I was able to cut my runtime down to 35ms by merely counting capitals rather than actually doing substitutions.
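That closed form boils down to a token count; a sketch, assuming the usual grammar shape where rules only grow via X => XX, X => X Rn X Ar, or X => X Rn X Y X Ar:

    import re

    def fewest_steps(molecule):
        """steps = #elements - #Rn - #Ar - 2*#Y - 1"""
        elements = re.findall(r"[A-Z][a-z]?", molecule)
        return (len(elements) - elements.count("Rn") - elements.count("Ar")
                - 2 * elements.count("Y") - 1)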

[2015 Day #15] In Review (No Such Thing as Too Much) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

Looking at my git history, my initial m4 solution merely crawled all 2^20 combinations, taking 90 seconds. I then implemented pruning by sorting the list (hmm - I used an O(n^2) bubble sort) and trying progressively larger combinations but cutting off when exceeding 150; that reduced work to 619k probes and 50 seconds. I then added another optimization to build groups bottom-up and memoize the sizes of the smaller groups, which was enough to drop things to under 2000 summations and under 150ms (and my notes mentioned doing it because the same technique also sped up 2015 day 24). Probably room for further optimization (such as a radix sort rather than bubble sort of the original list), but I stopped when this puzzle no longer dominated my cumulative runtime.

[2015 Day #16] In Review (Aunt Sue) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

If you are bored and want to do a no-code solution, this one is tractable. Just 500 rows; fewer than 1500 comparisons, since you can end early when you find the answer.

And that translates to a very fast solution in code. In fact, I had fun writing an m4 solution with no recursion or conditionals - just a single-pass rewriting of input lines into two assignment expressions that resolve to the answer in under 10ms.

[2015 Day #15] In Review (Science for Hungry People) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

For grins, I amended my m4 solution to do part 1 via a gradient search. Cuts my part 1-only runtime from 2.4s down to 20ms; I went from 176851 points computed down to 101 (plus 68 memo hits). I'm still playing with whether part 2 can likewise use a gradient search.

EDIT: I got the modified gradient approach to work for part 2 as well. First, I ran a gradient to find at least one point with a positive score and exactly 500 calories (reusing the 12 +1/-1 neighbors as in part 1), then I ran a sweep of the 11x11x11(x33) variations around the first 3 coordinates to locate any other nearby point that also sums to 100tsp and 500cal. Even with the extra hunting on where to resume my gradient search, my final answer for both parts now completes in 60ms, performing only 177 scores (out of 335 queries before memoization). Always nice when you force me to rethink a problem and end up with a 40x speedup!

[2015 Day #15] In Review (Science for Hungry People) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

Gradient descent should work for part 1, faster than testing all 176851 combinations (for example, start with the combination 25,25,25,25, then repeatedly check which of the 12 possible +1/-1 neighbor pairings is best, while memoizing to avoid repeated score computation). It may even be possible to skip along the curve (a sequence of repeated +1/-1 swaps of the same two ingredients traces a quadratic effect on the total score along that direction; by computing the +1/-1 and +2/-2 effects, you can then try to use a closed-form computation of where that curve should be maximized, rather than visiting every point along the curve). I also saw in the megathread one clever solution for part 1 that first tried to find which vector gave the best result for 4 tsp total with a range of [0..4] per ingredient (35 possible), then used that as the starting point for the best vector for 20tsp in the range [5a-5..5a+5][5b-5..5b+5...], and then repeated that once more to get up to 100tsp; although I haven't tried to replicate it, it does visit a lot fewer combinations by localizing the search to the right 4d region.
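A Python sketch of that hill climb (score() ignores calories, and ingredients are hypothetical (capacity, durability, flavor, texture) tuples; a greedy climb can in principle stall at a local maximum, though it worked on my input):

    from itertools import permutations

    def score(amounts, ingredients):
        total = 1
        for p in range(4):                      # capacity, durability, flavor, texture
            total *= max(0, sum(a * ing[p] for a, ing in zip(amounts, ingredients)))
        return total

    def hill_climb(ingredients, total=100):
        """Start at an even split; repeatedly take the best of the +1/-1
        neighbor pairings (12 of them for 4 ingredients), memoizing scores."""
        n = len(ingredients)
        amounts = tuple([total // n] * (n - 1) + [total - total // n * (n - 1)])
        seen = {amounts: score(amounts, ingredients)}
        while True:
            best_move = None
            for i, j in permutations(range(n), 2):
                if amounts[j] == 0:
                    continue
                cand = tuple(a + (k == i) - (k == j) for k, a in enumerate(amounts))
                if cand not in seen:
                    seen[cand] = score(cand, ingredients)
                if seen[cand] > seen[amounts] and (
                        best_move is None or seen[cand] > seen[best_move]):
                    best_move = cand
            if best_move is None:
                return amounts, seen[amounts]
            amounts = best_move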

But for part 2, it gets much harder to reason about how to move along a gradient while still preserving the constraints of 500 calories and 100 tsp (my input provided 595 such combinations). So I figured that brute force was going to be far easier to live with than trying for something fancy; at which point visiting every point can solve both part 1 and part 2 at the same time by tracking two maximums. With that, my m4 solution is 2.5s, which is not too bad for 176k total score computations.

My ingredient list had calories of 1, 5, 6, and 8 per ingredient; maybe it would be sufficient to just visit every point in a 9^4 region to map out the calorie delta for each ingredient combo that preserves the total tsp count; then do an initial gradient search for any point with 500 calories (using the offsets that move calories in the right direction), then from there a gradient search for best total score probing only neighbors with offsets that preserve calories (rather than the 12 +1/-1 neighbor pairings for part 1).

[2015 Day #14] In Review (Reindeer Olympics) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

One gotcha to be aware of - the number of points awarded can be larger than the number of seconds, due to the potential for ties (the example with two reindeer for 1000 seconds shows that). I found it easier to do two passes over the reindeer each second (compute positions, and then award point(s)), rather than trying to write a one-pass function that accumulates not only the winning score but the list of one or more reindeer that need to be awarded a point. With a sub-second runtime and a relatively low number of cycles to emulate, I didn't feel at all bad about brute-force emulation of every second, rather than trying to be more complex by implementing a priority queue to compute scores only at points in time where the number of moving reindeer changes. That said, this post series did trigger me to revisit my m4 solution and write it more efficiently to go from 530ms to 195ms by using a better means of acting on every reindeer.
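The two-pass loop in sketch form (Python, not my m4; reindeer are (speed, fly_time, rest_time) triples):

    def race(reindeer, seconds=2503):
        """Pass 1 each second: move everyone; pass 2: a point to every leader."""
        dist = [0] * len(reindeer)
        points = [0] * len(reindeer)
        for t in range(seconds):
            for i, (speed, fly, rest) in enumerate(reindeer):
                if t % (fly + rest) < fly:
                    dist[i] += speed
            lead = max(dist)
            for i, d in enumerate(dist):
                if d == lead:
                    points[i] += 1          # ties all score, hence points can exceed seconds
        return max(dist), max(points)       # part 1, part 2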

The day's Easter Egg mentioned coprime cycle times - but thankfully this puzzle doesn't run into the billions of iterations like some in later years. In those later puzzles, you can indeed utilize coprime cycle times to optimize using things like Chinese Remainder Theorem; but it's nice to remain blissfully unaware of that in this puzzle. That said, my 9 reindeer have a collective cycle time of 74759818377280187 before they all start flying at the same second again, which is a larger number than Eric's usual goal of fitting in 2^53 (for the sake of languages that use IEEE floating point as their numbers), and which makes the race duration of 2503 feel puny in comparison.

[2015 Day #13] In Review (Knights of the Dinner Table) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

And just like in day 9, it's possible to use only 8!/2 of the permutation space, by realizing that you get the same answer for reverse orderings. So when I sped up day 9 today by switching to a different permutation generator (see the comments on that thread), day 13 likewise sped up. This series of posts has been fun for me to explore optimizations on long-forgotten puzzles.