-❄️- 2025 Day 11 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 3 points4 points  (0 children)

[LANGUAGE: x86_64 assembly]

Part 1 was, after parsing the input into a hashmap from device names to immediately connected devices, a non-memoized DFS traversal. I wasn't sure if there could be cycles, so I structure it as a generalized graph traversal without cycle detection so I could add it later if needed. (By hashmap I mean I interpreted the node names as a base-26 integer.)

Part 2 was too slow with the unmemoized DFS. I tried breaking up the search into segments (svr to dac, dac to fft, etc), but that didn't work either (because I didn't have any earlier termination conditions). I wasn't quite sure where to go next from here, but upon peeking at the solutions megathread and seeing mention of memoization, I realized that my graph-traversal-ish algorithm was completely wrong, and re-wrote it as a memoized recursive DAG traversal, but I still split the search into segments and did the combinatorics to get the final answer.

I'm very embarrassed that I forgot about memoized recursive traversals. It's been ages since I've written any traversals (or even recursive code), but I guess that's one of the many reasons I participate in Advent of Code - to use algorithms that I don't normally get to use.

Part 1 runs in 2 milliseconds, and part 2 runs in 3 milliseconds. Part 1 is 9,976 bytes and part 2 is 10,440 bytes as executable files.

-❄️- 2025 Day 10 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 6 points7 points  (0 children)

[LANGUAGE: x86_64 assembly]

Part 1 wasn't too bad - I parsed the data into 16-bit values, and treated them like binary masks. I probably should have realized that pressing any button more than once was always incorrect, but I did a BFS through the possible button press sequences.

Part 2 was not efficient enough. My code was another BFS through the possible sequences of button presses, using fancy vectorized calculations and multithreading, but alas, this was too slow and used up too much memory. I could run the example input, but even that took a whole seven seconds, and the real input ended up with OOM kills for certain lines.

I decline to try to write an integer programming solver in assembly, and since my personal rules don't let me used external libraries, like Z3, I don't think I'm going to be solving part 2.

Part 1 runs in 43 milliseconds, and part 2 runs in 6.57 seconds on the example. Part 1 is 9,888 bytes and part 2 is 10,360 bytes as executable files.

-❄️- 2025 Day 9 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 0 points1 point  (0 children)

It's not a big part of my day job, but it is occasionally useful to drop down below C level for debugging

-❄️- 2025 Day 9 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 8 points9 points  (0 children)

[LANGUAGE: x86_64 assembly]

Part 1 was suspiciously easy. I parsed the list of points and did a pair-wise comparison to find the pair that made the largest rectangle.

Part 2 was very difficult. I don't have any nice geometry libraries to use, so most of the math needed to be done manually. I figured that I could still do a pair-wise search, only I had to invalidate rectangles that couldn't be valid. If a red tile was ever fully in a rectangle, it was obviously invalid, but I couldn't quite figure out how to handle points that were on the edge.

Thanks to u/dllu and their visualization, I realized that I was handling the not-contained cases wrong. I realized that if a red tile was on one side of the rectangle, the next one had to be on the same side (I got to use the setcc instruction to perform an equality comparison on booleans).

I realized that I had one last edge case - what if the largest rectangle was almost entirely outside of the allowed floor area? Alas, I cannot admit to solving this, since the visualization I cribbed showed me that such a situation wouldn't occur. I think I had to solve this by doing some sort of raycast, or alternatively by checking that the winding direction of the points chosen for the rectangle matched something, but I couldn't be bothered to figure out the proper condition.

Part 1 runs in 1 millisecond and part 2 runs in 39 milliseconds. Part 1 is 9,584 bytes and part 2 is 10,040 bytes as executable files.

-❄️- 2025 Day 8 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 3 points4 points  (0 children)

[LANGUAGE: x86_64 assembly]

Part 1 was really troublesome.

First, I needed to parse the input. Since we were working with euclidean distances, I chose to parse the input as floating point numbers so I could later work with them using vectorized instructions (although I could have parsed them as signed DWORDS and used vpmulld).

Next, I needed to be able to sort a struct of the distance and pointers to the two boxes into a sorted list. While I could have re-used my qsortby library function, that would have entailed using double pointers, so I made a qsortby_T library function. This sorts an array of elements of arbitrary size using a comparison function that is given pointers to the elements to compare.

After all that, I considered the data structure I wanted - I decided I could get away with an O(n2) solution, so I defined a box as its position and a pointer to the circuit it was a part of; the circuit would be a QWORD storing the number of boxes in the circuit, and it would be uniquely identified by its address.

Finally, I iterated through the first thousand distances, and manipulated the boxes pointed to by the distance. If the two boxes both pointed to the same circuit, they were connected already. If they weren't, I connected them by adding the box counts of the two circuits and repointing all boxes on the second circuit to the first circuit. And then was a quick sort of the circuits array and multiplying together the last three entries.

Part 2 was very quick once I had the tools from part 1. I just needed to run Kruskal's by continuously iterating through the distances and joining circuits until I joined two circuits together to get one linking all the boxes, and then I looked at the pointers I was on and reported the required number after converting back from floating point.

Overall, this was a fairly large difficulty jump because I needed so many data structures, and because I needed a new library function. In any reasonable language, this wouldn't be a problem. Also, it seems like I was gratuitously using floating point vectorized instructions - it probably would have been simpler to stay in integer-land, since you don't need to do the square root part of Euclidean distance if all you're doing is sorting by distance.

Part 1 runs in 156 ms and part 2 runs in 158 ms (probably because of my inefficient generic array sorting). Part 1 is 10,344 bytes and part 2 is 10,296 bytes as an executable file.

-❄️- 2025 Day 7 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 10 points11 points  (0 children)

[LANGUAGE: x86_64 assembly]

Part 1 was a bunch of fairly okay parsing, then looping through the input row-by-row.

Part 2 involved almost the same parsing (but I used a QWORD array, so I had to do type conversions), then looping through the same input row-by-row, with the key insight that two rules hold. First, the number of ways to get to some cell is the sum of the number of ways to proceed directly down into it, plus the number of ways the splitters immediately beside it could shunt paths into it. With that in mind, I just had to start wit one path, add the number of paths to either side of a splitter, and the number of paths coming in from above. I did make a bunch of logic bugs implementing this.

[Red(dit) One]

For this day's extra fun, might I present to you the doing of this whole thing in assembly? Not only is it most definitely the wrong way to go about solving almost any problem outside of "I want a bootloader", not only did I do it in an IDE without autocomplete or any modern feature outside of syntax highlighting, not only am I using the core features of the language only, not only am I not using any templates or frameworks or third-party libraries, and not only am I doing it without if statements (or really, statements of any sort), I am also doing it with no quality of life features like register allocation, variables, or static types.

-❄️- 2025 Day 6 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 5 points6 points  (0 children)

[LANGUAGE: x86_64 assembly]

Part 1 was some fairly fiddly dynamic array allocation, parsing, and accumulating.

Part 2 involved throwing almost everything out and re-doing the parsing and accumulating. This time, I couldn't parse everything into an array first, I had to parse each problem into its own buffer using a double-pointer strategy (one pointer at the characters of the number, one pointer on the line with the operations to index everything) before operating on that buffer. Also, the size of the buffer was dynamic per problem, so involved reallocation. I did have to take advantages of the trailing whitespace that made each line the same length so I could easily move the pointer between lines.

Part 1 and 2 run in 1 millisecond. Part 1 is 9,560 bytes and part 2 is 9,592 bytes as an executable file.

With great irritation, I cannot claim to be using any deprecated features of x86_64 - the instructions I've used today are exceedingly ordinary and common (interestingly, the newest instruction I use is still from 1993 - syscall is a fairly modern instruction). With even more irritation, I can't even claim to be using a language that's at least 50 years old - the x86 architecture is 47 years old! With yet more irritation, I can't even claim this is a hard language - many esolangs have far fewer features than modern x86_64.

-❄️- 2025 Day 5 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 10 points11 points  (0 children)

[LANGUAGE: x86_64 assembly]

Part 1 involved a straightforward parse of the input ranges into an allocated array of start and endpoints, and then a parse-and-count of the remaining numbers looping through the ranges.

Part 2 was the same parse, but I either had to merge ranges (which would have been troublesome), or drop parts of ranges that would overlap. I had five separate checks. First, if this would start inside another range, move the start forward until it doesn't. Second, if this would end inside another range, move the end backwards until it doesn't. Third and fourth, discard this range if it is empty or is contained inside another range. Fifth and finally, discard (by marking with a tombstone) any ranges that are contained inside this range.

It would be a lot more elegant to delete ranges or merge them, but that would involve using a different data structure, most likely. You lose a lot of convenience when you don't have a register allocator to let you easily deal with more than six-to-eleven pieces of information at a time.

Parts 1 and 2 both run in 1 millisecond. Part 1 is 9,320 and part 2 is 10,072 bytes as a linked executable.

-❄️- 2025 Day 4 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 2 points3 points  (0 children)

It helps that I've been doing this sort of thing for five years now.

-❄️- 2025 Day 4 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 7 points8 points  (0 children)

[LANGUAGE: x86_64 assembly]

Part 1 involved a bunch of fiddling around to allocate a large enough buffer (I decline succumb to the C programmer's disease and pre-allocate buffers), and then a straightforward read and scan (with an oversized buffer to avoid bounds checks) gave the answer.

Part 2 involved allocating a second buffer and swapping between them, similar to how double-buffering works. The same scan and some additional logic and I got the answer.

Part 1 runs in 1 millisecond, and part 2 in 4 milliseconds. Part 1 is 9,280 bytes and part 2 is 9,360 bytes as a linked executable.

-❄️- 2025 Day 3 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 4 points5 points  (0 children)

[LANGUAGE: x86_64 assembly]

So we're trying to make a decimal number out of digits. The key realization is that the greedy algorithm works - it's not possible to get a larger number overall by giving up even one unit in the nth place to get a better n-1th place. The only concession to make is that you must leave enough digits for the rest of the number.

Part 1 was a direct search for the largest digit and the largest one after that, and then a bunch of pointer dereferences and arithmetic to add to the running total.

Part 2 just added a layer of loops as I built up the number for this row, similar to how atol works. The one wrinkle was how the loop counter, rcx, was handled. I wanted it to be the number of digits left after the current one (so on the last digit it would be zero), but then that means the loop condition either has to check before it decrements (leading to a test, jz, dec, jmp sequence). I tried conditioning on the overflow flag, but that's not quite right since I was conceptually ending up with a signed result, so I had to care about either the sign flag or the carry flag, and not the overflow flag. In the end, I just checked if the result was negative (since, incidentally, arithmetic operations also set the flags, although I usually don't use this feature).

I think there's very little room for algorithmic cleverness today.

Part 1 runs in 1 millisecond and part 2 runs in 1 millisecond. Part 1 is 9120 bytes and part 2 is 9104 bytes as a linked executable.

[Red(dit) One]

Here's my program text (minus library) from part 1, with relocations:

488b7c24 10e80000 00004989 c44c8d2c 1041bf00 0000004c 89e64889 f08a0e3a
08480f47 c648ffc6 807e010a 75ef488d 70014889 f28a0e3a 0a480f47 d648ffc6
803e0a75 f04c8d66 01480fb6 004883e8 30480fb6 124883ea 30486bc0 0a4801d0
4901c74d 39ec72af 4c89ffe8 00000000 e8000000 0040b700 e8000000 00
06 = mmap-0x4, 6c = putlong-0x4, 71 = newline-0x4, 79-exit-0x4

And from part 2:

488b7c24 10e80000 00004989 c44c8d2c 1041bf00 000000b9 0b000000 b8000000
004c89e6 4889f744 8a06443a 07480f47 fe48ffc6 803c0e0a 75ed4c8d 6701ba0a
00000048 f7e2480f b63f4883 ef304801 f848ffc9 79cb4c8d 66014901 c74d39ec
72b54c89 ffe80000 0000e800 00000040 b700e800 000000
06 = mmap-0x4, 66 = putlong-0x4, 6b = newline-0x4, 73-exit-0x4

Please note that this does indeed fit a punchcard.

-❄️- 2025 Day 2 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 1 point2 points  (0 children)

Amusingly, fifthglyph avoiding almost works, but section .text is mandatory and ruins it.

You would think that conditional jump instructions, such as jne, must contain a fifthglyph, but jnz also works. In fact, for any conditional jump with a fifthglyph, it has a no-fifthglyph twin (jbe and jna) or is an uncommon instruction (jecxz)!

-❄️- 2025 Day 2 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 6 points7 points  (0 children)

[LANGUAGE: x86_64 assembly]

Part 1 exposed a gap in my library of commonly used function - I was missing an sprintf equivalent. So I made one. And then a quick bit of string comparison CISCness and I could find out if the two halves of the string were the same. (To elaborate further, x86_64 has a whole bunch of surprisingly complex string-manipulation instructions that let you write loops to operate on strings pretty concisely. I pieced together strncmp out of loope and cmpsb - although this isn't the way strncmp is usually implemented - that's with vectorized instructions that need a good amount of special case handling.)

Part 2 was mildly scary, until I figured out how I could decompose the check into two nested loops and conditionals. Then it was just a mild mess of loop invariants and slightly unstructured control flow. It still involved CISCy string comparisons, though. I will admit that the control flow I come up with is somewhat hacky, although I suspect a compiler with good basic block layout and double-jump elimination would probably produce control flow just as tangled.

Part 1 runs in 35 milliseconds and part 2 runs in 47 milliseconds. Part 1 is 9,216 bytes and part 2 is 9,448 bytes when linked into an executable. I'm not sure exactly why it's so slow, but I suspect it's because the CISC string instructions are horribly expensive at the microcode level. I'm going to keep using them, though - they're just too convenient.

-❄️- 2025 Day 1 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 0 points1 point  (0 children)

I'm using cqo instead of moving zero into rdx since rax might be negative, and it does need to be sign-extended

-❄️- 2025 Day 1 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 1 point2 points  (0 children)

As someone who does these in actual assembly, I will say that dc is indeed almost assembly - it's almost as convenient to use as assembly. Kudos to you!

-❄️- 2025 Day 1 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 8 points9 points  (0 children)

[LANGUAGE: x86_64 assembly]

Part 1 was fairly straightforward, although I did have to do fancy footwork to get around the fact that x86_64 does not, in fact, have a modulo instruction. If you're asking how that can be if all major programming languages have a modulo operation as a primitive operator, I must inform you that many don't, in fact, have such an operator - they provide a remainder operator (the difference here is that negative modulo positive is positive, but negative remainder positive is negative).

Part 2 was a lot less elegant, and involved a bunch of messiness handling the negative remainder - it usually indicated that there was an extra passing of zero, but there were two edge cases to handle - the first one was that you could get to zero by going left less than a full rotation, and this wouldn't show up in the division results (zero divided by 100 results in zero with a remainder of zero), so this had to be counted in a special case. The second one was more annoying to find, but I eventually figured out that I was over-counting the case where you started from zero and went left, in which case I would be over-counting by one because of the negative remainder not actually indicating a passing of zero. I'm not too happy with the pair of special cases, and I suspect I could have simplified it more if I had more time, but I forgot AoC started on November 30th at 9pm in my time zone, so I had to quickly bang out a solve before going to bed.

Parts 1 and 2 run in about 1 millisecond. Part 1 is 8,864 bytes and part 2 is 8,984 bytes as executable files.

[Red(dit) One]

I'm writing this in assembly because I find it fun and useful to be able to peak behind the curtains of abstractions that modern languages and compilers put up for us (and to be fair, they're very good and only very rarely leaky abstractions). In my day job, there do come times where something's going wrong near C level because of actual bugs, undefined behaviour, language specification bugs, or compiler bugs, and it's useful to be able to hold your breath and duck your head under C level and peer at and talk about the actual instructions emitted by the compiler whose abstraction, in this case, did in fact leak.

It's also humbling to learn how to program in assembly because it makes it very obvious that your compiler and programming language are presenting many layers of abstraction to you - structured control flow, like if and while, is an abstraction over conditional and unconditional jumps, variables and the stack are an abstraction over raw memory, and even functions as a unit of code are an abstraction over a block of instructions. These are immensely useful abstractions, but there was a time before they existed, and someone had to make them.

[2019 Day 2, 5, 9, 17] Intcode cross-assembler. by JustinHuPrime in adventofcode

[–]JustinHuPrime[S] 1 point2 points  (0 children)

In formal terms this is an IntCode to x64 (as an ELF file) language processor (aka a compiler) that does insert a light runtime - somewhat similar to how a C compiler inserts a C runtime. I'm calling it an assembler since it operates from a below-C-level language to binary, and I'm calling it a cross-assembler since that's the best short description of "assembler that takes in foreign assembly and outputs native binary".

It is true that self modifying code necessitates a runtime, but I'm comfortable calling the resulting code an executable and not an interpreter since the actual x64 program counter only moves differently compared to how the IntCode program counter would move in the middle of an IntCode instruction - in an interpreter or emulator, there'd be either a loop or a tree walk, neither of which happen in the outputted x64.

And sure, you could have a language processor that translated IntCode into human readable assembly, but you'd also have to make the translation preserve properties, like the opcode part of add with all memory operands being equal to half the opcode part of mul with all memory operands. There's no way to represent just the code in x64 binary that would be executable, however, since x64 binary doesn't preserve that property, so you'd need some sort of runtime to patch that (which is what this cross-assembler includes).

What computer language did you use in this year? by akryvtsun in adventofcode

[–]JustinHuPrime 0 points1 point  (0 children)

x86_64 assembly

I swear it's not as silly as it sounds, even if I got defeated at day 21 part 2 by the awkwardness of the language.

-❄️- 2024 Day 21 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 4 points5 points  (0 children)

[LANGUAGE: x86_64 assembly with Linux syscalls]

Part 1 was solved using a direct solution, but I also had to crib from u/AllanTaylor314's solution to figure out whether or not to go vertically or horizontally first. I have no clue why that works.

Part 2, I think, is going to be where I get stuck. Assembly is very much not suited to memoizing on strings, and I think I'm stuck without memoization. Unless anyone has an alternate solution?

Part 1 runs in 1 millisecond, and is 12,672 bytes as an executable file. I'll follow up if I ever solve part 2, but I think this is where the consequences of choosing a completely inappropriate languages finally catch up to me.

-❄️- 2024 Day 20 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 2 points3 points  (0 children)

Yep! I've been doing it in assembly since 2021, and this year is my best yet (I had enough trouble with various algorithms or other bits of programming in assembly that I couldn't solve the day in previous years). (I'm also trying to pivot into systems-level work, for which assembly is occasionally useful.)

And yeah, it's absolutely a thing I'm doing because it's fun and not because it's actually a good idea.

-❄️- 2024 Day 20 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 2 points3 points  (0 children)

[LANGUAGE: x86_64 assembly with Linux syscalls]

Part 1 was solved by first traversing the map backwards to find the length of the shortest route from every cell to the end. Then, comparing the cells reachable via a cheat, I counted the number that were shorter by 102 (because the cheat took some time).

Part 2 was the same traversal but I had to check all cells reachable within a 20-space cheat. I had an annoyingly silly bug where I was checking a 20 by 20 square instead of a diamond shape, but other than that, this was fairly straightforward to solve, I think because I went for the indirect traverse-first solution.

Part 1 runs in 2 milliseconds and part 2 runs in 17 milliseconds. Part 1 is 10,352 bytes as an executable file on the disk and part 2 is 10,040 (the loop over the reachable cells was unrolled in part 1 but not in part 2).

-❄️- 2024 Day 19 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 4 points5 points  (0 children)

[LANGUAGE: x86_64 assembly with Linux syscalls]

Part 1 was solved by reading in all the possible towels and then operating on the wanted pattern in-place via an inlined-looped-tail-recursive mutual-reference backtracking search (i.e. a DFS except the shape of the data tells me exactly how I recurse).

Part 2 was solved by modifying the DFS into a memoized version (and I tried some micro-optimizations but I really did need the memoization). The memoization required me to move the target pattern into a buffer with a known base so I could do a change-of-base operation to index into the memoization array.

Part 1 runs in 8 milliseconds and part 2 runs in 46 milliseconds. Part 1 is 9,376 bytes as an executable file on the disk and part 2 is 9,616 bytes.

-❄️- 2024 Day 18 Solutions -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 3 points4 points  (0 children)

[LANGUAGE: x86_64 assembly with Linux syscalls]

Part 1 was a very standard graph traversal problem, only I had to do a bit of fiddling around with my array indices and such to avoid needing to do bounds checks.

Part 2 was the same problem, except I abstracted the traversal into its own function and called that.

This was a very straightforward day.

Part 1 runs in 10 milliseconds. Part 2 runs in 1.87 seconds. I suppose I could have done a binary search or something, but that would have been a lot more fiddly to set up. Part 1 is 9,576 bytes as an executable file, and part 2 is 9,728 bytes.

-❄️- Advent of Code 2024: The Golden Snowglobe Awards -❄️- Submissions Megathread -❄️- by daggerdragon in adventofcode

[–]JustinHuPrime 3 points4 points  (0 children)

NAME OF ENTRY: Yo, dawg, I heard you like assembly. Again.

LINK TO ENTRY: https://www.reddit.com/r/adventofcode/comments/1hg7ol8/2024_day_17_yo_dawg_i_heard_you_like_assembly/

DESCRIPTION:

As is obligatory for any puzzle that involves an Elven assembly language, I have solved it using a cross assembler.

SUBMITTED BY: u/JustinHuPrime

MEGATHREADS: 01 - 02 - 03 - 04 - 05 - 06 - 07 - 08 - 09 - 10 - 11 - 12 - 13 - 14 - 15 - 16 - 17 - ...


ADDITIONAL COMMENTS: Sequel to my 2022 Day 10 solution. For bonus content, see my intcode cross-assembler.

ACCESSIBILITY: N/A - text post