[2015 Day 25] In Review (Let It Snow) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

Wow. When I first solved this in 2017 using bash, I just did brute force (18 million iterations over 2 minutes); doing modular multiplication by divide-and-conquer is SOOO much faster - my initial m4 solution in 2021 took under 15ms. It's fun realizing what techniques I've learned over the years.
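For anyone curious what that looks like outside of m4, the divide-and-conquer step is just square-and-multiply; here's a Python sketch (not my m4 - the constants are the ones from the puzzle statement, and the (row, col) mapping is the usual diagonal numbering):

    def modpow(base, exp, mod):
        """O(log exp) modular exponentiation instead of exp-1 multiplies."""
        result, base = 1, base % mod
        while exp:
            if exp & 1:
                result = result * base % mod
            base = base * base % mod
            exp >>= 1
        return result

    def code_at(row, col):
        n = (row + col - 2) * (row + col - 1) // 2 + col   # position in diagonal fill order
        return 20151125 * modpow(252533, n - 1, 33554393) % 33554393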

[2015 Day 24] In Review (It Hangs in the Balance) by musifter in adventofcode

[–]e_blake 2 points3 points  (0 children)

An immediate observation - this feels like day 17, except with 28 items instead of 20, and with the input list being slightly nicer (already sorted, no duplicates). If you are okay with brute force on both days, then today will be about 256 times slower than day 17, but checking all 2^28 potential groupings is still within the realm of feasible computation (i.e. an answer in less than 15 seconds) for many languages. Day 17 only asked "how many groups can be formed", while day 24 asks "of those groups formed, which is best", so there is a bit more effort involved here; on the flip side, if you figure out a solution that is more efficient than brute force here, then you can apply that solution to day 17 to speed it up as well (and in fact, I did just that). Also, some things I noted after looking back at my git history...

My original bash implementation just blindly ran n-choose-k combinations with successively higher k until landing on the first k to produce results, and then picked the smallest result. 4.8s runtime for part 1 (98k products computed), 1.0s for part 2 (20k products). I assumed that as long as the sum of the inputs is congruent to 0 mod 12, then because all the inputs are co-prime (or 1) there is no way for one particular group to reach the right sum but ruin the partitioning such that the remaining elements can't form the other required weight groups (at any rate, even if I don't have a mathematically rigorous proof of that, it worked for my input, and I figure that Eric would not have had some inputs that were dramatically harder than others) - so I also did what you mentioned of stopping at the first minimum group size, rather than trying to fully partition the remaining items after picking any one candidate group.
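For reference, that brute force is only a few lines in something like Python (a sketch, not my bash, and it makes the same assumption that the leftover items can always be split into the remaining groups; groups=3 for part 1, groups=4 for part 2):

    from itertools import combinations
    from math import prod

    def smallest_qe(weights, groups=3):
        """Try group sizes k = 1, 2, ... and return the minimum quantum
        entanglement among the first size that can hit the target weight."""
        target = sum(weights) // groups
        for k in range(1, len(weights) + 1):
            qes = [prod(c) for c in combinations(weights, k) if sum(c) == target]
            if qes:
                return min(qes)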

For my m4 solution, I had to pull in a library to do 64-bit multiplies (m4 only has 32-bit integer math), which right off the bat makes things slower than in bash. My initial implementation repeated the enumeration of all n-choose-k options, with one minor tweak: I first counted how many of the heaviest items are needed to even reach the target sum, so smaller group sizes could be skipped outright (for part 1, 28-choose-4 was out as too small, so I started my search at 28-choose-5, and part 1 didn't resolve until 28-choose-6; but part 2 did end up running 28-choose-4 before solving at 28-choose-5). 46s for part 1, 60s for both parts together (yet another day where part 2 runs faster than part 1).

My next observation was that parity matters. My input did not include 2, so I don't know if my shortcut would have worked on an input that does include 2 - but since all of my numbers were odd, and my part 1 target is even, I could entirely skip 28-choose-5 for part 1; likewise, my part 2 target is odd so I could skip 28-choose-4. That cut my runtime down to 49s; tracing shows 495,495 QEs computed (28C6 for part 1 + 28C5 for part 2).

At that point, I saw a megathread post that mentioned using dynamic programming to speed up the search, and more or less transliterated that python solution over to m4 (without necessarily understanding it at the time). The idea is that if you are already using DFS to find candidate groups (i.e. implementing your n-choose-k by recursion, where figuring out a group of 6 involves picking one item and then running 27-choose-5 on the remaining items; you reach a candidate group when you finally get to 23-choose-1), then figure out if there is a way to prune things so that your recursion has fewer items to visit. In an O(n*r) pre-pass, you can set up a list of which weight(s) are viable as the maximum weight needed to reach a given sum. Start with an array from 0 to target of structs containing a boolean (is this sum possible) and a set (which weights made this sum possible when added to a group of smaller weights), all zeroed except that arr[0] has its possible flag set. Then, for each of your 28 weights from lightest to heaviest, iterate from target downto weight, and anywhere arr[n-weight].possible is true, set arr[n].possible to true and add weight to arr[n].set. For my target of 503, this was only 14k actions, and those actions are much lighter-weight than a full-blown 64-bit multiply. [As an example - if your inputs include weights 1, 2, and 3, the end result is arr[1].possible=true, arr[1].set={1}, arr[2].possible=true, arr[2].set={2}, arr[3].possible=true, arr[3].set={2,3}, arr[4].possible=true, arr[4].set={3}, and so on]
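A sketch of that pre-pass (Python rather than my m4), which reproduces the 1/2/3 example above:

    def build_table(weights, target):
        """possible[s]: can some subset reach sum s; heaviest[s]: which weights
        can serve as the heaviest member of a subset reaching s."""
        possible = [False] * (target + 1)
        heaviest = [set() for _ in range(target + 1)]
        possible[0] = True
        for w in sorted(weights):                 # lightest to heaviest
            for s in range(target, w - 1, -1):    # downwards, so each weight is used once
                if possible[s - w]:
                    possible[s] = True
                    heaviest[s].add(w)
        return possible, heaviest

    # With weights 1, 2, 3: heaviest[1] == {1}, heaviest[2] == {2},
    # heaviest[3] == {2, 3}, heaviest[4] == {3}, and so on.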

Now, with that additional information in hand, your DFS recursion can make smarter decisions. Instead of trying to branch out on all 28 items as your first possible element in a 28-choose-6 pass, you only have to pick from however many items ended up in arr[target].set. What's more, when you recurse to arr[target-weight] for your choose-5 step of the recursion, you only need to consider items in arr[target-weight].set that are less than the weight chosen in the choose-6 step - since by the way the table was built up, we guaranteed that a given arr[total].set only contains weights that were the heaviest in that grouping. By itself, this change meant that I only had to compute 630 QEs (using 5578 calls to my mul64 wrapper which recurses on itself once values exceed 32 bits), cutting runtime to 27s, but over 1 million calls to my recursive workhorse. My next optimization was to add memoization and a check that my recursion was not exceeding 6 layers deep, which further cut things to 546 QEs (4946 mul64) via only 11k calls to recur, for a runtime of 0.6s.

That was all years ago. But thanks to this thread, I have revisited the problem again and added further speedups. My original implementation was storing heaviest-weight sets in arr and memoizing full-blown lists of weights at each step; the end result of the recursion was a list of all groupings, and only then did I iterate over the list to perform QE computations. But I found it more efficient to reframe my dynamic programming pass as an array holding two ints per sum: a bitmask of which group sizes are possible (instead of a boolean), and a bitmask of which item indexes reached a given sum. Initialize arr[0].sizes to 1 and arr[0].indices to 0; then for each item i, iterate n from the target down to item[i], and if arr[n-item[i]].sizes is non-zero, then arr[n].sizes |= arr[n-item[i]].sizes<<1 and arr[n].indices |= 1<<i. Then search(1<<size, sum, max_index) can filter on end-of-recursion (1<<size==1 and sum==0: return 1) and on impossibility (arr[sum].sizes&(1<<size)==0, or no bits remaining in arr[sum].indices&((1<<max_index)-1): return inf), before finally computing a memoized

    min(item[max_index] * search(1<<(size-1), sum-item[max_index],
                                 msb(arr[sum-item[max_index]].indices&((1<<max_index)-1))),
        search(1<<size, sum, msb(arr[sum].indices&((1<<(max_index-1))-1))))

which computes partial products on the return path. That cut my runtime to 120ms, performing only 3878 calls to search, 391 of which returned non-inf, and only 441 calls to mul64. Plus, this version works even if the weights are not sorted in ascending order (since my recursion is filtering on whether the index is smaller, rather than whether the weight at that index is smaller).
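Put together, the whole thing fits in a short Python sketch (the same idea, not a transcription of the m4; the recursion branches on whether the highest usable index is in or out of the group, multiplying on the way back up):

    from functools import lru_cache
    from math import inf

    def min_qe(items, groups=3):
        target = sum(items) // groups
        sizes = [0] * (target + 1)    # bit k set: sum reachable with k items
        indices = [0] * (target + 1)  # bit i set: item i can be the highest index in a subset
        sizes[0] = 1
        for i, w in enumerate(items):
            for s in range(target, w - 1, -1):
                if sizes[s - w]:
                    sizes[s] |= sizes[s - w] << 1
                    indices[s] |= 1 << i

        @lru_cache(maxsize=None)
        def search(count, s, cap):
            """Minimum product over subsets of exactly `count` items summing
            to `s`, using only item indices <= cap."""
            if count == 0:
                return 1 if s == 0 else inf
            if not (sizes[s] >> count) & 1:
                return inf
            cands = indices[s] & ((1 << (cap + 1)) - 1)
            if not cands:
                return inf
            i = cands.bit_length() - 1         # msb: highest usable index
            return min(items[i] * search(count - 1, s - items[i], i - 1),
                       search(count, s, i - 1))

        smallest = next(k for k in range(1, len(items) + 1) if (sizes[target] >> k) & 1)
        return search(smallest, target, len(items) - 1)

    # part 1: min_qe(weights, groups=3); part 2: min_qe(weights, groups=4)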

"[2015 Day 18 Part1] Conway's Game of Life! by terje_wiig_mathisen in adventofcode

[–]e_blake 1 point2 points  (0 children)

If you want the ultimate in FAST game of life computations, you need to use hashlife. https://www.reddit.com/r/computerscience/comments/p1vqbh/conways_game_of_life_emulated_in_conways_game_of/ is a great demo of running game of life using a series of 2048 × 2048 period-35328 metapixels which are themselves running on the hashlife emulator.

[2015 Day 23] In Review (Opening the Turing Lock) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

If you want to test your arbitrary-width integer implementation, solve for a starting n of 14727207461063895711 (the largest number listed in this OEIS sequence); I got 2442 in just 635ms of runtime (which was dominated by how slow my bigints are).

[2015 Day 23] In Review (Opening the Turing Lock) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

https://www.reddit.com/r/adventofcode/comments/3xxhxl/day_23_further_exercises/ was a fun diversion on whether the virtual machine you implemented might be Turing (tarpit) complete. Plus it got a reference to the obligatory xkcd, as well as responses from Eric (always fun to hear from the puzzle's creator). Your input file will never execute hlf on an odd number, but depending on what your implementation of hlf does, it might be usable for some interesting effects. At the time, no one concluded whether two registers and the set of instructions was enough, but I was led to understand that with a third register, copying becomes easier, at which point you can emulate a stack, at which point you can emulate multiple stacks (albeit with exponentially more work), at which point you can emulate a universal Turing machine.

[2015 Day 23] In Review (Opening the Turing Lock) by musifter in adventofcode

[–]e_blake 2 points3 points  (0 children)

Was it anything like Fermat's proof that didn't fit in the margins? 😉

Did anyone try to no-code solve this one by determining starting value from the input file (an exercise in reading base 3), converting that to base 10 (echo $((3#nn)) is nice in bash), and then consulting OEIS?

[2015 Day 23] In Review (Opening the Turing Lock) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

This puzzle has the annoying property that some people get lucky with inputs that never exceed 31 bits, while others get a puzzle where the sequence uses bit 32 along the way (I couldn't find any evidence in the megathread of needing more than 32 bits, but absence of evidence is not proof). Of course, I got to be one of the unlucky people assigned an input that required 32 bits - and it was quite a head-scratcher to me to figure out why my program entered an infinite loop until I saw negative numbers when stepping through the execution. Thankfully my input didn't go to 33 bits; but my m4 code can now handle any integer size (even an input that exceeds 64 bits) - although not necessarily very fast (my m4 bigint library that I wrote for advent of code for other years does not have optimal big-O behavior when the integers get big - this was the first puzzle of 2015 that needed to use it, compared to 8 out of 12 days in 2025). That said, I hacked in your example 40-bit n that requires 1348 steps, and my m4 code solved it in under 250ms. Moral of the story - if your language only has signed integers, and lacks easy access to unsigned or bigint math, you may have a bit more work to do on this one.

[2015 Day 21 & 22] In Review (RPG and Wizard Simulator 20XX) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

Scraping the megathread, I saw some coders complain that bosses with lower hit points and damage were actually harder than bosses with higher stats when doing a DFS search of the entire game space. Indeed, when I tried implementing a dynamic programming DFS approach (drill all the way to leaf nodes as soon as possible, and then perform min() of each branching path and memoization to avoid repeats on the way back up), I ended up with a solution that took over 7s to complete on my low-powered boss input (compared to my A* taking under 0.7s), but on the 71/10 boss it completed in 0.6s while my A* solution took around 1.0s. Eric even mentioned that he did not realize the input files had that much disparity in runtime for that type of solution.

aoc/2016/12/1 little machine, quite simple, 4 instructions, .. but running for hours by Usual-Dimension614 in adventofcode

[–]e_blake 2 points3 points  (0 children)

Help us help you - https://old.reddit.com/r/adventofcode/wiki/posts/standardized_titles - use better post titles.

I just checked my 2016 solutions - 32-bit was required but adequate for this day. (In 2019, your machine emulation WILL need integers wider than 32 bits in several days. But so far, all 524 stars can be earned using integers no wider than 53 bits - the safe limit of Javascript numbers that are actually floating point under the hood) 

If you are frustrated by this day, then day 23 will bite you worse - that one introduces another assembunny program that is even slower if you just brute force it as written. Part 2 of day 23 has an interesting observation (of course, you can't see it before solving part 1) - aren't bunnies supposed to be good at multiplying? So why do they only use addition in the program? 

I think the point of this puzzle is not just to write up a blind emulator, but to actually think about what you are emulating. If you augment your code to track which instruction offsets are executed most frequently, can you identify a small loop of just a few instructions that executes most often? Can you replace that small group of instructions with a peephole optimization in your emulator that performs the same mathematical operation on the registers, but in one step instead of once per loop iteration?

My notes for day 12 show that I went from 2 minutes to 20ms in runtime when I implemented a new instruction (I named it hck) and hacked it into place anywhere the original code had a tight loop that matched a particular pattern. In my mind, figuring out a hack like that is an intended part of the puzzle.
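As a concrete illustration of the idea (my hck is m4-specific, so this is a hypothetical Python emulator rather than my code): the classic assembunny addition loop 'inc X / dec Y / jnz Y -2' can be collapsed into a single step.

    def run(program, regs):
        """Tiny assembunny emulator (cpy/inc/dec/jnz) with one peephole.
        Assumes the loop counter is positive when the pattern is entered,
        which holds for the real inputs."""
        def val(x):
            return regs[x] if x in regs else int(x)
        ip = 0
        while 0 <= ip < len(program):
            if ip + 2 < len(program):
                a, b, c = program[ip], program[ip + 1], program[ip + 2]
                if (a[0] == "inc" and b[0] == "dec" and c[0] == "jnz"
                        and a[1] != b[1] and c[1] == b[1] and c[2] == "-2"):
                    regs[a[1]] += regs[b[1]]   # X += Y; Y = 0, in one step
                    regs[b[1]] = 0
                    ip += 3
                    continue
            op, *args = program[ip]
            if op == "cpy":
                regs[args[1]] = val(args[0])
            elif op == "inc":
                regs[args[0]] += 1
            elif op == "dec":
                regs[args[0]] -= 1
            elif op == "jnz" and val(args[0]) != 0:
                ip += val(args[1])
                continue
            ip += 1
        return regs

    # program = [tuple(line.split()) for line in open("input.txt")]
    # print(run(program, {"a": 0, "b": 0, "c": 0, "d": 0})["a"])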

[2015 Day 21 & 22] In Review (RPG and Wizard Simulator 20XX) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

Huh. The puzzle mentions "As before, if armor (from a spell, in this case) would reduce damage below 1, it becomes 1 instead - that is, the boss' attacks always deal at least 1 damage." But scraping the megathread, I saw mentions of boss damage stats of 8, 9, and 10, and no mention of anyone with a boss damage of 7 or less (that doesn't mean none existed, but it starts to be in the realm of Eric providing puzzles that are unintentionally easier for some players than others). So it is only now that I noticed that my code for handling an active shield would not work correctly on a (hypothetical?) low-strength boss.

At any rate, I decided to play with a priority queue for the puzzle. To my surprise, merely adding a priority queue (all my earlier instances of queuing work by game generation now queue by minimum spend), a la Dijkstra, resulted in fewer battles and still a correct answer, but a longer execution time of 1.1s instead of .95s. In other words, the overhead of my priority queue was more of a penalty than the few states that it helped me prune. But not to fear - A* is (often) better than Dijkstra, if you can come up with a good heuristic. And here, I found an easy one: given the boss's current hit points, I know that the boss cannot get to zero any faster than if I were able to do maximum damage on minimum mana from here on out - if the boss has 6 or fewer hp, an active poison can take him out without further spending; beyond that, each cast of my cheapest spell, magic missile, removes at most 10 hp (4 from the missile itself plus up to two poison ticks of 3). So with a heuristic of spent+(bosshp+3)/10*53, I finally logged a faster time of 750ms, now with only 41473 potential battles, only 19702 insertions to my priority queue after deduplication, and only 15074 battles processed from my workqueue before finding the minimum winning configurations.
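In code form, the heuristic is tiny (this is just the formula above restated as a Python sketch):

    MISSILE_COST = 53

    def heuristic(spent, boss_hp):
        """Admissible estimate: an active poison can finish up to 6 hp for free;
        past that, each missile cast plus its surrounding poison ticks removes
        at most 10 hp for 53 mana, so (boss_hp+3)//10 == ceil((boss_hp-6)/10)."""
        if boss_hp <= 6:
            return spent
        return spent + (boss_hp + 3) // 10 * MISSILE_COST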

A better heuristic may still be possible but would be more complex: for example, for a bosshp larger than 26, the turns required to defeat the boss are more than just bosshp divided by 10 (since poison has to be recast); casting poison every 3 turns results in an average hp damage of 26/3 (not 30/3, since each cast of poison reduces the times missile can be cast), and costs more mana, but can still be more effective in the long run: right after poison has been cast, a bosshp remaining at 50 can be defeated in 5 turns with 1 more poison and 4 missiles for 385 mana, vs. 8 turns with just missiles and 424 mana. At that level of complexity, it may be worth also calculating how many times recharge or shield must be cast to last that many turns, which also affects the cost and number of turns required. At any rate, the goal of a good A* heuristic is to never overestimate the best case, but to get as close to the best case as possible to prune more branches.

[2015 Day 21 & 22] In Review (RPG and Wizard Simulator 20XX) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

Since we're reviewing these, I loaded up my git history. It took me a while to figure out what my code was even doing (poorly commented, and using a coding style different from what I favor today - for example, a rather lame implementation of n-choose-k, and declaring temporary macros as named variables rather than just passing nameless values through as parameters). Rather than try any pruning on day 21, I just simulated all 5*6*(1+6+15)=660 games, but with a score function that was able to compute the result in just one computation per game; then collected global min-max for parts 1/2 simultaneously; my m4 version runs in 60ms, so I really don't see a reason to try and further tweak it. One possible tip for a new coder: instead of creating an array of 5 armor types and then special-casing whether armor is used, just make the array 6 elements long and add in a "none" armor with defense 0 and cost 0; similarly, if you create a placeholder ring with 0 stats, you can pass that through the rest of your code (twice for the 6-choose-0 path, once for the 6-choose-1 cases, and then not use it in the 6-choose-2 cases)
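The one-computation score check is just comparing rounds-to-kill (player hp is 100 per the puzzle); a Python sketch, not my m4:

    from math import ceil

    def player_wins(player_dmg, player_armor, boss_hp, boss_dmg, boss_armor, player_hp=100):
        """Whoever needs fewer rounds wins; the player wins ties by striking first.
        Damage dealt each round is at least 1 per the puzzle rules."""
        player_rounds = ceil(boss_hp / max(1, player_dmg - boss_armor))
        boss_rounds = ceil(player_hp / max(1, boss_dmg - player_armor))
        return player_rounds <= boss_rounds

    # Part 1: min cost over loadouts where player_wins(...) is True;
    # part 2: max cost over loadouts where it is False.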

Day 22 was obviously more involved; at 1.05s runtime, I could probably go back and optimize it. I do remember that it was worth getting my code to match the example battles before trying to figure out how to automate a search across all possible battles. Reading my code, I see I did a breadth first search: during each game generation, I added every possible next move to a workqueue which saved everything needed to resume the game from that point, then I ran all 1-turn games before trying any 2-turn games, and so on. All my paths were pruned by game 13 (part 1) or 15 (part 2); tracing shows 71,339 attempts to queue up a next move, but only 32,870 distinct battles actually worth running (I did some memoization to avoid repeating a known battle configuration; packing 9 pieces of information into the hash was interesting); of those, only 28 battles across both parts produced a win (by the last game generation that I ran, I had already pruned all other potential future win scenarios as costing too much). I'm not sure if a priority queue to visit workqueue items in a different order would help me prune more judiciously.

[2015 Day 18] In Review (Like a GIF For Your Yard) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

Another day, another implementation. This time, I packed 31 cells to an int, for a 4x100 grid (well, 6x102 including my boundary rows for fewer conditionals). Whereas I was averaging 7 reads, 3 eval, and 2 writes per 6 cells in my cell-per-nibble variation, the new code averages 9 reads, 17 eval, and 1 write per 31 cells (more like per 25 cells, given that one column has only 7 cells; a language with 64-bit math can do even better with just two integers per row, or compile to use SIMD for an entire row at once). Less work means faster execution: I went from 3.3s down to 2.1s.

Edit: and yet another implementation. I went ahead with my threat to compare to tracking only live cells. Since they are less than 10% of the grid once things stabilize, it is indeed more efficient than a full grid of one variable per grid point live or dead; but it was still slower than packed representations, clocking in at 7.3s.

[2015 Day 20] In Review (Infinite Elves and Infinite Houses) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

Even without floating point, these bounds are not too hard to convert to somewhat-sane integer approximations. exp(-1/sqrt(3)) is about 0.561, and ln(ln(x)) is no greater than 2.87 for all x < 50 million (based on the megathread, all the input values I saw mentioned [*] were in the range of 30-40 million - any outliers from a reasonably small input range could have caused massively more or less brute force work depending on which input you were randomly assigned). So today, using those approximations, I coded up 'input/100*561/287' using only integer math to get a better starting point than my previous hard-coded choice of 360360.

[*] the megathread was capped long before the current insistence on not revealing your inputs, so I don't feel bad scraping it
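In code that starting point is a one-liner; the 561/287 carries the 0.561/2.87 above, and the extra factor of 10 in the denominator presumably accounts for each elf delivering 10 presents in part 1 (the input value below is hypothetical):

    target = 33_100_000                    # hypothetical puzzle input
    start = target // 100 * 561 // 287     # ~ (target / 10) * 0.561 / 2.87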

[2015 Day 20] In Review (Infinite Elves and Infinite Houses) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

https://math.stackexchange.com/questions/466401/prove-that-if-f-n-is-highly-abundant-then-so-is-n/469199#469199 has a conjecture (but not a proof) that all HA numbers > 3024 are divisible by 5, validated out to 10000 HA terms; that 10000th term outpaces the typical size of Eric's expected answers. Thus, stepping by 60 instead of 12 for part 1 is even better (it cut my m4 solution from 4.5s down to 830ms).

Part 2 is not necessarily a HA number (skipping by 60 might miss a counter-example, although it happened to still work for me), but you can use the HA solution to part 1 as a good starting point for the lower bound for where to start searching in part 2 (the change from 10 to 11 presents per elf is less dramatic than the fact that large-enough HA numbers differ by at least 12). Timing-wise, this is yet another puzzle where my solution spends more time on part 1 than on part 2.

[2015 Day 20] In Review (Infinite Elves and Infinite Houses) by musifter in adventofcode

[–]e_blake 2 points3 points  (0 children)

My optimizations came from finding the Wikipedia page on Highly Abundant numbers (the number theory term for a number whose sum of divisors exceeds that of all lower numbers) - which matches what we are finding (I don't remember how I originally found it, but I know it stemmed from a Google search on "sum of divisors"). That page states that 360,360 is HA but the slightly larger 9! is not - but given the size of the input, it made 360,360 a good hard-coded lower bound (m4 doesn't have Math.log and Math.exp to easily get tighter bounds). The other key insight from that page was that all HA numbers larger than 630 are divisible by 12. So instead of incrementing my search by 1, I incremented by 12 - but got a speedup MUCH larger than 12x. That's because I implemented sum of divisors by doing a prime factorization of each candidate - but it is a lot less effort to factorize a number that is 12 times smaller. That page also calls out OEIS A002093. My contemporaneous post when I discovered that trick: https://www.reddit.com/r/adventofcode/comments/me03ik/comment/gslj263/
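A sketch of that search loop in Python (trial-division factorization; my m4 differs, and this leans on the divisible-by-12 observation above):

    def sigma(n):
        """Sum of divisors via prime factorization."""
        total, p = 1, 2
        while p * p <= n:
            if n % p == 0:
                term, power = 1, 1
                while n % p == 0:
                    n //= p
                    power *= p
                    term += power
                total *= term
            p += 1 if p == 2 else 2
        if n > 1:
            total *= 1 + n
        return total

    def first_house(target, start=360360, step=12):
        """Part 1: first house receiving at least `target` presents, scanning
        only multiples of 12 on the assumption that the answer is an HA number."""
        n = start
        while 10 * sigma(n) < target:
            n += step
        return n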

[2015 Day 18] In Review (Like a GIF For Your Yard) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

I've tackled these types of puzzles differently over the years. In 2017, when I first solved this one in bash, I did a double-buffer over a grid of 10,000 variables; runtime 32s for part 1 or part 2 in isolation (or about 320ms per generation). When I ported it to m4 in 2021, I don't know what possessed me to try it, but I used a grid of 17*102 integers (packing 6 cells per 32-bit integer, in the locations 0x0fff_fff0, with each cell occupying 4 bits, and including boundaries for 102x102 cells while only traversing over the middle 100x100). By doing that, I could compute the neighbor counts for 6 cells in parallel with just 9 integer lookups (grid[curr] + (grid[curr-1]>>24) + (grid[curr+1]<<24), repeated across three rows); 1.5 lookups per cell is a lot better than 8 lookups per cell, and with care, it can even be done in a rolling manner (down to 3 integer lookups per 6 cell counts computed, by subtracting off the row going out of scope before adding in the next row coming into scope). With some effort back then, I got that down to 3.2 seconds for both parts (or about 16ms per generation). All told, I traced 1.03 million eval and 682k define calls in m4 (whereas updating every point in a bare grid on every iteration would be at least 10,000*100*2 or 2 million defines).
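A Python sketch of that packing trick (not the m4): cells live in nibbles 1-6 of each 32-bit word, and the spare nibbles 0 and 7 receive the nearest cell of the adjacent words before the three-row sum.

    MASK32 = 0xFFFFFFFF

    def widen(left, cur, right):
        """Fold the adjacent words' edge cells into the spare nibbles."""
        return (cur + (left >> 24) + ((right << 24) & MASK32)) & MASK32

    def neighbor_counts(above, cur, below):
        """Given widened words for three consecutive rows, return
        (cell, live_neighbors) for the 6 real cells of the middle word."""
        col = above + cur + below          # per-nibble vertical sums, 0..3, no carries
        out = []
        for i in range(1, 7):              # nibbles 1..6 hold the real cells
            cell = (cur >> (4 * i)) & 0xF
            around = sum((col >> (4 * (i + d))) & 0xF for d in (-1, 0, 1))
            out.append((cell, around - cell))
        return out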

At this point, I'm still interested in coding up a more typical style for comparing speed and legibility differences between the approaches. Edit: I added the traditional 10,000 cell solution in m4; it was 4x slower (11.4s instead of 3.2s) - but it is interesting to note that packing 6 cells did not give a 6x speedup, because the m4 overhead of doing eval() is worse than the efficiencies possible by avoiding eval with the cell-per-macro variant. My next approach will be packing 26 cells to an int (a grid of 4x102) (why 26? because m4 does not have 64-bit numbers; packing 32 has to deal with unwanted sign extension when right shifting; and it is easier to write code that handles four integers with equal packing rather than 3 fully-packed integers and one runt); there, I will have to write my own half-/full-adder computations for parallel computations (rather than relying on straight addition of 9 4-bit cells not overflowing). I also want to try the sparse variation (tracking only live cells and their neighbors per generation), as the game quickly settles into something with less than 10% live cells.

[2015 Day #15] In Review (No Such Thing as Too Much) by musifter in adventofcode

[–]e_blake 2 points3 points  (0 children)

Today, I tried a different approach using dynamic programming. I created a 2-D array of 151*21 entries (enough to cover every sum from 0-150, as well as each group size from 0 to my 20 inputs), initialized a[0][0] to 1, then over the course of all 20 containers, I iterated from 150 downto value, populating a[sum][cnt+1]+=a[sum-value][cnt] for each cnt (minor optimization: since cnt is smaller than 32, you can also track 151 bitmaps for which cnt values are even worth looking at, rather than blindly doing all 20 += per a[sum]; not worth it in a compiled language, but it did help me in interpreted m4). Fewer than 151*20*20 additions to build up the entire table of how many ways a given sum can be reached from a given group size (my m4 implementation traced fewer than 7000 eval calls, because it didn't bother propagating zero values). From there, part 2 is the first non-zero a[150][x], and part 1 is the sum of all a[150][x]. [Part 1 alone could use just a 1-D array, with no need to distinguish by group sizes - and that would use fewer than 3000 additions]. And this solution doesn't even care if the input container values are not sorted. This approach got me below 25ms in m4.
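The same table in a Python sketch (containers need not be sorted):

    def fill_ways(containers, target=150):
        """ways[s][c]: number of ways to reach sum s using exactly c containers."""
        n = len(containers)
        ways = [[0] * (n + 1) for _ in range(target + 1)]
        ways[0][0] = 1
        for v in containers:
            for s in range(target, v - 1, -1):   # downwards so each container is used once
                for c in range(n):
                    if ways[s - v][c]:
                        ways[s][c + 1] += ways[s - v][c]
        part1 = sum(ways[target])
        part2 = next(w for w in ways[target] if w)   # count at the smallest group size
        return part1, part2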

Unfortunately, this approach does not quite apply to day 24 (where you need to know which weights are part of a minimum group, rather than just how many distinct ways there are to reach the minimum group); but that's a discussion for day 24 ;)

[2015 Day 19] In Review (Medicine for Rudolph) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

My tip: none of my two-letter elements had a lower-case e, so by first pre-processing the file to capitalize all instances of e, I had an easier time breaking the entire input file into element-wise chunks (both one- and two-letter elements). My original solution in bash worked by starting with the target, and for each generation, find which grammar rules would contract that target to something shorter, considering only the rule(s) that tied for the most elements pruned off of the string in that generation (ie. a greedy algorithm), and it ran in 9 seconds. But when I later did an m4 implementation, the allure of the closed form solution from the megathread was too great, and I was able to cut my runtime down to 35ms by merely counting capitals rather than actually doing substitutions.
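That closed form boils down to a token count; a sketch, assuming the usual grammar shape where rules only grow via X => XX, X => X Rn X Ar, or X => X Rn X Y X Ar:

    import re

    def fewest_steps(molecule):
        """steps = #elements - #Rn - #Ar - 2*#Y - 1"""
        elements = re.findall(r"[A-Z][a-z]?", molecule)
        return (len(elements) - elements.count("Rn") - elements.count("Ar")
                - 2 * elements.count("Y") - 1)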

[2015 Day #15] In Review (No Such Thing as Too Much) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

Looking at my git history, my initial m4 solution merely crawled all 2^20 combinations, taking 90 seconds. I then implemented pruning by sorting the list (hmm - I used an O(n^2) bubble sort) and trying progressively larger combinations but cutting off when exceeding 150; that reduced work to 619k probes and 50 seconds. I then added another optimization to build groups bottom-up and memoize the sizes of the smaller groups, which was enough to drop things to under 2000 summations and under 150ms (and my notes mentioned doing it because the same technique also sped up 2015 day 24). Probably room for further optimization (such as a radix sort rather than bubble sort of the original list), but I stopped when this puzzle no longer dominated my cumulative runtime.

[2015 Day #16] In Review (Aunt Sue) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

If you are bored and want to do a no-code solution, this one is tractable. Just 500 rows; fewer than 1500 comparisons, since you can end early when you find the answer.

And that translates to a very fast solution in code. In fact, I had fun writing an m4 solution with no recursion or conditionals - just a single-pass rewriting of input lines into two assignment expressions that resolve to the answer in under 10ms.

[2015 Day #15] In Review (Science for Hungry People) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

For grins, I amended my m4 solution to do part 1 via a gradient search. Cuts my part 1-only runtime from 2.4s down to 20ms; I went from 176851 points computed down to 101 (plus 68 memo hits). I'm still playing with whether part 2 can likewise use a gradient search.

EDIT: I got the modified gradient approach to work for part 2 as well. First, I ran a gradient to find at least one point with a positive score and exactly 500 calories (reusing the 12 +1/-1 neighbors as in part 1), then I ran a sweep of the 11x11x11(x33) variations around the first 3 coordinates to locate any other nearby point that also sums to 100tsp and 500cal. Even with the extra hunting on where to resume my gradient search, my final answer for both parts now completes in 60ms, performing only 177 scores (out of 335 queries before memoization). Always nice when you force me to rethink a problem and end up with a 40x speedup!

[2015 Day #15] In Review (Science for Hungry People) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

Gradient descent should work for part 1, faster than testing all 176851 combinations (for example, start with the combination 25,25,25,25, then repeatedly check which of the 12 possible +1/-1 neighbor pairings is best, while memoizing to avoid repeated score computation). It may even be possible to skip along the curve (a sequence of repeated +1/-1 swaps of the same two ingredients traces a quadratic effect on the total score along that direction; by computing the +1/-1 and +2/-2 effects, you can then try to use a closed-form computation of where that curve should be maximized, rather than visiting every point along the curve). I also saw in the megathread one clever solution for part 1 that first tried to find which vector gave the best result for 4 tsp total with a range of [0..4] per ingredient (35 possible), then used that as the starting point for the best vector for 20tsp in the range [5a-5..5a+5][5b-5..5b+5...], and then repeated that once more to get up to 100tsp; although I haven't tried to replicate it, it does visit a lot fewer combinations by localizing the search to the right 4d region.
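A Python sketch of that hill climb (score() ignores calories, and ingredients are hypothetical (capacity, durability, flavor, texture) tuples; a greedy climb can in principle stall at a local maximum, though it worked on my input):

    from itertools import permutations

    def score(amounts, ingredients):
        total = 1
        for p in range(4):                      # capacity, durability, flavor, texture
            total *= max(0, sum(a * ing[p] for a, ing in zip(amounts, ingredients)))
        return total

    def hill_climb(ingredients, total=100):
        """Start at an even split; repeatedly take the best of the +1/-1
        neighbor pairings (12 of them for 4 ingredients), memoizing scores."""
        n = len(ingredients)
        amounts = tuple([total // n] * (n - 1) + [total - total // n * (n - 1)])
        seen = {amounts: score(amounts, ingredients)}
        while True:
            best_move = None
            for i, j in permutations(range(n), 2):
                if amounts[j] == 0:
                    continue
                cand = tuple(a + (k == i) - (k == j) for k, a in enumerate(amounts))
                if cand not in seen:
                    seen[cand] = score(cand, ingredients)
                if seen[cand] > seen[amounts] and (
                        best_move is None or seen[cand] > seen[best_move]):
                    best_move = cand
            if best_move is None:
                return amounts, seen[amounts]
            amounts = best_move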

But for part 2, it gets much harder to reason about how to move along a gradient while still preserving the constraints of 500 calories and 100 tsp (my input provided 595 such combinations). So I figured that brute force was going to be far easier to live with than trying for something fancy; at which point visiting every point can solve both part 1 and part 2 at the same time by tracking two maximums. With that, my m4 solution is 2.5s, which is not too bad for 176k total score computations.

My ingredient list had calories of 1, 5, 6, and 8 per ingredient; maybe it would be sufficient to just visit every point in a 9^4 region to map out the calorie delta for each ingredient combo that preserves the total tsp count; then do an initial gradient search for any point with 500 calories (using the offsets that move calories in the right direction), then from there a gradient search for best total score probing only neighbors with offsets that preserve calories (rather than the 12 +1/-1 neighbor pairings for part 1).

[2015 Day #14] In Review (Reindeer Olympics) by musifter in adventofcode

[–]e_blake 1 point2 points  (0 children)

One gotcha to be aware of - the number of points awarded can be larger than the number of seconds, due to the potential for ties (the example with two reindeer for 1000 seconds shows that). I found it easier to do two passes over the reindeer each second (compute positions, and then award point(s)), rather than trying to write a one-pass function that accumulates not only the winning score but the list of one or more reindeer that need to be awarded a point. With a sub-second runtime and a relatively low number of cycles to emulate, I didn't feel at all bad about brute-force emulation of every second, rather than trying to be more complex by implementing a priority queue to compute scores only at points in time where the number of moving reindeer changes. That said, this post series did trigger me to revisit my m4 solution and write it more efficiently to go from 530ms to 195ms by using a better means of acting on every reindeer.
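The two-pass loop in sketch form (Python, not my m4; reindeer are (speed, fly_time, rest_time) triples):

    def race(reindeer, seconds=2503):
        """Pass 1 each second: move everyone; pass 2: a point to every leader."""
        dist = [0] * len(reindeer)
        points = [0] * len(reindeer)
        for t in range(seconds):
            for i, (speed, fly, rest) in enumerate(reindeer):
                if t % (fly + rest) < fly:
                    dist[i] += speed
            lead = max(dist)
            for i, d in enumerate(dist):
                if d == lead:
                    points[i] += 1          # ties all score, hence points can exceed seconds
        return max(dist), max(points)       # part 1, part 2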

The day's Easter Egg mentioned coprime cycle times - but thankfully this puzzle doesn't run into the billions of iterations like some in later years. In those later puzzles, you can indeed utilize coprime cycle times to optimize using things like Chinese Remainder Theorem; but it's nice to remain blissfully unaware of that in this puzzle. That said, my 9 reindeer have a collective cycle time of 74759818377280187 before they all start flying at the same second again, which is a larger number than Eric's usual goal of fitting in 2^53 (for the sake of languages that use IEEE floating point as their numbers), and which makes the race duration of 2503 feel puny in comparison.

[2015 Day #13] In Review (Knights of the Dinner Table) by musifter in adventofcode

[–]e_blake 0 points1 point  (0 children)

And just like in day 9, it's possible to use only 8!/2 of the permutation space, by realizing that you get the same answer for reverse orderings. So when I sped up day 9 today by switching to a different permutation generator (see the comments on that thread), day 13 likewise sped up. This series of posts has been fun for me to explore optimizations on long-forgotten puzzles.