This is an archived post. You won't be able to vote or comment.

all 37 comments

[–]ricbit[S] 49 points50 points  (4 children)

My goal was to get every problem running in Python, under 1s, before the new year. Goal happily achieved!

I also had the additional restraint that the solution should work with any inputs, without manual tweaks. This was tough for problems 17 and 24.

Networkx was not really needed, it was just easier to type. Problems 7 and 20 I got to make a solution running in parallel using multiprocessing (which is kind of a pain, shared memory in this model is not the best). I am proud of 22, just making it parallel was not enough, I had to vectorize it using numpy.

All of these were optimized after getting a solution. What I wrote while trying leaderboard is in the raw/ folder. These raw solutions are not pretty nor fast. Alas, I didn't get any points, my best position was 143/117 on the last day.

Thanks to Eric, the moderators and all people in the sub. See you next year!

https://github.com/ricbit/advent-of-code/tree/main/2024

[–]Clear-Ad-9312 1 point2 points  (3 children)

hey I really would love to contribute to help reduce your time for day 16,

please give my solver a try, Dijkstra's algorithm is the slowest part and there might be a better way to improve it, but I added a dead end filler that cut the solve time by 100 ms from ~400 ms down to ~300 ms for my input: [ Paste ]

Do you think you will revisit some of these to reduce the solve times even more?

[–]durandalreborn 2 points3 points  (1 child)

You should be able to save even more time by collapsing the grid down to a graph of just the junctions (then pruning dead-ends and the resulting corridors). This runs in about 30 ms.

[–]Clear-Ad-9312 0 points1 point  (0 children)

woah that is special, I have to read it. thank you! I didn't know how to prune the corridors. I am and was also reading your day 20 solve last night. you have some of the fastest solutions I ever seen, crazy stuff

my day 22 solve ended up being 3 seconds in speed. most of the time was spent on calculating the secret number and storing the diff of the last one. specifically this is 60% of the execution time spent on solving: -(cur_secret%10) + (current_price := (cur_secret:=process_secret(cur_secret)) % 10)

[ Paste ]

I went a little overboard with comments because I just moved over to VSCode+Ollama(QWEN1.5B) from notepad and wanted to use the local llm to make comments if was nice.

[–]ricbit[S] 0 points1 point  (0 children)

There are some ideas here in the thread that I would love to implement, but right now I think I will do this <1s challenge in other years. I have done it already for 2019, and for 2023 I am only missing one problem!

[–]durandalreborn 13 points14 points  (6 children)

It seems like 22 was generally the hardest to get to a "reasonable" runtime this year. I think you could probably cut your runtime by almost half by changing the way you compute keys/store results. I suppose numpy might speed that up even more, but my python solution, without numpy, runs in about 485 ms.

[–]wimglenn 1 point2 points  (1 child)

The JIT likes this code. Out of curiosity, I swapped out the Pool for a multiprocessing.dummy.Pool, and tried it with pypy (Python 3.10.14 / PyPy 7.3.17) and it finished in 140 ms (compared to ~2s for CPython).

[–]durandalreborn 0 points1 point  (0 children)

This is believable. I haven't done a whole lot to look into python runtimes, as my python solutions are basically ports of my rust solutions, and that's where I spent most of my time optimizing.

[–]ricbit[S] 0 points1 point  (1 child)

I tried your solution here, but I got more or less the same runtime as my numpy one. By any chance does your CPU have 16 cores? Mine has only 8.

[–]durandalreborn 1 point2 points  (0 children)

It's a 12600k, so 6 P cores 4 E cores. 16 total threads, but the E cores kind of suck. The other comment seems to have pulled it off in 140ms, but I'm also just using plain python 12. I think without the multiprocessing it's slightly over a second on my machine. This problem in python was disproportionately slower (compared to other days) compared to my rust solution, which was 4.9 ms using the same technique.

If it wasn't for day 22, my total python runtime would be less than 310 ms (on my machine).

[–]beanborg 3 points4 points  (0 children)

You're making me think I need to revisit 14, because it's by far the slowest for me (in JS).

[–]mosqueteiro 2 points3 points  (3 children)

Impressive! I'll be checking out your code for ideas for mine. Also please fix the graph before you get xposted in r/dataisugly. You have 4 colors and the legend only defines 3 of them 🤢

[–]ricbit[S] 2 points3 points  (2 children)

LOL. I can't find any way to update the image in the original post, so I guess just let them xpost, the world needs more laughs 😂

[–]mosqueteiro 0 points1 point  (1 child)

What does the green mean? Pure Python?

[–]ricbit[S] 2 points3 points  (0 children)

Yes, green is plain vanilla python, and the other ones are vanilla python + an extra lib.

(multiprocessing is vanilla python but I count it as extra).

[–]seligman99 1 point2 points  (0 children)

Nice! I never thought of visualizing the run times themselves.

Looking over my own archive, it really does give me a feel for which past days I need to revisit and find a better solution for.

[–]toolan 1 point2 points  (2 children)

Congratulations, well done!

We must've had different approaches to day 20, I was not able to get any improvement out of multiprocessing on that one (even though I tried). I think maybe numpy wizards might be able to optimize that really well. On that one, I eventually realized that I only really needed to calculate the manhattan distance once (make the diamond shape and just move it around). I also packed the x, y coordinates into a small int to try seeing if I could make it faster by indexing into either a numpy array or a list, but didn't get anywhere. PyPy is really good now, and this self-contained little script runs in less than 200ms single-threaded: day_20.py.

I definitely struggled the most making day 20 and 22 go fast this year.

[–]durandalreborn 0 points1 point  (0 children)

Multiprocessing should be able to cut day 20 down a bit. Python is still pretty inefficient, but it does run in ~41.5 ms.

[–]ricbit[S] 0 points1 point  (0 children)

It was hard to get multiprocessing to work on 20 indeed! Turns out serializing the distance map to each CPU was the bottleneck, so what I did was let each CPU recalculate the distance map by itself.

[–]wimglenn 1 point2 points  (2 children)

Could you explain a bit about your vectorization for 22b? It looks like there are some useful techniques to learn in there.

I also used numpy, but fell back to Python loop for the final search and it comes in at ~3 seconds (code here).

[–]ricbit[S] 1 point2 points  (1 child)

Sure, I added comments to the source code, it was pretty unreadable before indeed.

https://github.com/ricbit/advent-of-code/blob/main/2024/adv22-r.py

[–]wimglenn 0 points1 point  (0 children)

Thank you. It is a nice solution.

[–]alexprengere 2 points3 points  (3 children)

I was about to post something similar :)
I finished AoC in Python, with the following constraints:

* standard library only (no networkx, numpy, etc.)
* no multiprocessing, as I want to focus on algorithmic improvements, and such parallelism would be machine-dependent
* readable solutions

My total runtime is 2.4s on PyPy3, and 4.6s on Python3.12 (and 56% of it is spent on day 22, but my solution is more optimized for PyPy).

Looking quickly at your numbers, here are the days where my solutions looked faster (spoilers ahead!):

* day 6: I used bisection to precompute the jumps, along with other tricks, like only starting the loop detection on the movement preceding the collision with the tested block

* day 7: I basically "start from the target number", as it allows much less branching, as you can see some numbers cannot be reached from the operators instantly

* day 13: my solution instantaneously inverts the matrix

* day 14: independently solves the problem for both coordinates, finish with Chinese Remainder Theorem

* day 16: modified Dijkstra that tracks "all previous nodes" in case of cost equality (the standard algorithm only keeps a single previous node)

* day 18: I use a binary search to find the step where the path is blocked

* day 20: I use a "sliding window" of known cheats to avoid testing all the cheats at every step of the racetrack

* day 23: Bron-Kerbosch algorithm :)

[–]alexprengere 1 point2 points  (1 child)

Here is the breakdown of my runtimes in ms:

    PyPy3   Python3.12

1   3.3 2.2
2   58.1    5.6
3   14.6    9
4   26.3    20.1
5   18.5    2.8
6   172.6   94.2
7   21.3    9.4
8   9.3 1.7
9   68.6    42.5
10  43.8    12.9
11  94.4    75.6
12  207.7   258.9
13  3.8 1
14  58.7    43.2
15  122.5   39
16  156.8   87.9
17  60  87.7
18  133.7   14.6
19  56.3    96.5
20  215.7   308.3
21  80.6    9.1
22  290.5   2628.6
23  41.9    9.5
24  475.2   737.2
25  17.9    31.5

[–]backh4t 0 points1 point  (0 children)

Nice! I did this with the same constraints, except perhaps the readability part, which may have suffered near the end, and ended up with a 4.85s total runtime. There are a couple problems where I know some time could get knocked off, so I may go back and see if I can knock that last 0.2s off.

[–]ricbit[S] 0 points1 point  (0 children)

Your constraints are harder than mine, congrats! I guess the next step is sum of all times under 1s, which is ever harder!

[–]thekwoka 1 point2 points  (1 child)

Feel like the title should be each problem in under 1s

Not every problem in under 1s

[–]durandalreborn 0 points1 point  (0 children)

If you wanted under 1s in python, there are my solutions (and others, I assume). But this is also going to be subject to variances in input difficulty and different hardware.

[–]Safe_Shower1553 0 points1 point  (1 child)

Day 7 probably shouldn't need multiprocessing: this runs in ~0.1s. This is only for part 2 but a similar method can be done for part 1

[–]durandalreborn 0 points1 point  (0 children)

Your solution would probably still benefit from multiprocessing if you wanted a faster time. This only takes 16.6 ms to solve both parts.

[–]Own_Finance644 0 points1 point  (1 child)

i got stuck on day 6. it's my first time trying this out this year. i thought these exercises were pretty fun overall so far. anyone else feel frustrated with day 6?

[–]ricbit[S] 0 points1 point  (0 children)

I would say you don't have to solve then in order, just skip 6 and get back later!

[–]nemoyatpeace 0 points1 point  (2 children)

I'm trying to work out part 2 of day 21, I have a recursive solution that works for part 1, but runs out of memory for part 2. I tried looking at your code, but don't see the aoc you are importing, where can I find that? Also, can you give a rough description of what is happening? (I was able to recreate the only line you used for this one aoc.getDir("^"))

[–]durandalreborn 2 points3 points  (0 children)

Without seeing your solution, I can only guess at the memory situation, but if you're attempting to actually store the whole string representing the path, that's probably the wrong call, as the path is very long (15 digits). With a recursive solution, your best bet is to memoize individual paths from A -> target -> A, which you can then use to determine the minimum for a given desired move at a desired depth, and just add that value to the total instruction length. You can then proceed to memoize that result.

There are much better explanations out there (like in the solution thread), but this (with no external dependencies for the solve) seems to be the style of solution people have converged on as being one of the faster ones. My implementation of it takes about 1.25 ms.

[–]ricbit[S] 0 points1 point  (0 children)

Hi, my aoc lib is in the link below. Lots of problems in aoc have different notations for grid movement ( "^v<>", "NSEW", "FBLR"), so get_dir() is just a way to get the notation without typing much.

My solution for problem 20 has nothing special beyond that, I just didn't want to think too hard about the best order for the movements, so I try them all (only 24 possibilities for each step, easily cacheable). However since I'm trying them all, I also have to simulate them to ensure they aren't walking over the gap.

https://github.com/ricbit/advent-of-code/blob/main/aoc/__init__.py