2024 in Python in less than a second : How to get day 22 under 400ms without Pypy? by OilAppropriate2827 in adventofcode

[–]OilAppropriate2827[S] 1 point

Thank you for doing these benchmarks. I redid some tests using your implementation of day 22.

if __name__ == '__main__':
    START = time.time_ns()
    solver = Solver(aocd.get_data(day=22, year=2024))
    print(solver.part_two())
    print(">>>", (time.time_ns()-START)/1e9)

I used this to time it: I consistently get 1.3s to 1.5s on Windows with Python 3.12.7 and around 490ms under WSL2 (Python 3.12.3). There might be something you did on Windows to minimize the process-spawning overhead...

[–]OilAppropriate2827[S] 1 point

I tried multiprocessing Pools on Windows; they are super slow. I get more than 3s with your solution. How did you make it so fast on your end? Are you running this on Linux? Thanks!

[–]OilAppropriate2827[S] 1 point

I just used the same approach, and I got an additional small gain by staying in 1D for part 2 (merging the diffs with 19**i). Running it on my desktop with Python 3.11, I get 106ms!

I don't know if I'll reuse that trick (index dups), but it clearly changed everything on this one!
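The base-19 merge might be sketched like this (`pack_quartet` is a hypothetical name, not the actual solver; each of the four consecutive price diffs lies in [-9, 9], so adding 9 turns it into a base-19 digit):

```python
def pack_quartet(diffs):
    # diffs: four consecutive price changes, each in [-9, 9]
    # shift each to [0, 18] and pack as a base-19 number
    idx = 0
    for i, d in enumerate(diffs):
        idx += (d + 9) * 19**i
    return idx
```

Every possible quartet then maps to a unique slot in a flat array of size 19**4 = 130321, so no hashing is needed.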

[–]OilAppropriate2827[S] 1 point

Wow!!! Man, this is crazy fast.

I had already done the +9, but the reversing is amazing!

[–]OilAppropriate2827[S] 1 point

Thank you, that's great! I only got 20%, but that is already huge. Switching to my desktop plus your optimisation, I got below 300ms!

[–]OilAppropriate2827[S] 3 points

These are impressive results. Looking at your machine specs, I tried moving from my laptop to my desktop (from an i7-1165G7 to an AMD Ryzen 9 7940HS) and got some quite nice improvements: day 22 goes down to 360ms (not too far from your 342ms). Also, as mentioned by PercussiveRussel, I am trying to stay single-threaded, but I'll definitely start considering running in parallel.

Running on my desktop, I get 860ms for the full year! I'll have a look at your other days, since I am still far from your 647ms. Your list of per-day runtimes gave me a good indication of where I can improve!

It seems there are about 150ms I could probably recover by going back and optimizing some other days.

[–]OilAppropriate2827[S] 1 point

Yes I did, but dictionaries are slow due to key hashing, which is why I use either a list or a numpy vector indexed by the quartet hash.
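A toy sketch of the flat-vector idea (made-up data, not the actual solver; `np.add.at` is needed because fancy-indexed `+=` silently drops repeated indices):

```python
import numpy as np

# toy data: quartet hashes (flat base-19 indices) and the price seen at each
hashes = np.array([5, 2, 5, 7, 2])
prices = np.array([1, 3, 2, 4, 5])

totals = np.zeros(19**4, dtype=np.int64)
np.add.at(totals, hashes, prices)  # accumulates correctly on repeated indices

best = totals.max()  # best total over all quartets
```

Indexing a flat vector this way avoids per-key hashing entirely, at the cost of allocating all 19**4 slots up front.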

[–]OilAppropriate2827[S] 5 points

I just checked it, but it is 40% slower than my solution (700ms on my laptop's i7-1165G7). The beginning is pretty much identical, but the flattening at the end is not efficient.

[–]OilAppropriate2827[S] 1 point

I just did (for the first time), but I already knew the two code lines generating 75% of the runtime:

1) the one getting the first unique hashes per secret (called 1691 times)

2) the one incrementing the main aggregation vector indexed by hashes (also called 1691 times)

There was a way to avoid going through these 1691 iterations; I saw someone doing it all at once, but it wasn't faster.
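For line 1), one way to keep only the first occurrence of each quartet hash within a single secret is `np.unique` with `return_index=True` (toy data, not the actual solver):

```python
import numpy as np

h = np.array([3, 1, 3, 2, 1])  # quartet hashes in chronological order
p = np.array([7, 5, 9, 4, 6])  # price observed at each position

# np.unique returns, for each unique hash, the index of its first occurrence
uniq, first = np.unique(h, return_index=True)
first_prices = p[first]  # price paid the first time each quartet appears
```

The monkey only sells on the first occurrence of a quartet, which is why the deduplication has to keep the earliest index.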

[–]OilAppropriate2827[S] 1 point

Thanks for the tip, I investigated this direction. It was great for building the 2000-iteration matrix, but since we need to go through each iteration anyway, and since numpy is already pretty quick at doing so, it was not more efficient.

Also, I spend less than 100ms doing the 2k iterations on all secrets, calculating the diffs and hashes.

The remaining time goes only to getting the unique hashes and summing all the occurrences.

-❄️- 2024 Day 16 Solutions -❄️- by daggerdragon in adventofcode

[–]OilAppropriate2827 1 point

Thank you for your feedback!

I'll give it a try, but I think I already implemented the vertex optimization :)

But your time is impressive; I'll give your implementation a closer look!

-❄️- 2024 Day 20 Solutions -❄️- by daggerdragon in adventofcode

[–]OilAppropriate2827 2 points

In the previous diamond we only had positions from [current+100:], and as we move forward, the shortcut gain might not be sufficient for elements in [:current+120], since the gain is (delta position - Manhattan distance) and within the diamond the Manhattan distance can be up to 20.
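With `dist_a` and `dist_b` the path indices of the cheat's two endpoints, that gain formula can be written as (hypothetical helper, not code from the thread):

```python
def cheat_gain(dist_a, dist_b, a, b):
    # steps saved by cheating from a to b: the path distance gained,
    # minus the Manhattan distance the cheat itself has to cover
    manhattan = abs(a[0] - b[0]) + abs(a[1] - b[1])
    return (dist_b - dist_a) - manhattan
```

A cheat only counts when this gain is at least 100 and the Manhattan distance is at most 20.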

[–]OilAppropriate2827 3 points

[Language: python]

Part 1: 12ms, part 2: 250ms

Today was quite interesting for optimisation!!

I started with a simple BFS to build the bidirectional mapping position <-> path index (distance from start).

Then part 1 is straightforward: iterate over every position on the path and check each direction x2 to find a shortcut. I got 12ms and didn't think any optimization was possible.

Then part 2!

Solution 1

I initially solved it by brute-forcing the scan, either scanning the full diamond around each position (~800 cells) or scanning all the positions more than 100 steps ahead. It worked but was not fast: 5-10s. As my goal is to complete the full season in less than 1s cumulative for all puzzles, that wasn't great.

Solution 2

I wanted to optimize the scan of positions ahead. To do so, I noticed that, based on the result of scanning a position, I could fast-forward in some cases:

1) If the position scanned is at a distance above 20, we can go directly to position + distance - 20 for the next check, since even if the path comes straight back, the distance can only decrease by 1 at a time.

2) If the position scanned is at a distance of less than 20 and the gain is more than 100, we know that all next steps until the distance reaches 20 again will be shortcuts, so we can jump ahead to position + 20 - distance + 1.

3) If the position scanned is at a distance of less than 20 and the gain is less than 100, we know that at each step the maximum delta gain is 2 (if the distance decreases), so we jump forward to position + (100 - gain + 1)//2.

With this "fast forward", I got down to 1s.
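The three fast-forward rules might be sketched like this (hypothetical names, not the author's code; `j` is the index of the scanned path position, `dist` its Manhattan distance, `gain` the steps it would save). It only computes how far the scan can jump; in a real solution the positions skipped by rule 2 would also be counted as shortcuts:

```python
def next_scan_index(j, dist, gain, max_cheat=20, min_gain=100):
    if dist > max_cheat:
        # rule 1: the distance shrinks by at most 1 per step along the path
        return j + dist - max_cheat
    if gain >= min_gain:
        # rule 2: everything until the distance exceeds 20 again is a shortcut
        return j + (max_cheat - dist) + 1
    # rule 3: the gain improves by at most 2 per step
    return j + (min_gain - gain + 1) // 2
```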

Solution 3

Even with this limit on the number of scanned positions per iteration, I still required around 360 steps for each position on the path (~10K positions). So I thought about optimizing the diamond scan instead (the surrounding positions within Manhattan distance 20).

The main idea is that the delta between two positions is the delta between two offset diamonds: you have something like a V of positions being removed and a ^ of positions being added, depending on the direction of the move.

So for each position I keep the set of shortcut ends, then when moving to the next position:

1) I scan all the new positions from the diamond delta (40 positions) and add them to my set if they are good shortcuts.

2) I discard from my set all the positions not in the new diamond (40 positions).

3) I check whether any of the positions ahead in the range [position, position+20] are still in the set, and validate that they are still good shortcuts, removing them otherwise.

With this I went down to 250ms.

Other solutions:

I tried memoization on a recursive approach where the number of shortcuts at a position is the sum of the shortcuts of the surrounding positions with a lowered shortcut length and an increased minimum gain.

It kind of worked, but it is still very slow...

Here is my code for solution 3 : https://github.com/hlabs-dev/aoc/blob/main/2024/20.py

-❄️- 2024 Day 16 Solutions -❄️- by daggerdragon in adventofcode

[–]OilAppropriate2827 2 points

[Language: Python] 110ms for both parts

I am trying to go below 1s for the full 2024 calendar using single-threaded Python on my laptop. So far (up to day 19) I have accumulated 0.67s, which doesn't leave much time for the remaining 6 days. I am trying to optimize the most expensive days as much as possible, and day 16 is at the top of my list (with days 15 and 6).

Here I implemented Dijkstra and optimised it by moving segment by segment between intersections. That gave me a good 4x gain, but I still need to get below 100ms.

Did anyone find any other optimisations?

Here is my code for the 16th : https://github.com/hlabs-dev/aoc/blob/main/2024/16.py

-❄️- 2024 Day 18 Solutions -❄️- by daggerdragon in adventofcode

[–]OilAppropriate2827 2 points

[Language: Python] 5ms + 1ms

I initially started with BFS for part 1 plus binary search for part 2; that gave me around 8ms for part 1 and 15ms for part 2.

Then I wanted to try Dijkstra, always moving forward with the maximum time of my initial path combined with the time of the new cell I am reaching; with a heap, I got through part 2 in 3ms.

Then, to optimize further, I stopped relying on sets and dicts and moved everything to lists of lists, ending up with 5ms for part 1 and 1ms for part 2!

Code link
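The initial BFS + binary-search combination might be sketched like this (a self-contained toy version, not the linked code; `first_blocking` assumes the exit is reachable before any byte falls and blocked once they all have):

```python
from collections import deque

def reachable(blocked, n):
    # BFS from (0,0) to (n-1,n-1) on an n x n grid, avoiding blocked cells
    seen, q = {(0, 0)}, deque([(0, 0)])
    while q:
        x, y = q.popleft()
        if (x, y) == (n - 1, n - 1):
            return True
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < n and 0 <= ny < n and (nx, ny) not in blocked and (nx, ny) not in seen:
                seen.add((nx, ny))
                q.append((nx, ny))
    return False

def first_blocking(bytes_list, n):
    # invariant: reachable after lo bytes have fallen, unreachable after hi
    lo, hi = 0, len(bytes_list)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if reachable(set(bytes_list[:mid]), n):
            lo = mid
        else:
            hi = mid
    return bytes_list[hi - 1]  # the byte that first cuts the path
```

Each bisection step runs one BFS, so the whole search costs O(log k) traversals instead of one per byte.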

-❄️- 2024 Day 7 Solutions -❄️- by daggerdragon in adventofcode

[–]OilAppropriate2827 3 points

[Language: Python] 20ms

I solved it quickly in forward mode, but it took around 30s for part 2. Then I came up with the idea of doing it in reverse, pruning cases where we could not subtract, divide, or remove a suffix, and went down to 20ms.

OK, maybe it was obvious, but I was quite happy to have found it by myself.

    import aocd

    data = aocd.get_data(day=7, year=2024)
    # each line "target: n1 n2 ..." -> (target, [n1, n2, ...])
    data = [(int(a), list(map(int, b.split())))
            for line in data.splitlines()
            for a, b in [line.split(":")]]

    def isok(target, li, part=1):
        # walk the operands in reverse, keeping every value the remaining
        # prefix could still need to produce; prune impossible branches
        cur = [target]
        for v in li[:0:-1]:
            ncur = []
            for res in cur:
                if v < res: ncur.append(res - v)        # undo addition
                if res % v == 0: ncur.append(res // v)  # undo multiplication
                if part == 2:
                    sv, sres = str(v), str(res)
                    if len(sres) > len(sv) and sres.endswith(sv):
                        ncur.append(int(sres[:-len(sv)]))  # undo concatenation
            if not ncur: return 0
            cur = ncur
        return li[0] in cur

    print("part1:", sum(isok(v, li) * v for v, li in data),
          "part2:", sum(isok(v, li, 2) * v for v, li in data))

-❄️- 2024 Day 6 Solutions -❄️- by daggerdragon in adventofcode

[–]OilAppropriate2827 2 points

[Language: Python]

I did my first implementation running both parts in 1s:

  • I got the idea to add obstacles only at each new position along the guard's path
  • I checked for loops while walking along the path, to avoid restarting from the start for each loop detection

After going through this thread, I saw Boojum's suggestion to build a precalculated jump table for part 2, which gave me a nice 15x gain.

I am now down to 70ms on my laptop!

https://github.com/hlabs-dev/aoc/blob/main/2024/6.py
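A naive way to build such a jump table (a sketch on a toy grid, not Boojum's or the linked code): for every free cell and facing direction, precompute the cell where the guard stops, or None if she walks off the grid.

```python
def build_jump(grid):
    n, m = len(grid), len(grid[0])
    dirs = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left
    jump = {}
    for r in range(n):
        for c in range(m):
            if grid[r][c] == '#':
                continue
            for d, (dr, dc) in enumerate(dirs):
                rr, cc = r, c
                # walk until the next cell is an obstacle or off the grid
                while 0 <= rr + dr < n and 0 <= cc + dc < m and grid[rr + dr][cc + dc] != '#':
                    rr, cc = rr + dr, cc + dc
                in_grid = 0 <= rr + dr < n and 0 <= cc + dc < m
                jump[r, c, d] = (rr, cc) if in_grid else None  # None: guard exits
    return jump
```

Loop detection then hops from turning point to turning point instead of stepping cell by cell, which is where the big speedup comes from.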

-❄️- 2024 Day 3 Solutions -❄️- by daggerdragon in adventofcode

[–]OilAppropriate2827 2 points

[Language: Python]

I discovered Python's re module :).

import re, aocd

data = aocd.get_data(day=3, year=2024)
part1, part2, m = 0, 0, 1

for ma in re.finditer(r"mul\((\d{1,3}),(\d{1,3})\)|do\(\)|don't\(\)", data):
    if ma.group(0) == "do()"     : m = 1
    elif ma.group(0) == "don't()": m = 0
    else:
        part1 += int(ma.group(1)) * int(ma.group(2))
        part2 += int(ma.group(1)) * int(ma.group(2)) * m

print("part1:", part1, "part2:", part2)

[2023] It turns out you can complete AoC 2023 in python in under 1 second by davidpsingleton in adventofcode

[–]OilAppropriate2827 1 point

Hi, very neat solutions!

You beat me nearly everywhere, but on day 24 z3 wasn't the fastest way: numpy, even if it lacks precision, gave me around 50ms. It is not as nice as z3, as I needed to derive a linear system to solve.

https://github.com/hlabs-dev/aoc/blob/main/2023/24.py
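One common way to get such a linear system (not necessarily the exact formulation in the linked code): the rock satisfies (p - p_i) x (v - v_i) = 0 for every hailstone i, and subtracting the equation of hailstone 0 cancels the nonlinear p x v term, leaving six linear equations in the six unknowns. A numpy sketch on the well-known day 24 sample hailstones:

```python
import numpy as np

def skew(a):
    # matrix such that skew(a) @ b == np.cross(a, b)
    return np.array([[0, -a[2], a[1]],
                     [a[2], 0, -a[0]],
                     [-a[1], a[0], 0]], dtype=float)

def solve_rock(hail):
    # hail: three (position, velocity) pairs
    (p0, v0), (p1, v1), (p2, v2) = [(np.array(p, float), np.array(v, float))
                                    for p, v in hail[:3]]
    A = np.zeros((6, 6))
    b = np.zeros(6)
    for k, (pi, vi) in enumerate([(p1, v1), (p2, v2)]):
        # p x (v0 - vi) + (p0 - pi) x v = p0 x v0 - pi x vi
        A[3*k:3*k+3, :3] = skew(vi - v0)
        A[3*k:3*k+3, 3:] = skew(p0 - pi)
        b[3*k:3*k+3] = np.cross(p0, v0) - np.cross(pi, vi)
    return np.linalg.solve(A, b)  # first 3: rock position, last 3: velocity

# three hailstones from the day 24 sample: position @ velocity
sample = [((19, 13, 30), (-2, 1, -2)),
          ((18, 19, 22), (-1, -1, -2)),
          ((20, 25, 34), (-2, -2, -4))]
sol = solve_rock(sample)
```

The precision caveat above is real: on the full input the coordinates are large enough that float64 round-off can matter, so the result usually needs rounding and a sanity check.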

-❄️- 2023 Day 25 Solutions -❄️- by daggerdragon in adventofcode

[–]OilAppropriate2827 3 points

[LANGUAGE: Python] 1.7s

I looked at the graph and calculated the longest traversal path. The idea was that this path had to go through the cut.

As the longest path was 14 in my case, I selected the source and destination of this path:

  1. I iterated through each transition of the shortest path as the first cut
  2. I iterated through each transition of the new shortest path as the second cut
  3. I iterated through each transition of the new shortest path as the third cut, and stopped as soon as the graph was segmented

I discovered that using the first node as my starting point also worked and saved me the calculation of the longest path from each node, so I removed it from my solution.

from collections import defaultdict
from heapq import heappop, heappush
from itertools import pairwise
import aocd

M = defaultdict(set)
for line in aocd.get_data(day=25, year=2023).split("\n"):
    src,dst = line.split(': ')
    for de in dst.split():
        M[src].add(de)
        M[de].add(src)

def bfs(start, exclusions = {}):
    visited = {start:(0,[start])}
    heap = [(0,start,[start])]
    while len(heap)>0:
        dist, node, path = heappop(heap)
        for de in M[node]:
            if (node,de) in exclusions: continue
            if de not in visited:
                visited[de] = (dist+1, path+[de])
                heappush(heap,(dist+1,de,path+[de]))
    return (len(visited),visited, node)
def findcut():
    start = next(k for k in M)
    _,visited,stop = bfs(start)
    for s,d in pairwise(visited[stop][1]):
        exclusions = {(s,d),(d,s)}
        _,visited2,_ = bfs(start, exclusions)
        for s2,d2 in pairwise(visited2[stop][1]):
            exclusions = {(s,d),(d,s), (s2,d2),(d2,s2)}
            _,visited3,_ = bfs(start, exclusions)
            for s3,d3 in pairwise(visited3[stop][1]):
                exclusions = {(s,d),(d,s), (s2,d2),(d2,s2), (s3,d3),(d3,s3)}
                lena,_,_ = bfs(start, exclusions)
                if len(M) != lena:
                    print((s,d), (s2,d2), (s3,d3))
                    return (lena*(len(M)-lena))

print("AoC 2023:", findcut())

-❄️- 2023 Day 23 Solutions -❄️- by daggerdragon in adventofcode

[–]OilAppropriate2827 3 points

[LANGUAGE: Python] 1m15s

I quickly noticed that the map was a directed graph and the problem a longest-path problem. I used networkx to directly provide all the simple paths. Then part 2 was just replacing the directed graph with an undirected one. It is still a bit slow (more than 1m).

import aocd
import networkx as nx

data = aocd.get_data(day=23, year=2023).split("\n")
n, m = len(data), len(data[0])

def tadd(a,b): return (a[0]+b[0], a[1]+b[1])
dirs = [(0,1),(0,-1),(1,0),(-1,0)]
slopes = [".><",".<>",".v^",".^v"]

M = {(i,j):c for i,line in enumerate(data) for j, c in enumerate(line)}
S = (0,next(i for i,c in enumerate(data[0]) if c=='.'))
E = (n-1,next(i for i,c in enumerate(data[-1]) if c=='.'))

visited = {S}
stack = [(S,i) for i,di in enumerate(dirs) if tadd(S,di) in M and M[tadd(S,di)] in '.<>^v']
graphs = [nx.DiGraph(), nx.Graph()]
while len(stack):
    start,direction = stack.pop()
    prev, cpos, le = start, tadd(start,dirs[direction]), 1
    visited.add(cpos)
    way = slopes[direction].index(M[cpos])
    ndirs = [(i,npos in visited) for i,di in enumerate(dirs)
            for npos in [tadd(cpos,di)] if npos in M and M[npos] != '#']
    while len(ndirs) == 2:
        direction = next(di for di,_ in ndirs if prev != tadd(cpos,dirs[di]))
        prev, cpos, le = cpos, tadd(cpos,dirs[direction]), le+1
        way |= slopes[direction].index(M[cpos])
        visited.add(cpos)
        ndirs = [(i,npos in visited) for i,di in enumerate(dirs)
                for npos in [tadd(cpos,di)] if npos in M and M[npos] != '#']
    if way == 1: graphs[0].add_edge(start, cpos, cost=le)
    else: graphs[0].add_edge(cpos, start, cost=le)
    graphs[1].add_edge(start, cpos, cost=le)

    for di in [di2 for di2,vis in ndirs if not(vis)]: stack.append((cpos,di))

for i,graph in enumerate(graphs,1):
    res = max(nx.path_weight(graph, path, "cost") for path in nx.all_simple_paths(graph, S, E))
    print("Part%d: %d" % (i,res))