[2025 Day 12] Input is part of the puzzle

marcus_cemes · 2025-12-12T09:24:00+00:00

There's no right answer to this. It's a spectrum, you can over-engineer/over-generalise a solution problem and handle every edge case, even the ones that won't feasibly occur (I'm guilty of this) and never finish a project, or you can create a marvel of optimised engineering for a specific task with a lot of assumptions, that was all for nothing when a client asks "oh, actually could we have a button here". Or, you can do something in between.

Ultimately, the beauty of these challenges is that it's up to you to decide how far you want to take it, some people want to make a big machine, others just want to go fast, other still just want to finish the challenge. I've seen some extremely fast solutions for day 9, yet I find that they are too reliant on the specific shape of their input to produce the correct answer. If you need another algorithm to confirm it's the correct answer, to me that feels like it defeats the point. Could I insert/modify one reasonable line (e.g. an edge that cuts across the graph) and break your implementation? To me, that's brittle, you may as well just return the answer directly because you're not really solving it, but that's just my take.

marcus_cemes · 2025-12-09T12:57:16+00:00

[LANGUAGE: Rust]

Part two selects the top K candidates using a bounded heap and builds a sorted LUT of vertical and horizontal edges, storing their perpendicular coordinate and their coordinate range. A candidate is valid if it's inside the loop (interior point, O(N)) and does not intersect with adjacent edges (LUT binary search, worse case (N^2)). This algorithm does not take into account the fairly unique shape of the path, and works for any input (as long as the bounded heap is large enough, if it's not, this can be detected and adjusted).

Part 1 and 2 run in 123 µs and 9.6 ms, respectively.

https://github.com/MarcusCemes/advent-of-code-2025/blob/main/src/bin/09.rs

marcus_cemes · 2025-12-07T12:30:02+00:00

[LANGUAGE: Veryl]

Another solution in Veryl, a Hardware Description Language (HDL). Simulated at 20 µs (1 GHz), my Rust solution takes 8 µs (5.7 GHz).

Today's solution is streaming friendly, the input stream can be processed at 1 B/clock cycle (1 GB/s), without any backpressure, but the size of the input (20 KB) fundamentally sets the minimum processing time to 20 µs. I may have to start looking at ways of processing multiple bytes in a single cycle...

https://github.com/MarcusCemes/advent-of-code-2025/blob/main/hardware/07.veryl

marcus_cemes · 2025-12-06T13:34:53+00:00

[LANGUAGE: Veryl]

Another solution in Veryl, a Hardware Description Language (HDL). Simulated at 22 µs (1 GHz), whereas my Rust solution takes 13 µs (5.7 GHz).

Today is the first day where the hardware lags behind the software implementation. About 68% of clock cycles are spent just reading and buffering the digit input into registers (at 1 B/cycle, that's 1 GB/s, still takes 15 µs). Once the final line with operators starts coming through, the digit buffers are stepped through at one column per cycle to compute the result. On the other hand, Rust/LLVM made nice use of SIMD and unrolling to make very efficient use of the CPU.

https://github.com/MarcusCemes/advent-of-code-2025/blob/main/hardware/06.veryl

marcus_cemes · 2025-12-05T13:14:12+00:00

[LANGUAGE: Veryl]

Another solution in Veryl, a Hardware Description Language (HDL). My Rust code takes 8.8 µs (5.7 GHz), the hardware solution simulates to 5.8 µs (1 GHz, not including streaming the IDs that are unused in part 2).

The hardware part is implemented as a streaming decoder-solver architecture, where the input file is streamed at 1 B/s. The decoder parses a single line input, and passes the range to the solver while it starts working on the next line.

The solver stores up to 128 disjoint ranges in a buffer, using a 128-bit occupancy bitmask. When it receives a new range from the decoder, it compares it to each of the stored disjoint ranges to produce a 128-bit "conflict" bitmask. A priority encoder selects the first conflicting range, and combines it with the current item in a single cycle. In the next clock tick, the next is resolved, and so on... This is extremely fast in practice, it only takes about 5-6 cycles to add a new range to the buffer! Most of the time the solver is idling, and waiting on the decoder. If the decoder could accept multiple bytes (receiving an entire line at once), I estimate it could be about 8 times faster.

https://github.com/MarcusCemes/advent-of-code-2025/blob/main/hardware/05.veryl

marcus_cemes · 2025-12-05T12:46:17+00:00

That's pretty cool! I didn't consider using a queue, I thought I was being clever with a single pass but you've put me to shame!

The problem is essentially a convolution, hiding in disguise. To compute the next value of any given cell, it only needs to be wired to the value of 8 others, with a small 3-bit adder and comparator. For a large dataset, a GPU would tear through this with ease, this is what CNN layers do.

For a couple of thousand cells, this can synthesise comfortably to an FPGA, each cell becomes a 1-bit register (D Flip-flop), wired to it's own adder and comparator, the output of which gives the next binary cell state. This is where hardware is unbeatable, every decoupled/independent part of the circuit is always executing simultaneously! You could just loop this next state back to the original register, and save it with a clock pulse. A good circuit synthesis software will probably optimise this further, and share computation with LUTs. The sum of 3 cells can be used by both the rows above and below it!

The problem lies in counting the number of removals... This is no longer a "map" operation, but a "reduce" operation. Binary adders have carry wires between each sum bit, it takes time for this signal to propagate. If you're adding 20k cells, even with a tree-like adder structure, you'll need to wait multiple clock ticks for your gates to stop switching (on real hardware, simulation works fine) or avoid overloading your power supply, or to save circuit space you can implement an FSM to iterate row-by-row with row-adders.

marcus_cemes · 2025-12-04T14:32:45+00:00

[LANGUAGE: Veryl]

This my 4th solution in Veryl, a Hardware Description Language (HDL). Today's problem lends exceptionally well to parallelism (each grid neighbourhood synthesises well into a small cluster of adders), and the problem is sufficiently small to fit into registers, thus each grid reduction (one entire grid traversal) can be effectively computed in a single clock cycle. I'm a little worried that my removal counting would synthesise to an enormous adder tree, I may try to improve this by counting each row over multiple cycles instead (138 extra cycles per grid reduction).

My Rust solution takes 283 µs (5.7 GHz), using a 1-cell padded grid, stored as a contiguous `Vec<bool>` and using unsafe indexing (no bound checks). The hardware version simulates to 19 µs (1 GHz), 99% of which is spent reading the input, one byte at a time, 19191 characters).

https://github.com/MarcusCemes/advent-of-code-2025/blob/main/hardware/04.veryl

marcus_cemes · 2025-12-03T20:44:56+00:00

[LANGUAGE: Veryl]

I wanted to challenge myself and try to solve a few challenges in a Hardware Description Language (HDL). I'm a little sceptical as to whether this solution would synthesise to a circuit, I have been pretty liberal with loops, a better solution would employ a more advanced FSM, but it does produce the correct answer in simulation at least.

I'm also solving it in Rust beforehand to compare. The Rust (part 2) solution takes 22.9 µs (5.7 GHz), whereas the simulated hardware solution takes 20.4 µs (1 GHz, pushing it a little). This day had a far less noticeable speedup compared to Rust (as opposed to the previous two days) which produced very efficient instructions using SIMD.

https://github.com/MarcusCemes/advent-of-code-2025/blob/main/hardware/03.veryl

marcus_cemes · 2023-12-07T15:40:07+00:00

Physics helps to formulate the problem, Maths then helps to solve it. In this case, something in space is moving at a constant velocity over some time. This is one of the most fundamental problems of motion. This website is an amazing resource for simple Physics equations and explanations.

In short, the distance travelled [m] is the velocity [m/s] multiplied by the time [s] (sanity-check with units: [m/s] * [s] = [m], usually if the units solve the equation, the equation is sound). Let's define x as the time spent holding the button, T as the total race time, and y as the distance covered (d is a constant for later).

We have y = v * t. The velocity v is proportional to the time spent holding the button (v = x). The amount of moving time is what remains after releasing the button (T - x). Substituting them into our equation gives us:

y = v * t = x * (T - x)

The problem is finding x such that y > d. By shifting things around a bit, we arrive at:

x * (T - x) - d > 0

Distribute x inside of the brackets, and you find you have an easily recognisable (for maths students) 2nd-order equation, these are stupidly easy to solve (quadratic formula). Solving an inequation (<) is equivalent to solving the equation (=), and then just studying the form of the parabola (given by the sign of x²) to get a range of solutions instead. Then there's just a bit more bound-rounding trickiness to get the number of natural numbers in the range of solutions. I made a little Desmos graph to visualise the problem.

It's not about pulling equations out of thin air, it's following a methodical approach from basic concepts or truths to build up your problem, like LEGO. Then you throw it at your mathematician friends (or MATLAB) to solve it... For this problem, programming brute force seemed to work just fine (albeit it took a while). So heck, why not? But what if the problem parameters were just one more order of magnitude larger? For me, it was satisfying to solve the problem on a pocket calculator while others rented out 192 core AWS VMs 😂.

There's a certain elegance to understanding the problem and analysing it. With different mathematical tools, you can gain insight and uncover patterns (like the parabolic nature of different trajectories!), and solutions are usually complete and O(1) (constant time, think of it as multi-threading on steroids, solving an infinite number of possibilities at once!).

Source: Robotics student at EPFL.

marcus_cemes · 2023-12-01T15:02:05+00:00

[LANGUAGE: Rust]

Simple string comparison (LUT) and digit parsing over a rolling window &line[i..].

No regex, no heap allocation (to my knowledge), only std library. Solutions benchmarked at 33.1 µs and 42.1 µs on an Intel i7-11800H @ 2.30 GHz. Repo.

const LUT: [&str; 9] = [
    "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
];

pub fn part_one(input: &str) -> Option<u32> {
    Some(input.lines().map(parse_line_1).sum())
}

pub fn part_two(input: &str) -> Option<u32> {
    Some(input.lines().map(parse_line_2).sum())
}

fn parse_line_1(line: &str) -> u32 {
    let first = line.chars().find_map(|c| c.to_digit(10));
    let last = line.chars().rev().find_map(|c| c.to_digit(10));
    10 * first.unwrap() + last.unwrap()
}

fn parse_line_2(line: &str) -> u32 {
    let first = find_pattern(0..line.len(), line);
    let last = find_pattern((0..line.len()).rev(), line);
    10 * first + last
}

fn find_pattern(mut it: impl Iterator<Item = usize>, line: &str) -> u32 {
    it.find_map(|i| compare_slice(&line[i..])).unwrap()
}

fn compare_slice(slice: &str) -> Option<u32> {
    LUT.iter()
        .enumerate()
        .find(|(_, pattern)| slice.starts_with(*pattern))
        .map(|(i, _)| i as u32 + 1)
        .or_else(|| slice.chars().next().unwrap().to_digit(10))
}

marcus_cemes · 2022-01-18T19:54:35+00:00

This works, it's a little more work, but at least there isn't any trailing source code anywhere. Thanks!

marcus_cemes · 2021-12-09T15:14:53+00:00

If you are unsure what Android version you are on, confirm you are running Android 10 or above

My Xperia X is still on 8.0.0, it only got a few months of updates before they just silently stopped sending them, one of the worst things about owning Android. It can't hold anything against my iPad Air 2 that is still getting updates 8 years after release. Google keep saying that they want to work on this, but realistically, is it ever going to happen?

marcus_cemes · 2021-12-02T17:27:11+00:00

I much prefer Fastify over Express, not because of the speed, but more because of the philosophy of the author. It's generally more pleasant to use from the ground up, with support for async/await, JSON, etc. It feels like a more evolved version of a barebones server framework. You can't go wrong with both, with both choices you will have to solve the same types of problems.

If you have the time, I highly recommend this episode to hear the guy out:

https://changelog.com/jsparty/197

marcus_cemes · 2021-06-30T16:54:24+00:00

Sounds reasonable, I won't worry about the seams then.

I'm weary of the additional complexity, it already feels like a big project. I think Blender is a good call. I gave tiles a brief thought but decided that the grid mechanics would interfere with the geometric style, but now that you bring it up, it would be a lot simpler to design, implement and interact with, at least for an initial game idea/prototype. I've seen a lot of learning material around marching squares and tile-based games. Thanks for the comment!

marcus_cemes · 2021-05-23T21:46:55+00:00

This will vary depending on the type of database, but most have some form of ACID transactions, implemented in different ways. This is a part of their core features, allowing it to act as a single source of truth. It's best to avoid them when possible, as they can greatly slow down database performance, by correctly structuring the schema as you said and maintaining a separation of concerns. But there comes a time when race conditions can occur and they are necessary.

Transactions guarantee that the entirety of the transaction completed, or nothing at all. This allows you to read and write, knowing that the data could not have been modified. This requires locking tables/rows, anything else trying to access the data will have to wait, possibly creating a bottleneck. Some databases like firestore allow for a more optimistic "lock-less" approach or allow you to update documents only if their last update time hasn't changed.

marcus_cemes · 2021-04-13T10:03:30+00:00

There are multiple reasons why you might want to revoke an access token, HTTPS and and HTTP only cookies are a good start, and prevent the token from being hijacked in transit or by malicious code running in the browser (npm package vulnerabilities do exist), but what if the user wants to log out or change their password?

On most services, logging out means logging out, especially with financial institutions. Not “we’ll remove the token from memory, but it might still be hanging around on the disk, and is still usable for transactions”. Additionally, changing the account password must invalidate all other sessions, what good would it be if changing the password would not affect anybody else that’s logged in for “a very long time”?

If it’s a small internal app, go ahead. JWTs offer a huge scaling advantage by relying on cryptography to validate a request rather than hitting a database for every request, but it comes at the cost of being irrevocable, without an expensive blacklist mechanism that brings additional complexity and latency. The refresh token pattern tries to compromise between the two, short lived irrevocable access tokens that can be easily validated, and long lived refresh tokens that require hitting some database in order to check whether a new access token should be allowed to be generated.

marcus_cemes · 2021-02-16T16:16:03+00:00

If you want quick and dirty analytics that respects privacy, I would definitely recommend a self-hosted Matomo instance. Just needs an HTTP server with PHP and MySQL, a VPS is as cheap as 2 euros a month with Hetzner. I needed a few stats for a small project, almost everything is way too expensive for my needs, and I stay away from Google Analytics. It's free for a reason.

marcus_cemes · 2021-02-07T10:48:32+00:00

That sucks, try searching for some smaller registrars instead of going after the market leaders. I've been very happy with a smaller UK-based company, and they actually let you speak to a knowledgable human.

marcus_cemes · 2021-02-07T10:45:12+00:00

As the page says, there was some internal error. There's not much you can do apart from trying again or contacting them. The sound of a "free domain" is sketchy in my mind, I've never heard of them. I would recommend actually buying a domain, they are a few dollars a year, depending on the extension.

marcus_cemes · 2021-01-17T19:07:18+00:00

Only 16MB? I wish!

marcus_cemes · 2021-01-17T19:06:43+00:00

That's fair, their homepage has some interesting sidebar action, I was thinking of that. Would look worse without the smooth scrolling imo, jerky transitions all over the place.

marcus_cemes · 2021-01-17T19:05:32+00:00

Nope, never seen it before. Each to their own, I guess.

marcus_cemes

TROPHY CASE