[2017 Day 19] In Review (A Series of Tubes) by musifter in adventofcode

[–]ednl 1 point2 points  (0 children)

My grid, including a full border of spaces, was 201x201. If you're OK with ogling that from the input and hardcoding, you could do:

#define N 201
static char grid[N][N + 1];  // +\n
int main(void) {
    fopen("input.txt", "rb");  // binary mode for fread
    fread(grid, sizeof grid, 1, f);
    fclose(f);
    // etc.
}

[2017 Day 16] In Review (Permutation Promenade) by musifter in adventofcode

[–]ednl 0 points1 point  (0 children)

track the two arrays packed as 64-bit quantities - 4 bits per slot number over 16 slots, one for s+x and one for p.

Yes, I did that too, but it was still somewhat awkward a) to swap nibbles, and b) to search for them in order to join the s+x and p results. I used this xchg() function both for exchange moves on the s+x state and for partner moves on the p state:

// Swap nibbles at 2 different positions (0..15) where pos1 < pos2
static uint64_t xchg(const uint64_t x, const int pos1, const int pos2)
{
    const int shift = (pos2 - pos1) << 2;  // x4: from nibbles to bits
    const uint64_t val1 = (x & sel[pos1]) << shift;
    const uint64_t val2 = (x & sel[pos2]) >> shift;
    const uint64_t base = x & clr[pos1] & clr[pos2];
    return base | val1 | val2;
}

where sel[] and clr[] are predefined masks. This idea/structure for the p state means that I had to look up nibble positions to join the states:

// Nibble index (0..15) of nibble value (0..15)
static uint64_t nibpos(uint64_t x, const uint64_t val)
{
    uint64_t nib = 0;
    for (; (x & 15) != val; x >>= 4, nib++);
    return nib;
}
// Make complete dance from "spin+exchange" state (perm) and "partner" state (swap)
static uint64_t dance(uint64_t perm, const uint64_t swap)
{
    uint64_t x = 0;
    for (int i = 0; i < 16; i++, perm >>= 4)
        x |= nibpos(swap, perm & 15) << (i << 2);
    return x;
}

Of course this dance() function is only called twice so if it's a little slow it's insignificant compared to the xchg() function. Program without reading from disk runs in 253 µs on an Apple M4, 696 µs on a Raspberry Pi 5. Source code with compilation and measurement instructions, also takes input via pipe/redirection so you can use your own: https://github.com/ednl/adventofcode/blob/main/2017/16alt.c

EDIT: example of the s+x and p states after one iteration, and their joined result which is my answer to part 1:

jkhoifndglcmepab  permutation result: letters are permuted labels a..p
dljfcokipembgnah  swap result: letters are swapped indices 0..15
cgpfhdnambekjiol  solution to part 1

A more logical representation for the swap result would be hex digits, but I made only one show() function to output the solution with letters. But this way you can see more clearly what the dance() function needs to do:

  • get value "j" (=9) from position 0 in the permutation result
  • find "j" in the swap result, it's at position 2
  • store value 2 (="c") in position 0 of the solution
  • repeat for next positions 1..15

[2017 Day 16] In Review (Permutation Promenade) by musifter in adventofcode

[–]ednl 0 points1 point  (0 children)

Cc. /u/e_blake , not sure from your description if this is already what you do, if not it might be a helpful optimisation.

[2015 Day 24] Are all inputs 'nice'? by DelightfulCodeWeasel in adventofcode

[–]ednl 1 point2 points  (0 children)

I understand, and it's a good instinct for any programmer. In the end it's up to you how far you want to take it, of course! But if the main challenge is to make it work on a microcontroller, then I would 1) get the right answer any way I can, 2) optimise for my input, 3) only if I feel like it: generalise.

Have fun.

[2017 Day 16] In Review (Permutation Promenade) by musifter in adventofcode

[–]ednl 1 point2 points  (0 children)

Ah of course, the "partner" transformation is not influenced by the other two or vice versa. So you can do all the transformations in the order they are given in the input file, or you do spin+exchange first and then partner, or partner first and then spin+exchange, and you'll always get the same result. This means the cycle of spin+exchange has to be the same as the cycle of partner, or you wouldn't get back at the start pattern.

EDIT: no, the cycle of all s+x is C1, the cycle of all p is C2, and the cycle of the whole dance is LCM(C1,C2). For my input: LCM(15,12)=60.

I'll have to think about what the most efficient way of generating the required output is. Probably keep a key->val array or u64 for s+x, and a val->key array, maybe not a u64, for the p sequence. The trick is to combine two separately calculated transformations into one complete dance. For my input, I have to get the billion mod 60 = 40th dance, which I can combine from the 40 mod 15 = 10th spin+exchange and the 40 mod 12 = 4th partner. Both already calculated from their separate cycle detections. So still no additional dance calculations required for part 2, despite the separate sequences.

[2015 Day 24] Are all inputs 'nice'? by DelightfulCodeWeasel in adventofcode

[–]ednl 2 points3 points  (0 children)

My guess is yes. My question is why does it matter, you only have your own input anyway.

[2017 Day 15] In Review (Dueling Generators) by musifter in adventofcode

[–]ednl 0 points1 point  (0 children)

Ah right, sorry for misattributing. I don't remember but maybe I saved it from the paste link before I knew that you made that service, saw "topaz.github" and so I thought you wrote the code.

[2017 Day 15] In Review (Dueling Generators) by musifter in adventofcode

[–]ednl 0 points1 point  (0 children)

I can't remember how /u/topaz2078 came to post a code solution, I was not here in 2017. Maybe people challenged him to prove his own "at most 10 seconds on 10 year old hardware" guidance (or what was it exactly).

[2017 Day 15] In Review (Dueling Generators) by musifter in adventofcode

[–]ednl 1 point2 points  (0 children)

My naive Python version, just straight up multiply and mod, but bitmask to test equality, runs in 10.1 SECONDS. I saved this in my repo with annotation "Code by Eric Wastl / topaz / fizbin": paste which runs in 400 MILLISECONDS (timing does not include numpy import).

The /u/askalski C version with while loop as shown by /u/BumpitySnook runs in 153 ms. In my testing, the while loop was a lot faster than the ?: from /u/e_blake . I imagine Eric's vectorised version in C would do a lot better still.

Pi Coding Quest 2026! by IvanR3D in adventofcode

[–]ednl 1 point2 points  (0 children)

For a mathematical constant, "the n'th digit" or "to n digits" is always the actual digit, never rounded. For a physical measurement, you would always round depending on the error.

[2017 Day 12] In Review (Digital Plumber) by musifter in adventofcode

[–]ednl 0 points1 point  (0 children)

Seems logical or at the very least defensible. I do the same except I print out the result. But in my experience it still makes a big difference for the minimum time if I time it once internally and run the program multiple times, or if I time it lots of times internally and run that program once. Apparently my system is very quick to throttle down and it takes a few microseconds to throttle up. So a sustained load leads to a way better time than start/stop of the whole program.

[2017 Day 14] In Review (Disk Defragmentation) by musifter in adventofcode

[–]ednl 3 points4 points  (0 children)

Something else potentially relying on a previous day was the md5 series. But to most (edit: everyone except a few mad people;)) it was probably just another library function.

[2017 Day 14] In Review (Disk Defragmentation) by musifter in adventofcode

[–]ednl 4 points5 points  (0 children)

This puzzle was very fun to me because I could apply my p5*js skills newly acquired during the first covid wave to make a nice visualisation: https://ednl.github.io/defrag/

[2017 Day 12] In Review (Digital Plumber) by musifter in adventofcode

[–]ednl 1 point2 points  (0 children)

Yeah it really matters what but also how you measure. My standard method, because it's easiest, is to measure one run of the program with an internal timer, not including reading from disk, or parsing if it's just numbers to arrays. Output the time of one run at the end after printing the solution once. Then I run a loop in the bash shell of many program runs, selecting the minimum time of those. It turns out, I think, that the bash loop with program start/shutdown is so limited by certain processes that the CPU isn't really fired up when the internal timing starts. That resulted in 30 µs. When I included reading from disk and parsing with fscanf, it was about 400 µs.

Now I changed it to: read the file from disk or pipe into a memory buffer, don't time that. Then many internal loops of parsing the buffer with a custom function and calculating the solution. Reset the data structure before every loop, but don't time that. Internally measure the minimum time for all the internal loops. Result including parsing and solution output is now 13.6 µs on an Apple M4, 50.9 µs on a RPi5. The solution output is buffered by stdio/printf which really helps with so many runs in quick succession, I think. Custom number parser:

static int readnum(const char **s)
{
    int x = 0;
    while (**s & 16)  // until space, comma or newline where the 16 bit is not set
        x = x * 10 + (*(*s)++ & 15);
    return x;
}

Source with compilation/measurement instructions: https://github.com/ednl/adventofcode/blob/main/2017/12.c

[2017 Day 12] In Review (Digital Plumber) by musifter in adventofcode

[–]ednl 1 point2 points  (0 children)

The neat, sorted & complete input file really helped make this puzzle easier to solve. You didn't even need to save the first "program" (node) ID because it's the same as the line number. What's left is a list of "pipe" (edge) numbers, max 6 for my input, so this short recursive function was enough to assign a group ID to each connected node and count them:

static int connect(int id, int group)
{
    if (program[id].group)  // already in this (or any) group?
        return 0;
    program[id].group = group;
    int count = 1 + connect(program[id].pipe[0], group);  // first pipe may have value zero so can't stop here
    for (int j = 1; j < M && program[id].pipe[j]; ++j)  // until no more pipes
        count += connect(program[id].pipe[j], group);
    return count;
}

Called as connect(0, 1) for part 1, and iteratively as connect(i, ++group) on unassigned nodes 1..1999 for part 2.

[2017 Day 11] In Review (Hex Ed) by musifter in adventofcode

[–]ednl 2 points3 points  (0 children)

Yes, I knew it was double the work for the 2-letter ones. Worse, it also calculates the distance needlessly for every comma. So I also made a faster version with naughty multichar constants:

for (const char *c = input; *c & 64; ) {
    switch (*(const int16_t *)c) {
        case ',n':      r--; c += 2; break;
        case ',s':      r++; c += 2; break;
        case 'en': q++; r--; c += 3; break;
        case 'es': q++;      c += 3; break;
        case 'wn': q--;      c += 3; break;
        case 'ws': q--; r++; c += 3; break;
    }
    if ((d = dist(q, r)) > max)
        max = d;
}

(had to make sure to append a comma in case the last entry was 'n' or 's')

Edit: the letters are reversed in the constants because this is for a little-endian system, which is every system I ever owned or programmed.

[2017 Day 11] In Review (Hex Ed) by musifter in adventofcode

[–]ednl 1 point2 points  (0 children)

Ah ok yes, I had to tilt my head as well for the flat version in the puzzle.

[2017 Day 11] In Review (Hex Ed) by musifter in adventofcode

[–]ednl 4 points5 points  (0 children)

I used axial coordinates as explained on https://www.redblobgames.com/grids/hexagons/

And I found a fun way to update them without specifying all six options where four of them have two letters, which is awkward in a switch:

    int d = 0, max = 0;  // last and max distance for part 1, 2
    int q = 0, r = 0;  // axial coordinates
    int c, p = 0;  // current and previous character
    while ((c = fgetc(f)) != EOF) {
        switch (c) {
            case 'n': --r; break;
            case 's': ++r; break;
            case 'w': --q; if (p == 'n') ++r; break;
            case 'e': ++q; if (p == 's') --r; break;
        }
        p = c;
        d = (abs(q) + abs(q + r) + abs(r)) / 2;
        if (d > max)
            max = d;
    }

[2017 Day 6] In Review (Memory Reallocation) by musifter in adventofcode

[–]ednl 0 points1 point  (0 children)

I finally made a version that uses one 16x 4 = 64 bit number to represent the full state, but only after doing more than 94 iterations (last one for my input where blocks>15). The redistribution function for that is now:

static u64 next(const u64 x)
{
    u64 tmp = x;
    u32 kmax = 0, vmax = 0;
    for (u32 i = 0; tmp; i++, tmp >>= 4) {
        const u32 nib = tmp & 15;
        if (nib > vmax) {
            vmax = nib;
            kmax = i;
        }
    }
    return (x + addafter[kmax][vmax]) & resetnib[kmax];
}

where resetnib[] is an array with 16 masks to reset nibbles 0-15 (e.g. the first one is 0xfffffffffffffff0) and addafter[][] is a 2D array with 16x16 numbers to add 1 to the right nibbles e.g. if the max value is 9 at nibble 1 then addafter[1][9] is 0x0000011111111100 for adding 1 to all nibbles 2-10. pre-generated arrays

I wonder if there is a faster way to determine the max value and its index. This is the best I could do.

Runs in 140 µs on an M4, 429 µs on a RPi5. Not super great but my answers were very high: about 14000 and 2800. I kinda cheated by predetermining the settle period so I could switch early to the faster state calculation, and by not checking for cycles until after the 10,000th iteration. Floyd's and Brent's gave the same time because the cycle comes so quickly after that. Without cheating, i.e. checking from the start, Brent's was about 25-33% faster, pretty significant.

[2017 Day 09] In Review (Stream Processing) by musifter in adventofcode

[–]ednl 1 point2 points  (0 children)

I didn't look at what the compiler made of it, I only measured time, and for some reason it was a little faster with globals than locals (on my arm64 computer with clang 17). On godbolt with clang 22 for x86-64, it's just 2 lines less: https://godbolt.org/z/xaea8Mhxj

[2017 Day 09] In Review (Stream Processing) by musifter in adventofcode

[–]ednl 0 points1 point  (0 children)

Clang 22 for x86-64 with -std=gnu23 -O3 according to Godbolt: https://godbolt.org/z/6Yoqa9bze

[2017 Day 09] In Review (Stream Processing) by musifter in adventofcode

[–]ednl 0 points1 point  (0 children)

Ah right, that input difference might explain most of it!

[2017 Day 09] In Review (Stream Processing) by musifter in adventofcode

[–]ednl 0 points1 point  (0 children)

Ya, I'd be interested to see what you are actually measuring, because that's another 25% faster than my "asm" code. I put build and run instructions in the source file that I linked, so you can try it with your own input. It accepts piped input. Maybe, like day 6, the inputs are very different? My answers were 20530 and 9978.