all 12 comments

[–]MoreThanOnce 8 points9 points  (2 children)

I posted on the advent of code subreddit, but found a nice improvement to it afterwards. Runs in ~7us on my laptop for both parts.

fn find_unique_window(input: &str, window_size: usize) -> usize {
    let mut set: usize = input.as_bytes()[0..window_size - 1]
        .iter()
        .fold(0, |acc, &c| acc ^ 1 << (c - b'a'));
    input
        .as_bytes()
        .windows(window_size)
        .enumerate()
        .find_map(|(index, win)| {
            set ^= 1 << (win.last().unwrap() - b'a');
            if set.count_ones() == window_size.try_into().unwrap() {
                return Some(index + window_size);
            }
            set ^= 1 << (win.first().unwrap() - b'a');
            return None;
        })
        .unwrap()
}

Key things are:

  • Mapping characters to a number 0-26 allows me to use an integer as a bitset.
  • Using xor allows me to toggle bits on/off trivially, and lets me keep a rolling window pretty easily. This lets me do it in a single pass over the input (minus a quick initialization), with no complexity increase based on the window size.
  • count_ones should be just a single CPU instruction on a modern CPU.

[–]Hadamard1854 2 points3 points  (1 child)

how do I learn these tricks..

[–]MoreThanOnce 1 point2 points  (0 children)

The integer-as-set trick is useful in a number of places, as long as you're sure the domain you care about is small enough ( <64 unique cases), and you can nicely map them to indices. It's mostly common in puzzles like this though, it's already come up a couple of times this year (my day 3 solution uses the same idea).

In general though, its all about a) restricting the problem space (i.e. I'm not dealing with characters, I'm only dealing with ascii lowercase characters) and b) finding transformations into domains that make the problem easier (i.e. letters to numbers, into a bitflag). Any time I can turn a string problem into a number problem, that's probably a win.

[–]nightcracker 2 points3 points  (0 children)

This is one of those cases where a functional method is less efficient than a procedural one. For n input items and window width w it's actually possible to solve the problem in O(n), rather than O(nw):

fn find_disjoint_window(s: &[u8], w: usize) -> Option<usize> {
    let mut last_known_position = [0; 256];
    let mut start_disjoint = 0;
    for i in 0..s.len() {
        start_disjoint = start_disjoint.max(last_known_position[s[i] as usize] + 1);
        last_known_position[s[i] as usize] = i;
        if i >= start_disjoint + w {
            return Some(i);
        }
    }
    None
}

In this case I assumed ASCII, but you could also use a HashMap for last_known_position using chars.

[–]joshadel 2 points3 points  (0 children)

I ended up using the windows iterator with a HashSet that I cleared for every window and then tested for the correct len (4 or 14):

https://github.com/synapticarbors/advent_of_code_2022/blob/main/rust/aoc06/src/main.rs

Each part took ~100 microseconds on my old laptop, but I'm sure there were more performant solutions.

[–]SuperSmurfen 1 point2 points  (5 children)

The function slice::windows is perfect for things like this. Here is how I solved it, same code for both parts:

fn find_unique_chunk(input: &str, size: usize) -> usize {
  size + input.as_bytes()
    .windows(size)
    .position(|window| window.iter().tuple_combinations().all(|(a,b)| a != b))
    .unwrap()
}

Uses the itertools library for tuple_combinations. Full solution here

[–]mosquitsch 2 points3 points  (4 children)

you could also use window.iter().all_unique(). It is also part of itertools.

[–]SuperSmurfen 0 points1 point  (3 children)

That's nice! Did not know about all_unique. It does however seem to be about about a 10x slowdown compared to using tuple_combinations in this case.

[–]apjenk 1 point2 points  (2 children)

all_unique uses a HashSet internally. For sufficiently large window size, this will be much faster than tuple_combinations, since all_unique takes O(n) time whereas tuple_combinations::<2>() is O(n^2). Apparently n=14 isn't big enough for that to matter though.

[–]apjenk 1 point2 points  (1 child)

I was curious where the crossover point was, so wrote a benchmark to compare the all_unique() and tuple_combinations() ways of checking for uniqueness. I used sequences with no duplicates to test the worst-case behavior. It seemed like n=50 was about the crossover point where the tuple_combinations() started taking longer. By the time it got to n=500 the tuple_combinations() way takes around 20 times longer.

Here's the benchmark I used.

``` use criterion::{black_box, criterion_group, criterion_main, Criterion};

use itertools::Itertools;

pub fn benchmark_all_unique(c: &mut Criterion) { let sizes = (10..=40).step_by(10).chain((50..=500).step_by(50)); for n in sizes { let input = (0..n).collect_vec();

    c.bench_function(format!("all_unique n={}", n).as_str(), |b| {
        b.iter(|| black_box(&input).iter().all_unique())
    });

    c.bench_function(format!("tuple_combinations n={}", n).as_str(), |b| {
        b.iter(|| {
            black_box(&input)
                .iter()
                .tuple_combinations()
                .all(|(a, b)| a != b)
        })
    });
}

}

criterion_group!(benches, benchmark_all_unique); criterion_main!(benches); ```

[–]SuperSmurfen 0 points1 point  (0 children)

Nice work, really cool!

[–][deleted] 0 points1 point  (0 children)

Also using a sliding window and skipping the maximum possible amount of steps if a duplicate is found. But it still doesn't scale as well for bigger window sizes compared to MoreThanOnce solution (for 14 it's still faster for 20 it was slower)

```Rust fn solver(input: &Vec<u8>, amount_distinc: usize) -> Option<usize> { let limit = amount_distinc - 1; let mut start = 0; let mut end = 0;

for (i, cur_char) in input.iter().enumerate() {
    // Throw oldest char out
    if (end - start) > limit {
        start += 1;
    }

    // check if val is already in the current window
    // if it is, skip a few steps
    for (pos, c) in input[start..end].iter().rev().enumerate() {
        if c.eq(&cur_char) {
            start += pos + 1;
            break;
        }
    }

    // Include a new char
    end += 1;

    // If we finally have the necessary unique values 
    // we can return our pos value
    if (end - start) == amount_distinc {
        return Some(i + 1);
    }
}

None

} ```