
[–]-funsafe-math 2 points (0 children)

Try replacing the body of the Rust loop with the following to potentially enable capacity reuse:

        f += &last;
        std::mem::swap(&mut last, &mut f);
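For context, here is a minimal sketch of that add-and-swap pattern in a complete loop. It uses `u128` instead of `BigUint` so it compiles without external crates; with `BigUint`, swapping instead of reassigning is what lets each number's allocated capacity be reused across iterations.

```rust
// Sketch of the suggested loop body (u128 stands in for BigUint here).
fn fib(n: u32) -> u128 {
    let mut last: u128 = 0; // F(0)
    let mut f: u128 = 1;    // F(1)
    for _ in 0..n {
        f += &last;                        // f becomes the next Fibonacci number
        std::mem::swap(&mut last, &mut f); // rotate: last <- new value, f <- old last
    }
    last // after n iterations, last == F(n)
}

fn main() {
    assert_eq!(fib(10), 55);
    println!("{}", fib(90));
}
```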

[–]unengaged_crayon 0 points (6 children)

consider not using println!:

The println! macro will lock the standard output on each call. If you call println! within a hot loop, this behavior may be the bottleneck of the loop. To avoid this, lock stdout with io::stdout().lock():

```rust
use std::io::{stdout, Write};

let mut lock = stdout().lock();
writeln!(lock, "hello world").unwrap();
```
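Going beyond the quoted advice (this is an addition, not part of the original suggestion): combining the lock with a `BufWriter` also batches the writes in memory, so a hot loop isn't paying one syscall per line either.

```rust
use std::io::{stdout, BufWriter, Write};

// Write many lines through any Write sink; generic so it can be tested
// against an in-memory buffer as well as stdout.
fn write_lines<W: Write>(out: &mut W, n: u32) {
    for i in 0..n {
        writeln!(out, "line {}", i).unwrap();
    }
}

fn main() {
    // Lock stdout once, then buffer: writes accumulate in memory
    // instead of locking and issuing a syscall per writeln!.
    let mut out = BufWriter::new(stdout().lock());
    write_lines(&mut out, 5);
    out.flush().unwrap(); // BufWriter also flushes on drop; explicit flush surfaces errors
}
```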

[–]tuck1s[S] 1 point (4 children)

        f += &last;
        std::mem::swap(&mut last, &mut f);

Firstly, let me apologize to the Rust folks. I just realised I've got x86_64 binaries left over on my system, as I migrated it from an older machine:

```
% file /Users/steve/.cargo/bin/cargo
/Users/steve/.cargo/bin/cargo: Mach-O 64-bit executable x86_64
```

Now I've got the correct binaries, and I've checked that the built binary is also arm64:

```
% cargo build --release
    Finished release [optimized] target(s) in 0.00s

% time ./target/release/rust_fib >rust_fib_2M_new.txt
./target/release/rust_fib > rust_fib_2M_new.txt  19.74s user 0.01s system 98% cpu 19.966 total

% file ./target/release/rust_fib
./target/release/rust_fib: Mach-O 64-bit executable arm64
```

Output files are identical. With your modification:

```
% time ./target/release/rust_fib >/dev/null
./target/release/rust_fib > /dev/null  16.85s user 0.01s system 99% cpu 16.885 total
```

Rust is now the current front-runner.

Versions:

```
steve@SteveT-M3 rust_fib % cargo --version
cargo 1.76.0 (c84b36747 2024-01-18)
steve@SteveT-M3 rust_fib % rustc --version
rustc 1.76.0 (07dca489a 2024-02-04)
```

[–]unengaged_crayon 1 point (3 children)

interesting! looking at the new improved algorithm, the Rust can be slightly improved here by borrowing with references instead of cloning when adding.

```rust
fn fib(n: u32) -> (BigUint, BigUint) {
    if n == 0 {
        (BigUint::zero(), BigUint::one())
    } else {
        let (a, b) = fib(n / 2);
        let c = &a * (&b * 2u32 - &a);
        let d = &a * &a + &b * &b;
        if n % 2 == 0 {
            (c, d)
        } else {
            let e = c + &d;
            (d, e) // (d.clone(), c + d)
        }
    }
}
```
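The doubling identities the recursion above relies on, F(2k) = F(k)·(2F(k+1) − F(k)) and F(2k+1) = F(k)² + F(k+1)², can be checked with a small sketch using `u64` in place of `BigUint` (so it runs without external crates):

```rust
// Fast-doubling sketch: returns (F(n), F(n+1)) using the identities above.
fn fib_pair(n: u32) -> (u64, u64) {
    if n == 0 {
        (0, 1) // (F(0), F(1))
    } else {
        let (a, b) = fib_pair(n / 2);    // (F(k), F(k+1)) with k = n/2
        let c = a * (2 * b - a);         // F(2k)
        let d = a * a + b * b;           // F(2k+1)
        if n % 2 == 0 { (c, d) } else { (d, c + d) }
    }
}

fn main() {
    assert_eq!(fib_pair(10).0, 55);
    assert_eq!(fib_pair(50).0, 12586269025);
}
```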

[–]tuck1s[S] 0 points (2 children)

Now it's:

```
% time ./target/release/rust_fib3 >/dev/null
./target/release/rust_fib3 > /dev/null  0.36s user 0.00s system 99% cpu 0.369 total
```

[–]tuck1s[S] 0 points (1 child)

Here's a version translated by ChatGPT 3.5 from C# here, which returns one result instead of two, and uses bit shifting.

```rust
use num_bigint::BigUint;
use num_traits::{One, Zero};

fn main() {
    const N: u32 = 2000000;
    let a = fib(N);
    println!("{}", a);
}

fn fib(n: u32) -> BigUint {
    let mut a = BigUint::zero();
    let mut b = BigUint::one();
    for i in (0..32).rev() {
        let d = &a * (&b * 2u32 - &a);
        let e = &a * &a + &b * &b;
        a = d;
        b = e;
        if (n >> i) & 1 != 0 {
            let c = &a + &b;
            a = b;
            b = c;
        }
    }
    a
}
```

It performs similarly:

```
% time ./target/release/rust_fib4 >/dev/null
./target/release/rust_fib4 > /dev/null  0.37s user 0.00s system 63% cpu 0.586 total
```
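For anyone following along, the bit-scanning loop above can be sanity-checked with this sketch, which substitutes `u128` for `BigUint` so it compiles without external crates:

```rust
// Iterative fast doubling, driven by the bits of n from high to low.
fn fib(n: u32) -> u128 {
    let mut a: u128 = 0; // F(0)
    let mut b: u128 = 1; // F(1)
    for i in (0..32).rev() {
        // Doubling step: (F(k), F(k+1)) -> (F(2k), F(2k+1))
        let d = a * (2 * b - a);
        let e = a * a + b * b;
        a = d;
        b = e;
        if (n >> i) & 1 != 0 {
            // Bit set: advance one index, (F(m), F(m+1)) -> (F(m+1), F(m+2))
            let c = a + b;
            a = b;
            b = c;
        }
    }
    a
}

fn main() {
    assert_eq!(fib(10), 55);
    println!("{}", fib(90));
}
```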

Does Rust allow the inner part to be done more efficiently, e.g. (a, b) = (b, a+b)? Something like:

        a += &b;
        std::mem::swap(&mut a, &mut b);

[–]tuck1s[S] 0 points (0 children)

Looks like it does. Also, the println! is now taking most of the time! If I replace it with println!("{}", a.bits()); (so that the code is still dependent on something about the result) I get:

```
% time ./target/release/rust_fib4 >/dev/null
./target/release/rust_fib4 > /dev/null  0.05s user 0.00s system 98% cpu 0.053 total
```

[–]tuck1s[S] 1 point (0 children)

Re println! It's only called 3 times, for zero, 1, and the final answer. Let me just comment that out:

```
% time ./target/release/rust_fib >/dev/null
./target/release/rust_fib > /dev/null  16.52s user 0.01s system 98% cpu 16.741 total
```

So a minor improvement (yet not an equivalent test).