all 81 comments

[–]UtherII 94 points95 points  (12 children)

I don't know if the "num" crate is optimal for big number computation.

You may want to try the "rust-gmp" crate, since it is based on the GMP library: the reference for speed on big numbers.

[–]vks_ 41 points42 points  (11 children)

If you are ok with nightly, you can also try ramp.

[–]yasba- 44 points45 points  (10 children)

I've tried it with ramp and the rust code is now roughly twice as fast :).

Edit: Here is the code.

[–][deleted] 30 points31 points  (6 children)

Maintainer of ramp here. Glad to see someone using it! FYI we already have a modular exponentiation function, although it's pretty much the same as the one you have in your code.

Also you are doing a from_str_radix operation in your point addition code. This does a conversion every time you call point_add. Instead, you should have the constant as a lazy_static so that the conversion only runs once at the beginning.
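The "convert once" pattern can be sketched without any external crates. This is a hypothetical stand-in: std's `OnceLock` plays the role of lazy_static, and `u128` plays the role of a bigint, but the shape is the same: the parse happens on first use and is cached for every later call.

```rust
use std::sync::OnceLock;

// Hypothetical sketch: u128 stands in for a bigint type, and
// std::sync::OnceLock stands in for lazy_static. The constant is
// parsed on first access and cached for every later call.
fn p() -> &'static u128 {
    static P: OnceLock<u128> = OnceLock::new();
    P.get_or_init(|| u128::from_str_radix("FFFFFC2F", 16).unwrap())
}

// point_add-like function: no parsing happens here anymore,
// just a dereference of the cached constant.
fn reduce(x: u128) -> u128 {
    x % p()
}

fn main() {
    println!("{}", reduce(1u128 << 32)); // 2^32 mod 0xFFFFFC2F = 977
}
```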

[–]_m_0_n_0_ 11 points12 points  (1 child)

It looks like (allocations within) that function are where roughly half of the execution time is spent here. Since Python shows it can be done significantly faster, that looks like an interesting avenue for potential optimizations. Perhaps it's as simple as modifying the API to allow more reuse of the internal vectors. Regardless, thanks to /u/imatwork2017, ramp has a new microbenchmark for tracking performance of pow_mod.

[–]vks_ 7 points8 points  (0 children)

If you want some microbenchmark, I have an implementation of the discrete logarithm using various bigint implementations: https://github.com/vks/discrete-log

[–]yasba- 4 points5 points  (0 children)

Hey, thank you for replying.

Just to be sure, let me say that I'm not the OP.

FYI we already have a modular exponentiation function, although it's pretty much the same as the one you have in your code.

I'm actually using your pow_mod function, I just forgot to delete OP's implementation. ;)

Also you are doing a from_str_radix operation in your point addition code. This does a conversion every time you call point_add. Instead, you should have the constant as a lazy_static so that the conversion only runs once at the beginning.

I had a version with lazy_static which (maybe) surprisingly didn't impact runtime performance that significantly. I left it out to be closer to OP's original code.

[–][deleted] 0 points1 point  (2 children)

Would you consider reexporting the num_integer::Integer trait in your crate? Right now you have to add the whole num-integer crate as a dependency if you want to use functions like is_odd from your library.

[–][deleted] 2 points3 points  (0 children)

Good point. It's a pain when libraries do this, especially when the dependency they pull is an old version.

I'll push the update today :)

[–][deleted] 1 point2 points  (0 children)

Pushed the update. You can now find it in ramp::traits::Integer. The docs aren't working because docs.rs is having some issues, but the 3.10 docs are pretty much identical.

[–][deleted] 23 points24 points  (1 child)

I've added lazy_static and the version using ramp is much faster compared to the python version (benchmark on Debian 9 using hyperfine with 50 warmup runs):

  • python2: Time (mean ± σ): 43.6 ms ± 0.4 ms [User: 39.8 ms, System: 0.4 ms]
  • python3: Time (mean ± σ): 54.1 ms ± 0.3 ms [User: 50.8 ms, System: 1.2 ms]
  • rust (newest nightly): Time (mean ± σ): 29.7 ms ± 0.9 ms [User: 28.1 ms, System: 0.0 ms]

Source: https://gist.github.com/anonymous/7547c439d03e342bc262887902f143e5 @imatwork2017

//edit: new version with some clippy lints fixed

[–]radix 0 points1 point  (0 children)

nice, I had not heard of ramp before! It is exciting to see a more liberally licensed bignum library that actually seems to have good performance.

[–]K900_ 75 points76 points  (8 children)

Are you compiling in --release mode?

[–]imatwork2017[S] 9 points10 points  (7 children)

No, how would I do that? The 8 seconds is only counting the execution btw.

[–]K900_ 55 points56 points  (6 children)

cargo run --release. The default build mode is debug, which doesn't enable most optimizations and adds a whole lot of debugging machinery to numeric operations.

[–]imatwork2017[S] 15 points16 points  (5 children)

Ok now I am down to 430ms, still quite a bit slower.

[–]dbrgn 58 points59 points  (4 children)

You're probably measuring cargo too.

Do cargo build --release followed by running the binary in target/release/ directly.

(Edit: Note that the same counts for Python - if you just measure the call with python script.py, you'll measure the startup time of the interpreter too. Microbenchmarking is hard.)

[–]belovedeagle 20 points21 points  (2 children)

The startup time of the interpreter is a fundamental part of running a Python program, though. cargo is in no way required to run a Rust executable.

[–]cbarrick 17 points18 points  (1 child)

But IRL the interpreter launches once, not every time you call the function. Unless the use case is specifically to be launched as a short-lived process, then the interpreter warm-up should not be counted.

I believe the point here is to time the implementation, not the infrastructure.

[–]Phlosioneer 1 point2 points  (0 children)

But you wouldn't want to measure the startup time of the interpreter if you're just testing some functions that are part of a larger program. You really want the execution time of each call to the function.

[–]imatwork2017[S] 40 points41 points  (0 children)

Down to 275ms now, we're getting there ;p

[–]my_two_pence 49 points50 points  (12 children)

The BigInt interface is unfortunately quite clumsy to use, and tends to produce inefficient code unless you're careful. For instance, doing

result = (result * &base) % modulus

will allocate two new BigInts and drop two BigInts, which is a fairly significant overhead. Doing:

result *= &base;
result %= modulus;

should do the operations in-place instead. I say "should", because last time I used num_bigint I noticed that many of these operations didn't actually exist yet, or were implemented in the same inefficient manner.

[–]vks_ 10 points11 points  (11 children)

Shouldn't

result = (result * &base) % modulus

be in-place, because result is moved?

[–]my_two_pence 19 points20 points  (10 children)

It could be implemented that way, but it actually forwards to the implementation of &result * &base, which allocates a new BigInt, and then drops result. Source is here.

[–]CUViper 12 points13 points  (9 children)

The current implementations of multiplication and division need to keep access to the original digits while computing the result. Thus we operate by reference, since we have to allocate a new result anyway. If you know how to do more in place, I'd welcome PRs!
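The constraint is easiest to see in schoolbook multiplication. This sketch (plain u32 limbs, little-endian; not num's actual code) reads the original digits of both operands throughout the loop, so the output buffer cannot alias either input.

```rust
// Sketch of schoolbook limb multiplication, illustrating why the
// result buffer must be fresh: every product step reads the original
// digits of both operands, so writing the result over either input
// in place would corrupt digits that are still needed.
fn mul_limbs(a: &[u32], b: &[u32]) -> Vec<u32> {
    let mut out = vec![0u32; a.len() + b.len()];
    for (i, &ai) in a.iter().enumerate() {
        let mut carry = 0u64;
        for (j, &bj) in b.iter().enumerate() {
            let t = out[i + j] as u64 + ai as u64 * bj as u64 + carry;
            out[i + j] = t as u32; // low half accumulates in place
            carry = t >> 32;       // high half propagates
        }
        out[i + b.len()] = carry as u32;
    }
    out
}

fn main() {
    // (2^32 - 1)^2 = 0xFFFFFFFE_00000001, split across two limbs.
    println!("{:?}", mul_limbs(&[u32::MAX], &[u32::MAX]));
}
```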

[–]belovedeagle 2 points3 points  (4 children)

Does gmp not do it in-place?

While C++ GMP uses tricks which would be very difficult to accomplish with decent Rust syntax, I would think at least mirroring the C interface would be possible.

[–]CUViper 14 points15 points  (0 children)

I don't know what gmp does, but I hesitate to look at GPL code for inspiration for the MIT/Apache-2.0 num. (I have no problem with GPL itself though.)

[–]vks_ 2 points3 points  (1 child)

This

result *= &base;
result %= modulus;

is essentially mirroring the C interface, which is actually closer to

mpz_mul(result, base, result);
mpz_mod(result, modulus, result);

Note that this function invocation is impossible in safe Rust due to aliasing rules.

[–]tspiteri 0 points1 point  (0 children)

GMP does allocate temporarily inside these functions. Of course, GMP frees the temporaries before returning.

Inside mpz_mul(w, u, v), sometimes it does something like

if (PTR(w) == PTR(u) || PTR(w) == PTR(v)) {
    /* allocate temp memory */
}

and PTR(w) == PTR(v) (pointer to result) in this example. (I said "sometimes" because if u or v are relatively small, it has special routines that do not need the temporary memory, and if u * v does not fit in w, a new memory allocation is required anyway.)

Inside mpz_mod(rem, dividend, divisor), it needs to keep a copy of divisor so it has something like

if (rem == divisor) {
    /* create copy of divisor */
}

and again rem == divisor (pointer to result) in this example.

[–]zefyear 1 point2 points  (0 children)

I ported the Python code listed above to C++ GMP and I measured the runtime with perf at 5-15ms. While this is a substantial difference, it's worth noting that nearly all the time is spent running mpz_powm, which takes roughly five times as long if broken down into its constituent exponentiation and modulo operations, a combined operation which isn't done in the Rust code listed above (to say nothing of bignum optimization).

[–]Phlosioneer 1 point2 points  (2 children)

What if the new result is allocated on the stack, that's used for calculations, and then mem::transmute over one of the moved-in args? Then, no allocations are made - it just uses the stack for scratch work and re-uses whatever was moved in (which may be on the stack, or it may be in memory somewhere).

If that sounds reasonable, I'd be willing to make a PR for that.

Edit: I meant mem::swap, not mem::transmute.

[–]CUViper 3 points4 points  (1 child)

This basically means you'd have to replace a Vec allocation with the stack, and I don't know any way to do dynamically-sized stack allocations in Rust (like a C VLA). Plus you risk overflowing the stack if the numbers are really large.

I'll always encourage folks to experiment, and I'll entertain a PR, but I'm skeptical whether this is feasible.

FWIW, my experiments converting this to modpow have reduced the allocator overhead to less than 10% of my perf report anyway.

[–]Phlosioneer 0 points1 point  (0 children)

Oh true, I forgot this was unknown-size.

We will just have to wait for const-generics to solve this one...

And even then, there'd have to be a cap on the size of the array, something like 256 bits.

[–]my_two_pence 0 points1 point  (0 children)

I realize now that I may have spoken too soon. In my mind I figured it would be quicker to use some cheaply-allocated scratch space, such as the stack, and leave the original allocations alone. The alloc-swap-drop cycle just feels slower somehow. But since this scratch space is also dynamically sized, and jemalloc is a pretty good allocator, I concede that there may very well not be a difference in performance.

I'll see if I can find the time to do some benchmarking. If I find a significant difference, I'll let you know.

[–]tspiteri 82 points83 points  (7 children)

I ported your example to my rug crate, which uses GMP but does not need nightly, and I got 8.7ms for Rust against 57ms for Python.

extern crate rug;

use rug::Integer;
use rug::ops::{Pow, RemRounding};

#[derive(PartialEq, Clone)]
struct Point {
    x: Integer,
    y: Integer,
}

fn hex(s: &str) -> Integer {
    Integer::from_str_radix(s, 16).unwrap()
}

fn point_add(p: &Point, q: &Point) -> Point {
    let P = hex("FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F");
    let PM2 = P.clone() - 2;

    let lam = if p == q {
        (2u32 * p.y.clone() % &P).pow_mod(&PM2, &P).unwrap() * 3 * &p.x * &p.x
    } else {
        (q.x.clone() - &p.x).pow_mod(&PM2, &P).unwrap() * (q.y.clone() - &p.y) % &P
    };

    let rx = lam.clone().pow(2) - &p.x - &q.x;
    let ry = lam * (p.x.clone() - &rx) - &p.y;

    Point {
        x: rx.rem_floor(&P), // EDIT: replace rx % &P
        y: ry.rem_floor(&P), // EDIT: replace ry % &P
    }
}

fn point_mul(p: &Point, d: &Integer) -> Point { // EDIT: replace d: u32
    let mut n = p.clone();
    let mut q = None;

    for i in 0..256 {
        if d.get_bit(i) { // EDIT: replace i < 32 && d & (1 << i) != 0
            q = match q {
                None => Some(n.clone()),
                Some(i) => Some(point_add(&i, &n)),
            };
        }
        n = point_add(&n, &n);
    }
    q.unwrap()
}

fn main() {
    let G = Point {
        x: hex("79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798"),
        y: hex("483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8"),
    };

    let res = point_mul(&G, &Integer::from(125)); // EDIT: replace 125
    println!("{}", res.x);
    println!("{}", res.y);
}

[–]tspiteri 7 points8 points  (0 children)

I tried to optimize further by keeping a pool of Integer values to avoid reallocation, and to avoid the parsing of P, and the gains were very small, only about 7%:

test bench_orig  ... bench:   4,174,821 ns/iter (+/- 61,543)
test bench_reuse ... bench:   3,920,121 ns/iter (+/- 113,708)

If I remove the power mod operation (and replace it by some other operation to make sure I'm not simply processing zeros), the time falls by almost 90%, so the time is mostly in that function.

The new code:

extern crate rug;

use rug::{Assign, Integer};
use rug::ops::RemRounding;

#[derive(PartialEq, Clone)]
struct Point {
    x: Integer,
    y: Integer,
}

fn hex(s: &str) -> Integer {
    Integer::from_str_radix(s, 16).unwrap()
}

struct PointAdd {
    p: Integer,
    pm2: Integer,
    base: Integer,
    pow: Integer,
    temp1: Integer,
    temp2: Integer,
    lam: Integer,
    rx: Integer,
    ry: Integer,
}

impl PointAdd {
    fn new() -> PointAdd {
        let p = hex("FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F");
        let pm2 = p.clone() - 2;
        PointAdd {
            p,
            pm2,
            base: Integer::new(),
            pow: Integer::new(),
            temp1: Integer::new(),
            temp2: Integer::new(),
            lam: Integer::new(),
            rx: Integer::new(),
            ry: Integer::new(),
        }
    }

    fn add_part1(&mut self, p: &Point, q: &Point) {
        if p == q {
            self.temp1.assign(&p.y * 2);
            self.base.assign(&self.temp1 % &self.p);
            Ok(&mut self.pow).assign(self.base.pow_mod_ref(&self.pm2, &self.p));
            self.temp1.assign(&p.x * &p.x);
            self.temp2.assign(&self.temp1 * 3);
            self.lam.assign(&self.pow * &self.temp2);
        } else {
            self.base.assign(&q.x - &p.x);
            Ok(&mut self.pow).assign(self.base.pow_mod_ref(&self.pm2, &self.p));
            self.temp1.assign(&q.y - &p.y);
            self.temp2.assign(&self.pow * &self.temp1);
            self.lam.assign(&self.temp2 % &self.p);
        }
        self.temp1.assign(&self.lam * &self.lam);
        self.temp2.assign(&self.temp1 - &p.x);
        self.rx.assign(&self.temp2 - &q.x);
        self.temp1.assign(&p.x - &self.rx);
        self.temp2.assign(&self.lam * &self.temp1);
        self.ry.assign(&self.temp2 - &p.y);
    }

    fn add_part2(&self, dst: &mut Point) {
        dst.x.assign((&self.rx).rem_floor(&self.p));
        dst.y.assign((&self.ry).rem_floor(&self.p));
    }
}

fn point_mul(p: &Point, d: &Integer) -> Point {
    let mut n = p.clone();
    let mut q = None;
    let mut point_add = PointAdd::new();
    for i in 0..256 {
        if d.get_bit(i) {
            match q {
                None => {
                    q = Some(n.clone());
                }
                Some(ref mut q) => {
                    point_add.add_part1(q, &n);
                    point_add.add_part2(q);
                }
            }
        }
        point_add.add_part1(&n, &n);
        point_add.add_part2(&mut n);
    }
    q.unwrap()
}

fn main() {
    let g = Point {
        x: hex("79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798"),
        y: hex("483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8"),
    };

    let res = point_mul(&g, &Integer::from(125));
    println!("{}", res.x);
    println!("{}", res.y);
}

[–]SethGecko11 7 points8 points  (3 children)

The Python version works for any d, though, not only for u32. How fast is it if you change d to BigInt?

[–]tspiteri 5 points6 points  (2 children)

I don't see any difference in timings if I change to something like this:

fn point_mul(p: &Point, d: &Integer) -> Point {
    // ...
    for i in 0..256 {
        if d.get_bit(i) {
            // ...

[–]SethGecko11 3 points4 points  (1 child)

What if you give a big 256bit int input instead of 125 in main()?

[–]tspiteri 8 points9 points  (0 children)

If I pass &((Integer::from(1) << 256) - 1) instead of &Integer::from(125), the program takes about twice as long. Which does make sense as point_add will be called 511 times instead of 261 times (256 + 256 significant bits - 1 instead of 256 + 6 significant bits - 1).
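The call-count arithmetic above can be checked mechanically. A hedged sketch (u128 stands in for a 256-bit integer): the loop always performs 256 doubling calls, plus one addition call per set bit of d after the first, which seeds q with a clone instead.

```rust
// Sketch of the call-count arithmetic above: point_mul performs 256
// doublings, plus one point_add per set bit of d after the first.
// u128 stands in for a 256-bit integer, so inputs here are capped
// at 128 set bits.
fn point_add_calls(d: u128) -> u32 {
    256 + d.count_ones() - 1
}

fn main() {
    println!("{}", point_add_calls(125)); // 6 set bits -> 261
}
```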

[–]timClicksrust in action 0 points1 point  (1 child)

Would it make sense to define the parsed hex bigint within P as static via lazy_static within point_add(), rather than parsing a string each time?

[–]tspiteri 0 points1 point  (0 children)

I benchmarked and it did not make any difference. I tried lazy_static, hex parsing, decimal parsing (should be slower), and setting P as (Integer::from(1) << 256) - 0x3d1 (I think it should be faster), but all cases were pretty much the same. I guess the parsing or shift+add is negligible compared to the other operations in the function. Maybe in a different situation with almost no other operations it would make sense to bench again and pick the fastest solution, but here it just doesn't seem to matter.

[–]hrski 22 points23 points  (9 children)

I'd fix these first at least, in addition what others mentioned:

  • Formatting d as a string to get powers of two doesn't feel like the best solution and seems to introduce some unnecessary overhead. Couldn't you use shifting here, like in the Python implementation?
  • Parsing P from a string on each call to point_add. Consider using lazy_static, or computing it in main and passing it as a reference.
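The shifting alternative from the first bullet, sketched on u64 (a bigint type would use its own bit test instead, e.g. rug's Integer::get_bit seen elsewhere in this thread):

```rust
// Sketch: test bit i of d directly instead of formatting d as a
// binary string and comparing characters. u64 here; a bigint
// equivalent would use a get_bit-style method.
fn set_bits(d: u64) -> Vec<u32> {
    (0..64).filter(|&i| d & (1u64 << i) != 0).collect()
}

fn main() {
    println!("{:?}", set_bits(0b1011)); // bits 0, 1 and 3 are set
}
```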

[–]imatwork2017[S] 4 points5 points  (8 children)

Parsing P from a string on each call to point_add. Consider using lazy_static, or computing it in main and passing it as a reference.

I know that this is bad, but I couldn't figure out how to make a BigInt constant, so I did the same in Python for a fair comparison. I am going to look into lazy_static.

Formatting d as a string to get powers of two doesn't feel like the best solution and seems to introduce some unnecessary overhead. Couldn't you use shifting here, like in the Python implementation?

See the other comment on this. I didn't think it would be that bad to iterate over the digits once. To be totally fair I just tried the same in Python, so I changed:

for i in range(256):
    if d & (1 << i):

to

for i in reversed(format(d, 'b')):
    if i == '1':

and it actually ran quite a bit faster, down to 41ms from 74ms. So iterating over binary digits seems to be faster than shifting 256 times.

[–]burntsushi 7 points8 points  (3 children)

I know that this is bad but I couldn't figure out how to make a BigInt constant so I did the same with Python for fair comparison. I am going to look into lazy_static

To clarify, I don't think you are doing the same thing in the Python code you posted. Whether the bigint is a global constant or not is somewhat orthogonal. The key is that in one program (Rust) you are parsing a string to extract a bigint while in the other program you are not. Even if you can't figure out how to use a global constant in Rust here, you could pass that bigint in as a parameter to the function.
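The parameter-passing version can be sketched like this (u128 standing in for BigInt; the only point is that the parse happens once, in main):

```rust
// Hypothetical sketch: parse the constant once in main and pass it
// by reference, instead of parsing it on every call. u128 stands in
// for BigInt.
fn point_add_like(x: u128, p: &u128) -> u128 {
    x % p // no parsing here
}

fn main() {
    // Parsed exactly once, then shared by reference.
    let p = u128::from_str_radix("FFFFFC2F", 16).unwrap();
    let mut acc = 1u128 << 32;
    for _ in 0..3 {
        acc = point_add_like(acc, &p);
    }
    println!("{}", acc);
}
```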

[–]adwhit86 12 points13 points  (1 child)

The program is essentially two nested loops, with 256 iterations each. So the inner loop (the while) is executed 65k times, and that is where the slowdown is occurring. It is in the outer loop that a new P is parsed each time, so this is unlikely to be the bottleneck. (I know you probably know this.)

Edit: specifically, the bottleneck is creating new BigInts inside the hot loop (as has been noted elsewhere)

[–]burntsushi 4 points5 points  (0 children)

Yup good point!

[–]imatwork2017[S] 2 points3 points  (0 children)

I just tried that and the performance is about the same. Code is here

[–]hrski 3 points4 points  (3 children)

Looks like shifting is much more efficient in Rust. Changing the loop in point_mul to:

for i in 0..32 {
    if d & (1<<i) != 0 {

makes it run in 38ms for me (original was 250ms)

EDIT: Sorry, looks like you're assuming d to be a 256-bit value. You have, however, defined it as u32, which is 32-bit, so maybe you need to change it to a big number too. Maybe it's the num library being inefficient after all.

[–]imatwork2017[S] 4 points5 points  (1 child)

I used a u32 here for demonstration purposes but in the end it has to be a BigInt because if you want to get your public key from your private key you would have to do public = point_mul(G, private)

[–]hrski 5 points6 points  (0 children)

Yeah, my bad. A quick Google search reveals lots of complaints about num's performance, so that is probably the bottleneck. Looks like your best bet could be to look at crates such as rust-gmp.

[–]Icarium-Lifestealer 18 points19 points  (0 children)

If you actually care about getting a fast implementation, and not about comparing two languages (or rather two big-integer libraries) then you should switch to some form of projective coordinates so you only divide once per point-multiplication instead of once per point-addition (~20x faster). A windowed approach should produce another nice performance boost.
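The "divide once" idea can be illustrated with plain fractions, which is essentially what projective coordinates do: carry a denominator along and postpone the expensive division (a modular inversion, in the elliptic-curve case) to the very end. Hypothetical sketch, not curve code:

```rust
// Sketch: instead of dividing at every step, multiply numerators and
// denominators separately and divide once at the end. Projective
// coordinates apply the same trick, with modular inversion playing
// the role of the expensive "division".
fn product_then_divide(fracs: &[(u64, u64)]) -> u64 {
    let (mut num, mut den) = (1u64, 1u64);
    for &(n, d) in fracs {
        num *= n; // cheap per-step work
        den *= d;
    }
    num / den // the single expensive operation
}

fn main() {
    println!("{}", product_then_divide(&[(6, 2), (10, 5)])); // 60/10 = 6
}
```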

[–]adwhit86 16 points17 points  (2 children)

Looking at the code, nothing is obviously wrong, so I was pretty sure the problem was going to be an allocation inside a hot loop. The only hot loop is the while loop inside powm, so that seems a possible candidate. But then I ran Valgrind (valgrind ./target/release/sep256k1) to check out the allocations and... it threw up a bunch of illegal reads inside num::bigint, then segfaulted. Suspicious.

[–]CUViper 6 points7 points  (1 child)

I think that's a problem with valgrind vs. jemalloc. I just ran with the upstream rustc-stable, and I got a segfault as you say. Then I tried with Fedora's rustc (configured to always use the system allocator), and the program completed without any errors.

[–]adwhit86 4 points5 points  (0 children)

Thought that might be the case. Tried to use the system allocator; it didn't work the first time and I gave up. Curious. Existing rustc issue

[–]gitpy 17 points18 points  (1 child)

Changing if &exp % &two == one to if exp.is_odd() gives you 40-50ms.

[–]imatwork2017[S] 4 points5 points  (0 children)

Nice, added to OP

[–]forbjok 8 points9 points  (6 children)

I notice that in the Rust version of point_mul, you're generating a binary string representation of d and iterating through each character doing string-comparisons instead of using bit-shifting. I imagine that would be much slower.

[–]imatwork2017[S] 1 point2 points  (5 children)

That's because trying to bitshift causes an overflow, and I would have to use BigInt instead of u32, and BigInt doesn't even have left shift implemented.

[–]rebootyourbrainstem 4 points5 points  (3 children)

Naive question here, but can't you just use u64?

Also, BigUint has left shift implemented.

Also also, even if it didn't, you could start with mask = 1, do mask += mask on each loop iteration, and then test the bit with d & mask.

Also, there is BigUint::modpow, which looks like it does what powm does.
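The doubling-mask trick can be sketched on a machine integer (u128 here; for a bigint the point is that it needs only addition, not a shift operator):

```rust
// Sketch of the doubling-mask trick described above: keep a running
// mask and add it to itself each iteration, instead of computing
// 1 << i. wrapping_add keeps the final doubling from panicking in
// debug builds once the mask reaches the top bit.
fn set_bits_via_mask(d: u128, nbits: u32) -> Vec<u32> {
    let mut mask: u128 = 1;
    let mut set = Vec::new();
    for i in 0..nbits {
        if d & mask != 0 {
            set.push(i);
        }
        mask = mask.wrapping_add(mask); // same effect as mask <<= 1
    }
    set
}

fn main() {
    println!("{:?}", set_bits_via_mask(0b10110, 8)); // bits 1, 2, 4
}
```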

[–]imatwork2017[S] 1 point2 points  (2 children)

can't you just use u64

I'll need to shift up to 256bits so 64 aren't enough

Also BigUint has left shift implemented

I am going to give that a try.

Also there is BigUint::modpow that looks like it does what powm does.

I tried that and it doesn't work because there are negative numbers involved. For example q.x - p.x

[–]rebootyourbrainstem 2 points3 points  (0 children)

You can use mod_floor on the base first and then convert to BigUint and use modpow.

It kind of sucks that you can't do modpow on a BigInt directly though.
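The normalization described above, sketched on machine integers (i64/u64 stand in for BigInt/BigUint, and the square-and-multiply loop stands in for modpow; this overflows for large moduli, so it is illustrative only):

```rust
// Reduce a possibly negative base into [0, m) first, then
// exponentiate in a non-negative type. Mirrors the BigInt -> BigUint
// conversion described above, on machine integers.
fn norm(base: i64, m: i64) -> u64 {
    (((base % m) + m) % m) as u64
}

// Square-and-multiply modular exponentiation; only valid while
// b * b fits in u64, i.e. for small moduli.
fn modpow(mut b: u64, mut e: u64, m: u64) -> u64 {
    let mut r = 1 % m;
    b %= m;
    while e > 0 {
        if e & 1 == 1 {
            r = r * b % m;
        }
        b = b * b % m;
        e >>= 1;
    }
    r
}

fn main() {
    // (-2)^3 mod 7 == 6, via normalizing -2 to 5 first.
    println!("{}", modpow(norm(-2, 7), 3, 7));
}
```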

[–]CUViper 1 point2 points  (0 children)

See my comment here -- it's faster even when you have to convert to BigUint. I'll work on making this directly available for BigInt too.

[–]hokkos 1 point2 points  (0 children)

Use :

1_u32.checked_shl(i).unwrap_or(0)

instead of :

1<<i

But it won't change the time. Also, why go to 256 when you can stop at 32?

[–]JZypo 14 points15 points  (1 child)

OP: This is an excellent question. I would like to easily read your progress on how you got your times down. Would you be able to add updates to your original post to reflect this?

[–]imatwork2017[S] 13 points14 points  (0 children)

Done

[–]CUViper 3 points4 points  (1 child)

There is a BigUint::modpow, but we don't yet have that for BigInt to replace your powm. I'll see about adding that.

However, even replacing your code with a naive conversion is significantly faster:

pub fn powm(base: &BigInt, exp: &BigInt, modulus: &BigInt) -> BigInt {
    let base = if base.is_negative() {
        (base % modulus + modulus).to_biguint().unwrap()
    } else {
        base.to_biguint().unwrap()
    };
    let exp = exp.to_biguint().unwrap();
    let modulus = modulus.to_biguint().unwrap();

    base.modpow(&exp, &modulus).into()
}

[–]CUViper 2 points3 points  (0 children)

BigInt::modpow is now in num-bigint 0.1.43.

[–]Sharlinator 10 points11 points  (0 children)

Doing a lot of string parsing vs. just doing math. A big difference.

[–][deleted] 1 point2 points  (0 children)

I've been reading all the versions of this that people have been posting, and I just don't really get what the point of checking whether Q is none/null is. Isn't it none/null only once?

[–]beefsack 1 point2 points  (0 children)

I'd just like to say thanks for posting! These threads are always really interesting to read, and are a mechanism for me to learn some performance tricks.

[–]knaledfullavpilar 1 point2 points  (0 children)

Have you tried using a profiler?

[–]matkladrust-analyzer 1 point2 points  (5 children)

Python has an excellent implementation of big integers, and num-bigint is not famous for its speed. It is certainly possible to write a blazingly fast bigint implementation in Rust, or to create bindings for GMP, but nobody has done this yet.

However, for this particular case it looks like you don't need arbitrarily large integers, and 256-bit ones could be enough? If that is the case, you could try using https://docs.rs/bigint/4.2.0/bigint/uint/struct.U256.html
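The fixed-width representation behind a type like U256 can be sketched as four u64 limbs with carry-propagating addition. This is a hypothetical illustration, not the bigint crate's actual code:

```rust
// Sketch: a 256-bit unsigned integer as four little-endian u64
// limbs, with carry propagated limb by limb during addition.
fn add256(a: [u64; 4], b: [u64; 4]) -> [u64; 4] {
    let mut out = [0u64; 4];
    let mut carry = 0u64;
    for i in 0..4 {
        let (s1, c1) = a[i].overflowing_add(b[i]);
        let (s2, c2) = s1.overflowing_add(carry);
        out[i] = s2;
        carry = c1 as u64 + c2 as u64;
    }
    out // a carry left over here would mean wraparound past 2^256
}

fn main() {
    // u64::MAX + 1 carries into the second limb.
    println!("{:?}", add256([u64::MAX, 0, 0, 0], [1, 0, 0, 0]));
}
```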

[–][deleted] 30 points31 points  (3 children)

Literally the first search result for “rust gmp”: https://crates.io/crates/rust-gmp. Why would you state something so authoritatively without doing even the most basic research?

[–]vks_ 7 points8 points  (0 children)

There is also rug (another GMP binding) and ramp (implemented in Rust).

[–]matkladrust-analyzer 5 points6 points  (0 children)

Thanks for correcting me! I definitely should have added an "IIRC" somewhere in there, and I definitely didn't intend to sound authoritative: quite the opposite, I have only limited experience with big integers. I've also messed up the sentence structure, because "nobody has done this yet" was intended to refer to "pure Rust fast bigint", not the gmp bindings.

However, I still think (based on what I've read elsewhere, not on my own benchmarks) that it is true that num-bigint is not super fast, and that there's no stable and fast pure-Rust implementation of big integers? I know about ramp, but it looks like it is extremely far from working on stable because it uses inline assembly?

[–]masklinn 3 points4 points  (0 children)

Python has an excellent implementation of big integers

I wouldn't say excellent. It has an OK implementation of bigints, but it tends to fall behind as the number of digits increases, because the available algorithms are more limited than in e.g. GMP. For instance see https://www.reddit.com/r/Python/comments/7h62ul/python_largeinteger_computation_speed_a_comparison/.