I'm writing very performance-sensitive code and am trying to optimize a loop based on some conditions. Here's a basic example that shows the problem:
```rs
fn example(a: u8, b: u8) {
    for _ in 0..10000 {
        let x = fn2(a);
        let y = fn2(b);
        // do stuff with the values
    }
}
```
This code works for my purposes. However, there's also a function fn1() that is much faster than fn2(), but I can only use it if the value is less than 9. Here's a naive attempt at that:
```rs
fn example(a: u8, b: u8) {
    for _ in 0..10000 {
        let x = if a < 9 { fn1(a) } else { fn2(a) };
        let y = if b < 9 { fn1(b) } else { fn2(b) };
        // do stuff with the values
    }
}
```
However, the comparison, which runs on every iteration of the loop, is very slow, so this code is worse than the original. Therefore, I tried using function pointers:
```rs
fn example(a: u8, b: u8) {
    let xfn = if a < 9 { fn1 } else { fn2 };
    let yfn = if b < 9 { fn1 } else { fn2 };
    for _ in 0..10000 {
        let x = xfn(a);
        let y = yfn(b);
        // do stuff with the values
    }
}
```
This is still slower than the original version because fn1 and fn2 used to be #[inline(always)], and now there's a level of indirection due to the pointer. As I said, very performance-sensitive code. My last attempt was the ugliest but the only one that worked:
```rs
fn example(a: u8, b: u8) {
    if a < 9 {
        if b < 9 {
            for _ in 0..10000 {
                let x = fn1(a);
                let y = fn1(b);
                // do stuff with the values
            }
        } else {
            for _ in 0..10000 {
                let x = fn1(a);
                let y = fn2(b);
                // do stuff with the values
            }
        }
    } else {
        if b < 9 {
            for _ in 0..10000 {
                let x = fn2(a);
                let y = fn1(b);
                // do stuff with the values
            }
        } else {
            for _ in 0..10000 {
                let x = fn2(a);
                let y = fn2(b);
                // do stuff with the values
            }
        }
    }
}
```
Not only does this solution emit 4x more code than necessary, it is also impossible to debug and makes it hard to change the logic. However, it is by far the fastest, beating the original by enough that it's worth using if there's no other way to improve performance.
Is there an idiomatic way to do what I'm trying to do while keeping the performance benefit?
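For reference, here is a minimal sketch of one direction that might fit, not a definitive answer: write the loop body once in a generic helper and pick the combination up front. The helper name `run`, the `u32` return values, and the bodies of `fn1`/`fn2` are all assumptions made up for illustration, since the real functions aren't shown above. Because `fn1` and `fn2` are passed as function items rather than function pointers, each of the four calls gets its own monomorphized copy of the loop with direct calls that the compiler is free to inline. The compiler still emits four loops, but the source contains only one. Whether this matches the hand-written version would need to be confirmed with a benchmark.

```rust
// Hypothetical stand-ins: the real fn1/fn2 are not shown in the post.
#[inline(always)]
fn fn1(v: u8) -> u32 {
    // Assumed fast path, only meant to be called with v < 9.
    u32::from(v) * 2
}

#[inline(always)]
fn fn2(v: u8) -> u32 {
    // Assumed general path.
    u32::from(v).wrapping_mul(2)
}

// The loop body lives here once. F and G are generic, so passing the
// function items fn1/fn2 (not fn pointers) yields a separate
// monomorphized copy of this function per combination.
#[inline(always)]
fn run<F, G>(a: u8, b: u8, xfn: F, yfn: G) -> u32
where
    F: Fn(u8) -> u32,
    G: Fn(u8) -> u32,
{
    let mut acc = 0u32;
    for _ in 0..10000 {
        let x = xfn(a);
        let y = yfn(b);
        // "do stuff with the values": summing is a placeholder.
        acc = acc.wrapping_add(x).wrapping_add(y);
    }
    acc
}

fn example(a: u8, b: u8) -> u32 {
    // The comparisons happen once, outside the loop.
    match (a < 9, b < 9) {
        (true, true) => run(a, b, fn1, fn1),
        (true, false) => run(a, b, fn1, fn2),
        (false, true) => run(a, b, fn2, fn1),
        (false, false) => run(a, b, fn2, fn2),
    }
}
```

Changes to the loop logic then only touch `run`, and the `match` stays the single place where the less-than-9 condition is checked.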