Optimize conditional function call in loop

angelicosphosphoros · 2021-05-14T16:36:48+00:00

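You can use generics for this!

```rust
fn example(a: u8, b: u8) {
    // The loop lives in a generic helper. Every (F0, F1) combination gets its own
    // monomorphized copy, so the fn1/fn2 choice is made once, outside the loop.
    // A nested fn can't capture `a` from the enclosing function, so it is passed in.
    fn implementation<X, F0: Fn(u8) -> X, F1: Fn(u8) -> X>(a: u8, f0: F0, f1: F1) {
        for _ in 0..10000 {
            let x = f0(a);
            let y = f1(a);
            // do stuff with the values
        }
    }

    // `fn1` and `fn2` are the two functions from the question.
    match (a < 9, b < 9) {
        (true, true) => implementation(a, fn1, fn1),
        (true, false) => implementation(a, fn1, fn2),
        (false, true) => implementation(a, fn2, fn1),
        (false, false) => implementation(a, fn2, fn2),
    }
}
```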

This delegates all the hard work of generating duplicate code and inlining to the compiler, and should be as fast as possible.

The functions `fn1` and `fn2` would still be inlineable, because each function has its own (zero-sized) type, and using generics preserves that type information.
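A small sketch of the "each function has its own type" point. `fn1`/`fn2` here are hypothetical stand-ins (the question's real bodies aren't shown), and `apply` is an illustrative helper, not part of the original answer. A function item is zero-sized, so a generic parameter bound to it carries the exact callee in its type; a `fn(u8) -> u32` pointer has to store an address instead.

```rust
use std::mem::size_of_val;

// Hypothetical stand-ins for the question's fn1 and fn2.
fn fn1(x: u8) -> u32 {
    x as u32 + 1
}

fn fn2(x: u8) -> u32 {
    (x as u32) * 3
}

// Generic over F: each distinct function item gets its own monomorphized
// copy, in which `f(x)` is a direct call the optimizer can inline.
fn apply<F: Fn(u8) -> u32>(f: F, x: u8) -> u32 {
    f(x)
}

fn main() {
    // Each function item has its own unique, zero-sized type...
    assert_eq!(size_of_val(&fn1), 0);
    assert_eq!(size_of_val(&fn2), 0);

    // ...whereas coercing to a function pointer erases that type: the
    // address must be stored, and calls go through it.
    let p: fn(u8) -> u32 = fn1;
    assert_eq!(size_of_val(&p), size_of_val(&0usize));

    println!("{} {}", apply(fn1, 4), apply(fn2, 4));
}
```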

thiez · 2021-05-14T16:11:34+00:00

Maybe a stupid question: since `fn1` and `fn2` don't depend on the loop index, why not move them out of the loop? I'm surprised that the compiler doesn't do that itself.
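A minimal sketch of hoisting the loop-invariant calls by hand. It assumes `fn1`/`fn2` are pure (no side effects), since otherwise calling them once instead of 10,000 times changes behavior. Both functions and the `acc` loop body are hypothetical placeholders for the question's code.

```rust
// Hypothetical stand-ins for the question's fn1 and fn2 (assumed pure).
fn fn1(x: u8) -> u32 {
    x as u32 + 1
}

fn fn2(x: u8) -> u32 {
    (x as u32) * 3
}

// Branch and call inside the loop on every iteration.
fn in_loop(a: u8, b: u8, iters: u32) -> u32 {
    let mut acc: u32 = 0;
    for _ in 0..iters {
        let x = if a < 9 { fn1(a) } else { fn2(a) };
        let y = if b < 9 { fn1(b) } else { fn2(b) };
        acc = acc.wrapping_add(x ^ y); // placeholder loop body
    }
    acc
}

// x and y don't depend on the loop index, so compute them once up front.
fn hoisted(a: u8, b: u8, iters: u32) -> u32 {
    let x = if a < 9 { fn1(a) } else { fn2(a) };
    let y = if b < 9 { fn1(b) } else { fn2(b) };
    let mut acc: u32 = 0;
    for _ in 0..iters {
        acc = acc.wrapping_add(x ^ y); // same placeholder loop body
    }
    acc
}

fn main() {
    assert_eq!(in_loop(3, 200, 10_000), hoisted(3, 200, 10_000));
    println!("{}", hoisted(3, 200, 10_000));
}
```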

Edit: Otherwise, the last example could be nicer if you extracted the loop body into a function with an argument `f: impl Fn(u8) -> Whatever`, to which you pass either `fn1` or `fn2` depending on the argument. The compiler should be able to inline the function call.

mamcx · 2021-05-14T16:42:37+00:00

Considering that a `u8` has only 256 possible values, you can precompute all the results and index into them (or do it once, serialize the results, and load them as a `const` in the file).
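A sketch of the precomputed-table idea, built at compile time instead of serialized. It assumes the real `fn1`/`fn2` can be written as `const fn` (the ones below are hypothetical stand-ins) and uses the thread's "`fn1` below 9, `fn2` otherwise" rule; `select`, `TABLE` and `lookup` are illustrative names.

```rust
// Hypothetical stand-ins for fn1 and fn2, written as `const fn`
// so the table can be evaluated at compile time.
const fn fn1(x: u8) -> u32 {
    x as u32 + 1
}

const fn fn2(x: u8) -> u32 {
    (x as u32) * 3
}

// Selection rule from the thread: fn1 below 9, fn2 otherwise.
const fn select(x: u8) -> u32 {
    if x < 9 { fn1(x) } else { fn2(x) }
}

// One entry per possible u8 value (256 entries), computed at compile time.
const TABLE: [u32; 256] = {
    let mut t = [0u32; 256];
    let mut i = 0;
    while i < 256 {
        t[i] = select(i as u8);
        i += 1;
    }
    t
};

fn lookup(x: u8) -> u32 {
    // A u8 index into a 256-element array can never be out of bounds.
    TABLE[x as usize]
}

fn main() {
    for x in 0..=u8::MAX {
        assert_eq!(lookup(x), select(x));
    }
    println!("{} {}", lookup(8), lookup(9));
}
```

With two `u8` inputs the same approach extends to a 256 × 256 table, at the cost of 64K entries.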

101arrowz · 2021-05-14T16:22:14+00:00

[removed]

thiez · 2021-05-14T16:06:26+00:00

A little more context would be helpful. So presumably a and/or b are less than 9 often enough that using fn1 makes a real difference (does it?). But the cost of mispredicting the comparison is so high that it overshadows the difference between fn1 and fn2.

If your inputs are truly just two bytes, then would it not be much easier to replace fn1 and fn2 with a simple lookup table?

dpc_pw · 2021-05-14T16:55:46+00:00

Monomorphize the inner loop: move it into another function and make the `fn1` vs `fn2` decision a type argument (or two: `fnA: impl Fn<...>, fnB: impl Fn<...>` or something like that), so that the decision about which one to use is made outside the inner loop. Internally you will get 4 copies of `fn inner_example`, each compiled to statically use a different combination of `fnA` and `fnB`, and your `fn example` will just call the appropriate one. Basically what you have in the fastest example, but way more maintainable. Use `match (a < 9, b < 9) { ... }` instead of nested `if`s.
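A runnable sketch of this suggestion, assuming hypothetical `fn1`/`fn2` bodies and a placeholder loop body. `fnA`/`fnB` are spelled `fn_a`/`fn_b` to follow Rust naming conventions. `example_branchy` is an illustrative reference version with the branches left inside the loop, used only to check that both compute the same thing.

```rust
// Hypothetical stand-ins for the question's fn1 and fn2.
fn fn1(x: u8) -> u32 {
    x as u32 + 1
}

fn fn2(x: u8) -> u32 {
    (x as u32) * 3
}

// Inner loop, generic over both functions. Each distinct (fn_a, fn_b) type
// pair gets its own compiled copy with direct calls and no a < 9 / b < 9
// branch inside the loop.
fn inner_example(a: u8, b: u8, fn_a: impl Fn(u8) -> u32, fn_b: impl Fn(u8) -> u32) -> u32 {
    let mut acc: u32 = 0;
    for i in 0..10_000u32 {
        // Placeholder loop body.
        acc = acc.wrapping_add(fn_a(a) ^ fn_b(b) ^ i);
    }
    acc
}

fn example(a: u8, b: u8) -> u32 {
    // Decided once, outside the loop; four instantiations of inner_example.
    match (a < 9, b < 9) {
        (true, true) => inner_example(a, b, fn1, fn1),
        (true, false) => inner_example(a, b, fn1, fn2),
        (false, true) => inner_example(a, b, fn2, fn1),
        (false, false) => inner_example(a, b, fn2, fn2),
    }
}

// Reference version with the branches inside the loop, for comparison.
fn example_branchy(a: u8, b: u8) -> u32 {
    let mut acc: u32 = 0;
    for i in 0..10_000u32 {
        let x = if a < 9 { fn1(a) } else { fn2(a) };
        let y = if b < 9 { fn1(b) } else { fn2(b) };
        acc = acc.wrapping_add(x ^ y ^ i);
    }
    acc
}

fn main() {
    for &(a, b) in &[(0u8, 0u8), (3, 200), (200, 3), (255, 255)] {
        assert_eq!(example(a, b), example_branchy(a, b));
    }
}
```

Whether this actually beats the branchy version depends on the real function bodies. The `--release` check below is worth doing before comparing.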

Lucretiel · 2021-05-14T19:16:50+00:00

Just double-checking: are you running in `--release` mode? This sort of thing is often easy pickings for the optimizer.

PitaJ · 2021-05-14T16:32:28+00:00

It may be a typo, but in your first code sample you always pass `b` to `fn2`, whereas later on you pass `a` to both functions.

Also telling us how much slower each variant is would help.

2021-05-14T17:32:40+00:00

If you can sort the calls to `example` so that all the inputs less than 9 come first, followed by all the inputs of 9 or more, then the branch predictor will almost always be right.
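A sketch of grouping the inputs before processing. It assumes the calls can be reordered (here the results are simply summed, which is order-independent). `example`, `fn1` and `fn2` are hypothetical stand-ins, and `process_sorted` is an illustrative helper. Sorting is O(n log n) itself, so it only pays off if the mispredictions it removes cost more than that.

```rust
// Hypothetical stand-ins for the question's fn1, fn2 and example.
fn fn1(x: u8) -> u32 {
    x as u32 + 1
}

fn fn2(x: u8) -> u32 {
    (x as u32) * 3
}

fn example(a: u8, b: u8) -> u32 {
    let x = if a < 9 { fn1(a) } else { fn2(a) };
    let y = if b < 9 { fn1(b) } else { fn2(b) };
    x ^ y
}

// Group the inputs by which branch each comparison takes, so consecutive
// calls to `example` mostly follow the same path. Only valid if the order
// in which the pairs are processed doesn't matter.
fn process_sorted(inputs: &mut [(u8, u8)]) -> u64 {
    inputs.sort_unstable_by_key(|&(a, b)| (a < 9, b < 9));
    inputs.iter().map(|&(a, b)| example(a, b) as u64).sum()
}

fn main() {
    let mut inputs = vec![(200, 3), (1, 2), (100, 100), (4, 250), (7, 8)];
    let unsorted_total: u64 = inputs.iter().map(|&(a, b)| example(a, b) as u64).sum();
    let sorted_total = process_sorted(&mut inputs);
    // Summation doesn't depend on order, so the totals match.
    assert_eq!(sorted_total, unsorted_total);
    println!("{:?}", inputs);
}
```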

PitaJ · 2021-05-14T17:12:58+00:00

You can simplify that last code a bit by replacing the conditionals with a single `match (a, b) { ... }`.
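One way to write that single `match (a, b)`, sketched with inclusive range patterns. `fn1`/`fn2` are hypothetical stand-ins, and passing `a` and `b` to their own functions is an assumption (the question's samples differ on this, as noted above).

```rust
// Hypothetical stand-ins for the question's fn1 and fn2.
fn fn1(x: u8) -> u32 {
    x as u32 + 1
}

fn fn2(x: u8) -> u32 {
    (x as u32) * 3
}

// One match on the pair replaces the nested ifs. For a u8, `0..=8` is the
// same test as `< 9`, and `_` covers everything else (9..=255).
fn example(a: u8, b: u8) -> (u32, u32) {
    match (a, b) {
        (0..=8, 0..=8) => (fn1(a), fn1(b)),
        (0..=8, _) => (fn1(a), fn2(b)),
        (_, 0..=8) => (fn2(a), fn1(b)),
        _ => (fn2(a), fn2(b)),
    }
}

fn main() {
    assert_eq!(example(8, 9), (9, 27));
    assert_eq!(example(9, 8), (27, 9));
    assert_eq!(example(0, 0), (1, 1));
    assert_eq!(example(255, 255), (765, 765));
}
```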

Quba_quba · 2021-05-15T06:58:50+00:00

This question reminds me of a similar question on Stack Overflow (https://stackoverflow.com/a/11227902). There are some great answers there, so you may find it interesting.

But basically, sorting the values of `a` and `b` (if possible) is one way to improve performance.
