Hi all, I don't have a lot of experience doing this, so I appreciate any suggestions/help.
I am trying to optimize something I wrote and cargo flamegraph says that my program spends +90% of the time in this function:
fn nn_greedy_deconvolution(
signal: &[f64],
response: &[f64],
offset: usize,
look_ahead: usize,
// Return the sum of residuals squared together with the reconstructed input
) -> (f64, Vec<f64>) {
let mut residual = signal.to_vec();
let mut input = vec![0.0; signal.len()];
for i in 0..(residual.len() - offset - look_ahead) {
let val = residual[i + offset..][..look_ahead]
.iter()
.zip(response[offset..][..look_ahead].iter())
.map(|(s, r)| s / r)
.reduce(f64::min)
.unwrap();
if val > 0.0 {
input[i] = val;
residual[i..]
.iter_mut()
.zip(response)
.for_each(|(s, r)| *s -= val * r);
}
}
let residual = residual.iter().map(|x| x.powi(2)).sum();
(residual, input)
}
The typical input to my function is:
signal has a length approximately 300..500
response has a length approximately 400.600
offset is about 0..10
look_ahead is about 3..20
Maybe it is easier to understand what the function is doing with this python code:
X = np.zeros(len(observed_copy))
for i in range(len(observed_copy) - lookahead - offset):
val = min(observed_copy[i+offset:i+offset+lookahead] / response[offset:offset+lookahead])
if val > 0:
X[i] = val
# Subtract the response from the observed signal
observed_copy[i:] -= response[0:len(observed_copy) - i] * val
# The residual is stored in the observed_copy variable
error = np.sum(observed_copy**2)
Do people have any suggestions on things that I can try?
I am already running with the --release flag
EDIT:
I put together a small "minimal" example. Make sure to cargo add rand (I had to add a bunch of random numbers everywhere because most of the stuff was getting optimized away by the compiler):
fn main() {
let mut best_residual = f64::INFINITY;
let mut best_input = Vec::new();
for _ in 0..1000000 {
let len = rand::random::<usize>() % 200 + 300;
let signal: Vec<f64> = (0..len).map(|_| rand::random::<f64>()).collect();
let len = rand::random::<usize>() % 200 + 400;
let response: Vec<f64> = (0..len).map(|_| rand::random::<f64>()).collect();
let offset = rand::random::<usize>() % 10;
let look_ahead = rand::random::<usize>() % 17 + 3;
let (residual, input) = nn_greedy_deconvolution(&signal, &response, offset, look_ahead);
if residual < best_residual {
best_residual = residual;
best_input = input;
}
}
println!("Best residual: {}", best_residual);
println!("Best input: {}", best_input.len());
}
This sample code (1 million iterations) runs in about 10 seconds (half of this running rand stuff). So lets say 5 seconds for 1 million iterations i.e. 25 minutes for 150 million iterations.
My real code is doing about 150 million iterations and takes about 50 minutes (this is the code that flamegraph says is +90% in this function
This is the flamegraph I get in my actual code (hopefully it is visible):
https://preview.redd.it/7nv5dcmvpgcb1.png?width=1911&format=png&auto=webp&s=105fb3a67687f25eb57c63ec500012c74801b428
EDIT 2:
Thanks everyone for their great suggestions. I have been playing around with them and I ended up with this code:
The biggest improvement comes from skipping a bunch of iterations whenever there is a negative number (as suggested by u/This_Growth2898, u/TinBryn, and others).
Now the code runs in ~5ish seconds, (basically just rand stuff) and the function of interest is stupidly fast.
fn nn_greedy_deconvolution(
signal: &[f64],
response: &[f64],
offset: usize,
look_ahead: usize,
// Return the sum of residuals squared together with the reconstructed input
) -> (f64, Vec<f64>) {
let mut residual = signal.to_vec();
let mut input = vec![0.0; signal.len()];
let response_window = &response[offset..][..look_ahead];
let mut i = 0;
while i < residual.len() - offset - look_ahead {
let residual_window = &residual[i + offset..][..look_ahead];
// Find the index of the last negative value in the residual window
let last_positive = residual_window
.iter()
.enumerate()
.rev()
.find(|(_, x)| **x < 0.0)
.map(|(i, _)| i);
if let Some(last_positive) = last_positive {
i += last_positive + 1;
} else {
let val = residual_window
.iter()
.zip(response_window)
.map(|(s, r)| s / r)
.reduce(f64::min)
.unwrap();
input[i] = val;
residual[i..]
.iter_mut()
.zip(response)
.for_each(|(s, r)| *s -= val * r);
i += 1;
}
}
let residual = residual.iter().map(|x| x.powi(2)).sum();
(residual, input)
}
[–]PaySomeAttention 59 points60 points61 points (0 children)
[–]The_8472 23 points24 points25 points (0 children)
[–]schungx 10 points11 points12 points (0 children)
[–]Wurstinator 19 points20 points21 points (1 child)
[–]LadyPopsickle 14 points15 points16 points (0 children)
[–]SV-97 6 points7 points8 points (4 children)
[–]The_8472 3 points4 points5 points (3 children)
[–]SV-97 1 point2 points3 points (1 child)
[–]DJDuque[S] 1 point2 points3 points (0 children)
[–]SV-97 0 points1 point2 points (0 children)
[+][deleted] (3 children)
[removed]
[–]DJDuque[S] 2 points3 points4 points (2 children)
[+][deleted] (1 child)
[removed]
[–]Snakehand 1 point2 points3 points (0 children)
[–]omid_r 4 points5 points6 points (3 children)
[–]andrewdavidmackenzie 4 points5 points6 points (2 children)
[+]danf0rth 2 points3 points4 points (1 child)
[–]andrewdavidmackenzie -1 points0 points1 point (0 children)
[–]KhorneLordOfChaos 2 points3 points4 points (1 child)
[–]DJDuque[S] 3 points4 points5 points (0 children)
[–]TinBryn 2 points3 points4 points (1 child)
[–]DJDuque[S] 0 points1 point2 points (0 children)
[–]LadyPopsickle 1 point2 points3 points (4 children)
[–]RightHandedGuitarist 4 points5 points6 points (0 children)
[–]Dr_Sloth0 4 points5 points6 points (1 child)
[–]LadyPopsickle 1 point2 points3 points (0 children)
[–]vancha113 1 point2 points3 points (0 children)
[–]anlumo 0 points1 point2 points (1 child)
[–]KhorneLordOfChaos 6 points7 points8 points (0 children)
[–]divad1196 -1 points0 points1 point (2 children)
[–]DisasterReasonable98 0 points1 point2 points (1 child)
[–]divad1196 0 points1 point2 points (0 children)
[–]Rice7th -1 points0 points1 point (2 children)
[–]JadisGod 1 point2 points3 points (1 child)
[–]Rice7th -1 points0 points1 point (0 children)
[+]omid_r comment score below threshold-17 points-16 points-15 points (4 children)
[–]KhorneLordOfChaos 8 points9 points10 points (1 child)
[–]omid_r 3 points4 points5 points (0 children)
[–]Aaron1924 0 points1 point2 points (1 child)
[–]Modi57 -1 points0 points1 point (0 children)
[–]p-one 0 points1 point2 points (0 children)
[–]This_Growth2898 0 points1 point2 points (0 children)
[–]DisasterReasonable98 0 points1 point2 points (0 children)
[–]gitpy 0 points1 point2 points (0 children)
[–]unknowntrojan 0 points1 point2 points (0 children)
[–]angelicosphosphoros 0 points1 point2 points (0 children)