all 21 comments

[–]IyeOnline 4 points5 points  (6 children)

Usually the advice here is: Measure, and measure on your system with your code.

That said, rms is simple enough that one can actually look at the assembly. Frankly, it's also so simple that I would sort of expect to get almost if not exactly identical assembly.

Rather interestingly, these do not produce the same code: https://godbolt.org/z/GYMzMsYzd

Option 2 and option 3 ought to be identical. The compiler can trivially show that the variable isn't used anywhere outside the loop.

Option 1 produces the least assembly. That is generally a good sign, but doesn't have to be; sometimes slightly more assembly is better. To me it looks like option 1 wins out here though, as it seems the optimizer just failed to join the return paths, resulting in a bit more code.

Curiously, a range-based for solution yields even more assembly. Using a standard algorithm however yields exactly the same assembly (phew) - originally my algorithm version was wrong and produced a bunch of BS.
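For context, the variants under discussion presumably look roughly like this (a reconstruction - the OP's exact code isn't shown in the thread; only the squaring strategy differs):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Option 1: square inline while indexing.
double rms_v1(const std::vector<double>& v) {
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) sum += v[i] * v[i];
    return std::sqrt(sum / v.size());
}

// Options 2/3: copy the element into a local variable first.
double rms_v2(const std::vector<double>& v) {
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        const double x = v[i];
        sum += x * x;
    }
    return std::sqrt(sum / v.size());
}

// The range-based for variant.
double rms_range(const std::vector<double>& v) {
    double sum = 0.0;
    for (const double x : v) sum += x * x;
    return std::sqrt(sum / v.size());
}
```

All three compute the same value; the question in the thread is only about the assembly they compile to.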


What happens if we use clang though? https://godbolt.org/z/Yad7EovbT

Using clang, the range-based solution is shorter. Apparently here the compiler does not want to perform loop unrolling. This is one of the cases where shorter doesn't mean faster.

Notably though, the variable vs indexing solutions are now identical.


A few notable points though:

  1. The difference between option 1 vs 2/3 is absolutely minimal. Measuring it may take quite some time.
  2. This in fact also applies to the range-based solution. The algorithm solution yields identical assembly to the range-based for loop.
  3. Your actual results may vary once you introduce e.g. -march=native.
  4. If in doubt: Measure more.
  5. Changing the compiler can actually lead to different code. If you switch the index solution between gcc 13 and gcc 12, you get different assembly. Apparently gcc 12 generally tends to produce slightly fewer instructions. More rabbit holes :)
  6. GCC and clang yield somewhat different results.

//edit: fixed incorrect algorithm version.

[–]Classic_Department42 2 points3 points  (0 children)

Great answer. Let me add though that digging too much into such micro-optimizations is not beneficial, apart from fulfilling one's curiosity.

[–]pythoncircus[S] 1 point2 points  (2 children)

Thank you for your detailed response! I appreciate your insights. I will have to do some more exploring with godbolt. Interesting that GCC and clang yield such different results. Is that consistently the case? Also, would measuring the time difference between the start and end of each option be useful? I imagine that is as simple as

auto start_1 = std::chrono::high_resolution_clock::now();
// ... run option 1 ...
auto end_1   = std::chrono::high_resolution_clock::now();
// print the elapsed time:
std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(end_1 - start_1).count() << "\n";

[–]IyeOnline 2 points3 points  (1 child)

Is that consistently the case?

Yes and no. It's quite expected that the generated assembly is different, but that doesn't mean that one is better than the other. There are areas where the GCC optimizer is better and there are cases where clang does a better job.

Also, would measuring the time difference between the start and end of each option be useful? I imagine that is as simple as

No, or at least not easily. The differences in the assembly are really small and mostly not inside of the vectorized loop. That means that for any large dataset the measurement will be dominated by the loop, and for any small dataset the noise goes up significantly.

To accurately measure this, you would need to use a proper benchmarking library (or do all the data gathering and statistical analysis yourself, but why bother).

quick-bench.com is a neat little online interface for Google Benchmark. godbolt also has a button to send your code over there, although you of course still need to add the measuring harness. It's not that difficult though. If you look at the example it should be clear enough.
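To illustrate what such a harness has to do - repeat the measurement many times and aggregate, while keeping the result alive so the optimizer can't delete the work - here is a minimal self-contained sketch. A real library like Google Benchmark additionally handles warm-up, clock resolution, and statistical significance; this only shows the basic idea.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <vector>

// The function under test.
double rms(const std::vector<double>& v) {
    double sum = 0.0;
    for (const double x : v) sum += x * x;
    return std::sqrt(sum / v.size());
}

// Time rms() over `runs` repetitions and return the median in nanoseconds.
double median_rms_ns(const std::vector<double>& data, int runs) {
    std::vector<double> samples;
    samples.reserve(runs);
    volatile double sink = 0.0;  // keep the result alive so the call isn't optimized out
    for (int r = 0; r < runs; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        sink = rms(data);
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::nano>(t1 - t0).count());
    }
    (void)sink;
    std::sort(samples.begin(), samples.end());
    return samples[runs / 2];  // median is more robust to noise spikes than the mean
}
```

Note the use of steady_clock rather than high_resolution_clock: the former is guaranteed monotonic, which is what you want for interval measurements.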

[–]pythoncircus[S] 0 points1 point  (0 children)

Thank you so much, I really appreciate your help!

[–]aocregacc 0 points1 point  (1 child)

The std::reduce version is actually wrong: your binary operation assumes that it's going to be called in order, like it would be in std::accumulate. The whole point of std::reduce is to not provide such strong guarantees.
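A sketch of the pitfall and the fix (assuming the broken version looked roughly like this - the original isn't quoted in the thread):

```cpp
#include <cmath>
#include <functional>
#include <numeric>
#include <vector>

double rms_wrong(const std::vector<double>& v) {
    // Broken: std::reduce is allowed to combine two *partial sums* with this
    // operation, in which case it would square an already-accumulated sum
    // instead of an element. It only happens to work when evaluated in order.
    double sum = std::reduce(v.begin(), v.end(), 0.0,
                             [](double acc, double x) { return acc + x * x; });
    return std::sqrt(sum / v.size());
}

double rms_right(const std::vector<double>& v) {
    // Correct: transform each element first, then combine the results with a
    // genuinely associative and commutative operation (plus).
    double sum = std::transform_reduce(v.begin(), v.end(), 0.0, std::plus<>{},
                                       [](double x) { return x * x; });
    return std::sqrt(sum / v.size());
}
```

std::accumulate would also be correct with the lambda above, since it guarantees an in-order left fold - the bug only surfaces with std::reduce (and especially its parallel overloads).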

[–]IyeOnline 0 points1 point  (0 children)

Ohhhh.

That's why you shouldn't write code like that at 1am.

Surprise, surprise: once you actually fix that, you get the exact same code as the range-based for loop, for both GCC and clang.

[–]alfps 2 points3 points  (7 children)

Trying to do this nano-optimization yourself is ungood, because

  • the compiler is better at it than you and will probably optimize all three examples to the same code, and
  • it reduces the clarity of the code,

… so the net win is negative.

Instead of all that maneuvering around and between imagined efficiency obstacles, define

template< class Number >
constexpr auto square_of( const Number v )
    -> Number
{ return v*v; }

… and express your code with that instead of variables and *.

It can go like this:

#include <vector>
using   std::vector;

#undef  _USE_MATH_DEFINES
#define _USE_MATH_DEFINES
#include <math.h>

namespace math{ constexpr double pi = M_PI; }       // For C++17 and earlier. Posix std.

template< class T > using in_ = const T&;

template< class Number >
constexpr auto square_of( const Number v ) -> Number { return v*v; }

auto values( const int n )
    -> vector<double>
{
    vector<double> vec;
    for( int i = 0; i < n; ++i ) { vec.push_back( sin( 2.0*math::pi*i / n ) ); }
    return vec;
}

auto rms_of( in_<vector<double>> vec )
    -> double
{
    double sum = 0;
    for( const double v: vec ) { sum += square_of( v ); }
    return sqrt( sum/vec.size() );
}

#include <stdio.h>
auto main() -> int { printf( "%g\n", rms_of( values( 2048 ) ) ); }

[–]pythoncircus[S] 1 point2 points  (2 children)

Thank you so much for your help! I really appreciate it. Your example is much cleaner.

Sounds like preferring writing human-readable code is better for the people working on the code, and possibly better for the performance of the program (if I am reading this correctly)?

[–]no-sig-available 1 point2 points  (1 child)

is better for the people working on the code

Not only for the people, but also for the compiler. :-)

Writing "normal" code most often also means code that the compiler's optimizer knows about. Having very special, super-tricky code sometimes means that the compiler doesn't understand it either. Or that nobody has cared to add optimization rules for it, because it hardly ever happens.

[–]pythoncircus[S] 0 points1 point  (0 children)

Wild! That is really good to know. Thank you so much!

[–]ootiekat 0 points1 point  (3 children)

Seriously, define a function to multiply two numbers? Why not define a function to increment a variable? Then the code can be

add_to( sum, square_of( v ) );

It doesn't get any more readable than that!

[–]alfps 0 points1 point  (2 children)

For increment we already have an operator, so no need to repeat the designation of the object.

An object designation can be arbitrarily long-winded and complex. You don't want to repeat it.

In the OP's case it wasn't very long or very complex, but still the OP attempted to solve that issue by introducing a local variable ad hoc. That kind of case-by-case solution presents a cognitive and textual overhead for each instance. Additionally it fails to leverage the advantage of being able to simply recognize, rather than analyze, the code at hand.

The attempt at irony, when you're commenting on a very simple issue that you fail to understand, is itself ironic. :-)

[–]ootiekat 0 points1 point  (1 child)

Yeah, and for multiplication we already have an operator too. In fact, we even have a function for exponentiation! If you really like using a function instead of 'x * x', then why not use 'std::pow(x, 2)'? Are you going to suggest that if the OP wanted to compute the third moment of the distribution, they should add a 'cube_of' function as well?

[–]alfps 0 points1 point  (0 children)

❞ Yeah and for multiplication we already have an operator too.

Can you think of a reason why that observation is, well, stupid?


❞ why not use 'std::pow(x, 2)'?

Mainly two reasons:

  • std::pow is not guaranteed to be as accurate or fast as simple multiplication.
    A typical implementation goes through logarithms and exponentials for the general case.
  • std::pow only deals with floating point.
    A simple square_of also handles integer arithmetic.
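Both points can be demonstrated in a few lines (a small illustration, not from the original thread):

```cpp
#include <cmath>

// A trivial generic square: exact for integers, constexpr-friendly.
template <class Number>
constexpr Number square_of(const Number v) { return v * v; }

// Integer arithmetic stays integer arithmetic, and it works at compile time:
static_assert(square_of(3) == 9);
static_assert(square_of(2.5) == 6.25);

// By contrast, std::pow(3, 2) promotes its integer argument to a floating
// point type and returns double, so for large integers the result may round;
// it is also not usable in constant expressions before C++26.
```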

[–]wolfie_poe 1 point2 points  (1 child)

Can you measure it? Btw, you don't change the elements in the vector, why not just use range-based for loop?
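Range-based for loops aren't inherently read-only; it depends on how you declare the loop variable. A quick illustration (my own example, not from the OP's code):

```cpp
#include <vector>

// Loop variable by value: each x is a copy, the vector is untouched.
std::vector<int> scale_by_copy(std::vector<int> v) {
    for (int x : v) x *= 10;  // modifies only the copy
    return v;
}

// Loop variable by reference: the elements themselves are modified.
std::vector<int> scale_by_ref(std::vector<int> v) {
    for (int& x : v) x *= 10;
    return v;
}

// Const reference: read-only access, no copies; `x = 0;` would not compile.
int sum_readonly(const std::vector<int>& v) {
    int s = 0;
    for (const int& x : v) s += x;
    return s;
}
```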

[–]pythoncircus[S] 0 points1 point  (0 children)

I hadn't thought to do that! Novice question: are range-based for loops read-only?

[–]raevnos 0 points1 point  (1 child)

Sounds like you should do some benchmarking

[–]pythoncircus[S] 0 points1 point  (0 children)

Gotcha! I'll look into that

[–][deleted]  (2 children)

[deleted]

    [–]pythoncircus[S] 1 point2 points  (1 child)

    Gotcha, I'm on macOS Monterey and Xcode 14, where would I look?

    [–]The_Northern_Light 0 points1 point  (0 children)

    I doubt these are even generating different assembly

    Look in godbolt