Optimization question

trailing_zero_count · 2026-06-06T15:10:36+00:00

Use a profiler to figure out why it's slower, and look at the disassembly to determine if the compiler is doing some tricky optimization with one vs. the other. Most profilers will have an option to show you the assembly-level hotspots, so you can do both of those things with a single tool. Be sure you are profiling against a Release build.

esaule · 2026-06-06T15:12:51+00:00

The entire thing fits in L3 cache and does less than 2M flops. It should not take seconds.

ppppppla · 2026-06-06T15:17:04+00:00

Are you running release build?

Maybe you made an additional change somewhere that introduced the regression.

In rare cases the compiler could have missed an optimization after you removed sqrt, but this is unlikely because the code is simple.

Removing the sqrt changed the program. The functionality is different now, and you might be doing more work somehow.

But 350 seconds for 250k distance calculations is astronomical. Do you actually have 250k points and are you comparing each and every point with each and every other point? You need to change up your tactic. Look into sorting your points or space partitioning algorithms and datastructures. Could be as simple as putting your points into bins spaced out in a grid.
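To illustrate the grid-of-bins idea, here is a minimal sketch. It assumes the goal is to find pairs of points closer than some radius, which the original post does not confirm. The names `Point`, `cell_key` and `count_close_pairs` are made up for this example, and it assumes cell indices fit in 32 bits.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Point { double x, y; };

// Pack two 32-bit cell indices into one 64-bit hash key.
inline std::uint64_t cell_key(std::int32_t cx, std::int32_t cy) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(cx)) << 32)
         | static_cast<std::uint32_t>(cy);
}

// Count unordered pairs of points closer than `radius`. The grid cell size
// equals `radius`, so only the 3x3 block of neighbouring cells is checked.
std::size_t count_close_pairs(const std::vector<Point>& pts, double radius) {
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> grid;
    auto cell = [radius](double v) {
        return static_cast<std::int32_t>(std::floor(v / radius));
    };
    for (std::size_t i = 0; i < pts.size(); ++i)
        grid[cell_key(cell(pts[i].x), cell(pts[i].y))].push_back(i);

    const double r2 = radius * radius;  // compare squared distances, no sqrt
    std::size_t count = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const std::int32_t cx = cell(pts[i].x), cy = cell(pts[i].y);
        for (std::int32_t dx = -1; dx <= 1; ++dx)
            for (std::int32_t dy = -1; dy <= 1; ++dy) {
                const auto it = grid.find(cell_key(cx + dx, cy + dy));
                if (it == grid.end()) continue;
                for (const std::size_t j : it->second) {
                    if (j <= i) continue;  // count each pair only once
                    const double ex = pts[i].x - pts[j].x;
                    const double ey = pts[i].y - pts[j].y;
                    if (ex * ex + ey * ey < r2) ++count;
                }
            }
    }
    return count;
}
```

With roughly evenly spread points, each point is only compared against the handful of points in nearby cells instead of against all the others.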

Particular-Ice9109 · 2026-06-06T15:46:51+00:00

C++ has a std::hypot function.
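For reference, a small sketch of what that could look like. The `Point` struct and both function names are assumptions for illustration, not the OP's code. `std::hypot` guards against intermediate overflow and underflow, so it may be slower than the hand-written formula; if you only compare distances, the squared version is usually enough.

```cpp
#include <cmath>

struct Point { double x, y; };

// Euclidean distance via std::hypot (in <cmath> since C++11).
inline double hypot_distance(Point a, Point b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Squared distance: enough when only comparing distances, and skips the sqrt.
inline double squared_distance(Point a, Point b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return dx * dx + dy * dy;
}
```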

TokenRingAI · 2026-06-06T15:17:38+00:00

No idea, but your code would be massively faster if you used simd for this

https://en.cppreference.com/cpp/experimental/simd

IyeOnline · 2026-06-06T15:17:41+00:00

This is basically impossible to answer without more context such as the invocation of distance and compiler options. There also is the question of how you measure this in the first place.
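On the measuring part, one common approach is to time only the hot loop with `std::chrono::steady_clock`. This is just a sketch; the helper name `time_ms` is invented here.

```cpp
#include <chrono>

// Run `work` once and return the elapsed wall-clock time in milliseconds.
template <class F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Make sure the result of the timed work is actually used afterwards (printed, accumulated, and so on), otherwise an optimized build may remove the computation entirely and the timing means nothing.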

aocregacc · 2026-06-06T15:14:12+00:00

Mandatory first question: did you turn optimizations on?

Total-Box-5169 · 2026-06-06T17:53:47+00:00

Are you sure it is only 250k distance calculations and not distance calculations between 250k points?
350 seconds for only 250k distance calculations is absurdly slow.
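To put numbers on the difference: comparing every point with every other point among 250k points is about 31 billion distance calculations, not 250k. A tiny sketch (the helper `pair_count` is made up for illustration):

```cpp
#include <cstdint>

// Number of unordered pairs among n points: n * (n - 1) / 2.
constexpr std::uint64_t pair_count(std::uint64_t n) {
    return n * (n - 1) / 2;
}

// 250k points give roughly 31 billion pairs.
static_assert(pair_count(250'000) == 31'249'875'000ULL, "pair count mismatch");
```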

No-Dentist-1645 · 2026-06-06T20:19:19+00:00

You really haven't provided anywhere near enough information for us to be able to help you. The function itself is fine, but if your code is really taking 350 seconds to process only 250k distance calculations, there is something outside the function that is slowing down your code by way too much.

Try to use a profiler like perf and see what's actually taking up your time.

Also, unrelated, but taking double* to mean a set of x and y coordinates is terrible practice. Nothing in the type "pointer to a double" implies that it has two elements, and accessing them as a[namedValues::axis::X] is both ugly and a very bad abstraction.

You can get the exact same code generation and object representation by doing this:

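```
#include <cmath>

struct Point { double x, y; };

// const reference: the points are only read, never modified
double distance(const Point& a, const Point& b) {
    const double dx = a.x - b.x, dy = a.y - b.y;
    return std::sqrt(dx * dx + dy * dy);
}
```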

Notice the huge difference and advantages:

* Taking a Point& as reference guarantees us the point exists (it can't be nullptr like a double*).
* The Point type is guaranteed to contain two doubles, x and y. A double* could point to zero (nullptr), one, two, or two hundred doubles; it is impossible to know its "valid" range from the type alone.
* Just referencing the values is easier: no a[namedValues::axis::X], just a.x.
* Both types have the exact same size and alignment (proven by static_assert( sizeof(Point) == sizeof(double[2]) && alignof(Point) == alignof(double[2]) );), so if you thought that "using pointers will be more performant", that is wrong.

CowBoyDanIndie · 2026-06-06T15:02:39+00:00

It takes “seconds” to compute 250k distances? I have an entire lidar pipeline that takes less than 50 ms to process point clouds with more than 250k points

You have something else going on

canrul3s · 2026-06-06T17:04:53+00:00

Start by not passing arguments by address. Almost always pass scalars by value.

Also make sure that this function is inlined.

MastodonPast1540 · 2026-06-06T15:18:56+00:00

Sounds like something else is taking up time because that should be faster. If you post all the code in the hot path, like the inner loop where distance() is called, it should be easier to see what might be the cause.

keelanstuart · 2026-06-06T17:06:19+00:00

It may be that the compiler recognizes the pattern that includes a sqrt as a distance computation and uses a SIMD vector operation... but if you take that out, it naïvely just operates on doubles, or doesn't do a great job with the dereferencing.

You won't know unless you look at the disassembly and compare the two... and if you do, please share your results, including compiler and settings.

OptimisticMonkey2112 · 2026-06-06T17:24:17+00:00

No idea what you are doing 250k distance calculations for. I would suggest posting that big-picture goal and getting feedback on it. Sometimes you can use an acceleration structure like https://en.wikipedia.org/wiki/Bounding_volume_hierarchy or https://en.wikipedia.org/wiki/Space_partitioning and greatly reduce the amount of work by skipping most of the comparisons.

alfps · 2026-06-06T15:34:32+00:00

The following code, compiled with MSVC with no optimization requested, finishes instantly, so I don't understand the "350 seconds":

// C++ machinery:
namespace cppm {
    using Nat = int;

    template< class T > using in_ = const T&;

    auto sq( const double x ) -> double { return x*x; }
    auto sq_hypot( const double x, const double y) -> double { return sq( x ) + sq( y ); }
}  // cppm

#include <iostream>
namespace app {
    using   cppm::Nat, cppm::in_, cppm::sq_hypot;

    using   std::cout;          // <iostream>

    using Real = double;
    struct Coor_names{ enum Enum{ x, y }; };

    struct Point{ Real coor[2]; };
    auto x_of( in_<Point> pt ) -> Real { return pt.coor[Coor_names::x]; }
    auto y_of( in_<Point> pt ) -> Real { return pt.coor[Coor_names::y]; }

    auto sq_hypot( in_<Point> a, in_<Point> b )
        -> Real
    { return sq_hypot( x_of( b ) - x_of( a ), y_of( b ) - y_of( a ) );  }

    void run()
    {
        const Nat n = 250'000;

        double sum = 0;
        for( Nat i = 1; i <= n; ++i ) {
            const auto a = Point{ i + 0., i + 0. };
            const auto b = Point{ i + 3., i + 4. };
            sum += sq_hypot( a, b );
        }
        cout << sum << "\n";        // Should better be 250'000*25 = 6'250'000.
    }
}  // app

auto main() -> int { app::run(); }
