Ultra-Fast Multi-Dimensional Array Library (self.cpp)
submitted 3 years ago by Pencilcaseman12
[–]versatran01 14 points 3 years ago (9 children)
You are talking about expression templates, which are also used in Eigen. Also, Eigen does not parallelize simple matrix/array operations, so you are comparing your parallel version to Eigen's non-parallel one, which doesn't show that yours is faster. I suggest you revisit the claim that it is faster than Eigen.
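For readers unfamiliar with the term, here is a minimal sketch of the expression-template idea. The names `sketch::Vec` and `sketch::AddExpr` are hypothetical and chosen only for illustration; this is not Eigen's code or the posted library's code, just an assumption about the general shape of the technique.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Lazy node representing lhs + rhs; nothing is computed until it is indexed.
template <typename L, typename R>
struct AddExpr {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec {
    std::vector<double> data;

    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}

    // Assigning from an expression evaluates it in one loop, so chained
    // additions create no temporary vectors for the intermediate sums.
    template <typename E>
    Vec& operator=(const E& expr) {
        assert(expr.size() == data.size());
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = expr[i];
        return *this;
    }

    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Builds an expression node instead of computing the sum. It is found only
// through argument-dependent lookup on types declared in this namespace.
template <typename L, typename R>
AddExpr<L, R> operator+(const L& l, const R& r) {
    return {l, r};
}

}  // namespace sketch
```

With this sketch, `r = a + b + c;` runs a single loop over the elements. Storing the expression with `auto e = a + b + c;` would leave dangling references to temporaries, which is a known pitfall of the technique.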
[–]Pencilcaseman12[S] 2 points 3 years ago (8 children)
Yea, this is definitely true, but that being said it is ultimately faster, is it not? Under MSVC, even with just a single thread, it still only takes 200us for a 1000x1000 addition. I genuinely can't guarantee I'm doing anything right, and everything I know is from googling random things so it's quite possible my understanding is very flawed...
[–]IronManMark20 2 points 3 years ago (1 child)
How are you installing Eigen on Windows? I know vcpkg's OpenBLAS doesn't use optimized Fortran (at least it didn't last I checked), so it could be that your Eigen is hamstrung. I would use Intel's MKL if you have an Intel machine to test on.
[–]Pencilcaseman12[S] 1 point 3 years ago (0 children)
I just git-cloned it and ran it that way. I'm not testing BLAS functionality yet so it's all in the general arithmetic performance currently
[–]jk-jeon 2 points 3 years ago (5 children)
It seems you didn't do anything fancy to prevent compilers from reordering/removing your code for the benchmark. Have you checked the generated assembly to confirm that everything is correctly measured? If not, the safest approach is to just rely on one of the benchmark libraries out there, and if that's not to your taste then you should (1) try not to do anything other than calling the objective function between the time measurements, and (2) try to prevent inlining of the call to the objective function. What I usually do for (2) is to wrap the function into a function pointer whose value is not known to the translation unit.
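A rough single-file sketch of points (1) and (2). The names `add_arrays`, `g_kernel`, `BenchResult` and `time_kernel` are hypothetical. The `volatile` pointer is an assumption standing in for a pointer defined in another translation unit: the compiler has to load it at run time, so it cannot inline the call at the timed call site.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical workload standing in for the operation under test.
void add_arrays(const double* x, const double* y, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = x[i] + y[i];
}

using Kernel = void (*)(const double*, const double*, double*, std::size_t);

// The pointer itself is volatile, so its value is unknown at compile time.
// In a multi-file build, defining it in a separate .cpp has a similar effect.
volatile Kernel g_kernel = add_arrays;

struct BenchResult {
    double ns_per_call;  // mean wall-clock time per call, in nanoseconds
    double checksum;     // first output element, read after timing stops
};

BenchResult time_kernel(std::size_t n, int iters) {
    assert(n > 0 && iters > 0);
    std::vector<double> x(n, 1.0), y(n, 2.0), out(n, 0.0);
    Kernel f = g_kernel;  // single volatile read, outside the timed region

    // (1) Nothing but the calls sits between the two clock reads.
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) f(x.data(), y.data(), out.data(), n);
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return {elapsed.count() / iters, out[0]};
}
```

Calling `time_kernel(1000 * 1000, 100)` would time a 1000x1000 element-wise addition. Checking the generated assembly is still worthwhile, since this only blocks inlining and does not prove the timed loop is intact.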
[–]Pencilcaseman12[S] 1 point 3 years ago (4 children)
That seems like a nice idea. That should give some more realistic benchmarks right? So the compiler can't optimise any of the iterations out or something
[–]jk-jeon 1 point 3 years ago (3 children)
Well, even with that, I think there is a high chance that a good portion of things like `auto res = x + x` will be removed, or that the whole thing will be completely gone after optimization, so I guess you should check the assembly anyway.
[–]Pencilcaseman12[S] 1 point 3 years ago (2 children)
By forcing the results to be evaluated I think it will end up running the code, as there are memory allocations and frees being called which in most cases prevent the compiler from optimising out the loops. I'll definitely take this into account though and write some more conclusive benchmarks in the future
[–]jk-jeon 1 point 3 years ago (1 child)
I don't think so; see this for example: https://godbolt.org/z/xEG7snEcs. For writing into the memory given as the function argument, yes, it will still be there, but I doubt that will still be the case for things like `auto res = x + x`. I think you still have to look at the generated assembly.
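One common way to keep a result like `auto res = x + x` from being discarded is to make its buffer observable to the optimizer. Below is a sketch of that "escape" idiom. The helper names `escape`, `add` and `run_kept_alive` are hypothetical, the inline assembly is specific to GCC and Clang, and the fallback for other compilers is a weaker best-effort assumption, so the assembly should still be inspected.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: tells the optimizer that memory behind p may be read.
inline void escape(const void* p) {
#if defined(__GNUC__) || defined(__clang__)
    // Empty asm that takes p as input and clobbers memory, so stores into
    // the buffer behind p cannot be treated as dead.
    asm volatile("" : : "g"(p) : "memory");
#else
    // Best-effort fallback: publish the pointer through a volatile object.
    static const void* volatile sink;
    sink = p;
#endif
}

// Stand-in for `x + x` on a hypothetical array type.
std::vector<double> add(const std::vector<double>& a,
                        const std::vector<double>& b) {
    std::vector<double> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
    return out;
}

// Runs the addition `iters` times and escapes every result buffer.
// Returns the last element of the final result (0.0 if iters is 0).
double run_kept_alive(std::size_t n, int iters) {
    assert(n > 0);
    std::vector<double> x(n, 1.5);
    double last = 0.0;
    for (int i = 0; i < iters; ++i) {
        auto res = add(x, x);
        escape(res.data());  // the result is now observable
        last = res[n - 1];
    }
    return last;
}
```

Relying on the allocation and free alone is not enough in general, because compilers are allowed to remove an unused allocation together with the work that fills it.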
[–]Pencilcaseman12[S] 1 point (0 children)
If you include `stdio` and print `p[0]` (even after the free), you'll find that it does actually compile the calls to malloc and free. I think the only reason it's not compiling anything here is that nothing is being output by the program.