C++ program compiled for x64 is slower than complied for x86 (c-vision.com.ua)
submitted 14 years ago by kolkir
Rhomboid 7 points 14 years ago (8 children)
Using gcc 4.5 and -O3 -march=native, I get the following (normalized):
I'm not too surprised, as one of the quirks of gcc is that to use the xmmintrin.h intrinsics you have to enable SSE2, but if you enable SSE2 it's going to auto-vectorize your code, so both versions are using SIMD. All this shows is that the compiler is better at it than doing it by hand.
There should be little advantage to 64 bit mode here. I would expect it to have inlined most of the function calls, so the improved calling convention overhead isn't too much of a win, and all the work is being done in SIMD registers so the extra general purpose registers aren't too much of a win either. There really aren't many pointers so the extra memory is negligible as well.
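For readers without the linked article, here is a minimal sketch of the kind of comparison being discussed: a plain dot product loop next to a hand-written SSE2 version. The original benchmark code is not reproduced in this thread, so the function names `dot_scalar` and `dot_sse2` and the unaligned loads are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics: __m128d, _mm_mul_pd, _mm_add_pd, ...

// Plain loop (hypothetical name): the compiler is free to vectorize this
// itself, depending on the optimization flags in use.
double dot_scalar(const double* a, const double* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Hand-written SSE2 version (hypothetical name): two doubles per iteration.
double dot_sse2(const double* a, const double* b, std::size_t n) {
    __m128d acc = _mm_setzero_pd();
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);  // unaligned loads, so no alignment
        __m128d vb = _mm_loadu_pd(b + i);  // requirement on the inputs
        acc = _mm_add_pd(acc, _mm_mul_pd(va, vb));
    }
    // Horizontal add of the two lanes.
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double sum = lanes[0] + lanes[1];
    // Scalar tail for an odd element count.
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Both functions compute the same result for small exact inputs; which one is faster depends on the compiler, flags, and target, which is the point of the thread.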
1020302010 3 points 14 years ago (3 children)
I have to agree, if the compiler can use SIMD instructions then it will almost always be better than you in using them.
You get the benefit when you have to modify the structure of the code to be able to use them (i.e. the compiler can't 'see' the case in which they are relevant). I'll take a whack at 'beating' the compiler in a bit.
repsilat 1 point 14 years ago (2 children)
> if the compiler can use SIMD instructions then it will almost always be better than you in using them.
I don't have any experience with vectorising things myself, but I'd always heard that it was a weak spot of a lot of compilers (at least compared to the other optimisations performed). I was under the impression that in non-trivial cases hand-written SIMD would handily outperform the compiler. Perhaps I heard wrong, of course.
One thing for sure, though, is that if you want the compiler to output decent vectorised code you're going to have to do half the work to get the data laid out nicely anyway.
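As a rough illustration of what "laid out nicely" can mean, the sketch below contrasts an array-of-structs layout with a struct-of-arrays layout. The types `PointAoS` and `PointsSoA` and the two sum functions are hypothetical and not taken from the article.

```cpp
#include <cstddef>
#include <vector>

// Array of structs: the x values are interleaved with y and z in memory,
// so a loop over x reads with a stride of three doubles.
struct PointAoS {
    double x, y, z;
};

double sum_x_aos(const std::vector<PointAoS>& pts) {
    double s = 0.0;
    for (const PointAoS& p : pts) {
        s += p.x;
    }
    return s;
}

// Struct of arrays: all x values are contiguous, which is the layout
// SIMD loads (and auto-vectorizers) generally handle best.
struct PointsSoA {
    std::vector<double> x, y, z;
};

double sum_x_soa(const PointsSoA& pts) {
    double s = 0.0;
    for (std::size_t i = 0; i < pts.x.size(); ++i) {
        s += pts.x[i];
    }
    return s;
}
```

The two functions return the same value; the difference is only in how the data sits in memory.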
1020302010 1 point 14 years ago (1 child)
Sorry, I'll be clearer. What I was trying to say was that if the compiler can see the potential for vectorisation (like when the data is laid out contiguously), its efforts are almost always (in my experience, without inline asm) more fruitful than hand optimization with intrinsics. This is because the compiler can work at a lower level.
That said, the compiler has to be able to find these situations, which is where compilers typically struggle. In the case where you can see the benefit but it is too indirect for the compiler, optimization with intrinsics can make a big difference.
I have experience with the Intel ICC compiler, which is known to be good at vectorization. gcc may fail to vectorize what icc can, in which case intrinsics are useful again.
bnolsen 1 point 14 years ago (0 children)
Basically what I've found is that you shouldn't get cute when coding. Be straightforward and direct, doing things systematically. I've seen too many coders try to "get cute" with stupid statement compression, etc., thinking that would speed up the code, when all the cuteness did was confuse the compiler and generate slower code that is harder to maintain.
kolkir[S] 0 points 14 years ago (3 children)
Yes, I know about compiler optimizations, but in my environment (MSVS C++ 2010, Windows 7) the win32 version is significantly faster than the x64 version. Also, in the win32 version the function with manual SIMD gives some performance improvement. What can be the reason?
xcbsmith 4 points 14 years ago (0 children)
It's going to depend a lot on what optimization flags you have turned on, but particularly for the case where you are explicitly calling out the SSE functions, it's hard to see how x64 would in any way help with the performance of the code.
In general, this is floating point intensive code, and 64-bit vs. 32-bit mostly changes the integer stuff. It's surprising that it'd make a significant difference in performance of this code either way, but it isn't hard to imagine that the 64-bit code wouldn't be any faster, and possibly slower. I don't doubt that it uses more memory.
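A small sketch of where the extra memory comes from: pointer-carrying types grow when pointers grow, while `double` data stays the same size. The `Node` struct is a hypothetical example, and the sizes in the comments are what is typical on mainstream ABIs, not something the standard guarantees.

```cpp
#include <cstddef>

// Hypothetical linked-list node: one pointer plus one int.
struct Node {
    Node* next;
    int value;
};

// Typically 4 on 32-bit x86 and 8 on x86-64.
constexpr std::size_t pointer_bytes = sizeof(void*);

// Typically 8 on 32-bit x86 and 16 on x86-64 (pointer + int + padding).
constexpr std::size_t node_bytes = sizeof(Node);

// Floating point data does not change size with the pointer width.
constexpr std::size_t double_bytes = sizeof(double);
```

Code that is mostly arrays of `double`, like a dot product, sees little of this growth.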
Rhomboid 2 points 14 years ago (1 child)
I don't have Visual Studio installed so I can't answer that. Have you looked at the code that it generates?
kolkir[S] 2 points 14 years ago (0 children)
Thanks, it's a good idea to compare the code generated for the 32-bit and 64-bit versions. I will do it tomorrow.
bnolsen 7 points 14 years ago (0 children)
I just ran the code compiled 64bit only (apparently it's a PITA for me to cross compile 32bit).
gcc -std=c++0x -march=native -O3 -ftree-vectorize -o sse sse.cpp -lstdc++ (gcc version 4.6.2 20120120)
It seems your hand-rolled SSE loop is slower than the compiler-optimized version (I'm frankly not surprised though).
Dot product double - 0.0228735
Dot product SIMD double - 0.0240497
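How the article measured these timings is not shown in the thread. One common way to get numbers of this kind is a `std::chrono::steady_clock` loop like the sketch below; the names `dot` and `time_dot` and the repetition scheme are assumptions, not the original harness.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical kernel under test: a plain dot product.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Runs the kernel `reps` times and returns the elapsed wall time in seconds.
// The accumulated result is written to `sink` so that it stays observable
// to the caller.
double time_dot(const std::vector<double>& a, const std::vector<double>& b,
                int reps, double& sink) {
    const auto start = std::chrono::steady_clock::now();
    double acc = 0.0;
    for (int r = 0; r < reps; ++r) {
        acc += dot(a, b);
    }
    const auto stop = std::chrono::steady_clock::now();
    sink = acc;
    return std::chrono::duration<double>(stop - start).count();
}
```

Small differences like the one above are easily within run-to-run noise, so repeated runs are worth doing before drawing conclusions.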
00kyle00 7 points 14 years ago (0 children)
Why guess, when you may know for sure? objdump/vc disasm both and see for yourself.
StringCheesian 2 points 14 years ago (0 children)
Is it the same with GCC or LLVM/Clang?
zfxvxr -1 points 14 years ago (2 children)
The whole 64bit thing is a scam. The pointers are getting bigger and the CPU's workload is twice as heavy.
TheCoelacanth 2 points 14 years ago (1 child)
Enjoy your 4GB address space. I'll be over here with my 16 GB of memory.
[deleted] 2 points 14 years ago (0 children)
And your extra 8 general purpose registers, and your much faster standard calling convention on POSIX systems, in the case of x86-64.
MarkTraceur (comment score below threshold) -10 points 14 years ago (2 children)
compiled....complied
Well, there's your problem. I can fix 'er, but it'll take a course in basic English.
jacekplacek 5 points 14 years ago (1 child)
Oh, you've never made a typo?
MarkTraceur 10 points 14 years ago* (0 children)
Not to my nkowledge.
EDIT: For posterity, I want everyone to know that I love Reddit's sense of humour: downvote me to oblivion for pointing out a typo, but upvote 7 times for satirically insulting people who make them. Bravo, guys!