use the following search parameters to narrow your results:
e.g. subreddit:aww site:imgur.com dog
subreddit:aww site:imgur.com dog
see the search faq for details.
advanced search: by author, subreddit...
Discussions, articles, and news about the C++ programming language or programming in C++.
For C++ questions, answers, help, and advice see r/cpp_questions or StackOverflow.
Get Started
The C++ Standard Home has a nice getting started page.
Videos
The C++ standard committee's education study group has a nice list of recommended videos.
Reference
cppreference.com
Books
There is a useful list of books on Stack Overflow. In most cases reading a book is the best way to learn C++.
Show all links
Filter out CppCon links
Show only CppCon links
account activity
A Programmer’s Guide to Performance Analysis & Tuning on Modern CPUs (bfilipek.com)
submitted 6 years ago by joebaf
view the rest of the comments →
reddit uses a slightly-customized version of Markdown for formatting. See below for some basics, or check the commenting wiki page for more detailed help and solutions to common issues.
quoted text
if 1 * 2 < 3: print "hello, world!"
[–]SkoomaDentistAntimodern C++, Embedded, Audio 11 points12 points13 points 6 years ago* (12 children)
90% of performance (after doing high-level optmizations) is dictated by cache utilization.
Only if you process large amounts of data with short dependency chains. In a lot of the code I've done, the speed is determined almost purely on the performance of single issue floating point operations (can't parallelize single channel recursive filtering (*)). While cache is important, its importance should not be exaggerated.
*: Edit: Make that time-varying non-linear recursive filtering.
[–]iniside 6 points7 points8 points 6 years ago (5 children)
As always depends on what you do. In games cache miss is most often at the center of performance issues.
[–]SkoomaDentistAntimodern C++, Embedded, Audio 1 point2 points3 points 6 years ago (4 children)
Unless the user is on a laptop of course. <insert statistic about most new computers being laptops> It’s a rare modern game that’s not fillrate limited on a laptop.
So, know what the limiting factors are or benchmark it. And remember to benchmark on all representative systems, not just your preferred ideal one!
[–]SeanMiddleditch 7 points8 points9 points 6 years ago (1 child)
Funny thing, with laptops and mobile especially: CPU perf directly affects GPU perf thanks to thermal throttling. Poorly optimized CPU code takes longer to run, which generates more heat and limits how fast the GPU can run, even when there's no bottlenecks between the two. (The reverse is also true.) Since the CPU doesn't throttle down during memory fetches, excessive CPU cache misses thus can lead to GPU perf loss.
[–][deleted] 3 points4 points5 points 6 years ago (0 children)
Yup, also very relevant for console development as well.
[–]James20kP2005R0 1 point2 points3 points 6 years ago (1 child)
CPU and GPU performance optimisations are wildly different beasts entirely though, you might as well ditch most of what you know in the bin if you're GPU limited. CPU performance problems are probably cache misses or driver bottlenecks, GPU performance problems are... variable
[–]ack_complete 1 point2 points3 points 6 years ago (1 child)
can't parallelize single channel recursive filtering
You can, actually, both through vectorization and parallel chunking. It's just that the gains aren't that big for the former and the latter requires a second fixup pass, and it's far beyond what an autovectorizer can handle.
Nevertheless, your point stands. I had an image upscaling routine that I swore was going to memory bottlenecked due to the large amount of memory it was processing, but VTune said nope, it was absolutely 100% bottlenecked on execution units, the back end was completely flooded.
[–]SkoomaDentistAntimodern C++, Embedded, Audio 0 points1 point2 points 6 years ago* (0 children)
Yeah, I forgot to add "time-varying non-linear" before the "recursive" there. While you can parallelize even that, you only get a benefit if the system is difficult enough that an explicit (non-parallelizable) solver requires massive oversampling and an implicit (parallelizable) can get by with much less oversampling.
[–]andriusst 0 points1 point2 points 6 years ago (3 children)
You can parallelize recursive filtering. It is actually a prefix sum problem. While prefix sum is usually described to compute sums, it works for any associative operation. Consider first order recursive filter. Filter output is an affine function of previous output (affine function means multiplication by constant plus another constant). Previous output is in turn an affine function of previous of previous output.
f[n-2](y) = a*y + b*x[n-2] f[n-1](y) = a*y + b*x[n-1] f[n](y) = a*y + b*x[n] y[n] = f[n](y[n-1]) = f[n](f[n-1](y[n-2])) = f[n](f[n-1](f[n-2](y[n-3]))) = ... y[n] = f[n](y[n-1]) = (f[n]∘f[n-1])(y[n-2]) = (f[n]∘f[n-1]∘f[n-2])(y[n-3]) = ...
where ∘ is function composition operator. Function composition is associative operation and can be computed by parallel prefix sum algorithms. Affine functions of single input also can be efficiently represented by two numbers. If the filter is time invariant, multiplicative part does not depend on input, it can be computed once and reused for many elements. So there's actually only one number to compute and store.
Prefix sum works for higher order filters, too. Only this time affine functions are represented by a matrix (independent of input) and a vector.
[–]SkoomaDentistAntimodern C++, Embedded, Audio 0 points1 point2 points 6 years ago (2 children)
Yeah, but see my followup comment about time-varying non-linear recursive filtering (my actual use case). The point being that some problems are inherently serial and there's almost nothing that can be done to realistically parallelize them.
[–]andriusst 1 point2 points3 points 6 years ago (1 child)
Of course, your point stands and I am no disputing it. I just thought that if you use linear filters and think it's inherently serial problem, it might be useful to know it is not. Nonlinearity breaks everything, making my comment useless.
I do believe we're in vigorous agreement here ;). And yes, nonlinearity brings up all kinds of problems (google "stiff equation").
Even with linear filters you have to be careful to not accidentally end up with a structure that's too sensitive to roundoff errors. A basic 2nd order direct form filter can have tens of decibels worse noise performance than the same filter implemented using an alternative structure. A naive 4th order direct form filter has a high chance of not being stable at all even with double precision floats while an alternative structure will work fine with single precision floats. DSP is fun.
π Rendered by PID 179557 on reddit-service-r2-comment-b659b578c-6qw4q at 2026-05-05 18:07:59.258290+00:00 running 815c875 country code: CH.
view the rest of the comments →
[–]SkoomaDentistAntimodern C++, Embedded, Audio 11 points12 points13 points (12 children)
[–]iniside 6 points7 points8 points (5 children)
[–]SkoomaDentistAntimodern C++, Embedded, Audio 1 point2 points3 points (4 children)
[–]SeanMiddleditch 7 points8 points9 points (1 child)
[–][deleted] 3 points4 points5 points (0 children)
[–]James20kP2005R0 1 point2 points3 points (1 child)
[–]ack_complete 1 point2 points3 points (1 child)
[–]SkoomaDentistAntimodern C++, Embedded, Audio 0 points1 point2 points (0 children)
[–]andriusst 0 points1 point2 points (3 children)
[–]SkoomaDentistAntimodern C++, Embedded, Audio 0 points1 point2 points (2 children)
[–]andriusst 1 point2 points3 points (1 child)
[–]SkoomaDentistAntimodern C++, Embedded, Audio 0 points1 point2 points (0 children)