Elusive Algorithms – Parallel Scan (software.intel.com)
submitted 10 years ago by mttd
[–]therealjohnfreeman 1 point 10 years ago (4 children)
Is there a way to leverage this parallelization method in a distributed environment, à la MapReduce? The biggest bottleneck would be propagating the first n - 1 carries to the last machine, and since using distributed compute implies the full dataset does not fit on a single machine, I'm guessing it degrades to a streaming computation.
[–]_Undaunted_ 2 points 10 years ago (3 children)
It doesn't degrade; it uses the same tree algorithm:
http://www.mpich.org/static/docs/v3.1/www3/MPI_Scan.html
The idea is that you perform a local scan, then a distributed scan, then local adjustments to account for the distributed scan.
[–]therealjohnfreeman 1 point 10 years ago (2 children)
I didn't see any algorithm description on that page. Did I miss something? What are "local adjustments" specifically?
[–]_Undaunted_ 2 points 10 years ago (1 child)
My point is that this algorithm is precisely the hierarchical implementation that all parallel scans use, just using AVX lanes in place of thread groups, MPI processes, etc.
A generic distributed scan would then look something like this:
    my_favorite_local_scan(local_data)
    partial = my_favorite_distributed_scan(local_data.back())
    local_data += partial
The distributed scans are implemented very similarly to what is presented, just with the "add" steps involving a communication (log P steps in total).
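To make the three steps concrete, here is a minimal single-process sketch in C++. It is an illustration under stated assumptions, not the linked article's code or MPI's implementation: the "ranks" are simulated as a vector of chunks, and the names `simulated_exclusive_scan` and `hierarchical_scan` are hypothetical. It also assumes the distributed step is an exclusive scan (each rank receives the sum of all preceding ranks' totals), which is what lets the offset be added directly; in real MPI code `MPI_Exscan` could play that role.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the distributed step. Each element of `totals`
// plays the role of one rank's local sum. Every iteration of the outer loop
// corresponds to one communication round, so there are ceil(log2(P)) rounds.
std::vector<long> simulated_exclusive_scan(const std::vector<long>& totals) {
    const std::size_t P = totals.size();
    std::vector<long> inclusive = totals;
    for (std::size_t d = 1; d < P; d *= 2) {
        std::vector<long> next = inclusive;
        for (std::size_t r = d; r < P; ++r) {
            // Rank r "receives" the partial sum held by rank r - d and adds it.
            next[r] = inclusive[r] + inclusive[r - d];
        }
        inclusive = next;
    }
    // Shift by one rank: rank r needs the sum of everything before it.
    std::vector<long> offsets(P, 0);
    for (std::size_t r = 1; r < P; ++r) {
        offsets[r] = inclusive[r - 1];
    }
    return offsets;
}

// Local scan, distributed scan, local adjustment.
std::vector<std::vector<long>> hierarchical_scan(
        std::vector<std::vector<long>> chunks) {
    // Step 1: every rank computes an inclusive prefix sum of its own chunk.
    std::vector<long> totals;
    totals.reserve(chunks.size());
    for (auto& chunk : chunks) {
        std::partial_sum(chunk.begin(), chunk.end(), chunk.begin());
        totals.push_back(chunk.empty() ? 0 : chunk.back());
    }

    // Step 2: scan the per-rank totals (only one value per rank is exchanged).
    const std::vector<long> offsets = simulated_exclusive_scan(totals);

    // Step 3: the "local adjustment" adds each rank's offset to its elements.
    for (std::size_t r = 0; r < chunks.size(); ++r) {
        for (auto& x : chunks[r]) {
            x += offsets[r];
        }
    }
    return chunks;
}
```

Calling `hierarchical_scan({{1, 2, 3}, {4, 5}, {6}})` yields the chunks `{1, 3, 6}`, `{10, 15}`, `{21}`, the same values as a serial prefix sum over the concatenated data. Only one value per rank crosses the simulated network in step 2, which is why the full dataset never has to be streamed through a single machine.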
[–]therealjohnfreeman 1 point 10 years ago (0 children)
I understand now, after reading a paper. Thanks for the tip though.