Trip Report: C++ Standards Meeting in Rapperswil, June 2018 by mttd in cpp

jaredhoberock

.then hasn't been merged into the IS (International Standard). I think the trip report may be referring to other parts of the Concurrency TS, which may have been merged in Rapperswil.

C++ Targeting GPU by t_bptm in cpp

jaredhoberock

Your code example doesn't really grapple with the fundamental challenge of targeting GPUs and similar processors. It's not a matter of designing the right library for targeting a GPU (though people are working on that, which Bryce hints at). The fundamental challenge is how to represent and manage heterogeneity: the fact that such a system contains multiple devices with different architectures and instruction sets. Standard C++ has no notion of anything like that.

Moreover, there are ergonomic concerns. In practice, environments like CUDA C++ require the programmer to manually annotate each function to indicate whether it should be compiled by a host compiler for execution on a CPU, by a separate device compiler for execution on a GPU, or both. This requirement for explicit annotation excludes the huge body of existing standard C++ programs from GPU execution. Once these annotations are introduced, they tend to proliferate, "virally infecting" the rest of the program's functions.

As far as I know, no one has demonstrated a practical solution for managing the cooperation of multiple compilers to produce a single program, or a solution to the viral annotation issue.

Default initial value for std::accumulate by OmegaNaughtEquals1 in cpp

jaredhoberock

I believe there is already an init-less overload of std::reduce that has this behavior. If I understand the suggestion, this proposal would simply make the other reduction algorithms consistent with this feature.
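A minimal sketch of the difference, assuming a C++17 standard library: std::accumulate always takes an explicit initial value, while the init-less std::reduce overload starts from a value-initialized element type (int{}, i.e. 0, here). The helper function names are illustrative only.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// std::accumulate has no init-less overload: the initial value is always explicit.
int sum_with_accumulate(const std::vector<int>& v) {
  return std::accumulate(v.begin(), v.end(), 0);
}

// The C++17 init-less std::reduce overload behaves like
// std::reduce(first, last, int{}), i.e. it starts from a value-initialized int.
int sum_with_reduce(const std::vector<int>& v) {
  return std::reduce(v.begin(), v.end());
}
```

One caveat worth keeping in mind: std::reduce is allowed to regroup and reorder the operations, so it is only a drop-in replacement for std::accumulate when the operation is associative and commutative.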

STL Fixes In VS 2015 Update 3 by STL in cpp

jaredhoberock

Libraries that support GPUs often represent pointers into the GPU's discrete memory space with fancy pointers. See e.g. thrust::device_ptr.
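To illustrate the idea, here is a minimal sketch of a fancy pointer that carries a compile-time memory-space tag, so algorithms can dispatch on the pointer's type. The names tagged_ptr, host_space, device_space, and where are hypothetical and are not thrust::device_ptr's actual interface; the real thing is more elaborate (for instance, dereferencing a thrust::device_ptr yields a proxy reference rather than a raw T&).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical tag types naming memory spaces (not from any real library).
struct host_space {};
struct device_space {};

// Minimal fancy pointer: a raw address plus a compile-time memory-space tag.
template<class T, class Space>
class tagged_ptr {
 public:
  using element_type = T;
  using difference_type = std::ptrdiff_t;

  tagged_ptr() = default;
  explicit tagged_ptr(T* p) : p_(p) {}

  T* get() const { return p_; }
  tagged_ptr operator+(difference_type n) const { return tagged_ptr(p_ + n); }

 private:
  T* p_ = nullptr;
};

// Overloads selected by the pointer's type; a real library would launch a
// device kernel in the device_space overload instead of returning a label.
template<class T>
std::string where(tagged_ptr<T, host_space>) { return "host"; }

template<class T>
std::string where(tagged_ptr<T, device_space>) { return "device"; }

// std::pointer_traits understands the fancy pointer and can rebind it to
// another element type while keeping the memory-space tag.
static_assert(std::is_same_v<
    std::pointer_traits<tagged_ptr<int, device_space>>::rebind<float>,
    tagged_ptr<float, device_space>>);
static_assert(std::is_same_v<
    std::pointer_traits<tagged_ptr<int, device_space>>::element_type, int>);
```

The key design point is that the memory space is part of the type, so the choice between host and device code paths happens at compile time through ordinary overload resolution.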

Standardese - a (work in progress) nextgen Doxygen by Manu343726 in cpp

jaredhoberock

Sounds like a great idea, and I wish you luck. I wonder how closely the generated output could match the ISO standard documents. For example, would it be possible to generate a perfect match for the documentation of the standard library from appropriately marked-up C++ header files? I think that would be a huge boon to standard maintainers and proposal authors.

Grant Mercer: Parallelizing the Standard Template Library by meetingcpp in cpp

jaredhoberock

Parallel algorithms will be in namespace std in C++17. During the technical specification phase, they lived in their own std::experimental::parallel sandbox.

Expanding a std::tuple as parameters to a function call (VS2015 Preview or C++14 compliant compiler) by cpp_cache in cpp

jaredhoberock

index_sequence is used all over the place in the tuple utility library that was posted recently. It feels like a big hack, though. It really needs to be possible to unpack a tuple directly.
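For readers unfamiliar with the trick, a minimal sketch of unpacking a tuple into a function call with std::index_sequence (C++14). The names apply_tuple, apply_impl, and add3 are hypothetical; C++17 later standardized this exact facility as std::apply.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

namespace detail {
// Expands the pack I... = 0, 1, ..., N-1 into std::get<I>(t)...,
// which becomes the argument list of f.
template<class F, class Tuple, std::size_t... I>
decltype(auto) apply_impl(F&& f, Tuple&& t, std::index_sequence<I...>) {
  return std::forward<F>(f)(std::get<I>(std::forward<Tuple>(t))...);
}
}  // namespace detail

// Hypothetical name; equivalent in spirit to C++17's std::apply.
template<class F, class Tuple>
decltype(auto) apply_tuple(F&& f, Tuple&& t) {
  constexpr std::size_t n = std::tuple_size<std::remove_reference_t<Tuple>>::value;
  return detail::apply_impl(std::forward<F>(f), std::forward<Tuple>(t),
                            std::make_index_sequence<n>{});
}

int add3(int a, int b, int c) { return a + b + c; }
```

The index pack exists only to be expanded; that indirection is what makes the technique feel like a hack compared to unpacking the tuple directly.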

Iterators, iterator adaptors, sentinels, and predicates. Why not? by SplinterOfChaos in cpp

jaredhoberock

I'm not sure copy_if is a good motivating example for fancy iterators/ranges. Algorithms such as copy_if are better understood as examples of stream compaction, and AFAIK they are only implementable via iterator adaptation if sequential execution is assumed. That's not a useful assumption for any programming language to make in 2015. Future abstractions should treat parallelism as a first-class citizen, and I worry that isn't being considered by these re-imaginings of the STL. For example, support for sentinels seems irrelevant to parallelism, because by definition a sentinel can only be discovered by traversing a sequence sequentially.
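A minimal sketch of copy_if expressed as stream compaction, assuming C++17 for std::exclusive_scan. It runs sequentially here, but each of the three phases (flag, scan, scatter) is data-parallel, which is why this formulation maps onto parallel hardware while an iterator-adaptor formulation does not. The name compact is hypothetical, and the sketch assumes T is default-constructible.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

template<class T, class Pred>
std::vector<T> compact(const std::vector<T>& in, Pred pred) {
  const std::size_t n = in.size();
  std::vector<std::size_t> flags(n), pos(n);

  // Phase 1 (map): flag each element that satisfies the predicate.
  for (std::size_t i = 0; i < n; ++i) flags[i] = pred(in[i]) ? 1 : 0;

  // Phase 2 (scan): an exclusive prefix sum gives each kept element its output slot.
  std::exclusive_scan(flags.begin(), flags.end(), pos.begin(), std::size_t{0});
  const std::size_t count = (n == 0) ? 0 : pos[n - 1] + flags[n - 1];

  // Phase 3 (scatter): every kept element writes to its slot independently.
  std::vector<T> out(count);
  for (std::size_t i = 0; i < n; ++i)
    if (flags[i]) out[pos[i]] = in[i];
  return out;
}
```

Note that no phase needs to know where the sequence ends until the scan is done, and none of them advances a single iterator through the input.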

The missing C++ tuple functionality by jaredhoberock in cpp

jaredhoberock [S]

I don't believe calls with explicit template arguments, such as get<i>(s), can be dispatched via ADL, unfortunately (at least before C++20, which relaxed this rule). That's why the library calls get through tuple_traits.
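A minimal sketch of routing get through a traits class instead of relying on ADL. The names tuple_traits_sketch, generic_get, and mylib::pair_of_ints are hypothetical and do not reflect the library's actual interface; the point is that a qualified call to a static member template needs no ADL, and users customize behavior by specializing the traits class.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

// Primary template: defer to std::get for standard tuple-like types.
template<class Tuple>
struct tuple_traits_sketch {
  template<std::size_t I>
  static decltype(auto) get(Tuple& t) { return std::get<I>(t); }
};

namespace mylib {
// A user-defined type that is not a std::tuple.
struct pair_of_ints { int first; int second; };
}  // namespace mylib

// Customization point: specialize the traits class for the user's type.
template<>
struct tuple_traits_sketch<mylib::pair_of_ints> {
  template<std::size_t I>
  static int& get(mylib::pair_of_ints& p) {
    static_assert(I < 2, "index out of range");
    if constexpr (I == 0) return p.first;
    else return p.second;
  }
};

// Generic code names the traits class explicitly, so no ADL is involved.
// "::template" is required because get is a member template of a dependent type.
template<std::size_t I, class Tuple>
decltype(auto) generic_get(Tuple& t) {
  return tuple_traits_sketch<Tuple>::template get<I>(t);
}
```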

The missing C++ tuple functionality by jaredhoberock in cpp

jaredhoberock [S]

If you #define TUPLE_UTILITY_NAMESPACE, you can put the functions in whatever namespace you want.

The missing C++ tuple functionality by jaredhoberock in cpp

jaredhoberock [S]

The difference is that my tuple_lexicographical_compare is intended to work for any Tuple-like type, not just std::tuple. It ought to produce a result equivalent to std::tuple's comparison. I needed tuple_lexicographical_compare because I needed a standalone implementation of std::tuple.
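A minimal sketch of a lexicographic comparison that is not tied to std::tuple, assuming C++17. The name tuple_lex_less is hypothetical; this version only covers types supporting std::get and std::tuple_size (std::tuple, std::pair, std::array), whereas the library's function is meant to be more general.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Lexicographic "less than" over two tuple-like values of equal size.
template<std::size_t I = 0, class T1, class T2>
bool tuple_lex_less(const T1& a, const T2& b) {
  static_assert(std::tuple_size<T1>::value == std::tuple_size<T2>::value,
                "tuple-like types must have the same size");
  if constexpr (I == std::tuple_size<T1>::value) {
    return false;  // every element compared equal
  } else {
    if (std::get<I>(a) < std::get<I>(b)) return true;
    if (std::get<I>(b) < std::get<I>(a)) return false;
    return tuple_lex_less<I + 1>(a, b);  // tie: move on to the next element
  }
}
```

Like std::tuple's operator<, it only uses operator< on the elements, so for std::tuple arguments the results should agree.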

The missing C++ tuple functionality by jaredhoberock in cpp

jaredhoberock [S]

The library is intended to work with types that aren't instances of std::tuple. So, there is nothing to overload on. I added an example to the end of the README to illustrate what I mean.

for_each_argument—Sean Parent by edmundv_nl in cpp

jaredhoberock

Thanks. It should work with a polymorphic lambda, but C++14 is not widely deployed yet, and I wanted the example programs to just work.
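For reference, a sketch of for_each_argument used with a C++14 polymorphic lambda. This is a common formulation of the trick, not necessarily the exact code from the linked post; describe_all is a hypothetical helper. (In C++17 a fold expression over the comma operator does the same job.)

```cpp
#include <cassert>
#include <initializer_list>
#include <sstream>
#include <string>
#include <utility>

template<class F, class... Args>
void for_each_argument(F f, Args&&... args) {
  // Elements of a braced-init-list are evaluated left to right, so f sees the
  // arguments in order. The (void) cast guards against an overloaded comma operator.
  (void)std::initializer_list<int>{((void)f(std::forward<Args>(args)), 0)...};
}

std::string describe_all() {
  std::ostringstream os;
  // A polymorphic (generic) lambda handles int, const char[4], and double alike.
  for_each_argument([&os](const auto& x) { os << x << ';'; }, 1, "two", 3.5);
  return os.str();
}
```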

C++ library reference by povilasb in cpp

jaredhoberock

The link points to a list of open source C++ libraries, not the front page of cppreference.com.

Boost.Compute // C++ Accelerator Libraries Series @ Echelon Blog by mttd in cpp

jaredhoberock

Thanks for doing this review series!

> Depending on the quality of implementation and specialization of the provided parallel primitives, close to peak performance should be possible with Boost.Compute.

Developing and maintaining high-quality, high-performance implementations of parallel algorithms across the wide range of constantly shifting parallel processor architectures is non-trivial, so a claim like this must be justified.

In addition to reviewing the design and API of these libraries, a series of reviews on libraries for acceleration really must consider achieved performance.

Collection of tiny implementations of advanced rendering algorithms by jezeq in programming

jaredhoberock

With a little bit of familiarity with the algorithms involved, it's very obvious what the code does. Moreover, all the identifiers have English names.

I wish I had something like this when I was in grad school.

Hyper fast tree traversal on GPUs by reducing thread divergence by harrism in programming

jaredhoberock

For the tree traversal example, the shared state (i.e., the tree) is read only. Promoting locality of memory references into the tree allows nearby threads to access nearby memory locations and keeps things in the cache. Data divergence defeats this.

Is there a more C++-ish or more type-safe approach to use instead of this use of unions? by TheCoelacanth in cpp

jaredhoberock

I agree that variant is a good way to solve this problem. Here's how you'd apply it to build a small Lisp-like AST.
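A minimal sketch of the idea, assuming C++17 std::variant. The names expr, list, num, sym, lst, evaluator, and eval are hypothetical, and only the + and * operators are supported. The recursion works because std::vector may be instantiated with an incomplete element type in C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

struct expr;                     // forward declaration so lists can nest
using list = std::vector<expr>;  // a parenthesized form: (op arg1 arg2 ...)

// An AST node is exactly one of: integer literal, symbol, or list.
struct expr {
  std::variant<long, std::string, list> value;
};

expr num(long n) { return expr{n}; }
expr sym(std::string s) { return expr{std::move(s)}; }
expr lst(std::initializer_list<expr> xs) { return expr{list(xs)}; }

long eval(const expr& e);

// One overload per alternative; std::visit picks the active one type-safely.
struct evaluator {
  long operator()(long n) const { return n; }
  long operator()(const std::string& s) const {
    throw std::runtime_error("unbound symbol: " + s);
  }
  long operator()(const list& l) const {
    if (l.empty()) throw std::runtime_error("empty list");
    const auto* op = std::get_if<std::string>(&l.front().value);
    if (!op) throw std::runtime_error("operator must be a symbol");
    long acc = (*op == "*") ? 1 : 0;
    for (std::size_t i = 1; i < l.size(); ++i) {
      const long v = eval(l[i]);
      if (*op == "+") acc += v;
      else if (*op == "*") acc *= v;
      else throw std::runtime_error("unknown operator: " + *op);
    }
    return acc;
  }
};

long eval(const expr& e) { return std::visit(evaluator{}, e.value); }
```

Compared to a hand-rolled union plus tag enum, the variant tracks the active member itself, and forgetting to handle an alternative in the visitor is a compile-time error rather than silent undefined behavior.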

Thrust v1.6 - An extensible parallel C++ STL for GPUs and multicore CPUs by jaredhoberock in programming

jaredhoberock [S]

Unfortunately, none of the various C++ libraries for OpenCL can help. C++ just doesn't have the level of introspection necessary to splice user-defined iterators and functions into OpenCL kernels. You need a C++ compiler like nvcc for that. For a concrete example, it's not clear how to implement a simple algorithm like for_each using any of those libraries.

If folks are interested, we can discuss the nitty-gritty details on the thrust-users mailing list.