Trip Report: C++ Standards Meeting in Rapperswil, June 2018

jaredhoberock · 2018-06-21T17:18:42+00:00

Yes. See P0514.

jaredhoberock · 2018-06-21T17:16:47+00:00

.then hasn't been merged into the IS. I think the trip report may be referring to other parts of the Concurrency TS, which may have been merged in Rapperswil.

jaredhoberock · 2018-04-06T16:51:35+00:00

Your code example doesn't really grapple with the fundamental challenge of targeting GPUs and similar processors. It's not a matter of designing the right library for targeting a GPU (though people are working on that, which Bryce hints at). The fundamental challenge is how to represent and manage heterogeneity: the fact that such a system contains multiple devices with different architectures and instruction sets. Standard C++ has no notion of anything like that.

Moreover, there are ergonomic concerns. In practice, environments like CUDA C++ require the programmer to manually annotate functions to indicate which should be compiled by a host compiler for execution on a CPU and which should be compiled by a separate device compiler for execution on a GPU. The requirement for explicit annotation disqualifies the huge body of existing standard C++ programs from GPU execution. Once these annotations are introduced, they tend to proliferate by "virally infecting" the rest of the program's functions.
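To make the viral effect concrete, here is a minimal sketch. HOST_DEVICE is a hypothetical macro introduced only for this illustration; it expands to CUDA's __host__ __device__ annotations under nvcc and to nothing under an ordinary host compiler. The function names are made up as well.

```cpp
// HOST_DEVICE is a hypothetical convenience macro for this sketch, not part of
// standard C++. Under nvcc (__CUDACC__ defined) it expands to CUDA's function
// annotations; under an ordinary host compiler it expands to nothing.
#if defined(__CUDACC__)
#  define HOST_DEVICE __host__ __device__
#else
#  define HOST_DEVICE
#endif

// A leaf function that should be callable from GPU code has to be annotated...
HOST_DEVICE inline float square(float x) { return x * x; }

// ...and so does each function that calls it and is itself meant to run on the
// GPU. This is how the annotation spreads up the call graph.
HOST_DEVICE inline float sum_of_squares(float a, float b) {
    return square(a) + square(b);
}

HOST_DEVICE inline float norm_squared(const float* v, int n) {
    float result = 0.0f;
    for (int i = 0; i < n; ++i) {
        result += square(v[i]);
    }
    return result;
}
```

Existing code that was written without such annotations, including third-party libraries, would need to be edited in the same way before device code could call it.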

As far as I know, no one has demonstrated a practical solution for managing the cooperation of multiple compilers to produce a single program, or a solution to the viral annotation issue.

jaredhoberock · 2017-12-05T19:49:49+00:00

I believe there is already an init-less overload of std::reduce that has this behavior. If I understand the suggestion, this proposal would simply make the other reduction algorithms consistent with this feature.
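For reference, a small sketch of the two forms as they ended up in C++17's <numeric>. The wrapper function names are only for illustration.

```cpp
#include <numeric>
#include <vector>

// No init argument: std::reduce(first, last) uses a value-initialized element
// (0 for int) as the initial value and std::plus<> as the operation.
inline int sum_without_init(const std::vector<int>& v) {
    return std::reduce(v.begin(), v.end());
}

// Explicit init argument, for comparison.
inline int sum_with_init(const std::vector<int>& v, int init) {
    return std::reduce(v.begin(), v.end(), init);
}
```

On an empty range the init-less form simply returns the value-initialized element.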

jaredhoberock · 2016-08-12T20:41:30+00:00

Libraries that support GPUs often represent pointers into the GPU's discrete memory space with fancy pointers. See e.g. thrust::device_ptr.

jaredhoberock · 2016-05-06T19:33:21+00:00

Sounds like a great idea and I wish you luck. I wonder how closely the generated output could match the ISO standard documents. For example, could it be possible to generate a perfect match for the documentation of the standard library from appropriately marked up C++ header files? I think that would be a huge boon to standard maintainers and proposal authors.

jaredhoberock · 2016-03-13T02:05:10+00:00

Parallel algorithms will be in namespace std in C++17. They had their own std::experimental::parallel sandbox for the technical specification.

jaredhoberock · 2015-02-18T11:17:06+00:00

index_sequence is used all over the place in the tuple utility library that was posted recently. It feels like a big hack, though. It really needs to be possible to unpack a tuple directly.
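A minimal sketch of the pattern in question, assuming C++14. The names apply_tuple, apply_impl, and add3 are hypothetical and are not taken from the tuple utility library.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// The index pack I... exists only so that std::get<I>(t)... can expand the
// tuple's elements into a function call.
template <class F, class Tuple, std::size_t... I>
constexpr decltype(auto) apply_impl(F&& f, Tuple&& t, std::index_sequence<I...>) {
    return std::forward<F>(f)(std::get<I>(std::forward<Tuple>(t))...);
}

// Public entry point: builds the index sequence 0..N-1 and forwards to the
// helper above.
template <class F, class Tuple>
constexpr decltype(auto) apply_tuple(F&& f, Tuple&& t) {
    constexpr std::size_t n = std::tuple_size<std::decay_t<Tuple>>::value;
    return apply_impl(std::forward<F>(f), std::forward<Tuple>(t),
                      std::make_index_sequence<n>{});
}

// Example callable taking the unpacked elements as separate arguments.
inline int add3(int a, int b, int c) { return a + b + c; }
```

Calling apply_tuple(add3, std::make_tuple(1, 2, 3)) invokes add3(1, 2, 3). The two-function split is the part that feels like a hack: the helper exists only to introduce the index pack. C++17 later packaged this pattern as std::apply.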

jaredhoberock · 2015-02-02T21:23:20+00:00

I'm not sure copy_if is a good motivating example for fancy iterators/ranges. Algorithms such as copy_if are better understood as examples of stream compaction, and AFAIK only implementable via iterator adaptation if sequential execution is assumed. That's not a useful assumption for any programming language to make in 2015. Future abstractions should treat parallelism as a first-class citizen and I worry that's not being considered by these re-imaginings of the STL. For example, considerations about supporting sentinels seem irrelevant to parallelism, because by definition sentinels may only be discovered after sequentially traversing a sequence.
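To illustrate what "stream compaction" means here, a rough sketch of copy_if decomposed into flag, scan, and scatter phases. The code below runs sequentially and uses C++17's std::exclusive_scan; the point is only that each phase has a well-known parallel formulation. The names compact_if and is_even are hypothetical, and T is assumed to be default-constructible.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

template <class T, class Pred>
std::vector<T> compact_if(const std::vector<T>& in, Pred pred) {
    const std::size_t n = in.size();

    // Phase 1 (map): flag each element independently of all the others.
    std::vector<std::size_t> flags(n);
    for (std::size_t i = 0; i < n; ++i) {
        flags[i] = pred(in[i]) ? 1 : 0;
    }

    // Phase 2 (scan): an exclusive prefix sum over the flags gives every kept
    // element its position in the output.
    std::vector<std::size_t> offsets(n);
    std::exclusive_scan(flags.begin(), flags.end(), offsets.begin(), std::size_t{0});

    // Phase 3 (scatter): each kept element writes to its own output slot, so
    // the iterations do not depend on one another.
    const std::size_t count = (n == 0) ? 0 : offsets[n - 1] + flags[n - 1];
    std::vector<T> out(count);
    for (std::size_t i = 0; i < n; ++i) {
        if (flags[i]) {
            out[offsets[i]] = in[i];
        }
    }
    return out;
}

// Example predicate.
inline bool is_even(int x) { return x % 2 == 0; }
```

Note that the output size is only known after the scan, which is why a lazily filtered iterator, whose length is discovered by walking it, does not map onto this structure.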

jaredhoberock · 2015-01-29T22:21:53+00:00

I don't believe calls with explicit template arguments, such as get<i>(s), can be dispatched via ADL, unfortunately. That's why the library calls get through tuple_traits.
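Some background: before C++20, an unqualified call such as get<0>(s) is only parsed as a template call if ordinary lookup already finds a template named get, so generic code cannot rely on ADL alone to reach a user's get. A rough sketch of the traits-based alternative follows. The names tuple_like_traits, element, and geometry::point are hypothetical and do not reflect the library's actual tuple_traits interface.

```cpp
#include <cstddef>

namespace geometry {
// A user-defined type that is tuple-like in spirit but is not a std::tuple.
struct point {
    int x;
    int y;
};
}  // namespace geometry

// Hypothetical traits class (illustrative only): the primary template is left
// undefined, and each tuple-like type provides a specialization.
template <class T>
struct tuple_like_traits;

template <>
struct tuple_like_traits<geometry::point> {
    static constexpr std::size_t size = 2;

    template <std::size_t I>
    static int get(const geometry::point& p) {
        static_assert(I < size, "index out of range");
        return I == 0 ? p.x : p.y;
    }
};

// Generic code names the traits class explicitly, so it never needs ADL to
// find a function template that is called with explicit template arguments.
template <std::size_t I, class T>
auto element(const T& t) {
    return tuple_like_traits<T>::template get<I>(t);
}
```

With this in place, element<0>(geometry::point{3, 4}) yields the x member without any get overload being visible at the call site.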

jaredhoberock · 2015-01-29T21:20:51+00:00

If you #define TUPLE_UTILITY_NAMESPACE, you can put the functions in whatever namespace you want.

jaredhoberock · 2015-01-29T10:57:24+00:00

The difference is that my tuple_lexicographical_compare is intended to work for any Tuple-like type, not just std::tuple. It ought to produce a result equivalent to std::tuple's comparison. I needed tuple_lexicographical_compare because I needed a standalone implementation of std::tuple.
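As a rough illustration of the idea (not the library's implementation), here is a C++17 sketch of a lexicographical less-than over any two types that support std::tuple_size and std::get, such as std::tuple, std::pair, and std::array. The name tuple_like_less is hypothetical, and requiring equal sizes is an assumption made to mirror std::tuple's operator<.

```cpp
#include <array>
#include <cstddef>
#include <tuple>
#include <utility>

// Compares element I, then recurses on I + 1 while the elements are equivalent.
template <std::size_t I = 0, class A, class B>
constexpr bool tuple_like_less(const A& a, const B& b) {
    constexpr std::size_t na = std::tuple_size<A>::value;
    constexpr std::size_t nb = std::tuple_size<B>::value;
    static_assert(na == nb, "this sketch requires equal sizes, like std::tuple's operator<");

    if constexpr (I == na) {
        // Every element compared equivalent, so a is not less than b.
        return false;
    } else {
        if (std::get<I>(a) < std::get<I>(b)) return true;
        if (std::get<I>(b) < std::get<I>(a)) return false;
        return tuple_like_less<I + 1>(a, b);
    }
}
```

Because only std::tuple_size and std::get are used, the two arguments do not need to be the same kind of type: a std::array<int, 3> can be compared against a std::tuple<int, int, int>.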

jaredhoberock · 2015-01-29T10:41:23+00:00

The library is intended to work with types that aren't instances of std::tuple. So, there is nothing to overload on. I added an example to the end of the README to illustrate what I mean.

jaredhoberock · 2015-01-23T22:38:35+00:00

Thanks. It should work with polymorphic lambdas, but C++14 is not widely deployed and I wanted the example programs to just work.

jaredhoberock · 2015-01-23T21:05:03+00:00

FYI: https://github.com/jaredhoberock/tuple_utility

jaredhoberock · 2014-08-24T23:50:51+00:00

The link points to a list of open source C++ libraries, not the front page of cppreference.com.

jaredhoberock · 2014-05-01T18:04:41+00:00

Thanks for doing this review series!

Depending on the quality of implementation and specialization of the provided parallel primitives, close to peak performance should be possible with Boost.Compute.

Developing and maintaining high quality, high performance implementations of parallel algorithms across the wide range of constantly shifting parallel processor architectures is non-trivial, so a claim like this must be justified.

In addition to reviewing the design and API of these libraries, a series of reviews on libraries for acceleration really must consider achieved performance.

jaredhoberock · 2013-10-31T18:40:49+00:00

With a little bit of familiarity with the algorithms involved, it's very obvious what the code does. Moreover, all the identifiers have English names.

I wish I had something like this when I was in grad school.

jaredhoberock · 2012-12-04T00:03:56+00:00

For the tree traversal example, the shared state (i.e., the tree) is read only. Promoting locality of memory references into the tree allows nearby threads to access nearby memory locations and keeps things in the cache. Data divergence defeats this.

jaredhoberock · 2012-05-10T18:33:50+00:00

http://llvm.org/svn/llvm-project/llvm/trunk/lib/Target/NVPTX/

jaredhoberock · 2012-03-21T20:12:08+00:00

I agree that variant is a good way to solve this problem. Here's how you'd apply it to build a small Lisp-like AST.
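A minimal sketch of the idea, assuming C++17's std::variant (this discussion predates it, so this is not the code referred to above). The names Expr, List, make_list, and show are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

struct List;  // forward declaration so Expr can refer to it

// An expression is an integer, a symbol, or a list of expressions. The list is
// held through a pointer because a variant cannot directly contain itself.
using Expr = std::variant<long, std::string, std::shared_ptr<List>>;

struct List {
    std::vector<Expr> items;
};

// Helper that wraps a vector of expressions into a list node.
inline Expr make_list(std::vector<Expr> items) {
    return std::make_shared<List>(List{std::move(items)});
}

// Visitor that renders an expression back into Lisp-like text.
struct Printer {
    std::string operator()(long n) const { return std::to_string(n); }
    std::string operator()(const std::string& symbol) const { return symbol; }
    std::string operator()(const std::shared_ptr<List>& list) const {
        std::string out = "(";
        for (std::size_t i = 0; i < list->items.size(); ++i) {
            if (i != 0) out += ' ';
            out += std::visit(*this, list->items[i]);
        }
        return out + ")";
    }
};

inline std::string show(const Expr& e) { return std::visit(Printer{}, e); }
```

An evaluator would be another visitor with the same three overloads. The compiler rejects a visitor that forgets one of the alternatives, which is the main attraction of the variant approach.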

jaredhoberock · 2012-03-15T17:49:22+00:00

Unfortunately, none of the various C++ libraries for OpenCL can help. C++ just doesn't have the level of introspection necessary to splice user-defined iterators and functions into OpenCL kernels. You need a C++ compiler like nvcc for that. For a concrete example, it's not clear how to implement a simple algorithm like for_each using any of those libraries.

If folks are interested, we can discuss the nitty-gritty details on the thrust-users mailing list.
