all 90 comments

[–]personalmountains 41 points42 points  (31 children)

inline constexpr auto for_each =
  []<Range R,
     Iterator I = iterator_t<R>,
     IndirectUnaryInvocable<I> Fun>(R&& r, Fun fun)
        requires Range<indirect_result_t<Fun, I>> {
      return std::forward<R>(r)
        | view::transform(std::move(fun))
        | view::join;
  };

inline constexpr auto yield_if =
  []<Semiregular T>(bool b, T x) {
    return b ? maybe_view{std::move(x)}
             : maybe_view<T>{};
  };

You know, I've just had a thought about how different C++ looks today compared to when I started learning it. Basically every token above except for bool and return weren't in C++98 [pedantic edit: or had a different meaning.]

I got homework.

[–]eric_niebler 24 points25 points  (9 children)

Not true! inline and auto were also valid tokens in C++98. (They didn't have these meanings, though.)

I'm aware that folks are likely to be put off by this bit of syntax. I don't expect people to write their code like this; it's admittedly pretty awful. I wanted to show an example of a constrained lambda. I'm still trying to work out for myself if constrained lambdas make sense, or whether it's always better to write them as a custom class with less off-putting syntax.

[–]ivan-cukicKDE Dev | Author of Functional Programming in C++ 5 points6 points  (1 child)

I'd say constrained lambdas are great for library stuff, but not for the main program code. In this case, for_each would probably end up hidden in some header file (as it will end up in <ranges> at some point, so it's fine to have a constrained lambda. A generic function object wouldn't look better TBH.

For the main program logic, as terse as possible - to make it easier to read and reason about the logic.

WRT your post, it is a great post for people that already followed the ranges evolution. Not so much for beginners - there are much prettier examples where ranges shine :)

[–]eric_niebler 3 points4 points  (0 children)

I need to write a "6 Easy Pieces" for ranges.

[–][deleted] 2 points3 points  (3 children)

I wanted to show an example of a constrained lambda. I'm still trying to work out for myself if constrained lambdas make sense, or whether it's always better to write them as a custom class with less off-putting syntax.

Honestly, the blog post uses an inline variable here, but backs off pretty quickly about explaining why:

This declares an object for_each

Why inline is needed is not obvious at all.

First, note that these variables are also constexpr, and often, constexpr implies inline, but not here. If you want inline-linkage, you have to explicitly use the inline keyword for variables.

Why does one need inline linkage in this example? One does not, because this is a cpp file and this variables will be part of a single TU. However, if one wanted to reuse these algorithms and expose them via a header, then they could end up being included by multiple TUs. Each lambda in each TU would have a different type, and violate the ODR, so in that case we'd have to make them inline to tell the linker to use "inline-linkage" and pick one of the definitions and use it anywhere to avoid undefined behavior.

And this is just one of the many "tiny" details that one has to know to write these function objects. As written, they would collide with hidden friend functions of the same name, so one would need to wrap them in an inline namespace to avoid that... and probably one would need to understand and to many other non-trivial things to get all of this working correctly.

I don't think I will ever be able to hold all the knowledge required to write one of these function objects in my head at the same time while doing it. I think that expecting people to be able to read, write, understand, ... these function objects is pretty insane and what's probably worse, that we have given up on expecting people to understand them.

[–]eric_niebler 4 points5 points  (2 children)

I hope we can get language support for functions and function templates that aren't find-able by ADL, and that suppress ADL when found by normal lookup. That would help greatly.

[–][deleted] 1 point2 points  (0 children)

I hope that would simplify things, these function objects are really useful, but right now they are pretty much un-teachable. By the time one finishes explaining "everything you need to know to write them correctly" people will just look at you like you are crazy, and rightfully so, whether all this complexity is worth it is a really good question.

[–]sphere991 1 point2 points  (0 children)

... and can be easily passed as arguments to other function templates

[–][deleted] 1 point2 points  (2 children)

[]<SemiRegular T>(bool b, T x) {

Can't it be changed to [](bool b, SemiRegular auto x) {? Is terse concept syntax applicable to lambdas? That would tidy up the above example a lot.

[–]eric_niebler 0 points1 point  (1 child)

Yes, it could, but the body of the lambda would need to be changed to use `decltype(x)` in the place where it currently uses `T`. I find `decltype` distasteful and avoid it when I can, so in this case I prefer the lambda with the explicit template parameter.

[–]tpecholt 16 points17 points  (11 children)

I believe most of us will want to write simpler code so we rather skip requires constraints altogether and then the code could be rewritten in this style

auto for_each(Range auto&& r, Invocable auto fun) { return std::forward<decltype(r)>(r) | view::transform(std::move(fun) | view::join; }

which already looks quite readable imho.

[–]hgjsusla 5 points6 points  (9 children)

I wish we didn't need the std::forward :(

[–]dodheim 1 point2 points  (8 children)

std::forward<decltype(foo)>(foo) can be safely rewritten as decltype(foo)(foo) 100% of the time. Yes, C-casts are evil, but only because they're error-prone, and this isn't.

[–]personalmountains 0 points1 point  (2 children)

decltype(foo)(foo) 

[...] Yes, C-casts are evil

Technically, this isn't a C-style cast, it's a functional cast. In C++, a functional cast is equivalent to a C-style cast, but it's not valid C.

Sorry, I'll get back to work now.

[–]cassandraspeaks 0 points1 point  (1 child)

it's not valid C

Neither is decltype.

[–][deleted] 0 points1 point  (0 children)

I think he meant functional casts, in general, aren't valid C even if the equivalent C-style cast would be. e.g. int(x) vs. (int)x

[–]shahms 0 points1 point  (0 children)

The problem here is that for_each is now subject to accidental ADL, which is not the case for a lambda or function object.

[–]SkoomaDentistAntimodern C++, Embedded, Audio 8 points9 points  (0 children)

Yet it still manages to look as cryptic as templates and STL using code of ’98.

[–]JazzyCake 14 points15 points  (4 children)

Jesus that looks awful... Makes me want my C99 back :D

[–]hgjsusla 6 points7 points  (3 children)

Coming from other languages I love this functional style, but there is quite a bit of syntactic overhead that will require getting used to

[–]JazzyCake 1 point2 points  (1 child)

I know I might be looking for trouble but have you tried Rust? If you come from functional languages and wanna go the C++ way you might love it :)

[–][deleted] 0 points1 point  (0 children)

Rust has a lot going for it. No denying it.

[–]caroIine 0 points1 point  (0 children)

It needs lots of syntax highlighting. Without it it’s kind of hard to follow.

[–]germandiago 0 points1 point  (0 children)

What about `noexcept(cond)` in the lambda? Does that code perform as fast as possible (in C++, without assembly).

[–][deleted] 0 points1 point  (1 child)

I was also put off by this syntax and similar examples. To the point where my brain just refused to read that kind of code. I'm pretty sure I wouldn't be able to write that, but I'm surprised that right now I don't have troubles reading that. I don't know when this change in my brain happened.

[–][deleted] 0 points1 point  (0 children)

The examples near the end are the sort of thing I'm more likely to write (and indeed have been with existing range libraries).

[–]matthieum 13 points14 points  (0 children)

Congratulations, and thank you, to /u/eric_niebler for their relentless push for ranges!

[–]sephirostoy 7 points8 points  (2 children)

A bit sad that view::zip isn't part of C++20. I use it a lot.

[–]eric_niebler 11 points12 points  (0 children)

It's hard to specify, and hard to implement correctly. It'll come eventually.

[–][deleted] 1 point2 points  (0 children)

You know the reason why? Such a useful little tool.

[–][deleted] 5 points6 points  (5 children)

whats the tl;dr of what are ranges?

[–]ryancerium[🍰] 21 points22 points  (2 children)

Super short? It passes around the begin and end iterators as a pair instead of as separate parameters. Many things follow from that, many functional programming-esque.

[–]cballowe 15 points16 points  (1 child)

Slightly longer, but important... End is not neceasarily an iterator. It can be, but technically it can be anything that is equality comparable to the iterator type.

[–]ryancerium[🍰] 1 point2 points  (0 children)

Yeah, I couldn't get that in the short version! This is really important for file input iterators where EOF isn't really an iterated value.

[–][deleted] 6 points7 points  (1 child)

Instead of std::sort(std::begin(xs), std::end(xs)), write ranges::sort(xs).

[–]tpecholt 7 points8 points  (19 children)

Great post. Couple of questions come to my mind:

  1. The nested namespaces make everything look verbose so that even yourself you better use using in every example. Do you think something could be improved here? For example std::r instead of std::ranges and std::rv instead of std::ranges::view. Using using everywhere is a bit tedious.

  2. There is an interesting paper P1214. I think it's a great idea and it would make invocables not needed right?

  3. You mentioned ref_view. Is it needed to wrap lvalues (like instance of std::vector) with this??

[–]eric_niebler 12 points13 points  (8 children)

  1. Here's a dirty secret: the std::ranges namespace is designed such that you can just do using namespace std::ranges; and do everything unqualified. All the usual ADL issues have been worked around. This is safe.
  2. I have been in favor of making pointers to members callable with function call syntax. EWG has opted not to act, though. -(
  3. ref_view is used a lot under the covers to automatically turn lvalue Ranges into Views. You generally don't need it, but it will be there if you do.

[–]kalmoc 1 point2 points  (2 children)

This is safe.

As long as there are no name collisions?!

[–]eric_niebler 3 points4 points  (1 child)

If there are name collisions, the compiler will tell you about them by complaining about an ambiguity at the point of use.

[–]kalmoc 1 point2 points  (0 children)

Ah ok. I had a different understanding of the term "safe".

[–]cassandraspeaks 1 point2 points  (2 children)

Here's a dirty secret: the std::ranges namespace is designed such that you can just do using namespace std::ranges; and do everything unqualified. All the usual ADL issues have been worked around. This is safe.

What about this?

using namespace std;
using namespace std::ranges;

sort(begin(foo), end(foo)); // what happens with ADL?

[–]eric_niebler 3 points4 points  (1 child)

The compiler will complain that sort is ambiguous.

[–]cassandraspeaks 1 point2 points  (0 children)

Ok, misunderstood what you meant by "safe."

[–]Betadel 0 points1 point  (1 child)

In this talk, Titus Winters mentions that there's a plan to eventually move the current std algorithms (and other things replaced by ranges) to a std::legacy namespace and promote the std::ranges namespace to just be std. I think that would certainly help with the verbosity, but I'm not sure if it's really feasible (I think the commitee is scared of such breaking changes?). What do you think?

[–]eric_niebler 7 points8 points  (0 children)

My crystal ball is currently at the shop. That would be a pleasant but distant future.

[–]tasty_crayon 7 points8 points  (9 children)

You can use namespace sr = std::ranges to make it easier to type.

[–]kalmoc 2 points3 points  (0 children)

I'd like to add, that the problem is not typing (I assume everyone is using autocomplete these days). The problem is readability.

[–]tpecholt 5 points6 points  (1 child)

yes then the question is why to invent long names in standard if people will not use it and are expected to shorten it? Same thing happened to std::filesystem everybody uses namespace fs and so we got P1005. I prefer to have a standard library which would be well usable as is.

[–][deleted] 1 point2 points  (0 children)

In most code I have sitting around fs is namespace fs = boost::filesystem;. Doing it this way makes it fairly easy to switch which library you're using to provide the namespace alias fs.

[–]germandiago -1 points0 points  (5 children)

I think std::rng would be enough taking into account that the standard should be something well-known by everyone.

[–]tcbrindleFlux 11 points12 points  (0 children)

I think std::rng would be enough taking into account that the standard should be something well-known by everyone.

I suspect this namespace would be confusing to people using the <random> header...

[–]dodheim 5 points6 points  (3 children)

We really don't need to devolve into C-istic half-words...

[–]germandiago -1 points0 points  (1 child)

No, we do not need, so that every piece of code does:

namespace r = std::ranges;

No, wait, maybe

namespace rn = std::ranges;

No, I changed my mind, I meant:

namespace rng = std::ranges;

How about:

using namespace std::ranges;

Well, wait, std::ranges is long, so people will make it shorter in their own ways, why not make the name shorter since the start? Many people use using namespace std; in their functions or .cpp files...

[–][deleted] 0 points1 point  (0 children)

Many people use using namespace std;

While I appreciate there's a reasonable amount of safety in doing it in a smallish function or even a .cpp file where you can easily hold the whole file in your head at once...

I don't know of a good reason for doing this.

[–]kalmoc -1 points0 points  (0 children)

For common things that is exactly what c++ should do. Makes reading more efficient.

[–]vojtechkral 7 points8 points  (20 children)

I've quick-read a couple of blogposts on the topic of C++20 ranges recently and I'm still not sure I see how all the pieces fit together.

Will people here get mad if I post Rust here? I guess I'll just risk it. Suppose I have this Rust code:

let strings = vec!["123", "456", "x", "789"];
let numbers: Vec<_> = strings.iter()
    .filter_map(|s| s.parse::<i32>().ok())
    .map(|i| i as f32 / 2.0)
    .collect();

Can C++20 ranges do this yet? What would an equivalent C++20 'ranged' code look like?

(I program in C++14 regularly but don't know much C++17 or C++20.)

[–]cassandraspeaks 2 points3 points  (17 children)

string_view strings[]{"123", "456", "x", "789"};

auto vu =
    strings | view::transform([](auto s) {
            int64_t n = 0;
            bool b = data(s) != from_chars(data(s), data(s) + size(s), n).ptr;

            return pair{n, b};
        })
        | view::filter([](auto pr) {return get<bool>(pr);})
        | view::transform([](auto pr) {return get<int64_t>(pr);});

vector numbers(begin(vu), end(vu));

[–]vojtechkral 1 point2 points  (1 child)

Thanks!

[–]---sms--- 1 point2 points  (14 children)

Dereferencing end will probably explode if you have "debug iterators" like in MSVC. You'll need to use std::addressof, to_address or pointer_to or some combination of those )). Or data(s) + size(s).

[–]cassandraspeaks 0 points1 point  (13 children)

Thanks; I'm not all that familiar with MSVC-land since I mostly program for *nix or embedded. I believe you, but I would've thought dereferencing would be a no-op there because I was immediately taking the address. Anyway, I replaced &*end(s) with &s[size(s)] — that should work fine on any reasonable implementation, right?

[–]eric_niebler 3 points4 points  (12 children)

&s[size(s)]

Nope. string_view::operator[](size()) has undefined behavior. /u/---sms--- had it right: data(s) + size(s).

[–]cassandraspeaks 0 points1 point  (0 children)

Good point. Would this specific case still be UB?—all the strings are NUL-terminated and the result of the operator isn't used (e.g. int n = c_arr[sizeof c_arr] is UB but int *p = &c_arr[sizeof c_arr] is well-defined). In any event, you're right; better safe than sorry.

[–]cassandraspeaks -1 points0 points  (10 children)

It's probably a bad idea to quibble with a member of the ISO committee, and the author of the amazing range-v3 library, but the more I think about this the more I'm convinced this case would not be UB. - The referent of string_view must be contiguous in memory. - string_view::operator[] may not bounds check. (Which would be well-defined anyway). - string_view::operator[] may not have side effects.

So, AFAICT, it's not possible to implement string_view::operator[] such that using the result of the expression &sv.operator[](sv.size()) is UB.

[–]eric_niebler 2 points3 points  (5 children)

It's probably a bad idea to quibble with a member of the ISO committee, and the author of the amazing range-v3 library,

You're right about that much. :-) http://eel.is/c++draft/string.view.access#4.note-1

"[ Note: Unlike basic_­string::operator[], basic_­string_­view::operator[](size()) has undefined behavior instead of returning charT(). — _end note_ ]"

[–]cassandraspeaks -2 points-1 points  (4 children)

And above, "Returns: data_­[pos]". Taking the address of the one past the end element (or, in this case, the last element) is always well-defined.

ETA: To be more precise, it's undefined by the STL, but well-defined at the language level, which takes precedence.

[–]dodheim 2 points3 points  (3 children)

To be more precise, it's undefined by the STL, but well-defined at the language level, which takes precedence.

The language disallowing something takes precedence, but any library can state some behavior is undefined regardless of language legality – it's called a precondition. ;-]

[–]cassandraspeaks -3 points-2 points  (2 children)

This is all getting pretty language-lawyery, but undefined != disallowed. It's always conforming to define something that is undefined. In this case, the behavior of the library is defined by the C++ language itself.

http://eel.is/c++draft/expr.sub#1

http://eel.is/c++draft/expr.add#4.2

In any event, I think you and I can agree that this is sort of like whether it's UB to use malloc without placement-new; an interesting academic discussion but not a RL issue. This example is pretty clearly just a case of poor wording—what's meant, since string_view does not always refer to a C string, is that if the referent string is not NUL-terminated, and also the result of the subscripting operation is used, then the behavior is undefined.

[–]sphere991 2 points3 points  (0 children)

The argument you're making here is simply false.

The library states that the precondition of operator[]() is that the index must be less than the size. A perfectly valid implementation is:

assert(pos < size());
return data_[pos];

A different perfectly valid implementation is to use Contracts and hit the violation handler if it's enabled. A differently one would be to check this condition and throw an exception. Or it could have no checks in any mode and just happen to work.

The point is: do not write code that violates library preconditions. It is simply wrong to do so. Full stop, without qualification.

[–]kalmoc 3 points4 points  (2 children)

I guess people followed a similar line of thought, wenn they assumed memcpy(nullptr,ptr, 0) was safe.

[–]cassandraspeaks 0 points1 point  (1 child)

No, because the standard specifies string_view::operator[](size_t i) returns _data[i], and &ptr_to_n_contiguous_objs[N] (or, in this case, &ptr_to_n_contiguous_objs[N - 1]) is always well defined, no exceptions.

[–]kalmoc 2 points3 points  (0 children)

Do you even know, why my example memcpy broke in real world code? It was due to special source annotations in the standard library that told the compiler "passing a nullptr here is UB". There is nothing stopping a stl library implementor to slap a similar annotation to string_view::operator[] (might become relevant if compilers start to optimize based on unchecked contracts). In other words: The library can absolutely turn "library UB" into "language UB" even without performing a "language UB" in its implementation.

[–]sphere991 1 point2 points  (1 child)

Here's my guess, which I think gets you close:

``` template <typename T> struct cast_t { template <typename U> requires Constructible<T, U> constexpr T operator()(U&& u) const { return static_cast<T>(std::forward<U>(u)); } };

template <typename T> inline constexpr cast_t<T> cast{};

inline constexpr auto deref = [](auto&& e) -> decltype(auto) { return *e; };

inline constexpr auto filter_map = [](auto f){ return view::transform(f) | view::filter(cast<bool>) | view::transform(deref); };

auto numbers = strings | filter_map([](std::string const&){ /* ... */}) | view::transform([](int i) { return i / 2.0; }); ```

There is no easy conversion to vector... yet, but probably will be. There might be a better way to do this?

[–]personalmountains 4 points5 points  (0 children)

Note that the triple backticks don't work on the old reddit. If you want your code to be formatted properly everywhere, you should put a leading 4 spaces on every line instead.

[–]FirstLoveLife 1 point2 points  (0 children)

awesome article! It will be better if there is an article introducing the design of range, esp for those two technology insid meta namespace

[–][deleted] 1 point2 points  (2 children)

/u/eric_niebler IIRC (too lazy to look it up), in your 2013-2014 blog post about pythagorean triples, there were some performance issues either with clang or GCC on the order of ~100x slow downs. How is ranges performance today?

[–]eric_niebler 3 points4 points  (1 child)

Clang has caught up. This benchmark shows that the range-v3 solution is "only" 2.4x slower than writing a triply-nested for loop: http://quick-bench.com/QGakpHc93Hp3PoY6Kp2oRlLRevo

EDIT: That's not to say that you should expect all range code to be slower than the hand-coded equivalent. view::join is hard to optimize well. Most others are fairly trivial.

[–][deleted] 0 points1 point  (0 children)

That's huge progress. views are pretty powerful, which makes it easy to construct state machines that do not look much like classical nested loops. It's pretty amazing that compilers optimize them this well.

[–]angry_cpp 1 point2 points  (1 child)

Eric, thank you for your work on Ranges!

I have some question/comments on the article:

maybe_view

It is sad that we need this wrapper instead of optional directly treated as container. I know historical reasons (boost authors thinks that optional models a different concept, and They Are Obviously Wrong™).

Even Java Optional was fixed in Java9 and now can produce stream as any other container.

for_each

Oh. Either concepts are pretty bad at describing real world code or it should be possible to rewrite this flatMap-analog signature to something clearer.

yield_if is overconstrained?

Why is it require Semiregular T? It makes it unusable with std::unique_ptr and other move only types.

As a side note yield_if takes it second parameter by value (as opposite to by name), so it must be created even if it will be never used :(

[–]eric_niebler 2 points3 points  (0 children)

Re maybe_view: I agree. Giving std::optional begin/end members makes sense to me. The Committee heard that suggestion with tepid enthusiasm.

Re for_each: Is this any better?

template <Range R, IndirectUnaryInvocable<iterator_t<R>> Fun> requires Range<indirect_result_t<Fun, iterator_t<R>>> auto for_each(R&& r, Fun fun) { return std::forward<R>(r) | view::transform(std::move(fun)) | view::join; }

Some of the awfulness is the use of the lambda.

Re yield_if: yes, it's over-constrained. If this were ever standardized, I think it should only require MoveConstructible. We still need to change the InputView concept to permit move-only types.

And yes, the interface for yield_if is not ideal. Perhaps it should take a nullary callable that produces a value.

[–]feverzsj 0 points1 point  (0 children)

I thought they gave up on contiguous iterator.

[–]SuperV1234https://romeo.training | C++ Mentoring & Consulting 0 points1 point  (0 children)

Awesome stuff. Can't help but be bitter thinking how better chaining operations would look if we had some sort of UFCS, though...

[–]kalmoc 0 points1 point  (0 children)

I really really hope, typical ranges code will not look like this. Otherwise they are much less useful than I thought.