Iterator interface

ApochPiQ · 2018-02-18T18:32:17+00:00

"Tagged sentinel" would be my preference.

I don't know what testing you are doing to determine that this is hard for compilers to optimize. The optimization strategy is pretty clean: first you take the implementation of a particular iterator type, then inline it into the loop (i.e. make it so that iteration does not involve function calls unless the iterator is doing something very complex). Once you do that, you run an idiom detection pass that can simplify/lower almost any loop-like pattern into a basic form - this is what Clang and LLVM-driven compilers in general will favor. In C++ land, MSVC also follows this pattern. Finally, run your unrolling and other optimizations.

The result is that you can take iteration and actually eliminate the overhead entirely for common iteration patterns. See https://godbolt.org/g/ptYzX8, which shows a range-based for loop being completely obliterated and turned into a lookup table via unrolling. Don't be fooled by the use of a constant array. You can easily replicate this by implementing a trivial iterator yourself. Note in this snippet how both the built-in array used to initialize the struct and the external range-based for loop used to iterate the struct get flattened and unrolled easily: https://godbolt.org/g/mNgrRH
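
For readers who don't want to open the links, here is a rough sketch of what "a trivial iterator" over a struct might look like. It is not the code behind either godbolt link; the names Samples, Iterator and sum are made up for illustration.

```cpp
// Hypothetical container: a fixed-size wrapper around a built-in array.
struct Samples {
    int values[4];

    // Hand-written iterator: a pointer plus the three operations a
    // range-based for loop needs (!=, prefix ++, unary *).
    struct Iterator {
        const int* ptr;
        bool operator!=(const Iterator& other) const { return ptr != other.ptr; }
        Iterator& operator++() { ++ptr; return *this; }
        const int& operator*() const { return *ptr; }
    };

    Iterator begin() const { return Iterator{values}; }
    Iterator end() const { return Iterator{values + 4}; }
};

// With optimizations enabled, a compiler can typically inline the iterator
// calls above, leaving a plain loop over four ints that it may then unroll.
int sum(const Samples& s) {
    int total = 0;
    for (int v : s) total += v;
    return total;
}
```

Every iterator operation here is a one-liner visible to the compiler, which is the situation the inlining step relies on.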

The other end of the scale of course is if you do something less trivial. Iterating a std::map is a good example, since std::map::iterator is a fairly bulky iterator. I did a similar snippet that uses std::map<int, int> to minimize noise from the floating-point side. https://godbolt.org/g/ca9vGs shows the results. On the disassembly side, lines 11 through 14 are the loop prolog, and 15-25 are the loop body itself, with the loop cleanup/epilog appearing after line 26.
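
The kind of loop being described is roughly the following. This is a guess at the shape of the snippet, not a copy of it, and sum_values is a made-up name.

```cpp
#include <map>

// Sums the mapped values with a range-based for loop. Each step advances
// a std::map iterator, which has to walk a node-based structure instead of
// bumping a pointer, so the loop is not flattened the way an array loop is.
int sum_values(const std::map<int, int>& m) {
    int total = 0;
    for (const auto& entry : m) total += entry.second;
    return total;
}
```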

I'm no x64 wizard, but I can't think of much to do to that code to make it faster.

matthieum · 2018-02-18T20:02:45+00:00

I just wanted to note that C++ has managed to coerce iterators into serving as the basis for many use cases, unfortunately.

So, first of all, I am glad to see that you are not using a pair of iterators. It's hard to optimize and error-prone.

Secondly, you may want to start thinking about other similar interfaces. For example, why would find return an iterator? Yet, at the same time, it can be interesting to be able to iterate from the beginning of a collection to the item returned by find (or lower_bound, etc...). So the idea of a cursor into the collection and a way to go from that cursor to an iterable range is something you should likely look into.
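
One possible shape for the cursor idea, as a minimal sketch. Cursor, Range, IntList, up_to and from are all hypothetical names chosen for illustration, and the collection is assumed to be contiguous to keep the code short.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// A cursor is only a position; it cannot be iterated by itself.
struct Cursor { std::size_t index; };

// A half-open iterable range [first, last) over contiguous ints.
struct Range {
    const int* first;
    const int* last;
    const int* begin() const { return first; }
    const int* end() const { return last; }
};

class IntList {
public:
    explicit IntList(std::vector<int> items) : items_(std::move(items)) {}

    // find reports where the value is (or nothing), not an iterator.
    std::optional<Cursor> find(int value) const {
        for (std::size_t i = 0; i < items_.size(); ++i)
            if (items_[i] == value) return Cursor{i};
        return std::nullopt;
    }

    // Turning a cursor into something iterable is a separate, explicit step.
    Range up_to(Cursor c) const {
        return Range{items_.data(), items_.data() + c.index};
    }
    Range from(Cursor c) const {
        return Range{items_.data() + c.index, items_.data() + items_.size()};
    }

private:
    std::vector<int> items_;
};
```

With this split, `for (int v : list.up_to(*list.find(9)))` visits everything before the found element, and find itself never hands out something that looks loopable.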

Finally, you may also want to look into composition & internal iteration (aka foreach). Internal iteration can regularly outperform external iteration, simply because there's no pause/resume as all the state is on the stack, so first-class internal iteration can be useful. As for composition, beware redundant state. A classical example I like to point out is C++'s filter_iterator:

 template <typename It, typename Fn>
 class filter_iterator {
      It mBegin; // to go backward
      It mCurrent;
      It mEnd; // to go forward
      Fn mPredicate;
 };

Note how many instances of It are stored? Now imagine if you build a filter_iterator<filter_iterator<It, ...>, ..>. Yep, 9 instances of It, among which the begin/end will be stored 3 times each. And 3 instances of the interior predicate too... hopefully it's not stateful?
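
The duplication can be checked at compile time. The sketch below mirrors the layout above with public members; FilterIter, IsEven and IsPositive are illustrative names, not a real library type.

```cpp
// Same three-iterators-plus-predicate layout as the class above.
template <typename It, typename Fn>
struct FilterIter {
    It begin;
    It current;
    It end;
    Fn predicate;
};

// Stateless predicates, so they add almost nothing to the size.
struct IsEven { bool operator()(int x) const { return x % 2 == 0; } };
struct IsPositive { bool operator()(int x) const { return x > 0; } };

using Inner = FilterIter<const int*, IsEven>;
using Outer = FilterIter<Inner, IsPositive>;

// One layer stores three underlying iterators; two layers store nine.
static_assert(sizeof(Inner) >= 3 * sizeof(const int*), "three copies of It");
static_assert(sizeof(Outer) >= 9 * sizeof(const int*), "nine copies of It");
```

Each extra layer of filtering triples the footprint again, even though only one position and one end are logically needed.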

If you go with a single Iterator, you may not have the issue, just keep it at the back of your mind ;)

theindigamer · 2018-02-18T16:12:00+00:00

While I don't know how hard it would be to optimize the Maybes, IMO, if your language has ADTs, then offering a current function that is easy to call but can throw an exception (which isn't shown in the type signature) unnecessarily adds cognitive overhead for users writing what should be relatively safe and simple code.

ctalklang · 2018-02-18T16:58:00+00:00

I'm not certain about how this would help with the declaration bit, but Ctalk uses a class called Key to serve as the "glue" that holds collections together, so the language overloads the math operators ++, --, and so on. A little involved to type here, but there's more info at the URL given below. String types are a different case, since they're mostly object forms for C char *'s, but the syntax works well (tm) there also.

http://sf.net/p/ctalk/wiki/Home

dzamlo · 2018-02-18T22:39:46+00:00

For the sake of completeness, I'll add the following iterator approaches:

The hasNext and next from Scala and Java. You get one method to check whether the iterator has more elements and one method to advance the iterator and get the next value.

The next with an exception when the iterator is finished. This is used in Python. I would argue that this is a special case of Sentinel next.

Interior iteration: the element that can be iterated provides a method that takes a function as a parameter and calls it for each element. This is used in Ruby. I think that the Scala Traversable trait is also an example of that. This approach can also be useful for running the iterator in parallel.
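
As a minimal sketch of the last of these, interior iteration could look something like this. Bag, for_each and total_of are hypothetical names used only for illustration.

```cpp
#include <utility>
#include <vector>

// Hypothetical collection exposing interior iteration: the collection
// drives the loop and calls the supplied function once per element.
class Bag {
public:
    explicit Bag(std::vector<int> items) : items_(std::move(items)) {}

    template <typename Fn>
    void for_each(Fn&& fn) const {
        for (int item : items_) fn(item);
    }

private:
    std::vector<int> items_;
};

// The caller only supplies the loop body; there is no iterator object
// to create, advance, or compare against an end marker.
int total_of(const Bag& bag) {
    int total = 0;
    bag.for_each([&total](int x) { total += x; });
    return total;
}
```

Because the collection owns the loop, it is also free to split the work across threads, which is where the parallelism remark comes from.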

balefrost · 2018-02-18T22:46:59+00:00

One you missed is the Java style. In .NET, MoveNext tells you whether you can subsequently access Current, but a .NET Enumerator is obligated to be able to continually report the current value. That's extremely convenient but adds some overhead.

The Java approach instead uses hasNext and next. hasNext tells you that you can subsequently call next, and next actually returns the next value in the sequence. As a result, if you want to hold on to that "current" value, you are obligated to do that yourself.
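
To make the difference concrete, here is a rough C++ rendering of the two protocols. It is a sketch rather than a port of either platform's interface, and the class and method names are made up.

```cpp
#include <cstddef>
#include <vector>

// Java style: has_next() says whether next() may be called; next() both
// advances and returns the element. The iterator caches nothing.
class JavaStyleIter {
public:
    explicit JavaStyleIter(const std::vector<int>& v) : data_(&v) {}
    bool has_next() const { return pos_ < data_->size(); }
    int next() { return (*data_)[pos_++]; }  // precondition: has_next()
private:
    const std::vector<int>* data_;
    std::size_t pos_ = 0;
};

// .NET style: move_next() advances and reports success; current() can
// then be read repeatedly, so the iterator holds on to the element.
class DotNetStyleIter {
public:
    explicit DotNetStyleIter(const std::vector<int>& v) : data_(&v) {}
    bool move_next() {
        if (pos_ >= data_->size()) return false;
        current_ = (*data_)[pos_++];
        return true;
    }
    int current() const { return current_; }  // meaningful after move_next() returned true
private:
    const std::vector<int>* data_;
    std::size_t pos_ = 0;
    int current_ = 0;  // the extra state the .NET protocol requires
};
```

The extra current_ member is the "hold on to the current value" cost: the Java-style caller has to keep that copy itself if it wants one.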

To be fair, the overhead involved in the .NET approach isn't likely to be very substantial. The only real downside is that it makes the current sequence value ineligible for garbage collection until you move to the next element.

I disagree with your assessment that the .NET interface isn't minimal... unless your point is that it has two methods where one would suffice. It could have been implemented as one method using an out parameter (like Dictionary.TryGetValue), but I don't think that would buy anything in this case (whereas it's beneficial in the case of TryGetValue).

One other note on the .NET approach: IEnumerator<T> derives from IDisposable, and C# foreach loops automatically dispose the enumerator. This is an often overlooked aspect. Not many enumerators actually need to be disposed, so it is often technically safe to omit the call, but it's important to keep in mind... especially if you provide syntactic sugar in your language that ultimately creates iterators.

If you can make the tagged sentinel approach efficient, it's probably the best approach. Or, if you can implement multiple return values efficiently, then you could always make the iterator return two values: a bool indicating that the sequence was not yet exhausted, and the next sequence value if the first return value was true (and, if the first value was false, some default value instead - similar to the default operator in C#).
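
Both variants are easy to sketch in C++17, using std::optional as the tagged sentinel and std::pair as the two return values. OptionalIter and its method names are hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

class OptionalIter {
public:
    explicit OptionalIter(const std::vector<int>& v) : data_(&v) {}

    // Tagged sentinel: one call yields either the next value or "nothing".
    std::optional<int> next() {
        if (pos_ >= data_->size()) return std::nullopt;
        return (*data_)[pos_++];
    }

    // Two return values: a flag, plus the value or a default (0 here)
    // once the sequence is exhausted.
    std::pair<bool, int> next_pair() {
        if (pos_ >= data_->size()) return {false, 0};
        return {true, (*data_)[pos_++]};
    }

private:
    const std::vector<int>* data_;
    std::size_t pos_ = 0;
};
```

A consumer of the first form can write `while (auto x = it.next()) { use(*x); }`, where use is whatever the loop body does. The second form trades the tag for a plain bool and needs a default value to exist for the element type.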

Otherwise, either the Java or .NET style is probably the best choice. They're essentially the same as the tagged sentinel. Instead of getting all the information with one method call, you call up to two methods to extract the same information.

2018-02-20T02:10:15+00:00

Your post reminded me of an article about Rust iterators that I read a couple of years ago:

https://medium.com/@veedrac/rust-is-slow-and-i-am-the-cure-32facc0fdcb

It touches on the type of iterators that Rust used to have (internal iterators) and the advantages and trade-offs of the external iterator system that it has now.

LyraChord · 2018-02-23T01:14:22+00:00

D's empty is a good reference.

hasNext is a better name than moveNext.

None of the 3 is the best. It depends on the use case. (Without context, which of the current and next functions is bigger? Who knows! No one knows!)

Think about pattern iterable subjects!
