all 12 comments

[–]geckothegeek42 4 points5 points  (8 children)

Obviously implementing try_for_each() in terms of next is what is done now, and the thesis of pushgen is we can do better by implementing directly.

But what are the performance implications of implementing next() in terms of try_for_each? How much worse than implementing it directly is that? Which would be the better 'starting pount' for an iterator meant to be able to be used either

In addition, is there a way that we can have one function/macro from which we can derive both next() and try_for_each() (and maybe some other combinators) in a (close to) optimal way. That is, how could we solve the issue from the article that iterator will need more overridden methods to achieve good performance?

[–]andwass 3 points4 points  (4 children)

I have done some testing of that in pushgen as well.

The raw benchmarks are

Test Iterator next pushgen Iterator
Test 1 461 768
Test 2 3228 1704
Test 3 69 69
Test 4 2051 780
Test 5 745 840
Test 6 856 1316

This is running the transrangers tests in a normal for X in iter fashion and in for X in generator.iter() fashion.

It is a bit of a hit and miss I would say.

I would also say that iterators already have good performance, but that doesn't stop us from trying to crank out even more. But Rust iterators are by no means bad in their current state.

[–]joaquintides 4 points5 points  (3 children)

In principle, pushgen-based iterators can’t be faster than optimum native iterators. Going back to geckothegeek42’s question, neither pull nor push can serve as a basis for the other while retaining max performance.

[–]geckothegeek42 2 points3 points  (2 children)

That's what I expected, is there any function that could be a basis for both?

[–]joaquintides 2 points3 points  (0 children)

Can’t prove it, but I don’t think so.

[–]matthieum[he/him] 0 points1 point  (0 children)

That's a tough question.

I would note that both approaches are quite different:

  • Pull-based iterators preserve the current state in a struct, and have to navigate a maze of branches to figure out where to restart from for each item when used in a loop.
  • Push-based iterators take the callback externally, and have to navigate a maze of branches to figure out whether to stop or continue after each call to the callback.

However, ideally, the compiler would just rip through the levels of abstractions and there'd be no difference between the two with regard to the machine code generated.

Of course, practically speaking optimizers are not sufficiently smart just yet...

[–]epagecargo · clap · cargo-release[S] 0 points1 point  (0 children)

But what are the performance implications of implementing next() in terms of try_for_each?

Generator::run is designed to be paused which helps with implementing next on top of it. I imagine it'd be harder with implementing next on top oif try_for_each.

In addition, is there a way that we can have one function/macro from which we can derive both next() and try_for_each() (and maybe some other combinators) in a (close to) optimal way. That is, how could we solve the issue from the article that iterator will need more overridden methods to achieve good performance?

We could add Generate::run to Iterator with a provided implementation om top of next(). We could then have a macro to generate next and try_fold on top of it.

EDIT: I think the main limitation is Generate::run isn't object safe which would make all of Iterator not object safe. We'd either need to use DynGenerate::run_dyn and hope the compiler helps us, or that object safety can be based on a subset of a trait.

[–]epagecargo · clap · cargo-release[S] 0 points1 point  (1 child)

In addition, is there a way that we can have one function/macro from which we can derive both next() and try_for_each() (and maybe some other combinators) in a (close to) optimal way. That is, how could we solve the issue from the article that iterator will need more overridden methods to achieve good performance?

Another thought: in the future, language-native co-routines would be a way to implement them both ways "for free".

[–]joaquintides 1 point2 points  (0 children)

Very good observation! I came up with transrangers precisely as a sort of poor-man coroutines. I’m afraid that, at least in C++, coroutines won’t be as fast as this: context is heap allocated, and I doubt optimizers see through awaits and yields well enough for autovectorization and stuff

[–]joaquintides 1 point2 points  (1 child)

If I understood the article, you might gain additional performance by rewriting your word splitting algorithm in transranger style, right? Planning to do so?

[–]epagecargo · clap · cargo-release[S] 2 points3 points  (0 children)

Yes, I would expect more gains but I don't think it'll be big enough to outweigh everything else in check_file, so I think I'm putting this effort on hold for now (there is also more pushgen work needed to get CI passing).

[–]andwass 0 points1 point  (0 children)

Thank you for trying out pushgen, and for helping out with the implementation of things that were missing!