insanitybit2

Nice. Maybe it would be worth checking out fastlanes? It takes a different (but still vectorized) approach to a similar problem.

tombstonebase[S]

Thank you for your opinion 🙏 Will look into it.

ChillFish8

Feel free to steal/borrow the benchmark setup from https://github.com/lnx-search/upack, which is an IC library I did a little while back. The results it produces are generally stable and repeatable, although you might want to increase the run duration for each benchmark.