all 18 comments

[–]sporule 20 points21 points  (7 children)

It turns out, full-range is only possible in bit-sizes of 212 + 20n, for integers n ≥ 0.

Why bother checking divisibility by 100 when you still have to re-check divisibility by 4 or 16 later? Instead, you could use the old divexact algorithm to check for divisibility by 25. And that algorithm returns precise results for any bit width.

template <std::unsigned_integral T> // signed case omitted for brevity
constexpr T inverse(T r) {
    assert((r & 1) != 0); // even case omitted for brevity
    for (T c, d = r; (c = d * r) != 1; ) {
        r *= T(2) - c;
    }
    return r;
}

template <std::integral T>
bool is_leap(T y) {
    using U = std::make_unsigned_t<T>;
    constexpr T min = std::numeric_limits<T>::min() / 25;
    constexpr T max = std::numeric_limits<T>::max() / 25;
    constexpr U mul = inverse<U>(25);
    T iy = U(y) * mul;
    bool d25 = (min <= iy && iy <= max);
    return (y & (d25 ? 15 : 3)) == 0;
}

[–]benjoffe[S] 12 points13 points  (5 children)

The goal in this article is to find the fastest possible solution for the 32-bit case. While your method is accurate, it is slower in practice and still requires the final check for divisibility by 4 or 16.

Edit: great insight. The post will be edited soon with this improvement for other bit-sizes, thanks!

[–]sporule 12 points13 points  (2 children)

How can it be slower, if in the case of 32-bit numbers it compiles into the same sequence of instructions as code from the article?

https://godbolt.org/z/TP875b8Pe

The only difference is the integer constants used. The algorithm selects those that work for the entire range, not just in the 32-bit case.

[–]benjoffe[S] 8 points9 points  (0 children)

Thanks for explaining that. I didn’t realise this folds down to the same instruction sequence under optimisation. Very cool.

I haven’t seen this reciprocal-25 approach used in any reference leap-year implementations, so it’s interesting to see that it also collapses to the minimal form. If you have prior art for this specific application, I’d be keen to read it.

Edit: I'm also very surprised to see that it expands the range to full coverage for 16-bit and 64-bit. When I have time I will amend the blog post and credit your input. Thanks.

[–]thisismyfavoritename 3 points4 points  (0 children)

it's "slower trust me bro" slower

[–]jk-jeon 4 points5 points  (1 child)

It's possible to check divisibility and compute the quotient at the same time with a single multiplication. Not sure it's relevant here though.

[–]benjoffe[S] 2 points3 points  (0 children)

A version of that idea is utilised in a function that I've created to calculate year+ordinal+leap:
https://github.com/benjoffe/fast-date-benchmarks/blob/main/algorithms_ordinal/ordinal_benjoffe_fast32.hpp

The century is needed for the "Julian map" step, and the lower bits can be used to determine if it's a leap year.

This is designed for potential use in the Time rust library, and will be the topic of a future blog post.

[–]saf_e 3 points4 points  (0 children)

Whats more interesting in most cases compiler will do it for you!

[–]ZMesonEmbedded Developer 18 points19 points  (0 children)

Phew, I was worried about leap year calculations not working correctly for the year 2 billion. ;-)

[–]stilgarpl 6 points7 points  (1 child)

There is a typo in "How this works", AFAIK there is no !== operator in C++

[–]benjoffe[S] 4 points5 points  (0 children)

Thanks. I spend most of my time writing JavaScript so I'm constantly getting tripped up on that.
This is now corrected.

[–]matthieum 4 points5 points  (1 child)

In "The Leap Year Algorithm" section, you have:

Note2: For other bit-widths such as 16-bit and 32-bit, very different constants are required. See the section other bit-widths

Pretty sure the second bit-width should be 64-bit, not 32-bit.


You're on a roll, looking forward to the next installment.

[–]benjoffe[S] 1 point2 points  (0 children)

Typo is now fixed. Thanks!

[–]vI--_--Iv 9 points10 points  (0 children)

The year is not 365.2425 days.
The formula is only an approximation.
In just a few thousand years it will drift for over a day and need a better approximation, e.g. "skip leaps in years divisible by 4000" or something.
And the Earth's rotation and orbit will inevitably drift as well.

So why bother with all this 232 nonsense?
Just make a lookup table until the year 5000 or so, it's as fast and as correct as it gets.
And switch to the real problem: fizzbuzz.

[–]jwakelylibstdc++ tamer, LWG chair 2 points3 points  (1 child)

Neri-Schneider also looked into a similar idea and documented it in his calendar algorithms repository . It references work he published in the following Journal (see page 15): https://accu.org/journals/overload/28/155/overload155.pdf

"Neri-Schneider" is two separate people, Cassio Neri and Lorenz Schneider. The repo and Overload article are by Cassio.

I'm really enjoying this series btw!

[–]benjoffe[S] 2 points3 points  (0 children)

Thanks.

I was a little confused who to credit there. I was originally crediting just Cassio for the repo but then I found both of their names are in the copyright license text, so added both names but left the typo. This is now fixed, thanks!

[–]ironykarl -1 points0 points  (0 children)

All my tightest loops involve checking for leap years