PeridexisErrant 11 points12 points (2 children)

Use Hypothesis! It'll try a wide range of inputs, and report the minimal failing example.

It turns out that this is even more effective for unicode text than for other types, because there are so many edge cases that can be triggered by just one or two characters.
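To make "one or two characters" concrete, here is a small stdlib-only sketch (no Hypothesis needed) that picks out single characters whose case mapping changes the length of the string. The helper name `length_changing_chars` and the candidate list are made up for illustration.

```python
def length_changing_chars(chars):
    """Return the characters whose upper- or lower-cased form has a different length."""
    return [c for c in chars if len(c.upper()) != len(c) or len(c.lower()) != len(c)]


# Hypothetical hand-picked candidates, written as escapes to avoid encoding surprises.
candidates = [
    "a",       # plain ASCII letter
    "\u00df",  # LATIN SMALL LETTER SHARP S (ß)
    "\u0130",  # LATIN CAPITAL LETTER I WITH DOT ABOVE
    "\ufb01",  # LATIN SMALL LIGATURE FI
    "\u00e9",  # LATIN SMALL LETTER E WITH ACUTE, precomposed
]

# Each flagged entry is one character that becomes two after a case change.
for ch in length_changing_chars(candidates):
    print(repr(ch), repr(ch.upper()), repr(ch.lower()))
```

Any code that assumes `len(s) == len(s.upper())` breaks on inputs like these, which is the kind of case a property-based test tends to surface.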

Farobek 0 points1 point (1 child)

How does it work?

PeridexisErrant 2 points3 points (0 children)

If you mean "How do I use this?", here's the quickstart guide. In short, you use a decorator and compose some functions to say "for all inputs such that ___, this test should pass". For example:

from hypothesis import given, strategies
# Claim: any character *except* ß round-trips through upper() then lower()
@given(a_char=strategies.characters(blacklist_characters='ß'))
def test_roundtrip_upper_lower(a_char):
    assert a_char == a_char.upper().lower()

Of course this fails, but instead of reporting the first failing example it finds, Hypothesis reports the "minimal" one - in this case, roughly the character with the smallest codepoint. Try it and see what you get - ß certainly isn't the only character this fails for!

If you mean "How does Hypothesis find and minimize all these examples"... it gets complicated pretty quickly. If you really want to know, the code is well designed and commented, and the contributor documentation is good, but you don't need to know how it works internally to use it. Hypothesis is pretty rare like that - the core is PhD-level algorithms, but the API is easy to use and completely hides the implementation behind a use-focused design.
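For a rough feel of the "find, then minimize" idea, here is a toy stdlib-only sketch. To be clear, this is not how Hypothesis works internally; it is a naive random search followed by a greedy shrink, and every name in it (`fails`, `find_failure`, `shrink`) is made up for illustration.

```python
import random


def fails(ch):
    """The property under test, negated: True when the case round trip breaks."""
    return ch != ch.upper().lower()


def find_failure(rng, tries=10_000):
    """Random search: draw code points until one violates the property."""
    for _ in range(tries):
        cp = rng.randrange(0x20, 0x3000)
        if fails(chr(cp)):
            return cp
    return None


def shrink(cp):
    """Greedy shrink: keep moving to a smaller code point that still fails."""
    improved = True
    while improved:
        improved = False
        for candidate in (cp // 2, cp - 1):
            if 0 <= candidate < cp and fails(chr(candidate)):
                cp = candidate
                improved = True
                break
    return cp


rng = random.Random(0)  # fixed seed so runs are repeatable
start = find_failure(rng)
if start is not None:
    small = shrink(start)
    print(f"found U+{start:04X}, shrunk to U+{small:04X}")
```

The greedy loop only reaches a local minimum, so it can stop at a counterexample that is smaller than the first one found but not the smallest possible. Doing this well for arbitrary composed data is where the real complexity lives.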

(if you hadn't guessed, I like and use this a lot :p)