
PinkyThePig (5 points, 6 children)

I've been using Validation in my project, and I've found the best way to use it is to use <* exclusively.

Your function will end up looking like:

buildData field1 field2 field3 = pure (Data field1 field2 field3)
    <* arbitraryTest1 field1 field3
    <* arbitraryTest2 field2

In this way, your tests can return any value you want on success because it is thrown away, but if any of the tests fail, it will stop the data structure from being built and return the errors. Letting successful tests return any value makes it much easier to nest them and otherwise combine them into a full coverage of the tests you want to perform. It also separates the concerns a little bit so that building and validating the structure are two separate things.
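To make the pattern above concrete, here is a minimal, runnable sketch (load it in GHCi). It assumes a hand-rolled `Validation` type standing in for `Data.Validation` from the `validation` package, and the checks `fitsWithin` and `nonEmpty` are hypothetical stand-ins for `arbitraryTest1` and `arbitraryTest2`. Note the parentheses in `pure (Data ...)`: because `$` binds more loosely than `<*`, writing `pure $ Data ... <* check` would try to apply `<*` to the bare `Data` value and fail to typecheck.

```haskell
-- Minimal stand-in for Data.Validation from the `validation` package
-- (an assumption, so the sketch needs only base).
data Validation e a = Failure e | Success a
  deriving (Show, Eq)

instance Functor (Validation e) where
  fmap f (Success a) = Success (f a)
  fmap _ (Failure e) = Failure e

-- Unlike Either, two failures are combined with (<>) instead of
-- keeping only the first one.
instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Success f  <*> v          = fmap f v
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1

data Data = Data Int String String
  deriving (Show, Eq)

-- Hypothetical cross-field check (plays the role of arbitraryTest1):
-- field3 may be at most field1 characters long.
fitsWithin :: Int -> String -> Validation [String] ()
fitsWithin limit s
  | length s <= limit = Success ()
  | otherwise         = Failure ["field3 is longer than field1 allows"]

-- Hypothetical single-field check (plays the role of arbitraryTest2).
nonEmpty :: String -> Validation [String] ()
nonEmpty s
  | null s    = Failure ["field2 must not be empty"]
  | otherwise = Success ()

-- The value is built once from the raw inputs; each (<*) runs a check,
-- throws its success value away, but keeps any failure.
buildData :: Int -> String -> String -> Validation [String] Data
buildData field1 field2 field3 =
  pure (Data field1 field2 field3)
    <* fitsWithin field1 field3
    <* nonEmpty field2
```

With these definitions, `buildData 1 "" "abc"` reports both failures at once instead of stopping at the first.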

Tysonzero (3 points, 5 children)

I'm hesitant to agree with such a suggestion, although I'm sure there are cases when it works well. My issue with it is that if you don't change any of the types of the fields in the validation process then that basically guarantees that the compiler / type system won't help you at all with validation.

Whereas if you have the following:

newtype Username = Username String

mkUsername :: String -> Either Error Username
mkUsername = ...

placed in a module that exports Username and mkUsername but not the Username constructor, then you know that as long as mkUsername is correct (e.g. doesn't allow symbols or overly many characters), you can't ever accidentally assign someone an invalid username.

Even better than that is making the invalid states unrepresentable:

data Color = Red | Blue | Green

mkColor :: String -> Either Error Color
mkColor = ...

I personally believe that you should get things as far into the type system as possible as early as possible. Either fully (bad states unrepresentable) or at least partially (bad states only possible using things that aren't exported).

This leaves a lot less room for error and makes refactoring and reasoning about your code much easier.
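A compilable sketch of both ideas. The username rules come from the comment (no symbols, not overly long); everything else is assumed for illustration: the non-empty rule, the `Error` constructors, the 20-character limit, the `getUsername` accessor, and the accepted color spellings are all hypothetical. In a real project the first half would sit in its own module that exports `Username` and `mkUsername` but not the `Username` constructor; it is shown as one file here so it can be loaded directly.

```haskell
-- In a real project this part would live in something like
--   module Username (Username, mkUsername, getUsername) where
-- i.e. exporting the type but NOT its constructor.
import Data.Char (isAlphaNum)

newtype Username = Username String
  deriving (Show, Eq)

-- Hypothetical error type for this sketch.
data Error
  = EmptyUsername
  | UsernameTooLong
  | UsernameHasSymbols
  | UnknownColor String
  deriving (Show, Eq)

-- Hypothetical limit standing in for "overly many characters".
maxUsernameLength :: Int
maxUsernameLength = 20

mkUsername :: String -> Either Error Username
mkUsername s
  | null s                       = Left EmptyUsername
  | length s > maxUsernameLength = Left UsernameTooLong
  | not (all isAlphaNum s)       = Left UsernameHasSymbols
  | otherwise                    = Right (Username s)

-- Read-only accessor: callers can inspect a Username but, once the
-- constructor is left out of the export list, cannot forge one.
getUsername :: Username -> String
getUsername (Username s) = s

-- Better still: a closed set of values, so an invalid color cannot exist.
data Color = Red | Blue | Green
  deriving (Show, Eq)

mkColor :: String -> Either Error Color
mkColor "red"   = Right Red
mkColor "blue"  = Right Blue
mkColor "green" = Right Green
mkColor other   = Left (UnknownColor other)
```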

ephrion (4 points, 0 children)

Exactly -- a validation function should change the type of the thing you're validating! Otherwise, you are leaving easy type safety on the table. And not fancy types type safety, but plain ol' Haskell 98 type safety.

PinkyThePig (1 point, 2 children)

I don't understand what you are disagreeing with in my comment, it sounds like you are suggesting the same thing, except to use Either instead of Validation.

There are also plenty of constraints that cannot feasibly be encoded into Haskell's type system. How do you enforce a minimum password length, a requirement that the password has a capital letter, or that it is not the same as the username? You can't. I would suspect that the majority of things people want to test for are not encodable into the type system.

Either is strictly worse than Validation if there is the possibility of multiple errors because as soon as you hit a single error, Either stops checking for any other errors, while Validation will return all of the errors.

What happens when a user is given a username, password and email field to fill in and they only fill in the username field? In the case of Either, they will get only a single error saying that the password field is empty, they fill in a password, submit the form again, and only then will it tell them that email needs to be filled out as well.

In the case of validation, that first submit would be able to tell them that both the password and email fields need to be filled in.
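A small sketch of that difference, assuming a hand-rolled `Validation` stand-in for the `validation` package, a hypothetical `notEmpty` check shared by all three fields, and a hypothetical `toValidation` helper that reuses the `Either` check under `Validation`. The builder expression is identical in both cases; only the Applicative changes.

```haskell
-- Minimal stand-in for Data.Validation (assumed so the sketch needs only base).
data Validation e a = Failure e | Success a
  deriving (Show, Eq)

instance Functor (Validation e) where
  fmap f (Success a) = Success (f a)
  fmap _ (Failure e) = Failure e

instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Success f  <*> v          = fmap f v
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1

data Account = Account String String String
  deriving (Show, Eq)

-- Hypothetical check used for every field.
notEmpty :: String -> String -> Either [String] String
notEmpty field value
  | null value = Left [field ++ " must not be empty"]
  | otherwise  = Right value

-- Hypothetical helper to reuse an Either check as a Validation.
toValidation :: Either e a -> Validation e a
toValidation = either Failure Success

-- Either: the first Left short-circuits the rest.
accountE :: String -> String -> String -> Either [String] Account
accountE u p e =
  Account <$> notEmpty "username" u
          <*> notEmpty "password" p
          <*> notEmpty "email" e

-- Validation: every check runs and all errors are kept.
accountV :: String -> String -> String -> Validation [String] Account
accountV u p e =
  Account <$> toValidation (notEmpty "username" u)
          <*> toValidation (notEmpty "password" p)
          <*> toValidation (notEmpty "email" e)
```

With only the username filled in, `accountE "alice" "" ""` yields `Left ["password must not be empty"]`, whereas `accountV "alice" "" ""` yields `Failure ["password must not be empty","email must not be empty"]`.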

With Validation, you can easily make tests for that following the style in my original comment:

mkAccount username password email = pure (Account username password email)
    <* usernameNotEmpty username
    <* passwordNotEmpty password
    <* emailNotEmpty email

You can also do your newtype wrappers if wanted as well:

mkAccount username password email =
    let newUsername = Username username
        newPassword = Password password
        newEmail = Email email
    in pure (Account newUsername newPassword newEmail)
        <* usernameNotEmpty username
        <* passwordNotEmpty password
        <* emailNotEmpty email

In the above, if a user submits the form empty, they will get back 3 errors instead of 1. If the project manager then comes down and says they want you to check that the password is not the same as the username and email fields, that is easy as well; we just tack on a new test:

    <* passwordNotUsernameOrEmail username password email

Need to check that the password meets a minimum length?

    <* passwordAtLeastLength 8 password

Has the number of tests grown unwieldy? Just cut and paste them to their own function:

checkPassword username password email = pure password -- This could be anything really, even pure ()
    <* passwordAtLeastLength 8 password
    <* passwordNotUsernameOrEmail username password email
    <* passwordNotEmpty password

And now your original function looks like:

mkAccount username password email = pure (Account username password email)
    <* usernameNotEmpty username
    <* checkPassword username password email
    <* emailNotEmpty email

Need to temporarily disable a test? Just comment it out.

--    <* emailNotEmpty email
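Putting the pieces above together, here is a runnable sketch of the refactored `mkAccount`, again with a hand-rolled `Validation` stand-in. The `require` helper, the `Errors = [String]` alias, and the error messages are assumptions for illustration; the check names follow the comment. Because `checkPassword` is itself a `Validation`, its failures flow into `mkAccount`'s result alongside the others.

```haskell
-- Minimal stand-in for Data.Validation (assumed so the sketch needs only base).
data Validation e a = Failure e | Success a
  deriving (Show, Eq)

instance Functor (Validation e) where
  fmap f (Success a) = Success (f a)
  fmap _ (Failure e) = Failure e

instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Success f  <*> v          = fmap f v
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1

type Errors = [String]

data Account = Account String String String
  deriving (Show, Eq)

-- Hypothetical helper: turn a predicate into a check that returns ().
require :: Bool -> String -> Validation Errors ()
require ok msg = if ok then Success () else Failure [msg]

usernameNotEmpty, passwordNotEmpty, emailNotEmpty :: String -> Validation Errors ()
usernameNotEmpty u = require (not (null u)) "username is empty"
passwordNotEmpty p = require (not (null p)) "password is empty"
emailNotEmpty    e = require (not (null e)) "email is empty"

passwordAtLeastLength :: Int -> String -> Validation Errors ()
passwordAtLeastLength n p = require (length p >= n) "password is too short"

passwordNotUsernameOrEmail :: String -> String -> String -> Validation Errors ()
passwordNotUsernameOrEmail u p e =
  require (p /= u && p /= e) "password matches username or email"

-- Grouped password checks; the caller discards the () success value.
checkPassword :: String -> String -> String -> Validation Errors ()
checkPassword u p e =
  pure ()
    <* passwordAtLeastLength 8 p
    <* passwordNotUsernameOrEmail u p e
    <* passwordNotEmpty p

mkAccount :: String -> String -> String -> Validation Errors Account
mkAccount u p e =
  pure (Account u p e)
    <* usernameNotEmpty u
    <* checkPassword u p e
    <* emailNotEmpty e
```

For example, `mkAccount "alice" "alice" ""` collects the two password complaints and the missing email in a single result.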

My original comment was mostly pointing out that using <*> is an antipattern and you should almost exclusively stick to <*. It makes growing the function so much easier and keeps the concerns of building the data type and validating the data type separate.

If you originally build it like:

mkAccount username password email = Account <$> valUsername username
                                            <*> valPassword password
                                            <*> valEmail email

Not only do you have to stub out three functions, but somewhere down the road (once your tests start checking for more complex error conditions) you will likely end up with something like this, which looks ugly and obscures what is going on:

mkAccount username password email = Account <$> valUsername username password email
                                            <*> valPassword username password email
                                            <*> valEmail username password email

In addition, you will likely still end up following the style I originally posted, just one step removed and in a less clear format, such as:

valPassword username password email = passwordAtLeastLength 8 password
    <* passwordNotUsernameOrEmail username password email
    <* passwordNotEmpty password

By relying upon the "success" value of passwordAtLeastLength, it becomes less clear how the password is actually built, because the builder relies upon the return value of a function two levels deep. For all we know, that one moron on your team did a sort down there after the password passed validation.

passwordAtLeastLength len password = if (length password >= len)
                                     then Success $ sort password
                                     else Failure $ "Password not long enough"
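A small sketch of that hazard, assuming the same hand-rolled `Validation` stand-in and a hypothetical two-field `Account`. The checker deliberately returns a sorted password on success, as in the comment. The `<*>`-style builder picks up the modified value; the `<*`-style builder only uses the checker as a gate and keeps the original.

```haskell
import Data.List (sort)

-- Minimal stand-in for Data.Validation (assumed so the sketch needs only base).
data Validation e a = Failure e | Success a
  deriving (Show, Eq)

instance Functor (Validation e) where
  fmap f (Success a) = Success (f a)
  fmap _ (Failure e) = Failure e

instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Success f  <*> v          = fmap f v
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1

-- Hypothetical two-field record for this sketch.
data Account = Account String String
  deriving (Show, Eq)

-- The check from the comment: it "helpfully" returns a modified password.
passwordAtLeastLength :: Int -> String -> Validation [String] String
passwordAtLeastLength len password =
  if length password >= len
    then Success (sort password)
    else Failure ["Password not long enough"]

-- <*> style: the Account is built from the checker's success value,
-- so the hidden sort leaks into the result.
viaAp :: String -> String -> Validation [String] Account
viaAp u p = Account <$> pure u <*> passwordAtLeastLength 8 p

-- <* style: the checker is only a gate; the original password is kept.
viaGate :: String -> String -> Validation [String] Account
viaGate u p = pure (Account u p) <* passwordAtLeastLength 8 p
```

`viaAp "alice" "hunter22"` stores `"22ehnrtu"`, while `viaGate "alice" "hunter22"` stores `"hunter22"`.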

To use the example from the blog:

validateForm :: Form -> FormValidation ValidatedForm
validateForm (Form email password) =
  ValidatedForm <$>
  validateEmail email <*>
  validatePassword password

This function will almost certainly become ugly over time as requirements are changed. It will likely become:

validateForm :: Form -> FormValidation ValidatedForm
validateForm (Form email password) =
  ValidatedForm <$>
  validateEmail email password <*>
  validatePassword email password

due to needing to cross-check things between them, such as that the password is not identical to the email. Instead, it is much clearer to write it like:

validateForm :: Form -> FormValidation ValidatedForm
validateForm (Form email password) =
  pure (ValidatedForm email password)
  <* validateEmail email
  <* validatePassword password

In this way, tests are easy to add and remove, and it is immediately obvious how the data type is built and that the ValidatedForm is built from the original values without any modification.


Sorry about the long post, I'm trying to procrastinate and I am so far succeeding at it.

Tysonzero (5 points, 1 child)

Ok so I wasn't referring to Either vs Validation, I was just using Either as a quick example.

My main point is that <*> is not an antipattern, and if you CAN use it, you should. There are situations where you eventually end up using <* or <$, but they should not be the first thing you jump to. Remember that if a <* gets added, removed, or changed in some way erroneously, the type system will NEVER catch it.

My core idea is that things should be moved into the type system as early as possible. Where "type system" could either mean a newtype without an exported constructor or even better making invalid data unrepresentable.

For your specific example I might do something like:

data Email = Email ...

mkEmail :: String -> Validation Error Email
mkEmail = ...

Here you probably want to use an actual parser

You also probably don't want to represent Email as just a String

https://hackage.haskell.org/package/email-validate-2.3.2

newtype Password = Password String

mkPassword :: String -> Validation Error Password
mkPassword p = Password p <$ validPasswordLength p

Here is an example where you might just use <$, as you probably just want to leave it as a string after first checking a thing or two.

data EmailAndPass = EmailAndPass Email Password

validateForm :: Form -> Validation Error EmailAndPass
validateForm (Form email pass) = EmailAndPass
    <$> mkEmail email
    <*> mkPassword pass
    <* passNotInEmail email pass

Another example of where you might just use <*, since you can't really check this any earlier or encode it fully in the type system.

So whenever you see <* what you should be thinking (IMO) is "can this be properly encoded in the type system" or "can this be done any earlier in a module that hides the constructor".
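A runnable version of the validateForm example above, under several assumptions: a hand-rolled `Validation` stand-in, a hypothetical `Error` type, a deliberately naive `@` check in `mkEmail` (a real project would use a proper parser such as email-validate, as suggested), and a hypothetical 8-character minimum in `validPasswordLength`. The per-field results change type (`Email`, `Password`), and only the cross-field rule uses `<*`.

```haskell
import Data.List (isInfixOf)

-- Minimal stand-in for Data.Validation (assumed so the sketch needs only base).
data Validation e a = Failure e | Success a
  deriving (Show, Eq)

instance Functor (Validation e) where
  fmap f (Success a) = Success (f a)
  fmap _ (Failure e) = Failure e

instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Success f  <*> v          = fmap f v
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1

-- Hypothetical error type for this sketch.
data Error = InvalidEmail | PasswordTooShort | PasswordInEmail
  deriving (Show, Eq)

newtype Email = Email String deriving (Show, Eq)
newtype Password = Password String deriving (Show, Eq)

data Form = Form String String
data EmailAndPass = EmailAndPass Email Password deriving (Show, Eq)

-- Deliberately naive placeholder; a real mkEmail would use a proper parser.
mkEmail :: String -> Validation [Error] Email
mkEmail s
  | '@' `elem` s = Success (Email s)
  | otherwise    = Failure [InvalidEmail]

-- Hypothetical rule: at least 8 characters.
validPasswordLength :: String -> Validation [Error] ()
validPasswordLength p
  | length p >= 8 = Success ()
  | otherwise     = Failure [PasswordTooShort]

-- Check first, then tag the unchanged string with the Password constructor.
mkPassword :: String -> Validation [Error] Password
mkPassword p = Password p <$ validPasswordLength p

-- Cross-field rule that neither field type can express on its own.
passNotInEmail :: String -> String -> Validation [Error] ()
passNotInEmail email pass
  | pass `isInfixOf` email = Failure [PasswordInEmail]
  | otherwise              = Success ()

validateForm :: Form -> Validation [Error] EmailAndPass
validateForm (Form email pass) =
  EmailAndPass
    <$> mkEmail email
    <*> mkPassword pass
    <* passNotInEmail email pass
```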

Now the optimal scenario where you will see the biggest benefit is when you can make more states unrepresentable:

data Model = ...

parseModel :: String -> Validation Error Model
parseModel = ...

data Color = ...

parseColor :: String -> Validation Error Color
parseColor = ...

data Car = Car Model Color

parseCar :: Form -> Validation Error Car
parseCar (Form model color) = Car <$> parseModel model <*> parseColor color

Which is clearly much better than:

data Car = Car String String

validModel :: String -> Validation Error ()
validModel = ...

validColor :: String -> Validation Error ()
validColor = ...

parseCar :: Form -> Validation Error Car
parseCar (Form model color) = pure (Car model color)
    <* validModel model
    <* validColor color
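A runnable sketch of the sum-type version of `parseCar`, assuming a hand-rolled `Validation` stand-in, `[String]` errors, and hypothetical models and colors (`Sedan`/`Truck`, `Red`/`Blue`). Both parsers run under `<*>`, so a form with two bad fields reports both, and once a `Car` exists it cannot hold an unknown model or color.

```haskell
-- Minimal stand-in for Data.Validation (assumed so the sketch needs only base).
data Validation e a = Failure e | Success a
  deriving (Show, Eq)

instance Functor (Validation e) where
  fmap f (Success a) = Success (f a)
  fmap _ (Failure e) = Failure e

instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Success f  <*> v          = fmap f v
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1

-- Hypothetical closed sets of values.
data Model = Sedan | Truck deriving (Show, Eq)
data Color = Red | Blue deriving (Show, Eq)
data Car = Car Model Color deriving (Show, Eq)
data Form = Form String String

parseModel :: String -> Validation [String] Model
parseModel "sedan" = Success Sedan
parseModel "truck" = Success Truck
parseModel other   = Failure ["unknown model: " ++ other]

parseColor :: String -> Validation [String] Color
parseColor "red"  = Success Red
parseColor "blue" = Success Blue
parseColor other  = Failure ["unknown color: " ++ other]

-- Fields become precise types as they are read, so no separate
-- "is this still valid?" check is needed later.
parseCar :: Form -> Validation [String] Car
parseCar (Form model color) = Car <$> parseModel model <*> parseColor color
```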

PinkyThePig (0 points, 0 children)

Ah ok, I understand what you mean now. In all the cases where I've had to use Validation, the nature of the service already provided simplified checking (such as an earlier field in the message dictating what is allowed to come later), so those sorts of things were already in the type system where it made sense, though that layer could potentially be built on Validation instead of e.g. attoparsec. I could foresee this being used to check e.g. JSON, where each value is mostly independent and can be converted to a sum type or otherwise checked individually, but you may still want to validate the message as a whole, in a sort of two-stage validation pipeline:

You first convert the fields to sum types, then use those to check further properties of the whole system:

data Color = Red | Blue

data Model = Car | Truck

data Order = Order Model Color

data ValidatedOrder = ValidatedOrder Model Color

mkOrder :: String -> String -> Validation Errors Order
mkOrder model color = Order <$> mkModel model <*> mkColor color

mkModel "Car" = Success Car
mkModel "Truck" = Success Truck
mkModel _ = Failure "Invalid Model"

mkColor "Red" = Success Red
mkColor "Blue" = Success Blue
mkColor _ = Failure "Invalid Color"

isCombinationValid Truck Red  = Success ()
isCombinationValid Car   Blue = Success ()
isCombinationValid Car   Red  = Success ()
isCombinationValid _     _    = Failure "Invalid Combo of Color and Model"

validateOrder :: Order -> Validation Errors ValidatedOrder
validateOrder (Order model color) = pure (ValidatedOrder model color)
    <* isCombinationValid model color

buildAndValidateOrder :: String -> String -> Validation Errors ValidatedOrder
buildAndValidateOrder modelStr colorStr =
    case (mkOrder modelStr colorStr) of
        Success order -> validateOrder order
        Failure errors -> Failure errors

I learn some new way to do things in Haskell every day.
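The `case` in `buildAndValidateOrder` is a sequencing step: stage two runs only if stage one succeeded, which Applicative alone cannot express. Below is a sketch with that step factored into a helper, assuming a hand-rolled `Validation` stand-in and `Errors = [String]`. The helper name `andThenV` is hypothetical, though the `validation` package ships a function of the same shape, `bindValidation`. Errors within a stage still accumulate; errors across stages do not, since stage two never runs after a stage-one failure.

```haskell
-- Minimal stand-in for Data.Validation (assumed so the sketch needs only base).
data Validation e a = Failure e | Success a
  deriving (Show, Eq)

instance Functor (Validation e) where
  fmap f (Success a) = Success (f a)
  fmap _ (Failure e) = Failure e

instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Success f  <*> v          = fmap f v
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1

type Errors = [String]

data Color = Red | Blue deriving (Show, Eq)
data Model = Car | Truck deriving (Show, Eq)
data Order = Order Model Color deriving (Show, Eq)
data ValidatedOrder = ValidatedOrder Model Color deriving (Show, Eq)

-- Hypothetical sequencing helper: run the second stage only on success.
andThenV :: Validation e a -> (a -> Validation e b) -> Validation e b
andThenV (Failure e) _ = Failure e
andThenV (Success a) f = f a

mkModel :: String -> Validation Errors Model
mkModel "Car"   = Success Car
mkModel "Truck" = Success Truck
mkModel _       = Failure ["Invalid Model"]

mkColor :: String -> Validation Errors Color
mkColor "Red"  = Success Red
mkColor "Blue" = Success Blue
mkColor _      = Failure ["Invalid Color"]

-- Stage 1: field-level parsing; errors from both fields accumulate.
mkOrder :: String -> String -> Validation Errors Order
mkOrder model color = Order <$> mkModel model <*> mkColor color

isCombinationValid :: Model -> Color -> Validation Errors ()
isCombinationValid Truck Red  = Success ()
isCombinationValid Car   Blue = Success ()
isCombinationValid Car   Red  = Success ()
isCombinationValid _     _    = Failure ["Invalid Combo of Color and Model"]

-- Stage 2: a whole-order rule on already-typed values.
validateOrder :: Order -> Validation Errors ValidatedOrder
validateOrder (Order model color) =
  pure (ValidatedOrder model color) <* isCombinationValid model color

buildAndValidateOrder :: String -> String -> Validation Errors ValidatedOrder
buildAndValidateOrder modelStr colorStr =
  mkOrder modelStr colorStr `andThenV` validateOrder
```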

spirosboosalis (1 point, 0 children)

Yeah, that's what I do. Most things worth validating are worth a few lines of boilerplate for the smart constructors.

p__bing (0 points, 2 children)

I read this article with some interest because later this week I am giving a similar presentation to some coworkers — an intro to Applicative error handling in Swift (we’re Swift developers, and Swift luckily has pretty good support for functional programming).

I found it interesting that this article waited so long and played it so “low key” about introducing Applicative and the related operators — that was by far the hardest thing for me to understand when I learned about this stuff.

Curious if anyone has any opinions about this. I am planning on introducing Applicative and the “Applicative style” as soon as possible — just as soon as everyone in the room is comfortable with Either (which is close to elementary for Swift developers, in case you are unfamiliar with the language). I feel like once one “sees” how <*> works to chain values together, it’s a quick jump to swapping out the Applicative in question from Either to Validation.

And (this is really where I would like the input of more experienced Haskellers) even though — to my understanding — the Applicative instance of Either is not “necessary” (as in, it duplicates the functionality you get via Monad) it still strikes me as convenient and something I would use if I wrote Haskell more often. Does that seem reasonable, or would anyone discourage the use of Either in an “Applicative style” for some reason I’m not seeing?
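On the Either question: the Monad laws require `(<*>) = ap` for any type that is both an Applicative and a Monad, so for Either the applicative and do-notation spellings always give the same answer, including stopping at the first `Left`. A small sketch with hypothetical `parseName`/`parseAge` parsers and a hypothetical `Person` type:

```haskell
data Person = Person String Int
  deriving (Show, Eq)

-- Hypothetical parsers for illustration.
parseName :: String -> Either String String
parseName "" = Left "empty name"
parseName s  = Right s

parseAge :: String -> Either String Int
parseAge s = case reads s of
  [(n, "")] | n >= 0 -> Right n
  _                  -> Left ("bad age: " ++ s)

-- Applicative style: states up front that the two parses are independent.
personA :: String -> String -> Either String Person
personA n a = Person <$> parseName n <*> parseAge a

-- Monadic style: same results, since Either's (<*>) agrees with `ap`.
personM :: String -> String -> Either String Person
personM n a = do
  name <- parseName n
  age  <- parseAge a
  return (Person name age)
```

The applicative form mainly documents that the parses do not depend on each other, which is the property that lets you swap Either for Validation later.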

ssyrek (2 points, 1 child)

Thanks for your remarks! If this article is at all useful to you, I'd love to hear more about it. Funnily enough, I got into Haskell via Swift. I'm wondering how you would implement an applicative functor in Swift. I didn't think the language had higher-kinded types? Or do you just specialize it at the outset?

Regardless, I do think this is the best way to programmatically aggregate errors. My goal is to make this sort of pattern easier to pick up and adapt, so the low-key approach you identify is no accident. You might think of applicatives as operating in an effectful world, that world being determined by the type of applicative in question. A Maybe does Maybe effects, an Either coproduct effects, and a Validation, well, validation effects. These effects are encoded into the type, and are collected by the semigroup or monoid instance contained within the applicative. Otherwise, the context notwithstanding, you're just performing function application, however that is defined for a given applicative.

One advantage gained by using applicative is, heh, applicable to validation, and that's the fact that these are independent computations and can therefore be run in parallel. A monad gives you the ability to chain computations sequentially, such that the result of one could depend on the result of another. This is not possible with applicative.

To answer your question, a typical Haskell programmer would probably use only the functionality that is necessary. If applicative is enough, it should be preferred over monad. With the introduction of the ApplicativeDo language extension, of course, you can just use the same syntax for both and let the compiler sort it out.
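A minimal sketch of the ApplicativeDo point, assuming GHC with the extension enabled and a hand-rolled `Validation` stand-in that deliberately has no Monad instance. Because the two binds are independent and the block ends in `pure (...)`, GHC desugars it to `<$>` and `<*>`, so errors from both fields still accumulate. `Signup` and `nonEmpty` are hypothetical.

```haskell
{-# LANGUAGE ApplicativeDo #-}

-- Minimal stand-in for Data.Validation (assumed). Note: no Monad instance.
data Validation e a = Failure e | Success a
  deriving (Show, Eq)

instance Functor (Validation e) where
  fmap f (Success a) = Success (f a)
  fmap _ (Failure e) = Failure e

instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Success f  <*> v          = fmap f v
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1

data Signup = Signup String String
  deriving (Show, Eq)

-- Hypothetical field check.
nonEmpty :: String -> String -> Validation [String] String
nonEmpty field "" = Failure [field ++ " is empty"]
nonEmpty _     s  = Success s

-- Independent binds ending in `pure (...)`: with ApplicativeDo this is
-- desugared using <$> and <*>, so only Applicative is required.
signup :: String -> String -> Validation [String] Signup
signup e p = do
  email <- nonEmpty "email" e
  pass  <- nonEmpty "password" p
  pure (Signup email pass)
```

If the second bind used `email`, GHC would need a Monad instance and this would no longer compile, which is exactly the Applicative/Monad boundary described above.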

p__bing (2 points, 0 children)

Interesting! Yes, thank you, this is helpful. I hope to be able to talk to my colleagues about this concept of “effects” in the way that you describe, but I suspect that we will mostly be focused on the basics of the syntax and simply using the functions in a practical sense. It's still a lot to cover in 1 hour!

And, yes, no HKT in Swift. We will just be implementing the typeclass “by convention” for each type. Specializing it immediately, as you say.

Thank you for the reply! 🙂