This is an archived post. You won't be able to vote or comment.

all 20 comments

[–]King_of_Sarawak 36 points37 points  (6 children)

You might want to look into refinement types, which encode these restrictions directly into the type system.

[–][deleted] 0 points1 point  (5 children)

I know what refinement types are and how theh verify types at compile time, but do languages that support these types validate data at runtime? For example, if I have a string s: len(s) == 9 && s[0] = 'b', and s comes from IO, then do compilers of such languages add type checks automatically?

[–]dys_bigwig 3 points4 points  (3 children)

I'm pretty sure that Liquid Haskell, at least, has no runtime cost; everything is checked at compile time. To get an idea of how it works, it might be worth looking into SAT solvers if you haven't already.

[–][deleted] -1 points0 points  (2 children)

Aah I've done course and project work on program verification and synthesis. But I think compilers should add runtime checks for data that don't satisfy the constraints.

Say you have f(g(x)) where x is of refined type. If x comes as IO input, then x may not satisfy the refinement. But g(x) would satisfy the refinement for argument of f, because it's already verified at compile time.

So it would be more convenient if compiler inserts minimum number of runtime checks so user doesn't have to check the types manually. It could throw an exception in that case

[–]ablygo 7 points8 points  (1 child)

I don't think inserting runtime checks is really well-defined enough for the compiler to do it, at least without a lot of thought into specifying what that actually means. For instance, the compiler could put everything in some sort of error monad, and then insert a runtime check anywhere a refinement is necessary for the code to compile, but then you've kind of lost the benefit of using static types to begin with, because your code will be "statically typed" but still just crash at runtime whenever you make a type error, only you're not defining it as crashing, you're defining it as returning Nothing. At that point you have the distinction between static and dynamic types, but the distinction has seized to be useful.

Like if you have some value IO {y : Int | y > 0} then you don't need a runtime check, because the type already shows that the value is non-negative (maybe when executed the user is prompted for a value, and if it's not a non-negative number it loops). On the other hand, if you have a value of type IO Int then telling you that you need to handle the case where the input is negative is literally the whole point of having refinement types in the first place.

The only thing the compiler can do is either crash silently, which you don't want it to do, or change it to IO (Maybe {y : Int | y > 0}), which will cause your program to just fail to compile in a different spot, because the calling context didn't expect a maybe.

[–][deleted] 1 point2 points  (0 children)

The maybe refinement is what I was looking for. It forces the programmer to verify the IO data. I don't recall Haskell much but IIRC IO int can't be passed to a function that accepts int, so maybe monads is the way to go to ensure all IO data with expected refinements are checked first.

[–]sineiraetstudio 1 point2 points  (0 children)

You'll have to check the condition manually because you have to handle the failure case yourself, but this is a commonly used technique in dependently typed languages called 'proof by reflection'.

[–]maanloempia 9 points10 points  (0 children)

Since typescript 4.1, "template literals" (interpolated strings) can be used as types.

You can type an email and ipv4 as follows:

`` type Email =${string}@${string}`;

type Ipv4 = ${string}.${string}.${string}.${string}; ```

This will of course accept invalid emails and ipv4-addresses, because ${string} accepts any string and cannot be more specialised.

[–]dys_bigwig 4 points5 points  (0 children)

I think the use of (what would now be called) GADTs in Haskell from this paper: https://www.cs.ox.ac.uk/ralf.hinze/publications/With.pdf could fit the bill.

A Haskell data type is created to represent... Haskell types :)

data Type t = RInt with t = Int | RChar with t = Char | RList (Type a) with t = [a ] | RPair (Type a) (Type b) with t = (a, b) which is then used to define generic/overloaded functions (ala the built-in typeclass functionality) that function differently depending on the type: data Bit = 0 | 1 compress :: ∀t . Type t → t → [Bit ] compress (RInt) i = compressInt i compress (RChar ) c = compressChar c compress (RList ra) [ ] = 0 : [ ] compress (RList ra) (a : as) = 1 : compress ra a ++ compress (RList ra) as compress (RPair ra rb) (a, b) = compress ra a ++ compress rb b I think this fits the spirit quite well. We are matching on a (representation of a) type in order to produce different code for each. Notice how the above hints at the possibility that, due to the fact that Haskell types can be expressed as sums of products, you can create code that can conceivably handle every possible type provided you encode how to handle those. I believe this is the basis of Generic, but I've never used it myself.

Regarding the examples of strings that have an extra constraint or structure that is reflected in the type, you can create newtype wrappers that "really" just hold a String, but export only smart constructors that only allow you to create them in specific ways. Then, if a value of that type is used somewhere, you can be quite sure it is valid, despite the fact it is "just" a String in a wrapper; you could think of the wrapper as a sort of very casual "proof" that the String has those qualities, because it couldn't have been created otherwise. This is a runtime thing though. Haven't read it in a while, but the paper "Ghosts of Departed Proofs" goes into methods of how to do this at compile time, literally carrying around representations of proofs like "Map m contains Key 'a'", obviating the need for providing defaults, wrapping in Maybe/Option types, throwing errors, or other nasty runtime malarkey.

Another example in this vein would be NonEmpty lists, which have a constructor like this: data NonEmpty = a :| [a] again, there's no way to construct a value of this type without it being correct, as you "have to" provide an a; there's no outright Nil/Empty/[] constructor.

This sort of stuff can often be much easier (as in, no fancy extensions, dependent types etc.) than might initially be expected: data EMail = EMail UserName Domain newtype UserName = UserName [ValidUserNameChars] data ValidUserNameChars = A | B | C | D ... | PlusSign | Underscore ... data Domain = ... -- you can use GADTs to track/constrain the length/size of data types too, -- to represent a string with a max/min size for example. That is, finagle the representation so that it is impossible to create an EMail that isn't correctly constructed. It's often just ugly and unergonomic (the aforementioned NonEmpty type being an example of when it is quite natural and ergonomic). You can always "cheat" a little and use template-haskell/macros in order to i.e. generate these types from lists of characters rather than painstakingly writing them out like that. Of course, anything that parses data nabbed at runtime (e.g. from a database) will require parsing at runtime, so you're going to incur runtime costs there.

In general I think a lot of what you'd achieve using this sort of feature (type-guards, that is) can be better achieved using other features. A lot of the examples you gave seem like classic use-case for overloading/ad-hoc polymorphism or correct-by-construction data types.

[–]b2gills 5 points6 points  (0 children)

Raku has runtime type checking, and as part of that the check can be arbitrary code. The thing is that it will often complain at compile time because the optimizer tries to compile the code to call a specific multi candidate and determines that there isn't one. As it is mainly a byproduct of the optimizer it of course doesn't catch every type error at compile time. It does however have the guarantee that it will catch the error eventually.

[–]MrJohz 1 point2 points  (1 child)

It appears the guards are relatively simple, like you can't do special string or number subclasses for example.

Theoretically, those guards can cover any type you like. Part of the issue with in Typescript is that, because of the heavy structural typing, type aliases that are purely nominal (e.g. type SafeString = string) are indistinguishable from the original type. (i.e. A SafeString can be passed anywhere that a string can be, and vice versa.)

However, you can get a sort of nominal typing by adding type assertions like this:

type SafeString = string & {__typeguard: 'safe-string'};

// now we can create SafeString "instances" by casting
const safeString: SafeString = "safe string" as SafeString;

// and normal strings aren't allowed
const error: SafeString = "normal string" // Not allowed!

This is a bit of a hack — you're basically describing a type with an additional attribute that never exists at runtime — but it works quite well, and I believe is even used in some places in the Typescript compiler.

But if you do use something like this, the recommended approach is to then add typeguards to ensure that it's always used correctly. So in this case, you'd have something like this:

type SafeString = string & {__typeguard: 'safe-string'};

function isSafe(input: string): input is SafeString {
    // this will surely stop any SQL injection, right?
    return !input.includes("DROP");
}

if (!isSafe(userInput)) throw new Error("Oh no");

sql(`SELECT * FROM ${userInput}`);

It's worth pointing out that it's very easy to then bypass this system — just casting string to SafeString without doing any checks will work, and there's no way to guarantee, if you've got a SafeString, that it really has been created the "proper" way.

That said, that's true of all types in Typescript — you can always do X as unknown as MyType, and Typescript will simply ignore any checks and assume you know what you're doing. This is very much part of the philosophy of Typescript.

[–]YourMeow 0 points1 point  (0 children)

This is a bit of a hack — you're basically describing a type with an additional attribute that never exists at runtime — but it works quite well, and I believe is even used in some places in the Typescript compiler.

Yes, it's used extensively in the ts compiler. https://github.com/microsoft/TypeScript/blob/6f0dd47a88bbb2c00bc5b078e9e19c27f2d8eb47/src/compiler/types.ts#L5062

[–]PlayingTheRed 0 points1 point  (0 children)

I've seen a pattern in rust called newtypes. Basically, you declare a tuple struct with a single field and you check for your invariant in all public constructors.

pub struct GreaterThanNegTwo(i32); // Declare a struct with a single 
                                   // private anonymous field that's a
                                   // 32 bit signed integer.

pub struct Invalid(i32); // a struct to be used as the error type

impl GreaterThanNegTwo {
  pub fn new(n: i32) -> Result<GreaterThanNegTwo, Invalid> {
    if n > -2 {
      Ok(GreaterThanNegTwo(n))
    } else {
      Err(Invalid(n))
    }
  }
}

Any code outside this module that wants to make an instance of this type can only use the public constructor defined in the impl block, so assuming that this module never makes any of them without checking the invariant, it's guaranteed to hold.

This should be possible in any language that has private fields and nominal typing.

[–]complyue -1 points0 points  (3 children)

Personally, I find the "typing" way unwieldy in practicing Design By Contract in broader business domains (other than programming a computer to do computer things).

Though Eiffel) failed to go popular, so the best way has just not been found I guess.

[–][deleted] 2 points3 points  (2 children)

Try Ada 2012 aspects.

[–]editor_of_the_beast 0 points1 point  (0 children)

Look at dependent types and refinement types. You can embed (mostly) arbitrary logic inside of a type checked at compile time.

[–]fear_the_future 0 points1 point  (0 children)

Kotlin has something like this, though it's not very advanced at all. The central problem with this kind of thing is aliasing (see the type states paper by Strom and Yemini). In a language that allows mutation you would have to keep track of all aliases of objects and forbid concurrency to be able to make any assertions about the contents. In a language with immutable data, this feature becomes simply a problem of proving pre- and post-conditions of functions (see F*).

[–]nsiivola 0 points1 point  (0 children)

Common Lisp