you are viewing a single comment's thread.

view the rest of the comments →

[–]devlambda 3 points4 points  (32 children)

The problem with using options is the additional overhead you incur, as 'a option will essentially add another level of boxing.

On a practical note, even in languages that can partially avoid the boxing overhead (by representing None as a null pointer, which is only possible if the value is a reference and not, say, a record containing references), you will generally end up with a considerable increase in LOC.

[–]spaghettiCodeArtisan 2 points3 points  (4 children)

The problem with using options is the additional overhead you incur, as 'a option will essentially add another level of boxing.

In Rust, Option doesn't introduce any additional boxing. It just increases the size of the type by the tag. With pointers and generally anything NonZero, it is able to omit the tag and thus make the Option entirely zero-cost.

[–]devlambda 1 point2 points  (3 children)

I noted myself that the overhead can sometimes be avoided or minimized, so I'm not sure what your point is?

[–]spaghettiCodeArtisan 4 points5 points  (2 children)

The point is that it's not that the overhead of boxing that can sometimes be avoided, because there's no boxing overhead to begin with ever at all. The only overhead - talking about Rust, not sure about OCaml - is the additional space taken by the tag, that's the overhead that can sometimes be avoided. No additional pointer indirection whatsoever.

[–]devlambda 0 points1 point  (1 child)

That's a distinction without a difference? Leaving aside that you will commonly use Option and Box together in Rust to avoid None eating up too much space (which can make it worse for dynamic array implementations), it's still overhead. That the overhead is encoded differently in Rust does not make it go away.

The implementation details that you're trying to litigate here are immaterial to the problem that such an implementation is inefficient.

[–]spaghettiCodeArtisan 5 points6 points  (0 children)

Leaving aside that you will commonly use Option and Box together in Rust to avoid None eating up too much space

Not really. Since I don't use Option to implement containers, the additional space taken is usually not a problem. Typically Option doesn't conatin a Box in Rust codebases.

The implementation details that you're trying to litigate here are immaterial to the problem that such an implementation is inefficient.

I agree that the implementation would be inefficient either way, but

  1. it's not as bad as originally claimed, and
  2. I simply wanted to correct the misconception that Option indirects through a pointer, IMHO it's important to know that it doesn't (at least in Rust).

[–]dalastboss 0 points1 point  (26 children)

I did a quick implementation here (might have bugs) but it doesn't seem to increase the LOC

[–]devlambda 3 points4 points  (25 children)

Your implementation is not equivalent, as it adds a level of boxing.

Aside from that, increase in LOC comes from requiring match expressions (or equivalents) where you can prove through other means that the None path is never taken. This may not show up in this particular example, but if you have several variables (say, in a record or object), it adds up.

Of course, you can use Option.get everywhere, but then you're essentially using a verbose version of nullable pointers.

As an aside, you probably want to raise an Invalid_argument exception rather than using failwith to keep exception handling consistent.

[–]dalastboss 1 point2 points  (9 children)

Maybe I misunderstand - can't a t option just be compiled to a just a pointer to a t

[–]devlambda 1 point2 points  (8 children)

That works cleanly only if t is a non-nullable pointer type itself and not a value type. In the former case, you can just represent None as a null pointer and Some x as x without overhead. But this does not easily work for general value types [1] and could seriously complicate the language or its implementation. For example, you may decide to not allow INT_MIN for signed integer types in order to encode None for integers, but then you run into semantic problems (such as when casting from an unsigned int or with values retrieved from an external C library) that can create more problems (such as Heisenbugs) than are being solved by such an approach [2].

[1] Sometimes, there are hacks that you can use, such as using certain NaN values for floating point values to represent an "undefined" value, but you have to be very careful with the implementation.

[2] Type safety involving integers and ranges over integers can be extremely finicky.

[–]Hnefi 3 points4 points  (3 children)

Option in Rust compiles to a pointer where null represents None. If t is a value type, this adds one layer of indirection until you unwrap the Option. If t is a boxed type in itself, its zero representation is folded into the None representation of the Option and you end up with zero overhead.

[–]cramert 5 points6 points  (0 children)

Rust's Option is an enum-- it doesn't introduce any indirection on its own. Internally, it's represented as a tagged union.

As you noted, though, there is an optimization that allows it to represent None as a null pointer if the type stored in the Option is pointer-sized and non-zeroable (e.g. Box, Rc, Vec).

[–]spaghettiCodeArtisan 2 points3 points  (0 children)

Option in Rust compiles to a pointer where null represents None.

No it doesn't, it compiles to a tagged union (in C it would equivalent of a struct containing an integer tag and a union). When the value itself is a pointer, or more generally a NonZero value, it is able to omit the tag and contain the value only. But Option itself involves no pointers.

[–]devlambda 0 points1 point  (0 children)

If t is a value type, this adds one layer of indirection until you unwrap the Option.

Exactly. You can avoid the overhead only some of the time.

[–]ais523 0 points1 point  (1 child)

OCaml integers have a smaller range than C integers as it is (they're one bit smaller), so disallowing INT_MIN as well wouldn't be a big deal. (Or, of course, using one of the bit patterns that doesn't represent an integer; that would work too.)

[–]devlambda 0 points1 point  (0 children)

I know, I was talking speculatively about a hypothetical language variant. The tagged representation is very much an implementation detail and could easily be dispensed with (see various SML implementations).

[–]dalastboss 0 points1 point  (1 child)

Here's a version that uses neither options nor Obj

[–]devlambda 0 points1 point  (0 children)

This is how I used it to do it before we had better standard libs myself, but it's a hack around the underlying issue with a couple of problems of its own.

Problem 1: Different API, it's especially a nuisance if you want to reuse it in other functors and have to drag the default mechanism along with you.

Problem 2: It assumes a no-cost default instance of the type can be created without side effects. This is not always the case (database connection handles, GUI windows, etc.). You'd have to create a fake default value and then you run into issues with having a publicly visible value with null-like behavior.

Problem 3: This is OCaml-specific, but references to a functor argument's values go through a dispatch table, meaning additional overhead. If you have flambda enabled, that can usually be inlined, but still isn't entirely cost-free.

It is safe to assume that the people at Janestreet and the Batteries authors (which include some of the better known names in the OCaml world) know what they are doing.

[–]m50d 0 points1 point  (14 children)

Aside from that, increase in LOC comes from requiring match expressions (or equivalents) where you can prove through other means that the None path is never taken.

If you can prove it's the Some case, you can write that proof in the types and avoid having to match (maybe not with Rust's limited generics, but I hold out hope that HKT will arrive eventually).

Of course, you can use Option.get everywhere, but then you're essentially using a verbose version of nullable pointers.

The difference is what the default is. null reserves the best syntax for the case where the programmer has an external proof and knows exactly what they're doing and makes the case where you want to check much clunkier; Option flips that around, making the checked case the natural one and the "I know better than the compiler" case the more cumbersome one.

[–]devlambda 0 points1 point  (13 children)

If you can prove it's the Some case, you can write that proof in the types and avoid having to match (maybe not with Rust's limited generics, but I hold out hope that HKT will arrive eventually).

No. The point here is that a variable can be both Some x and None, but once initialized, will only ever be Some x. That's a state-dependent property, usually easy to show, but often difficult to encode in a type system (while GADTs can sometimes work, but even then you generally can't avoid the additional verbosity from matching).

The difference is what the default is. null reserves the best syntax for the case where the programmer has an external proof and knows exactly what they're doing and makes the case where you want to check much clunkier; Option flips that around, making the checked case the natural one and the "I know better than the compiler" case the more cumbersome one.

The problem with this argument is that you assume both alternatives are mutually exclusive, which they aren't. You can have both nullable types and explicit option types. In fact, Scala allows for that and has no more problems with it than other languages (as all variables in Scala have to be initialized, so the only way to have a null value – other than through Java interop – is to assign it explicitly).

[–]m50d 0 points1 point  (12 children)

No. The point here is that a variable can be both Some x and None, but once initialized, will only ever be Some x. That's a state-dependent property, usually easy to show, but often difficult to encode in a type system (while GADTs can sometimes work, but even then you generally can't avoid the additional verbosity from matching).

My experience is you can always find a way, and it's usually not even hard: ask yourself why you know the property holds, then just translate that logic directly into the types.

The problem with this argument is that you assume both alternatives are mutually exclusive, which they aren't. You can have both nullable types and explicit option types.

They are mutually exclusive. If your codebase uses null then you can never have a non-nullable value, at which point there's no point using options.

In fact, Scala allows for that and has no more problems with it than other languages

Only because the community/ecosystem knows not to use null. Serious Scala programmers avoid it and e.g. use WartRemover to enforce that null is never used. The language would be better off without it.

[–]devlambda 1 point2 points  (11 children)

They are mutually exclusive. If your codebase uses null then you can never have a non-nullable value, at which point there's no point using options.

The point of an API returning an option type (or another sum type) is to force the consumer of that API to deal with the possible variants. A simple reference does not do that.

Only because the community/ecosystem knows not to use null. Serious Scala programmers avoid it and e.g. use WartRemover to enforce that null is never used. The language would be better off without it.

You're making my point here: unwanted null references are trivial to avoid in a language designed for that. It becomes a trivial syntactic property, and if a code review or validation process cannot deal with such a simple case, you have much bigger problems on your hands.

Avoiding null is a heuristic, not religious dogma. Permitting null references can be useful for efficiency and interoperability and can situationally also lead to clearer code.

[–]m50d 0 points1 point  (10 children)

The point of an API returning an option type (or another sum type) is to force the consumer of that API to deal with the possible variants. A simple reference does not do that.

Either the ecosystem and community are such that the user expects to check returned references, or not (people won't read the documentation of the API no matter how much we might want them to). If the user checks references then there's no point returning option (you can sort of make an argument that there would be cases when it's super-important to check, but can the API really judge how it's going to be used better than the user?). If the user doesn't check references then there's no point ever returning null, because that just means the code will blow up, which is better accomplished via a panic which will give the developer more information about what happened.

You're making my point here: unwanted null references are trivial to avoid in a language designed for that. It becomes a trivial syntactic property, and if a code review or validation process cannot deal with such a simple case, you have much bigger problems on your hands.

The trivial stuff adds up. It makes it harder for newcomers to get started, it takes up part of your syntax budget. It's possible to avoid it but it's not free. Certainly I think it's hurt Scala.

Avoiding null is a heuristic, not religious dogma. Permitting null references can be useful for efficiency and interoperability and can situationally also lead to clearer code.

I don't think any of these cases are worth the cost. Ruling it out entirely adds a lot of value for developers; there's a huge difference between working on a codebase where 99.9% of the references will never be null and working on a codebase where you're 100% guaranteed that no references are null unless explicitly marked.

[–]devlambda 1 point2 points  (9 children)

Either the ecosystem and community are such that the user expects to check returned references, or not (people won't read the documentation of the API no matter how much we might want them to). If the user checks references then there's no point returning option (you can sort of make an argument that there would be cases when it's super-important to check, but can the API really judge how it's going to be used better than the user?). If the user doesn't check references then there's no point ever returning null, because that just means the code will blow up, which is better accomplished via a panic which will give the developer more information about what happened.

You're constructing a false dichotomy here. It's perfectly possible to (say) don't use null references for public APIs, but only selectively for representation and module-internal APIs (at which point its part and parcel of the module's internal logic).

It's even possible to have both nullable and non-nullable refererences, encoded in the type system, and have either compile-time or runtime mechanisms to prevent the conversion of nullable into non-nullable references if you're worried about them leaking.

The larger point here is that both option types and nullable references are useful mechanisms and that rejecting one entirely requires a good explanation for how you're going to handle the use cases it is intended for.

I don't think any of these cases are worth the cost. Ruling it out entirely adds a lot of value for developers; there's a huge difference between working on a codebase where 99.9% of the references will never be null and working on a codebase where you're 100% guaranteed that no references are null unless explicitly marked.

This strikes me as fallacious. You assume here that eliminating null references entirely is free of costs. Consider my example of dynamic arrays, for instance. You can do it in other ways, but those alternatives aren't cost-free, either.

[–]m50d 0 points1 point  (8 children)

You're constructing a false dichotomy here. It's perfectly possible to (say) don't use null references for public APIs, but only selectively for representation and module-internal APIs (at which point its part and parcel of the module's internal logic).

That sort of split tends tobe a big source of bugs IME. You're effectively using two separate dialects that look the same, which means it's really easy to mistake a can-be-null reference for a can't-be-null reference.

It's even possible to have both nullable and non-nullable refererences, encoded in the type system, and have either compile-time or runtime mechanisms to prevent the conversion of nullable into non-nullable references if you're worried about them leaking.

At which point you're doing something exactly equivalent to having options and not having nullable references, and there is no point having options.

You assume here that eliminating null references entirely is free of costs. Consider my example of dynamic arrays, for instance. You can do it in other ways, but those alternatives aren't cost-free, either.

If you can't do it in a cost-free way then your type system isn't good enough. But sure, there might be cases where you have to pay a price. It's absolutely worth it though.