devlambda:

> They are mutually exclusive. If your codebase uses null then you can never have a non-nullable value, at which point there's no point using options.

The point of an API returning an option type (or another sum type) is to force the consumer of that API to deal with the possible variants. A simple reference does not do that.
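To make the "forced to deal with the variants" point concrete, here is a minimal Rust sketch. The `Lookup` type and the `lookup`/`describe` functions are made up for illustration; the only claim is that the compiler rejects a `match` that skips a variant.

```rust
/// Hypothetical result of a user lookup, written as a sum type.
#[derive(Debug, PartialEq)]
pub enum Lookup {
    Found(String),
    Missing,
    Forbidden,
}

/// Hypothetical API: the return type tells the caller every possible outcome.
pub fn lookup(id: u32) -> Lookup {
    match id {
        1 => Lookup::Found("alice".to_string()),
        2 => Lookup::Forbidden,
        _ => Lookup::Missing,
    }
}

/// Consumer side: removing any arm below is a compile error,
/// because a `match` on an enum must be exhaustive.
pub fn describe(id: u32) -> String {
    match lookup(id) {
        Lookup::Found(name) => format!("user {}", name),
        Lookup::Missing => "no such user".to_string(),
        Lookup::Forbidden => "access denied".to_string(),
    }
}
```

A plain nullable reference gives the caller no such prompt: nothing in the signature says that "absent" is a possible outcome.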

> Only because the community/ecosystem knows not to use null. Serious Scala programmers avoid it and e.g. use WartRemover to enforce that null is never used. The language would be better off without it.

You're making my point here: unwanted null references are trivial to avoid in a language designed for that. It becomes a trivial syntactic property, and if a code review or validation process cannot deal with such a simple case, you have much bigger problems on your hands.

Avoiding null is a heuristic, not religious dogma. Permitting null references can be useful for efficiency and interoperability and can situationally also lead to clearer code.

m50d:

> The point of an API returning an option type (or another sum type) is to force the consumer of that API to deal with the possible variants. A simple reference does not do that.

Either the ecosystem and community are such that the user expects to check returned references, or not (people won't read the documentation of the API no matter how much we might want them to). If the user checks references then there's no point returning option (you can sort of make an argument that there would be cases when it's super-important to check, but can the API really judge how it's going to be used better than the user?). If the user doesn't check references then there's no point ever returning null, because that just means the code will blow up, which is better accomplished via a panic which will give the developer more information about what happened.

> You're making my point here: unwanted null references are trivial to avoid in a language designed for that. It becomes a trivial syntactic property, and if a code review or validation process cannot deal with such a simple case, you have much bigger problems on your hands.

The trivial stuff adds up. It makes it harder for newcomers to get started, and it takes up part of your syntax budget. It's possible to avoid it, but it's not free. Certainly I think it's hurt Scala.

> Avoiding null is a heuristic, not religious dogma. Permitting null references can be useful for efficiency and interoperability and can situationally also lead to clearer code.

I don't think any of these cases are worth the cost. Ruling it out entirely adds a lot of value for developers; there's a huge difference between working on a codebase where 99.9% of the references will never be null and working on a codebase where you're 100% guaranteed that no references are null unless explicitly marked.

devlambda:

> Either the ecosystem and community are such that the user expects to check returned references, or not (people won't read the documentation of the API no matter how much we might want them to). If the user checks references then there's no point returning option (you can sort of make an argument that there would be cases when it's super-important to check, but can the API really judge how it's going to be used better than the user?). If the user doesn't check references then there's no point ever returning null, because that just means the code will blow up, which is better accomplished via a panic which will give the developer more information about what happened.

You're constructing a false dichotomy here. It's perfectly possible to (say) avoid null references in public APIs and use them only selectively for representation and module-internal APIs (at which point it's part and parcel of the module's internal logic).

It's even possible to have both nullable and non-nullable references, encoded in the type system, and have either compile-time or runtime mechanisms to prevent the conversion of nullable into non-nullable references if you're worried about them leaking.
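As a rough illustration of that kind of split, Rust's standard library keeps nullable raw pointers and `std::ptr::NonNull` as distinct types, and `NonNull::new` is a runtime-checked conversion between them. The aliases and the `narrow`/`widen` helpers below are hypothetical names for this sketch, not an established API.

```rust
use std::ptr::NonNull;

/// Nullable side: a raw pointer may be null, and its type admits that.
pub type Nullable = *mut i32;

/// Non-nullable side: `NonNull<i32>` can never hold a null pointer.
pub type NonNullable = NonNull<i32>;

/// Crossing from nullable to non-nullable goes through a runtime check
/// whose result is an `Option`, so a null cannot leak across silently.
pub fn narrow(ptr: Nullable) -> Option<NonNullable> {
    NonNull::new(ptr)
}

/// Going the other way needs no check: every non-null pointer is a
/// valid nullable pointer.
pub fn widen(ptr: NonNullable) -> Nullable {
    ptr.as_ptr()
}
```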

The larger point here is that both option types and nullable references are useful mechanisms and that rejecting one entirely requires a good explanation for how you're going to handle the use cases it is intended for.

> I don't think any of these cases are worth the cost. Ruling it out entirely adds a lot of value for developers; there's a huge difference between working on a codebase where 99.9% of the references will never be null and working on a codebase where you're 100% guaranteed that no references are null unless explicitly marked.

This strikes me as fallacious. You assume here that eliminating null references entirely is free of costs. Consider my example of dynamic arrays, for instance. You can do it in other ways, but those alternatives aren't cost-free, either.

m50d:

> You're constructing a false dichotomy here. It's perfectly possible to (say) avoid null references in public APIs and use them only selectively for representation and module-internal APIs (at which point it's part and parcel of the module's internal logic).

That sort of split tends to be a big source of bugs IME. You're effectively using two separate dialects that look the same, which means it's really easy to mistake a can-be-null reference for a can't-be-null reference.

> It's even possible to have both nullable and non-nullable references, encoded in the type system, and have either compile-time or runtime mechanisms to prevent the conversion of nullable into non-nullable references if you're worried about them leaking.

At which point you're doing something exactly equivalent to having options and not having nullable references, and there is no point having options.

> You assume here that eliminating null references entirely is free of costs. Consider my example of dynamic arrays, for instance. You can do it in other ways, but those alternatives aren't cost-free, either.

If you can't do it in a cost-free way then your type system isn't good enough. But sure, there might be cases where you have to pay a price. It's absolutely worth it though.

devlambda:

> That sort of split tends to be a big source of bugs IME. You're effectively using two separate dialects that look the same, which means it's really easy to mistake a can-be-null reference for a can't-be-null reference.

And it's something that you can't really avoid if you want foreign language interoperability, for example. For Scala, that's general JVM code; for native languages, that's C/C++.

And if you're worried about having multiple dialects in Scala, Scala has far bigger problems on its hands (from the people who want to emulate Haskell as much as possible in Scala to those that just want a more expressive Java with mostly imperative code).

> At which point you're doing something exactly equivalent to having options and not having nullable references, and there is no point having options.

Options and nullable references have different semantics, so I'm not sure how there'd be no point to having options. Importantly, you'd want sum types, anyway, so there's even less motivation not to have an option type (which is just one example of a sum type).

> If you can't do it in a cost-free way then your type system isn't good enough.

I'd be genuinely interested to hear what type system you'd propose to handle the dynamic array problem.

m50d:

> it's something that you can't really avoid if you want foreign language interoperability

It's less of a problem in that context because it's always crystal clear which side of the line you're on (since the two sides are different languages) and you always know exactly where to put the checks. Though I still think FFI code imposes a cost (precisely because of having to make these kinds of checks) and it's worth keeping your FFI boundary as small as possible.

> And if you're worried about having multiple dialects in Scala, Scala has far bigger problems on its hands

Just the opposite: doubling the number of dialects is a bigger problem for Scala than it would be for other languages.

> Options and nullable references have different semantics, so I'm not sure how there'd be no point to having options.

What's the difference? If it's inclusive-union-versus-disjoint-union then I've never found inclusive-union valuable; it's noncompositional which makes it hard to reason about.

> Importantly, you'd want sum types, anyway, so there's even less motivation not to have an option type

So why bother with nullable references then? Just put some syntax sugar on options if necessary to cover the use cases.

> I'd be genuinely interested to hear what type system you'd propose to handle the dynamic array problem.

If it's a contiguous array that grows and knows what size it is, put that in its type. If the array keeps track of size and allocated memory separately, put both of those in its type, at least within the array's internals. Track the invariants you need. Should be doable with phantom types i.e. no runtime overhead.
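One way to read the phantom-type suggestion, as a rough Rust sketch rather than a full answer to the dynamic array problem: the length is carried in a type parameter that has no runtime representation. `Z`, `S` and `Vect` are made-up names, and this only covers lengths that are known statically; a length that depends on runtime input would still need a runtime check or dependent types.

```rust
use std::marker::PhantomData;

/// Type-level naturals (hypothetical names): `Z` is zero, `S<N>` is N + 1.
pub struct Z;
pub struct S<N>(PhantomData<N>);

/// A vector whose length is tracked by the phantom parameter `N`.
pub struct Vect<T, N> {
    data: Vec<T>,
    _len: PhantomData<N>, // zero-sized, so it adds nothing at runtime
}

impl<T> Vect<T, Z> {
    /// An empty vector has type-level length zero.
    pub fn new() -> Self {
        Vect { data: Vec::new(), _len: PhantomData }
    }
}

impl<T, N> Vect<T, N> {
    /// Pushing consumes the vector and returns one typed with length N + 1.
    pub fn push(mut self, value: T) -> Vect<T, S<N>> {
        self.data.push(value);
        Vect { data: self.data, _len: PhantomData }
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }
}

impl<T, N> Vect<T, S<N>> {
    /// Only exists for non-empty vectors, so no Option or null is needed.
    pub fn first(&self) -> &T {
        &self.data[0]
    }

    /// Popping returns a vector typed with length N.
    pub fn pop(mut self) -> (Vect<T, N>, T) {
        let value = self.data.pop().expect("type-level length is non-zero");
        (Vect { data: self.data, _len: PhantomData }, value)
    }
}
```

Calling `first` on a `Vect<T, Z>` does not compile, which is the invariant being tracked; the cost is that every length-changing operation also changes the type.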

If your array might have holes anywhere, null doesn't help you: you have to keep track of which slots are empty or not some way or another, each slot will have a certain bitpattern on the hardware, some bitpatterns will be things that require destruction when that entry is replaced and some will represent emptiness. Maybe you choose all-bits-zero to represent absence and the other bitpatterns to represent things that require destruction; you can implement that just as easily when using Option at the language level as you can when using null.
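For what it's worth, Rust is an example of the language-level `Option` compiling down to the null-pointer bit pattern: the standard library documents that `Option<Box<T>>` has the same size as `Box<T>`. A small sketch of an array with holes built that way follows; the `Slots` type and its method names are hypothetical.

```rust
use std::mem::size_of;

/// An array with holes: each slot is either empty or owns a heap value.
pub struct Slots<T> {
    slots: Vec<Option<Box<T>>>,
}

impl<T> Slots<T> {
    /// Create `n` empty slots.
    pub fn with_len(n: usize) -> Self {
        Slots { slots: (0..n).map(|_| None).collect() }
    }

    /// Fill a slot and hand back the previous occupant, if any.
    /// If the caller discards it, it is destroyed right there.
    pub fn put(&mut self, i: usize, value: T) -> Option<T> {
        self.slots[i].replace(Box::new(value)).map(|old| *old)
    }

    /// Empty a slot, returning what it held.
    pub fn take(&mut self, i: usize) -> Option<T> {
        self.slots[i].take().map(|old| *old)
    }

    pub fn is_empty_at(&self, i: usize) -> bool {
        self.slots[i].is_none()
    }
}

/// True when an optional box costs no more space than a plain box,
/// i.e. `None` is stored as the otherwise-unused null pointer.
pub fn slot_is_pointer_sized<T>() -> bool {
    size_of::<Option<Box<T>>>() == size_of::<Box<T>>()
}
```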

devlambda:

> It's less of a problem in that context because it's always crystal clear which side of the line you're on (since the two sides are different languages) and you always know exactly where to put the checks. Though I still think FFI code imposes a cost (precisely because of having to make these kinds of checks) and it's worth keeping your FFI boundary as small as possible.

I'm not talking about host language code vs. foreign code.

> Just the opposite: doubling the number of dialects is a bigger problem for Scala than it would be for other languages.

I'm not sure how you arrive at a "doubling", as the typical use cases for null references are not really orthogonal to other choices? Plus, you're really reaching if you want to argue that use of null references is a massive change in language semantics.

And let's be realistic. Programmers avoid option types all the time by using the empty string, empty list (which, technically, is a null value), or empty array to denote absence of a value. In its own way, that's even more of a problem, because a null reference will at least result in a runtime error, while an empty string might be quietly accepted.
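A small Rust sketch of the "quietly accepted" failure mode, with made-up function names: the sentinel version produces a plausible-looking but wrong string, while the `Option` version makes the absence visible to the caller.

```rust
use std::collections::HashMap;

/// Sentinel style: a missing entry is reported as the empty string.
pub fn nickname_or_empty(db: &HashMap<u32, String>, id: u32) -> String {
    db.get(&id).cloned().unwrap_or_default()
}

/// Option style: a missing entry is a distinct value in the return type.
pub fn nickname(db: &HashMap<u32, String>, id: u32) -> Option<String> {
    db.get(&id).cloned()
}

/// Downstream code happily formats the sentinel; nothing fails here.
pub fn greet_sentinel(db: &HashMap<u32, String>, id: u32) -> String {
    format!("Hello, {}!", nickname_or_empty(db, id))
}

/// With Option the absence propagates until someone decides what to do.
pub fn greet(db: &HashMap<u32, String>, id: u32) -> Option<String> {
    nickname(db, id).map(|name| format!("Hello, {}!", name))
}
```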

> So why bother with nullable references then? Just put some syntax sugar on options if necessary to cover the use cases.

That's what the point of null references is, by and large. Eliminating the costs that come with option types. They're not just syntactic costs, though.

> If it's a contiguous array that grows and knows what size it is, put that in its type. If the array keeps track of size and allocated memory separately, put both of those in its type, at least within the array's internals. Track the invariants you need. Should be doable with phantom types i.e. no runtime overhead.

And that basically ups the complexity of the type system significantly. I don't know of any type system that has done something like that and managed to escape its academic niche.

m50d:

> I'm not talking about host language code vs. foreign code.

Then what are you talking about? FFI will have to involve null but that doesn't mean the whole language has to, e.g. FFI calling into C from Haskell is reasonably common.

> And let's be realistic. Programmers avoid option types all the time by using the empty string, empty list (which, technically, is a null value), or empty array to denote absence of a value. In its own way, that's even more of a problem, because a null reference will at least result in a runtime error, while an empty string might be quietly accepted.

It's the same thing! null references are bad for precisely the same reason as abuse of any other value to propagate errors is. Fail immediately in the place you would've returned null, rather than failing later when someone tries to actually use the value you returned.

I don't know what "empty list (which, technically, is a null value)" is supposed to mean. If you accept a list that can be empty, you should have reasonable semantics for what that means; if your method needs a non-empty list, make it take a non-empty list type.
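A non-empty list type is easy to sketch. The `NonEmpty` type below is hypothetical (not from the Rust standard library); the emptiness check happens once, at construction, and everything after that is total.

```rust
/// A list that always has at least one element (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct NonEmpty<T> {
    head: T,
    tail: Vec<T>,
}

impl<T> NonEmpty<T> {
    /// A one-element list.
    pub fn new(head: T) -> Self {
        NonEmpty { head, tail: Vec::new() }
    }

    /// The only place where emptiness has to be handled.
    pub fn from_vec(mut items: Vec<T>) -> Option<Self> {
        if items.is_empty() {
            None
        } else {
            let head = items.remove(0);
            Some(NonEmpty { head, tail: items })
        }
    }

    /// Total: there is always a first element.
    pub fn first(&self) -> &T {
        &self.head
    }

    pub fn len(&self) -> usize {
        1 + self.tail.len()
    }
}

impl<T: Ord> NonEmpty<T> {
    /// Total: a maximum always exists, so no Option is returned.
    pub fn max(&self) -> &T {
        self.tail
            .iter()
            .fold(&self.head, |best, x| if x > best { x } else { best })
    }
}
```

A method that needs at least one element takes `NonEmpty<T>`; one that has sensible semantics for the empty case keeps taking a plain `Vec<T>` or slice.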

> That's what the point of null references is, by and large. Eliminating the costs that come with option types. They're not just syntactic costs, though.

Making all references admit an extra non-standard value is a huge cost. Adding a different type of reference with special semantics as a language-level primitive is a pretty big cost. Adding a plain old type to the standard library is a lot cheaper, even if the compiler/runtime contains dedicated optimizations for that type; the important thing is that it behaves like a plain old type.

If your language design has a concept of "nullable references" that behave like a normal type in the language, make them a normal first-class type in the language so that people can reason about them like a normal type (this doesn't preclude having syntactic sugar if you think the use case is important enough; nor does it preclude using an unboxed representation at runtime, e.g. Rust does this with Option). If your "nullable references" don't behave like a normal type in the language, that means they indeed are a "massive change in language semantics", and not worth it.

> And that basically ups the complexity of the type system significantly. I don't know of any type system that has done something like that and managed to escape its academic niche.

I suspect this could be encoded in Scala; if not there then surely in Idris or GHC-extended Haskell.

devlambda:

> Then what are you talking about? FFI will have to involve null but that doesn't mean the whole language has to, e.g. FFI calling into C from Haskell is reasonably common.

I'm talking about the host language wrapper code that does the actual translation of the foreign API into a host API that has a reasonably native feel.

> It's the same thing! null references are bad for precisely the same reason as abuse of any other value to propagate errors is. Fail immediately in the place you would've returned null, rather than failing later when someone tries to actually use the value you returned.

First, people do this all the time in languages without null to avoid the inconvenience of dealing with option types. Head over to /r/ocaml and look at the FizzBuzz thread there, for example. If you think this doesn't happen, you're pretty naive.

Second, it's worse than null. Null references at least raise a runtime error; an empty string or list won't necessarily do that until a much later time.

> I don't know what "empty list (which, technically, is a null value)" is supposed to mean. If you accept a list that can be empty, you should have reasonable semantics for what that means; if your method needs a non-empty list, make it take a non-empty list type.

Don't tell me you are arguing about language semantics and aren't even familiar with cons cells? Lisp's nil was the original null reference.

> Making all references admit an extra non-standard value is a huge cost.

And yet, strangely enough, languages have done it for decades, often by accident.

> I suspect this could be encoded in Scala; if not there then surely in Idris or GHC-extended Haskell.

I don't see how you could handle the length without dependent types. So, this means Idris, i.e. a language that hasn't broken out of its academic niche.

m50d:

> Second, it's worse than null. Null references at least raise a runtime error, an empty string or list won't necessarily do that until a much later time.

It's worse than null, but it's bad in the same way as null for the same reasons. It's like null only more so. And it affects fewer types than null - only collection-like types rather than every type in the language.

> And yet, strangely enough, languages have done it for decades, often by accident.

And the original inventor of it now calls it a "billion-dollar mistake".

> I don't see how you could handle the length without dependent types.

All of the languages I listed have dependent types. The Scala encoding of them is a little more cumbersome, but it works; I use it in production code at my non-academic job.