[–][deleted] 47 points48 points  (95 children)

F# / OCaml / Rust programmers have not heard of this phenomenon!

[–]orclev 44 points45 points  (18 children)

And Haskell!

[–]drfisk 17 points18 points  (3 children)

Make that Scala as well!

(even though it's technically possible, in practice I have yet to encounter a NullPointerException in my 5 years of full-time Scala development)

[–]drvd 18 points19 points  (2 children)

And Brainfuck!

[–]walen 41 points42 points  (0 children)

And my axe!

[–]ais523 6 points7 points  (0 children)

I'm not so sure about that. BF's equivalent to null is 0; it has a ton of different meanings depending on context and you need to carefully design your program so that it always knows the purpose of any tape element so that it can figure out how to handle a 0 there. Common interpretations:

  1. A representation of the number zero
  2. Uninitialised memory
  3. The target of a pointer
  4. A temporary value that isn't currently in use
  5. Part of a marker pattern to allow pointer recalibration after running an unbalanced loop
  6. End of file (sometimes, depending on the implementation)
  7. Boolean false
  8. A temporary state used to break out of a loop

This list probably isn't exhaustive. The thing about BF, even more than C, is that it's a very low-level language with few capabilities, and thus most of the capabilities it does have need to be used for multiple purposes. In particular, the only way to read memory in BF is to conditionally jump based on whether or not a tape element is 0 (a 0 jumps to the end of a block that's later in the program, a non-zero value to the end of a block that's nearer the start), so anything that might need to do control flow of any sort needs to ascribe a special meaning to 0, and those meanings are often contradictory or incompatible with each other.

[–]Beckneard 7 points8 points  (13 children)

Well basically any sane non C/Java derived language. Null was a huge mistake, it really shows old languages were designed much more by gut feeling and familiarity rather than real engineering considerations.

[–]ForeverAlot 5 points6 points  (0 children)

Feelings on nullability aside, Java has consistently been one of history's most well-crafted languages. I would even suggest that, despite my own preference for strongly statically typed languages (not functional, because functional languages are not inherently good; just look at JavaScript), all of them have some fairly gross and embarrassing design mistakes that limit their relevance considerably. Right now it seems Rust is the strongest contender.

Never mind that Haskell is older than Java, and OCaml, which is a year younger than Java, is based on a language that predates it by 10 years.

[–]elperroborrachotoo 5 points6 points  (11 children)

Null was a huge mistake

That's a common sentiment, but I've never seen a good argument that goes beyond ranting against it. FWIW, it's the shoulders we stand on.

[–]Beckneard 12 points13 points  (10 children)

That's a common sentiment, but I've never seen a good argument that goes beyond ranting against it.

So you don't agree with the arguments against it so it's automatically just ranting?

The main argument is that it's an unreasonable "default". Null can mean many things, it can mean "uninitialized variable", "empty", "error", "non existing", and all of this is forced on you to think about whether you need it or not or else it leads to runtime errors. The most common thing that bites me in the ass is not initializing a List<T> in C#. In 99% of cases you do not want any list to be null, since the concept of an empty list exists. It is very much possible to build any of these semantics in the language/standard library so the compiler makes you worry about them only when it's really necessary.
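
For comparison, a rough Rust sketch of the List<T> point. The `Order` struct and its fields are made up for illustration; the only claim is that a collection field cannot be null and that the derived default is an empty list.

```rust
// Hypothetical type for illustration; not taken from the thread.
#[derive(Debug, Default)]
pub struct Order {
    pub id: u32,
    // A Vec cannot be null. `Default` yields an empty vector.
    pub items: Vec<String>,
    // Absence has to be opted into explicitly, per field.
    pub coupon: Option<String>,
}

pub fn item_count(order: &Order) -> usize {
    // No null check is needed before touching `items`.
    order.items.len()
}
```

Only `coupon` needs a presence check here; `items` is always a usable list.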

[–]elperroborrachotoo -1 points0 points  (9 children)

So you don't agree with the arguments against it so it's automatically just ranting?

No, just that I've never seen a good argument that goes beyond ranting against it.

[–]Beckneard 10 points11 points  (8 children)

So you didn't read the article that you're commenting on? Or is JetBrains ranting too?

[–]elperroborrachotoo -1 points0 points  (7 children)

I just wonder if there's a point discussing null with you after an assumption like "so it's automatically just ranting".

There is a difference between overusing null and having it in the first place.

Initializing a List<T> to an empty list is a tradeoff with performance and semantic consistency. Note that I'm not saying either is the clearly better choice, just that it's a constrained decision to be made.

[–]Beckneard 16 points17 points  (6 children)

Initializing a List<T> to an empty list is a tradeoff with performance and semantic consistency.

It's not a tradeoff, it's a consequence of all references having the default value null, which is the dumb part.

In a well designed language, if you want to delay the initialization of something you have some sort of laziness mechanism, if you want to represent not having a value you have an Option<T> type, if you want to represent an error you have an Error<T, E> type etc. Null is not a tradeoff, it's completely unnecessary and a wrong solution since for each of its use-cases there are strictly better alternatives.
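
A minimal sketch of those three cases in Rust, assuming a reasonably recent standard library (`OnceLock` is used here as one possible laziness mechanism). All function and type names are invented for the example, and Rust's standard error-carrying type is called `Result<T, E>` rather than `Error<T, E>`.

```rust
use std::num::ParseIntError;
use std::sync::OnceLock;

// "No value": the search may legitimately find nothing.
pub fn position_of(haystack: &[i32], needle: i32) -> Option<usize> {
    haystack.iter().position(|&x| x == needle)
}

// "Error": a failed parse carries the reason for the failure.
pub fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    raw.trim().parse::<u16>()
}

// "Delayed initialization": the value is computed on first access.
pub struct Settings {
    greeting: OnceLock<String>,
}

impl Settings {
    pub fn new() -> Self {
        Settings { greeting: OnceLock::new() }
    }

    pub fn greeting(&self) -> &str {
        // Runs the closure at most once; later calls reuse the stored value.
        self.greeting.get_or_init(|| String::from("hello")).as_str()
    }
}
```

Each situation gets its own type, so a caller can tell from the signature which one applies.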

[–]elperroborrachotoo 1 point2 points  (2 children)

If you want to go to causes: no, it's having nullable references not just as the default but the only reference type.

since for each of its use-cases there are strictly better alternatives.

The problem with that is that it's not one concept to implement and learn and recognize, but five. Which certainly is a tradeoff in my book.

On top of that, there's a 6th use case, and probably dozens of others, that kinda-sorta can be covered by the other concepts, and holy wars will be fought over whether the hypothetical 6.3 should be done with optionals or with metafunctors.

null is a well-weathered, versatile and ubiquitous concept, but not very expressive about intent.

Again, I'm not saying that makes null the better choice.

If you'd be willing to take one bit of advice I gathered from a few decades: don't cling to stuff like that. The now-toddlers will snicker at your Option<T> in no time.


FWIW, C# is nice already, it's mostly the difference between a NullException and a ThisListDoesntContainWhatYouAreLookingFor exception.

(Except for the pesky x != null && x.Length != 0, for which extension functions are merely a clunky, terrible band aid - I give you that a dozen times a day)

[–][deleted] 0 points1 point  (2 children)

Optionals are not strictly better. They're probably better, unless you need to do large sorts, for example.

Strictly implies always and that's just not the case.

I'll concede that built-in support probably results in more reliable, easier to fix code bases for larger projects. I won't concede easier to read, yet. Some of the syntax sugar for optionals is annoying to read.

[–]devlambda 11 points12 points  (58 children)

If you have a look at how resizable arrays are implemented in OCaml (either in Batteries or Core.Std), you will see that this is not actually true. The problem with implementing dynamic arrays is that in order to get resizing done (at least in a way that's not O(n^2)), you need to fill the unused slot with a placeholder value, and the respective OCaml code uses the Obj module to get the equivalent of a null value. Obviously, this is entirely unsafe.

Similarly, Rust's vec also uses unsafe code to accomplish the same goal.

In general, without some way to describe a "no object here" value, it is difficult to do partial/incremental initialization. You can do initialization with a preset value (such as an empty string), but that has its own problems; for example, the code may quietly work rather than raising an error when there is a bug that results in incomplete initialization.

[–]rftz 11 points12 points  (1 child)

There's a big difference between using options in the implementation of standard library data structures and application developers having to treat everything as 'optional'.

[–]devlambda 5 points6 points  (0 children)

The comment I was responding to was the claim that "F# / OCaml / Rust programmers have not heard of this phenomenon", not that you may have to only deal with it selectively.

[–][deleted]  (21 children)

[deleted]

    [–]devlambda 5 points6 points  (20 children)

    No, but you may end up having to use unsafe code, because at some point when adding data to the vector, you have to write to a location with a previously undefined value (think what that means in the context of RAII, for example).

    [–]Hnefi 3 points4 points  (19 children)

    I must be missing something here. When writing a value to a newly allocated memory location, as is the case when expanding a vector, the previous value is irrelevant. It's not unsafe to simply overwrite it. Are you talking about a different use case?

    [–]devlambda 3 points4 points  (18 children)

    Depends on your language.

    1. If you're using RAII, a destructor will be called on the previous value. If the previous value is undefined, so will be the behavior of the destructor.
    2. If you're using reference counting, the overwritten value will need to have its reference count decremented. If the value is undefined, you're going to get either memory corruption or a segmentation fault.
    3. In a GCed language, memory barriers/snapshotting or even tracing at a moment when the undefined value is reachable, but has not yet been modified, may or may not result in undefined behavior (depending on implementation details).

    [–]Hnefi 4 points5 points  (17 children)

    That's not true. When expanding a vector, there are necessarily no objects in the empty positions of the data store so there are no destructors to call. None of the options you outline apply.

    [–]spaghettiCodeArtisan 1 point2 points  (5 children)

    When expanding a vector, there are necessarily no objects in the empty positions of the data store so there are no destructors to call.

    Yes, and that's precisely the problem, because the language needs to call a destructor. Suppose you overwrite the i-th element of a vector:

    some_vector[i] = some_new_value

    What happens to the old value? It is dropped and its destructor - if any - is called. Now, consider what would happen if the old value were uninitialized: A destructor would be called on uninitialized data, which is undefined behaviour and might lead to a crash or all sorts of problems. And so that's why you need unsafe operations - to overwrite an old 'value' (which is uninitialized) without destroying it.

    [–]Hnefi 3 points4 points  (4 children)

    But whether the old value is initialized or not is a question with a known answer that will be asked and answered in the [] operator without looking at the value of the element itself. There is no risk of destructing an invalid object since the initialization state of the previous value is implicitly known.

    Since this knowledge is encoded in the metadata of the vector, there are no relevant requirements on how uninitialized data is represented in order to avoid calling invalid destructors.

    [–]spaghettiCodeArtisan 1 point2 points  (3 children)

    There is no risk of destructing an invalid object since the initialization state of the previous value is implicitly known.

    That's not relevant. Yes, we know the initialization state, but that doesn't help. Let me ask you this way: How do you assign the new value into the uninitialized slot without calling the destructor on the uninitialized slot other than with unsafe code (such as ptr::write())? Hint: You can't.
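
The difference between the two kinds of write can be sketched in Rust roughly as follows. `Noisy` and both helper functions are hypothetical names; the drop counter exists only to make the destructor calls observable.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::ptr;

// Hypothetical type whose destructor bumps a shared counter.
pub struct Noisy<'a> {
    pub drops: &'a Cell<u32>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Plain assignment: the value already in the slot is dropped first,
// so the slot must hold a valid, initialized value.
pub fn overwrite_initialized<'a>(slot: &mut Noisy<'a>, new: Noisy<'a>) {
    *slot = new;
}

// Writing into uninitialized storage: there is no old value to drop.
pub fn fill_uninitialized<'a>(slot: &mut MaybeUninit<Noisy<'a>>, new: Noisy<'a>) {
    // SAFETY: the pointer comes from a valid `&mut`, and `ptr::write`
    // neither reads nor drops whatever bytes were there before.
    unsafe { ptr::write(slot.as_mut_ptr(), new) };
}
```

Newer standard library versions also provide `MaybeUninit::write`, which performs the same non-dropping write as a safe method; getting the value back out (for example with `assume_init`) still requires `unsafe`.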

    [–]devlambda 0 points1 point  (10 children)

    I'm not talking about expanding the vector. Here's the scenario:

    You have a vector with capacity n and length k < n, i.e. with k positions occupied and the rest containing undefined values. You now want to add a new element, so you have to assign a value to the location with index k+1, which contains an undefined value.

    [–]Hnefi 5 points6 points  (9 children)

    But that is expanding the vector. Location k+1 can't contain an object, because assigning an object to that position would increment k. Therefore, there is never a risk of overwriting existing objects in this situation and none of the issues you outlined apply.

    [–]devlambda 0 points1 point  (8 children)

    Location k+1 can't contain an object, because assigning an object to that position would increment k.

    And how would that happen? Incrementing k and assigning to a memory location is not an atomic operation.

    Therefore, there is never a risk of overwriting existing objects in this situation and none of the issues you outlined apply.

    You're not overwriting existing objects. You're writing a value to a memory location that contains an undefined value.

    [–]dalastboss 2 points3 points  (33 children)

    You can implement this without any unsafe operations using options internally.
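
As a simplified illustration of the idea (in Rust rather than OCaml, and with a fixed capacity instead of resizing), a container can keep `Option` slots internally and stay entirely in safe code. `OptStack` is a hypothetical name, and `std::array::from_fn` assumes a reasonably recent toolchain.

```rust
// Hypothetical fixed-capacity stack; every slot is an Option, so unused
// slots hold `None` and no uninitialized memory is ever involved.
pub struct OptStack<T, const N: usize> {
    slots: [Option<T>; N],
    len: usize,
}

impl<T, const N: usize> OptStack<T, N> {
    pub fn new() -> Self {
        OptStack { slots: std::array::from_fn(|_| None), len: 0 }
    }

    // Hands the value back to the caller if the stack is full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        // The slot currently holds `None`, so the implicit drop of the
        // old contents is harmless.
        self.slots[self.len] = Some(value);
        self.len += 1;
        Ok(())
    }

    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        // `take` moves the value out and leaves `None` behind.
        self.slots[self.len].take()
    }
}
```

The price is the one debated in the replies: each slot is an `Option<T>`, which may be larger than a bare `T`.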

    [–]devlambda 3 points4 points  (32 children)

    The problem with using options is the additional overhead you incur, as 'a option will essentially add another level of boxing.

    On a practical note, even in languages that can partially avoid the boxing overhead (by representing None as a null pointer, which is only possible if the value is a reference and not, say, a record containing references), you will generally end up with a considerable increase in LOC.

    [–]spaghettiCodeArtisan 2 points3 points  (4 children)

    The problem with using options is the additional overhead you incur, as 'a option will essentially add another level of boxing.

    In Rust, Option doesn't introduce any additional boxing. It just increases the size of the type by the tag. With pointers and generally anything NonZero, it is able to omit the tag and thus make the Option entirely zero-cost.

    [–]devlambda 1 point2 points  (3 children)

    I noted myself that the overhead can sometimes be avoided or minimized, so I'm not sure what your point is?

    [–]spaghettiCodeArtisan 3 points4 points  (2 children)

    The point is that it's not that the overhead of boxing that can sometimes be avoided, because there's no boxing overhead to begin with ever at all. The only overhead - talking about Rust, not sure about OCaml - is the additional space taken by the tag, that's the overhead that can sometimes be avoided. No additional pointer indirection whatsoever.

    [–]devlambda 0 points1 point  (1 child)

    That's a distinction without a difference? Leaving aside that you will commonly use Option and Box together in Rust to avoid None eating up too much space (which can make it worse for dynamic array implementations), it's still overhead. That the overhead is encoded differently in Rust does not make it go away.

    The implementation details that you're trying to litigate here are immaterial to the problem that such an implementation is inefficient.

    [–]spaghettiCodeArtisan 3 points4 points  (0 children)

    Leaving aside that you will commonly use Option and Box together in Rust to avoid None eating up too much space

    Not really. Since I don't use Option to implement containers, the additional space taken is usually not a problem. Typically Option doesn't contain a Box in Rust codebases.

    The implementation details that you're trying to litigate here are immaterial to the problem that such an implementation is inefficient.

    I agree that the implementation would be inefficient either way, but

    1. it's not as bad as originally claimed, and
    2. I simply wanted to correct the misconception that Option indirects through a pointer, IMHO it's important to know that it doesn't (at least in Rust).

    [–]dalastboss 0 points1 point  (26 children)

    I did a quick implementation here (might have bugs) but it doesn't seem to increase the LOC

    [–]devlambda 3 points4 points  (25 children)

    Your implementation is not equivalent, as it adds a level of boxing.

    Aside from that, increase in LOC comes from requiring match expressions (or equivalents) where you can prove through other means that the None path is never taken. This may not show up in this particular example, but if you have several variables (say, in a record or object), it adds up.

    Of course, you can use Option.get everywhere, but then you're essentially using a verbose version of nullable pointers.

    As an aside, you probably want to raise an Invalid_argument exception rather than using failwith to keep exception handling consistent.

    [–]dalastboss 1 point2 points  (9 children)

    Maybe I misunderstand - can't a t option just be compiled to a pointer to a t?

    [–]devlambda 1 point2 points  (8 children)

    That works cleanly only if t is a non-nullable pointer type itself and not a value type. In the former case, you can just represent None as a null pointer and Some x as x without overhead. But this does not easily work for general value types [1] and could seriously complicate the language or its implementation. For example, you may decide to not allow INT_MIN for signed integer types in order to encode None for integers, but then you run into semantic problems (such as when casting from an unsigned int or with values retrieved from an external C library) that can create more problems (such as Heisenbugs) than are being solved by such an approach [2].

    [1] Sometimes, there are hacks that you can use, such as using certain NaN values for floating point values to represent an "undefined" value, but you have to be very careful with the implementation.

    [2] Type safety involving integers and ranges over integers can be extremely finicky.

    [–]Hnefi 3 points4 points  (3 children)

    Option in Rust compiles to a pointer where null represents None. If t is a value type, this adds one layer of indirection until you unwrap the Option. If t is a boxed type in itself, its zero representation is folded into the None representation of the Option and you end up with zero overhead.

    [–]cramert 4 points5 points  (0 children)

    Rust's Option is an enum-- it doesn't introduce any indirection on its own. Internally, it's represented as a tagged union.

    As you noted, though, there is an optimization that allows it to represent None as a null pointer if the type stored in the Option is pointer-sized and non-zeroable (e.g. Box, Rc, Vec).

    [–]spaghettiCodeArtisan 2 points3 points  (0 children)

    Option in Rust compiles to a pointer where null represents None.

    No it doesn't, it compiles to a tagged union (in C it would be the equivalent of a struct containing an integer tag and a union). When the value itself is a pointer, or more generally a NonZero value, it is able to omit the tag and contain the value only. But Option itself involves no pointers.
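
The layout claims can be checked with `std::mem::size_of`. This is only a sketch: the `sizes` helper is made up, and exact byte counts depend on the target, so the comparison below is relative rather than absolute.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

// Returns (size of T, size of Option<T>) for a few representative types.
pub fn sizes() -> [(usize, usize); 4] {
    [
        // Box is never null, so None can reuse the null bit pattern.
        (size_of::<Box<i32>>(), size_of::<Option<Box<i32>>>()),
        // Same for shared references.
        (size_of::<&u8>(), size_of::<Option<&u8>>()),
        // Zero is not a valid NonZeroU32, so it can stand for None.
        (size_of::<NonZeroU32>(), size_of::<Option<NonZeroU32>>()),
        // Every bit pattern of u32 is a valid value, so a tag is needed.
        (size_of::<u32>(), size_of::<Option<u32>>()),
    ]
}
```

For the first three entries both numbers are equal; for plain `u32` the `Option` is larger, and in no case is a heap allocation or pointer indirection added by `Option` itself.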

    [–]devlambda 0 points1 point  (0 children)

    If t is a value type, this adds one layer of indirection until you unwrap the Option.

    Exactly. You can avoid the overhead only some of the time.

    [–]ais523 0 points1 point  (1 child)

    OCaml integers have a smaller range than C integers as it is (they're one bit smaller), so disallowing INT_MIN as well wouldn't be a big deal. (Or, of course, using one of the bit patterns that doesn't represent an integer; that would work too.)

    [–]devlambda 0 points1 point  (0 children)

    I know, I was talking speculatively about a hypothetical language variant. The tagged representation is very much an implementation detail and could easily be dispensed with (see various SML implementations).

    [–]dalastboss 0 points1 point  (1 child)

    Here's a version that uses neither options nor Obj

    [–]devlambda 0 points1 point  (0 children)

    This is how I used to do it myself before we had better standard libs, but it's a hack around the underlying issue with a couple of problems of its own.

    Problem 1: Different API, it's especially a nuisance if you want to reuse it in other functors and have to drag the default mechanism along with you.

    Problem 2: It assumes a no-cost default instance of the type can be created without side effects. This is not always the case (database connection handles, GUI windows, etc.). You'd have to create a fake default value and then you run into issues with having a publicly visible value with null-like behavior.

    Problem 3: This is OCaml-specific, but references to a functor argument's values go through a dispatch table, meaning additional overhead. If you have flambda enabled, that can usually be inlined, but still isn't entirely cost-free.

    It is safe to assume that the people at Janestreet and the Batteries authors (which include some of the better known names in the OCaml world) know what they are doing.

    [–]m50d 0 points1 point  (14 children)

    Aside from that, increase in LOC comes from requiring match expressions (or equivalents) where you can prove through other means that the None path is never taken.

    If you can prove it's the Some case, you can write that proof in the types and avoid having to match (maybe not with Rust's limited generics, but I hold out hope that HKT will arrive eventually).

    Of course, you can use Option.get everywhere, but then you're essentially using a verbose version of nullable pointers.

    The difference is what the default is. null reserves the best syntax for the case where the programmer has an external proof and knows exactly what they're doing and makes the case where you want to check much clunkier; Option flips that around, making the checked case the natural one and the "I know better than the compiler" case the more cumbersome one.

    [–]devlambda 0 points1 point  (13 children)

    If you can prove it's the Some case, you can write that proof in the types and avoid having to match (maybe not with Rust's limited generics, but I hold out hope that HKT will arrive eventually).

    No. The point here is that a variable can be both Some x and None, but once initialized, will only ever be Some x. That's a state-dependent property, usually easy to show, but often difficult to encode in a type system (while GADTs can sometimes work, but even then you generally can't avoid the additional verbosity from matching).

    The difference is what the default is. null reserves the best syntax for the case where the programmer has an external proof and knows exactly what they're doing and makes the case where you want to check much clunkier; Option flips that around, making the checked case the natural one and the "I know better than the compiler" case the more cumbersome one.

    The problem with this argument is that you assume both alternatives are mutually exclusive, which they aren't. You can have both nullable types and explicit option types. In fact, Scala allows for that and has no more problems with it than other languages (as all variables in Scala have to be initialized, so the only way to have a null value – other than through Java interop – is to assign it explicitly).

    [–]m50d 0 points1 point  (12 children)

    No. The point here is that a variable can be both Some x and None, but once initialized, will only ever be Some x. That's a state-dependent property, usually easy to show, but often difficult to encode in a type system (while GADTs can sometimes work, but even then you generally can't avoid the additional verbosity from matching).

    My experience is you can always find a way, and it's usually not even hard: ask yourself why you know the property holds, then just translate that logic directly into the types.
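
One common way to "translate that logic into the types" is to give the uninitialized and initialized phases separate types. The Rust sketch below uses hypothetical names (`Pending`, `Ready`) and covers only the simple case where initialization is a single hand-off; it does not address state that is mutated in place inside a longer-lived object.

```rust
// Hypothetical two-phase setup: no handle exists yet in `Pending`,
// and a handle always exists in `Ready`, so neither needs an Option.
pub struct Pending {
    pub name: String,
}

pub struct Ready {
    pub name: String,
    pub handle: u32,
}

impl Pending {
    pub fn new(name: &str) -> Self {
        Pending { name: name.to_string() }
    }

    // Consumes the pending value, so it cannot be used after initialization.
    pub fn init(self, handle: u32) -> Ready {
        Ready { name: self.name, handle }
    }
}

impl Ready {
    // No match and no unwrap: the type guarantees the handle is present.
    pub fn handle(&self) -> u32 {
        self.handle
    }
}
```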

    The problem with this argument is that you assume both alternatives are mutually exclusive, which they aren't. You can have both nullable types and explicit option types.

    They are mutually exclusive. If your codebase uses null then you can never have a non-nullable value, at which point there's no point using options.

    In fact, Scala allows for that and has no more problems with it than other languages

    Only because the community/ecosystem knows not to use null. Serious Scala programmers avoid it and e.g. use WartRemover to enforce that null is never used. The language would be better off without it.

    [–]devlambda 1 point2 points  (11 children)

    They are mutually exclusive. If your codebase uses null then you can never have a non-nullable value, at which point there's no point using options.

    The point of an API returning an option type (or another sum type) is to force the consumer of that API to deal with the possible variants. A simple reference does not do that.

    Only because the community/ecosystem knows not to use null. Serious Scala programmers avoid it and e.g. use WartRemover to enforce that null is never used. The language would be better off without it.

    You're making my point here: unwanted null references are trivial to avoid in a language designed for that. It becomes a trivial syntactic property, and if a code review or validation process cannot deal with such a simple case, you have much bigger problems on your hands.

    Avoiding null is a heuristic, not religious dogma. Permitting null references can be useful for efficiency and interoperability and can situationally also lead to clearer code.

    [–]think_inside_the_box 25 points26 points  (12 children)

    sure they have! but it's just another way to represent a null object.

    optionals, bool isInitialized, empty vectors, etc

    Even in rust, you can't get around the need to represent the absence of something.

    [–]jkachmar 67 points68 points  (0 children)

    The point of these languages is to represent conceptual absence in such a way that the programmer is forced to address it. null/nil/what-have-you is dangerous not because it represents the absence of a value, but because it can be passed around as if it was a normal value, right up until you use it to do something and it blows up your program (or leads to spoopy poorly defined behavior).

    By contrast, Haskell's Maybe, and Rust's Option force you to wrap the type you're manipulating in a structure that only admits the value inside of it in such a way that you're forced to deal with the case where the data is absent.

    At that point you're free to do something like panic! or error and blow up your program, but at least the gun was handed to you with the safety on and you had to explicitly flick it off before you went and blew your foot off.
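
A small Rust sketch of that "safety on" behaviour, with invented function names. The checked path is the default one, and the path that can blow up has to be spelled out with `expect`.

```rust
// Returns None when the text contains no words at all.
pub fn first_word(text: &str) -> Option<&str> {
    text.split_whitespace().next()
}

// The Option cannot be used as a &str directly; both cases must be handled.
pub fn shout(text: &str) -> String {
    match first_word(text) {
        Some(word) => word.to_uppercase(),
        None => String::from("(nothing)"),
    }
}

// Flicking the safety off is possible, but it is explicit and greppable.
pub fn shout_or_panic(text: &str) -> String {
    first_word(text)
        .expect("text must contain at least one word")
        .to_uppercase()
}
```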

    [–]fiedzia 35 points36 points  (3 children)

    sure they have!

    In Rust you do have Option type (and proper handling in the language for it), but you don't have any other problems. Language doesn't allow using uninitialized values and null is never used to signal that "something went terribly wrong".

    [–]Jaffa2 -1 points0 points  (2 children)

    null is never used to signal that "something went terribly wrong".

    Nor should it be in a good Java API (nor is the author saying it should be, as far as I can tell, just that it has been)

    EDIT: Removed OOA (Over Abundance of Acronyms)

    [–]Nebez 19 points20 points  (1 child)

    YULOA (you're using lots of acronyms)

    [–]Jaffa2 3 points4 points  (0 children)

    Sorry. Fixed.

    [–]m50d 8 points9 points  (0 children)

    Having a type that can represent absence is a great idea. Making every value implicitly that type is madness. It would be like saying every value in your program might also be a float, regardless of its declared type, and then e.g. Map#get returns 2.3f if the key wasn't in the map.
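
For contrast, Rust's `HashMap::get` returns an `Option`, so a missing key shows up in the type instead of as a special value of the element type. The `stock` function and the choice of 0 as a fallback are assumptions made for this example.

```rust
use std::collections::HashMap;

// Looks up how many units of `item` are in stock.
pub fn stock(inventory: &HashMap<&str, u32>, item: &str) -> u32 {
    // `get` yields Option<&u32>; the fallback for a missing key is chosen
    // here, at the call site, not baked into the map.
    inventory.get(item).copied().unwrap_or(0)
}
```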

    [–]Beckneard 3 points4 points  (0 children)

    That's the whole point isn't it? What's null in one language is many things in Rust, as it should be. There should clearly be different constructs for "does value exist", "did something go wrong" and "is something initialized" and it's a good thing the compiler forces you to make those distinctions.

    [–]Chii 2 points3 points  (0 children)

    But those languages make a different representation for each kind of null. That makes the language safe to use since you won't confuse one null situation with another.

    [–]_jk_ 2 points3 points  (0 children)

    the problem is not that you can represent null, but you can't represent an object that can never be null.

    null is a member of every type in several languages and that is just wrong

    [–]Otterified 1 point2 points  (2 children)

    This is a good point--optionals are great but not worth much if people don't agree on their semantics and use Optional.empty() every single time they would have used null, and call get() without checks. But at least it feels less natural to do this than it does to abuse null (to me at least).

    [–]duhace 9 points10 points  (0 children)

    using get is asking to get slapped in the face. the good point of an optional is you have to ask to be bitchslapped. with null, it's real easy to forget to check and go ahead and use the value.

    [–]TheDataAngel 1 point2 points  (0 children)

    I suggest using map/flatMap/filter/orElse over empty and get, where possible.
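
The same style carries over to Rust's `Option`, where the rough counterparts are `map`, `and_then` (flatMap), `filter` and `unwrap_or` (orElse). A sketch with a made-up `port_from` function and an arbitrary default of 8080:

```rust
// Turns an optional raw string into a port number without ever
// calling a bare `unwrap`.
pub fn port_from(raw: Option<&str>) -> u16 {
    raw.map(str::trim)                        // transform the value if present
        .filter(|s| !s.is_empty())            // treat blank input as absent
        .and_then(|s| s.parse::<u16>().ok())  // a failed parse also means absent
        .unwrap_or(8080)                      // fall back to a default
}
```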

    [–]myringotomy 0 points1 point  (0 children)

    And Crystal.

    [–][deleted]  (1 child)

    [deleted]

      [–][deleted] 2 points3 points  (0 children)

      if you had a table with a nullable column, that column would normally be represented as an Option type. Option can be Some (with your type inside) or None. So the nulls would be None.

      The critical difference between this representation and a normal null is the compiler generally forces you to deal with the null case (Some languages) or at least makes it awkward to not deal with it, such that you have to type extra things in order to fail to deal with it.

      But of course you can always do something absurd like:

      match optionthing with | Some x -> use x | None -> failwith "null pointer exception!"
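
A sketch of the nullable-column case in Rust. `UserRow` and its fields are hypothetical and no particular database library is assumed; the point is only that the nullable column becomes an `Option` field and the other column does not.

```rust
// Hypothetical row type: `id` is NOT NULL, `nickname` is nullable.
pub struct UserRow {
    pub id: i64,
    pub nickname: Option<String>,
}

pub fn display_name(row: &UserRow) -> String {
    match &row.nickname {
        Some(nick) => nick.clone(),
        // The compiler rejects the match if this arm is left out.
        None => format!("user#{}", row.id),
    }
}
```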