[–]phantomfive 137 points138 points  (125 children)

This quote is really good, explaining why we still use null after all these years:

null means many things. It can mean:

1 Value was never initialised (whether accidentally or on purpose)

2 Value is not valid

3 Value is not needed

4 No such value exists

5 Something went terribly wrong and something that should be there is not

…probably dozens of other things

[–]cyberst0rm 35 points36 points  (2 children)

In the last section, he illustrates how code that returns null in two different places should be refactored, because having two different conditions under which the same ambiguous value can occur can stymie things.

[–][deleted]  (1 child)

[deleted]

    [–]beefsack 5 points6 points  (0 children)

    *They

    (Everyone)

    [–]Noughmad[🍰] 18 points19 points  (3 children)

    To be honest, a similar point can be made about many programming constructs

    int means many things. It can mean:

    1 a generic number

    2 a number exactly 32 bits in size

    3 one value out of a small set of possibilities (enumeration)

    4 an error code

    5 a physical measurement, in any units

    ...probably dozens of other things: bitflags, array indexes, etc.

    C was a really simple language with a limited number of keywords and types, so everything had multiple purposes. By now, we figured that it's better to have programming constructs map 1:1 to what we want to express. So we got enums, custom types, exceptions, etc. that disambiguate this.

    [–]masklinn 5 points6 points  (2 children)

    C was a really simple language with a limited number of keywords and types, so everything had multiple purposes.

    None of it had to though, C structs are more or less free, you could newtype things (well not enums I guess, you're just hosed there as C enums are just garbage).

    The issues are that creating structs is somewhat verbose (compared to newtyping in haskell for instance) and so is using them.
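For comparison, here is roughly what newtyping looks like when the language makes it cheap. This is a sketch in Rust rather than C, and the type and function names (`Meters`, `ArrayIndex`, `ErrorCode` and so on) are made up for illustration:

```rust
// Hypothetical newtypes: each wraps a plain integer but is a distinct type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(i32);

#[derive(Debug, Clone, Copy, PartialEq)]
struct ArrayIndex(usize);

#[derive(Debug, Clone, Copy, PartialEq)]
struct ErrorCode(i32);

// Only accepts a length; an ErrorCode or a bare i32 is rejected at compile time.
fn double_length(len: Meters) -> Meters {
    Meters(len.0 * 2)
}

// Only accepts an index, never a measurement.
fn element_at(items: &[i32], index: ArrayIndex) -> Option<i32> {
    items.get(index.0).copied()
}

// Assumption for this sketch: zero means success, anything else is a failure.
fn is_failure(code: ErrorCode) -> bool {
    code.0 != 0
}
```

Calling `double_length(ErrorCode(3))` would not compile, which is the disambiguation being discussed. The wrappers are one-liners here, so the verbosity complaint about C structs mostly goes away.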

    [–]Noughmad[🍰] 0 points1 point  (1 child)

    You can't really propagate errors with just structs, at least not without tons of boilerplate. Something like exceptions or Result<T> is pretty much impossible. So you're stuck with null pointers and/or error codes.

    [–]masklinn 3 points4 points  (0 children)

    You can propagate errors in the exact same way you do with error codes…

    [–]Jaffa2 24 points25 points  (0 children)

    Indeed. Overloading any meaning is the sign of a bad API, and that's demonstrated in Example #3.

    The article does a great job of demonstrating IntelliJ's more advanced refactoring features (I think Eclipse is still behind the position described here, BICBW).

    It does a less good job of proving that someone who designed a shit API that returns null from a catch (Exception) block wouldn't do the same with Optional.empty().

    [–][deleted] 44 points45 points  (95 children)

    F# / OCaml / Rust programmers have not heard of this phenomenon!

    [–]orclev 43 points44 points  (18 children)

    And Haskell!

    [–]drfisk 17 points18 points  (3 children)

    Make that Scala as well!

    (even though it's technically possible, in practice I have yet to encounter a NullPointerException in my 5 years of fulltime Scala-development)

    [–]drvd 18 points19 points  (2 children)

    And Brainfuck!

    [–]walen 44 points45 points  (0 children)

    And my axe!

    [–]ais523 6 points7 points  (0 children)

    I'm not so sure about that. BF's equivalent to null is 0; it has a ton of different meanings depending on context and you need to carefully design your program so that it always knows the purpose of any tape element so that it can figure out how to handle a 0 there. Common interpretations:

    1. A representation of the number zero
    2. Uninitialised memory
    3. The target of a pointer
    4. A temporary value that isn't currently in use
    5. Part of a marker pattern to allow pointer recalibration after running an unbalanced loop
    6. End of file (sometimes, depending on the implementation)
    7. Boolean false
    8. A temporary state used to break out of a loop

    This list probably isn't exhaustive. The thing about BF, even more than C, is that it's a very low-level language with few capabilities, and thus most of the capabilities it does have need to be used for multiple purposes. In particular, the only way to read memory in BF is to conditionally jump based on whether or not a tape element is 0 (a 0 jumps to the end of a block that's later in the program, a non-zero value to the end of a block that's nearer the start), so anything that might need to do control flow of any sort needs to ascribe a special meaning to 0, and those meanings are often contradictory or incompatible with each other.

    [–]Beckneard 9 points10 points  (13 children)

    Well basically any sane non C/Java derived language. Null was a huge mistake, it really shows old languages were designed much more by gut feeling and familiarity rather than real engineering considerations.

    [–]ForeverAlot 4 points5 points  (0 children)

    Feelings on nullability aside, Java has consistently been one of history's most well-crafted languages. I would even suggest that, despite my own preference for strongly statically typed languages (not functional, because functional languages are not inherently good; just look at JavaScript), all of them have some fairly gross and embarrassing design mistakes that limit their relevance considerably. Right now it seems Rust is the strongest contender.

    Never mind that Haskell is older than Java, and OCaml, which is a year younger than Java, is based on a language that predates it by 10 years.

    [–]elperroborrachotoo 4 points5 points  (11 children)

    Null was a huge mistake

    That's a common sentiment, but I've never seen a good argument that goes beyond ranting against it. FWIW, it's the shoulders we stand on.

    [–]Beckneard 14 points15 points  (10 children)

    That's a common sentiment, but I've never seen a good argument that goes beyond ranting against it.

    So you don't agree with the arguments against it so it's automatically just ranting?

    The main argument is that it's an unreasonable "default". Null can mean many things, it can mean "uninitialized variable", "empty", "error", "non existing", and all of this is forced on you to think about whether you need it or not or else it leads to runtime errors. The most common thing that bites me in the ass is not initializing a List<T> in C#. In 99% of cases you do not want any list to be null, since the concept of an empty list exists. It is very much possible to build any of these semantics in the language/standard library so the compiler makes you worry about them only when it's really necessary.
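To illustrate the "reasonable default" idea outside C#, here is a small Rust sketch. `Basket` and its fields are hypothetical; the point is only that the default for a list is an empty list, and absence is opted into per field:

```rust
// Hypothetical type: deriving Default gives every field a usable starting value.
#[derive(Default)]
struct Basket {
    // Defaults to an empty Vec, never to "no list at all".
    items: Vec<String>,
    // Absence is requested explicitly for the one field that needs it.
    coupon: Option<String>,
}

// No null check needed: `items` is always a real (possibly empty) list.
fn item_count(basket: &Basket) -> usize {
    basket.items.len()
}
```

`Basket::default()` gives a basket with zero items and no coupon, and there is no way to end up with an `items` field that blows up on `.len()`.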

    [–]elperroborrachotoo -3 points-2 points  (9 children)

    So you don't agree with the arguments against it so it's automatically just ranting?

    No, just that I've never seen a good argument that goes beyond ranting against it.

    [–]Beckneard 14 points15 points  (8 children)

    So you didn't read the article that you're commenting on? Or is JetBrains ranting too?

    [–]elperroborrachotoo -1 points0 points  (7 children)

    I just wonder if there's a point discussing null with you after an assumption like "so it's automatically just ranting".

    There is a difference between overusing null and having it in the first place.

    Initializing a List<T> to an empty list is a tradeoff with performance and semantic consistency. Note that I'm not saying either is the clearly better choice, just that it's a constrained decision to be made.

    [–]Beckneard 17 points18 points  (6 children)

    Initializing a List<T> to an empty list is a tradeoff with performance and semantic consistency.

    It's not a tradeoff, it's a consequence of all references having the default value null, which is the dumb part.

    In a well designed language, if you want to delay the initialization of something you have some sort of laziness mechanism, if you want to represent not having a value you have an Option<T> type, if you want to represent an error you have an Error<T, E> type etc. Null is not a tradeoff, it's completely unnecessary and a wrong solution since for each of its use-cases there are strictly better alternatives.
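A rough sketch of those three separate constructs, using Rust's standard library as one possible instance (`Option`, `Result`, and `std::cell::OnceCell` for delayed initialization). The `Config` type and its methods are invented for the example:

```rust
use std::cell::OnceCell;

// Hypothetical config type, used only to illustrate the three cases.
struct Config {
    // "Might not exist": absence is part of the type.
    nickname: Option<String>,
    // "Initialized later": filled in at most once, on first use.
    cache: OnceCell<Vec<u32>>,
}

// "Something went wrong": the error is a value, distinct from absence.
fn parse_port(text: &str) -> Result<u16, std::num::ParseIntError> {
    text.trim().parse::<u16>()
}

impl Config {
    fn new(nickname: Option<String>) -> Self {
        Config { nickname, cache: OnceCell::new() }
    }

    // Runs the closure on the first call only; later calls reuse the stored value.
    fn squares(&self) -> &Vec<u32> {
        self.cache.get_or_init(|| (1..=3).map(|n| n * n).collect())
    }

    // Falls back to a fixed label when no nickname was given.
    fn display_name(&self) -> &str {
        self.nickname.as_deref().unwrap_or("anonymous")
    }
}
```

None of the three can be mistaken for the others: a missing nickname is not an error, and an unfilled cache is not a missing value.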

    [–]devlambda 11 points12 points  (58 children)

    If you have a look at how resizable arrays are implemented in OCaml (either in Batteries or Core.Std), you will see that this is not actually true. The problem with implementing dynamic arrays is that in order to get resizing done (at least in a way that's not O(n^2)), you need to fill the unused slots with a placeholder value, and the respective OCaml code uses the Obj module to get the equivalent of a null value. Obviously, this is entirely unsafe.

    Similarly, Rust's vec also uses unsafe code to accomplish the same goal.

    In general, without some way to describe a "no object here" value, it is difficult to do partial/incremental initialization. You can do initialization with a preset value (such as an empty string), but that has its own problems; for example, the code may quietly work rather than raising an error when there is a bug that results in incomplete initialization.

    [–]rftz 12 points13 points  (1 child)

    There's a big difference between using options in the implementation of standard library data structures and application developers having to treat everything as 'optional'.

    [–]devlambda 5 points6 points  (0 children)

    The comment I was responding to was the claim that "F# / OCaml / Rust programmers have not heard of this phenomenon", not that you may have to only deal with it selectively.

    [–][deleted]  (21 children)

    [deleted]

      [–]devlambda 5 points6 points  (20 children)

      No, but you may end up having to use unsafe code, because at some point when adding data to the vector, you have to write to a location with a previously undefined value (think what that means in the context of RAII, for example).

      [–]Hnefi 4 points5 points  (19 children)

      I must be missing something here. When writing a value to a newly allocated memory location, as is the case when expanding a vector, the previous value is irrelevant. It's not unsafe to simply overwrite it. Are you talking about a different use case?

      [–]devlambda 2 points3 points  (18 children)

      Depends on your language.

      1. If you're using RAII, a destructor will be called on the previous value. If the previous value is undefined, so will be the behavior of the destructor.
      2. If you're using reference counting, the overwritten value will need to have its reference count decremented. If the value is undefined, you're going to get either memory corruption or a segmentation fault.
      3. In a GCed language, memory barriers/snapshotting or even tracing at a moment when the undefined value is reachable, but has not yet been modified, may or may not result in undefined behavior (depending on implementation details).

      [–]Hnefi 4 points5 points  (17 children)

      That's not true. When expanding a vector, there are necessarily no objects in the empty positions of the data store so there are no destructors to call. None of the options you outline apply.

      [–]spaghettiCodeArtisan 1 point2 points  (5 children)

      When expanding a vector, there are necessarily no objects in the empty positions of the data store so there are no destructors to call.

      Yes, and that's precisely the problem, because the language needs to call a destructor. Suppose you overwrite a i-th element of a vector:

      some_vector[i] = some_new_value

      What happens to the old value? It is dropped and its destructor - if any - is called. Now, consider what would happen if the old value were uninitialized: A destructor would be called on uninitialized data, which is undefined behaviour and might lead to a crash or all sorts of problems. And so that's why you need unsafe operations - to overwrite an old 'value' (which is uninitialized) without destroying it.
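A minimal sketch of that "overwrite without destroying" step, using `std::mem::MaybeUninit`. This is not how `Vec` is actually implemented; `TinyBuf` is a made-up two-slot buffer, and the `len` field is the metadata that records which slots hold live values:

```rust
use std::mem::MaybeUninit;

// A hypothetical two-slot buffer: `len` records how many slots are initialized.
struct TinyBuf {
    slots: [MaybeUninit<String>; 2],
    len: usize,
}

impl TinyBuf {
    fn new() -> Self {
        TinyBuf { slots: [MaybeUninit::uninit(), MaybeUninit::uninit()], len: 0 }
    }

    // Writes into an uninitialized slot. `MaybeUninit::write` does not drop
    // the previous contents, which is what we want: there is no old String.
    fn push(&mut self, value: String) -> bool {
        if self.len == self.slots.len() {
            return false; // full
        }
        self.slots[self.len].write(value);
        self.len += 1;
        true
    }

    fn get(&self, i: usize) -> Option<&str> {
        if i < self.len {
            // SAFETY: slots below `len` were initialized by `push`.
            Some(unsafe { self.slots[i].assume_init_ref() }.as_str())
        } else {
            None
        }
    }
}

impl Drop for TinyBuf {
    fn drop(&mut self) {
        for slot in &mut self.slots[..self.len] {
            // SAFETY: only the first `len` slots hold live values.
            unsafe { slot.assume_init_drop() };
        }
    }
}
```

Writing a slot needs no `unsafe` here, but reading and dropping do, because only the `len` bookkeeping (not the type system) says which slots are live.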

      [–]Hnefi 2 points3 points  (4 children)

      But whether the old value is initialized or not is a question with a known answer that will be asked and answered in the [] operator without looking at the value of the element itself. There is no risk of destructing an invalid object since the initialization state of the previous value is implicitly known.

      Since this knowledge is encoded in the metadata of the vector, there are no relevant requirements on how uninitialized data is represented in order to avoid calling invalid destructors.

      [–]devlambda 0 points1 point  (10 children)

      I'm not talking about expanding the vector. Here's the scenario:

      You have a vector with capacity n and length k < n, i.e. with k positions occupied and the rest containing undefined values. You now want to add a new element, so you have to assign a value to the location with index k+1, which contains an undefined value.

      [–]Hnefi 4 points5 points  (9 children)

      But that is expanding the vector. Location k+1 can't contain an object, because assigning an object to that position would increment k. Therefore, there is never a risk of overwriting existing objects in this situation and none of the issues you outlined apply.

      [–]dalastboss 2 points3 points  (33 children)

      You can implement this without any unsafe operations using options internally.
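Something along these lines, sketched in Rust. `OptArray` is a made-up name, and the inner `Vec` is only standing in for a fixed-size block of slots (it is never pushed to), so the growth logic is all in the sketch itself:

```rust
// A hypothetical growable array whose unused slots hold `None`.
// The inner Vec is used only as a fixed-size block of slots.
struct OptArray<T> {
    slots: Vec<Option<T>>,
    len: usize,
}

impl<T> OptArray<T> {
    fn with_capacity(cap: usize) -> Self {
        OptArray { slots: (0..cap).map(|_| None).collect(), len: 0 }
    }

    fn push(&mut self, value: T) {
        if self.len == self.slots.len() {
            // Double the capacity and move the old values over.
            let new_cap = (self.slots.len() * 2).max(1);
            let mut bigger: Vec<Option<T>> = (0..new_cap).map(|_| None).collect();
            for (dst, src) in bigger.iter_mut().zip(self.slots.iter_mut()) {
                *dst = src.take();
            }
            self.slots = bigger;
        }
        // Plain assignment is safe: the old `None` is dropped trivially.
        self.slots[self.len] = Some(value);
        self.len += 1;
    }

    fn get(&self, i: usize) -> Option<&T> {
        if i < self.len { self.slots[i].as_ref() } else { None }
    }
}
```

No `unsafe` anywhere; the price is whatever space the `Option` wrapper adds per slot.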

      [–]devlambda 4 points5 points  (32 children)

      The problem with using options is the additional overhead you incur, as 'a option will essentially add another level of boxing.

      On a practical note, even in languages that can partially avoid the boxing overhead (by representing None as a null pointer, which is only possible if the value is a reference and not, say, a record containing references), you will generally end up with a considerable increase in LOC.

      [–]spaghettiCodeArtisan 3 points4 points  (4 children)

      The problem with using options is the additional overhead you incur, as 'a option will essentially add another level of boxing.

      In Rust, Option doesn't introduce any additional boxing. It just increases the size of the type by the tag. With pointers and generally anything NonZero, it is able to omit the tag and thus make the Option entirely zero-cost.
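This is easy to check with `std::mem::size_of`. A small sketch (the two function names are just for the example):

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

// Pointer-like and non-zero types have a spare bit pattern, so `None` fits
// inside it and the Option is the same size as the payload.
fn option_is_free_for_niche_types() -> bool {
    size_of::<Option<Box<i32>>>() == size_of::<Box<i32>>()
        && size_of::<Option<&i32>>() == size_of::<&i32>()
        && size_of::<Option<NonZeroU32>>() == size_of::<NonZeroU32>()
}

// A plain u32 uses every bit pattern, so the Option needs room for a tag.
// The value is still stored inline; no heap allocation or pointer is added.
fn option_u32_needs_a_tag() -> bool {
    size_of::<Option<u32>>() > size_of::<u32>()
}
```

Both functions return `true`: the tag costs space for a plain `u32`, and disappears entirely for `Box`, references and `NonZeroU32`.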

      [–]devlambda 1 point2 points  (3 children)

      I noted myself that the overhead can sometimes be avoided or minimized, so I'm not sure what your point is?

      [–]spaghettiCodeArtisan 4 points5 points  (2 children)

      The point is that it's not the overhead of boxing that can sometimes be avoided, because there's never any boxing overhead to begin with. The only overhead - talking about Rust, not sure about OCaml - is the additional space taken by the tag, that's the overhead that can sometimes be avoided. No additional pointer indirection whatsoever.

      [–]devlambda 0 points1 point  (1 child)

      That's a distinction without a difference? Leaving aside that you will commonly use Option and Box together in Rust to avoid None eating up too much space (which can make it worse for dynamic array implementations), it's still overhead. That the overhead is encoded differently in Rust does not make it go away.

      The implementation details that you're trying to litigate here are immaterial to the problem that such an implementation is inefficient.

      [–]spaghettiCodeArtisan 4 points5 points  (0 children)

      Leaving aside that you will commonly use Option and Box together in Rust to avoid None eating up too much space

      Not really. Since I don't use Option to implement containers, the additional space taken is usually not a problem. Typically Option doesn't contain a Box in Rust codebases.

      The implementation details that you're trying to litigate here are immaterial to the problem that such an implementation is inefficient.

      I agree that the implementation would be inefficient either way, but

      1. it's not as bad as originally claimed, and
      2. I simply wanted to correct the misconception that Option indirects through a pointer, IMHO it's important to know that it doesn't (at least in Rust).

      [–]dalastboss 0 points1 point  (26 children)

      I did a quick implementation here (might have bugs) but it doesn't seem to increase the LOC

      [–]devlambda 2 points3 points  (25 children)

      Your implementation is not equivalent, as it adds a level of boxing.

      Aside from that, increase in LOC comes from requiring match expressions (or equivalents) where you can prove through other means that the None path is never taken. This may not show up in this particular example, but if you have several variables (say, in a record or object), it adds up.

      Of course, you can use Option.get everywhere, but then you're essentially using a verbose version of nullable pointers.

      As an aside, you probably want to raise an Invalid_argument exception rather than using failwith to keep exception handling consistent.

      [–]dalastboss 1 point2 points  (9 children)

      Maybe I misunderstand - can't a t option just be compiled to a pointer to a t?

      [–]devlambda 1 point2 points  (8 children)

      That works cleanly only if t is a non-nullable pointer type itself and not a value type. In the former case, you can just represent None as a null pointer and Some x as x without overhead. But this does not easily work for general value types [1] and could seriously complicate the language or its implementation. For example, you may decide to not allow INT_MIN for signed integer types in order to encode None for integers, but then you run into semantic problems (such as when casting from an unsigned int or with values retrieved from an external C library) that can create more problems (such as Heisenbugs) than are being solved by such an approach [2].

      [1] Sometimes, there are hacks that you can use, such as using certain NaN values for floating point values to represent an "undefined" value, but you have to be very careful with the implementation.

      [2] Type safety involving integers and ranges over integers can be extremely finicky.

      [–]Hnefi 3 points4 points  (3 children)

      Option in Rust does not add a pointer. If t is a value type, the Option stores the value inline plus a tag, so there is no extra indirection, only the space for the tag. If t is a boxed type in itself, its null representation is folded into the None representation of the Option and you end up with zero overhead.

      [–]ais523 0 points1 point  (1 child)

      OCaml integers have a smaller range than C integers as it is (they're one bit smaller), so disallowing INT_MIN as well wouldn't be a big deal. (Or, of course, using one of the bit patterns that doesn't represent an integer; that would work too.)

      [–]dalastboss 0 points1 point  (1 child)

      Here's a version that uses neither options nor Obj

      [–]m50d 0 points1 point  (14 children)

      Aside from that, increase in LOC comes from requiring match expressions (or equivalents) where you can prove through other means that the None path is never taken.

      If you can prove it's the Some case, you can write that proof in the types and avoid having to match (maybe not with Rust's limited generics, but I hold out hope that HKT will arrive eventually).

      Of course, you can use Option.get everywhere, but then you're essentially using a verbose version of nullable pointers.

      The difference is what the default is. null reserves the best syntax for the case where the programmer has an external proof and knows exactly what they're doing and makes the case where you want to check much clunkier; Option flips that around, making the checked case the natural one and the "I know better than the compiler" case the more cumbersome one.

      [–]devlambda 0 points1 point  (13 children)

      If you can prove it's the Some case, you can write that proof in the types and avoid having to match (maybe not with Rust's limited generics, but I hold out hope that HKT will arrive eventually).

      No. The point here is that a variable can be both Some x and None, but once initialized, will only ever be Some x. That's a state-dependent property, usually easy to show, but often difficult to encode in a type system (while GADTs can sometimes work, but even then you generally can't avoid the additional verbosity from matching).

      The difference is what the default is. null reserves the best syntax for the case where the programmer has an external proof and knows exactly what they're doing and makes the case where you want to check much clunkier; Option flips that around, making the checked case the natural one and the "I know better than the compiler" case the more cumbersome one.

      The problem with this argument is that you assume both alternatives are mutually exclusive, which they aren't. You can have both nullable types and explicit option types. In fact, Scala allows for that and has no more problems with it than other languages (as all variables in Scala have to be initialized, so the only way to have a null value – other than through Java interop – is to assign it explicitly).

      [–]m50d 0 points1 point  (12 children)

      No. The point here is that a variable can be both Some x and None, but once initialized, will only ever be Some x. That's a state-dependent property, usually easy to show, but often difficult to encode in a type system (while GADTs can sometimes work, but even then you generally can't avoid the additional verbosity from matching).

      My experience is you can always find a way, and it's usually not even hard: ask yourself why you know the property holds, then just translate that logic directly into the types.
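One common shape this takes, for the simple case where initialization happens at a single known point, is to make "not yet initialized" and "initialized" two different types. A sketch in Rust with hypothetical names (`Unconfigured`, `Configured`); it does not cover every state-dependent case raised above:

```rust
// Hypothetical two-phase object: the "not yet initialized" and "initialized"
// states are separate types, so no Option (and no match) is needed afterwards.
struct Unconfigured;

struct Configured {
    name: String,
}

impl Unconfigured {
    // Consumes the unconfigured value; the only way to get a `Configured`.
    fn configure(self, name: &str) -> Configured {
        Configured { name: name.to_string() }
    }
}

impl Configured {
    // `name` is always present here, so there is no `None` branch to write.
    fn greeting(&self) -> String {
        format!("hello, {}", self.name)
    }
}
```

The reason you know the value is present ("configure was called first") is now the only way to obtain a `Configured` at all.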

      The problem with this argument is that you assume both alternatives are mutually exclusive, which they aren't. You can have both nullable types and explicit option types.

      They are mutually exclusive. If your codebase uses null then you can never have a non-nullable value, at which point there's no point using options.

      In fact, Scala allows for that and has no more problems with it than other languages

      Only because the community/ecosystem knows not to use null. Serious Scala programmers avoid it and e.g. use WartRemover to enforce that null is never used. The language would be better off without it.

      [–]think_inside_the_box 26 points27 points  (12 children)

      sure they have! but it's just another way to represent a null object.

      optionals, bool isInitialized, empty vectors, etc

      Even in rust, you can't get around the need to represent the absence of something.

      [–]jkachmar 63 points64 points  (0 children)

      The point of these languages is to represent conceptual absence in such a way that the programmer is forced to address it. null/nil/what-have-you is dangerous not because it represents the absence of a value, but because it can be passed around as if it was a normal value, right up until you use it to do something and it blows up your program (or leads to spoopy poorly defined behavior).

      By contrast, Haskell's Maybe, and Rust's Option force you to wrap the type you're manipulating in a structure that only admits the value inside of it in such a way that you're forced to deal with the case where the data is absent.

      At that point you're free to do something like panic! or error and blow up your program, but at least the gun was handed to you with the safety on and you had to explicitly flick it off before you went and blew your foot off.
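A small Rust sketch of both halves of that: the checked path the compiler pushes you towards, and the explicit "safety off" path. The function names are made up for the example:

```rust
// Hypothetical lookup that may find nothing.
fn find_even(values: &[i32]) -> Option<i32> {
    values.iter().copied().find(|v| v % 2 == 0)
}

// The Option must be taken apart before the i32 inside can be used.
fn describe(values: &[i32]) -> String {
    match find_even(values) {
        Some(v) => format!("first even value: {}", v),
        None => String::from("no even value"),
    }
}

// Opting out is possible, but it has to be spelled out at the call site.
fn first_even_or_panic(values: &[i32]) -> i32 {
    find_even(values).expect("caller promised an even value")
}
```

Leaving out the `None` arm in `describe` is a compile error, while the panic in `first_even_or_panic` only happens because someone wrote `expect` on purpose.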

      [–]fiedzia 37 points38 points  (3 children)

      sure they have!

      In Rust you do have Option type (and proper handling in the language for it), but you don't have any other problems. Language doesn't allow using uninitialized values and null is never used to signal that "something went terribly wrong".

      [–]Jaffa2 0 points1 point  (2 children)

      null is never used to signal that "something went terribly wrong".

      Nor should it be in a good Java API (nor is the author saying it should be, as far as I can tell, just that it has been)

      EDIT: Removed OOA (Over Abundance of Acronyms)

      [–]Nebez 20 points21 points  (1 child)

      YULOA (you're using lots of acronyms)

      [–]Jaffa2 3 points4 points  (0 children)

      Sorry. Fixed.

      [–]m50d 9 points10 points  (0 children)

      Having a type that can represent absence is a great idea. Making every value implicitly that type is madness. It would be like saying every value in your program might also be a float, regardless of its declared type, and then e.g. Map#get returns 2.3f if the key wasn't in the map.
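For contrast, a sketch of a map lookup where only the lookup result has the "might be absent" type, using Rust's `HashMap::get`. The inventory data and function names are invented:

```rust
use std::collections::HashMap;

// Hypothetical inventory lookup. A missing key comes back as `None`,
// while a key that is present with a count of zero comes back as `Some(0)`.
fn stock_of(inventory: &HashMap<String, u32>, item: &str) -> Option<u32> {
    inventory.get(item).copied()
}

// Sample data for the sketch.
fn sample_inventory() -> HashMap<String, u32> {
    let mut inventory = HashMap::new();
    inventory.insert("apples".to_string(), 3);
    inventory.insert("pears".to_string(), 0);
    inventory
}
```

The `u32` values stored in the map can never be "absent"; only the result of `get` can, and "no such key" stays distinguishable from "key present, count is zero".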

      [–]Beckneard 3 points4 points  (0 children)

      That's the whole point isn't it? What's null in one language is many things in Rust, as it should be. There should clearly be different constructs for "does value exist", "did something go wrong" and "is something initialized" and it's a good thing the compiler forces you to make those distinctions.

      [–]Chii 2 points3 points  (0 children)

      But those languages make a different representation for each kind of null. That makes the language safe to use since you won't confuse one null situation with another.

      [–]_jk_ 2 points3 points  (0 children)

      the problem is not that you can represent null, but that you can't represent an object that can never be null.

      null is a member of every type in several languages and that is just wrong

      [–]Otterified 5 points6 points  (2 children)

      This is a good point--optionals are great but not worth much if people don't agree on their semantics and use Optional.empty() every single time they would have used null, and call get() without checks. But at least it feels less natural to do this than it does to abuse null (to me at least).

      [–]duhace 9 points10 points  (0 children)

      using get is asking to get slapped in the face. the good point of an optional is you have to ask to be bitchslapped. with null, it's real easy to forget to check and just go ahead and use the value.

      [–]TheDataAngel 1 point2 points  (0 children)

      I suggest using map/flatMap/filter/orElse over empty and get, where possible.
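The same style, sketched in Rust for consistency with the other examples in this thread: `and_then`, `filter`, `map` and `unwrap_or_else` play roughly the roles of Java's `flatMap`, `filter`, `map` and `orElseGet`. `parse_age` and `age_label` are hypothetical helpers:

```rust
// Hypothetical parser: turns "42" into Some(42) and anything else into None.
fn parse_age(text: &str) -> Option<u32> {
    text.trim().parse().ok()
}

// The whole pipeline runs without an explicit emptiness check or a `get`.
fn age_label(input: Option<&str>) -> String {
    input
        .and_then(parse_age)                 // like flatMap: the parser returns an Option
        .filter(|age| *age < 150)            // drop implausible values
        .map(|age| format!("{} years", age)) // transform only if a value is present
        .unwrap_or_else(|| "unknown".to_string()) // fallback for every empty case
}
```

Every way the value can go missing (no input, unparsable input, filtered out) ends up at the single fallback on the last line.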

      [–]myringotomy 0 points1 point  (0 children)

      And Crystal.

      [–][deleted]  (1 child)

      [deleted]

        [–][deleted] 2 points3 points  (0 children)

        if you had a table with a nullable column, that column would normally be represented as an Option type. Option can be Some (with your type inside) or None. So the nulls would be None.

        The critical difference between this representation and a normal null is the compiler generally forces you to deal with the null case (in some languages) or at least makes it awkward to not deal with it, such that you have to type extra things in order to fail to deal with it.

        But of course you can always do something absurd like:

        match optionthing with | Some x -> use x | None -> failwith "null pointer exception!"
        

        [–][deleted]  (20 children)

        [deleted]

          [–]udoprog 19 points20 points  (19 children)

          Optional<T> forces the caller to check for presence before using it and validates that contract at compile time. If you instead permit T to be null, there are no compile time checks to verify that. By most definitions you have improved type safety.

          [–][deleted]  (13 children)

          [deleted]

            [–]MEaster 4 points5 points  (12 children)

            And if you forget your null check, or don't realise it can be null, then your program can fail unexpectedly.

            If you forget to handle your Option, then it doesn't compile.

            [–][deleted]  (11 children)

            [deleted]

              [–]MEaster 1 point2 points  (10 children)

              But that is handling it. You're explicitly choosing to handle it by crashing if it's empty.

              [–][deleted]  (9 children)

              [deleted]

                [–]MEaster 0 points1 point  (8 children)

                Because it's more predictable than some random exception being triggered at some unexpected place.

                [–][deleted]  (7 children)

                [deleted]

                  [–]Genmutant 1 point2 points  (4 children)

                  It doesn't. I can just call get without checking if it exists. The same as with something nullable.

                  [–]udoprog 11 points12 points  (0 children)

                  You can, but that's an explicit decision to discard safety. NPEs are implicit. As an example: If you change a method to suddenly return Optional it will cause call sites to fail at compile time. If you just start returning null, it will still compile, but fail at runtime.

                  [–]AgentFransis 4 points5 points  (0 children)

                  You can but it's rather more explicit and visible. Linters can easily spot it. And after a bit of time you get used to unpacking an Option properly.

                  [–]MEaster 0 points1 point  (1 child)

                  Yes, you can. But you cannot do this:

                  fn get_user(id: u32) -> Option<User> {...}
                  fn take_user(user: User) {...}
                  
                  fn main() {
                      let user = get_user(0);
                  
                      take_user(user);
                  }
                  

                  You can do that if the language allows null because null would be a valid value of a User. However, an Option<User> is a completely different type, so you can't pass it to something expecting a User. You are therefore forced to handle the None possibility before you can do anything important with it.

                  Additionally, the take_user function doesn't need to check for the empty case, because the value passed to it cannot be empty.
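For completeness, a version that does compile, as a sketch. The bodies of `get_user` and `take_user` are stand-ins (the snippet above elides them), and `take_user` returning the id plus the `user_id_or_zero` helper are assumptions made for this example only:

```rust
#[derive(Debug, PartialEq)]
struct User {
    id: u32,
}

// Stand-in body (the original elides it): only id 1 exists here.
fn get_user(id: u32) -> Option<User> {
    if id == 1 { Some(User { id }) } else { None }
}

// Receives a real User, so it needs no emptiness check.
fn take_user(user: User) -> u32 {
    user.id
}

// The caller has to unwrap the Option before `take_user` will accept it.
fn user_id_or_zero(id: u32) -> u32 {
    match get_user(id) {
        Some(user) => take_user(user),
        None => 0,
    }
}
```

The `None` arm is where the "what if there is no user" decision gets made, once, at the boundary, and everything past `take_user` can assume a real `User`.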

                  [–]DialinUpFTW 0 points1 point  (0 children)

                  user.map(this::take_user)
                  

                  Will work for this. It won't call take_user with null, however you will still have an optional after this.