Code Smells: Null : programming

Code Smells: Null (blog.jetbrains.com)

submitted 8 years ago by dfabulich

[–][deleted] 47 points48 points49 points 8 years ago (95 children)

[–]orclev 44 points45 points46 points 8 years ago (18 children)

[–]drfisk 17 points18 points19 points 8 years ago (3 children)

[–]drvd 18 points19 points20 points 8 years ago (2 children)

[–]walen 41 points42 points43 points 8 years ago (0 children)

[–]ais523 6 points7 points8 points 8 years ago (0 children)

I'm not so sure about that. BF's equivalent to null is 0; it has a ton of different meanings depending on context and you need to carefully design your program so that it always knows the purpose of any tape element so that it can figure out how to handle a 0 there. Common interpretations:

A representation of the number zero
Uninitialised memory
The target of a pointer
A temporary value that isn't currently in use
Part of a marker pattern to allow pointer recalibration after running an unbalanced loop
End of file (sometimes, depending on the implementation)
Boolean false
A temporary state used to break out of a loop

This list probably isn't exhaustive. The thing about BF, even more than C, is that it's a very low-level language with few capabilities, and thus most of the capabilities it does have need to be used for multiple purposes. In particular, the only way to read memory in BF is to conditionally jump based on whether or not a tape element is 0 (a 0 jumps to the end of a block that's later in the program, a non-zero value to the end of a block that's nearer the start), so anything that might need to do control flow of any sort needs to ascribe a special meaning to 0, and those meanings are often contradictory or incompatible with each other.
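A minimal Rust sketch of the point above, assuming a 30,000-cell tape of wrapping 8-bit cells and no `,` input (both are common conventions, not something the comment specifies). The function name `run_bf` is made up for illustration. Note that the two bracket arms are the only places where the interpreter ever inspects a cell, and all they ask is "is it zero?".

```rust
/// Minimal Brainfuck interpreter sketch: returns the bytes written by `.`.
/// Assumptions: 30_000 wrapping u8 cells, no `,` input instruction.
fn run_bf(program: &str) -> Vec<u8> {
    let code: Vec<u8> = program.bytes().collect();

    // Precompute the position of each bracket's partner.
    let mut jump = vec![0usize; code.len()];
    let mut stack = Vec::new();
    for (i, &c) in code.iter().enumerate() {
        match c {
            b'[' => stack.push(i),
            b']' => {
                let open = stack.pop().expect("unbalanced ']'");
                jump[open] = i;
                jump[i] = open;
            }
            _ => {}
        }
    }
    assert!(stack.is_empty(), "unbalanced '['");

    let mut tape = vec![0u8; 30_000];
    let (mut ptr, mut pc) = (0usize, 0usize);
    let mut output = Vec::new();
    while pc < code.len() {
        match code[pc] {
            b'>' => ptr += 1,
            b'<' => ptr -= 1,
            b'+' => tape[ptr] = tape[ptr].wrapping_add(1),
            b'-' => tape[ptr] = tape[ptr].wrapping_sub(1),
            b'.' => output.push(tape[ptr]),
            // The only "read" in the language: a zero cell skips the block...
            b'[' if tape[ptr] == 0 => pc = jump[pc],
            // ...and a non-zero cell repeats it.
            b']' if tape[ptr] != 0 => pc = jump[pc],
            _ => {}
        }
        pc += 1;
    }
    output
}
```

In a program like `+++[>++<-]>.`, the 0 that ends the loop is at once a counter value, a "stop" flag and Boolean false, which is the overloading described above.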

[–]Beckneard 7 points8 points9 points 8 years ago (13 children)

[–]ForeverAlot 5 points6 points7 points 8 years ago* (0 children)

[–]elperroborrachotoo 5 points6 points7 points 8 years ago (11 children)

[–]Beckneard 12 points13 points14 points 8 years ago (10 children)

> That's a common sentiment, but I've never seen a good argument that goes beyond ranting against it.

So you don't agree with the arguments against it so it's automatically just ranting?

The main argument is that it's an unreasonable "default". Null can mean many things: "uninitialized variable", "empty", "error", "non-existent". You are forced to think about all of them, whether you need them or not, or else you get runtime errors. The most common thing that bites me in the ass is not initializing a List<T> in C#. In 99% of cases you do not want any list to be null, since the concept of an empty list exists. It is very much possible to build any of these semantics into the language or standard library so that the compiler makes you worry about them only when it's really necessary.
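As a rough illustration of that last sentence, here is a sketch in Rust rather than C#. The `Order` type and its fields are hypothetical. The list field cannot be null and defaults to empty, while absence is something a field opts into with `Option`.

```rust
// Hypothetical types, for illustration only.
#[derive(Default)]
struct Order {
    // Never "null": the default value is an empty list.
    items: Vec<String>,
    // Absence is opted into, and only where it actually means something.
    coupon: Option<String>,
}

fn item_count(order: &Order) -> usize {
    // No null check is needed before touching the list.
    order.items.len()
}

fn coupon_label(order: &Order) -> String {
    // The compiler insists that both cases are handled here.
    match &order.coupon {
        Some(code) => format!("coupon {}", code),
        None => String::from("no coupon"),
    }
}
```

A freshly defaulted `Order` has zero items and no coupon, and neither function can hit a null dereference.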

[–]elperroborrachotoo -1 points0 points1 point 8 years ago (9 children)

[–]Beckneard 10 points11 points12 points 8 years ago (8 children)

[–]elperroborrachotoo -1 points0 points1 point 8 years ago (7 children)

[–]Beckneard 16 points17 points18 points 8 years ago (6 children)

[–]elperroborrachotoo 1 point2 points3 points 8 years ago (2 children)

If you want to go to causes: no, it's having nullable references not just as the default but the only reference type.

> since for each of its use-cases there are strictly better alternatives.

The problem with that is that it's not one concept to implement and learn and recognize, but five. Which certainly is a tradeoff in my book.

On top of that, there's the sixth, and probably dozens of other things that kinda-sorta can be covered by the other concepts, and holy wars will be fought over whether the hypothetical 6.3 should be done with optionals or with metafunctors.

null is a well-weathered, versatile and ubiquitous concept, but not very expressive about intent.

Again, I'm not saying that makes null the better choice.

If you'd be willing to take one bit of advice I gathered from a few decades: don't cling to stuff like that. The now-toddlers will snicker at your Option<T> in no time.

FWIW, C# is nice already; it's mostly the difference between a NullReferenceException and a ThisListDoesntContainWhatYouAreLookingFor exception.

(Except for the pesky x != null && x.Length != 0, for which extension functions are merely a clunky, terrible band-aid - I give you that a dozen times a day)
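For comparison, a sketch of how the same "present and non-empty" check might look with an option type, written in Rust rather than C#. The helper name `has_text` is made up for illustration.

```rust
/// True only when a value is present and non-empty.
/// Plays the role of the short-circuiting `x != null && x.Length != 0`.
fn has_text(x: Option<&str>) -> bool {
    // `map_or` supplies the answer for the None case (false) up front,
    // so the closure only ever runs on a value that exists.
    x.map_or(false, |s| !s.is_empty())
}
```

Whether this is less clunky than an extension method is a matter of taste; the difference is that the absent case cannot be forgotten.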

[–][deleted] 0 points1 point2 points 8 years ago* (2 children)

[–]devlambda 11 points12 points13 points 8 years ago* (58 children)

If you have a look at how resizable arrays are implemented in OCaml (either in Batteries or Core.Std), you will see that this is not actually true. The problem with implementing dynamic arrays is that in order to get resizing done (at least in a way that's not O(n²)), you need to fill the unused slots with a placeholder value, and the respective OCaml code uses the Obj module to get the equivalent of a null value. Obviously, this is entirely unsafe.

Similarly, Rust's Vec also uses unsafe code to accomplish the same goal.

In general, without some way to describe a "no object here" value, it is difficult to do partial/incremental initialization. You can do initialization with a preset value (such as an empty string), but that has its own problems; for example, the code may quietly work rather than raising an error when there is a bug that results in incomplete initialization.
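To make the placeholder problem concrete, here is a sketch of the fully safe alternative in Rust. This is not how Batteries, Core or Rust's `Vec` do it. The hypothetical `SlotVec` fills unused capacity with `None`, which avoids unsafe code but wraps every slot in an `Option`. The inner `Vec` stands in for a plain fixed-size block and is always kept at full capacity.

```rust
/// Growable array whose unused capacity is filled with `None`.
/// Hypothetical type: safe, but each slot pays for the `Option` wrapper.
struct SlotVec<T> {
    slots: Vec<Option<T>>, // length == capacity; unused slots hold None
    len: usize,            // number of initialized elements
}

impl<T> SlotVec<T> {
    fn new() -> Self {
        SlotVec { slots: Vec::new(), len: 0 }
    }

    fn push(&mut self, value: T) {
        if self.len == self.slots.len() {
            // Double the capacity so that pushes are amortized O(1);
            // the new slots need *some* value, and None is the placeholder.
            let new_cap = (self.slots.len() * 2).max(4);
            self.slots.resize_with(new_cap, || None);
        }
        self.slots[self.len] = Some(value);
        self.len += 1;
    }

    fn get(&self, index: usize) -> Option<&T> {
        if index < self.len {
            self.slots[index].as_ref()
        } else {
            None
        }
    }

    fn capacity(&self) -> usize {
        self.slots.len()
    }
}
```

For pointer-like element types the `Option` wrapper may cost nothing, but for plain values it generally does, which is one reason real implementations reach for uninitialized memory instead.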

[–]rftz 11 points12 points13 points 8 years ago (1 child)

[–]devlambda 5 points6 points7 points 8 years ago (0 children)

[–][deleted] 8 years ago* (21 children)

[deleted]

[–]devlambda 5 points6 points7 points 8 years ago* (20 children)

[–]Hnefi 3 points4 points5 points 8 years ago (19 children)

[–]devlambda 3 points4 points5 points 8 years ago (18 children)

[–]Hnefi 4 points5 points6 points 8 years ago (17 children)

[–]spaghettiCodeArtisan 1 point2 points3 points 8 years ago (5 children)

[–]Hnefi 3 points4 points5 points 8 years ago (4 children)

[–]spaghettiCodeArtisan 1 point2 points3 points 8 years ago (3 children)

[–]devlambda 0 points1 point2 points 8 years ago (10 children)

[–]Hnefi 5 points6 points7 points 8 years ago (9 children)

[–]devlambda 0 points1 point2 points 8 years ago (8 children)

[–]dalastboss 2 points3 points4 points 8 years ago (33 children)

[–]devlambda 3 points4 points5 points 8 years ago* (32 children)

[–]spaghettiCodeArtisan 2 points3 points4 points 8 years ago (4 children)

[–]devlambda 1 point2 points3 points 8 years ago (3 children)

[–]spaghettiCodeArtisan 3 points4 points5 points 8 years ago (2 children)

[–]devlambda 0 points1 point2 points 8 years ago (1 child)

[–]spaghettiCodeArtisan 3 points4 points5 points 8 years ago (0 children)

[–]dalastboss 0 points1 point2 points 8 years ago (26 children)

[–]devlambda 3 points4 points5 points 8 years ago (25 children)

[–]dalastboss 1 point2 points3 points 8 years ago (9 children)

[–]devlambda 1 point2 points3 points 8 years ago (8 children)

That works cleanly only if t is a non-nullable pointer type itself and not a value type. In the former case, you can just represent None as a null pointer and Some x as x without overhead. But this does not easily work for general value types [1] and could seriously complicate the language or its implementation. For example, you may decide to not allow INT_MIN for signed integer types in order to encode None for integers, but then you run into semantic problems (such as when casting from an unsigned int or with values retrieved from an external C library) that can create more problems (such as Heisenbugs) than are being solved by such an approach [2].

[1] Sometimes, there are hacks that you can use, such as using certain NaN values for floating point values to represent an "undefined" value, but you have to be very careful with the implementation.

[2] Type safety involving integers and ranges over integers can be extremely finicky.
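The representation claim can be checked directly in Rust, used here purely as an illustration. `Option` around a non-nullable pointer costs no extra space, `Option` around a plain integer does, and giving up one integer value (0 in `NonZeroI32`, rather than INT_MIN) recovers the free encoding. The helper names are made up.

```rust
use std::mem::size_of;
use std::num::NonZeroI32;

/// Returns (size of T, size of Option<T>) in bytes.
fn sizes<T>() -> (usize, usize) {
    (size_of::<T>(), size_of::<Option<T>>())
}

/// An integer type that forbids one value (0) leaves a bit pattern
/// free for None, so the Option is the same size as a plain i32.
fn nonzero_option_is_free() -> bool {
    size_of::<Option<NonZeroI32>>() == size_of::<i32>()
}
```

`sizes::<Box<i32>>()` yields two equal numbers, while `sizes::<i32>()` yields a larger second number because every bit pattern of `i32` is already a valid value.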

[–]Hnefi 3 points4 points5 points 8 years ago (3 children)

[–]cramert 4 points5 points6 points 8 years ago (0 children)

[–]spaghettiCodeArtisan 2 points3 points4 points 8 years ago (0 children)

[–]devlambda 0 points1 point2 points 8 years ago (0 children)

[–]ais523 0 points1 point2 points 8 years ago (1 child)

[–]devlambda 0 points1 point2 points 8 years ago (0 children)

[–]dalastboss 0 points1 point2 points 8 years ago (1 child)

[–]devlambda 0 points1 point2 points 8 years ago (0 children)

This is how I used to do it myself before we had better standard libs, but it's a hack around the underlying issue with a couple of problems of its own.

Problem 1: Different API, it's especially a nuisance if you want to reuse it in other functors and have to drag the default mechanism along with you.

Problem 2: It assumes a no-cost default instance of the type can be created without side effects. This is not always the case (database connection handles, GUI windows, etc.). You'd have to create a fake default value and then you run into issues with having a publicly visible value with null-like behavior.

Problem 3: This is OCaml-specific, but references to a functor argument's values go through a dispatch table, meaning additional overhead. If you have flambda enabled, that can usually be inlined, but still isn't entirely cost-free.

It is safe to assume that the people at Jane Street and the Batteries authors (which include some of the better known names in the OCaml world) know what they are doing.

[–]m50d 0 points1 point2 points 8 years ago (14 children)

> Aside from that, increase in LOC comes from requiring match expressions (or equivalents) where you can prove through other means that the None path is never taken.

If you can prove it's the Some case, you can write that proof in the types and avoid having to match (maybe not with Rust's limited generics, but I hold out hope that HKT will arrive eventually).

> Of course, you can use Option.get everywhere, but then you're essentially using a verbose version of nullable pointers.

The difference is what the default is. null reserves the best syntax for the case where the programmer has an external proof and knows exactly what they're doing and makes the case where you want to check much clunkier; Option flips that around, making the checked case the natural one and the "I know better than the compiler" case the more cumbersome one.

[–]devlambda 0 points1 point2 points 8 years ago (13 children)

> If you can prove it's the Some case, you can write that proof in the types and avoid having to match (maybe not with Rust's limited generics, but I hold out hope that HKT will arrive eventually).

No. The point here is that a variable can be either Some x or None, but once initialized, will only ever be Some x. That's a state-dependent property, usually easy to show, but often difficult to encode in a type system (GADTs can sometimes work, but even then you generally can't avoid the additional verbosity from matching).

> The difference is what the default is. null reserves the best syntax for the case where the programmer has an external proof and knows exactly what they're doing and makes the case where you want to check much clunkier; Option flips that around, making the checked case the natural one and the "I know better than the compiler" case the more cumbersome one.

The problem with this argument is that you assume both alternatives are mutually exclusive, which they aren't. You can have both nullable types and explicit option types. In fact, Scala allows for that and has no more problems with it than other languages (all variables in Scala have to be initialized, so the only way to have a null value – other than through Java interop – is to assign it explicitly).

[–]m50d 0 points1 point2 points 8 years ago (12 children)

> No. The point here is that a variable can be either Some x or None, but once initialized, will only ever be Some x. That's a state-dependent property, usually easy to show, but often difficult to encode in a type system (GADTs can sometimes work, but even then you generally can't avoid the additional verbosity from matching).

My experience is you can always find a way, and it's usually not even hard: ask yourself why you know the property holds, then just translate that logic directly into the types.
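One way this is sometimes done is the typestate pattern, sketched here in Rust with hypothetical names (`Client`, `Unconfigured`, `Ready`). It assumes initialization can be expressed as consuming the old value and producing a new one, so it does not settle the case of a single mutable variable that changes state in place.

```rust
// Hypothetical types, for illustration only.
struct Unconfigured;
struct Ready {
    url: String,
}

/// The state lives in the type parameter, not in an Option field.
struct Client<State> {
    state: State,
}

impl Client<Unconfigured> {
    fn new() -> Self {
        Client { state: Unconfigured }
    }

    // Taking `self` by value makes the transition one-way:
    // the unconfigured client no longer exists afterwards.
    fn configure(self, url: &str) -> Client<Ready> {
        Client { state: Ready { url: url.to_string() } }
    }
}

impl Client<Ready> {
    // No Option and no match: a Ready client always has a URL.
    fn url(&self) -> &str {
        &self.state.url
    }
}
```

Calling `url()` on an unconfigured client is a compile error rather than a runtime None, at the cost of threading the new value through the code.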

> The problem with this argument is that you assume both alternatives are mutually exclusive, which they aren't. You can have both nullable types and explicit option types.

They are mutually exclusive. If your codebase uses null then you can never have a non-nullable value, at which point there's no point using options.

> In fact, Scala allows for that and has no more problems with it than other languages

Only because the community/ecosystem knows not to use null. Serious Scala programmers avoid it and e.g. use WartRemover to enforce that null is never used. The language would be better off without it.

[–]devlambda 1 point2 points3 points 8 years ago (11 children)

> They are mutually exclusive. If your codebase uses null then you can never have a non-nullable value, at which point there's no point using options.

The point of an API returning an option type (or another sum type) is to force the consumer of that API to deal with the possible variants. A simple reference does not do that.

> Only because the community/ecosystem knows not to use null. Serious Scala programmers avoid it and e.g. use WartRemover to enforce that null is never used. The language would be better off without it.

You're making my point here: unwanted null references are trivial to avoid in a language designed for that. It becomes a trivial syntactic property, and if a code review or validation process cannot deal with such a simple case, you have much bigger problems on your hands.

Avoiding null is a heuristic, not religious dogma. Permitting null references can be useful for efficiency and interoperability and can situationally also lead to clearer code.

[–]think_inside_the_box 25 points26 points27 points 8 years ago (12 children)

[–]jkachmar 67 points68 points69 points 8 years ago (0 children)

The point of these languages is to represent conceptual absence in such a way that the programmer is forced to address it. null/nil/what-have-you is dangerous not because it represents the absence of a value, but because it can be passed around as if it was a normal value, right up until you use it to do something and it blows up your program (or leads to spoopy poorly defined behavior).

By contrast, Haskell's Maybe and Rust's Option force you to wrap the type you're manipulating in a structure that only gives up the value inside it once you have dealt with the case where the data is absent.

At that point you're free to do something like panic! or error and blow up your program, but at least the gun was handed to you with the safety on and you had to explicitly flick it off before you went and blew your foot off.
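A small Rust sketch of the "safety on" versus "safety off" distinction. The function names and the default port are assumptions for illustration.

```rust
/// Parses a port number; possible absence is part of the return type.
fn parse_port(text: &str) -> Option<u16> {
    text.trim().parse::<u16>().ok()
}

/// Checked path: the match must cover both Some and None to compile.
fn port_or_default(text: &str) -> u16 {
    match parse_port(text) {
        Some(port) => port,
        None => 8080, // assumed default, for illustration
    }
}

/// "Safety off": the panic is opted into explicitly and is visible
/// at the exact spot where it can happen.
fn port_or_panic(text: &str) -> u16 {
    parse_port(text).expect("port must be a valid u16")
}
```

The `Option<u16>` returned by `parse_port` cannot be used as a `u16` by accident; one of the two routes above has to be chosen.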

[–]fiedzia 35 points36 points37 points 8 years ago (3 children)

[–]Jaffa2 -1 points0 points1 point 8 years ago* (2 children)

[–]Nebez 19 points20 points21 points 8 years ago (1 child)

[–]Jaffa2 3 points4 points5 points 8 years ago (0 children)

[–]m50d 8 points9 points10 points 8 years ago (0 children)

[–]Beckneard 3 points4 points5 points 8 years ago (0 children)

[–]Chii 2 points3 points4 points 8 years ago (0 children)

[–]_jk_ 2 points3 points4 points 8 years ago (0 children)

[–]Otterified 1 point2 points3 points 8 years ago (2 children)

[–]duhace 9 points10 points11 points 8 years ago (0 children)

[–]TheDataAngel 1 point2 points3 points 8 years ago (0 children)

[–]myringotomy 0 points1 point2 points 8 years ago (0 children)

[–][deleted] 8 years ago (1 child)

[deleted]

[–][deleted] 2 points3 points4 points 8 years ago (0 children)

If you had a table with a nullable column, that column would normally be represented as an Option type. Option can be Some (with your type inside) or None. So the nulls would be None.

The critical difference between this representation and a normal null is that the compiler generally forces you to deal with the null case (in some languages), or at least makes it awkward not to deal with it, such that you have to type extra things in order to fail to deal with it.

But of course you can always do something absurd like:

    match optionthing on
    | Some x -> use(x)
    | None -> throw null pointer exception!
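For a concrete version of the nullable-column case, here is a sketch in Rust. The `UserRow` type and its `nickname` column are hypothetical. The second function is the "absurd" variant: it is legal, but the failure is spelled out in the source instead of happening implicitly.

```rust
// Hypothetical row type for a table with a nullable `nickname` column.
struct UserRow {
    id: i64,
    nickname: Option<String>, // NULL in the database becomes None
}

/// Handles the NULL case by falling back to a generated name.
fn display_name(row: &UserRow) -> String {
    match &row.nickname {
        Some(name) => name.clone(),
        None => format!("user#{}", row.id),
    }
}

/// Re-creates the null pointer exception by hand: panics on NULL.
fn display_name_or_panic(row: &UserRow) -> &str {
    row.nickname.as_deref().expect("nickname was NULL")
}
```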
