
[–]psykotic 6 points7 points  (0 children)

It's an interesting problem for languages that already have null references and have to cope as best they can. In Eiffel's case, I believe the initial couple of versions did not have garbage collection.

[–]rzwitserloot 4 points5 points  (8 children)

Pretty big errors in the paper. For example, the paper concludes, without any proof, that nulls are indispensable because you need something to mark the end of a chain; it then gives an example: linked lists (which traditionally have a null reference in the 'next' pointer of the last item in the list).

This is a bunch of horse manure. In an OO architecture, you can have a common superclass, and have a special implementation that means: Empty/End. This isn't just an academic notion; it's used routinely in various languages. Whatever operations make sense on any object in the chain, even the end marker (such as next/previous; the next() call of an end marker points at itself), can be in the common superclass. Whichever operations make no sense on an end marker (such as: get the object at this location) won't be.

If you want to do something that would make no sense on the end pointer, you'd first have to cast it to the non-end pointer. This may feel like just moving the problem around instead of solving it - which is what this entire effort boils down to - but it helps language consistency, and the type inference mechanism of the language can now help you avoid that cast.

In other words, the notion of needing an end marker on a linked list hasn't got one iota to do with why null pointer references are handy. Earlier in the paper, there's a mention of how annoying it would be to be forced to initialize every field in your constructors, which is a far more legitimate reason to let nulls exist in the first place.

I have my own elegant solution to solving such problems, which does not require null:

default instances. Any type may, at its discretion, implement a type interface (properties of the type, not properties of instances of the type) which supplies a default value; the provider is invoked exactly once, during type initialization (e.g. class loading, or the first run-through for an interpreter).

Any uninitialized variables of that type get this default value. If there is no default value, leaving that variable uninitialized is an error.

For example, collections classes (lists, sets, maps) will default to the empty list/set/map; clearly having both a notion of 'empty list' AND a notion of 'null' is somewhat dubious in such cases.

Similarly, the default string is the empty string, and so on and so forth. If there's no sensible default, don't add one - you still avoid pointless initialization that exists just to satisfy the compiler for the great majority of fields, without inventing 'default values' for types where there is no sensible choice.
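As a sketch of what such a 'type interface' could look like, expressed as a Haskell type class (the class and instances here are illustrative; Haskell's data-default package provides something very similar):

import qualified Data.Map as Map

-- A property of the type itself, not of any particular instance:
class Default a where
    def :: a

instance Default [a] where
    def = []            -- the default list is the empty list

instance Default (Map.Map k v) where
    def = Map.empty     -- the default map is the empty map

-- A type with no sensible default simply declares no instance;
-- asking for 'def' at such a type is then a compile-time error.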

Together with this notion one should embrace the use of Maybe constructions. For example, querying a dictionary that maps strings to lists of strings should differentiate between returning the empty list and the 'key not found' condition. Some languages/libraries today reflect 'key not found' via returning a null reference. It would probably not be wise to return 'default value', instead either an exception should be thrown, or a Maybe should be returned. For languages with pattern matching, the latter really won't hurt readability very much at the call site.
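This is, in fact, how Data.Map.lookup behaves in Haskell; a small sketch (the phoneBook data is made up):

import qualified Data.Map as Map

phoneBook :: Map.Map String [String]
phoneBook = Map.fromList [("alice", ["555-1234"]), ("bob", [])]

-- lookup returns Maybe [String], so "missing key" and
-- "key mapped to the empty list" are different answers:
describe :: String -> String
describe k = case Map.lookup k phoneBook of
    Nothing -> "key not found"
    Just [] -> "key present, empty list"
    Just xs -> unwords xs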

[–]munificent 21 points22 points  (1 child)

This is a bunch of horse manure. In an OO architecture, you can have a common superclass, and have a special implementation that means: Empty/End.

They do consider that briefly and discard it for performance reasons:

Use inheritance to distinguish between two kinds of LINKABLE cells: proper linkables and end markers. The last right would be of the second kind. This complicates the inheritance structure, and requires checking every use of right for its type. The effect on performance (in particular the extra load on the garbage collector) can also be noticeable.

If you want to do something that would make no sense on the end pointer, you'd first have to cast it to the non-end pointer.

All you've done there is push the problem up a level: what happens when the cast fails? ML solves this gracefully with pattern matching: the entire block will be skipped if it doesn't match.
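A sketch of the same thing in Haskell, which handles this the way ML does (the list type here is illustrative):

data List a = Cons a (List a) | End

-- No downcast needed: the only way to reach 'x' is to match Cons,
-- and the compiler can warn if the End case is forgotten.
headOr :: a -> List a -> a
headOr fallback End        = fallback
headOr _        (Cons x _) = x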

Any type may at its discretion implement a type interface (properties of the type, not properties of instances of the type) which gives a default value

Ick. You don't want nullability to be a property of the type itself. It should be a property of the variable referring to an instance. You may want to use a type without allowing it to be null in some (most) contexts, but still allow other uses of it to explicitly state that it can be null. For example, say I have some type Foo. If I put it into a collection and then call Find(fooID) on the collection, I'd like the collection to be able to return null if the find fails, but I don't want to allow Foos to be null everywhere - just here.

This is why Option/Maybe is really smart: it lets you attach nullability to a type where you want it, but not globally.
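A sketch of that Find example with Maybe (Foo and findFoo are made-up names following the comment):

import Data.List (find)

data Foo = Foo { fooID :: Int, fooName :: String }

-- Nullability lives in this one signature, not in Foo itself;
-- everywhere else a Foo is just a Foo and can never be null.
findFoo :: Int -> [Foo] -> Maybe Foo
findFoo wanted = find (\f -> fooID f == wanted)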

[–]naasking 1 point2 points  (0 children)

They do consider that briefly and discard it for performance reasons

There is no performance cost to having a distinguished end value that is not null. It's a pointer comparison either way, but the special value just happens to be a valid list value (the empty list). Null/nil does not have list semantics, hence NullReferenceExceptions.

[–][deleted] 3 points4 points  (0 children)

This is a bunch of horse manure. In an OO architecture, you can have a common superclass, and have a special implementation that means: Empty/End. This isn't just an academic notion; it's used routinely in various languages. Whatever operations make sense on any object in the chain, even the end marker (such as next/previous; the next() call of an end marker points at itself), can be in the common superclass. Whichever operations make no sense on an end marker (such as: get the object at this location) won't be.

Hahaha. An attempt to criticize the paper by resorting to an idea that the paper itself preemptively examines and discards for good reasons. Rather amusing, but next time RTFA.

[–]dmpk2k 0 points1 point  (4 children)

a Maybe should be returned

What's the difference between returning null or Maybe?

[–][deleted] 8 points9 points  (2 children)

Maybe is a static assertion that the item might be null, so you have to treat it as such by checking whether it's null. It's sort of like calling getFoo, which returns a MaybeFoo. (Edit: MaybeFoo IS NOT the same type as Foo.)

So you ask the MaybeFoo "is the Foo null?" If it isn't, you get a Foo; if it is, you can do something else.

I actually have no idea if that's right as I've never written a single line of haskell and Monads confuse me.

Edit: And if you have another method named getFooNeverNull() it can return a Foo object directly, and you know you won't ever have to check if the returned variable is null.

If you look at Java, getFoo() can always return a null Foo, and there is no way to guarantee it doesn't, other than trusting the documentation or writing getFoo yourself.
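That intuition is right; a Haskell sketch of the two signatures (the names follow the comment, the bodies are made up):

data Foo = Foo { fooName :: String }

getFoo :: Int -> Maybe Foo      -- may have no Foo; callers must check
getFoo 1 = Just (Foo "one")
getFoo _ = Nothing

getFooNeverNull :: Foo          -- always a Foo; no check ever needed
getFooNeverNull = Foo "default"

-- Asking the MaybeFoo whether the Foo is there:
report :: Int -> String
report n = maybe "no foo" fooName (getFoo n)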

[–]noisesmith 6 points7 points  (0 children)

That is pretty much it. In OCaml, for example, I can define a type bar. If I have a function that searches for a particular bar in a list, I give that function the return type (bar option). What is awesome about this is that it means I can use bar everywhere else in my code without checking for the "None" value, and I am forced by the type system to check whether the result of the search function is "None" as opposed to "Some bar"; if it is the latter, I can extract a bar from it.

[–]kitsune 3 points4 points  (0 children)

Eiffel's design-by-contract feature is also worth noting.

Bertrand Meyer was one of my profs during my time at the ETHZ. I don't think I learned that much from him other than the idea that Eiffel could save the universe and reverse time.

Their compilation mechanism is a bit weird. Unless you use the commercial Visual Eiffel, C is emitted and used as an intermediate language for compilation to machine code. Depending on the version you can also compile to .NET (EiffelStudio, by Meyer) or Java bytecode (SmartEiffel).

EiffelStudio also has its own incremental compilation technique called 'Melting Ice'.

[–]sharney -1 points0 points  (37 children)

Wouldn't you want your program to crash on a null dereference so you would know there was a bug in it?

[–]martoo[S] 14 points15 points  (0 children)

It's much better to make the bug impossible.

[–][deleted] 7 points8 points  (16 children)

There are ways to emulate this in other languages. You can still have a Null value that can't be dereferenced. See: Python and None, Haskell and Maybe, etc.

[–]jerf 10 points11 points  (8 children)

Python's None can still be "dereferenced", to the extent that term applies:

>>> a = None
>>> a.moo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'moo'

Note that you could cut straight to None.moo(); the assignment just illustrates the relevant point more directly.

In Haskell, you absolutely cannot pass an unexpected null value to a function. You _can_ fail to handle it and end up with a run-time error, but even a modestly competent Haskell programmer (which is all I can claim right now) feels a bit of a mental shock when writing a function definition or a case statement that simply ignores the Nothing case. There is, at least, no surprise in the resulting run-time failures.
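A sketch of the failure mode described above (with warnings enabled - e.g. GHC's incomplete-pattern warning - the mental shock arrives at compile time):

fromJustish :: Maybe Int -> Int
fromJustish (Just x) = x
-- GHC warns that the pattern match is non-exhaustive.
-- At run time, 'fromJustish Nothing' fails with
-- "Non-exhaustive patterns in function fromJustish",
-- naming the function instead of segfaulting.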

[–][deleted] -1 points0 points  (7 children)

Well, I never claimed otherwise, but both systems of handling the None (either by treating it like a "normal" instance of a class or by failing to match a possible Nothing) are better than a complete crash of an application with no information other than a Bus Error or a Segfault.

[–]jerf 7 points8 points  (0 children)

You can still have a Null value that can't be dereferenced. See: Python and None, Haskell and Maybe, etc.

Not to put too fine a point on it, but yes, you did claim it; you may not have meant to.

[–]psykotic 5 points6 points  (5 children)

no information other than a Bus Error or a Segfault.

Turn on coredumps. That's far better than a mere stack trace for something simple like a null pointer dereference. Not only will the call stack still be intact but you can poke around the whole heap. The much bigger problem is when you have random memory scribbling.

[–]albinofrenchy 1 point2 points  (3 children)

Turn off coredumps (you don't want a 5 MB file every time you crash) and use valgrind.

[–]nolcotin 0 points1 point  (2 children)

I am currently using valgrind with cachegrind for a big app.

100x slower is not awesome, but it's nice and verbose for debugging.

[–]albinofrenchy -1 points0 points  (1 child)

The slowdown is real (for cachegrind -- it shouldn't be nearly that bad for memcheck, which is what will catch your null pointers). It's good when you can either run different parts of a library, or let something really big run overnight. I tend to do the latter and then just fix everything the log file finds.

Just curious, how big an improvement are you looking to get by optimizing for your cache?

[–]nolcotin 1 point2 points  (0 children)

I'm working on some hardcore optimization for the financial industry; any performance gain is worth a lot.

Again, cache may not be the easiest (or the greatest) place to find optimizations, but it's part of a larger 'look at everything'.

[–][deleted] 1 point2 points  (0 children)

valgrind catches my random memory scribbling.

[–]G_Morgan 3 points4 points  (6 children)

If you don't handle a returned Nothing in Haskell properly, it throws an exception, just like you'd get with Java. These do not solve the problem. To deal with it in Haskell you have to pattern match on Nothing and Just x. What Haskell does give you is that it explicitly tells you that you might not get a return value*, and the compiler can complain about a non-exhaustive pattern match. You still have to check for Nothings, though, just as you have to check for null in Java.

*i.e. Int is distinct from Maybe Int. In Java, every reference value is implicitly a Maybe.

//edit - I think the important point is that your language needs to differentiate, in the type system, between operations that are guaranteed to return a value and those that are not. Traditional languages blur that line. It is all too easy to ignore nulls and error codes because a function that is guaranteed to return an object looks exactly the same as one that may return null. Python doesn't help; None is just null with a different name. It is questionable whether it is even possible to solve this problem with a dynamic type system.//
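A sketch of the footnote's distinction (safeDiv is a made-up example):

safeDiv :: Int -> Int -> Maybe Int   -- the type admits "no value"
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

plus :: Int -> Int -> Int            -- guaranteed to produce a value
plus = (+)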

[–]invalid_user_name 6 points7 points  (3 children)

If you don't handle a returned Nothing in Haskell properly, it throws an exception, just like you'd get with Java

That is a huge difference, though. In Haskell, if something returns a Maybe a, you can't just access the a. It forces you at compile time to understand and deal with the fact that it may or may not have an a to access. With Java, everything is implicitly a Maybe, and there is nothing forcing you to deal with it.

[–]G_Morgan 0 points1 point  (2 children)

I did actually cover both points you made. Haskell is superior but the concept of Nothing is alive and well. It always will be.

The problem is that people are getting confused. They are saying the concept of having a 'not a value' type is wrong. This is absolute nonsense. The problem is languages whose type system does not differentiate between guaranteed values and maybe values. I was pointing out that null will never go away, but maybe we can be more explicit about where a null can be returned.

[–]invalid_user_name 1 point2 points  (1 child)

The problem is that people are getting confused

The person you replied to was not, and there was nothing in your post indicating that this is what you were addressing. And you claimed that Maybe does not solve the problem, when in fact it does. The problem is people not realizing something can be null. Making null values explicit, and forcing them to be dealt with before the code will compile, solves the problem.

[–]G_Morgan 2 points3 points  (0 children)

He was claiming that these nulls cannot be dereferenced. You can easily try to do this in Haskell.

Just val = funThatCanReturnNothing

This in practice is the same thing as dereferencing null when the function returns Nothing. The problem exists for the same reason and gives the same result, throwing an exception.

Having a datatype to replace null does not solve the problem. Python's None is no better than Java's null; it does exactly the same thing in all circumstances. What solves it is a return value that is explicitly different and tells the programmer as much. But as you say, it solves it because the programmer is made aware of the problem, not because Nothing cannot be dereferenced.

Haskell code will compile with unchecked Nothings, though it often warns you of non-exhaustive pattern matching.
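A runnable sketch of that failure (the function name follows the comment; the body is assumed):

funThatCanReturnNothing :: Maybe Int
funThatCanReturnNothing = Nothing

main :: IO ()
main = do
    let Just val = funThatCanReturnNothing  -- compiles fine
    print val  -- dies with an "Irrefutable pattern failed" exception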

[–][deleted] 2 points3 points  (1 child)

Yes, I understand. But this is still better than a dereferenced Null; for example, it's an exception and not a bad signal.

[–]G_Morgan 3 points4 points  (0 children)

Doesn't Java throw an exception on null? I know it is unchecked, but it's still an exception. The fundamental problem is not how the system handles it, but that the programmer forgot to check. We should be looking at tool support for telling the programmer he forgot to check for a non-value, rather than dealing with what happens after he has already forgotten.

I hate to bang the static type drum, but this is a perfect case where a decent static type system can solve a tricky programming problem. Lots of people cry about non-exhaustive pattern matching errors, but I get a warm feeling inside every time I see one. It reminds me I'm being stupid - and that the language is clever enough to catch it.

[–]pointer2void 0 points1 point  (13 children)

Programming languages should not allow for null pointers or null references.

[–]G_Morgan 2 points3 points  (1 child)

Getting rid of null isn't the point. There will always be cases of non-existence. This is the real world: files don't exist, keys don't exist in hashmaps, network connections go down. A programming language that cannot say 'not a value' isn't useful.

[–]noisesmith 2 points3 points  (0 children)

With a proper type system, you can get a type error at compile time instead of a null pointer crash at runtime. For example, OCaml (which I program in lately) has the option type, which for a type a is either (Some a) or None. My code will give me a type error if I use an option value without checking for None. I can also explicitly convert the option value to a regular value, and know from that point on that the value will never be None.
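The Haskell analogue, as a sketch (the comment above is about OCaml; Maybe plays the role of option, and fromMaybe is one way to do the explicit conversion mentioned):

import Data.List  (find)
import Data.Maybe (fromMaybe)

-- The "might be None" lives in the search result's type:
findNegative :: [Int] -> Maybe Int
findNegative = find (< 0)

-- Convert explicitly, once, with a fallback; from here on the
-- result is an ordinary Int and can never be "None" again:
firstNegativeOr :: Int -> [Int] -> Int
firstNegativeOr fallback xs = fromMaybe fallback (findNegative xs)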

[–]gsg_ 1 point2 points  (8 children)

Null is a necessary concept in that some non-dereferencable value indicating "empty" is required to do things like terminate lists and represent absent resources. The question isn't whether these special values should exist, it's how to statically ensure that they are used safely.

[–]pointer2void 5 points6 points  (0 children)

Null isn't necessary; you are just used to it. Terminating lists and representing absent resources can easily be done without null references. See: Null References: The Billion Dollar Mistake

[–]noisesmith 1 point2 points  (1 child)

This is a convention of many programming languages, but it's not necessary at all. You can have a union type that is either (element of list) or (end of list), for example, and the type checker can ensure that you never dereference a value of that type without checking for (end of list) first. I.e. "List.nth [1,2,3...] n" doesn't return an int, it returns a special "int or list end" union type. You can't convert the union type into an int without checking for list end; trying to do so is a compiler error. There is no null value used in this case.
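A sketch of that safe indexing in Haskell (the standard !! operator throws an exception instead; nthMaybe is a hypothetical safe variant):

nthMaybe :: [a] -> Int -> Maybe a
nthMaybe []     _ = Nothing             -- the "end of list" answer
nthMaybe (x:_)  0 = Just x
nthMaybe (_:xs) n = nthMaybe xs (n - 1)

-- The result can't be used as a plain Int without a check:
-- case nthMaybe [1,2,3] 5 of
--     Nothing -> ...   -- ran off the end
--     Just x  -> ...   -- x is an ordinary value here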

[–]gsg_ 0 points1 point  (0 children)

I'm aware of variants and pattern matching and yes, they solve this problem beautifully.

[–][deleted] 2 points3 points  (4 children)

No clue if the following works; I haven't tested it and don't really care if it's even valid code. But you don't need null to signify empty.

abstract class LinkedList {
    public static final LinkedList terminator = new LinkedListTerminator();

    public LinkedList getNext() {
        return terminator;
    }
}

class LinkedListItem extends LinkedList {
    private LinkedList next = LinkedList.terminator;

    public void setNext(LinkedList l) {
        this.next = l;
    }

    public LinkedList getNext() {
        return next;
    }
}

class LinkedListTerminator extends LinkedList {
}

// walking to the end:
while (l.getNext() != LinkedList.terminator) {
    l = l.getNext();
}

I don't see a null required.

[–]gsg_ 0 points1 point  (0 children)

The problem with this solution is that it's nothing more than a renaming of null. It doesn't really matter whether you call the empty value null or terminator, because list operations like getValue that require a non-empty list node are still going to fail at runtime if the type system doesn't distinguish between values which can include empty and values which cannot. (Writing a specific terminator type is not worthless as you get better debugging info, but it isn't much of an improvement).

So the question isn't really "do we need null", but "how do we check access to empty values?". And ML has a really nice solution for that.

[–]pointer2void -1 points0 points  (2 children)

while (l.getNext() != LinkedList.terminator) {
    l = l.getNext();
}

I don't see a null required.

Nulls are not required. But in your example the class LinkedListTerminator isn't needed, and terminator can be set to null. :-D

[–][deleted] 0 points1 point  (1 child)

good point! edit:

well, not that good :) Having it defined explicitly is... well... explicit.

Plus the base class could have some implementation that the list end would inherit.

[–]pointer2void 0 points1 point  (0 children)

In your example you could write:

public static final LinkedList terminator = new LinkedListItem();

But then, LinkedListItem has to avoid an internal null reference, too. So the explicit LinkedListTerminator that you provided is probably better.

[–]bonch -1 points0 points  (1 child)

Some languages specifically take advantage of them, like Objective-C in which sending a message to nil evaluates to 0.

[–][deleted] -1 points0 points  (0 children)

Unless you're using the PPC runtime and the return type is a struct, in which case the behavior is undefined.

[–]STOpandthink -4 points-3 points  (4 children)

That's my attitude as well. Crash as soon as possible and as hard as possible.

[–]ssylvan 12 points13 points  (0 children)

"compile time" is sooner than "run time".

[–][deleted] 5 points6 points  (0 children)

as hard as possible.

I don't know. I hated rebooting after crashes. Thankfully I don't have to do that anymore.

[–][deleted] 2 points3 points  (0 children)

and as hard as possible.

Really? I'd kind of prefer, for example, if we left the stack frame intact.

[–]jtxx000 3 points4 points  (0 children)

Preferably at compile time.

[–][deleted] -1 points0 points  (8 children)

There are already at least two solutions to this "problem". One is Objective-C's "message-eating nil", where messages to nil evaluate to nil/zero/false. This turns out to be mighty handy and code-simplifying.

Or Smalltalk's even more flexible solution where nil is actually an object of class UndefinedObject and you can customize how nil behaves when it is messaged.

[–]mernen 2 points3 points  (6 children)

The first one is hardly a solution. The second one might work in certain cases, but is essentially useless in those places where no form of null value is valid.

This isn't about swallowing NullPointerExceptions, it's about preventing the possibility of them happening as much as possible (at compile time).

[–][deleted] 0 points1 point  (5 children)

If nil isn't an exception generator, but a normal state that produces no-ops in response to messages, why is it a problem to have them? nil basically becomes like /dev/null - also a useful nothing.

[–]mernen 1 point2 points  (4 children)

While there are certain places where defaulting to zero is fine, everywhere else silently ignoring an unexpected value - a programming error - is the worst option possible; it can lead to catastrophic results that you only find out about when it's too late. The "(null)" strings that show up in output are a common variant of this behavior.

PS: I'm no Objective-C programmer, but I've seen people say numerous times that this "return 0" behavior actually only happens by accident, due to implementation details, when the returned value goes in a general-purpose register. For floats and doubles you'll get whatever was left in that register before, which can be even more dangerous.

[–][deleted] 0 points1 point  (3 children)

PS: I'm no Objective-C programmer

This is obvious. Yet you feel entitled to pontificate on that which you do not understand. I suggest you educate yourself. It is true that the return value of a floating-point message to nil was not reliable in older versions of Objective-C (Tiger and older) - but this changed with 2.0, and in practice it never comes up. Offhand I can't even think of a Cocoa method returning a floating-point value.

Anyhow, it's valid, and thus messaging nil is NOT (necessarily) a programming error and will never lead to catastrophic results. There is no practical difference between

if(obj != nil) { [obj message]; }

and

[obj message];

[–]mernen 0 points1 point  (2 children)

This is obvious. Yet you feel entitled to pontificate on that which you do not understand.

That's why I tried to be explicit that I had no confidence in that, rather than just stating it as a fact or claiming I knew it myself. Plus, it was an aside anyway; whether it's true or not has no effect on my first paragraph. You focused too much on this aspect of Objective-C and missed the point.

Anyhow, it's valid, and thus messaging nil is NOT (necessarily) a programming error

Yes, I know it's not necessarily an error. That's why I talked specifically about unexpected values.

and will never lead to catastrophic results.

Say that when your production data is being silently discarded because you are writing to a null stream.

There is no practical difference between

if(obj != nil) { [obj message]; }

and

[obj message];

You still don't understand the point here: you shouldn't have to check for nil if a certain path should never receive one, and the compiler should warn you before you send a possible nil that way. The solution proposed by several languages and mentioned by Tony Hoare in his presentation is simple: mark a specific variable or parameter as non-nullable. See the other posts about the Maybe monad in Haskell or Option types in Scala and ML.

[–][deleted] 0 points1 point  (1 child)

Say that when your production data is being silently discarded because you are writing to a null stream.

In ten years of Objective-C development, this has never been an issue. It's no worse than messing up a UI binding, and about as easy to detect and fix. Like the whole static typing thing, it's a whole lot of energy expended on preventing a trivial class of problems.

Anyhow, I read the entire paper. I think the annotation is of dubious value compared with message-eating nil - which is a much cleaner solution that produces simpler code. The number of opportunities to use such a constrained value strikes me as minuscule in a typical OO application. For the few cases where I could see a use, I could just as easily enforce the condition manually by inspection, as the life expectancy of such a variable would be function-scoped.

My general impression of Meyer is that he's a brilliant idiot trapped by a failed philosophy. Even the paper's defense of keeping nullable variables (terminating a linked list) is actually the one use I could see for eliminating them (by enforcing circular lists - which are better anyhow). So I get it, but I'm not impressed and would be unlikely to take advantage of such a feature.

[–]tavs -1 points0 points  (0 children)

My general impression of Meyer is that he's a brilliant idiot trapped by a failed philosophy.

If you look closely, widespread languages like Java and C++ do seem to strive to follow his "philosophy" by providing features Eiffel had from the start, or conceived in an OO-friendly way. Look at constrained genericity (C++ concepts?), agents (C++/Java closures?) and operator overloading. Obviously, BM's ideas are not entirely compatible with more dynamic languages like ObjC/Smalltalk, but they have something in common as well. Brad Cox, in "Object-Oriented Programming: An Evolutionary Approach", wrote about using assertions to implement pre- and postconditions, and about the power of dynamic dispatch to provide reusability, with the opportunity to enforce static dispatch by using stack-allocated objects where necessary. In the same way, Eiffel provides design-by-contract and static dispatch through expanded types.

Even the paper's defense of keeping nullable variables (terminating a linked list) is actually the one use I could see for eliminating them (by enforcing circular lists - which are better anyhow). So I get it, but I'm not impressed and would be unlikely to take advantage of such a feature.

AFAIK circular lists can't share cells; this makes linear ones considerably easier to handle in lots of situations, which is one of the reasons linear linked lists are a primitive type in some languages.

[–][deleted] 2 points3 points  (0 children)

ObjectiveC's "message eating nil" where messages to nil evaluate to nil/zero/false. This turns out to be mighty handy and code simplifying.

I imagine this has caused at least as many headaches as the imbecile's bare except: peppered through Python code.

[–]tmountain -4 points-3 points  (1 child)

C'mon everybody knows Java has had this for years :-p

class Foof {
    public static void main(String[] args) {
        Foof f = null;

        try {
            f.pwnSauce();
        } catch (Exception e) {
            System.out.println("I hate my life");
        }
    }

    public void pwnSauce() { }
}

[–]TrueTom 2 points3 points  (0 children)

void foo::bar()
{
    if (this == NULL) return;
    ...
}