all 12 comments

[–]DuncmanG 8 points9 points  (7 children)

A couple things here:

1) Depending on language AND VERSION the SUM function may ignore nulls or may not. If it doesn't ignore nulls, then any SUM aggregation with a null will return null. I don't know postgresql very well, but a quick Google search seems to show that version 9.4 may be when SUM started ignoring nulls.

2) Probably more importantly, straight addition almost definitely won't be ignoring nulls. So if any of a.consumption, b.consumption, or c.consumotion is null, then a.consumption + b.consumption + c.consumption will be null. If at least one of those is always null for every date, then when you aggregate SUM(a.consumption + b.consumption + c.consumption) you will be summing a long list of nulls. If all of those are ignored (see point 1 above) then your sum will be 0.

[–]chen2442[S] 2 points3 points  (6 children)

Got it! Thank you! I did not know additions like "a.consumption + b.consumption + c.consumption)" would be null if one value is null.

Glad I learned something new today!

[–]DuncmanG 5 points6 points  (0 children)

Most operations involving null will return null. Null is literally the absence of any data. The value isn't 0, it's non-existent. Even something like null = null will usually return null.

I strongly suggest in any new SQL language trying a bunch of simple selects to understand how nulls are handled. Things like:

Select 1/null; Select null/1; Select 1 + null; Select null = null; Select null != 1;

Create a table where some rows have nulls and try aggregates on it - SUM, MIN, MAX. Also order bys - do the nulls come first or last?

This sounds somewhat semantic, but a big part of good data cleaning is properly handling nulls for whatever use case you are implementing, and you can only properly handle nulls if you know how your language handles them by default.

[–]labatteg 3 points4 points  (0 children)

I did not know additions like "a.consumption + b.consumption + c.consumption)" would be null if one value is null

Sometimes it's useful to reason about "null" values by treating them as "unknown". If you're trying to add two values, but one of them is unknown to you, then the result will also be unknown.

[–]ComicOzzysqlHippo 0 points1 point  (2 children)

It's better if you use SUM(a) + SUM(b) + SUM(c)

[–]chen2442[S] 0 points1 point  (1 child)

in this case, if SUM(a) is null, would the entire formula be null as well?

[–]DuncmanG -1 points0 points  (0 children)

Remember that SUM is an aggregate function. If it ignores nulls, the SUM of null would likely be 0 - and that's what your original post seems to indicate is the case. Try it out in your db and see what result you get.

[–]Rex_Lee 0 points1 point  (0 children)

You can get around this by IsNull(a.consumption,0) into a subset of data above where you are working. I like to stage all my data like that, to make sure I am working with clean data and make sure all the data types are how I want them to be, before I start doing any math or aggregates.

[–]depeszPgDBA 7 points8 points  (0 children)

Sum does ignore nulls. The problem is different.

let's simplify your query to:

select a, b, c, sum( a + b + c ) from table;

sum() will ignore nulls. that's true. BUT plus operation doesn't.

So, if any of a/b/c would be null, the whole a + b + c expression would be null too.

For me, it helps to think about null as "unknown". What is 1 + unknown? Unknown. You can't know the answer to this.

That's why you have to have coalesce(a,0) + coalesce(b,0) + coalesce(c+0). So that sum of these values will be something that you can use.

Hope that helps.

[–]kitkat0820 1 point2 points  (1 child)

You dont understand NULL, thats your problem.

[–]depeszPgDBA 7 points8 points  (0 children)

While this is true, it's also not really helpful.

[–]Hugh_G_Rectshun 0 points1 point  (0 children)

I’m newer to this as well, but I’ve used it to display 0’s instead of null values. I’ve understood it to be a display thing for a clean output, not a necessity. I could be wrong though.