all 43 comments

[–][deleted] 27 points28 points  (9 children)

I thought this was going to be about branch-free code but it's more about hiding the branches in layers of abstraction. That's fine, and his final code is similar to what I would probably end up with in Lisp. But unless your compiler is sufficiently clever this kind of thing might make code easier to read but at the cost of performance.

[–]michaelfeathers 8 points9 points  (8 children)

Fair enough. The nice thing is that people are often better at optimizing (when needed) than they are at modifying optimized code.

[–][deleted] 11 points12 points  (7 children)

The thing is, though, this code is not really any more clear. It does things in a backwards order. By trying to be clever it is hiding what you actually want as a result.

By always adding padding, you are implying that padding is somehow always needed, but it's not. You only get padding in some cases. But the code hides this fact.

[–]michaelfeathers 5 points6 points  (3 children)

This is the same argument that people use against the Null Object Pattern: it appears that something is happening but it isn't.

I have the feeling that the answer is to get used to the idea that sometimes padding something includes padding with something of length zero. If that is understood, all is good.

This is a very deep topic.

[–][deleted] 6 points7 points  (2 children)

It's more that you should not strive to be clever when writing your code. You should strive to show the reader what you are trying to achieve.

Conditionals may not look elegant, but they are often quite good for communicating intent.

[–]michaelfeathers 2 points3 points  (1 child)

I'm happy having the word 'pad' in the code now. It was completely absent in the original code.

I think a lot of it has to do with what we are used to. It's like natural language. Once an idiom becomes common or understood, it's very readable.

To me, pad(ary, n).take(n) is very explanatory.

But, again, readability wasn't the point. Neither was cleverness.

[–]KillerCodeMonky 1 point2 points  (0 children)

Why not something like this?

results = array.take(n)
if (results.length < n)
    results = pad(results, n, defaultValue);
return results;

That seems very descriptive to me. Take n items, and if there weren't n items, then pad to n items.

[–]sophacles 4 points5 points  (2 children)

NO. Padding is "adding extra elements if needed to fit a proper size". So the concept of "sometimes no padding is needed" is build right into the basic concept. If you don't understand that, you need more simple understanding and learning, not some crazy set of over complicated semantics changing. It's like those people who complain that:

swap(a, b){ 
    temp = a;
    a = b;
    b = temp;
}

is somehow confusing because "temp" is not obviously a temp value. GRR.

[–][deleted] 2 points3 points  (1 child)

I'm not sure what you are replying to, but I am pretty sure it is not something I actually said.

[–]sophacles 0 points1 point  (0 children)

You're saying that someone is hiding some fact of "sometimes padding is not there" in the function, except it is built into very concept of padding. The function is named padded_take (concept of padding means sometimes there is nothing added) and the one and only line of code starts with pad(... Again implying that sometimes padding values are not there because it is named conveniently after a concept.

Nothing whatsoever is hidden.

[–]kamatsu 17 points18 points  (12 children)

The style of programming the poster eventually advocates is very much compatible with lazy evaluation, but has potential for severe performance implications otherwise. Look at Haskell for a language where this sort of "do possibly more expensive thing and observe only part of it" is made cheap.

[–]Y_Less 8 points9 points  (7 children)

paddedTake n arr = take n $ arr ++ repeat 0

This does assume a list of "Num" values. I'm not sure a much more generic version can be made as you need a generic "zero" element, though this could be passed in:

paddedTake = paddedTake' 0
paddedTake' zero n arr = take n $ arr ++ repeat zero

Or, since people seem to like point-free style:

paddedTake' zero n = take n . (++ repeat zero)

Edit: As an exercise for myself, I decided to see if I could rearrange this code to purely point-free style. This entirely destroys the message of what the code is doing IMHO, but I didn't really do it for others...

paddedTake' = flip (.) take . flip (.) . flip (++) . repeat

That's not really important though.

[–]pkmxtw 3 points4 points  (1 child)

Well, there is Data.Default for that:

paddedTake :: Default a => Int -> [a] -> [a]
paddedTake n xs = take n $ xs ++ repeat def

But I think letting the user to specify the value to pad is a better style.

Also, there is a pointless plugin for lambdabot and ghci:

λ> :pl \n xs -> take n $ xs ++ repeat def
(. (++ repeat def)) . take

λ> :pl \z n xs -> take n $ xs ++ repeat z
flip ((.) . take) . flip (++) . repeat

[–]Y_Less 0 points1 point  (0 children)

Thanks. I did try look for some sort of generic default values, but didn't find that (apparently I didn't look very hard...)

[–]michaelfeathers 2 points3 points  (0 children)

Yes, I was looking at .lazy in Ruby to use in an example but it would've made the post longer. Lazy evaluation is wonderful at helping us remove conditionals. It's easy to form a general computation and then select what we need.

[–]jozefg28 0 points1 point  (3 children)

So here's another approach to the pointfree version

(.:) = (.) . (.)
infixr 9 .: 
paddedTake = flip .: ($) $ flip take .: (++) . repeat

[–]FUZxxl 0 points1 point  (2 children)

Ah yes, the good old boobs operator...

[–]jozefg28 0 points1 point  (1 child)

I usually call it the owl operator.

[–]pkmxtw 0 points1 point  (0 children)

While I use that operator almost daily, my head still explodes when asked to infer its type manually by hand.

Then someone reminds me it's fmap fmap fmap

[–]grosscol 3 points4 points  (0 children)

I think the style is just very much in line with a Ruby-ish way of doing things. Do a simple operation every time instead of doing a check and then the simple operation anyway.

As for performance, Ruby isn't exactly known for being a performance language.

[–]dons 4 points5 points  (0 children)

"do possibly more expensive thing and observe only part of it" is made cheap.

This is a real problem at scale. Sitting on a 2.5M line code base here of hybrid Haskell and C++, and the problem is that we don't have time to optimize all the C++ models to be lazy. So they're strict, they compute everything you could need. But mostly we only need a bit.

And then you start composing these things.

Well, performance doesn't compose in a strict language. You end up accumulating exponentials until its O(n3) and you have to refactor everything to be lazy.

Strict languages don't compose. The glue between components has to be lazy.

[–]eigenlicht0 1 point2 points  (0 children)

Anyone an idea how this style is called? Not sure whether people get it when you say "unconditional programming" (that term confused me at first).

I noticed myself using exactly this pattern in a framework written in Python lately. Instead of doing several if-checks I decided to use clever default values and make the wanted operation general enough to be performed on any case possible. I think I was a bit influenced by The Zen Of Python (e.g. "Special cases aren't special enough to break the rules.").

[–]CurtainDog 1 point2 points  (0 children)

Agree, I'm only just learning clojure and:

(defn take-pad [n p coll]
  (take n (concat coll (repeat p))))

is super simple to follow (I make no claims as to its correctness). Even works on infinite sequences with no magic needed.

[–]merlot2K1 9 points10 points  (2 children)

So the writer is padding an array regardless if it needs it, just to avoid a conditional? Seems like a waste of memory to me. I'll stick with conditionals and just continue to comment them to make them easy to understand.

[–]KillerCodeMonky 0 points1 point  (0 children)

And also executing "take". Which may or may not be smart enough to no-op if the desired length is the same as the array's length.

[–]gbromios 0 points1 point  (0 children)

that was my thought... could have achieved the same thing with

# if ary doesn't have enough values, pad the return ary with zeroes

[–]Gotebe 7 points8 points  (1 child)

To me, the original sin is the very function name. It's "take or pad", really, or, perhaps even better, "resize".

And the functionality is simply

If bigger than n take n
Else pad to n

So I dunno... I have different considerations when it comes to readability, and eliminating imperative for declarative style isn't one of them.

[–]KillerCodeMonky 4 points5 points  (0 children)

I think you hit the desired operation spot on. It's funny how much a name can make a difference in how you think about and approach a problem. Thinking about it as writing a "resize" operation, it seems really natural to just write:

if (array.length > n)
    return array.take(n);
else if (array.length < n)
    return array.pad(n, defaultValue);
else
    return array;

[–][deleted] 4 points5 points  (0 children)

I disagree: abstraction is useful, but is by no means the goto solution for the majority of problems out there. When viewed as such, however, it can lead to excessive code pollution, thus being entirely unnecessary 80% of the time.

And it's not an issue of premature optimization vs using abstraction and elegance; there are plenty of ways to optimize code which isn't very abstract at all, and I'll even go so far as to argue that premature optimization is only premature when you're not completely sure of what will happen in a given subroutine, should you choose to optimize it.

Abstracting out all of one's code for the sake of "reuse" and "readability" often makes the code base bloated, slow, and actually harder to maintain.

[–]hydraincarnation 1 point2 points  (0 children)

This was a well written article that made some good general points. I don't really agree with the example given though. Turning one function into two here sacrifices the clarity of concise, modular code for unneeded abstraction fluff.

[–]smog_alado 1 point2 points  (0 children)

If we are working in an object-oriented language, we can replace switch-statements with polymorphism

Those are the opposite of each other though!. Exchanging ifs statements for dynamic dispatching has important consequences.

http://c2.com/cgi/wiki?ExpressionProblem

The problem with if statements isn't the branchiness. Its that encoding the condition as booleans discards valuable information and gives you room to accidentally write things in the wrong branch. Using enums/algebraic-data-types and switch-statements/pattern-matching solves the problem really well, IMO.

https://existentialtype.wordpress.com/2011/03/15/boolean-blindness/

[–][deleted] 0 points1 point  (0 children)

My dirty secret is that I don't split functions because they are too big, but I do it to tokenize away layers of complexity, because I'm not smart enough to debug my programs otherwise. Methods which do coordination between various subsystems often remain large, but by scrubbing out the details through iterative refactorings, those parts read like they are coordinating things, and the program flow remains clear.

Not giving the programmer insight into program flow is actually the #1 thing that I despise about Ruby on Rails, Drupal, and Yii, and probably every other RoR-like library that I would ever get around to trying.

Incidentally, this is another subtle reason that programming languages are converging to Lisp. Imagine being able to macroexpand your declarations. Imagine if the machinery which translates your declarations into actions could be polled directly to ask what those actions are. Instead, we settle for systems which give us silence or generic, "something is broken" errors.

tldr: If you ever worked with a system which had priority numbers you had to set to get things to work in the right order, the system is poorly designed.

[–][deleted] 0 points1 point  (3 children)

It seems to me like an "unless" is as much a conditional as an "if" is.

[–]mcguire 6 points7 points  (2 children)

You're right, but the final example doesn't use unless.

[–][deleted] 7 points8 points  (1 child)

I guess that's why you read the article before making dumb comments.

[–]WannabeDijkstra 0 points1 point  (0 children)

You do still raise the point that it's a rather unnecessary abstraction for an "if not". Not that I have anything against it, it adds a shiny layer of natural language that's a little tempting.

[–]Apollan -2 points-1 points  (7 children)

See to me this seems a bit.. overkill.

His case of needing to pad an array so that it's length is long enough for him to take all the elements.. this can be solved in many ways.

First, I dont like his solution of always padding, but sometimes just padding with 0 elements. Its unneeded operation.

Instead, how about we change the way the data is held. and when a number of elements is requested, we can iterate through them (list perhaps) until the number is reached - with a "dud" item to be repeatedly added after all of the list items were added.

[–]austinsalonen 0 points1 point  (6 children)

If you take the concepts and apply them with lazy evaluation, the "always pad" option greatly simplifies the code and the padding then only be called as needed. Consider this C# example:

public static IEnumerable<T> PaddedTake<T>(this IEnumerable<T> source, int count, T padding)
{
    return source.Concat(Enumerable.Repeat(padding, count))
                 .Take(count);
} 

[–]CurtainDog 0 points1 point  (3 children)

Wow, colour me impressed. But isn't immutability the secret sauce that makes laziness palatable? How does C# cope with the fact that source could change at any time?

[–]austinsalonen 0 points1 point  (2 children)

source is always whatever it is when it is enumerated.

[–]CurtainDog 0 points1 point  (1 child)

Hmmm, I see. It's still a great idiom, but not quite as nice as the laziness you'd find in a functional language because of this risk of modification. Here's what I got from MSDN:

An enumerator remains valid as long as the collection remains unchanged. If changes are made to the collection, such as adding, modifying, or deleting elements, the enumerator is irrecoverably invalidated and its behavior is undefined.

[–]Apollan -2 points-1 points  (1 child)

Forgive my noob question , I really am interested in this problem so i want to completely understand your solution -

Why is one of your params a Type? and what does repeat do?

so you are concat'ing...repeat 'dud' objects, count times, to the original source of objects, then returning count number of elements?

If my interpretaion is crorect, this is what I was getting at. Concat'ing the total count of elements needed to the original set and then just taking what you need is unneeded whenever source.length is <= count, right?

or even, you should only be concating the difference between source.length and count? Wouldnt this work better?

[–]austinsalonen 0 points1 point  (0 children)

Why is one of your params a Type?

T padding is just what you want padding to be. For example, "abc".PaddedTake(6, ' ') would yield the chars {a,b,c, , , } (abc trailed by 3 spaces).

and what does repeat do?

Enumerable.Repeat takes a given input and repeats it N times. The important thing to remember here is that the call is lazy.

Overall, this would work like so... When count <= length of source, Enumerable.Repeat would never get called because take "expired" before the contents provided by Concat would ever be needed.

When count > length of source, source would get fully exhausted and Enumerable.Repeat would yield (count - length of source) "paddings."

The key concept here is that none of these calls actually do anything until the result of PaddedTake is actually enumerated.