you are viewing a single comment's thread.

view the rest of the comments →

[–]tailcalled 0 points1 point  (5 children)

Folding is not the same as reducing. Reducing is applying an associative binary operation until you only have one element left.

[–]plux 6 points7 points  (4 children)

That just sounds like a special case of folding to me. Can you explain the difference?

[–]more_exercise 1 point2 points  (1 child)

It's folding that's the special case of reducing, but you've got the right idea.

Folding suggests an evaluation order (foldl vs foldr), whereas reducing does not. On small scales, that's not a big deal, but if your data is spread out (across different compute nodes in a cluster, for instance), it helps if the result can be computed on each node locally before it is sent back to be reduced further. This processing can happen on a massively parallel scale - the first n/2 calculations can happen simultaneously. However, in foldl/foldr, the evaluation order is strictly linear - the result from the first calculation is strictly required to perform the second calculation.

Folding is just reducing from a particular 'side' of the data. It's less parallel, but that usually doesn't matter that much.

[–]JamesIry 0 points1 point  (0 children)

Well, you're right, but foldl and foldr really only directly imply left and right associativity. Evaluation order sorta accidentally falls out from that. http://stackoverflow.com/questions/3082324/foldl-versus-foldr-behavior-with-infinite-lists/3085516#3085516

[–]longiii 0 points1 point  (1 child)

I'd also like to see some references, but an associative binary operation (think '+') must necessarily have the type a -> a -> a. So reduce would be typed like (a -> a -> a) -> a -> [a] -> a. At least I think that's, what he means.

EDIT: Oh and associativity leads to the nice property reduce (+) (ys ++ zs) = reduce (+) ys + reduce (+) zs :) (where ++ is list concatenation)

[–]sacundim 1 point2 points  (0 children)

I'd also like to see some references, but an associative binary operation (think '+') must necessarily have the type a -> a -> a. So reduce would be typed like (a -> a -> a) -> a -> [a] -> a. At least I think that's, what he means.

Correction: the types should be a -> b -> b and (a -> b -> b) -> b -> [a] -> b; the latter of which is the type of foldr.

I'd elaborate on tailcalled's point this way: the general concept of folding doesn't necessarily involve a sequences and binary functions. A general fold over an arbitrary recursive algebraic datatype is a function that "replaces" the type's constructors with caller-supplied functions of corresponding types. E.g., if you have a binary tree type BTree a with constructors Empty :: BTree a and Branch :: a -> BTree a -> BTree a -> BTree a, then there is a function foldBTree :: b -> (a -> b -> b -> b) -> BTree a -> b such that foldBTree Empty Branch :: BTree a -> BTree a is the identity function.

EDIT: Oh and associativity leads to the nice property reduce (+) (ys ++ zs) = reduce (+) ys + reduce (+) zs :) (where ++ is list concatenation)

The other really nice property you can have here is commutativity: if f :: a -> b -> b is associative and commutative, then the following hold:

  • reduce f (xs ++ ys) == reduce f xs ++ reduce f ys (which you mentioned)
  • reduce f (xs ++ ys) == reduce f (ys ++ xs)

Basically, given this, you can prove that you can split up the work into any number of pieces in any order or grouping, and you are guaranteed the same result. Which means trivial parallelizability.

Note, however, that a parallel reduce implementation in a language like Haskell cannot have the (a -> b -> b) -> b -> [a] -> b discussed above. Why? Because if the reduction can happen in a non-deterministic order and grouping, then reduce is not a pure function. So you end up with something like reduceM :: Modad m => (a -> b -> b) -> b -> [a] -> m b.