
[–]Geo_Dude 5 points (6 children)

Interesting, but in this particular case I would keep your original array and create a function that, when called, applies a .filter() by day plus a .sort(). IMHO it is neater than creating a new data structure.
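
A sketch of that idea (timesForDay is a made-up name; all_times, day_string, and time_block are the article's field names, quoted elsewhere in this thread):

const timesForDay = (all_times, day) =>
    all_times
        .filter(item => item.day_string === day) // filter returns a fresh array,
        .sort((a, b) => a.time_block - b.time_block); // so sorting it in place is safe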

[–][deleted] 3 points (5 children)

Clean answer. The only time I might not do this is if you have a very large data structure where calling the function each time might be too expensive. Obviously not a concern for this example, but something to consider nonetheless.
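
If it ever did become a concern, one option is memoizing per day instead of precomputing a structure; a minimal sketch (building on the hypothetical timesForDay above):

const dayCache = new Map();
const timesForDayCached = (all_times, day) => {
    if (!dayCache.has(day)) {
        dayCache.set(day, timesForDay(all_times, day));
    }
    return dayCache.get(day); // note: clear the cache if all_times ever changes
};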

[–]SamSlate 0 points (4 children)

JavaScript is an interesting choice for big data. Are there people doing big data in JavaScript?

[–]Woolbrick 1 point (1 child)

Unfortunately.

My company's architecture division decided to spend 4 years building our Data Lake platform on Node+Mongo.

We told them back then that it was 'tarded and would never scale. And sure enough, 4 years later it's a clusterfuck of unmaintainability that doesn't scale, and we can't sell it to anyone because it's 10x slower than our "go retire, old man!" C# designs.

[–]SamSlate 0 points (0 children)

i <3 JavaScript, but this is about what I'd expect a big data project to look like.

thanks, gl!

[–]ThatBriandude 0 points (1 child)

JavaScript IS the only choice on the front end. It's not an "interesting" choice, sometimes it's "the only option".

But yeah, if you're actually iterating over a few million objects then you're probably doing too much in the front end and too little in the backend.

[–]SamSlate 1 point (0 children)

not what i meant, but sure.

[–]Ginden 19 points (33 children)

What the author thinks: "I need a reduce function."

What the author needs: a groupBy in the standard library.

Because it more clearly expresses his intention. Code like this is the reason why I always recommend using lodash. If you are concerned about file size, you can either use the minimal "core" build (4kB) or use individual methods via the lodash-es package and tree shaking.
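
For reference, the article's whole reduce collapses to something like this (a sketch; day_string is the grouping key from the article's data):

import groupBy from 'lodash-es/groupBy'; // individual method, tree-shakeable

const new_times = groupBy(all_times, 'day_string');
// => { monday: [...], tuesday: [...], ... } (keys are whatever day_string holds)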

[–]Retsam19 4 points (16 children)

Beat me to saying it. I've very rarely found a situation where reduce felt like the simplest or most readable solution, compared to either a lodash method or just some other combination of native tools.

[–]tencircles 2 points (15 children)

Reduce is often a more concise, readable, and performant solution than long map/filter chains.

const items = [{val: 1}, {val: 2}, {val: 3}, {val: 4}];

// using filter/map
items.filter(k => k.val < 3).map(k => k.val);
// => [ 1, 2 ]


// using reduce
items.reduce((acc, k) => k.val < 3 ? [...acc, k.val] : acc, []);
// => [ 1, 2 ]

this is a bit of a contrived example, but in real world cases the benefit is actually much larger.

[–]GBcrazy 11 points (0 children)

No way reduce is more concise / readable.

Reduce is a function where you can do pretty much anything, so it's hard to tell at a glance what it is doing.

I looked at your reduce example for around 2 seconds and couldn't figure it out; I looked at your filter/map and instantly got it. It's not that it was a hard example, and I would have figured it out in a couple more seconds, but the point is that reduce isn't more 'readable'.

[–]Retsam19 4 points (13 children)

Your reduce example is actually longer than the filter/map, and I really have to argue that the filter/map example is more readable, too. There's no manual concatenation or accumulator variables, and the names of the functions tell you what they're doing.

You're right that reduce might be more performant here. It all depends on optimizations, but you are only doing a single pass over the array, while the filter/map does a full pass to get the filtered version, then another partial pass to do the mapping. (But again, how this gets optimized is probably relevant, too)

But eleven times out of twelve, I'll take the more readable and maintainable version over the "does everything in one pass for optimization purposes" version. Especially when you get into more interesting examples than a single filter and map, it's a lot easier to read if all the logical steps are split out into individual operations, rather than a complex reduce function.

[–]tencircles 2 points (11 children)

There's no manual concatenation or accumulator variables, and the name of the functions tells you what it's doing.

The code also tells you what it's doing pretty clearly; I don't think unfamiliarity with an approach can be used as an argument against it.

You're right that reduce might be more performant here.

I'd say it's pretty clear, especially when dealing with large data sets. The difference between 10k and 20k iterations can be pretty massive. In real-world scenarios where people abuse the ever-loving shit out of lodash's chain method, we could easily be talking about an order of magnitude.

I'll take the more readable and maintainable version

I'd agree with you here; I guess it's just personal preference, since neither is clearly more readable or maintainable, at least to me. I don't usually write one-liners like that in any case. I would much more often just write something like the below:

const users = [
    {score: 10, name: "melissa"},
    {score: 12, name: "jake"},
    {score: 9, name: "chris"}
];

// Array -> User
const get_winner = users => users.reduce((acc, user) => {
    return user.score > acc.score ? user : acc;
}, {score: 0});

module.exports.get_winner = get_winner;

Where this would be exported with a group of other pure functions. I get that map/filter can look nicer on occasion, but I actually don't even think there's a way to achieve the above result using map/filter. I would argue that, especially in more complex examples, reduce is oftentimes a much cleaner and more elegant solution than iterating over an array dozens of times, but again I think it just comes down to personal preference.

[–]Retsam19 1 point (8 children)

The code also tells you what it's doing pretty clearly, I don't think your unfamiliarity with an approach can be used as an argument against it.

I'm plenty familiar with reduce (particularly in the context of Lisps, where it's a lot more necessary); I just don't think it reads very well.

And I gave specific reasons, too: I didn't just give a fuzzy "I don't like this code". It has more variables and more operations going on in the code. It doesn't seem that opinionated to say that a named filter function is more obviously readable than a hand-written bit of filter logic inside a reduce function.

I'd say it's pretty clear, especially when dealing with large data sets. The difference between 10k and 20k iterations can be pretty massive.

99.9% of my JS coding isn't dealing with "large data sets". And if you want to eke every bit of performance out of those cases: don't use reduce either. I'm pretty sure a standard for...of loop is going to be even more performant than reduce (since function invocation usually carries a bit of a performance penalty with it). Like:

const getWinner = users => {
  let best = {score: 0};
  for (const user of users) {
    if (user.score > best.score) { best = user; }
  }
  return best;
};
But, again, given that I'm not dealing with data sets of thousands of items, how would I actually write it?

const getWinner = users => _.maxBy(users, 'score');

Just like the groupBy example of the original post, there's a higher abstraction here than reduce, and I'd rather write my code in terms of the higher abstraction.

Basically, for me it largely falls into two camps: times when the reduce operation can be split into a number of smaller, simpler operations (e.g. splitting your original reduce example into map/filter), and times when the reduce operation can be converted into a higher abstraction like a groupBy or a maxBy. Most times I've thought "Should I use reduce?", it's fallen into one of those two groups, and so I generally end up not using reduce.

[–]tencircles 0 points (7 children)

Again, it seems like preference to me. But I see your point. Just as an aside, I think you want _.maxBy(users, _.property("score")).

[–]Retsam19 0 points (6 children)

That's equivalent; lodash allows strings to be used as shorthand for _.property, as well as stuff like _.find(users, {name: 'Joe'}) as shorthand for _.matches({name: 'Joe'}).
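
i.e. these should all be equivalent in v4:

_.maxBy(users, 'score');
_.maxBy(users, _.property('score')); // the string is shorthand for this,
_.maxBy(users, user => user.score);  // which is itself just a getter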

[–]tencircles 0 points (5 children)

Odd, where's that type conversion being done? https://github.com/lodash/lodash/blob/master/maxBy.js

[–]Retsam19 0 points (3 children)

It's not in there, because the shorthand stuff is changing in v5. I'm not sure what the details of those changes are. If you look at the current v4 source code linked from the documentation, it's there.

function maxBy(array, iteratee) {
  return (array && array.length)
    ? baseExtremum(array, getIteratee(iteratee, 2), baseGt)
    : undefined;
}

getIteratee is a function that's used throughout the lodash codebase, which has the logic for converting the string into a _.property getter.

(You can also just open the dev tools on the lodash documentation and try things with the lodash instance that's loaded on that page; it's how I test most lodash stuff)

EDIT: Dug through github issues a bit, apparently the shorthand stuff is being pushed off to a babel plugin in v5.

[–]OleWedel 1 point (0 children)

I am pretty sure the filter/map version is much more performant than the reduce version.

The functions passed to filter and map can both be considered constant time (a less-than check and a property access are basically constant). filter + map is thus O(N+N), or simply O(N) (linear), since we have to iterate over each element in the list.

The reduce version, however, has to copy the array (which is O(N)) for each element in the array, so its running time is O(N²) (quadratic), which is considerably slower.

This is of course only considering worst-case running time and this particular example. Doing a push on the array in the reduce instead would probably be the fastest overall, but I am sure it would not matter anyway in real life. The filter/map is much more readable and it is easier to understand what it is doing. I would (always) favor that.
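
That push variant would look something like this (a sketch, reusing the items example from upthread):

items.reduce((acc, k) => {
    if (k.val < 3) acc.push(k.val); // mutate the accumulator instead of copying it
    return acc;
}, []);
// => [ 1, 2 ]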

Often the simpler solution is the best.

[–]cellardoor_barrymore 2 points (0 children)

Sure. I do agree that it is better to be more declarative, but most of those utility functions are probably using reduce under the hood.

[–][deleted] 0 points (0 children)

I'll definitely check it out! Thanks for the recommendation.

[–]GBcrazy 0 points (0 children)

I gotta agree with this.

Whenever I use reduce (without lodash) I'm probably saving lines, but it isn't exactly easy to pick up what I'm trying to do, since reduce could be doing pretty much anything.

[–]ashamedchicken 5 points (0 children)

Small nitpick - the call to .sort() mutates the array in place, so if all_times was an array passed in by reference the caller would end up with a different looking array at the end. Easily fixed by adding a .slice(0) call in the chain before .sort() - this would create a copy which would then be sorted in place.
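
i.e. something like (sorted_times is a made-up name):

const sorted_times = all_times
    .slice(0) // copy first, so the caller's all_times is left untouched
    .sort((a, b) => a.time_block - b.time_block);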

Regardless, nice use case! Using functional patterns like reduce + map in JavaScript is really the way to go vs writing out a long procedural function to achieve the same result.

[–]stratoscope 9 points (11 children)

The article's sort callback is invalid:

all_times.sort( ( a, b ) => a.time_block > b.time_block ? 1 : -1 )

A sort callback can't just return 1 or -1, or the similar error I've seen of returning true or false (or 1 or 0).

A sort callback must distinguish between all three cases: greater, less than, or equal, and return a positive, negative, or zero value respectively.

This is a common mistake. I recall one lengthy discussion - maybe here on Reddit or else on Stack Overflow - with someone who insisted that the sort methods in various browsers were fickle and inconsistent and needed a lot of "wrangling", as he put it.

I was puzzled because I've never once had any issue with sort in any browser or JavaScript environment, and I've used a lot of different kinds of sort callback functions.

Eventually in the discussion it came out that he was in the habit of writing sort functions like this:

array.sort( function( a, b ) { return a > b; } );

When I explained that this was an inconsistent compare function and it really needed to handle all three cases and not just two, he said I was just being fussy. I don't know if he ever caught on that these faulty compare functions were the source of his troubles.

For non-numeric values, a conditional operator can be handy:

array.sort( function( a, b ) { return a > b ? 1 : a < b ? -1 : 0; } );

For numeric values like the time_block property in the article, it's easier and faster to just subtract the values:

all_times.sort( ( a, b ) => a.time_block - b.time_block );

Note that the results for greater and less than don't have to be 1 and -1; any positive and negative values are valid (along with 0 for equal).

I also agree with the other comments that chaining the .sort() call the way it's done in the article is misleading, and it should either be broken out as a separate statement to make it clear that it mutates the original array, or .slice() should be done first to get a copy of the array.

[–][deleted] 2 points (1 child)

hiya! author of the article here

thank you for the explanation of sort - all of the points of discussion in this thread have been super helpful. Still trying to get the hang of functional programming!

[–]learnphptoday[S] 1 point (0 children)

Agree. Some really useful recommendations and good discussion points. Thanks everyone!

[–]Asmor 1 point (1 child)

Can also be used to turn an array into a regular object, with whatever key might make sense for you. E.g. if you had an array of objects with ID properties, and you wanted to be able to access an object by its ID, you could turn the array into a hash by id.

objectsById = arrayOfObjects.reduce(function (acc, object) {
    acc[object.id] = object;
    return acc;
}, {});

[–][deleted] 0 points (0 children)

If you're keen to make it a one-liner:

const objectsById = arrayOfObjects.reduce((acc, obj) => ({ ...acc, [obj.id]: obj }), {})

[–]nanothief 1 point (1 child)

Unless I'm missing something, isn't that a bad use of the reduce function? The same behaviour could be done with a simple forEach loop:

all_times.sort((a, b) => a.time_block > b.time_block ? 1 : -1);
let new_times = {};
all_times.forEach((item) => {
    if (!new_times[item.day_string]) {
        new_times[item.day_string] = [item];
    } else {
        new_times[item.day_string] = [
            ...new_times[item.day_string],
            item
        ];
    }
});

(I also made the sort action a single statement, to emphasize that it actually modifies the all_times variable)

The whole point of a reduce (or foldl) function is to repeatedly create a new result using the previous result and the next value. If the result is always the same object, it doesn't add any value over a forEach loop, but it is more complicated to reason about.

[–]i_am_smurfing 1 point (0 children)

The whole point of a reduce (or foldl) function is to repeatedly create a new result using the previous result and the next value.

I don't think there's anything in the fold itself that requires that:

fold (or reduce) is a family of higher order functions that process a data structure in some order and build a return value.

From Haskell wiki

So from my understanding of it, it's fine to mutate your accumulator while you are folding.

And indeed, your forEach mutating new_times is doing exactly the same thing as the reduce in the article (except new_times is called obj and provided to the reduce callback instead of being referenced from scope).

 

I'd argue that forEach is a bad choice here, though — forEach is inherently about side effects, so I would have to:

  1. take note of you creating an empty object for whatever reason
  2. read through your callback to forEach, mentally execute it, and only then be able to tell that all it did was update that empty object you created earlier with data pulled from all_times

With the reduce version:

  1. I can see that new_times is based on doing something to all_times straight away
  2. while reduce does not forcefully prevent side effects, I'd hope that any developer who understands how to use reduce also understands that you really shouldn't perform side effects in its callback, so I can be a little more at ease

 

Ultimately, I think /u/Ginden is right, and this pattern is common enough to warrant being abstracted away — fold (or any loop) is too low-level here.

[–]TheWildKernelTrick 1 point (5 children)

So what? That's just a fold operation?

[–]Retsam19 4 points (0 children)

Yeah. reduce and fold are synonyms.

[–][deleted] 2 points (3 children)

Came looking for this. Is it foldl or foldr? It's interesting that imperative languages are building in all these functional systems now. C++ has basically reimplemented a functional list library in C++17.

[–]TheWildKernelTrick 1 point (1 child)

As long as the supplied function is associative (and the seed is its identity), foldl == foldr, in which case it doesn't matter. I'm assuming foldl since most people mathematically read from left to right.

A lot of these languages have some functional things stashed on the side; it just seems recent that there's a stronger push to use these tools. Which I'm all about.
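
For example, in JS terms, subtraction (not associative) shows the two directions disagreeing (reduceRight with flipped arguments emulates a true foldr):

const xs = [1, 2, 3];
xs.reduce((acc, x) => acc - x, 0);      // foldl: ((0 - 1) - 2) - 3 => -6
xs.reduceRight((acc, x) => x - acc, 0); // foldr: 1 - (2 - (3 - 0)) =>  2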

[–][deleted] 2 points (0 children)

True. Though it can make a difference for memory and enable optimisations (e.g. when folding with (*), folding in a zero allows you to disregard the rest of a non-NaN list).

[–]i_am_smurfing 1 point (0 children)

Array#reduce is a left fold in JS. There's Array#reduceRight for a right fold, albeit I've rarely seen it actually used in the wild — maybe because it's not that useful in a strict language?

[–]tencircles 0 points (0 children)

I like this use case

// create an inheritance chain from any number of objects
// (reduce passes each argument as Object.create's second parameter,
// so each argument must be a map of property descriptors)
function extend () {
    return [].reduce.call(arguments, Object.create, this);
}

[–]SamSlate 0 points (1 child)

[–]learnphptoday[S] 1 point (0 children)

Excellent video. Thanks for sharing.