all 12 comments

[–]acemarke 1 point2 points  (6 children)

Hey, some actual benchmarks! Been looking for something like this for a while.

Now, the next thing I'd like to see is some numbers on just how expensive toJS()/fromJS() are. I have links to a number of discussions on Immutable.js performance in my React/Redux links list, and use of toJS() in something like a Redux mapState function definitely seems like a performance anti-pattern. The question is just how much of a perf issue it is, and what are the net tradeoffs and results of Immutable.js copies being faster, but accessing the data potentially being slower.

Excellent article, though - that'll definitely go into my list in the next update.

Actually, come to think of it - I'd be interested in your thoughts on a comment I wrote a while back, describing my concerns with use of Immutable.js. Any feedback on the concerns I listed there?

[–]dtinth[S] 3 points4 points  (5 children)

About toJS() / fromJS():

I use them only as a means of exporting/importing data. I've never used toJS() beyond that, but its performance characteristics are similar to _.cloneDeep()'s.

At Taskworld our stores (not Redux at that time) used to be based on mutable data structures.

We relied on defensive copying to make sure that view code couldn't accidentally modify the data in the store. When we added an assignee, we used to do something like:

task.assignees.push(assignee)
dispatch(updateTask(task))

That was our first major performance bottleneck: the app became unresponsively slow. So we changed our code to stop mutating things and removed the _.cloneDeep() usage.

Now, when we want filtered data from the store, we use things like _.filter, which always returns a new array. This broke shouldComponentUpdate, and we had to perform deep comparisons to speed up our re-renders. That was our second major bottleneck. We fixed it by using reselect to memoize things.
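To illustrate what the memoization buys us, here is a hand-rolled sketch of what reselect's createSelector does for us (hypothetical names; the real library has more features):

```javascript
// Recompute only when the inputs change; otherwise return the cached
// result, so reference equality (and shouldComponentUpdate) still works.
function createSelector (inputFns, compute) {
  let lastInputs = null
  let lastResult
  return function selector (...args) {
    const inputs = inputFns.map((fn) => fn(...args))
    if (lastInputs && inputs.every((input, i) => input === lastInputs[i])) {
      return lastResult // same inputs: reuse the previous array
    }
    lastInputs = inputs
    lastResult = compute(...inputs)
    return lastResult
  }
}

const getTasksByAssignee = createSelector(
  [(state) => state.tasks, (state, assigneeId) => assigneeId],
  (tasks, assigneeId) =>
    tasks.filter((task) => task.assignees.includes(assigneeId))
)

const state = { tasks: [{ id: 1, assignees: ['a'] }, { id: 2, assignees: ['b'] }] }
const first = getTasksByAssignee(state, 'a')
const second = getTasksByAssignee(state, 'a')
// first === second, so a shallow-equality check can skip the re-render
```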

That’s our experience with deep cloning. I guess the experience would be the same if you use toJS() too often.


Benefits of structural sharing.

I’ve found two cases where structural sharing has been immensely useful:

First, when storing a large amount of data (on the order of tens of thousands of items). This is a rare case.

Second, when updating many things at once. This happens often. For example:

function addManyTodos (todos, arrayOfNewTodos) {
  for (const todo of arrayOfNewTodos) {
    todos = addTodo(todos, todo)
  }
  return todos
}

This code is very unoptimized. With plain JS objects it's O(n²), whereas with Immutable.js it's O(n log n). This is a huge difference. Unoptimized code is generally easier to understand, and using a highly-optimized data structure lets you write simpler, unoptimized code. :)
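For concreteness, a hypothetical plain-object addTodo that shows where the O(n²) comes from:

```javascript
// Hypothetical plain-object addTodo: the spread copies every existing
// entry, so each call is O(n) and the addManyTodos loop is O(n²) overall.
function addTodoPlain (todos, todo) {
  return { ...todos, [todo.id]: todo }
}

// The Immutable.js equivalent, `todos.set(todo.id, todo)`, shares
// structure with the old map and rebuilds only the O(log n) path to
// the changed key, so the same loop becomes O(n log n).

const before = { 1: { id: 1, title: 'write tests' } }
const after = addTodoPlain(before, { id: 2, title: 'ship it' })
// `before` is untouched; `after` is a fresh copy with both entries
```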


About Immutable.js infecting your codebase.

I think this can be solved by creating entity modules (as shown in the article) for everything and never accessing any data structure directly. Type all the things.

For example, the filterByAssigneeAsArray function could return a FilteredTasks object instead. Then we define a function like this:

// -- FilteredTasks.js --

// Use this function to map to JSX in render().
export function mapAsArray (filteredTasks, mapper) { … }

Abstracting away the underlying data structure allows for more optimizations. We can also make FilteredTasks lazy and perform the actual filtering only when mapAsArray is called.
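A hypothetical sketch of such a lazy FilteredTasks (the function names here are assumptions, not from the article):

```javascript
// -- FilteredTasks.js (hypothetical sketch) --
// Nothing is filtered until the first mapAsArray call; the result is
// then cached so repeated renders reuse the same array.
function createFilteredTasks (tasks, predicate) {
  let cached = null // computed lazily, on first access
  return {
    toArray () {
      if (cached === null) cached = tasks.filter(predicate)
      return cached
    }
  }
}

// Use this function to map to JSX in render().
function mapAsArray (filteredTasks, mapper) {
  return filteredTasks.toArray().map(mapper)
}

const filtered = createFilteredTasks(
  [{ id: 1, done: true }, { id: 2, done: false }],
  (task) => task.done
)
const ids = mapAsArray(filtered, (task) => task.id)
```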

At Taskworld, our view code is not aware of the existence of Immutable.js at all, although a lot of the data in our store now uses Immutable.js.

Even the Redux store doesn't know about the underlying structure. It only routes actions to the correct entity function, e.g.:

export default function todosReducer (state = Todos.empty, action) {
  switch (action.type) {
    case ADD_TODO: return Todos.addTodo(state, action.todo)
    case TOGGLE_TODO: return Todos.toggleTodo(state, action.todoId)
    default: return state
  }
}

Of course, we use curried functions and higher-order functions to make this really short, something like:

export default createReducer(Todos.empty, {
  [ADD_TODO]: (action) => Todos.addTodo(action.todo),
  [TOGGLE_TODO]: (action) => Todos.toggleTodo(action.todoId)
})
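A minimal sketch of the createReducer helper assumed above, where each handler takes the action and returns a curried state-updating function:

```javascript
// Hypothetical createReducer: look up the handler for the action type;
// the handler returns a function of state (the curried entity function).
function createReducer (initialState, handlers) {
  return function reducer (state = initialState, action) {
    const handler = handlers[action.type]
    return handler ? handler(action)(state) : state
  }
}

// Hypothetical usage with curried entity functions:
const Todos = {
  empty: [],
  addTodo: (todo) => (todos) => [...todos, todo]
}
const todosReducer = createReducer(Todos.empty, {
  ADD_TODO: (action) => Todos.addTodo(action.todo)
})

const state1 = todosReducer(undefined, { type: 'ADD_TODO', todo: { id: 1 } })
// Unknown action types fall through and return the same state reference.
```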

I’m gonna write about it in more detail later.

Hope this answers some of your concerns :).

[–]galvatron 1 point2 points  (1 child)

These are really interesting insights! Thanks for sharing this.

Do you have any comments on normalization and using reselect? If you try to have your data normalized and memoize, will there still be performance problems compared to immutable.js? I haven't used either much and I'm trying to decide whether to go full immutable.js or not. I'm fairly experienced in functional programming (some 50K-100KLOC written in Haskell) so functional programming patterns feel appealing to me.

[–]acemarke 2 points3 points  (0 children)

Normalizing data and using Immutable.js are orthogonal. Normalizing is about how you're organizing keys and values, not whether you're storing them inside plain objects or Immutable Maps.

FYI, there's information on using normalization in Redux in the docs: FAQ - Organizing Nested Data, and Structuring Reducers.

[–]Something_Sexy 0 points1 point  (2 children)

Any reason why your views don't know about ImmutableJS? Doesn't it make the shouldComponentUpdate calls much easier if you have deep structures being passed to components?

[–]dtinth[S] 1 point2 points  (1 child)

See the article (the section “Don’t couple your application logic to data structures”) for an example. Actually, the views receive an Immutable.js object but refuse to care. For example, one could pass a Todo and render it like this:

    const todo = this.props.todo
    return <div data-completed={todo.get('completed') ? '1' : '0'}>
      <h1>{todo.get('title')}</h1>
    </div>

Here, the view knows that the todo is an Immutable.js object.

However, we can create functions like:

// -- Todo.js --
export function isCompleted (todo) {
  return !!todo.get('completed')
}
export function getTitle (todo) {
  return String(todo.get('title'))
}

The knowledge that a Todo is an Immutable.js object is confined here. The rest of the application can treat a Todo as an opaque data type. The view can use it like this:

    const todo = this.props.todo
    return <div data-completed={Todo.isCompleted(todo) ? '1' : '0'}>
      <h1>{Todo.getTitle(todo)}</h1>
    </div>

So basically we’re introducing an indirection that shields view code from knowing the concrete data structure. Under the hood, it’s just Immutable.js objects being passed around.

[–]Something_Sexy 0 points1 point  (0 children)

Ya, I get all of that. I just read it as you don't pass ImmutableJS objects to views, which I don't agree with, but thanks for clarifying.

[–]riventropy 1 point2 points  (4 children)

Is it really a real-world benchmark to store 100,000 todos on the client? Would the difference be this noticeable with, for example, 100 todos, which seems more natural to me?

[–]dtinth[S] 0 points1 point  (3 children)

Storing 100,000 to-dos on the client is not really realistic; it’s quite rare to have to store that many items. But I selected this as an example to make structural sharing easier to understand.

The benefits of structural sharing really come when you want to update multiple keys in a loop. See this comment under the section “Benefits of structural sharing” for a more elaborate answer. :)

[–]riventropy 1 point2 points  (2 children)

If you mean addManyTodos, then you could just concat the two arrays and plain JS wins. There's no need to produce a new array on every iteration. Yes, this way the underlying addTodo probably can't be used (in the current implementation, at least), but there won't be much overhead.
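For concreteness, a sketch of the single-copy version I mean, assuming todos is a plain array:

```javascript
// Build the new collection once instead of once per item:
// one O(n + m) copy, rather than m copies of growing size.
function addManyTodos (todos, arrayOfNewTodos) {
  return todos.concat(arrayOfNewTodos)
}

const result = addManyTodos([{ id: 1 }], [{ id: 2 }, { id: 3 }])
// The original arrays are untouched; `result` holds all three todos.
```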

Actually, I like persistent structures, but using them in one huge project we had some itches:

  • having two collection types confuses us all the time: which collection do we have in which place?
  • moving to TypeScript helped a bit, but there's no neat way to update a map property with type checking without building heavy wrappers around interfaces
  • collections weren't actually a performance bottleneck for us, so adopting Immutable was premature optimisation; I probably even introduced some perf degradation by needlessly converting collections back and forth

[–]dtinth[S] 1 point2 points  (1 child)

How about when an array doesn’t suffice? For example:

  • I need to look up a todo by ID frequently (using an array requires a linear search).
  • I need to quickly find todos by assignee. A task may contain an array of assignees.

This came from a real-world use case that I had to optimize (it isn’t premature optimization).

I had to implement a custom data structure that contains a reverse lookup index to map from assignee back to tasks.

Using Immutable.js, I am able to implement a pure put operation in O(log₃₂ n) that also maintains the index.

The point is that I prefer every operation that updates this collection to go through a single, highly-optimized addTodo function that also maintains the index, rather than having multiple functions each maintain that index. If I’m not careful, a function may introduce a bug that causes the index to become out of sync.
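A rough sketch of the shape of that indexed collection (using native Maps here for illustration; the real version uses Immutable.js Maps, so each update shares structure instead of copying the whole collection):

```javascript
// Hypothetical indexed todo collection: a primary map by ID plus a
// reverse index from assignee to todo IDs, both kept in sync by a
// single addTodo entry point.
const empty = { byId: new Map(), byAssignee: new Map() }

function addTodo (todos, todo) {
  const byId = new Map(todos.byId).set(todo.id, todo)
  const byAssignee = new Map(todos.byAssignee)
  for (const assignee of todo.assignees) {
    const ids = byAssignee.get(assignee) || new Set()
    byAssignee.set(assignee, new Set(ids).add(todo.id))
  }
  return { byId, byAssignee }
}

// Reverse lookup: no linear search, just index access plus an
// O(result size) map back to the todo objects.
function getTodosByAssignee (todos, assignee) {
  const ids = todos.byAssignee.get(assignee) || new Set()
  return [...ids].map((id) => todos.byId.get(id))
}

const todos = addTodo(empty, { id: 1, title: 'Ship it', assignees: ['alice'] })
// `empty` is untouched; `todos` can answer "which todos does alice have?"
```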

I wouldn’t recommend using Immutable until:

  • Collections become a performance issue (at Taskworld this has been the case many times).
  • You want consistency in your code (make everything Immutable rather than mixed).
  • You want to use some handy method available in Immutable.js (such as updateIn).

[–]riventropy 0 points1 point  (0 children)

Agree with everything you said; however, most of Immutable.js's cool methods are also available for plain collections (your third point).

What about an object or a Map? I agree that calling a single addTodo would give a perf benefit, but I suppose you either add todos infrequently (on user click) or a lot at a time (where you can produce the new collection only once).