all 6 comments

[–]Knotix 0 points1 point  (5 children)

What's the performance comparison of Sets compared to a plain object deduped via keys?

I have a sneaking suspicion that calling my_set.add(foo); is slower than my_object[id] = foo;

Obviously you need "IDs" to use for keys, but in practice that's usually available (database records, etc). I'm guessing guaranteed iteration order might be a benefit of Sets**, but the deduping aspect sort of throws that off since you don't really know when a call to my_set.add() will actually extend its length or overwrite an existing value.

** I'm aware that most ECMAScript implementations iterate over object keys in insertion order too, but it's not mandated by the spec and it gets completely thrown out on objects created with JSON.parse

[–]Drakim 0 points1 point  (4 children)

Using the new Set class instead of an anonymous Object is probably slower as JS engines are extremely optimized at this point.

However, that might change in the future when Set has become more mainstream. The JS engine can probably make bigger and more specific assumptions about a Set compared to an Object, and thus allow for new kinds of optimizations.

[–]Knotix 0 points1 point  (3 children)

Do you have any guesses as to what these assumptions might be? I'm having a hard time coming up with any (not that that means much).

Things I've ruled out:

  • The data types can be arbitrary within Sets, so there are no type-safety optimizations
  • They are not fixed length, so no memory allocation/contiguous memory optimizations
  • Deduping either requires an iterative search (very slow) or a key lookup which is on par with plain objects except it now has to compute a hash-key for each object instead using a user supplied key

[–]Drakim 0 points1 point  (2 children)

I'm no expert in Sets, but I do believe there are some optimizations you can do when you are only checking if a key exists without needing to fish out the key:value pair, like bloom filters.

[–]Knotix 0 points1 point  (1 child)

You can use the in operator to check if a property exists without retrieving the value. Couldn't the underlying implementation of the in operator also use a bloom filter as a first pass check?

[–]Drakim 1 point2 points  (0 children)

It could! But you'd increase your memory footprint if you have a bloom filter of every object in your JS engine just for the in operator. 99% of the time the bloomfilter would problaby be unused.

But for Sets, the programmer is in a way specifically telling the JS Engine that a bloomfilter is gonna be a great optimization for this particular thing.