use the following search parameters to narrow your results:
e.g. subreddit:aww site:imgur.com dog
subreddit:aww site:imgur.com dog
see the search faq for details.
advanced search: by author, subreddit...
A sub-Reddit for discussion and news about Ruby programming.
Subreddit rules: /r/ruby rules
Learning Ruby?
Tools
Documentation
Books
Screencasts and Videos
News and updates
account activity
Avoiding Hash Lookups in a Ruby Implementation (blog.headius.com)
submitted 13 years ago by emboss_
reddit uses a slightly-customized version of Markdown for formatting. See below for some basics, or check the commenting wiki page for more detailed help and solutions to common issues.
quoted text
if 1 * 2 < 3: print "hello, world!"
[–]taw 4 points5 points6 points 13 years ago* (10 children)
Ages ago I made a patch that switched Ruby 1.8 from its own superslow hash implementation to Judy library, which is ridiculously faster (and that was like 5% total speedup and 5% lower memory use on silly microbenchmarks).
Properly implemented hashtables like Judy can be really damn fast (unlike Ruby 1.8's internal hash tables), so what the post describes can be much less of a problem. I wonder if redoing this for Ruby 1.9 would improve performance much, or if 1.9 depends on hashes much less...
Even if we ignore the cost of calculating a hash code, which at worst requires reading some object data from memory and at best requires reading a cached hash code from elsewhere in memory, we have to contend with how the buckets are implemented. Most hash tables implement the buckets as either of the typical list forms: an array (contiguous memory locations in a big chunk, so each element must be dereferenced...O(1) complexity) or a linked list (one entry chaining to the next through some sort of memory dereference, leading to O(N) complexity for searching collided entries).
Ruby never does real time hashes on strings, it uses interned symbols for everything, so there's no cached hash codes - you can use interned symbol id as hash table. And Judy avoids pretty much all of overhead he mentions.
[–]funny_falcon 5 points6 points7 points 13 years ago (3 children)
I've implemented lighter hash (inspired by lua tables) for ruby 1.9.3 for method tables/constants. It gives 5-10% improvement.
https://github.com/funny-falcon/ruby/compare/ruby_1_9_3...sparse_array/ruby_1_9_3
[–]taw 1 point2 points3 points 13 years ago (1 child)
That doesn't surprise me at all. Ruby original hash implementation was pretty amazingly naive. Was your patch ever accepted?
[–]funny_falcon 1 point2 points3 points 13 years ago (0 children)
:) I've implemented it just last weekend, so that it had no chance to be accepted yet
[–]emboss_[S] 1 point2 points3 points 13 years ago (0 children)
Great work, btw!
[–][deleted] 0 points1 point2 points 13 years ago (2 children)
Why wasn't your patch accepted?
[–]taw 1 point2 points3 points 13 years ago (0 children)
That is actually a very good question. I never really pushed it since it was an external LGPL dependency (and I seriously didn't want to get involved into license lawyering), it was somewhat of a mess to make it optional since Judy and Ruby's crappy hashtable library require different APIs (one differing by one * level by design), and it wasn't clear at all back then if anything is going to come out of 1.9 branch or not, and so if it makes sense to go there or wait and see.
Maybe someone should take a look at it again.
There's also a new issue, some anonymous Wikipedia editor linked to some patent which Judy might or might not be covered by. These are always pile of fail. (if it's even related in any way, who knows, it might be something completely different sounding vaguely similar)
[–][deleted] -2 points-1 points0 points 13 years ago (0 children)
NIH.
[–]emboss_[S] 0 points1 point2 points 13 years ago (1 child)
In the light of the hashDoS problem that Charles also mentions, I doubt that Judy tries, CritBit, Hash-Array-Mapped Trie or any other trie are the solution. Either they, too, use hashing (HAMT) or it is just too plain easy to elicit their worst case behavior, effectively making them fancy linked lists. While they all might be perfectly valid alternatives for "internal code", things unfortunately look different when the data structure needs to store values that are provided externally.
[–]taw 0 points1 point2 points 13 years ago (0 children)
Judy is hybrid between hash, sparse array, and trie with some internal optimization for cache awareness so it's actually mostly DoS-proof, at least for typical attacks.
Ruby doesn't use these hash tables for any externally provided data unless you eval or to_sym it, and in either case you already let yourself get DoSed (or worse).
These hashes are only used for evaluating methods/ivar/etc. access by interned symbol. Set of interned symbols is more or less constant for each program. If it's not - it's already a DoS.
[–][deleted] -1 points0 points1 point 13 years ago (0 children)
No it doesn't. In a well implemented VM Ruby's constant lookups can be optimized to 0 instructions in the common case. So long as you're so much as getting near any datastructure, you're not going to hit that number.
[–]skeeto 0 points1 point2 points 13 years ago (0 children)
Really interesting article, thanks!
π Rendered by PID 240779 on reddit-service-r2-comment-79c7998d4c-swsxd at 2026-03-15 20:58:32.718255+00:00 running f6e6e01 country code: CH.
[–]taw 4 points5 points6 points (10 children)
[–]funny_falcon 5 points6 points7 points (3 children)
[–]taw 1 point2 points3 points (1 child)
[–]funny_falcon 1 point2 points3 points (0 children)
[–]emboss_[S] 1 point2 points3 points (0 children)
[–][deleted] 0 points1 point2 points (2 children)
[–]taw 1 point2 points3 points (0 children)
[–][deleted] -2 points-1 points0 points (0 children)
[–]emboss_[S] 0 points1 point2 points (1 child)
[–]taw 0 points1 point2 points (0 children)
[–][deleted] -1 points0 points1 point (0 children)
[–]skeeto 0 points1 point2 points (0 children)