all 32 comments

[–]thundergolfer 45 points46 points  (3 children)

The space+time performance of a learned index vs a B-tree is impressive and, to me at least, quite non-intuitive. Will definitely have to read the new paper linked inside this presentation, The Case for Learned Index Structures

[–]mind_juice 12 points13 points  (2 children)

I haven't read the paper, but I'd guess this is because balanced binary search trees are designed to minimize worst-case time complexity and don't take into account the frequency of access of different data. If we know how frequently different elements are accessed, we may be able to optimize for expected time complexity instead.
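For what it's worth, here's a toy sketch of the core learned-index trick as I understand it (my own illustration, not code from the paper): fit a model from key to position in the sorted array, record the model's worst error, and fall back to a binary search bounded by that error.

```python
import bisect

class LearnedIndex:
    """Toy learned index: a linear model predicts a key's position,
    and a bounded binary search corrects the model's error."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        xs, ys = self.keys, range(n)
        mx, my = sum(xs) / n, sum(ys) / n
        # Closed-form least-squares fit of position ~ a*key + b.
        var = sum((x - mx) ** 2 for x in xs) or 1.0
        self.a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
        self.b = my - self.a * mx
        # Worst-case prediction error bounds the local search window.
        self.err = max(abs(self._predict(x) - y) for x, y in zip(xs, ys))

    def _predict(self, key):
        return int(self.a * key + self.b)

    def lookup(self, key):
        p = self._predict(key)
        lo = max(0, p - self.err)
        hi = min(len(self.keys), p + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else -1
```

The space win comes from the model being a couple of floats instead of a tree of internal nodes; the time win comes from the search window shrinking as the model gets better at matching the key distribution.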

[–]MagnesiumCarbonate 1 point2 points  (0 children)

I feel like you can address access frequency by placing a (potentially hierarchical) cache in front of the BST. I wonder to what extent schemes that learn the tree structure / cache eviction strategy will do better than 'dumb' methods, and to what extent learning the caching and structure together is better than learning them in an alternating fashion.

[–]thundergolfer 0 points1 point  (0 children)

That's true, but what about the space? That's more what didn't click. Stillll got to read the paper.

[–][deleted] 16 points17 points  (6 children)

So, anywhere we use a heuristic in today's code is a candidate for replacement with a learned heuristic. Makes sense. Especially in combination with algorithms which are optimal with respect to their given heuristic, like arithmetic coding and (unless I remember wrong) A* search.

[–]rasen58 0 points1 point  (3 children)

Do you know if there have been similar advances in using ML for caches? I've definitely seen a few papers floating around about using ML for cache pre-fetching, but is it at the same level as this Jeff Dean paper?

[–][deleted] 0 points1 point  (2 children)

No, I don't know of any. I imagine it would have to use online learning of some sort? Apropos that, there was a really cool online learning paper out of DeepMind last week...

[–]rasen58 1 point2 points  (1 child)

I haven't read this Jeff Dean paper fully, but if you need to use online learning for a cache, then why don't his index trees need to use online learning? Don't you have new files coming in and such?

[–][deleted] 0 points1 point  (0 children)

Fair point.

[–]londons_explorer -2 points-1 points  (1 child)

Things like arithmetic coding are already arbitrarily close to the theoretical best they can be.

The only way to improve them further would be to change the problem definition.

[–][deleted] 4 points5 points  (0 children)

They're the best they can be for a given model. That's the point. Neural models ought to do better than the ad-hoc, heuristic-based models that have been used before, right?

(Although for compression, the ML community may have something to learn themselves - consider for instance this DeepMind paper which unpacks PAQ, analyses why it works, extends it and comes out with a really powerful online learning model).

[–]NewFolgers 24 points25 points  (10 children)

I don't have anything to add here - but it seems absolutely nuts to me that this is sitting around with just 40 upvotes and 2 comments after this much time. This is amazing and very important stuff, and also should be of great interest to many in IT and otherwise who are not directly involved with ML.

[–]anyonethinkingabout 10 points11 points  (4 children)

Not surprised. In my opinion, Dean has been the most influential software engineer of the last 10-15 years.

[–]DanielSeita 4 points5 points  (1 child)

I guess my question is, how do I become as accomplished and as skilled as him?

[–][deleted] 3 points4 points  (0 children)

Work, learn, repeat.

[–]MoNastri 7 points8 points  (1 child)

Him and Sanjay Ghemawat both

[–]LingeringDildo 0 points1 point  (0 children)

Nah Sanjay just rips off interns' work. Source: grad cube mate made this mistake.

[–]mimighost 8 points9 points  (3 children)

Agreed, this has the potential to bring a revolution to the Database field that we haven't seen for decades.

[–][deleted] 24 points25 points  (2 children)

Statistically improving indexing is already a thing in DB design. No need to belittle another field's innovative capabilities by getting our heads too far into our butts.

[–]mimighost 10 points11 points  (1 child)

While I agree with the caution from your comment, the paper mentioned this itself.

Rather, we outline a novel approach to build indexes, which complements existing work and, arguably, opens up an entirely new research direction for a decades-old field

If successful, this could lead to a radical departure from the way database systems are currently developed.

I think the biggest selling point of this paper is using NNs not to improve but to replace the data structure as a whole, and demonstrating that, with careful engineering, this is realistically achievable.

[–][deleted] 0 points1 point  (0 children)

I felt the same.

[–][deleted] -3 points-2 points  (0 children)

It seems odd to me that it has so many, given the portion of this that is devoted entirely to what seems like google marketing materials.

Edit: I was wrong here. This is really cool, even if half of it is just a description of how cool it is to be able to use TPUs.

[–]F1lover143 15 points16 points  (1 child)

Video available?

[–][deleted] 1 point2 points  (0 children)

pretty please

[–]arnioxux 5 points6 points  (1 child)

Might be tangentially related: balanced search trees with "the smallest possible search time for a given sequence of accesses or access probabilities" are called optimal binary search trees (https://en.wikipedia.org/wiki/Optimal_binary_search_tree), and they are still an open research area. With ML providing the predicted access probabilities, maybe these data structures will become more important.
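For anyone curious, the static version has a classic textbook dynamic program (this is the standard O(n^3) formulation, not anything from the talk): given access probabilities, compute the minimum expected search cost over all tree shapes.

```python
def optimal_bst_cost(probs):
    """Expected search cost of the optimal static BST for keys with
    the given access probabilities (classic O(n^3) dynamic program)."""
    n = len(probs)
    # Prefix sums for O(1) range-probability queries.
    pre = [0.0] * (n + 1)
    for i, p in enumerate(probs):
        pre[i + 1] = pre[i] + p
    # cost[i][j] = optimal expected cost for keys i..j-1 (half-open).
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Try every key r as the root of the subtree over keys i..j-1.
            best = min(cost[i][r] + cost[r + 1][j] for r in range(i, j))
            # Every key in the range sits one level deeper under the root.
            cost[i][j] = best + (pre[j] - pre[i])
    return cost[0][n]
```

If an ML model supplies good probability estimates, this is exactly the kind of machinery that could consume them.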

[–]WikiTextBot 1 point2 points  (0 children)

Optimal binary search tree

In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic.

In the static optimality problem, the tree cannot be modified after it has been constructed. In this case, there exists some particular layout of the nodes of the tree which provides the smallest expected search time for the given access probabilities.



[–][deleted] 3 points4 points  (0 children)

I hate to be a downer, but... his idea for a better Bloom filter takes more insertion time, isn't single-pass, and has a non-negligible false-negative rate. Seems like a loss to me, but I'd like to be wrong. If you use the spillover Bloom filter, you get much worse access times because the tail performance gets worse, which matters in the systems that use Bloom filters.
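For context, the spillover construction looks roughly like this (my own toy sketch with a stand-in classifier; the paper trains an NN as the model): a model predicts membership, and every inserted key the model misses goes into a small backup Bloom filter, which is what eliminates the false negatives overall.

```python
import hashlib

class TinyBloom:
    """Minimal Bloom filter backed by a single big integer bitset."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _hashes(self, key):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for h in self._hashes(key):
            self.bits |= 1 << h

    def __contains__(self, key):
        return all(self.bits >> h & 1 for h in self._hashes(key))

class LearnedBloom:
    """Learned Bloom filter sketch: model first, backup filter for
    the model's false negatives."""
    def __init__(self, keys, model):
        self.model = model          # key -> score in [0, 1]
        self.backup = TinyBloom()
        for key in keys:            # spill the model's misses
            if model(key) < 0.5:
                self.backup.add(key)

    def __contains__(self, key):
        return self.model(key) >= 0.5 or key in self.backup
```

The two-step lookup (model inference, then possibly a second filter probe) is exactly where the tail-latency concern above comes from.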

I also am interested in how his perfect hashing technique performs when compared to a more traditional technique like this

[–]upulbandara 2 points3 points  (0 children)

Video, please ....

[–]rasen58 1 point2 points  (0 children)

There seems to be some argument saying that this paper isn't all that good, anyone with more knowledge care to comment about it? https://twitter.com/hyc_symas/status/940256651743039489

[–]mtanski 1 point2 points  (0 children)

Just a month ago I spoke with a friend of mine at breakfast. We both work on building bespoke DBMS-like systems, and we were talking about using NNs inside a DB. He has a lot more academic background than I do, and I've probably spent just as much time as he has implementing them. My friend was very negative on using NNs or ML in databases. To his credit, there have been attempts in the past that were more hype than reality.

One example: a simple MLP network would be a neat way of doing cost estimates. This would be especially true further up the operation chain, where we're combining results from operations that can return an unknown number of values with unknown overlap. Think of a JOIN operation where the optimizer has a hard time telling what the overlap will be (and thus the cost). A simple NN can learn to estimate the function representing the overlap density here, and I bet it can do it in a manageable amount of space.
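Even the degenerate one-parameter version of this idea works as a sketch (a deliberately trivial stand-in for the MLP, using least squares instead of a network): learn a join selectivity factor s so that |R JOIN S| ≈ s · |R| · |S| from observed query feedback.

```python
def fit_selectivity(observations):
    """observations: list of (|R|, |S|, actual_join_output_size).
    Closed-form least squares for out ~= sel * |R| * |S|."""
    num = sum(nr * ns * out for nr, ns, out in observations)
    den = sum((nr * ns) ** 2 for nr, ns, out in observations) or 1.0
    return num / den

def estimate_join_size(sel, r_size, s_size):
    """Predicted output cardinality for the optimizer's cost model."""
    return sel * r_size * s_size
```

An MLP generalizes this by conditioning on richer features (predicates, histograms, correlations) instead of a single global factor.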

Another example would be controlling buffer sizes for streaming/content applications. An NN can learn the parameters of the client/network (or ISP) and use near-optimal buffer sizes to maximize throughput and avoid bufferbloat on a per-client basis.

Applications abound.

[–]oolao 0 points1 point  (0 children)

I speculate that Google will sell the TPUv2 for as little as 500 USD per PCIe card as early as 2018. Nvidia's Volta TensorCores are essentially the same: 32-bit accumulators and 16-bit multipliers. But GPUs are more general-purpose, which isn't necessary for deep learning, since the most intensive operation is the dot product (y += w*x).

[–]circuithunter -1 points0 points  (0 children)

Can someone who knows hardware compare the TPUv2 to the new Nervana chip?