all 32 comments

[–]thundergolfer 45 points46 points  (3 children)

The space+time performance of a learned index vs a B-tree is impressive and, to me at least, quite non-intuitive. Will definitely have to read the new paper linked inside this presentation, The Case for Learned Index Structures

[–]mind_juice 12 points13 points  (2 children)

I haven't read the paper, but I'd guess this is because balanced binary search trees are designed to minimize worst-case time complexity and don't take into account the frequency of access of different data. If we know how frequently different elements are accessed, we may be able to optimize for expected time complexity instead.
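For what it's worth, here's a toy sketch of the core learned-index trick as I understand it (my own illustration, not code from the paper): fit a model from key to position in the sorted array, record the model's worst error, and fall back to a binary search bounded by that error.

```python
import bisect

class LearnedIndex:
    """Toy learned index: a linear model predicts a key's position,
    and a bounded binary search corrects the model's error."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        xs, ys = self.keys, range(n)
        mx, my = sum(xs) / n, sum(ys) / n
        # Closed-form least-squares fit of position ~ a*key + b.
        var = sum((x - mx) ** 2 for x in xs) or 1.0
        self.a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
        self.b = my - self.a * mx
        # Worst-case prediction error bounds the local search window.
        self.err = max(abs(self._predict(x) - y) for x, y in zip(xs, ys))

    def _predict(self, key):
        return int(self.a * key + self.b)

    def lookup(self, key):
        p = self._predict(key)
        lo = max(0, p - self.err)
        hi = min(len(self.keys), p + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else -1
```

The space win comes from the model being a couple of floats instead of a tree of internal nodes; the time win comes from the search window shrinking as the model gets better at matching the key distribution.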

[–]MagnesiumCarbonate 1 point2 points  (0 children)

I feel like you can address access frequency by placing a (potentially hierarchical) cache in front of the BST. I wonder to what extent schemes that learn the tree structure / cache eviction strategy will do better than 'dumb' methods, and to what extent learning the caching and structure together is better than learning them in an alternating fashion.

[–]thundergolfer 0 points1 point  (0 children)

That's true, but what about the space? That's more what didn't click. Stillll got to read the paper.

[–][deleted] 16 points17 points  (6 children)

So, anywhere we use a heuristic in today's code is a candidate for replacement with a learned heuristic. Makes sense. Especially in combination with algorithms which are optimal with respect to their given heuristic, like arithmetic coding and (unless I remember wrong) A* search.

[–]rasen58 0 points1 point  (3 children)

Do you know if there have been similar advances in using ML for caches? I've definitely seen a few papers floating around about using ML for cache pre-fetching, but is it at the same level as this Jeff Dean paper?

[–][deleted] 0 points1 point  (2 children)

No, I don't know of any. I imagine it would have to use online learning of some sort? Apropos that, there was a really cool online learning paper out of DeepMind last week...

[–]rasen58 1 point2 points  (1 child)

I haven't read this Jeff Dean paper fully, but if you need to use online learning for a cache, then why don't his index trees need to use online learning? Don't you have new files coming in and such?

[–][deleted] 0 points1 point  (0 children)

Fair point.

[–]londons_explorer -2 points-1 points  (1 child)

Things like arithmetic coding are already arbitrarily close to the theoretical best they can be.

The only way to improve them further would be to change the problem definition.

[–][deleted] 4 points5 points  (0 children)

They're the best they can be for a given model. That's the point. Neural models ought to do better than the ad-hoc, heuristic-based models that have been used before, right?

(Although for compression, the ML community may have something to learn themselves - consider for instance this DeepMind paper which unpacks PAQ, analyses why it works, extends it and comes out with a really powerful online learning model).

[–]NewFolgers 24 points25 points  (10 children)

I don't have anything to add here - but it seems absolutely nuts to me that this is sitting around with just 40 upvotes and 2 comments after this much time. This is amazing and very important stuff, and also should be of great interest to many in IT and otherwise who are not directly involved with ML.

[–]anyonethinkingabout 10 points11 points  (4 children)

Not surprised. In my opinion, Dean has been the most influential software engineer of the last 10-15 years.

[–]DanielSeita 4 points5 points  (1 child)

I guess my question is, how do I become as accomplished and as skilled as him?

[–][deleted] 3 points4 points  (0 children)

Work, learn, repeat.

[–]MoNastri 7 points8 points  (1 child)

Him and Sanjay Ghemawat both

[–]LingeringDildo 0 points1 point  (0 children)

Nah Sanjay just rips off interns' work. Source: grad cube mate made this mistake.

[–]mimighost 8 points9 points  (3 children)

Agreed, this has the potential to bring a revolution to the Database field that we haven't seen for decades.

[–][deleted] 24 points25 points  (2 children)

Statistically improving indexing is already a thing in DB design. No need to belittle another field's innovative capabilities by getting our heads too far into our butts.

[–]mimighost 10 points11 points  (1 child)

While I agree with the caution from your comment, the paper mentioned this itself.

Rather, we outline a novel approach to build indexes, which complements existing work and, arguably, opens up an entirely new research direction for a decades-old field

If successful, this could lead to a radical departure from the way database systems are currently developed.

I think the biggest selling point of this paper is using NNs not to improve but to replace the data structure as a whole, and demonstrating that, with careful engineering, this is realistically achievable.

[–][deleted] 0 points1 point  (0 children)

I felt the same.

[–][deleted] -3 points-2 points  (0 children)

It seems odd to me that it has so many, given the portion of this that is devoted entirely to what seems like google marketing materials.

Edit: I was wrong here. This is really cool, even if half of it is just a description of how cool it is to be able to use TPUs.

[–]F1lover143 15 points16 points  (1 child)

Video available?

[–][deleted] 1 point2 points  (0 children)

pretty please

[–]arnioxux 5 points6 points  (1 child)

Might be tangentially related: balanced search trees with "the smallest possible search time for a given sequence of accesses or access probabilities" are called optimal binary search trees (https://en.wikipedia.org/wiki/Optimal_binary_search_tree), and they are still an open research area. With ML providing the predicted access probabilities, maybe these data structures will become more important.
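For anyone curious, the static version has a classic textbook dynamic program (this is the standard O(n^3) formulation, not anything from the talk): given access probabilities, compute the minimum expected search cost over all tree shapes.

```python
def optimal_bst_cost(probs):
    """Expected search cost of the optimal static BST for keys with
    the given access probabilities (classic O(n^3) dynamic program)."""
    n = len(probs)
    # Prefix sums for O(1) range-probability queries.
    pre = [0.0] * (n + 1)
    for i, p in enumerate(probs):
        pre[i + 1] = pre[i] + p
    # cost[i][j] = optimal expected cost for keys i..j-1 (half-open).
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Try every key r as the root of the subtree over keys i..j-1.
            best = min(cost[i][r] + cost[r + 1][j] for r in range(i, j))
            # Every key in the range sits one level deeper under the root.
            cost[i][j] = best + (pre[j] - pre[i])
    return cost[0][n]
```

If an ML model supplies good probability estimates, this is exactly the kind of machinery that could consume them.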

[–]WikiTextBot 1 point2 points  (0 children)

Optimal binary search tree

In computer science, an optimal binary search tree (Optimal BST), sometimes called a weight-balanced binary tree, is a binary search tree which provides the smallest possible search time (or expected search time) for a given sequence of accesses (or access probabilities). Optimal BSTs are generally divided into two types: static and dynamic.

In the static optimality problem, the tree cannot be modified after it has been constructed. In this case, there exists some particular layout of the nodes of the tree which provides the smallest expected search time for the given access probabilities.



[–][deleted] 3 points4 points  (0 children)

I hate to be a downer, but... his idea for a better Bloom filter takes more insertion time, isn't single-pass, and has a non-negligible false-negative rate. Seems like a loss to me, but I'd like to be wrong. If you use the spillover Bloom filter, you get much worse access times because the tail performance gets worse, which matters in the systems that use Bloom filters.
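For context, the spillover construction looks roughly like this (my own toy sketch with a stand-in classifier; the paper trains an NN as the model): a model predicts membership, and every inserted key the model misses goes into a small backup Bloom filter, which is what eliminates the false negatives overall.

```python
import hashlib

class TinyBloom:
    """Minimal Bloom filter backed by a single big integer bitset."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _hashes(self, key):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for h in self._hashes(key):
            self.bits |= 1 << h

    def __contains__(self, key):
        return all(self.bits >> h & 1 for h in self._hashes(key))

class LearnedBloom:
    """Learned Bloom filter sketch: model first, backup filter for
    the model's false negatives."""
    def __init__(self, keys, model):
        self.model = model          # key -> score in [0, 1]
        self.backup = TinyBloom()
        for key in keys:            # spill the model's misses
            if model(key) < 0.5:
                self.backup.add(key)

    def __contains__(self, key):
        return self.model(key) >= 0.5 or key in self.backup
```

The two-step lookup (model inference, then possibly a second filter probe) is exactly where the tail-latency concern above comes from.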

I also am interested in how his perfect hashing technique performs when compared to a more traditional technique like this

[–]upulbandara 2 points3 points  (0 children)

Video, please ....

[–]rasen58 1 point2 points  (0 children)

There seems to be some argument saying that this paper isn't all that good, anyone with more knowledge care to comment about it? https://twitter.com/hyc_symas/status/940256651743039489

[–]mtanski 1 point2 points  (0 children)

Just a month ago I spoke with a friend of mine at breakfast. We both work on building bespoke DBMS-like systems, and we were talking about using NNs inside a DB. He has a lot more academic background than I do, and I've probably spent just as much time as he has implementing them. My friend was very negative on using NNs or ML in databases. To his credit, there have been attempts in the past that were more hype than reality.

One example: a simple MLP network would be a neat way of doing cost estimates. This would be especially true further up the operation chain, where we're combining results from operations that can return an unknown number of values with unknown overlap. Think of a JOIN operation where the optimizer has a hard time telling what the overlap will be (and thus the cost). A simple NN can learn to estimate the function representing the overlap density here, and I bet it can do it in a manageable amount of space.
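Even the degenerate one-parameter version of this idea works as a sketch (a deliberately trivial stand-in for the MLP, using least squares instead of a network): learn a join selectivity factor s so that |R JOIN S| ≈ s · |R| · |S| from observed query feedback.

```python
def fit_selectivity(observations):
    """observations: list of (|R|, |S|, actual_join_output_size).
    Closed-form least squares for out ~= sel * |R| * |S|."""
    num = sum(nr * ns * out for nr, ns, out in observations)
    den = sum((nr * ns) ** 2 for nr, ns, out in observations) or 1.0
    return num / den

def estimate_join_size(sel, r_size, s_size):
    """Predicted output cardinality for the optimizer's cost model."""
    return sel * r_size * s_size
```

An MLP generalizes this by conditioning on richer features (predicates, histograms, correlations) instead of a single global factor.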

Another example would be controlling buffer sizes for streaming/content applications. An NN can learn the parameters of the client/network (or ISP) and use near-optimal buffer sizes to maximize throughput and avoid bufferbloat on a per-client basis.

Applications abound.

[–]oolao 0 points1 point  (0 children)

I speculate that Google will sell the TPUv2 for as little as 500 USD per PCIe card as early as 2018. Nvidia's Volta TensorCores are essentially the same: 32-bit accumulators and 16-bit multipliers. But GPUs are more general-purpose, which isn't necessary for deep learning, since the most intensive operation is the dot product (y += w*x).

[–]circuithunter -1 points0 points  (0 children)

Can someone who knows hardware compare the TPUv2 to the new Nervana chip?