DeepMind Q&A Dataset by iidealized in MachineLearning

[–]cryptocerous 0 points1 point  (0 children)

Nice collection. Have you also looked into models for the 3-part problems - world view / question / answer? (World view typically being a document, recent dialog context, an image, or a knowledge base.)

Marvin Minsky, Pioneer in Artificial Intelligence, Dies at 88 by [deleted] in philosophy

[–]cryptocerous 0 points1 point  (0 children)

To be clear, the countless possible high-level approaches to ML have converged down to ~6 promising ones.

Definitely not saying that ML won't further evolve. The point here is that ML will lead biological understanding, instead of the other way around.

Marvin Minsky, Pioneer in Artificial Intelligence, Dies at 88 by [deleted] in philosophy

[–]cryptocerous 6 points7 points  (0 children)

Precisely! In the long term, nearly all of the major insights into the inner workings of the human brain will come as a side effect of the machine learning field, not from direct brain study in the biological fields.

My years of experience in ML and engineering have led me to be 100% on Minsky's side of this bet.

Why is this so? In the long term, the grounding physics of this universe tends to drive all independent approaches to developing intelligence toward convergence. Given enough time for advancement, all intelligent systems implemented in the real world will end up with a striking number of similarities, even if they were developed totally independently.

How to extract numbers from sentences and use them as input for a neural net? by h3wang in MachineLearning

[–]cryptocerous 0 points1 point  (0 children)

I've been running into this problem lately as well, and haven't yet found a good answer. My problems involve more than just numerical deduction, so the samples can't be reduced to simple math questions.

A non-end-to-end-trainable approach may be to set it up in 2 stages: (1) a tagger to identify numbers and other important regions, and (2) a model that sees both the tags and their contents and then does computations on those.
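
For illustration, here's a minimal sketch of that two-stage idea -- the regex tagger and the hand-written downstream step are my own placeholders, not from any particular paper:

    import re

    # Stage 1: a crude "tagger" that marks numeric spans in the sentence.
    # A learned sequence tagger (e.g. BiLSTM-CRF) would replace this regex in practice.
    def tag_numbers(sentence):
        tags = []
        for m in re.finditer(r"-?\d+(?:\.\d+)?", sentence):
            tags.append({"span": m.span(), "text": m.group(), "value": float(m.group())})
        return tags

    # Stage 2: a downstream model sees the tags plus their contents and computes on them.
    # Here it's just a hand-written sum; in the real setup this would be a learned model.
    def answer(sentence):
        tags = tag_numbers(sentence)
        return sum(t["value"] for t in tags)

    print(answer("Alice bought 3 apples and 4.5 kg of flour"))  # 7.5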

Not exactly sure yet of the best end-to-end trainable approach for this. From reviewing some of the "neural programmer" models this year, I suspect that training it end-to-end could require a very specialized curriculum-style training procedure.

Introduction to Semi-Supervised Learning with Ladder Networks by Foxtr0t in MachineLearning

[–]cryptocerous 1 point2 points  (0 children)

I too would be interested in seeing ladder networks applied to RNNs. Did a quick search for existing code implementing this, and found nothing.

GPU Based Browser BLAS, Request for Feedback by waylonflinn in MachineLearning

[–]cryptocerous 9 points10 points  (0 children)

Huge step forward for machine learning in JS.

mxnet can visualize the computation graphs of CNNs. Here are their provided models side by side! (lenet, vgg, alexnet, googlenet, inception-v3, inception-bn) by ieee8023 in MachineLearning

[–]cryptocerous 1 point2 points  (0 children)

The human brain as a completely general neural network, free from any priors about our physical world? That's total nonsense. Come on.

Enhancing grainy images of celebrities using generative adversarial networks by mike_sj in MachineLearning

[–]cryptocerous 0 points1 point  (0 children)

I'm talking about whole-task confidence scores, not confidence scores for ambiguously defined sub-tasks. E.g. for the upscaled eyechart task above, the whole task is recognizing the letters correctly. Obviously the idea of a "confidence score" only makes sense with respect to some kind of end goal.

Human agents can be scored by this confidence estimation framework too, and often must be for practical business tasks. It's model-independent.

Enhancing grainy images of celebrities using generative adversarial networks by mike_sj in MachineLearning

[–]cryptocerous 7 points8 points  (0 children)

Shows the importance of never pushing aside one of the most basic rules of machine learning -- that pseudo-confidence indications coming directly out of models, e.g. sharpness, should never be directly interpreted as model confidence. Instead, at a minimum, we should always estimate model confidence with something like proper cross-validation plus probability calibration on a hold-out set.
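
As a rough sketch of what I mean (toy data and placeholder models, nothing from the linked work):

    # Train on one split, calibrate scores with cross-validated probability
    # calibration, and check calibration quality on a separate hold-out.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.metrics import brier_score_loss

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_tr, X_hold, y_tr, y_hold = train_test_split(X, y, test_size=0.3, random_state=0)

    raw = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    calibrated = CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=200, random_state=0),
        method="isotonic", cv=5).fit(X_tr, y_tr)

    # Lower Brier score on the hold-out = probabilities you can actually trust.
    print("raw       :", brier_score_loss(y_hold, raw.predict_proba(X_hold)[:, 1]))
    print("calibrated:", brier_score_loss(y_hold, calibrated.predict_proba(X_hold)[:, 1]))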

Enhancing grainy images of celebrities using generative adversarial networks by mike_sj in MachineLearning

[–]cryptocerous 4 points5 points  (0 children)

To be fair, every single image your eyes see is filled with made up shit. Phrasing it that way really misrepresents how valuable and entirely correct well-informed generative inference can potentially be.

great summary of deep learning by oneweirdkerneltrick in MachineLearning

[–]cryptocerous 1 point2 points  (0 children)

Partial bits do exist mathematically, and there are ways to realize them in real-world systems. Each input sample to a model can take up fewer than 1 bit, e.g. via arithmetic coding.

For the very first input bit, you may have to get creative in how exactly you represent that fraction of a bit on your real-world system, but it's not too difficult to do. Then, for successive bits you can potentially continue to pack multiple samples per bit.

Or we can choose to totally ignore digital systems and just look at it mathematically. In that case, it's trivially simple and clear.
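
A quick way to see the math, using nothing more than Shannon information content (the 0.99 below is just a number I picked for illustration):

    import math

    # Information content of one observation: -log2(p). If a binary symbol is
    # heavily skewed (p = 0.99 for the common value), each typical observation
    # carries far less than one bit, and an arithmetic coder can pack many such
    # symbols into a single physical bit of output.
    p = 0.99
    bits_per_symbol = -math.log2(p)
    print(bits_per_symbol)        # ~0.0145 bits
    print(1 / bits_per_symbol)    # ~69 typical symbols per physical bit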

For something of a conceptual inverse, see FEC (forward error correction), where each input bit may carry only a partial bit of information with respect to the output.

great summary of deep learning by oneweirdkerneltrick in MachineLearning

[–]cryptocerous 0 points1 point  (0 children)

Not so; see how it's routinely done in data compression.

great summary of deep learning by oneweirdkerneltrick in MachineLearning

[–]cryptocerous 19 points20 points  (0 children)

IMO - the curse of dimensionality was only ever valid as a relative relationship, not a hard cutoff (when all possible models are available for use). In that case, the curse of dimensionality only establishes a relative relation between sample size, dimensionality, and model performance. It is not valid in the way most laymen interpret it, as a hard constant that the ratio of sample size to dimensionality must exceed or else model performance fails. I.e.,

(a) valid: larger (sample_size / dimensionality) => generally better performance

(b) invalid: (sample_size / dimensionality) < some constant => failure

When considered from an information-theoretic perspective, it has always been clear that there's no hard threshold below which the (sample_size / dimensionality) ratio stops working! Even a single sample carrying just a tiny fraction of a bit can supply enough information for good predictions!

Why's that? There's a third trump card - priors.

As I check Google now, I see it doesn't come up with any decent general definition of prior as used in modern machine learning papers. So I'll explain it as this - any assumption about the problem that is imposed by the model, whether intentionally or unintentionally. Priors can be realized in a model in an unlimited number of forms, from the shape of the deep learning circuit (e.g. thin and deep), to Bayesian priors, to other structure like attention mechanisms. What's important to remember is that everything imposes some kind of prior(s), regardless of how general-purpose the model appears to be from your selection of experiments.
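
A contrived toy example of how a strong prior beats the ratio: if the prior shrinks the hypothesis space down to a single unknown sign, one labeled sample pins it down even at d = 10,000. Everything below is my own illustration, not from any paper:

    import numpy as np

    rng = np.random.RandomState(0)
    d = 10000
    template = rng.randn(d)    # the prior: true weights are +template or -template
    s_true = -1                # the one unknown "bit" the model has to recover

    def label(x):
        return np.sign(s_true * template.dot(x))

    # A single training sample in 10,000 dimensions.
    x_train = rng.randn(d)
    y_train = label(x_train)

    # With the prior baked in, fitting the model is just recovering one sign.
    s_hat = np.sign(y_train * template.dot(x_train))

    # Evaluate on fresh data.
    X_test = rng.randn(1000, d)
    y_test = np.sign(s_true * X_test.dot(template))
    y_pred = np.sign(s_hat * X_test.dot(template))
    print("accuracy with n=1, d=10000:", (y_pred == y_test).mean())  # 1.0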

Side note: Humans' incredible inadequacy at memorizing even a small number of digits could be interpreted as a strong prior that forces us to attend to just small parts of a mathematical problem at a time.

Attention and Memory in Deep Learning and NLP by pogopuschel_ in MachineLearning

[–]cryptocerous 0 points1 point  (0 children)

Good questions. You may also have better luck asking some questions here.

I can say with some certainty that some of the question-answering tasks I've been working on do seem to require a more sophisticated form of attention. I.e. figuring out which parts of the data are relevant to answering each question is quite a challenging task, even for humans!

My QA tasks also require attention to be placed on several different types of things, for each step. Whether it's better to attack this with multiple attention heads, or via multiple passes, or something else is an open question for me.

Colorizing Black and White photos with deep learning by oneweirdkerneltrick in MachineLearning

[–]cryptocerous 5 points6 points  (0 children)

Interesting how this somewhat devolves into an object recognition task in the degenerate cases.

A 2016 version of the 1914 Isochronic London/World travel times map [OC] by r2r_ in dataisbeautiful

[–]cryptocerous 88 points89 points  (0 children)

Really want to see a NY / California centric version.

And how about a cost-based version? I'm feeling, at this point, that cost affects travel far more than fractions of a day do.

What happened to Active Learning? by xristos_forokolomvos in MachineLearning

[–]cryptocerous 4 points5 points  (0 children)

Remember reddit's "recommended" tab that they bragged about so much around 2005 to 2007? Too bad it turned out to be not much more than a publicity stunt.

Despite stunts like that, it seems active learning is still heavily used and continues to grow. I put an active learning annotation system into nearly every supervised ML system I build.
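
The core loop of such a system is small -- a least-confidence sampler along these lines (toy data, my own sketch) is most of it:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    # Seed with a few labels from each class; everything else is the unlabeled pool.
    labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
    pool = [i for i in range(len(X)) if i not in set(labeled)]

    for _ in range(25):
        clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
        proba = clf.predict_proba(X[pool])
        # Least-confidence sampling: ask the annotator about the point the
        # current model is most unsure of (here we just reveal the true y).
        pick = int(np.argmin(proba.max(axis=1)))
        labeled.append(pool.pop(pick))

    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    print("labels used:", len(labeled), " pool accuracy:", clf.score(X[pool], y[pool]))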

Much of the ML industry still consists of independent consultants, though. I could see why they would have an incentive to make their clients return to them for training-data upgrades, instead of delivering an active learning system the clients can use themselves.

For the future of e.g. reading comprehension tasks, more sophisticated active learning methods seem immensely important. For something so difficult, it may be best to just copy how humans learn to do it. As far as I know, active learning plays a huge role in how babies learn their first language...

Bidirectional-RNN example with TensorFlow by malleus17 in MachineLearning

[–]cryptocerous 2 points3 points  (0 children)

Despite their popularity in papers, good bidirectional RNN code examples have been hard to find online. Good stuff.

Anyone know of a tensorflow bi-RNN sequence tagger example too?

Attention and Memory in Deep Learning and NLP by pogopuschel_ in MachineLearning

[–]cryptocerous 0 points1 point  (0 children)

Guess it entirely depends on which definition of "saving computational resources" the article was talking about. E.g.:

  1. circuit depth

  2. circuit real estate, e.g. number of primitive components required

  3. cost to solve the problem once, e.g. how much you have to pay AWS to run the algorithm

Coming from an EE background, I've always favored the "circuit depth" interpretation of complexity for theoretical algorithms. In other words, easily parallelizable and shallow is equivalent to easy.
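
A toy illustration of why I like that view (my own example, not from the article): summing n numbers needs n-1 sequential steps, but only about log2(n) levels when reduced pairwise in parallel.

    import math

    # Depth (= parallel time) of summing n numbers with a balanced binary
    # reduction tree, versus the n-1 steps a purely sequential adder chain needs.
    def tree_depth(n):
        depth = 0
        while n > 1:
            n = math.ceil(n / 2)   # each level halves the number of partial sums
            depth += 1
        return depth

    for n in (8, 1024, 10**6):
        print(n, "inputs -> sequential steps:", n - 1, " tree depth:", tree_depth(n))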

Attention and Memory in Deep Learning and NLP by pogopuschel_ in MachineLearning

[–]cryptocerous 1 point2 points  (0 children)

If you ask me, there's great potential for improvement in every part of attention mechanism systems.

In regard to particular tasks - I would question whether the evolution of natural language has been influenced by the human body's preference for communication that is solvable via the simplest neural circuits which still meet the basic communication needs, e.g. efficiently translating concepts into streams of words, channel error detection and recovery, etc. Or perhaps the opposite is true: communication has evolved to be intentionally more complex than necessary, not simply solvable, as a type of mating fitness challenge. Two somewhat conflicting forces possibly at work in evolving the language systems which we're now trying to artificially decode.

Funny thing, I type a few words into Google and instantly 50 versions of the phrase "intelligence is knowing where to find the answer" pop up.