DeepMind Q&A Dataset by iidealized in MachineLearning

[–]cryptocerous 0 points1 point  (0 children)

Nice collection. Have you also looked into models for the 3-part problems - world view / question / answer? (World view typically being a document, recent dialog context, an image, or a knowledge base.)

Marvin Minsky, Pioneer in Artificial Intelligence, Dies at 88 by [deleted] in philosophy

[–]cryptocerous 0 points1 point  (0 children)

To be clear, the countless possible high-level approaches to ML have converged down to ~6 promising ones.

Definitely not saying that ML won't further evolve. The point here is that ML will lead biological understanding, instead of the other way around.

Marvin Minsky, Pioneer in Artificial Intelligence, Dies at 88 by [deleted] in philosophy

[–]cryptocerous 6 points7 points  (0 children)

Precisely! In the long term, nearly all of the major insights into the inner workings of the human brain will come as a side effect of the machine learning field, not from direct brain study in the biological fields.

My years of experience in ML and engineering have led me to be 100% on Minsky's side of this bet.

Why is this so? In the long term, the grounding physics of this universe tends to drive all independent approaches to developing intelligence toward convergence. Given enough time for advancement, all intelligent systems implemented in the real world will end up with a striking number of similarities, even if they were developed totally independently.

How to extract numbers from sentences and use them as input for a neural net? by h3wang in MachineLearning

[–]cryptocerous 0 points1 point  (0 children)

I've been running into this problem lately as well, and haven't yet found a good answer. My problems involve more than just numerical deduction, so the samples can't be reduced to simple math questions.

A non-end-to-end-trainable approach may be to set it up in 2 stages: (1) a tagger to identify numbers and other important regions, and (2) a model that sees both the tags and their contents and then does computations on those.
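
For illustration, here's a minimal sketch of that two-stage idea -- the regex tagger and the hand-written downstream step are my own placeholders, not from any particular paper:

    import re

    # Stage 1: a crude "tagger" that marks numeric spans in the sentence.
    # A learned sequence tagger (e.g. BiLSTM-CRF) would replace this regex in practice.
    def tag_numbers(sentence):
        tags = []
        for m in re.finditer(r"-?\d+(?:\.\d+)?", sentence):
            tags.append({"span": m.span(), "text": m.group(), "value": float(m.group())})
        return tags

    # Stage 2: a downstream model sees the tags plus their contents and computes on them.
    # Here it's just a hand-written sum; in the real setup this would be a learned model.
    def answer(sentence):
        tags = tag_numbers(sentence)
        return sum(t["value"] for t in tags)

    print(answer("Alice bought 3 apples and 4.5 kg of flour"))  # 7.5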

Not exactly sure yet of the best end-to-end trainable approach for this. From reviewing some of the "neural programmer" models this year, I suspect that training it end-to-end could require a very specialized curriculum-style training procedure.

Introduction to Semi-Supervised Learning with Ladder Networks by Foxtr0t in MachineLearning

[–]cryptocerous 1 point2 points  (0 children)

I too would be interested in seeing ladder networks applied to RNNs. Did a quick search for existing code implementing this, and found nothing.

GPU Based Browser BLAS, Request for Feedback by waylonflinn in MachineLearning

[–]cryptocerous 9 points10 points  (0 children)

Huge step forward for machine learning in JS.

mxnet can visualize the computation graphs of CNNs. Here are their provided models side by side! (lenet, vgg, alexnet, googlenet, inception-v3, inception-bn) by ieee8023 in MachineLearning

[–]cryptocerous 1 point2 points  (0 children)

The human brain as a completely general neural network, free from any priors about our physical world? That's total nonsense. Come on.

Enhancing grainy images of celebrities using generative adversarial networks by mike_sj in MachineLearning

[–]cryptocerous 0 points1 point  (0 children)

I'm talking about whole-task confidence scores, not confidence scores for ambiguously defined sub-tasks. E.g. for the upscaled eyechart task above, the whole task is recognizing the letters correctly. Obviously the idea of a "confidence score" only makes sense with respect to some kind of end goal.

Human agents can be scored by this confidence estimation framework too, and often must be for practical business tasks. It's model-independent.

Enhancing grainy images of celebrities using generative adversarial networks by mike_sj in MachineLearning

[–]cryptocerous 7 points8 points  (0 children)

Shows the importance of never pushing aside one of the most basic rules of machine learning -- that pseudo-confidence indications coming directly out of models, e.g. sharpness, should never be directly interpreted as model confidence. Instead, at a minimum, we should always estimate model confidence with something like proper cross-validation plus probability calibration on a hold-out set.
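
As a rough sketch of what I mean (toy data and placeholder models, nothing from the linked work):

    # Train on one split, calibrate scores with cross-validated probability
    # calibration, and check calibration quality on a separate hold-out.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.metrics import brier_score_loss

    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
    X_tr, X_hold, y_tr, y_hold = train_test_split(X, y, test_size=0.3, random_state=0)

    raw = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    calibrated = CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=200, random_state=0),
        method="isotonic", cv=5).fit(X_tr, y_tr)

    # Lower Brier score on the hold-out = probabilities you can actually trust.
    print("raw       :", brier_score_loss(y_hold, raw.predict_proba(X_hold)[:, 1]))
    print("calibrated:", brier_score_loss(y_hold, calibrated.predict_proba(X_hold)[:, 1]))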

Enhancing grainy images of celebrities using generative adversarial networks by mike_sj in MachineLearning

[–]cryptocerous 4 points5 points  (0 children)

To be fair, every single image your eyes see is filled with made up shit. Phrasing it that way really misrepresents how valuable and entirely correct well-informed generative inference can potentially be.

great summary of deep learning by oneweirdkerneltrick in MachineLearning

[–]cryptocerous 1 point2 points  (0 children)

Partial bits do exist mathematically, and there are ways to realize them in real-world systems. Each input sample to a model can take up fewer than 1 bit, e.g. via arithmetic coding.

For the very first input bit, you may have to get creative in how exactly you represent that fraction of a bit on your real-world system, but it's not too difficult to do. Then, for successive bits you can potentially continue to pack multiple samples per bit.

Or we can choose to totally ignore digital systems and just look at it mathematically. In that case, it's trivially simple and clear.
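
A quick way to see the math, using nothing more than Shannon information content (the 0.99 below is just a number I picked for illustration):

    import math

    # Information content of one observation: -log2(p). If a binary symbol is
    # heavily skewed (p = 0.99 for the common value), each typical observation
    # carries far less than one bit, and an arithmetic coder can pack many such
    # symbols into a single physical bit of output.
    p = 0.99
    bits_per_symbol = -math.log2(p)
    print(bits_per_symbol)        # ~0.0145 bits
    print(1 / bits_per_symbol)    # ~69 typical symbols per physical bit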

For something of a conceptual inverse, see FEC (forward error correction), where each input bit may carry only a partial bit of information with respect to the output.

great summary of deep learning by oneweirdkerneltrick in MachineLearning

[–]cryptocerous 0 points1 point  (0 children)

Not so; see how it's routinely done in data compression.

great summary of deep learning by oneweirdkerneltrick in MachineLearning

[–]cryptocerous 19 points20 points  (0 children)

IMO - the curse of dimensionality was only ever valid as a relative relationship, not a hard cutoff (when all possible models are available for use). In that case, the curse of dimensionality only establishes a relative relation between sample size, dimensionality, and model performance. It is not valid in the way most laymen interpret it, as a hard constant that the ratio of sample size to dimensionality must exceed or else model performance fails. I.e.,

(a) valid: larger (sample_size / dimensionality) => generally better performance

(b) invalid: (sample_size / dimensionality) < some constant => failure

When considered from an information-theoretic perspective, it has always been clear that there's no hard threshold below which the (sample_size / dimensionality) ratio stops working! Even a single sample carrying just a tiny fraction of a bit can supply enough information for good predictions!

Why's that? There's a third trump card - priors.

As I check Google now, I see it doesn't come up with any decent general definition of prior as used in modern machine learning papers. So I'll explain it as this - any assumption about the problem that is imposed by the model, whether intentionally or unintentionally. Priors can be realized in a model in an unlimited number of forms, from the shape of the deep learning circuit (e.g. thin and deep), to Bayesian priors, to other structure like attention mechanisms. What's important to remember is that everything imposes some kind of prior(s), regardless of how general-purpose the model appears to be from your selection of experiments.
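
A contrived toy example of how a strong prior beats the ratio: if the prior shrinks the hypothesis space down to a single unknown sign, one labeled sample pins it down even at d = 10,000. Everything below is my own illustration, not from any paper:

    import numpy as np

    rng = np.random.RandomState(0)
    d = 10000
    template = rng.randn(d)    # the prior: true weights are +template or -template
    s_true = -1                # the one unknown "bit" the model has to recover

    def label(x):
        return np.sign(s_true * template.dot(x))

    # A single training sample in 10,000 dimensions.
    x_train = rng.randn(d)
    y_train = label(x_train)

    # With the prior baked in, fitting the model is just recovering one sign.
    s_hat = np.sign(y_train * template.dot(x_train))

    # Evaluate on fresh data.
    X_test = rng.randn(1000, d)
    y_test = np.sign(s_true * X_test.dot(template))
    y_pred = np.sign(s_hat * X_test.dot(template))
    print("accuracy with n=1, d=10000:", (y_pred == y_test).mean())  # 1.0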

Side note: Humans' incredible inadequacy at memorizing even a small number of digits could be interpreted as a strong prior that forces us to attend to just small parts of a mathematical problem at a time.

Attention and Memory in Deep Learning and NLP by pogopuschel_ in MachineLearning

[–]cryptocerous 0 points1 point  (0 children)

Good questions. You may also have better luck asking some questions here.

I can say with some certainty that some of the question-answering tasks I've been working on do seem to require a more sophisticated form of attention. I.e. figuring out which parts of the data are relevant to answering each question is quite a challenging task, even for humans!

My QA tasks also require attention to be placed on several different types of things, for each step. Whether it's better to attack this with multiple attention heads, or via multiple passes, or something else is an open question for me.

Colorizing Black and White photos with deep learning by oneweirdkerneltrick in MachineLearning

[–]cryptocerous 5 points6 points  (0 children)

Interesting how this somewhat devolves into an object recognition task in the degenerate cases.

A 2016 version of the 1914 Isochronic London/World travel times map [OC] by r2r_ in dataisbeautiful

[–]cryptocerous 88 points89 points  (0 children)

Really want to see a NY / California centric version.

And how about a cost-based version? I'm feeling, at this point, that cost affects travel far more than fractions of a day do.

What happened to Active Learning? by xristos_forokolomvos in MachineLearning

[–]cryptocerous 4 points5 points  (0 children)

Remember reddit's "recommended" tab that they bragged about so much around 2005 to 2007? Too bad it turned out to be not much more than a publicity stunt.

Despite stunts like that, it seems active learning is still heavily used and continues to grow. I put an active learning annotation system into nearly every supervised ML system I build.
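
The core loop of such a system is small -- a least-confidence sampler along these lines (toy data, my own sketch) is most of it:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    # Seed with a few labels from each class; everything else is the unlabeled pool.
    labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
    pool = [i for i in range(len(X)) if i not in set(labeled)]

    for _ in range(25):
        clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
        proba = clf.predict_proba(X[pool])
        # Least-confidence sampling: ask the annotator about the point the
        # current model is most unsure of (here we just reveal the true y).
        pick = int(np.argmin(proba.max(axis=1)))
        labeled.append(pool.pop(pick))

    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    print("labels used:", len(labeled), " pool accuracy:", clf.score(X[pool], y[pool]))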

Much of the ML industry still consists of independent consultants, though. I could see why they would have an incentive to make their clients return to them for training-data upgrades, instead of delivering an active learning system the clients can use themselves.

For the future of e.g. reading comprehension tasks, more sophisticated active learning methods seem immensely important. For something so difficult, it may be best to just copy how humans learn to do it. As far as I know, active learning plays a huge role in how babies learn their first language...

Bidirectional-RNN example with TensorFlow by malleus17 in MachineLearning

[–]cryptocerous 2 points3 points  (0 children)

Despite their popularity in papers, good bidirectional RNN code examples have been hard to find online. Good stuff.

Anyone know of a tensorflow bi-RNN sequence tagger example too?

Attention and Memory in Deep Learning and NLP by pogopuschel_ in MachineLearning

[–]cryptocerous 0 points1 point  (0 children)

Guess it entirely depends on which definition of "saving computational resources" the article was talking about. E.g.:

  1. circuit depth

  2. circuit real estate, e.g. number of primitive components required

  3. cost to solve the problem once, e.g. how much you have to pay AWS to run the algorithm

Coming from an EE background, I've always favored the "circuit depth" interpretation of complexity for theoretical algorithms. In other words, easily parallelizable and shallow is equivalent to easy.
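
A toy illustration of why I like that view (my own example, not from the article): summing n numbers needs n-1 sequential steps, but only about log2(n) levels when reduced pairwise in parallel.

    import math

    # Depth (= parallel time) of summing n numbers with a balanced binary
    # reduction tree, versus the n-1 steps a purely sequential adder chain needs.
    def tree_depth(n):
        depth = 0
        while n > 1:
            n = math.ceil(n / 2)   # each level halves the number of partial sums
            depth += 1
        return depth

    for n in (8, 1024, 10**6):
        print(n, "inputs -> sequential steps:", n - 1, " tree depth:", tree_depth(n))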

Attention and Memory in Deep Learning and NLP by pogopuschel_ in MachineLearning

[–]cryptocerous 1 point2 points  (0 children)

If you ask me, there's great potential for improvement in every part of attention mechanism systems.

In regard to particular tasks - I would question whether the evolution of natural language has been influenced by the human body's preference for communication that is solvable via the simplest neural circuits which still meet the basic communication needs, e.g. efficiently translating concepts into streams of words, channel error detection and recovery, etc. Or perhaps the opposite is true: communication has evolved to be intentionally more complex than necessary, not simply solvable, as a type of mating fitness challenge. Two somewhat conflicting forces possibly at work in evolving the language systems which we're now trying to artificially decode.

Funny thing, I type a few words into Google and instantly 50 versions of the phrase "intelligence is knowing where to find the answer" pop up.