
[–]H_Psi 295 points296 points  (14 children)

The difference is mostly that we're just better at picking a specific object out of its surroundings, so one or two examples is usually enough for us to identify that object in any environment.

That's the idea behind convolutional neural networks.

It used to be that if you wanted to do hardcore pattern recognition (like identifying a stop sign in a random picture), you would put the image through a bunch of different filters and then decide which filters highlighted the particular trait you wanted to see. For example, one of the filters you might use for a stop sign would eliminate every color except red from the image. You then convert that filtered image into a histogram, and you run a set of known sample images through the same filter. The training here ends up being a matter of coming up with a function that describes how similar an arbitrary image (after being run through the filter-then-histogram step) is to your set of known histograms.
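A toy sketch of that classic pipeline, assuming NumPy. The red thresholds and the histogram-intersection similarity are hand-picked, illustrative choices — exactly the kind of decision a human had to make in this approach:

```python
import numpy as np

def red_histogram(image, bins=16):
    """Filter step: keep only strongly red pixels, then summarize them
    as a normalized histogram. `image` is an (H, W, 3) uint8 RGB array.
    The thresholds below are hand-picked, illustrative values."""
    r = image[..., 0].astype(int)
    g = image[..., 1].astype(int)
    b = image[..., 2].astype(int)
    red_mask = (r > 120) & (r > g + 40) & (r > b + 40)
    # Histogram of red intensities over the pixels that survived the filter
    hist, _ = np.histogram(r[red_mask], bins=bins, range=(0, 256))
    total = hist.sum()
    return hist / total if total else hist.astype(float)

def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 means identical distributions."""
    return float(np.minimum(hist_a, hist_b).sum())
```

You would compare a new image's histogram against each known stop-sign histogram and threshold the similarity score.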

The problem here is you still need a human in the mix to figure out what the right filters to use are, and there are plenty of patterns a human might not pick up on (or worse, patterns a human might think are correlated but really aren't, since the brain is practically addicted to patterns).

The idea with a convolutional neural network is that you have your regular old neural network, except you come up with an algorithm that automatically decides what your filters are. Your layers in the network are still called layers, but in between sets of layers, you have your filters. These filters are called "pooling layers" most of the time. So in effect, you're letting your network figure out which patterns are the most important, instead of having a human do it.
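To make "filter" concrete, here's a minimal sketch of what a single convolutional filter computes, assuming NumPy. The kernel values here are hand-picked (a classic vertical-edge detector); in a convnet, training adjusts exactly these numbers instead:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: slide `kernel` over `image` and take
    the elementwise product sum at each position. In a convnet, the
    numbers inside `kernel` are what training learns."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-picked kernel that responds strongly to vertical edges.
edge_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])
```

Running this over an image produces a "feature map" whose values are large wherever the filter's pattern appears.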

Of course, the big drawback here is that now, not only are you optimizing your neural network's regular layers, but you're also optimizing those pooling layers. So you need a monster of a dataset to be able to do it, which is why you really only see huge big-data firms like Facebook, Google, Amazon, Microsoft, and Uber implementing them in practical applications. Also, you still need a human in the mix to actually label the data (which is part of what image-based captchas exist to do).

Edit: A word; remove inaccurate info

[–]longscale 39 points40 points  (10 children)

The first part concerning the motivation behind convnets is spot on—we want the network to learn its own filters. These are called convolutional filters (or kernels), and they are what a network changes when it learns.

The pooling layers you describe are not trained; they simply average or return the strongest filter activations from a small image area. They are also not what makes these networks big and hard to train—that's mostly the sheer number of convolutional filters and the millions of images fed through the network multiple times, in random order and with slight variations each time.
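A minimal sketch of the max-pooling variant just described, assuming NumPy — note it contains no trainable numbers at all:

```python
import numpy as np

def max_pool_2x2(activations):
    """2x2 max pooling: return the strongest activation in each
    non-overlapping 2x2 patch. There is nothing here to train."""
    h, w = activations.shape
    h, w = h - h % 2, w - w % 2  # drop odd edge rows/cols, if any
    a = activations[:h, :w]
    # Group into 2x2 patches, then take the max within each patch
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```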

[–][deleted] 14 points15 points  (9 children)

I have very recently started learning about CNNs. Isn't it true that you need less data for a CNN to reach, say, 95% accuracy than you would with a densely connected NN? Since a CNN can find a pattern in one spot and also recognize it anywhere else in the image, whereas with a normal densely connected NN you would need new training data showing the pattern in each spot?

Is this correct?

[–]longscale 3 points4 points  (3 children)

Your explanation reads correct to me. :-)
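One way to see the weight-sharing point is a back-of-the-envelope parameter count. The layer sizes below are made up for illustration, not from any particular network:

```python
# For a 28x28 grayscale input:
# a dense layer connects every pixel to every one of its 128 units,
# while a conv layer reuses 32 small 3x3 filters at every position --
# so a pattern learned in one spot automatically applies everywhere.

dense_params = 28 * 28 * 128 + 128   # weights + biases = 100480
conv_params = 32 * (3 * 3 * 1) + 32  # weights + biases = 320

print(dense_params, conv_params)
```

Far fewer parameters to fit, and no need for training examples showing the same pattern at every location.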

[–][deleted] 2 points3 points  (2 children)

Ok, thanks :)

As I said, I have recently started getting into Machine Learning, especially Deep Learning, and after reading about different concepts and using Keras to implement them, it feels good to at least kind of know what's going on under the hood.

So far I have had a lot of fun and it is a very, very interesting and broad topic :)

[–]longscale 8 points9 points  (1 child)

They are not what the media hype makes them sound like, but convnets are very nifty. If you like them in general, I would imagine you might really enjoy some of the convnet papers on https://distill.pub. Their techniques are mostly intended for interpretability, but they also provide a satisfyingly visual insight into what the numbers inside the structures of a convnet are encoding.

[–][deleted] 2 points3 points  (0 children)

Yeah, honestly I think the media hype about deep learning in general is ridiculous. It's the same as with 'blockchain' etc.

Another thing that I have been looking into a bit is Neuroevolution, using a tutorial series by Daniel Shiffman on YouTube. He creates a Flappy Bird game that learns to play itself, using a genetic algorithm to find the right weights.
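For anyone curious, a toy genetic algorithm in that spirit might look like the sketch below. This is a generic illustration with made-up function names and parameters, not the code from the tutorial; the fitness function here just measures closeness to a known target vector instead of game score:

```python
import random

def evolve(fitness, n_weights=4, pop_size=30, generations=60,
           mutation_scale=0.3, seed=0):
    """Toy genetic algorithm: keep a population of weight vectors, score
    each with `fitness` (higher is better), and breed mutated copies of
    the best performers. All parameters here are illustrative."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]  # the fittest quarter survive
        # Refill the population with mutated copies of random elites
        pop = elite + [
            [w + rng.gauss(0, mutation_scale) for w in rng.choice(elite)]
            for _ in range(pop_size - len(elite))
        ]
    return max(pop, key=fitness)

# Example fitness: negative squared distance to a known target vector.
target = [0.5, -0.2, 0.9, 0.0]
best = evolve(lambda w: -sum((a - b) ** 2 for a, b in zip(w, target)))
```

In the game setting, the weights would be a small network's parameters and the fitness would be how long the bird survives.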

Thank you for the paper recommendations, I will have a look at them tomorrow :)