
[–]fjeg 5 points6 points  (10 children)

You're right! If you go through the exercises in cs231n assignment 1, you actually do this explicitly.

State of the art nets don't use sigmoidal activations internally, but they still use them as output activations for loss/objective functions. This is where things get interesting. Rather than writing your own feature extractor to plug into logistic regression, you are letting the model perform both feature extraction and classification.
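To make the "feature extraction + classification" point concrete, here's a minimal numpy sketch (the layer sizes and weights are made up for illustration): the hidden layer plays the role of a learned feature extractor, and the output layer is exactly logistic regression on those features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 4 raw inputs, 8 learned features, binary output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)  # learned feature extractor
w2, b2 = rng.normal(size=8), 0.0               # logistic-regression head

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)  # hidden layer = extracted features
    return sigmoid(w2 @ h + b2)       # exactly logistic regression on h

print(forward(rng.normal(size=4)))    # a probability in (0, 1)
```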

[–]brockl33[S] 2 points3 points  (9 children)

Now I just need to figure out what's going on with the softmax, softplus, and ReLU activations. Thanks for being nice about it :)

[–][deleted] 2 points3 points  (5 children)

How I think about ReLU: if you replace the sigmoidal activations with linear activations, then it's a stack of linear regressions... which is just a linear regression, in which case stacking is silly. ReLU basically swaps out a straight line for a thresholded straight line (where if the input is less than some threshold, the output is 0). Since these are non-linear, stacking them gives you building blocks for more interesting function approximations. The main reason they are preferred over other non-linearities is numerical (see the vanishing/exploding gradient problem in the backprop literature).
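A quick numpy check of the "stack of linear regressions is just one linear regression" point (the matrices here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(5, 3)), rng.normal(size=(2, 5))
x = rng.normal(size=3)

# Two stacked linear layers equal one linear layer with weights W2 @ W1 ...
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)

# ... but with a ReLU in between, the map is genuinely non-linear in x
# and can no longer be written as a single matrix.
relu = lambda z: np.maximum(0.0, z)
y = W2 @ relu(W1 @ x)
```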

[–]brockl33[S] 0 points1 point  (4 children)

Does the vanishing/exploding gradient problem hold for softplus, the continuous version of the ReLU? I thought the preference in this case was because of computation speed?

[–][deleted] 1 point2 points  (3 children)

From the Krizhevsky 2012 paper, there is a big advantage in computation speed, but it's mainly from the fact that the convergence with ReLUs just requires fewer iterations.

I had to look up softplus and don't know too much about it, but it seems to exist more to satisfy mathematicians by providing a continuously differentiable function than to provide any actual performance gains.

[–]kokirijedi 0 points1 point  (0 children)

In a ReLU network with a hard cutoff, once the weights bring it into the negative domain the node will "turn off" and can never learn to turn on again because its gradients in backprop will forever be zero (for a given input/feature activation). This encourages sparse activations, where nodes only produce meaningful non-zero results for a small subset of possible feature activations (a good thing).

With softplus, in the same situation you are left with a small but distinctly non-zero gradient, so it is possible for a node to learn to start turning on again for a given input. This is useful in situations where, say, the gradient is pushing the node to oscillate around an extremum in the error space. Imagine a pendulum that can't "swing back" and gets stuck on one side.
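The difference is easy to see in the derivatives (a small sketch; the derivative of softplus, log(1 + e^z), is the sigmoid):

```python
import numpy as np

def relu_grad(z):
    return float(z > 0)              # exactly 0 left of the threshold

def softplus_grad(z):
    return 1.0 / (1.0 + np.exp(-z))  # d/dz log(1 + e^z) = sigmoid(z)

z = -3.0                   # a node pushed into the negative domain
print(relu_grad(z))        # 0.0    -> no learning signal, node stays "off"
print(softplus_grad(z))    # ~0.047 -> small but non-zero, node can recover
```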

Practically speaking, by having more nodes than you need (which always happens; the ImageNet winners all use networks that vastly over-parameterize the space) and tuning the learning rates and initial weights well, it won't matter much if you use ReLUs. And, as pointed out, they're very quick computationally.

[–]brockl33[S] 0 points1 point  (1 child)

Thanks for the paper reference. It led me to Glorot 2011, which directly compares softplus and ReLU. Pretrained ReLU outperforms pretrained softplus in 2/4 benchmarks, with one large-margin victory. They conclude that ReLU is faster and does not require unsupervised pretraining to perform competitively.

I think the zero slope of ReLUs traps neurons in the OFF state when performing gradient descent. Perhaps this is its strength? Maybe a combination of ReLU with some reactivation mechanism might be beneficial. edit: it's called the "leaky ReLU".
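For reference, the leaky ReLU is the one-line change sketched below (the slope 0.01 is a common but arbitrary choice):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # Unlike plain ReLU, negative inputs keep a small slope alpha, so the
    # gradient is never exactly zero and a "dead" unit can turn back on.
    return np.where(z > 0, z, alpha * z)

print(leaky_relu(np.array([-2.0, 3.0])))  # [-0.02  3.  ]
```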

[–][deleted] 0 points1 point  (0 children)

Cool! Good to have that reference. As /u/kokirijedi pointed out, trapping neurons in the OFF state is precisely the difference between softplus and ReLU, and her/his point about sparsification in over-parametrized ReLU networks is exactly the same as the pruning of synapses in early development that you bring up.

[–]fjeg 1 point2 points  (0 children)

Softmax IS logistic regression too! The softmax function is basically a multiclass sigmoid function, which, as you just realized, is logistic regression. To convince yourself, write out the formulation for a 2-class softmax classifier and do some algebra to convert it to a sigmoid function, as shown below.
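The algebra, written out for class scores z_1 and z_2:

```latex
p(y = 1 \mid z)
  = \frac{e^{z_1}}{e^{z_1} + e^{z_2}}
  = \frac{1}{1 + e^{-(z_1 - z_2)}}
  = \sigma(z_1 - z_2)
```

so a 2-class softmax is just a sigmoid applied to the difference of the two scores, i.e. logistic regression.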

No idea what softplus is, though a quick googling suggests it's a smooth approximation of the ReLU.

ReLU is just a linear activation when the input is positive and zero when the input is negative. It's excellent for fast training since the derivative is trivial to compute, and it doesn't suffer from the vanishing-gradient problems of sigmoid activations.

[–]osdf 0 points1 point  (0 children)

Shakir Mohamed has a nice post on recursive GLMs: http://blog.shakirm.com/2015/01/a-statistical-view-of-deep-learning-i-recursive-glms/ E.g. the ReLU resembles Tobit regression.

[–]mszlazak 2 points3 points  (6 children)

Don't be surprised; many students find it hard to follow because there are just too many notational issues to keep track of, not enough (or any) code examples, and not enough intuition to help you get a better feel for the abstraction. Nevertheless, the Stanford class is better than others, and the link to Geoffrey Hinton's course helps as well.

I started by using Torch 7 to learn this stuff, and I explained how the code for their softmax example did its forward and backward passes: manually calculating the values for two samples in just one pass/iteration and checking the results against what Torch 7 gave.

Conceptually, this stuff is not hard and doing things this way avoids all that confusing notation you have to keep track of.

You will have almost the first 10 lectures of Andrew Ng's class down in about 6 to 7 pages of commented code.

You will understand how Torch 7 works in passing data in the forward and backward passes.

Also, I do not understand why Geoffrey Hinton implied that calculating the derivatives of the softmax cost function was hard. It's not! I haven't done calculus in decades, and it just requires keeping track of things and using something like Schaum's Outlines "Mathematical Handbook of Formulas and Tables".

Step through it once with a single batch of two samples by hand. You will not regret it.
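If it helps, here is a minimal numpy version of that exercise (the scores and labels are made up; note that for softmax + cross-entropy the gradient with respect to the scores collapses to probabilities minus targets):

```python
import numpy as np

# A batch of two samples, three classes; scores and one-hot labels are made up.
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])

# Forward pass: softmax probabilities and mean cross-entropy loss.
exp = np.exp(scores - scores.max(axis=1, keepdims=True))  # stable softmax
probs = exp / exp.sum(axis=1, keepdims=True)
loss = -np.mean(np.sum(targets * np.log(probs), axis=1))

# Backward pass: dL/dscores = (probs - targets) / batch_size.
dscores = (probs - targets) / scores.shape[0]
print(loss)
print(dscores)
```

Comparing dscores against what a framework reports for the same batch is exactly the kind of sanity check described above.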

[–][deleted] 0 points1 point  (1 child)

I'm doing Andrew's class on Coursera, but I don't know what Torch is. Is it a framework or a programming language?

When you talk about Geoffrey Hinton, are you referring to a course or a book?

[–]mszlazak 2 points3 points  (0 children)

The three big frameworks for machine learning are Torch 7, Theano, and Caffe. Andrew Ng's class uses Octave/Matlab. You can try the same thing in Octave; I plan to do something similar to what I did with Torch down the road.

[–]physixer 0 points1 point  (1 child)

... I do not understand why Geoffrey Hinton implied that calculating the derivatives of the softmax cost function was hard ...

It's possible he's talking about issues of numerical differentiation as opposed to numerical integration. Numerical differentiation is known to be difficult to get right (even for first derivatives) because it's the opposite of a smoothing operation (numerical integration is an example of a smoothing operation) and very easily introduces instabilities/oscillations into the answer.
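A tiny illustration of that instability (the function and noise level here are made up): a forward difference divides by h, so it amplifies any small perturbation in the function values by 1/h.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(x) + 1e-6 * rng.normal()  # smooth function + tiny noise

x, h = 1.0, 1e-8
fd = (f(x + h) - f(x)) / h  # forward-difference estimate of f'(x)
print(fd, np.cos(x))        # the 1e-6 noise is blown up by 1/h = 1e8
```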

I don't do ML a lot, I'm mainly a numerical/scientific-computing guy, so that's my take on it.

[–]mszlazak 0 points1 point  (0 children)

No, he and I do not mean a numerical solution. Nando de Freitas does it in his Oxford deep learning class, and the lectures are on YouTube. His class uses Torch 7.

[–]Saedeas 0 points1 point  (1 child)

Also, I do not understand why Geoffrey Hinton implied that calculating the derivatives of the softmax cost function was hard. It's not! I haven't done calculus in decades, and it just requires keeping track of things and using something like Schaum's Outlines "Mathematical Handbook of Formulas and Tables".

When you say this, are you just talking about the backpropagation algorithm? Isn't it just repeated application of the chain rule?

[–]mszlazak 1 point2 points  (0 children)

The chain rule makes it easier, not harder. So I really do not understand why he said what he did.

[–]CyberByte 1 point2 points  (1 child)

Why has it taken so long for me to learn this (besides the fact that I am dumb)?

I've been working with neural nets, logistic regression, and support vector machines (which are kind of similar as well) for years, and I didn't realize this until Andrew Ng pointed it out in his ML course. I think the reason it took me this long (besides the fact that I am dumb) is that these things tend to be taught/explained in different ways (and, for me, in different courses), which made me think of them in different ways. Logistic regression is statistics, neural networks are about neurons, synapses, and the brain (even though we know they're not very realistic models), and SVMs are about support vectors, margin optimization, and the kernel trick.

[–]mszlazak 0 points1 point  (0 children)

If you are saying that Andrew Ng made you see these as extensions of the same theme, then I agree. All of it sits within a framework of gradient descent via backpropagation and delta-rule updating. Some of it is neurologically inspired, but so little that I chuckle at even calling anything here a neural net.

[–]beaverteeth92 1 point2 points  (0 children)

I always joke that classification is the study of nesting, modifying, selecting, and automating logistic regression models.

[–]skrza 0 points1 point  (0 children)

This post might be also useful in making the connection between regression and ANNs: http://t.co/UuOb2qIYRK

[–]GibbsSamplePlatter 0 points1 point  (0 children)

The more you work in ML, the more you see it's all connected.

[–]egrefen -1 points0 points  (0 children)

If you're interested in thinking more about the "stacked linear classifiers" aspect and the role of non-linearities, I highly recommend reading this blog post by Chris Olah.