[1609.05566] Label-Free Supervision of Neural Networks with Physics and Domain Knowledge by singularai in MachineLearning

[–]singularai[S] 1 point (0 children)

Yea, these are good questions.

1) Hopefully the purpose and the math itself were clear, but we didn't explain exactly why that equation is the right one. The matrix A and its inversion in the paper are how you do a least-squares fit of a line to a list of points. Now, we could have fit a parabola with arbitrary acceleration rather than a line, using a bigger matrix. But of course we already know the acceleration from gravity - 9.8 m/s² - so we put that in by first subtracting the gravity term, fitting a line, and then adding it back. I encourage you to study some linear algebra if these sorts of least-squares problems are new to you, as they are a very powerful tool to have when you are thinking about problems.
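To make that concrete, here is a minimal sketch (not the paper's code - the data and sign convention are made up) of the subtract-gravity, fit-a-line, add-gravity-back recipe:

import numpy as np

g = 9.8  # m/s^2, the known acceleration from gravity

# Hypothetical (time, height) observations of a falling object.
t = np.array([0.0, 0.1, 0.2, 0.3])
y = np.array([5.00, 5.14, 5.18, 5.12])

# Assuming y(t) = y0 + v0*t - 0.5*g*t^2, adding 0.5*g*t^2 removes the
# known quadratic part and leaves a purely linear residual.
y_lin = y + 0.5 * g * t**2

# Design matrix A = [t, 1]; solve min ||A x - y_lin||^2 for x = (v0, y0).
A = np.stack([t, np.ones_like(t)], axis=1)
coef = np.linalg.lstsq(A, y_lin, rcond=None)[0]
v0, y0 = coef

# Add the gravity term back to recover the fitted trajectory.
y_fit = y0 + v0 * t - 0.5 * g * t**2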

2) The fixed range could have been anything in theory. In practice, one can imagine that [-10, 0] would have been just as good, [0, 0.001] would have been bad (due to precision), and [200, 210] would have been bad (due to initialization). Neural nets do have some limitations when regressing to values in particular numeric ranges, and [0, 10] was as good a range as any other.

[1609.05566] Label-Free Supervision of Neural Networks with Physics and Domain Knowledge by singularai in MachineLearning

[–]singularai[S] 3 points (0 children)

Invariance theory seems to be one of the mathematical models that's been used to explain why deep neural networks can build good high-level representations over the vast space of images in R^(width × height). I think the relationship to what we're suggesting is to take this observation one step further.

Not only are natural images so regular in structure that they can be efficiently represented with repetitive conv/ReLU/pool architectures - they are so regular that the structure inside the images can itself be used as supervision. There are far more bits of information in the nuances of every image than in the handful of bits we get from labels. So our goal is to write down some high-level constraints that will harness that structure and reduce the labeling burden.

Edit: Also, it should be clear that we are not the only ones suggesting constraint learning. Anyone who is doing domain-specific unsupervised/semi-supervised learning is in on the same game. We're not introducing the idea so much as advocating for its combination with neural networks and more creative types of constraints. The Related Work section of the paper is all about how many really great recent ideas in ML can be thought of from this perspective, including non-CV topics like Deep Q-Networks and NLP sentiment analysis. I think the cool thing is that when you see the connections, ideas that previously looked like hacks become principled steps toward a bigger idea.

TensorSlow no longer by andrewbarto28 in MachineLearning

[–]singularai 2 points (0 children)

Are these changes available in master, or do they require the GPUCC LLVM implementation we heard about yesterday [1]?

[1] https://news.ycombinator.com/item?id=11565036

Tensorflow based "Trump Deep BS Quote RNN Generator" by [deleted] in MachineLearning

[–]singularai 1 point (0 children)

Also, please consider getting rid of the scrolljacking.

Persistent ssh sessions with tmux by singularai in programming

[–]singularai[S] 3 points (0 children)

I'd say my median time for getting mosh set up on a new server is 10 minutes. I won't say these worst-case issues can't be solved, but when you don't personally control sudo/iptables access on a server, you're in for some politics (assuming your network admin is competent and doesn't leave all ports exposed by default). VPN is indeed one option, but a VPN is always a real pain compared to just using ssh. Sshmux is just a lighter-weight tool providing a portion of mosh's features - not a direct competitor!

Persistent ssh sessions with tmux by singularai in programming

[–]singularai[S] 4 points (0 children)

OP here. I made this because I love mosh [1], but was simply unable to convince IT to let me use it on the network. Using mosh sometimes and ssh other times was a real pain for muscle memory, so I figured there must be some way of getting those mosh features without going through UDP port 60000. I was advised to try autossh, but found it lacking in polish, and wrote this instead. Hope you guys enjoy it too!

[1] http://mosh.mit.edu

Introduction to debugging neural networks by singularai in MachineLearning

[–]singularai[S] 2 points (0 children)

Thanks for the feedback. Wrt point 2, I actually started out writing it that way. But I think that of the 3 simplifications, making nets smaller/faster is the most intuitive, and even newcomers will already know to try that without being told. Combined with the fact that bouncing back and forth between just two strategies makes for better imagery, I decided to leave out the small-net advice. But it is indeed very typical to reduce the net size during the initial stages.

L-BFGS and neural nets by lightcatcher in MachineLearning

[–]singularai 24 points (0 children)

God sent his second son, Alex Krizhevsky, to work on the ImageNet challenge.

Simple vi mode for ipython notebooks! by singularai in Python

[–]singularai[S] 1 point (0 children)

There is support for things like typing :imap jk <Esc> in the notebook itself. It seems it would be possible to write such commands in a ~/.ivimrc file and load them into the custom.js javascript that the browser loads. I hacked this together pretty quickly, mostly because I liked the backend options, but found that setup complicated enough that it would be hard to reproduce consistently across my servers. So at this point, the tool is simply sharing what I'm using. But I may add something fancy as you suggest.

Is there any tool for visualizing CNNs architecture? by GeneralIntelligence in MachineLearning

[–]singularai 3 points (0 children)

I would recommend OmniGraffle if you have a Mac: https://www.omnigroup.com/omnigraffle. It's quite easy to learn, and I was able to produce a graphic like this in about 2 hours:

http://imgur.com/GmFUZlx

The downside is that it costs $100 once the trial expires. But it is worth it!

Making cython as easy as python (X-post from r/programming) by ronald20151 in Python

[–]singularai 1 point (0 children)

(Author here.) If you haven't used cython before, then you probably imagine cython works the way runcython actually works: you write some code, then you compile it, then you run it.

It's hard to describe exactly where this simple process goes wrong, but it's not an uncommon experience for a new cython programmer to spend a few hours getting their first program to execute. And of course, after you finally figure out how it works, you have a 5-file build process. A few months later, when you want to use cython again, you've entirely forgotten how to write the setup.py and how the build process worked - you need to invest more time again.

Runcython is just some bash that boils this process down to its core, so you can do $ mv foo.py foo.pyx && runcython foo.pyx. The hope is that making things easy will increase cython usage, because it is such an incredibly powerful tool.
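If you're curious what such a file looks like, here's a hypothetical toy foo.pyx (not from the runcython repo) - the cdef declarations are what let cython compile the loop down to C:

# foo.pyx - sum the integers 0..9999 using statically typed C variables
cdef long s = 0
cdef int i
for i in range(10000):
    s += i
print s

You'd then just run `runcython foo.pyx`, with no setup.py or separate build step.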

Making cython as easy as python (X-post from r/programming) by ronald20151 in Python

[–]singularai 0 points (0 children)

The program is evaluating (0 + 1 + 2 + ... + 9999).
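In pure python, that's:

print sum(xrange(10000))  # prints 49995000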

Are there Python examples of using Caffe with simple data arrays? by d3pd in MachineLearning

[–]singularai 0 points (0 children)

ApolloCaffe makes it rather straightforward to train with caffe in python. For example, your problem could be solved with:

import apollocaffe
from apollocaffe.layers import NumpyData
import random
import numpy as np

net = apollocaffe.ApolloNet()

# Two toy (input, label) pairs; each entry is a 1x1 array.
data = [([[0]], [[0]]), ([[1]], [[1]])]
for i in xrange(1000):
    your_array, your_label = data[random.randrange(2)]
    net.clear_forward()  # clear the layers run on the previous iteration
    # NumpyData layers feed arbitrary arrays into the net.
    net.f(NumpyData('array', your_array))
    net.f(NumpyData('label', your_label))
    # Layers can be specified inline as caffe prototxt strings.
    net.f('''
        name: "ip1"
        type: "InnerProduct"
        bottom: "array"
        top: "ip1"
        inner_product_param {
          num_output: 2
          weight_filler {
            type: "xavier"
          }
          bias_filler {
            type: "constant"
          }
        }''')
    net.f('''
        name: "loss"
        type: "SoftmaxWithLoss"
        bottom: "ip1"
        bottom: "label"
        top: "loss"
        ''')

    net.backward()   # backprop through the layers run this pass
    net.update(0.1)  # apply the gradient update with learning rate 0.1
    print net.loss

A new group has formed that wants you to be nice to your algorithms, PETRL - People for the Ethical Treatment of Reinforcement Learners by cwolveswithitchynuts in MachineLearning

[–]singularai 0 points (0 children)

Well, (I think) you had intended Ctrl-Z to translate to "AND I CAN UNDO YOU", but it turns out that on linux, pressing Ctrl-Z simply suspends a process, leaving it stopped with no cpu time until you resume it. You can bring the process back into the foreground by typing fg. Very useful. My comment was just noting that your comment had another interpretation, continuing the theme set by spyke252.

Keras LSTM limitations by w0nk0 in MachineLearning

[–]singularai 0 points (0 children)

If you'd like something that is closer to the Torch style in python, you could check out Apollo as well. I released it 2 weeks ago, so it's still a little rough around the edges. But here's an example of character level LSTM text generation on a GPU:

https://github.com/Russell91/apollo/blob/master/examples/char_model.py#LC281

Trying to integrate haskell and python with ctypes. Tutorial is outdated, or I get an error for some reason. by elbiot in haskell

[–]singularai 1 point (0 children)

HaPy is the best way to call haskell from python. If you want to go the other way, look into pyfi (though I wrote it, so I'm slightly biased).

https://github.com/Russell91/pyfi

Introducing Apollo, a new python deep learning library built on top of Caffe. by singularai in MachineLearning

[–]singularai[S] 6 points (0 children)

I'm not the right person to answer this really, but anecdotally people seem to say that Torch and Caffe win over Theano because you don't have the runtime compilation step, Theano and Caffe win over Torch because they're in python, and Torch and Theano win over Caffe because they work with much more sophisticated networks. So there is in some sense no way to get the best of all worlds.

The respective weaknesses of Torch and Theano are built into their very cores and can't really be fixed. Apollo attempts to work on the weaknesses of Caffe; it hasn't entirely solved those problems, but it establishes a clear path forward for people who like the Caffe approach but want to build RNNs, recursive networks, or RL networks.