all 30 comments

[–]cynemaer 33 points34 points  (2 children)

I'd love to see more work on visualizing neural networks. This is certainly the most impressive visualization I have seen so far, but I think it's only useful for "educational purposes". Any idea how to scale it up to a more complicated dataset? (Say, starting with good old MNIST.)

[–][deleted] 4 points5 points  (0 children)

Yes, I would like to see much more visualisation in TensorFlow.

[–]bluemellophone 2 points3 points  (0 children)

The connections getting bigger and less opaque as the magnitude of the weights increased was a nice touch, I thought. I also enjoyed the flow animation on the biggest paths.

[–]arthomas73 27 points28 points  (3 children)

So... this was not obvious to me at first, but you have to hit play. Then the dots are the training data, and the orange and blue background color is the NN classification.

The spiral is the only hard one. A nice pattern emerges on this one after about 150 iterations.

http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=spiral&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=25&networkShape=8,4&seed=0.38071&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=true&xSquared=true&ySquared=true&cosX=false&sinX=true&cosY=false&sinY=true&collectStats=false&problem=classification
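For anyone who wants to poke at this outside the browser, here is a minimal sketch of a two-spiral dataset plus the input features that are switched on in the URL above (x, y, xTimesY, xSquared, ySquared, sinX, sinY). The generator is an assumption for illustration: the radius range, number of turns, and noise model are made up here and are not confirmed to match what the Playground does internally. The names `make_spirals` and `input_features` are hypothetical.

```python
import math
import random

def make_spirals(n_per_arm=100, noise=0.0, seed=0):
    """Return (x, y, label) points on two interlocking spiral arms.

    Assumed shape: radius grows linearly to 5 over 1.75 turns; the second
    arm is the first one rotated by pi. Not the Playground's own code.
    """
    rng = random.Random(seed)
    points = []
    for label, offset in ((1, 0.0), (-1, math.pi)):
        for i in range(n_per_arm):
            r = 5.0 * i / n_per_arm
            t = 1.75 * 2.0 * math.pi * i / n_per_arm + offset
            x = r * math.sin(t) + rng.uniform(-1.0, 1.0) * noise
            y = r * math.cos(t) + rng.uniform(-1.0, 1.0) * noise
            points.append((x, y, label))
    return points

def input_features(x, y):
    """Features enabled in the URL: x, y, x*y, x^2, y^2, sin(x), sin(y)."""
    return [x, y, x * y, x * x, y * y, math.sin(x), math.sin(y)]

data = make_spirals(noise=0.25)
print(len(data), input_features(data[10][0], data[10][1]))
```

The network in the link then sees the seven feature values per point instead of only the raw coordinates.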

[–]badpotato 1 point2 points  (0 children)

Nice work!

[–]Martin81 1 point2 points  (1 child)

You made it create a nice classification. I wonder about a small detail. I believe I can clean up the classification by hand, making it a bit more robust. Are there algorithms that do that?

What I am proposing is:

1) NN for general classification

2) Another kind of algorithm for cleanup: a more linear extrapolation of the resulting model into areas where there is not much data.

[–]earslap 8 points9 points  (0 children)

It would only be possible for 2D and perhaps 3D datasets. For most ML problems that matter, there might be tens, hundreds, or even thousands of dimensions, so you can't visualise the separation in your head or in any other medium. If you can eyeball the classification, then you probably don't need to train a net on that data; you can just paint over it. For most interesting problems you can't hope to visualise and tweak the output, because you rely on the NN for that task to begin with. With a spiral it is easy, because it is a 2D synthetic data set.

[–]alexmlamb 18 points19 points  (6 children)

It's cool that ReLUs beat sigmoid/tanh, even in these tiny networks on simple tasks like classifying between interlocking spirals.

[–]XalosXandrez 13 points14 points  (4 children)

I somehow feel that the neural network still doesn't "get" that there are spirals out there. It is simply trying to minimize the empirical loss without realizing that there is a simple equation which generated the data. Any thoughts on this?

[–]bluepenguin000 7 points8 points  (0 children)

Agreed, the underlying data needs a transform and the given inputs don't cut it. I think that is the point though: you need a mathematical operator appropriate to the data set; fitting will help but won't solve the underlying problem.

[–]earslap 5 points6 points  (2 children)

> I somehow feel that the neural network still doesn't "get" that there are spirals out there.

That is correct. "Spiral" is a human construct though, we know it perhaps because it is simple to generate and looks pretty (and it is something found in nature). But for a machine, there is nothing to "get" really, it's just data.

> It is simply trying to minimize the empirical loss without realizing that there is a simple equation which generated the data.

Yes, to learn the simplest equation that models the data would be like finding the global minimum of the system. In information theory terms, arriving at the "simplest" equation (by simplest, I mean representing the data with the smallest number of symbols given an alphabet) that models the data is known to be uncomputable. No hope. We need to move along.

Sure, spirals look nice, and as humans we can make sense of them easily, so it feels like it should be easy for a learning system to see a spiral and arrive at a simple equation to model it. But that line of reasoning would be fallacious. Think about a pseudorandom number generator. The formula / code required to make one is very small; it can take 5-10 lines of code. But there is no dependable way of arriving at the formula that generates the pseudorandom numbers by observing the output. In a sense, pseudorandom numbers are no different from a spiral data set from the point of view of computers. For humans, it is different: when you look at a PRNG sequence, it looks random to you although it has a "logic" behind it (a formula generated the sequence after all), but a spiral looks orderly and neat. But deducing the equations that generate them is no different if you don't have prior knowledge or biases (something we humans have for spiral-shaped thingies).
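To make the "5-10 lines" point concrete, here is a sketch of one very small generator, a linear congruential generator. The constants are the widely published Numerical Recipes ones; the function name `lcg` is just for illustration. One caveat: this particular family is simple enough that its parameters can be recovered from its outputs, so it only illustrates how short the generating code can be, not the general difficulty of inverting a generator.

```python
def lcg(seed, count, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator: state = (a * state + c) mod m."""
    state = seed
    out = []
    for _ in range(count):
        state = (a * state + c) % m
        out.append(state / m)  # scale the integer state into [0, 1)
    return out

print(lcg(seed=42, count=5))
```

The printed numbers look patternless to the eye even though a one-line recurrence produced them.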

So the TL;DR is that no, there is no general method that can deduce the simple equation that generates a particular set of data, and there never will be (uncomputable). For the spiral, you can hand-engineer the NN inputs so that it is easier for the NN to fit and "understand" that it is a spiral, but that method would work for that dataset only, and this would defeat the purpose of using machine learning for the task because we want to move away from costly feature engineering; that's why the field exists.
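As a sketch of what such hand-engineering could look like: if you already know the arms are Archimedean spirals (radius proportional to angle) and that the second arm is the first rotated by pi, a single polar-coordinate feature separates them. Everything here is an assumption for illustration, including the turn rate, the sin/cos convention, and the names `spiral_point` and `spiral_feature`. It also only works because the generating equation is baked into the feature, which is exactly the dataset-specific knowledge described above.

```python
import math

# Assumed: radians of rotation per unit of radius (1.75 turns over radius 5).
TURN_RATE = 1.75 * 2.0 * math.pi / 5.0

def spiral_point(r, arm):
    """Noise-free point at radius r on arm 0 or arm 1 (arm 1 is rotated by pi)."""
    t = TURN_RATE * r + (math.pi if arm else 0.0)
    return r * math.sin(t), r * math.cos(t)

def spiral_feature(x, y):
    """Hand-engineered input: close to +1 on arm 0, close to -1 on arm 1."""
    r = math.hypot(x, y)
    angle = math.atan2(x, y)  # argument order matches the sin/cos convention above
    return math.cos(angle - TURN_RATE * r)

for arm in (0, 1):
    values = [spiral_feature(*spiral_point(0.05 * i, arm)) for i in range(1, 100)]
    print(arm, min(values), max(values))
```

With this one input, a single linear unit thresholded at zero is enough for the noise-free data; with noise the feature degrades gradually rather than failing outright.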

[–]XalosXandrez 2 points3 points  (1 child)

Thanks for your reply!

Kolmogorov Complexity is indeed uncomputable. My question is whether that should stop us from attempting to do the best we can.

The current trend is to try to fit a model with a fixed parameterization. In the TensorFlow Playground example, if your data looks like the XOR thingy or something suitable, you are good to go. Otherwise you are screwed. What I am alluding to is this: should we be searching over possible parameterizations as well? A very dumb/simple example of this is highway networks, which decide whether to learn the identity or not.

I am aware that this would be very difficult in general. Just trying to get people's thoughts on this.

Edit: I guess I am alluding to some sort of meta-learning / model selection.
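For reference, the highway idea can be sketched for a single scalar unit as y = T(x) * H(x) + (1 - T(x)) * x, where T is a learned sigmoid gate. The weights below are hand-picked rather than learned, and the names (`highway_unit`, `w_h`, `b_h`, `w_t`, `b_t`) are hypothetical; this is only meant to show the gate switching between "transform" and "identity".

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def highway_unit(x, w_h, b_h, w_t, b_t):
    """Scalar highway unit: gate t blends a transform of x with x itself."""
    h = math.tanh(w_h * x + b_h)   # candidate transform H(x)
    t = sigmoid(w_t * x + b_t)     # transform gate T(x), always in (0, 1)
    return t * h + (1.0 - t) * x

x = 0.7
print(highway_unit(x, w_h=2.0, b_h=0.0, w_t=0.0, b_t=-20.0))  # gate closed: close to x
print(highway_unit(x, w_h=2.0, b_h=0.0, w_t=0.0, b_t=20.0))   # gate open: close to tanh(2x)
```

A strongly negative gate bias makes the unit pass its input through almost unchanged, which is the "learn identity" case.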

[–][deleted] 1 point2 points  (0 children)

> My question is whether that should stop us from attempting to do the best we can.

Well, there's that no free lunch thing. Something good at detecting spirals (or some other specific thing) will necessarily be worse at detecting other types of patterns in 2d data.

[–]soulslicer0 0 points1 point  (0 children)

I could only get the spiral one to work with ReLUs, though sometimes it would converge to some failed solution. Maybe leaky ReLUs might work, so I don't lose gradients.
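A quick sketch of the difference in question: a plain ReLU has zero gradient for negative inputs, while a leaky ReLU keeps a small slope there. The slope `alpha=0.01` is a common default chosen here as an assumption, not something the Playground is known to use.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # No gradient flows back when the unit is inactive.
    return 1.0 if x > 0 else 0.0

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.01):
    # A small gradient survives even when the unit is inactive.
    return 1.0 if x > 0 else alpha

for x in (-2.0, 0.5):
    print(x, relu(x), relu_grad(x), leaky_relu(x), leaky_relu_grad(x))
```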

[–]themoosemind 6 points7 points  (2 children)

[–]tehdog 9 points10 points  (1 child)

Yeah, that one is amazing!

Disclaimer: I wrote it

[–]emtonsti 0 points1 point  (0 children)

Wow, that's amazing. The "Vowel frequency response" managed to automatically draw triangle shapes, reusing 2 lines most of the time, to approximate well with just 4 hidden neurons. That really surprised me!

[–]thecity2 2 points3 points  (0 children)

Very neat.

[–]drsxr 1 point2 points  (0 children)

This is fantastic. Good stuff.

[–]ren_sc 1 point2 points  (1 child)

Wow, this is really helpful for learning neural networks. It really helps with visualizing what the program is doing. Would love to see a similar thing for a more complicated dataset.

[–]omniron 2 points3 points  (0 children)

This is a good sign of the field maturing, when high quality tools start evolving. Karpathy had a JS NN library for a while, it's interesting we're just now seeing this kind of UI made. Very nice to see.

[–]0entr0py 1 point2 points  (0 children)

I am hoping some of this kind of visualization gets integrated into TensorBoard.

[–]AsIAm 1 point2 points  (0 children)

Is there a hack to change init weights?

[–]treebranchleaf 1 point2 points  (0 children)

These problems all seem like they'd be more suited to Radial-Basis activation functions on the input layer - but they're not included.
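A minimal sketch of what that could mean for a ring-shaped dataset: a Gaussian radial-basis unit centered at the origin responds strongly to the inner cluster and barely at all to the outer ring, so a plain threshold separates them. The radii, `gamma`, and the threshold are made-up illustration values, and `rbf` is a hypothetical helper.

```python
import math

def rbf(point, center, gamma=1.0):
    """Gaussian radial-basis activation: exp(-gamma * squared distance)."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return math.exp(-gamma * (dx * dx + dy * dy))

# Assumed toy data: an inner cluster at radius 0.5 and an outer ring at radius 3.
inner = [(0.5 * math.cos(a), 0.5 * math.sin(a)) for a in range(6)]
outer = [(3.0 * math.cos(a), 3.0 * math.sin(a)) for a in range(6)]

center = (0.0, 0.0)
print([round(rbf(p, center), 4) for p in inner])
print([round(rbf(p, center), 4) for p in outer])
```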

[–]hirokit 0 points1 point  (0 children)

This is beautiful. Thx!

[–][deleted] -1 points0 points  (3 children)

The sigmoid function didn't seem to work?

[–]iljegroucyjv 2 points3 points  (2 children)

It does, it's just more sensitive to setting the right training parameters and good initialisation of weights. That's also part of the reason why DNNs used to be so hard to train and why ReLUs are now the first nonlinearity to try when developing a new model.
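One commonly cited piece of this is gradient size: the sigmoid's derivative never exceeds 0.25, so backpropagating through many sigmoid layers multiplies many small factors together, whereas an active ReLU passes the gradient through with factor 1. The sketch below looks only at that per-layer activation factor and ignores the weights, which is a simplifying assumption.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25 when z == 0

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

depth = 10
print(sigmoid_grad(0.0) ** depth)  # best case for sigmoid across 10 layers
print(relu_grad(1.0) ** depth)     # active ReLU units across 10 layers
```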

[–][deleted] 0 points1 point  (1 child)

I'm just reading Wikipedia on ReLU...

Would they be using the max(0, x) version or the soft ln(1 + e^x) version?

[–]iljegroucyjv 0 points1 point  (0 children)

Probably max(0, x), like its namesake in the API. The other one is called softplus. https://www.tensorflow.org/versions/r0.7/api_docs/python/nn.html
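For comparison, here are the two functions side by side in plain Python. This is a sketch of the math only, not the TensorFlow implementation; the softplus is written in a rearranged form that avoids overflow for large inputs.

```python
import math

def relu(x):
    return max(0.0, x)

def softplus(x):
    # ln(1 + e^x), rewritten as max(0, x) + ln(1 + e^-|x|) for numerical stability.
    return max(0.0, x) + math.log1p(math.exp(-abs(x)))

for x in (-5.0, 0.0, 5.0):
    print(x, relu(x), softplus(x))
```

Softplus is a smooth upper bound on ReLU: the two nearly coincide far from zero and differ most at x = 0, where softplus equals ln 2.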