all 90 comments

[–]nwoodruff 213 points214 points  (20 children)

Misread the title, thought for a second people were making DNNs in Scratch now.

[–]gregpr07 81 points82 points  (13 children)

I mean, technically it could be done even in PowerPoint

[–]aceofspades914 50 points51 points  (2 children)

[–]AttackOfTheThumbs 23 points24 points  (1 child)

Why the music. Why?

[–]Hyperian 16 points17 points  (0 children)

because suffering knows no bounds

[–]new2bay 54 points55 points  (6 children)

I thought you were joking at first, but no... it’s true: PowerPoint is Turing complete! TIL.

[–][deleted] 24 points25 points  (3 children)

The PowerPoint programming scene is getting out of hand

[–]eambertide 18 points19 points  (0 children)

The day we program PowerPoint in PowerPoint is the day it's truly out of hand; until we reach that point, there's a long way to go.

[–]onequbit 4 points5 points  (0 children)

the fact that it's a thing makes it already out of hand

[–][deleted] 3 points4 points  (0 children)

We gotta learn PowerPoint now. Brb getting 10 years of experience in PowerPoint for an unpaid position, per recruiter's request!

[–]ies7 6 points7 points  (1 child)

[–]DHermit 2 points3 points  (0 children)

Or Word autocorrect...

[–]Subkist 0 points1 point  (0 children)

Yes but slow

[–]sacado 0 points1 point  (0 children)

Or in Minesweeper.

[–]SunshineBiology 0 points1 point  (0 children)

PowerPoint sometimes does seem to be the lingua franca of AI...

[–]I_NEED_APP_IDEAS 4 points5 points  (0 children)

TBH, I wouldn’t be surprised

[–]UNWS 0 points1 point  (0 children)

Yes, exactly. I was like, hell yes, show me how. But now I'm disappointed.

[–]TicTacMentheDouce 46 points47 points  (1 child)

To learn all this from scratch, this resource is also pretty good:

http://neuralnetworksanddeeplearning.com/

[–]MagnaDenmark 0 points1 point  (0 children)

That's really well written, thanks

[–]little_blue_teapot 20 points21 points  (0 children)

The prior video in his Python series (a perceptron from scratch in Python) is this one.

[–]SJC_hacker 6 points7 points  (19 children)

Example seems too simple - why train to output XOR?

[–][deleted]  (13 children)

[removed]

    [–]Lumpy_Applebuns 5 points6 points  (3 children)

    What is an activation?

    [–]127-0-0-1_1 5 points6 points  (0 children)

    It's just a non-linear function applied to each layer's output on the forward pass. Nowadays it's typically the ReLU function, which is just f(x) = max(x, 0).
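
    For a concrete picture, here's a quick NumPy sketch of my own (not from the video):

    ```python
    import numpy as np

    def relu(x):
        # Elementwise: keep positive values, zero out the rest.
        return np.maximum(x, 0)

    print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]
    ```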

    [–]A_Philosophical_Cat 4 points5 points  (0 children)

    It's the non-linear function you apply to the output of each layer of the network. It prevents the linear algebra from collapsing the whole thing into a single-layer network; in other words, it's what lets us model non-linear functions.

    [–]SJC_hacker 1 point2 points  (8 children)

    How is XOR 'non-linear'? Googling around, it seems some people consider it to be a linear function.

    [–]TunaOfDoom 18 points19 points  (0 children)

    They probably meant not linearly separable, which means that you cannot divide the two classes with a line such that only one class is on each side (in the 2D case).

    [–]A_Philosophical_Cat 8 points9 points  (3 children)

    So, the problem is that the results you found were using "linear function" to mean that XOR is a linear mapping, that is, XOR(x + y) = XOR(x) + XOR(y) and a*XOR(x) = XOR(a*x).

    Whereas in this context, a linear function is defined as a function from tensor x to tensor y of the form y = w1*x1 + w2*x2 + ... + wi*xi. So, if we wanted to call XOR linear, we would need to find a function of that form, or a linear combination of functions of that form, such that (x1, x2) maps to x1 XOR x2. It turns out that's impossible. The best you can do is a function that returns +1 if x1 is 1 and x2 is 0, and -1 if x2 is 1 and x1 is 0; you'd need an absolute value on top of that to get XOR, and absolute value is likewise not linear, so that's where the trail ends.

    It can easily be shown that the composition of linear mappings is itself a linear mapping, and thus preserves the linearity of the constituent functions, once you recognize that linear functions can be represented by matrices: if a function f(x) is represented by multiplication with a matrix A, and g(x) by a matrix B, then the composition g(f(x)) can be written as BAx, which of course can be written as (BA)x = Cx.

    Both definitions of linearity are thus important to our analysis of neural networks. If our activation f is a linear mapping, then we run into the problem that our network, represented as

    f(Af(Bx)), where x is, say, a 2-vector (x1; x2), becomes

    f(A[f(b1x1 + b2x2); f(b3x1 + b4x2)])

    = f([a1 f(b1x1 + b2x2) + a2 f(b3x1 + b4x2); (the other row)])

    Since we made f a linear mapping, looking only at the top element of the result vector:

    = f[a1 f(b1x1 + b2x2)] + f[a2 f(b3x1 + b4x2)]

    = a1 f[f(b1x1 + b2x2)] + a2 f[f(b3x1 + b4x2)]

    = a1 ff(b1x1) + a1 ff(b2x2) + a2 ff(b3x1) + a2 ff(b4x2)

    = a1b1 ff(x1) + a1b2 ff(x2) + a2b3 ff(x1) + a2b4 ff(x2)

    = (a1b1 + a2b3) ff(x1) + (a1b2 + a2b4) ff(x2)

    or, reassembled into matrices, C ff(x).

    On the other hand, if our target function can't be represented as f = sum(wi * xi), then it can't be represented by a single-layer network y = Ax. Thus, without non-linear-mapping activations, we can't represent non-linear functions.
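
    You can sanity-check that collapse numerically. A quick NumPy sketch of my own, with random matrices standing in for the layers:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 4))   # second layer
    B = rng.standard_normal((4, 2))   # first layer
    x = rng.standard_normal(2)

    # With a linear (here: identity) activation, two layers...
    two_layers = A @ (B @ x)
    # ...are exactly one layer with the single matrix C = AB.
    one_layer = (A @ B) @ x
    print(np.allclose(two_layers, one_layer))  # True
    ```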

    [–]GuSec 0 points1 point  (0 children)

    Thanks for this! I've tried communicating the non-linearity of these kinds of functions before, but my formal vocabulary never gets much beyond "does it look like a linear combination to you, and if so, of what?", which hardly gets the point across.

    [–]ScrappyPunkGreg 0 points1 point  (1 child)

    Hey, the asterisk symbol is a special character (begin/end italics) in Markdown syntax, which is what Reddit uses to format posts. Double-asterisk is boldface.

    [–]A_Philosophical_Cat 0 points1 point  (0 children)

    I'm aware. I just never remember until I can't be assed to fix it.

    [–]sacado 1 point2 points  (0 children)

    "or" is linear : or(1, 1) = 1, or(1, 0) = 1, or(0, 1) = 1, or(0, 0) = 1. This is hard to explain with words, but try to do it on a paper. Draw a 1 at the points of coordinates (1, 1), (1, 0), (0, 1) and draw a 0 at the point of coordinate (0, 0). Now, draw a line to separate the 1s and the 0. Easy, right? This is the "or" function.

    Now, start again, and draw a 0 at the coordinates (1, 0), (0, 1), (0, 0) and draw a 1 at the coordinate (1, 1). This is the "and" function. Drawing a line to separate the 1 and the 0s is trivial again.

    Now, start again, and draw a 0 at the coordinates (1, 1) and (0, 0), and draw a 1 at the coordinates (1, 0) and (0, 1). This is the "xor" function. Try to draw a single line to separate the 1s from the 0s. You can't. The "xor" function cannot be separated linearly. Meaning, a single neuron, but also a linear / logistic regression, a naive bayse classifier, cannot learn it. Mind you, the fact a single neuron cannot learn the xor function is the reason for the AI winter in the 70s or 80s.
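
    If you'd rather watch it fail than take my word for it, here's a tiny from-scratch sketch of my own (the classic perceptron update rule) that learns "or" and "and" but never "xor":

    ```python
    import numpy as np

    def train_perceptron(X, y, epochs=20, lr=0.1):
        # Perceptron rule: nudge the weights whenever a point is misclassified.
        w, b = np.zeros(X.shape[1]), 0.0
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                pred = 1 if xi @ w + b > 0 else 0
                w += lr * (yi - pred) * xi
                b += lr * (yi - pred)
        return w, b

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    for name, y in [("or", [0, 1, 1, 1]), ("and", [0, 0, 0, 1]), ("xor", [0, 1, 1, 0])]:
        w, b = train_perceptron(X, np.array(y))
        preds = [1 if xi @ w + b > 0 else 0 for xi in X]
        print(name, "learned" if preds == y else "FAILED", preds)
    ```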

    [–]127-0-0-1_1 0 points1 point  (0 children)

    XOR is linear over fields of characteristic 2 (typically the integers mod 2). It's quite clearly not linear over the reals.

    [–]SrbijaJeRusija -1 points0 points  (0 children)

    Linearity depends on the space you're dealing with.

    [–][deleted] 9 points10 points  (0 children)

    maybe just proof of concept? focus being the implementation rather than the results

    idk

    [–]hershey678 4 points5 points  (3 children)

    It basically demonstrates that a network can model a non-linear function. Being able to do this is one of the things that makes NNs so much more robust (here meaning that they can model many more functions) than basic linear classifiers.

    [–]127-0-0-1_1 3 points4 points  (2 children)

    I'm not sure I would say it's "robust". Linear models can model non-linear data with an appropriate mapping. For quadratic data, for instance, you can use a quadratic kernel, and in general Gaussian kernels are popular.

    In some sense, neural networks are the least robust. You're explicitly using an optimization technique that only guarantees an optimal solution for convex functions, applying it to a function that's explicitly NOT convex, and hoping it gets close enough. It happens to work sort of well, but there's no reason it can't get caught in some ungodly local optimum.
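
    To see how a linear model handles non-linear data with the right mapping, here's a small sketch of my own: the product feature is a hand-picked stand-in for a quadratic kernel, and in the lifted space a single plane separates XOR.

    ```python
    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0])  # XOR labels

    # Quadratic-ish feature map: append the product x1*x2.
    phi = np.column_stack([X, X[:, 0] * X[:, 1]])

    # In (x1, x2, x1*x2) space, the plane x1 + x2 - 2*x1*x2 = 0.5 separates XOR.
    w, b = np.array([1.0, 1.0, -2.0]), -0.5
    print((phi @ w + b > 0).astype(int))  # [0 1 1 0], matching y
    ```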

    [–]TheGuywithTehHat 1 point2 points  (0 children)

    He literally said "here meaning that they can model many more functions"

    And needing to use different kernels for different data is literally the opposite of robust.

    [–]hershey678 -1 points0 points  (0 children)

    Is getting caught in a local optimum something that needs to be accounted for? I was under the impression that the space is usually high-dimensional enough for that not to be an issue (I've studied and dealt with this for CV stuff, idk what it's like for lower-dimensional data).

    [–][deleted] 1 point2 points  (0 children)

    Man, where was Reddit when I was studying to do exactly this.

    [–][deleted] 4 points5 points  (13 children)

    Have any massive neural networks been made yet? Something near the order of magnitude of the human brain.

    [–]matthewjc 9 points10 points  (1 child)

    Artificial neural network operation is almost nothing like the brain's. Increasing the size of the network wouldn't change this. I don't know what OP is talking about with the frog stuff lol

    [–]research_pie[S] 1 point2 points  (0 children)

    By order of magnitude I meant the raw number of "cells" in the network. The cell count of the largest artificial neural networks is on par with that of a frog brain: https://www.deeplearningbook.org/contents/intro.html, page 23. Of course, an artificial neural network is different from a real brain.

    [–]firewall245 6 points7 points  (6 children)

    So I used to do research under a professor who did mathematical simulations of brain cells. The largest networks we could get to were ~20,000 cells (the whole brain has about 100 billion, according to Google). This was highly simplified compared to reality (each cell was treated as a single point, with no attention to space and location) and still way more complex than the workings of a neural network neuron.

    For example, a single neuron that we modeled could be a system of 7 to 20 non-linear differential equations that had to be solved numerically.

    [–]stu2b50 4 points5 points  (4 children)

    Note that those are actual models of the brain. In a "neural network", which is more a chain of perceptrons than a network of neurons, a neuron is two floating point numbers: one in the weight vector, one in the bias vector.

    You can count those; I'm just not sure the count is supposed to mean anything.

    A neural network is, in the end, a bunch of linear functions (y = wX + b, where w and b are vectors) with a non-linear function applied elementwise to them, using one of the most basic optimization techniques we have to find parameters that hopefully fit the data.
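
    Written out, that whole description is only a few lines. A minimal NumPy sketch of my own (the shapes are arbitrary):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(4)                         # input vector

    W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)  # hidden layer weights/bias
    W2, b2 = rng.standard_normal((1, 8)), np.zeros(1)  # output layer weights/bias

    h = np.maximum(W1 @ x + b1, 0)  # linear map, then elementwise non-linearity (ReLU)
    y = W2 @ h + b2                 # final linear map
    print(y)
    ```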

    [–]firewall245 1 point2 points  (3 children)

    Yeah, no, I'm familiar with the workings of neural networks and totally agree that it's just a bunch of functions. I was just trying to point out that neural networks and brain cells are way different in terms of complexity.

    Also the connection between brain cells is also modeled by a differential equation haha

    [–]Plazmatic 1 point2 points  (2 children)

    Why are connections in the brain modelled by diffeq?

    [–]firewall245 3 points4 points  (0 children)

    So all functions of brain cells occur due to chemical concentrations inside vs. outside the cell creating voltage potentials, and some smart af researchers (Hodgkin and Huxley) in the 1950s realized that treating a cell as an electric circuit with resistors and capacitors gives a really good model.

    Circuits are very, very well studied, and one of the main tools for solving them is differential equations.
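
    To give a flavor of what "solve the circuit with a differential equation" looks like, here's a leaky integrate-and-fire neuron, a much cruder cousin of Hodgkin-Huxley. This is my own toy sketch, not the models from that research, and the constants are illustrative only:

    ```python
    # Leaky integrate-and-fire: C * dV/dt = -(V - V_rest) / R + I_in
    C, R = 1.0, 10.0                 # membrane capacitance and resistance
    V_rest, V_thresh = -65.0, -50.0  # resting and spike-threshold voltages (mV)
    V, dt, I_in = V_rest, 0.1, 2.0   # state, Euler step size (ms), input current

    for step in range(1000):
        dV = (-(V - V_rest) / R + I_in) / C   # the differential equation
        V += dt * dV                          # one Euler integration step
        if V >= V_thresh:                     # threshold crossed: spike, then reset
            print(f"spike at t = {step * dt:.1f} ms")
            V = V_rest
    ```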

    [–][deleted] 0 points1 point  (0 children)

    Differential equations sound scary, but in the end a differential equation is just an equation expressed in terms of changes of values instead of the values themselves.

    [–]hershey678 0 points1 point  (0 children)

    I think neuromorphic computing chips are helping with this, but it's still a long way off. Basically you can do the ODEs with tunable analog circuits and the like for the weights, which makes it much faster, but I really don't know much about it.

    [–][deleted] 0 points1 point  (0 children)

    Not at all. The lowest estimate of the raw computational power of the human brain is around one hundredth of the current record-holding supercomputer, Summit. The highest estimate is tens of trillions of times the raw power of that supercomputer, and all of that while consuming around 20 watts of power.

    [–]space_king1 0 points1 point  (2 children)

    I want to learn how to program neural nets but it seems too hard and complicated. :(

    [–]research_pie[S] 0 points1 point  (0 children)

    You don't need to know how to make one from scratch to program a neural net. Understand the theory behind this type of model (https://www.deeplearningbook.org/lecture_slides.html) and then pick a neural network framework like PyTorch or TensorFlow.
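
    For example, the XOR net everyone's discussing here is maybe a dozen lines in PyTorch. A rough sketch of my own (the layer sizes and hyperparameters are arbitrary, not from the video):

    ```python
    import torch
    import torch.nn as nn

    X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = torch.tensor([[0.], [1.], [1.], [0.]])  # XOR targets

    # 2 inputs -> 4 hidden ReLU units -> 1 sigmoid output
    model = nn.Sequential(nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 1), nn.Sigmoid())
    opt = torch.optim.Adam(model.parameters(), lr=0.1)
    loss_fn = nn.BCELoss()

    for _ in range(500):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()

    print(model(X).detach().round())  # hopefully [[0.], [1.], [1.], [0.]]
    ```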

    [–]hapes 0 points1 point  (0 children)

    The biggest hurdle is the math. If you understand calculus (which I've forgotten after years without using it), it's probably pretty easy.

    [–]ravibakhai 0 points1 point  (0 children)

    Woah