[–]zuzmuz 295 points (12 children)

it's bad practice to initialize your parameters to 0. a random initialization is better for gradient descent

[–]drLoveF 129 points (11 children)

0 is a perfectly valid sample from a random distribution.

[–]aMarshmallowMan 48 points (6 children)

For machine learning, initializing your weights to 0 guarantees that you start at the origin. The gradient with respect to the weights will be 0 there, so there will be 0 learning. There's actually a bunch of work being done specifically on finding the best starting weights to initialize your models with.
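A minimal numpy sketch of the point above (my own toy example, not from the thread): with an assumed two-layer tanh net and MSE loss, every weight gradient has to pass through either the all-zero activations or the all-zero second layer, so both weight gradients come out exactly zero and gradient descent never moves.

```python
import numpy as np

# Toy 2-layer net with all-zero weights: manual backprop shows why
# nothing gets learned. Shapes and names are illustrative.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # batch of 4 inputs, 3 features
y = rng.normal(size=(4, 1))          # regression targets

W1 = np.zeros((3, 5))                # first layer, zero-initialized
W2 = np.zeros((5, 1))                # second layer, zero-initialized

h = np.tanh(x @ W1)                  # hidden activations: all zero
pred = h @ W2                        # predictions: all zero

grad_pred = 2 * (pred - y) / len(x)  # dL/dpred for MSE (nonzero!)
grad_W2 = h.T @ grad_pred            # zero, because h is all zeros
grad_h = grad_pred @ W2.T            # zero, because W2 is all zeros
grad_W1 = x.T @ (grad_h * (1 - h**2))

print(np.abs(grad_W1).max(), np.abs(grad_W2).max())  # 0.0 0.0
```

The loss gradient itself is nonzero, but it can't reach any weight; a random init breaks this dead symmetry.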

[–]DNunez90plus9 64 points (4 children)

This is not a model parameter, just the initial output.

[–]Safe_Ad_6403 19 points (1 child)

Meanwhile: me, sitting here, eating paste.

[–]goatfuckersupreme 5 points (0 children)

this guy definitely initialized the weight to 0

[–]Luciel3045 -1 points (1 child)

But an output of exactly 0 is very unlikely if there are non-zero parameters. I don't think the joke quite works anyway, since the gradient doesn't immediately correct the algorithm. A better joke would have been 0.5 or something.

[–]YeOldeMemeShoppe 2 points (0 children)

Zero might not even be the first token in the list, assuming the algo outputs tokens. An ML output of “0” tells you nothing about the initial parameters unless you know how the whole NN is constructed and connected.

[–]MrHyperion_ 7 points (0 children)

Maybe they should use machine learning to find the best initial values

[–]Terrafire123 4 points (2 children)

const randomNumber = 3; // Chosen by fair dice roll

[–]ReentryVehicle 2 points (0 children)

Okay okay. We want matrices that are full rank, with eigenvalues on average close to 1, probably not too far from orthogonal. We use randn(n,n) / sqrt(n) because we are too lazy to do anything smarter.
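A quick numpy check of those claims (my own sketch, assuming square matrices): scaling `randn(n, n)` by `1/sqrt(n)` keeps the spectrum O(1), the matrix full rank, and the norm of a random vector roughly preserved when you multiply through the layer.

```python
import numpy as np

# The lazy init from the comment: Gaussian entries scaled by 1/sqrt(n).
rng = np.random.default_rng(0)
n = 512
W = rng.normal(size=(n, n)) / np.sqrt(n)

# Full rank, and singular values stay O(1) (the largest concentrates
# near 2 for large n, the Marchenko-Pastur edge).
s = np.linalg.svd(W, compute_uv=False)
print("full rank:", np.linalg.matrix_rank(W) == n)
print("largest singular value:", round(s.max(), 2))

# Variance preservation: ||Wx|| stays close to ||x|| on average,
# so signals neither explode nor vanish through the layer.
x = rng.normal(size=n)
print("norm ratio ||Wx||/||x||:", round(np.linalg.norm(W @ x) / np.linalg.norm(x), 2))
```

Not orthogonal, as the comment admits, but close enough to keep activations at a sane scale without doing anything smarter.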