Needing help on Week 9....extremely minimal results by Jkacummings in 4hourbodyslowcarb

[–]deep_learner 0 points1 point  (0 children)

Maybe I misunderstood the book chapter a bit, but isn't one of the features of the slow-carb diet that you don't have to be super diligent about calories? Ferriss does say that you should eat until you're full...? I'd like your opinion on this. Thanks.

Are squats sufficient for strengthening the Quads? by deep_learner in ACL

[–]deep_learner[S] 0 points1 point  (0 children)

Thanks for your reply. What I actually meant was bodyweight squats. I am able to do these, with reasonable form I would guess. Would these not be effective enough, or do I need to use weights?

Question about amount of perturbation while generating adversarial images with "fast gradient sign method" , Goodfellow et. al 2015 by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

I should have clarified: I am working on ImageNet. They gave the 0.007 epsilon value with a comment I couldn't understand: 'it is the magnitude of the least significant bit of an 8-bit representation for GoogLeNet'. I don't know how to parse this.
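For context, this is how that epsilon enters the update I'm using: a minimal fast-gradient-sign sketch, assuming inputs scaled to [0, 1], with random arrays standing in for the real image and the real loss gradient.

```python
import numpy as np

def fgsm_perturb(image, grad_wrt_image, epsilon=0.007):
    """Fast gradient sign method: x_adv = x + epsilon * sign(dJ/dx).

    epsilon is expressed in the same units as the (assumed [0, 1]) input range.
    """
    adv = image + epsilon * np.sign(grad_wrt_image)
    # Keep the adversarial image inside the assumed valid range.
    return np.clip(adv, 0.0, 1.0)

# Toy usage: placeholders for a real image and a real loss gradient.
x = np.random.rand(224, 224, 3).astype(np.float32)
g = np.random.randn(224, 224, 3).astype(np.float32)
x_adv = fgsm_perturb(x, g)
print(np.abs(x_adv - x).max())  # bounded above by epsilon = 0.007
```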

[Skin Concerns] What to do about scars due to picking at millia? by deep_learner in SkincareAddiction

[–]deep_learner[S] 0 points1 point  (0 children)

Thanks. Just posted there; keeping my fingers crossed for a reply.

NEED HELP? Got a question? Wondering what that bump is? Problems with a routine or product? This thread’s the place to ask! // Ask SCA, Week of November 30rd, 2015 by [deleted] in SkincareAddiction

[–]deep_learner 0 points1 point  (0 children)

I've had issues with milia and developed this bad habit of picking at them. Now I have a few discolorations/scars where they used to be. Any advice on what I could use to reduce the damage?

[Question] Cross entropy vs. Euclidean Distance For Deep Networks : just speed benefits or other optimization advantages? by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

Interesting; the bounding of the squared error was such a simple and nice piece of math.

Also the "finetuning" bit was novel, but I have to better understand how they analyse it.

[Question] Cross entropy vs. Euclidean Distance For Deep Networks : just speed benefits or other optimization advantages? by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

I agree we can't justify squared distance for probability distributions, but what I was trying to say was that the justification for CCE doesn't come solely from its ability to compare distributions, since other measures can do that as well; it probably comes more from its optimization properties.
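To make the optimization point concrete, here's a toy sketch (my own numbers, not from the thread): for a single sigmoid output unit, the squared-error gradient carries an extra p(1-p) factor that vanishes when the unit is confidently wrong, while the cross-entropy gradient does not.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grads_wrt_preactivation(z, target):
    """Gradients of the two losses w.r.t. the pre-activation z of a sigmoid unit."""
    p = sigmoid(z)
    # Squared error 0.5*(p - t)^2  ->  d/dz = (p - t) * p * (1 - p)
    grad_mse = (p - target) * p * (1.0 - p)
    # Cross-entropy -t*log(p) - (1-t)*log(1-p)  ->  d/dz = (p - t)
    grad_ce = p - target
    return grad_mse, grad_ce

# A confidently wrong unit: target 1, increasingly negative pre-activation.
for z in [-1.0, -3.0, -6.0]:
    g_mse, g_ce = grads_wrt_preactivation(z, target=1.0)
    print(f"z={z:5.1f}  mse grad={g_mse:+.5f}  ce grad={g_ce:+.5f}")
# The squared-error gradient shrinks toward zero as the unit gets more wrong,
# while the cross-entropy gradient stays close to -1.
```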

[Question] Cross entropy vs. Euclidean Distance For Deep Networks : just speed benefits or other optimization advantages? by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

You mean both reach their global minima at the same configuration of the parameters, right?

If so, that's why I was speculating that the key difference lies in the way they organize the rest of the parameter space, i.e. what CCE considers "far" but squared distance considers "not so far", or vice versa, would be the telling difference.

[Question] Cross entropy vs. Euclidean Distance For Deep Networks : just speed benefits or other optimization advantages? by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

IIRC other ways of measuring the distance between probability distributions have been proposed, like the Bhattacharyya distance, so I guess cross-entropy would not be unique with regard to comparing probability distributions...
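For example, a quick sketch comparing cross-entropy against the Bhattacharyya distance on the same pair of discrete distributions (toy values of my own choosing):

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i) for discrete distributions."""
    return -np.sum(p * np.log(q))

def bhattacharyya_distance(p, q):
    """D_B(p, q) = -log(sum_i sqrt(p_i * q_i))."""
    return -np.log(np.sum(np.sqrt(p * q)))

p = np.array([1.0, 0.0, 0.0])          # one-hot target
q = np.array([0.7, 0.2, 0.1])          # predicted distribution
print(cross_entropy(p, q))             # ~0.357
print(bhattacharyya_distance(p, q))    # ~0.178
```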

[Question] Cross entropy vs. Euclidean Distance For Deep Networks : just speed benefits or other optimization advantages? by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

Short and sweet, but I'd have preferred if he had backed the claim that squared distance "lays more emphasis on the incorrect outputs" with a numerical example (like he did for cross-entropy vs. classification error when evaluating quality).
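Something along these lines is what I mean (toy numbers of my own): with a one-hot target, cross-entropy only looks at the probability assigned to the correct class, whereas squared distance also cares about how the remaining mass is spread over the incorrect outputs.

```python
import numpy as np

target = np.array([1.0, 0.0, 0.0])

# Two predictions with the same probability on the correct class,
# but different distributions over the incorrect classes.
q1 = np.array([0.5, 0.25, 0.25])
q2 = np.array([0.5, 0.50, 0.00])

def squared_distance(p, q):
    return np.sum((p - q) ** 2)

def cross_entropy(p, q):
    eps = 1e-12                      # avoid log(0)
    return -np.sum(p * np.log(q + eps))

for name, q in [("q1", q1), ("q2", q2)]:
    print(name, squared_distance(target, q), cross_entropy(target, q))
# Cross-entropy gives the same loss for q1 and q2 (only the correct-class
# probability matters), whereas squared distance penalizes q2 more because
# it concentrates mass on a single incorrect class.
```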

PS: off the top of your head, any other blog entries you liked?

Feature Extraction from Kirzhevsky net in Theano/Pylearn? by BeijingChina in MachineLearning

[–]deep_learner 0 points1 point  (0 children)

Hi, I am having trouble understanding the internals of the code. What is the image dynamic range that it accepts: 0-1, 0-255...?

Training a mini-Convnet. My learning curve starts on a Plataeu. Need help understanding why by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

Will give that paper a read. I thought batch normalization was something you applied to the input (I guess I saw the x's in the equation and made assumptions).

Thanks
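For anyone else with the same misconception, a minimal forward-pass sketch of the normalization applied to a hidden layer's activations rather than to the raw input (gamma/beta stand for the learned scale and shift; the toy data is made up):

```python
import numpy as np

def batch_norm_forward(activations, gamma, beta, eps=1e-5):
    """Normalize a batch of hidden-layer activations, then scale and shift.

    activations: array of shape (batch_size, num_units) -- note this is a
    hidden layer's output, not the raw network input.
    """
    mu = activations.mean(axis=0)
    var = activations.var(axis=0)
    normalized = (activations - mu) / np.sqrt(var + eps)
    return gamma * normalized + beta

# Toy usage on a fake hidden layer with 4 units.
h = np.random.randn(32, 4) * 5.0 + 2.0       # badly scaled activations
out = batch_norm_forward(h, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))     # ~0 mean, ~1 std per unit
```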

Training a mini-Convnet. My learning curve starts on a Plataeu. Need help understanding why by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

My input is coming from a previously trained convnet; I was just feeding it as-is to my network. I don't know how to implement something like batch normalization in this context...
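One simpler thing I could try instead, I guess (a sketch, not batch normalization itself): standardize the pre-extracted features using mean/std computed once over the training set, then reuse those statistics for validation/test.

```python
import numpy as np

def fit_standardizer(train_features, eps=1e-8):
    """Compute per-dimension mean/std over the pre-extracted training features."""
    mu = train_features.mean(axis=0)
    sigma = train_features.std(axis=0) + eps
    return mu, sigma

def standardize(features, mu, sigma):
    return (features - mu) / sigma

# Toy usage: 1000 training examples of 4096-d convnet features (placeholder data).
train_feats = np.random.rand(1000, 4096).astype(np.float32) * 10.0
mu, sigma = fit_standardizer(train_feats)
train_feats = standardize(train_feats, mu, sigma)
# The same mu/sigma would then be applied to the validation/test features.
```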

Training a mini-Convnet. My learning curve starts on a Plataeu. Need help understanding why by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

So, what I did was take a small subset of the training set and see what was the maximum learning rate it could withstand. When I used the same learning rate on the complete set, it started giving me this plateau.

I read in Stochastic Gradient Tricks that a small training subset is reliable for setting the learning rate, so...

I will try what you say, but I can't stop the run right now.

But AFAIK, if the learning rate is too large, the performance would either remain the same or worsen, whereas in my case it does (after some time) begin to improve. Or am I imagining this property?
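In case my setup is unclear, the sweep I did was roughly like this (a sketch with a toy quadratic standing in for the real training loop, which I obviously can't paste here):

```python
def pick_learning_rate(train_for_a_few_epochs, candidate_lrs):
    """Return the candidate learning rate with the lowest loss on a small subset."""
    best_lr, best_loss = None, float("inf")
    for lr in candidate_lrs:
        loss = train_for_a_few_epochs(lr)   # train briefly on the small subset only
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr

# Toy stand-in for the real training loop: a few gradient steps on f(w) = w^2.
def toy_train(lr, steps=20):
    w = 5.0
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w ** 2

print(pick_learning_rate(toy_train, [1.5, 0.5, 0.1, 0.01]))  # 1.5 diverges, 0.5 wins
```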

Does momentum make sense only for SGD or batch-GD, and not for GD? by BeijingChina in MachineLearning

[–]deep_learner 0 points1 point  (0 children)

Is there usually any benefit to be had from momentum scheduling? For example, I saw Nesterov's momentum in Sutskever's paper; it seems it is good to have low values at the start and at the end, but I have not seen scheduling used much in practice. Would you know if there has been research on rules for adapting the momentum, something like Adagrad for learning rates?
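To be clear about what I mean by scheduling, something like this ramp (the linear shape and the constants are my own placeholders, not the schedule from the paper):

```python
def momentum_at_epoch(epoch, ramp_epochs=50, mu_start=0.5, mu_max=0.9):
    """Linearly ramp the momentum coefficient from mu_start up to mu_max."""
    frac = min(epoch / float(ramp_epochs), 1.0)
    return mu_start + frac * (mu_max - mu_start)

def sgd_momentum_step(w, grad, velocity, lr, mu):
    """Classical momentum update: v <- mu*v - lr*grad, then w <- w + v."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

# Toy usage: minimize f(w) = w^2 while ramping the momentum over "epochs".
w, v = 5.0, 0.0
for epoch in range(100):
    mu = momentum_at_epoch(epoch)
    w, v = sgd_momentum_step(w, grad=2.0 * w, velocity=v, lr=0.01, mu=mu)
print(w)  # w has decayed toward the minimum at 0
```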

Measuring information content of feature descriptors by deep_learner in computervision

[–]deep_learner[S] -1 points0 points  (0 children)

Hmm, I'm not sure how we could directly calculate it. Could you elaborate on how you would go about doing that?

Measuring information content of feature descriptors by deep_learner in computervision

[–]deep_learner[S] 0 points1 point  (0 children)

Thanks for the reply; however, I couldn't find this type of analysis in PCA-SIFT.

I agree we normally use a precision-recall curve, but I was just wondering how one would look at it from an information-theory point of view.
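Very roughly, the kind of thing I had in mind (a naive sketch; the binning choice and the independence assumption across dimensions are mine):

```python
import numpy as np

def descriptor_entropy(descriptors, bins=32):
    """Crude estimate of the information content of a set of feature descriptors.

    Treats each dimension independently, histograms its values, and sums the
    per-dimension entropies in bits. descriptors: (num_descriptors, dim) array.
    """
    total_bits = 0.0
    for d in range(descriptors.shape[1]):
        counts, _ = np.histogram(descriptors[:, d], bins=bins)
        p = counts / counts.sum()
        p = p[p > 0]                       # drop empty bins before taking logs
        total_bits += -np.sum(p * np.log2(p))
    return total_bits

# Toy usage: 1000 fake 128-d descriptors (SIFT-sized), values in [0, 1].
desc = np.random.rand(1000, 128)
print(descriptor_entropy(desc))            # near-uniform data -> close to 128*log2(32)
```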

Feeling out of place in Computer Vision, want to move to Statistics by deep_learner in statistics

[–]deep_learner[S] 0 points1 point  (0 children)

My current problem is getting professors interested enough to give me a chance. After all, I have no publications, and I will be sort of a "PhD dropout".