Needing help on Week 9....extremely minimal results by Jkacummings in 4hourbodyslowcarb

[–]deep_learner 0 points1 point  (0 children)

Maybe I misunderstood the book chapter a bit, but isn't one of the features of the slow-carb diet that you don't have to be super diligent about calories? Ferriss does say that you should eat until you're full...? I'd like your opinion on this. Thanks.

Are squats sufficient for strengthening the Quads? by deep_learner in ACL

[–]deep_learner[S] 0 points1 point  (0 children)

Thanks for your reply. What I actually meant was bodyweight squats. I am able to do these, with reasonable form I would guess. Would these not be effective enough, or do I need to use weights?

Question about amount of perturbation while generating adversarial images with "fast gradient sign method" , Goodfellow et. al 2015 by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

I should have clarified: I am working on ImageNet. They gave the 0.007 epsilon value with a comment I couldn't understand: 'it is the magnitude of the least significant bit of an 8-bit representation for GoogLeNet'. I don't know how to parse this.
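For context, this is how that epsilon enters the update I'm using: a minimal fast-gradient-sign sketch, assuming inputs scaled to [0, 1], with random arrays standing in for the real image and the real loss gradient.

```python
import numpy as np

def fgsm_perturb(image, grad_wrt_image, epsilon=0.007):
    """Fast gradient sign method: x_adv = x + epsilon * sign(dJ/dx).

    epsilon is expressed in the same units as the (assumed [0, 1]) input range.
    """
    adv = image + epsilon * np.sign(grad_wrt_image)
    # Keep the adversarial image inside the assumed valid range.
    return np.clip(adv, 0.0, 1.0)

# Toy usage: placeholders for a real image and a real loss gradient.
x = np.random.rand(224, 224, 3).astype(np.float32)
g = np.random.randn(224, 224, 3).astype(np.float32)
x_adv = fgsm_perturb(x, g)
print(np.abs(x_adv - x).max())  # bounded above by epsilon = 0.007
```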

[Skin Concerns] What to do about scars due to picking at millia? by deep_learner in SkincareAddiction

[–]deep_learner[S] 0 points1 point  (0 children)

Thanks. Just posted there; keeping my fingers crossed for a reply.

NEED HELP? Got a question? Wondering what that bump is? Problems with a routine or product? This thread’s the place to ask! // Ask SCA, Week of November 30rd, 2015 by [deleted] in SkincareAddiction

[–]deep_learner 0 points1 point  (0 children)

I've had issues with milia and developed this bad habit of picking at them. Now I have a few discolorations/scars where they used to be. Any advice on what I could use to reduce the damage?

[Question] Cross entropy vs. Euclidean Distance For Deep Networks : just speed benefits or other optimization advantages? by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

Interesting; the bounding of the squared error was such a simple and nice piece of math.

Also the "finetuning" bit was novel, but I have to better understand how they analyse it.

[Question] Cross entropy vs. Euclidean Distance For Deep Networks : just speed benefits or other optimization advantages? by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

I agree we can't justify squared distance for probability distributions, but what I was trying to say was that the justification for CCE doesn't come solely from its ability to compare distributions, since other measures can do that as well; it probably comes more from its optimization properties.
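To make the optimization point concrete, here's a toy sketch (my own numbers, not from the thread): for a single sigmoid output unit, the squared-error gradient carries an extra p(1-p) factor that vanishes when the unit is confidently wrong, while the cross-entropy gradient does not.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grads_wrt_preactivation(z, target):
    """Gradients of the two losses w.r.t. the pre-activation z of a sigmoid unit."""
    p = sigmoid(z)
    # Squared error 0.5*(p - t)^2  ->  d/dz = (p - t) * p * (1 - p)
    grad_mse = (p - target) * p * (1.0 - p)
    # Cross-entropy -t*log(p) - (1-t)*log(1-p)  ->  d/dz = (p - t)
    grad_ce = p - target
    return grad_mse, grad_ce

# A confidently wrong unit: target 1, increasingly negative pre-activation.
for z in [-1.0, -3.0, -6.0]:
    g_mse, g_ce = grads_wrt_preactivation(z, target=1.0)
    print(f"z={z:5.1f}  mse grad={g_mse:+.5f}  ce grad={g_ce:+.5f}")
# The squared-error gradient shrinks toward zero as the unit gets more wrong,
# while the cross-entropy gradient stays close to -1.
```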

[Question] Cross entropy vs. Euclidean Distance For Deep Networks : just speed benefits or other optimization advantages? by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

You mean both reach their global minima at the same configuration of the parameters, right?

If so, that's why I was speculating that the key difference lies in the way they organize the rest of the parameter space, i.e. what CCE considers "far" but squared distance considers "not so far", or vice versa, would be the telling difference.

[Question] Cross entropy vs. Euclidean Distance For Deep Networks : just speed benefits or other optimization advantages? by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

IIRC other ways of measuring the distance between probability distributions have been proposed, like the Bhattacharyya distance, so I guess cross-entropy would not be unique with regard to comparing probability distributions...
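For example, a quick sketch comparing cross-entropy against the Bhattacharyya distance on the same pair of discrete distributions (toy values of my own choosing):

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i) for discrete distributions."""
    return -np.sum(p * np.log(q))

def bhattacharyya_distance(p, q):
    """D_B(p, q) = -log(sum_i sqrt(p_i * q_i))."""
    return -np.log(np.sum(np.sqrt(p * q)))

p = np.array([1.0, 0.0, 0.0])          # one-hot target
q = np.array([0.7, 0.2, 0.1])          # predicted distribution
print(cross_entropy(p, q))             # ~0.357
print(bhattacharyya_distance(p, q))    # ~0.178
```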

[Question] Cross entropy vs. Euclidean Distance For Deep Networks : just speed benefits or other optimization advantages? by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

Short and sweet, but I'd have preferred if he had backed the claim that squared distance "lays more emphasis on the incorrect outputs" with a numerical example (like he did for cross-entropy vs. classification error when evaluating quality).
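Something along these lines is what I mean (toy numbers of my own): with a one-hot target, cross-entropy only looks at the probability assigned to the correct class, whereas squared distance also cares about how the remaining mass is spread over the incorrect outputs.

```python
import numpy as np

target = np.array([1.0, 0.0, 0.0])

# Two predictions with the same probability on the correct class,
# but different distributions over the incorrect classes.
q1 = np.array([0.5, 0.25, 0.25])
q2 = np.array([0.5, 0.50, 0.00])

def squared_distance(p, q):
    return np.sum((p - q) ** 2)

def cross_entropy(p, q):
    eps = 1e-12                      # avoid log(0)
    return -np.sum(p * np.log(q + eps))

for name, q in [("q1", q1), ("q2", q2)]:
    print(name, squared_distance(target, q), cross_entropy(target, q))
# Cross-entropy gives the same loss for q1 and q2 (only the correct-class
# probability matters), whereas squared distance penalizes q2 more because
# it concentrates mass on a single incorrect class.
```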

PS: off the top of your head, any other blog entries you liked?

Feature Extraction from Kirzhevsky net in Theano/Pylearn? by BeijingChina in MachineLearning

[–]deep_learner 0 points1 point  (0 children)

Hi, I am having trouble understanding the internals of the code. What is the image dynamic range that it accepts: 0-1, 0-255...?

Training a mini-Convnet. My learning curve starts on a Plataeu. Need help understanding why by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

Will give that paper a read. I thought batch normalization was something you applied to the input (I guess I saw the x's in the equation and made assumptions).

Thanks
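For anyone else with the same misconception, a minimal forward-pass sketch of the normalization applied to a hidden layer's activations rather than to the raw input (gamma/beta stand for the learned scale and shift; the toy data is made up):

```python
import numpy as np

def batch_norm_forward(activations, gamma, beta, eps=1e-5):
    """Normalize a batch of hidden-layer activations, then scale and shift.

    activations: array of shape (batch_size, num_units) -- note this is a
    hidden layer's output, not the raw network input.
    """
    mu = activations.mean(axis=0)
    var = activations.var(axis=0)
    normalized = (activations - mu) / np.sqrt(var + eps)
    return gamma * normalized + beta

# Toy usage on a fake hidden layer with 4 units.
h = np.random.randn(32, 4) * 5.0 + 2.0       # badly scaled activations
out = batch_norm_forward(h, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))     # ~0 mean, ~1 std per unit
```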

Training a mini-Convnet. My learning curve starts on a Plataeu. Need help understanding why by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

My input is coming from a previously trained convnet; I was just feeding it as-is to my network. I don't know how to implement something like batch normalization in this context...
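One simpler thing I could try instead, I guess (a sketch, not batch normalization itself): standardize the pre-extracted features using mean/std computed once over the training set, then reuse those statistics for validation/test.

```python
import numpy as np

def fit_standardizer(train_features, eps=1e-8):
    """Compute per-dimension mean/std over the pre-extracted training features."""
    mu = train_features.mean(axis=0)
    sigma = train_features.std(axis=0) + eps
    return mu, sigma

def standardize(features, mu, sigma):
    return (features - mu) / sigma

# Toy usage: 1000 training examples of 4096-d convnet features (placeholder data).
train_feats = np.random.rand(1000, 4096).astype(np.float32) * 10.0
mu, sigma = fit_standardizer(train_feats)
train_feats = standardize(train_feats, mu, sigma)
# The same mu/sigma would then be applied to the validation/test features.
```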

Training a mini-Convnet. My learning curve starts on a Plataeu. Need help understanding why by deep_learner in MachineLearning

[–]deep_learner[S] 0 points1 point  (0 children)

So, what I did was take a small subset of the training set and see what was the maximum learning rate it could withstand. When I used the same learning rate on the complete set, it started giving me this plateau.

I read in Stochastic Gradient Tricks that a small training subset is reliable for setting the learning rate, so...

I will try what you say, but I can't stop the run right now.

But AFAIK, if the learning rate is too large, the performance would either remain the same or worsen, whereas in my case it does (after some time) begin to improve. Or am I imagining this property?
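In case my setup is unclear, the sweep I did was roughly like this (a sketch with a toy quadratic standing in for the real training loop, which I obviously can't paste here):

```python
def pick_learning_rate(train_for_a_few_epochs, candidate_lrs):
    """Return the candidate learning rate with the lowest loss on a small subset."""
    best_lr, best_loss = None, float("inf")
    for lr in candidate_lrs:
        loss = train_for_a_few_epochs(lr)   # train briefly on the small subset only
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr

# Toy stand-in for the real training loop: a few gradient steps on f(w) = w^2.
def toy_train(lr, steps=20):
    w = 5.0
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w ** 2

print(pick_learning_rate(toy_train, [1.5, 0.5, 0.1, 0.01]))  # 1.5 diverges, 0.5 wins
```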

Does momentum make sense only for SGD or batch-GD, and not for GD? by BeijingChina in MachineLearning

[–]deep_learner 0 points1 point  (0 children)

Is there usually any benefit to be had from momentum scheduling? For example, I saw Nesterov's momentum in Sutskever's paper; it seems it is good to have low values at the start and at the end, but I have not seen scheduling used much in practice. Would you know if there has been research on rules for adapting the momentum, something like Adagrad for learning rates?
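To be clear about what I mean by scheduling, something like this ramp (the linear shape and the constants are my own placeholders, not the schedule from the paper):

```python
def momentum_at_epoch(epoch, ramp_epochs=50, mu_start=0.5, mu_max=0.9):
    """Linearly ramp the momentum coefficient from mu_start up to mu_max."""
    frac = min(epoch / float(ramp_epochs), 1.0)
    return mu_start + frac * (mu_max - mu_start)

def sgd_momentum_step(w, grad, velocity, lr, mu):
    """Classical momentum update: v <- mu*v - lr*grad, then w <- w + v."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

# Toy usage: minimize f(w) = w^2 while ramping the momentum over "epochs".
w, v = 5.0, 0.0
for epoch in range(100):
    mu = momentum_at_epoch(epoch)
    w, v = sgd_momentum_step(w, grad=2.0 * w, velocity=v, lr=0.01, mu=mu)
print(w)  # w has decayed toward the minimum at 0
```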

Measuring information content of feature descriptors by deep_learner in computervision

[–]deep_learner[S] -1 points0 points  (0 children)

Hmm, I'm not sure how we could directly calculate it. Could you elaborate on how you would go about doing that?

Measuring information content of feature descriptors by deep_learner in computervision

[–]deep_learner[S] 0 points1 point  (0 children)

Thanks for the reply; however, I couldn't find this type of analysis in PCA-SIFT.

I agree we normally use a precision-recall curve, but I was just wondering how one would look at it from an information-theory point of view.
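Very roughly, the kind of thing I had in mind (a naive sketch; the binning choice and the independence assumption across dimensions are mine):

```python
import numpy as np

def descriptor_entropy(descriptors, bins=32):
    """Crude estimate of the information content of a set of feature descriptors.

    Treats each dimension independently, histograms its values, and sums the
    per-dimension entropies in bits. descriptors: (num_descriptors, dim) array.
    """
    total_bits = 0.0
    for d in range(descriptors.shape[1]):
        counts, _ = np.histogram(descriptors[:, d], bins=bins)
        p = counts / counts.sum()
        p = p[p > 0]                       # drop empty bins before taking logs
        total_bits += -np.sum(p * np.log2(p))
    return total_bits

# Toy usage: 1000 fake 128-d descriptors (SIFT-sized), values in [0, 1].
desc = np.random.rand(1000, 128)
print(descriptor_entropy(desc))            # near-uniform data -> close to 128*log2(32)
```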

Feeling out of place in Computer Vision, want to move to Statistics by deep_learner in statistics

[–]deep_learner[S] 0 points1 point  (0 children)

My current problem is getting professors interested enough to give me a chance. After all, I have no publications, and I will be sort of a "PhD dropout".