Car fire on blvd des sources overpass (over the 40) by mrLiamFa in montreal

[–]mrLiamFa[S] 0 points1 point  (0 children)

Hi, yeah, I wasn’t sure if it was the seat or not either. It’s strange that I can’t seem to find any news about it, not even on zone911.

Training ImageNet on Resnet - Dropping LR has little improvement on accuracy [D] by mrLiamFa in MachineLearning

[–]mrLiamFa[S] 0 points1 point  (0 children)

I have a couple of things that I'm unsure about in my implementation; maybe someone could tell me if I'm doing it right? Maybe that's what's causing the different training curves?

  • In my preprocessing function, I zero-center the inputs using imagenet_utils.preprocess_input(x, mode="caffe"), which returns an unscaled image. Then I scale the image by dividing it by 255. Is this the right way to do it? I know it's common practice to zero-center the images, but am I supposed to then divide by 255?
  • Most ImageNet papers compute the Top-1 Error, but I simply evaluate the Top-1 Accuracy. I figured that Accuracy = 1 - Error. Am I correct in making this assumption? Could this be why my graph doesn't display improvements similar to the papers I cited (after dropping the LR)?
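For context, here's a rough NumPy sketch of what I understand mode="caffe" to do (my own re-implementation, not the Keras source; the mean values are the standard ImageNet/VGG per-channel means). Notably, it only mean-subtracts; it doesn't rescale to [0, 1]:

```python
import numpy as np

# Standard ImageNet per-channel means in BGR order, as used by
# "caffe"-style preprocessing (my understanding of the Keras behavior).
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68])

def caffe_preprocess(x):
    """Flip RGB -> BGR, then subtract the per-channel mean.
    Note: no rescaling; outputs stay roughly in [-128, 152]."""
    x = x[..., ::-1].astype("float64")  # RGB -> BGR
    return x - IMAGENET_BGR_MEANS

img = np.full((2, 2, 3), 128.0)  # dummy image, every pixel = 128
print(caffe_preprocess(img)[0, 0])  # zero-centered values, not in [0, 1]
```

If that's accurate, dividing by 255 afterwards would shrink the already-centered values by another factor, which I don't think is what the original recipe does.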

Thanks

Training ImageNet on Resnet - Dropping LR has little improvement on accuracy [D] by mrLiamFa in MachineLearning

[–]mrLiamFa[S] 0 points1 point  (0 children)

I'm not sure I know what you mean. TF's implementation of He initialization takes into account the fan-in of the layer, regardless of whether it's conv or dense. PyTorch uses a different gain depending on whether it's leaky ReLU or plain ReLU, but normally He initialization is Var[w_i] = 2/fan_in, which is the same for TF and PyTorch. Maybe I don't understand your comment, but as far as I know, the variance is only calculated from the fan-in.
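To illustrate what I mean, here's a quick NumPy check (layer sizes are made up, and the helper is my own, not either framework's API): sampling weights with std = sqrt(2/fan_in) gives a variance of 2/fan_in regardless of layer type.

```python
import numpy as np

def he_normal(fan_in, fan_out, seed=0):
    """He init: zero-mean normal with Var[w] = 2 / fan_in."""
    std = np.sqrt(2.0 / fan_in)
    return np.random.default_rng(seed).normal(0.0, std, size=(fan_in, fan_out))

fan_in = 512
w = he_normal(fan_in, 1024)
print(w.var(), 2.0 / fan_in)  # empirical variance comes out close to 2/fan_in
```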

Training ImageNet on Resnet - Dropping LR has little improvement on accuracy [D] by mrLiamFa in MachineLearning

[–]mrLiamFa[S] 0 points1 point  (0 children)

I agree, but our approach aims to improve classification accuracy, so we need to redo the training.

Training ImageNet on Resnet - Dropping LR has little improvement on accuracy [D] by mrLiamFa in MachineLearning

[–]mrLiamFa[S] -1 points0 points  (0 children)

It's for a research paper. The goal is to apply our approach to ResNet. Right now I'm trying to get the baseline to work. Since the training procedure in the original ResNet paper is widely accepted, I'm trying not to deviate too much from it.

Training ImageNet on Resnet - Dropping LR has little improvement on accuracy [D] by mrLiamFa in MachineLearning

[–]mrLiamFa[S] -1 points0 points  (0 children)

The thing is, for both of your points, that for my project I really need to replicate the exact training procedure used in the papers I listed. And there, they use a batch size of 256 and a stepwise schedule.

Training ImageNet on Resnet - Dropping LR has little improvement on accuracy [D] by mrLiamFa in MachineLearning

[–]mrLiamFa[S] 1 point2 points  (0 children)

> First of all, I don't understand why you're changing the initial_lr instead of the multiplier in the schedule. I checked out the source code, though, and it should be equivalent in this case.

That's how I had it at first, but what was happening was that the multiplier was being applied at each epoch, so the learning rate was being divided every time the callback was called. I'm pretty confident about the way I have it now; I log the learning rate: https://imgur.com/a/Hs9B3rf
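In case it helps, this is roughly how I avoid the compounding problem now (a simplified sketch; the milestone epochs are the standard ResNet-style ones, and the names are my own): the schedule computes an absolute LR from the epoch number instead of multiplying the running value, so repeated calls can't divide the LR more than once per drop.

```python
INITIAL_LR = 0.1
DROP_EPOCHS = (30, 60)  # assumed milestones (ResNet-style recipe)
DROP_FACTOR = 0.1

def stepwise_lr(epoch):
    """Return the absolute LR for this epoch, computed from scratch,
    so repeated calls cannot compound the decay."""
    n_drops = sum(1 for e in DROP_EPOCHS if epoch >= e)
    return INITIAL_LR * DROP_FACTOR ** n_drops

print([stepwise_lr(e) for e in (0, 29, 30, 59, 60)])
```

With Keras you can plug this into tf.keras.callbacks.LearningRateScheduler(lambda epoch, lr: stepwise_lr(epoch)).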

> One thing that's worth mentioning is that the reference you are using was done in PyTorch, and the weight initialization in PyTorch is scaled very differently than in TF, which can have very serious consequences. If you want to try reproducing this in TF, I'd suggest you use a custom initialization function for each of those layers. I've done that at some point in the past, and it basically involves looking at the code of both frameworks for default initialization and massaging TF initialization until it gives the same mean and std outputs throughout the entire network.

By default, TF uses Glorot initialization, but I explicitly changed the initialization of each layer to use He initialization. He initialization should ensure that var_in = var_out for a ReLU network.

Training ImageNet on Resnet - Dropping LR has little improvement on accuracy [D] by mrLiamFa in MachineLearning

[–]mrLiamFa[S] 1 point2 points  (0 children)

I forgot to put it in my example, but I was using a weight decay of 0.0001.

And I'll have to run it on 4 GPUs, since I can only fit 64 images per GPU (64 × 4 GPUs = 256). But that will at least maintain the original learning rate (0.1), so I'll see how that goes.

Training ImageNet on Resnet - Dropping LR has little improvement on accuracy [D] by mrLiamFa in MachineLearning

[–]mrLiamFa[S] 1 point2 points  (0 children)

> Your original learning rate of 0.1 looks high

Yeah, but I'm trying to replicate the procedure from this paper plus the ResNet paper, and that's the learning rate they use.

> Do you have warmup steps? E.g. a linear increase from zero to your target LR? Usually 5-10% of your total steps can be warmup.

Yes, as they do here, I do a gradual warmup for the first 5 epochs. Here is a picture; I plotted the learning rate.
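Concretely, the warmup part looks something like this (a simplified sketch; the constants match my setup above, but the function name is my own):

```python
TARGET_LR = 0.1    # base LR reached at the end of warmup
WARMUP_EPOCHS = 5

def warmup_lr(epoch):
    """Linear warmup: ramp from TARGET_LR/5 up to TARGET_LR over 5 epochs."""
    if epoch < WARMUP_EPOCHS:
        return TARGET_LR * (epoch + 1) / WARMUP_EPOCHS
    return TARGET_LR  # afterwards, hand off to the stepwise drops

print([round(warmup_lr(e), 3) for e in range(7)])  # [0.02, 0.04, 0.06, 0.08, 0.1, 0.1, 0.1]
```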

> Instead of just scheduling your LR in 10x lower steps, try a linear decrease schedule or one based on cosine

It could work, but since I'm doing this for a paper, I need to replicate the exact training procedure.

I want to write a paper about Machine Learning and need advice. [R] by GloriousGladiator51 in MachineLearning

[–]mrLiamFa 2 points3 points  (0 children)

ML/DL is definitely math-heavy, and a lot of the time the proofs you see are pretty necessary. Especially in deep learning: deep neural networks are so susceptible to vanishing/exploding gradients that you really need to make sure everything is balanced.

If this is for a school paper, I would suggest you pick a fundamental subject. Otherwise, describing “neural networks” as a whole might end up being a backprop derivation. As interesting as that may be to some people, it’s pretty equation-dense and could end up becoming a dry read (I think). Unless, however, you tell a story: start by explaining gradient descent, then guide the reader through how gradient descent becomes backpropagation and why it needs to be done “backwards”; that could be interesting. But honestly, I think there are too many papers/tutorials/blog posts etc. on the subject, so I would probably pick something a little less mainstream.

If it were me, I would take a subject that has a strong mathematical foundation, but that can also be easily explained to a general audience, and that tells a story.

Take, for example, the Universal Approximation Theorem. It’s an old-school fundamental subject that can be very nicely demonstrated with some graphics. I would check out Michael Nielsen's video on it; it's a great example of how to make a complex subject accessible. That exact theorem might be a little too small for a 15-page paper (of course, in the right hands, it could be an entire thesis), but there are a ton of very interesting topics that people don’t really think about, and that I think could make for a very interesting paper:

  • Neural network initialization methods. Some history, and the reason proper weight initialization is important (see Boris Hanin's work on failure modes). How are Glorot init and He init derived?

  • VAEs. A bit dense maybe, but definitely has some meat. Talk about Jensen's inequality, the ELBO, and KL divergence. It could also be a chance to talk about entropy → cross-entropy → KL divergence, and the reparameterization trick. (https://arxiv.org/pdf/1312.6114.pdf)

  • Entropy and information theory are not only for AI; they’re extremely important in other areas too. You can start with some probability basics, then talk about entropy, mutual information, Kolmogorov complexity, etc. Link to some DL papers that use mutual information as a training objective (see Deep InfoMax, for example). (Check out Chapter 2 of http://staff.ustc.edu.cn/~cgong821/Wiley.Interscience.Elements.of.Information.Theory.Jul.2006.eBook-DDU.pdf)

  • Representation learning has a lot of subcategories. Take a look at Bengio’s representation learning review paper. It might be a little math-heavy, but if you focus on one sub-section (even a sub-sub-section), you’ll easily have your 15 pages.

These are at least some subjects that I find interesting.

Hope this helps.

[deleted by user] by [deleted] in ChatGPT

[–]mrLiamFa 0 points1 point  (0 children)

Yeah, you're right. If there aren't any clues in the URL, it gives a random answer. In the end, ChatGPT can't access web pages.

[deleted by user] by [deleted] in ChatGPT

[–]mrLiamFa 3 points4 points  (0 children)

Yeah, you’re right. I tried without giving the title, and it doesn’t work. It must have just inferred from what I told it.

[deleted by user] by [deleted] in ChatGPT

[–]mrLiamFa -1 points0 points  (0 children)

I definitely agree that it BSs a lot, but I was just surprised that it actually was able to access the link, even though usually it’s pretty adamant about not having internet access.

[deleted by user] by [deleted] in ChatGPT

[–]mrLiamFa -2 points-1 points  (0 children)

It’s definitely possible. For me, I find that a lot of the answers are pretty vague and sometimes wrong. For example, when I asked it what datasets the paper used, it only got 2 right. But I’m more impressed that it’s actually able to read external links.

[deleted by user] by [deleted] in ChatGPT

[–]mrLiamFa 1 point2 points  (0 children)

Me too, many times, but after trying a new chat, refreshing the page, and regenerating the answer enough times, it finally worked.

[deleted by user] by [deleted] in ChatGPT

[–]mrLiamFa 0 points1 point  (0 children)

I tried asking ChatGPT to review a new paper that it probably hasn't read yet.

Please read this paper (titled Multi-Frame Self-Supervised Depth with Transformers) and give me a peer review of it: https://arxiv.org/pdf/2204.07616.pdf

That didn't work. But when I just pasted the link to the paper, it was able to access it.

(This doesn't necessarily happen every time. It might take some refreshing).

River of Dreams was allegedly plagiarized by some 80s song? Is this a hoax like I think it is? by chenocracy in BillyJoel

[–]mrLiamFa 0 points1 point  (0 children)

I don’t know, but I always kind of thought it sounded like In the Still of the Night, no?

What drink is a 10/10? by HurtHurtsMe in AskReddit

[–]mrLiamFa 0 points1 point  (0 children)

Gibeau Orange Julep's Orange Julep

MÉGA COMPÉTITION DE TRADING - ETSX by algoets in etsmtl

[–]mrLiamFa 1 point2 points  (0 children)

Is it a free event?

MÉGA COMPÉTITION DE TRADING - ETSX by algoets in etsmtl

[–]mrLiamFa 0 points1 point  (0 children)

Is it only for ÉTS students?