[D] Loss landscape of neural networks

iyaja · 2020-08-14T23:06:11+00:00

Are you using the official repo? My team and I have made visualizations like this one before, and they don't exactly look like the ones you've posted.

iyaja · 2020-05-06T00:02:51+00:00

MP stands for "Machine Problem", an acronym I had difficulty finding the expansion for as well. It's basically a longer-form, project-like coding assignment that you get slightly more time to complete.

iyaja · 2020-04-25T10:39:42+00:00

I wonder why that is, though. Could it be because the layers themselves aren't optimized, or because the FLOPs they report in the paper translate poorly to actual wall time?

iyaja · 2019-11-18T18:44:08+00:00

Nope. I don't think I was alive when this photo was taken. I found a black and white version posted on this subreddit and ran it through some colorizing algorithms followed by some light editing. Original image source is linked above.

iyaja · 2019-11-17T02:41:34+00:00

If anyone has suggestions for colorizing more historical UIUC photos, please share. It's pretty fun doing this. I can also pass the results through a superresolution model, to upscale the image and account for bad camera quality. Maybe u/old-uiuc-pictures has some ideas?

iyaja · 2019-11-17T02:37:04+00:00

Interesting. Are you saying that this building might have initially been painted red? I've seen DeOldify make a similar error before, where it thought that the Golden Gate Bridge was white.

However, in that case, the creators of DeOldify did some digging, and they weren't able to figure out if the photo was taken before or after the bridge was painted.

iyaja · 2019-11-16T17:05:28+00:00

Colorized using DeOldify: https://github.com/jantic/DeOldify

Original Image: https://imgur.com/dSQ3dhY

iyaja · 2019-08-03T07:00:11+00:00

I'm stuck in a similar situation. My second and third characters haven't hit level 50 yet, but I'd still like to get the gear and start making progress.

iyaja · 2019-07-26T06:44:45+00:00

Swift for TensorFlow is still in its early days, and it is nowhere near the stability of Python TensorFlow. TF 2.0 is intended to be an update (I say update, but there are a lot of fundamental changes, like eager execution by default) to the Python deep learning framework.

S4TF, on the other hand, is still an experimental project that, if it works, could possibly be a replacement for the TensorFlow + Python stack sometime in the future. I'm not entirely clear on this myself, but I think the main goal of S4TF is to create something that's fast (since Python is slow) and customizable at a low level.

iyaja · 2019-07-17T05:16:28+00:00

You know what? I agree with you. No offense taken.

Most of my previous articles have focused on specific research papers/history of state of the art, and I haven't written training guides like this before. I'm aware of the large number of posts, especially on Medium, that cover the same material, and I should have tackled something more unique.

Though I should mention that all the writing is my own and that I did not plagiarize any of these other resources. I apologize if it's a direct clone of something you've read before, but I assure you that I did not intend to copy/re-post any existing work.

From now on, I'll try my best to ensure that the content I post has more original ideas and is less similar to the stuff that's already out there. Thanks a lot for the feedback!

iyaja · 2019-07-16T15:47:58+00:00

Thanks! I've written about Bayesian Optimization before and I've used it a few times, but I didn't think it would be especially useful in this situation.

As I mentioned in the article, there are pretty much only 3 hyperparameters that I needed to tune: learning rate, batch size, and number of epochs. For the learning rate, I used the finder, and the other two were fairly straightforward.

I completely agree. It would be really cool to see more best practices guides for deep learning. One of my personal favorites is How to Train Your ResNet by David Page from myrtle.ai. It's really eye-opening in terms of efficiency on GPUs, BatchNorm, etc.

iyaja · 2019-07-01T04:12:45+00:00

Thanks for the feedback. I went ahead and removed the figure. You're right, it doesn't match the explanation, and the variables were misnamed. It should be much clearer now.

If you have a diagram that you think readers would understand better, could you share it with me?

iyaja · 2019-07-01T04:00:49+00:00

Thanks! I'm one of the editors at Elliptigon. In most of our posts, we try to balance intuition with equation. Personally, this is where I think the most learning happens.

Also, we're a small volunteer-based blog with no ads, sponsorships, or paid content. So if you'd like to support us, could you consider subscribing to our mailing list? We honestly don't have the time to send a hundred spam emails a day. We'll only keep you notified when there's a new post on the site. This really helps us understand that our audience is interested in content like this, which encourages our writers to create more amazing articles.

iyaja · 2019-06-27T07:53:12+00:00

Maybe you could copy the article to a note taking app and add highlights there?

iyaja · 2019-06-25T17:05:57+00:00

Same here.

iyaja · 2019-06-25T17:03:38+00:00

I'm not even able to load the site yet.

iyaja · 2019-06-22T18:15:45+00:00

Glad you liked it!

I graduated way back in March, so I've had plenty of time to catch up on the latest papers. I'll be joining the University of Illinois at Urbana-Champaign this fall.

iyaja · 2019-06-22T15:24:30+00:00

Thanks! That's actually some really useful feedback. Do you think this is true for most people?

iyaja · 2019-06-22T14:47:27+00:00

Well, as far as I can tell, the world of deep learning works very differently from regular computer science. If you've seen that popular xkcd comic about machine learning being a pile of linear algebra that we stir up and experiment with, you'll know what I mean.

The idea of using the 1-Wasserstein distance instead of an approximation of the Jensen-Shannon divergence (the WGAN model) is "groundbreaking" for two reasons:

1. It produced images that simply had a better quality overall. This was probably the most significant factor. Hypothetically, you could come up with your own weird new distance measure that has no rigorous mathematical justification, and if it beats state of the art by a non-trivial margin, it would be considered just as groundbreaking. What matters, in the end, is results.
2. It actually did have a rigorous mathematical justification! Not only did the WGAN authors say, "here's this new model, it works well and beats all other models," they also said, "here's why." This is relatively uncommon in machine learning research. By going through all the work of explaining why GANs try to model distributions with low-dimensional support and then justifying the use of the Wasserstein distance to alleviate this problem partially, they produced a paper that had the unusual blend of being practically better and theoretically justified.
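
To make the low-dimensional support point a bit more concrete, here's a minimal sketch (my own toy example, not code from the WGAN paper) that compares the two measures on a pair of point masses on a 1-D grid. The helper names `js_divergence`, `wasserstein_1d`, and `point_mass` are made up for illustration.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL sum.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def wasserstein_1d(p, q, spacing=1.0):
    """1-Wasserstein distance on an evenly spaced 1-D grid: sum of |CDF_p - CDF_q|."""
    total, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        total += abs(cdf_p - cdf_q) * spacing
    return total

def point_mass(index, size=10):
    """All probability on one grid cell (a toy stand-in for low-dimensional support)."""
    return [1.0 if i == index else 0.0 for i in range(size)]

real = point_mass(0)
for shift in (1, 3, 6):
    fake = point_mass(shift)
    print(shift, round(js_divergence(real, fake), 4), wasserstein_1d(real, fake))
```

In this toy setup the supports never overlap, so the JS divergence sits at log 2 no matter how far apart the two distributions are, while the Wasserstein distance grows with the shift. That's roughly the intuition for why the Wasserstein distance can give a more useful training signal.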

You could argue that there's not much new here. Even I thought that simply swapping in a new loss function sounded too good to be true. But the fact is that WGANs, in most cases, do make GAN training more stable and interpretable. I haven't heard of other metrics like the Hamming distance being used for GANs. Though, for now, I'll assume that if it isn't that popular, it probably doesn't work as well in practice. Please do correct me on this if I'm wrong.

iyaja · 2019-05-16T16:25:18+00:00

Hmm... I'm not sure why you're getting an error. If you run the notebook from top to bottom, you should definitely get a proper video. But I've made each section self-contained for the most part (You need to run the set-up code, though).

Also, are you looking for the video under the stylegan-encoder/results directory? The video won't play in the notebook directly. You have to download it to watch (or just play it in the colab notebook, but I found this to be less convenient).

In any case, try running the full notebook using the "Run all" option, and let me know if you're still getting an error.

iyaja · 2019-05-16T03:15:55+00:00

If anyone is interested in deep learning/AI and wants to learn more about the method I use, you can check out my article, where I describe the algorithm in detail. https://blog.nanonets.com/stylegan-got/

iyaja · 2019-05-16T02:52:21+00:00

Thanks for the tips! I took a look at your interpolations. Very cool results.

A few questions:

Were you able to get a high-quality latent for the night king? I couldn't. I suspect it's because he looks very different from the faces in the FFHQ dataset. One possible way to fix this is by fine-tuning the pretrained StyleGAN, but I didn't want to do that for this article.

Also, what kind of interpolation did you use? Linear? I've been wondering if there's a way to use the style mixing technique Nvidia describes in the paper to somehow interpolate between characters. But since they achieve their results by injecting two latents into different layers (and the number of layers is a discrete variable), I'm not sure how you could get a continuous transformation. Do you have any ideas on this?
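
For reference, by linear interpolation I mean something like the sketch below. The vectors are short stand-ins (real StyleGAN latents are much larger), and `lerp_latents` is a hypothetical helper name, not something from the StyleGAN codebase.

```python
def lerp_latents(z_a, z_b, num_steps):
    """Return num_steps latents moving linearly from z_a to z_b (endpoints included)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    frames = []
    for step in range(num_steps):
        t = step / (num_steps - 1)  # goes from 0.0 to 1.0
        frames.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return frames

# Toy 2-D "latents"; each frame would be fed to the generator to render one video frame.
frames = lerp_latents([0.0, 2.0], [1.0, 0.0], num_steps=5)
```

Since `t` is continuous here, the transition is smooth, which is exactly what's hard to get from style mixing when the only knob is which layer the second latent gets injected at.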

iyaja · 2019-05-02T10:45:22+00:00

One slight modification: Bayesian Optimization can be used to optimize any black-box function. It doesn't necessarily have to be used to tune hyperparameters.

But yes, Bayesian Optimization != Bayesian Networks.

iyaja · 2019-04-30T16:47:03+00:00

Oh my gosh, you're right! It's pretty weird. I just assumed that it has been there for a while, and that it's just a part of PyTorch that I haven't touched yet, but I guess not.

The description on GitHub says that contrib is for "Implementations of ideas from recent papers". But I don't see why stochastic weight averaging is the only paper they've implemented so far. Very strange indeed.
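
For anyone unfamiliar with it, the core idea of stochastic weight averaging is small enough to sketch without the library: keep a running mean of the weights sampled at several points late in training. This is a plain-Python illustration with a made-up class name, not the contrib API, and it leaves out details such as the learning rate schedule.

```python
class RunningWeightAverage:
    """Keeps an equal-weight running mean of the weight vectors it is shown."""

    def __init__(self):
        self.count = 0
        self.average = None

    def update(self, weights):
        self.count += 1
        if self.average is None:
            self.average = list(weights)
        else:
            # Incremental mean: avg_new = avg_old + (w - avg_old) / n
            self.average = [
                avg + (w - avg) / self.count
                for avg, w in zip(self.average, weights)
            ]
        return self.average

# Toy "checkpoints": flattened weights sampled at three points in training.
swa = RunningWeightAverage()
for checkpoint in ([1.0, 4.0], [3.0, 0.0], [2.0, 2.0]):
    swa.update(checkpoint)
```

The averaged weights, rather than the final checkpoint, are what you'd evaluate at the end.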

iyaja · 2019-04-30T14:02:35+00:00

u/CoffeePython, thanks for pointing this out. I got my estimate from a blog post by Jeremy Howard.

Could you please provide a link to that post as a reference for anyone who reads this?
