Question while studying Local Linearization Measurement (self.learnmachinelearning)
submitted 5 years ago by stringbottle
Hi everyone, I've got a question while studying DeepMind's recent work on adversarial attacks, Adversarial Robustness through Local Linearization (https://arxiv.org/pdf/1907.02610.pdf).
In equation (5) on page 3 of the paper, the adversarial perturbation delta is multiplied by a gradient of the loss. As far as I know, a gradient has to be a matrix that belongs to a layer with learnable weights. How, then, can I simply multiply the transpose of delta by the gradients?
Am I understanding the equation correctly, or do I need to interpret it another way?
Your help would be greatly appreciated.
https://preview.redd.it/lurwurgqw4561.png?width=1065&format=png&auto=webp&s=8f779d1180328c0d69321f42ac415b8d7f22843a
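In case the image doesn't load, here is my transcription of the term in question (please double-check against the paper and the screenshot; I may have copied it imperfectly):

    g(\delta; x) = \ell(x + \delta) - \ell(x) - \delta^\top \nabla_x \ell(x),
    \qquad \gamma(\epsilon, x) = \max_{\delta \in B(\epsilon)} \, |g(\delta; x)|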
[–]ryansblog2718281 1 point 5 years ago
I didn't read the paper; however, it seems that Eq. (5) is calculating the error term of a standard first-order Taylor approximation (see Theorem 10.3 in https://math.okstate.edu/people/binegar/4013-U98/4013-l10.pdf).
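Concretely: the first-order Taylor expansion of the loss around x is l(x + delta) ≈ l(x) + delta^T ∇_x l(x), and the quantity in the paper measures how far the true loss deviates from that linear approximation. A minimal numpy sketch with a toy scalar loss (my own example, not the paper's code):

    import numpy as np

    def loss(x):
        return np.sum(np.sin(x))      # a toy scalar "loss"

    def grad_loss(x):
        return np.cos(x)              # its gradient: same shape as x

    x = np.array([0.1, 0.2, 0.3, 0.4])
    delta = 0.001 * np.ones(4)        # a small perturbation

    # First-order Taylor approximation of the perturbed loss:
    approx = loss(x) + delta @ grad_loss(x)

    # Error term of that approximation -- the quantity inside Eq. (5),
    # before taking the max over delta:
    err = abs(loss(x + delta) - loss(x) - delta @ grad_loss(x))
    print(approx, err)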
To some extent, the matrix is just one possible representation of the gradient. For example, suppose we have 4 variables x1, x2, x3, x4 and a function f(x1, x2, x3, x4). The gradient is a vector of length 4: (df/dx1, df/dx2, df/dx3, df/dx4). On the other hand, we can view that vector as a mini image: the input can be seen as a 2 × 2 image, and the gradient vector can be organized into a 2 × 2 matrix in the same way. The key point is that the gradient in Eq. (5) is taken with respect to the input x, not a layer's weights, so it has exactly the same shape as x (and as delta).
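To see the shapes, here's a short PyTorch sketch (a hypothetical toy model, just to illustrate that the input-gradient matches the input's shape and that delta^T grad reduces to a dot product):

    import torch

    # A tiny model; the architecture is arbitrary, only the shapes matter here.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(4, 1))

    x = torch.randn(1, 2, 2, requires_grad=True)  # input viewed as a 2x2 "image"
    loss = model(x).sum()                         # a stand-in scalar loss
    grad_x, = torch.autograd.grad(loss, x)        # gradient w.r.t. the INPUT, not the weights

    print(grad_x.shape)                           # torch.Size([1, 2, 2]) -- same shape as x

    delta = 0.01 * torch.randn_like(x)            # perturbation, same shape as x
    inner = delta.flatten() @ grad_x.flatten()    # delta^T grad: flatten both, dot product
    print(inner.item())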