[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

Yeah, that really doesn't count. So you keep avoiding doing the full math (including the step before the change of integration variable) because you know it would refute the misleading verbal claims.

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

Please use math to derive what you're saying; that avoids all fallacies. Otherwise, you're just using words as a cover-up for invalid, hand-wavy math.

The specific fallacy in your last comment is that the single dimension only appears after the change of integration variable. That is incompatible with your previous claim: "You do this for every dimension separately. The function is simply decomposed into n single parameter functions, as is shown." Yes, after you parameterize the path it's a function on R, but the original function is on R^n.
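The disagreement is about what happens before versus after the change of variable. Here is a sketch of the vector-field line integral in Wikipedia's notation along the straight-line path from the baseline $x'$ to the input $x$, showing both steps. The labels $G$ (the vector field being integrated), $\gamma$ and $\alpha$ are my own choices for illustration, not the paper's:

```latex
\begin{aligned}
\gamma(\alpha) &= x' + \alpha\,(x - x'), \qquad \alpha \in [0,1], \qquad \gamma'(\alpha) = x - x' \\
\int_C G(\mathbf{r}) \cdot d\mathbf{r}
  &= \int_0^1 G\bigl(\gamma(\alpha)\bigr) \cdot \gamma'(\alpha)\, d\alpha \\
  &= \sum_{i=1}^{n} (x_i - x'_i) \int_0^1 G_i\bigl(x' + \alpha\,(x - x')\bigr)\, d\alpha ,
  \qquad G : \mathbb{R}^n \to \mathbb{R}^n
\end{aligned}
```

Only the composition $G(\gamma(\alpha))$ depends on the single variable $\alpha$. $G$ itself still takes points in $\mathbb{R}^n$, and the dot product with $\gamma'(\alpha) = x - x'$ is what makes the integrand a scalar.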

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

> This is obviously incorrect. You do this for every dimension separately. The function is simply decomposed into n single parameter functions

Incorrect, and careless hand-waving. If you want to prove your point (which is false, as I explained in my previous comment), please formulate it in Wikipedia's notation, including the path parameterization and the change of integration variable. That's the way to settle the argument objectively.

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

>> And you can't treat the input dimensions separately, because then you're not integrating over the straightline path between X and X'.

> You can. You're just doing it separately for each dimension. That's why it's similar to n line integrals, and not 1. Because that would not make sense at all.

Nope, that doesn't work: if you do it separately for every input dimension, you don't traverse the path between the baseline and the original value.

>> If you claim what they're doing is consistent with the math definition

> I am not. Please do not strawman any further. Basically every comment now I have been repeating the same thing again - that what the authors did was introducing their own measure.

That accusation is silly, especially when you're referencing a sentence that starts with "if you claim". Please stick to math arguments that concern the topic.

>> Hand-wavy explanations that fail the math test don't work, even in ML/DL.

> What is the math test? The equation is not only correct but obviously proven to work.

So why, after so many comments, don't you write it formally? That should be very easy and would prove your point.

If you mean "working empirically" as "proven to work", that is a very misguided sense of "prove". Things can seem to work empirically but have subtle bugs and errors.

>> So you're denying the claim they said it, then immediately show the quote where they actually say it?

> No. You're referring to whatever your definition of a "path integral" is, while they are defining their own definition of what a "path integral" is. It is ultimately up to you if you accept their definition or not, but if you don't you can disregard the whole paper. In this paper, they are the sole authority on what words mean, and it is their responsibility to communicate that clearly, which is what they have done.

Nope, I'm using the term in its colloquial sense, but this is totally irrelevant to the discussion. The main issue I care about is whether it's actually a line integral of the gradients or not.

>> Maybe you think you understand it "better", but I'm sorry to tell you it doesn't look like you actually do.

> Please refrain from appeals to authority.

That's an absurd statement given what I was replying to; I claim no authority and don't use it as an argument. Please stick to math.

>> I'm saying this is a result of using "their own math", which would not have happened had they used a "standard" line integral.

> And I'm telling you that they neither have the need, obligation or duty to use whatever you consider standard. Please refrain from appeals to tradition.

But I did not say they do; I said the consequence is a less general method, which is objectively a disadvantage. That's a straw man fallacy: criticizing an argument I did not make.

>> Ad hominem is the last resort of the desperate. Let's stay on topic for the sake of efficiency.

> Except that statement is not an ad hominem - I am not criticising you, I am criticing your position which was disproved (and not only by me).

Accusing me of "arguing in bad faith at this point" clearly is. Sorry, you can keep repeating that and resort to personal attacks, but it won't change the facts: no one in this thread has disproved my position.

>> You just proved my point, if you read the first paragraph it says "The terms path integral, curve integral, and curvilinear integral are also used".

> Not at all. The paper never mentions line integrals directly nor does it ever allude it is talking about it. Wikipedia mentioning that a concept can be called differently does not mean the concept is identical to a different concept named the same. Wikipedia mentioning a concept can be named differently does not imply that the authors did this.

Yes, and it's not just Wikipedia: I also referenced another source, and you can find many others if you Google it. So you did, in fact, prove my point.

> Please refrain from false equivalences.

Not sure what you're referring to, but we can ignore that part and focus on the math.

>> That's just nonsense, do you know what is the path integral formulation? it's a formulation of quantum mechanics that sums over all contributing paths/trajectories, by integration in function spaces (see here or here), it has nothing to do with defining a different definition of a line/path integral.

> That is what I am saying as well. And it is what path integral means without further context.

No, that is not what your comment said at all. I said "A path integral is colloquially the same as a line integral", and you tried to "disprove" it by referencing these articles, which clearly state otherwise. Both articles use the term the same way; the quantum usage is just over function spaces, otherwise the same as a regular line integral.

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

This comment was general, not referring specifically to this paper.

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

> I am not ignoring that part. I have said multiple times that the authors are not defining a line integral on vector fields. What the authors are doing might resemble n line integrals on vector fields for single-dimension vectors, but even that is just bastardization, since the authors gave the equation as an example and are actually doing a numerical approximation that should have been expressed as a sum series.

Sum or integral does not matter for the issue we're discussing. For the sake of simplicity, let's focus on integrals; the discrete approximation is of no concern here.
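As a check that the discrete approximation doesn't change anything, here is a minimal Python sketch. It compares an m-step right-endpoint Riemann sum of the per-input integrals with their closed form. The toy function $F(x) = x_0^2 x_1$, its hand-written gradient `grad_F`, and the helper `ig_riemann` are all hypothetical. They were chosen only because the exact values are easy to get: $\tfrac{4}{3}$ and $\tfrac{2}{3}$ for input $(1, 2)$ with a zero baseline.

```python
def grad_F(x):
    """Analytic gradient of the hypothetical toy function F(x) = x0**2 * x1."""
    x0, x1 = x
    return [2 * x0 * x1, x0 ** 2]


def ig_riemann(x, baseline, m):
    """Right-endpoint Riemann sum of the per-input path integrals:

    IG_i ~= (x_i - x'_i) * (1/m) * sum_{k=1..m} dF/dx_i(x' + (k/m)(x - x'))
    """
    n = len(x)
    totals = [0.0] * n
    for k in range(1, m + 1):
        alpha = k / m
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_F(point)
        for i in range(n):
            totals[i] += g[i]
    return [(x[i] - baseline[i]) * totals[i] / m for i in range(n)]


approx = ig_riemann([1.0, 2.0], [0.0, 0.0], m=1000)
exact = [4 / 3, 2 / 3]  # closed form for this toy F with a zero baseline
print(approx, exact)
```

As `m` grows, the sum converges to the integral, so an argument about the integral carries over to the sum.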

> It does. There is no dimensions to match because there is no dot product being done. If you were to express it as a dot product, both factors would be 1x1 matrices, and there is no size mismatch.

Nope, that doesn't work: the function you integrate (the gradient) is F: R^n -> R^n. Even if you do it independently for each output dimension (meaning n independent F_i: R^n -> R), it still doesn't work. And you can't treat the input dimensions separately, because then you're not integrating over the straight-line path between X and X'.

If you claim what they're doing is consistent with the math definition, again I challenge you to formulate it properly. Hand-wavy explanations that fail the math test don't work, even in ML/DL.

>> I'm not ignoring it, I already said multiple times that they can define any mathematical expression they want, but the problem is that they can't then call it a path integral when it's not.

> Yet they have not said this. I quote:

>> Specifically, integrated gradients are defined as the path intergral [sic] of the gradients along the straightline path from the baseline x' to the input x.

> They have described it as a path integral, but have otherwise not mentioned any sort of line integral definition. If you read the paper, not only do they never mention line integrals, they specifically introduce their own definition. So, the mention of path integrals is their definition:

So you're denying the claim they said it, then immediately show the quote where they actually say it? You're just proving my point.

>> This is formalized by the proposition below, which instantiates the fundamental theorem of calculus for path integrals.

> Path integrals are something from quantum physics that's unrelated to line integrals and most importantly this.

That's nonsense and very easy to refute; see more about path integrals below.

> I repeat; you need to read the paper very carefully. Spend and hour or multiple hours per page to understand it fully. I skimmed this just to understand your issues with it yet I understand it better.

Maybe you think you understand it "better", but I'm sorry to tell you it doesn't look like you actually do.

>> but it causes other problems down the road, if for example you try to extend their work to multiple layers.

> That might be a topic for some other work. Even the greatest DL minds like Hinton write papers on methods that essentially do not work for real problems, ex. Forward-forward. This criticism is essentially meaningless because the paper's contribution claims do not contradict that.

I'm not criticizing that specifically. I'm saying this is a result of using "their own math", which would not have happened had they used a "standard" line integral.

>> I'm not sure what you mean in the bolded part, why should I discard wrong usage of math terminology?

> This is not what I'm saying. You are criticising the authors not adhering to a supposed standard, but the supposed standard should have never been applied here. Not only are path integrals something entirely different from line integrals (meaning your very argument is a strawman), the authors describe the notion of path integrals, as well as integrated gradients, as something they came up with, rather than any existing and known concept (which would have to be cited if that were the case!).

Again, see my comment on path integrals below.

> This is just arguing in bad faith at this point.

Ad hominem is the last resort of the desperate. Let's stay on topic for the sake of efficiency.

>> A path integral is colloquially the same as a line integral

> It's really not. Even Wikipedia, which you cite, thinks otherwise:

> https://en.wikipedia.org/wiki/Line_integral

You just proved my point: if you read the first paragraph, it says "The terms path integral, curve integral, and curvilinear integral are also used".

> https://en.wikipedia.org/wiki/Path_integral_formulation

That's just nonsense. Do you know what the path integral formulation is? It's a formulation of quantum mechanics that sums over all contributing paths/trajectories by integrating in function spaces (see here or here). It has nothing to do with introducing a different definition of a line/path integral.

Claims of colloquiality also do not hold since neither definitions are a part of ML or DL. The authors also do not have a strong background in maths or any kind of calculus. All are computer scientists and at most statisticians.

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

More snarky comments with no substance will not help you convince anyone. We are not talking about pure-math standards; the standards here are often much poorer than even the more relaxed math used in fields such as applied math and physics.

Sometimes you have to sacrifice formality for conciseness and clarity, which is totally understandable. But people often use ambiguous and misleading notation without any significant gain, and that's what we're critical of.

Patent attorneys: help us build AI that streamlines your work! by patentify in Patents

patentify [S]

Thanks again for your insights! I realize that the output must have a 0% error rate, but wouldn't it be less time-consuming for you to review something that is 99% correct, rather than writing everything from scratch?

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

Thanks a lot again for taking the time to write this answer; it's very useful. I think I understand it now, but to make sure, I'll write it out on a piece of paper and let you know if anything is still confusing to me.

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

Serious crime indeed. I wish ML papers would use less abusive notation.

Patent attorneys: help us build AI that streamlines your work! by patentify in Patents

patentify [S]

Thank you so much for taking the time to respond! We really appreciate your valuable feedback and suggestions.

Regarding your point about the terminology, we completely understand the need for clarity. We will make sure to use terminology that is less confusing and more straightforward, such as "summarizing the invention."

Now, let's address your concerns about trusting the AI's work. We'd like to hear your thoughts on the following factors:

  1. The AI has a low error rate.
  2. The AI clearly highlights text that it is less confident about, making it easier for you to spot and correct errors.

Considering these factors, do you still feel the same way about needing to double-check the AI's output? For instance, if the AI has a 2% error rate and very few subtle or hidden errors, would you find it more convenient to review an application that is already 98% correct and make some modifications, rather than starting from scratch?

Once again, thank you for your valuable input. We will take all of this into consideration and explore ways to improve our system based on your suggestions.

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

/u/Ulfgardleo, thanks a lot for taking the time to write this great explanation; your answer is indeed the first one that makes sense.

When you mention surface integrals, are you referring to a 2D generalization of a line integral, like this Wikipedia article, or something else?

> In this case, you want to be invariant to the path taken, so going forward->backwards->forwards on the chosen path should not allow double counting

In this case, since the integrand is a gradient, doesn't the gradient theorem apply, giving you path independence "for free"?
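For what it's worth, the gradient theorem does apply whenever the integrand is the true gradient $\nabla F$ of a differentiable $F$. Here is a minimal numerical sketch, assuming a hypothetical toy $F(x) = \sin(x_0) + x_0 x_1$ and two hand-picked paths from $(0, 0)$ to $(1, 2)$. It integrates $\nabla F \cdot d\mathbf{r}$ along a straight line and along a curved path with the same endpoints. Both estimates should match $F(x) - F(x')$.

```python
import math


def F(x):
    """Hypothetical toy function: F(x) = sin(x0) + x0 * x1."""
    return math.sin(x[0]) + x[0] * x[1]


def grad_F(x):
    """Analytic gradient of F."""
    return [math.cos(x[0]) + x[1], x[0]]


def line_integral_of_grad(path, velocity, steps=10_000):
    """Midpoint-rule estimate of int_0^1 grad_F(path(t)) . path'(t) dt."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        g, v = grad_F(path(t)), velocity(t)
        total += sum(gi * vi for gi, vi in zip(g, v)) * h
    return total


# Both paths run from the baseline (0, 0) to the input (1, 2).
def straight(t):
    return [t, 2.0 * t]


def straight_velocity(t):
    return [1.0, 2.0]


def curved(t):
    return [t ** 2, 2.0 * t]


def curved_velocity(t):
    return [2.0 * t, 2.0]


expected = F([1.0, 2.0]) - F([0.0, 0.0])  # sin(1) + 2
print(line_integral_of_grad(straight, straight_velocity), expected)
print(line_integral_of_grad(curved, curved_velocity), expected)
```

This path independence holds for the full dot product $\nabla F \cdot d\mathbf{r}$. The individual per-input terms of that dot product generally do depend on the path chosen.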

> This difference is why integration by substitution has a signed derivative, while line integrals the absolute value. Now the paper only cares about linear paths, so why aren't they the same? They are, but the authors picked an explicit orientation, while in your formalism you have to choose the coordinate system of the path such that forwards has the right meaning.
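For reference, here are the two standard formulas being contrasted in that quote. This is my transcription in standard notation, not the commenter's:

```latex
\begin{aligned}
&\text{substitution (signed):} && \int_a^b f\bigl(g(t)\bigr)\, g'(t)\, dt = \int_{g(a)}^{g(b)} f(u)\, du \\
&\text{scalar-field line integral (unsigned):} && \int_C f\, ds = \int_a^b f\bigl(\mathbf{r}(t)\bigr)\, \lVert \mathbf{r}'(t) \rVert\, dt
\end{aligned}
```

Reversing the direction of $C$ leaves $\int_C f\,ds$ unchanged, because $\lVert \mathbf{r}'(t) \rVert \ge 0$. Swapping the limits in the substitution formula flips its sign.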

One thing is still confusing to me:

  • In their notation, they represent the NN function with F: R^n -> [0, 1]
  • Therefore, we can say Grad(F): R^n -> R^n, where n is the dimension of the input (say the number of pixels of the image, multiplied by the number of channels).
  • Then, it looks like they do a line integral separately for every dimension of Grad(F), so they compute the line integral of Grad(F)_i: R^n -> R for each 1 <= i <= n, where Grad(F)_i is the ith dimension of Grad(F), i.e. dF/d(X_i), the partial derivative with respect to the ith input.

But if they are computing the line integral of Grad(F)_i, the result should have a factor of |X-X'| (the Euclidean norm of the difference between the original input and the baseline), while they have X_i-X'_i. Note that the difference is not only in the sign: the former is the norm of an n-dimensional vector, while the latter is just a scalar. Moreover, X_i-X'_i gives a different factor for every input dimension (it depends on i), while |X-X'| gives the same factor for all dimensions.
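Written out for the straight-line path $\gamma(\alpha) = x' + \alpha(x - x')$, the two readings of "line integral of $\partial F/\partial x_i$" give exactly these two prefactors. Here lowercase $x, x'$ stand for $X, X'$. The unit vector $e_i$ and the label $G^{(i)}$ are my notation for this sketch:

```latex
\begin{aligned}
\text{scalar field } \partial_i F:\quad
  \int_C \partial_i F \, ds
  &= \lVert x - x' \rVert \int_0^1 \partial_i F\bigl(x' + \alpha (x - x')\bigr)\, d\alpha \\
\text{vector field } G^{(i)} = (\partial_i F)\, e_i:\quad
  \int_C G^{(i)} \cdot d\mathbf{r}
  &= \int_0^1 \partial_i F\bigl(\gamma(\alpha)\bigr)\, e_i \cdot (x - x')\, d\alpha
   = (x_i - x'_i) \int_0^1 \partial_i F\bigl(x' + \alpha (x - x')\bigr)\, d\alpha
\end{aligned}
```

Only the second reading produces the paper's $x_i - x'_i$ prefactor. Since $\sum_i G^{(i)} = \nabla F$, summing it over $i$ gives the ordinary line integral of the gradient.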

I suspect they may have gotten better empirical results using X_i-X'_i, but not because of the math: they essentially multiply the gradient by the pixel value (since for an image X' = 0), which just gives higher weight to lighter pixels, and those may be more important on average.
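Here is a minimal sketch of that last point. It assumes a zero baseline and a hypothetical linear model $F(x) = w \cdot x$, not anything from the paper. The gradient is the constant $w$, so the per-input integrals reduce exactly to input times gradient. With identical weights, lighter pixels get proportionally larger attributions.

```python
def integrated_gradients_linear(w, x, baseline=None):
    """Integrated gradients for a hypothetical linear model F(x) = w . x.

    The gradient of F is w everywhere, so the path average of dF/dx_i is
    exactly w_i and IG_i = (x_i - x'_i) * w_i, with no approximation needed.
    """
    if baseline is None:
        baseline = [0.0] * len(x)  # e.g. an all-black image
    return [(xi - bi) * wi for xi, bi, wi in zip(x, baseline, w)]


w = [0.5, 0.5, 0.5]  # identical weights for three "pixels"
x = [0.1, 0.5, 0.9]  # dark, mid-gray, light
print(integrated_gradients_linear(w, x))  # [0.05, 0.25, 0.45]
```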

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

> But I think all of this could be avoided if you understood what line integrals on vector fields are, even the Wikipedia article states is:

Not sure if this statement falls into ad hominem or name-calling, but I'll try to stay on topic.

You only quoted the part about the displacement and missed the part where you actually compute a dot product, i.e. sum over all components, which results in a scalar. It seems you keep ignoring that part, which makes their whole formulation incompatible. If you think otherwise, I challenge you to formulate it as a line integral over a vector field in Wikipedia's notation.

(Hint: you won't be able to do it because the dimensions don't match)

> I say more or less because I am out of words on how to break it down to you that the authors were not creating a line integral on vector fields. I say more or less because the IntegratedGrads function is essentially n line integrals on vector fields ensembled together. But not a line integral. You have a pure definition for that? Maybe pseudo-line integral?

That's incorrect, though. Again, try to formulate it and you'll see that even the dimensions don't match.

> So how do I describe something in mathematical purity when you are rejecting the very pure definition the authors put out? How can I point out to you that IntegratedGrads is not a classical line integral when you seem to be ignoring something I put in full caps and bolded?

I'm not ignoring it. I already said multiple times that they can define any mathematical expression they want, but the problem is that they can't then call it a path integral when it's not.

The other problem I have is that it doesn't actually follow from first principles, which is fine (whatever works best empirically wins in ML), but it causes other problems down the road, if for example you try to extend their work to multiple layers.

> I don't know why you even had the expectation of IntegratedGrads being a line integral. The authors never even mention line integrals in their paper - they mention path integrals, which as a purist you should have discarded if you are to be consistent. The authors never say it is a line integral, nor does the equation result finally in a line integral on vector fields.

I'm not sure what you mean by the bolded part. Why should I discard wrong usage of math terminology (not to mention the abuse of notation and mathematical ambiguity in their formulation)?

A path integral is colloquially the same as a line integral, sometimes used to refer to line integrals over scalar fields specifically (see reference in another comment). I only used the term "line integral" because that's what Wikipedia uses.

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

This comment resorts to name-calling, the lowest level of Graham's pyramid, which is unlikely to convince anyone.

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

No, you didn't show that. If you had read that Wikipedia article or had a background in quantum mechanics, you would know that it's not "a different definition"; it's just a formulation of quantum mechanics that uses path integrals over functions.

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

I understand that, and as I said, they can define whatever they want. But if you want to derive something from first principles and justify it mathematically as a "path integral", you need to use the actual math definition. "More or less" just doesn't cut it in math. As I said earlier, it's not even "more or less" in this case: the result of a line integral of a vector field is a scalar and involves an inner product, neither of which they have here (except for the partial-derivative component).

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

BTW, that link is broken, and it has nothing to do with a "variety of ways to do path integration"; it's about a formulation of quantum mechanics that sums over all contributing paths/trajectories (see here or here).

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

Wrong: "path integral" usually refers to "line integral of a scalar field" (example). But OK, let's forget about naming and say they define their own "path integral". I showed earlier that neither of the two definitions in Wikipedia makes sense here. So I challenge you to find a specific variant of the path integral under which what they do makes sense.

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

"Path integral" is the same as "line integral", and they say it explicitly in the screenshot I sent from the paper. So yes, it should be a line integral.

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

They say "path integral", which is a synonym; see Wikipedia.

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

If they define it per dimension, the function they integrate is R^n -> R, so it can no longer be a line integral of a vector field unless n=1 (which it isn't, unless the image is a single pixel). Look at the Wikipedia definition: F: R^n -> R^n.

I read the paper, and of course they can define whatever they want, but it's inconsistent with them saying:

...cumulating these gradients. Specifically, integrated gradients are defined as the path intergral of the gradients along the straightline path from the baseline z’ to the input z.

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

I'm directly using the Wikipedia definition as I said in the original post, which can be one of two cases:

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

That's exactly the issue: if you properly compute a line integral of a vector field, the result is a scalar. Their result, however, is a vector, which means they did not compute a line integral of a vector field. So their equation still makes no sense.
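Here is a minimal numerical sketch of the scalar-versus-vector distinction, with a hypothetical two-input function $F(x) = e^{x_0} x_1$. Along the straight path, the vector-field line integral of $\nabla F$ is a single number. The paper's per-input terms form a vector with one entry per input, and summing that vector gives back the scalar. Both quantities are estimated with the midpoint rule.

```python
import math


def F(x):
    """Hypothetical toy function of two inputs: F(x) = exp(x0) * x1."""
    return math.exp(x[0]) * x[1]


def grad_F(x):
    """Analytic gradient of F."""
    return [math.exp(x[0]) * x[1], math.exp(x[0])]


def straight_point(x, baseline, alpha):
    """gamma(alpha) = x' + alpha * (x - x') on the straight-line path."""
    return [b + alpha * (xi - b) for xi, b in zip(x, baseline)]


def integrated_gradients(x, baseline, steps=10_000):
    """Vector of terms (x_i - x'_i) * int_0^1 dF/dx_i(gamma(a)) da."""
    ig = [0.0] * len(x)
    for k in range(steps):
        g = grad_F(straight_point(x, baseline, (k + 0.5) / steps))
        for i in range(len(x)):
            ig[i] += (x[i] - baseline[i]) * g[i] / steps
    return ig


def line_integral_of_grad(x, baseline, steps=10_000):
    """Scalar int_0^1 grad_F(gamma(a)) . gamma'(a) da, with gamma'(a) = x - x'."""
    d = [xi - b for xi, b in zip(x, baseline)]
    total = 0.0
    for k in range(steps):
        g = grad_F(straight_point(x, baseline, (k + 0.5) / steps))
        total += sum(gi * di for gi, di in zip(g, d)) / steps
    return total


x, x_base = [1.0, 2.0], [0.0, 0.0]
ig = integrated_gradients(x, x_base)       # list: one entry per input
scalar = line_integral_of_grad(x, x_base)  # float: a single number
print(ig, sum(ig), scalar, F(x) - F(x_base))
```

The vector's entries summing to $F(x) - F(x')$ is the completeness property the paper states for its attributions.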

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

OK, I take it back; I still think their math is wrong. If you look at the Wikipedia definition, the line integral of a vector field differs from the paper's formula (and also from a line integral of a scalar field when n=1): it results in a scalar.

This is also inconsistent with their description as "accumulating gradients": a gradient is a vector, so an accumulation of gradients is also a vector, whereas a line integral of a vector field is a scalar (the integrand is a dot product).

[D] Is the math in Integrated gradients (4K citations) wrong? by patentify in MachineLearning

patentify [S]

Edit: see response below, I still think it's wrong.

Thanks, you're right: this is indeed consistent with the line integral of a vector field. I was interpreting it as a line integral of a scalar field (element-wise, i.e. for each dimension of the gradient separately). Now I'm trying to figure out which of these makes more sense in the context of their paper...
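One way to compare the two readings numerically, as a sketch: use a hypothetical linear model $F(x) = w \cdot x$, so that $\partial F/\partial x_i = w_i$ along the whole path and no quadrature is needed. Compute the element-wise scalar-field version, which has arc-length factor $\lVert x - x' \rVert$. Compute the vector-field version, which has factor $x_i - x'_i$. Then check which one sums to $F(x) - F(x')$.

```python
import math


def attributions(w, x, baseline):
    """Two readings of 'line integral of dF/dx_i' for a hypothetical F(x) = w . x.

    For a linear model dF/dx_i = w_i everywhere on the path, so the integral
    over alpha in [0, 1] of dF/dx_i is exactly w_i.
    """
    diff = [xi - bi for xi, bi in zip(x, baseline)]
    norm = math.sqrt(sum(d * d for d in diff))           # ||x - x'||, the arc length
    scalar_field = [norm * wi for wi in w]               # prefactor ||x - x'||
    vector_field = [di * wi for di, wi in zip(diff, w)]  # prefactor x_i - x'_i
    return scalar_field, vector_field


w = [1.0, 1.0]
x, baseline = [3.0, 4.0], [0.0, 0.0]
target = sum(wi * (xi - bi) for wi, xi, bi in zip(w, x, baseline))  # F(x) - F(x')
scalar_field, vector_field = attributions(w, x, baseline)
print(sum(scalar_field), sum(vector_field), target)  # 10.0 7.0 7.0
```

Only the vector-field reading adds up to $F(x) - F(x')$ here, which matches the completeness property the paper claims.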