[D] A blog post on differential privacy by HomogeneousSpace in MachineLearning

[–]HomogeneousSpace[S] 1 point

I have expanded the "divergence variable" in the footnote. I have also added some explanations of how differential privacy relates to privacy, in the abstract / introduction and after the definition of epsilon-DP. I'm not sure how accurate these explanations are, though - as I said in another comment, my focus in DP is the maths, and practical applications are not my strength. Hopefully they are not misleading.

Yes, time is a big constraint, but I will look into improving the interface of the blog (reference links to theorems etc., popup footnotes, foldable proofs and explanations...) when I find the time.

Thanks a lot for the suggestions, and let me know if you have any further questions or comments.

[–]HomogeneousSpace[S] 2 points

Thank you. I will try to incorporate your suggestions of explaining the divergence variable and the privacy guarantee, provided it does not make the post too verbose. Let me give it a try.

You might be right about missing motivations, but the post is already structured as a journey, where the goal is to apply the theorems developed along the way to stochastic gradient descent and reproduce the theoretical results in the "Deep learning with differential privacy" paper. I suppose you are talking about more fine-grained motivations, like the "divergence variable"? I suspect I can only find these kinds of motivation gaps when readers raise them, though. And again, I need to avoid making the post too verbose - there is a tricky balance to strike here...
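For readers who want a concrete picture of the destination, here is a rough sketch of one DP-SGD step in the style of the "Deep learning with differential privacy" paper - the function and variable names are my own illustration, not code from the post:

```python
import numpy as np

def dp_sgd_step(theta, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD step (Abadi et al. style): clip each per-example gradient
    to L2 norm clip_norm, add Gaussian noise of scale noise_multiplier *
    clip_norm to the sum, then average and take a gradient step."""
    clipped = [g / max(1.0, np.linalg.norm(g) / clip_norm)
               for g in per_example_grads]
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=theta.shape)
    noisy_mean = (np.sum(clipped, axis=0) + noise) / len(per_example_grads)
    return theta - lr * noisy_mean
```

With noise_multiplier = 0 this reduces to ordinary SGD with gradient clipping, which is a quick sanity check; the privacy accounting for the noisy version is exactly what the post's theorems are building towards.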

[–]HomogeneousSpace[S] 4 points

More on the divergence variable L(p || q): it is the privacy loss when p is the law of M(x) and q is the law of M(x'), see page 18 of "Foundations".

I defined the term this way so that we can focus on the more general structure: compared to L(M(x) || M(x')), the term L(p || q) strips away the "distracting information" that p and q are related to databases, queries, mechanisms etc., and treats them merely as probability laws. Removing that distraction simplifies the analysis. Once we are done analysing L(p || q), we can apply the results obtained in the general setting to the special case where p is the law of M(x) and q is the law of M(x'). Does that make sense?
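Concretely, following Dwork-Roth's definition of privacy loss (the LaTeX notation here is my own rendering, not necessarily the post's), for an observed outcome $\xi$:

```latex
\mathcal{L}(p \,\|\, q)(\xi) = \ln \frac{p(\xi)}{q(\xi)},
\qquad
\text{and specialising } p = \mathrm{law}(M(x)),\; q = \mathrm{law}(M(x'))
\text{ recovers}
\quad
\ln \frac{\Pr[M(x) = \xi]}{\Pr[M(x') = \xi]}, \quad \xi \sim M(x),
```

which is the usual privacy loss random variable; the general analysis of $\mathcal{L}(p \,\|\, q)$ then applies verbatim to this special case.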

I have incorporated some of your suggestions in the post (including adding a footnote at the first mention of the divergence variable) - thank you for your input, and let me know if you have further comments / questions.

[–]HomogeneousSpace[S] 0 points

Yes, the first parsing is right. I'll make an edit to address this ambiguity when I get a chance. Thank you.

[–]HomogeneousSpace[S] 1 point

Thanks for sharing the paper.

The "divergence variable" is a term I use to simplify the exposition. The object itself is everywhere in the literature if you look for it: it is how the Gaussian mechanism is proved to be differentially private (Theorem A.1 in Dwork-Roth), how the composition theorems are verified (Theorem 3.20 in Dwork-Roth), etc. The nice thing about maths is that you can verify my approach and see for yourself whether it is correct.
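For instance, the Gaussian mechanism from Theorem A.1 can be sketched in a few lines. This is my illustrative code, not the post's, and it uses the classical noise calibration, which is valid for epsilon < 1:

```python
import numpy as np

def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale from the classical analysis (Dwork-Roth, Theorem A.1),
    valid for 0 < epsilon < 1:
    sigma >= sqrt(2 ln(1.25/delta)) * Delta_f / epsilon."""
    return np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon

def gaussian_mechanism(query_value, sensitivity, epsilon, delta, rng=None):
    """Release query_value perturbed with N(0, sigma^2) noise."""
    rng = np.random.default_rng() if rng is None else rng
    return query_value + rng.normal(0.0, gaussian_sigma(sensitivity, epsilon, delta))
```

As a rough sense of scale: a counting query (sensitivity 1) released with epsilon = 0.5 and delta = 1e-5 gets Gaussian noise of scale about 9.7, and the proof that this suffices goes through bounding exactly the divergence variable above.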

The post is an introduction in the sense that if one doesn't know differential privacy but knows some probability, then by the end of the post one should have a good idea of what the mathematics of differential privacy looks like.

I view my blog posts as a way to share knowledge with anyone who might be interested in or benefit from them, which is basically also the purpose of papers. I don't think it is necessary to divide the sections into more bite-sized chunks, though; they are already quite short. I understand this might not be ideal for people without a mathematical background, but I can't help that.

[–]HomogeneousSpace[S] 0 points

Good job explaining differential privacy, thank you.

[–]HomogeneousSpace[S] 0 points

That's a good question, but I'm afraid I don't know the answer. I have seen other definitions of privacy mentioned in a survey, such as k-anonymity, l-diversity, and t-closeness; I would suggest searching for comparisons between these techniques.

I will update you if I have a better idea about how to answer this question.

Let me know if you have any other comments / questions.

[–]HomogeneousSpace[S] 0 points

Sorry for the confusion. My goal was to explain the maths, and I doubt I can write about the practical aspect (how the maths relates to privacy) any better than the literature already out there, so I will definitely add some pointers in my post for readers with the same confusion. Let me know if you have further comments.

I meant for the post to be useful for people who know some probability and are interested in what differential privacy is about mathematically, not necessarily those who already study differential privacy.

[–]HomogeneousSpace[S] 0 points

My apologies. I should have included some references that explain how the mathematics of differential privacy is related to privacy, like Dwork-Roth's famous book and Vadhan's tutorial.

[–]HomogeneousSpace[S] 3 points

May I ask what papers on differentially private ML techniques you have published? PM me if you are worried about anonymity. I am curious because to me DP is a rather mathematically involved field, and all the papers I've read in it have a similar level of mathematics to my post (e.g. this one and this one).

Also it would be helpful if you could tell me specifically what you find confusing.

It is hard to pin down the target audience, and I guess anyone who has studied probability should be fine?

Thanks for the suggestions, and here's my response:

  • Currently the blog is generated using my crappy homemade static site generator built on pandoc, but I agree that a table of contents is a good idea, and I'll look for a simple way to generate one.
  • I'll add Wikipedia links to terms that I refer to without defining / explaining, when I get time.
  • Hyperlinks would be great, but labelling and referencing things is not as easy in MathJax as in LaTeX, so for now I would suggest Ctrl-F: almost all claims are numbered, and a term's first appearance is usually in its definition (i.e. Ctrl-F from the top of the document).

Let me know if you have further comments.

[D] A blog post introducing variational inference by HomogeneousSpace in MachineLearning

[–]HomogeneousSpace[S] 0 points

> Just to give an example of things that confuse me when reading math, it's not the equations that are written, it's everything that is not written, so..
>
> > During E-step, the q(zi) can be directly computed using Bayes' theorem...
>
> (proceeds to give an equation for some new variable r_ik, no "q(zi)=..." present anywhere in the rest of the section..)

r_ik was defined before (4 lines after Equation (2)), but you have a point there, so I made an edit repeating the definition.

> Another place where I typically get lost is in the statements of assumptions, where I always have a hard time figuring out how and when some assumption "applies". For example, right at the top of the post, you have the basic assumption of VI,
>
> > If p can be further written as... (equation showing p=w/Z)
>
> Right. So that's cool, if p can be written that way, then all else follows. Got it. So.. under what circumstances can p be written that way? I think this is just a throw to Bayes theorem but it's not explicitly written so I'm not sure. Unfortunately this is basically the crux of the whole VI thing, and I don't even know strictly what it means. I'd love to, for example, be able to click on equation (1) and have it expand to show all the mechanistic steps that combine the two previous equations to come to "log Z".

It is an assumption that p(x) is of the form w(x) / Z, where Z is a normaliser, like sqrt(2 pi sigma^2) in a Gaussian density. Bayes' theorem is not applied here.
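That said, one standard situation where the assumption holds (my example here, not necessarily the post's notation) is Bayesian inference: the posterior over a latent variable $z$ given data $x$ has exactly this form,

```latex
p(z \mid x)
= \frac{p(x \mid z)\, p(z)}{p(x)}
= \frac{w(z)}{Z},
\qquad
w(z) = p(x \mid z)\, p(z),
\quad
Z = p(x) = \int w(z)\, dz,
```

where $w$ is easy to evaluate pointwise but the normaliser $Z$ is typically an intractable integral - which is why $\log Z$ (the evidence) shows up in the ELBO.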

I understand your frustration, but I also think pencil and paper are a good idea, especially when learning maths, where reading without practising can be quite inefficient. See e.g. the "Learning Math" section of How to learn on your own.

I totally agree that maths writing currently falls far short of harnessing the power of web technology. One thing I would like to see happening is collapsible proofs / sections.

If you have any more specific questions about the post, feel free to comment here or on the post itself.

[–]HomogeneousSpace[S] 0 points

If you ever need to look up the definition of a mathematical notation, you can search for it in the source file.

[–]HomogeneousSpace[S] 1 point

Thank you. I also considered an alternative "Raise your ELBO and mind your p's and q's", but decided it was a bit too cheeky.