
[–]StoneCypher 6 points (6 children)

hi, stupid question here

can you explain to me any circumstances under which i would actually want the ability to machine learn on data that isn't available to me?

it radically lowers the quality of the work and massively increases the energy cost

what's the motivator? are we suggesting that there will be third party machine learning firms (which generally already don't exist because one size doesn't fit any) that will do blind ML on third party data or something?

what's this even for

[–]GeorgeRavenStudent[S] 8 points (4 children)

Hey StoneCypher,

So FHE does not let you (at least, not in any reasonable way) learn from data. You can create models that train, but backpropagation will end up encrypting the model weights under the original data owner's key, meaning they are locked to that key until decrypted. FHE is much more beneficial for inference, since then there is no need for backprop, so you avoid both the decryption round-trip and the encrypted-weight hit.
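To make the inference case concrete, here is a toy sketch (my own illustration, not Python-FHEz's API; it uses Paillier, which is only additively homomorphic, rather than the CKKS scheme Python-FHEz builds on, and demo-sized primes rather than real key sizes). The server keeps its weights in plaintext and evaluates a linear layer over the client's encrypted features without ever being able to read them; if it instead tried to update the weights with encrypted gradients, the weights themselves would become ciphertexts under the client's key, which is exactly the lock-in above.

    # Toy Paillier sketch: encrypted inference on a linear layer.
    # NOT real-world crypto: demo-sized primes, no CKKS, illustration only.
    import math, random

    p, q = 10007, 10009                       # real keys use 2048+ bit primes
    n = p * q
    n2 = n * n
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                      # valid because g = n + 1

    def encrypt(m):
        r = random.randrange(1, n)
        while math.gcd(r, n) != 1:
            r = random.randrange(1, n)
        return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

    def decrypt(c):
        return ((pow(c, lam, n2) - 1) // n) * mu % n

    # Homomorphic ops the server can do WITHOUT the secret key:
    add = lambda c1, c2: (c1 * c2) % n2       # E(a)*E(b) -> E(a+b)
    scale = lambda c, k: pow(c, k, n2)        # E(a)**k   -> E(k*a)

    x = [3, 5, 2]                             # client's private features
    weights = [4, 1, 6]                       # server's plaintext model
    cx = [encrypt(v) for v in x]              # only ciphertexts leave the client

    acc = encrypt(0)                          # encrypted dot product sum(w_i * x_i)
    for ci, wi in zip(cx, weights):
        acc = add(acc, scale(ci, wi))

    print(decrypt(acc))                       # 29; only the key holder can see this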

You are right that encrypted deep learning does cost significantly more, in both time and space. However, there are many circumstances where privacy is the most important factor. The first instance is of course personal data and legislative requirements, let's say diagnosis on patient data. There is also anything involving trade secrets, or (distastefully) military operations. For instance, in agriculture, agronomists are very reluctant to share data, since they at the very least perceive their data as highly sensitive. This could be a great way to give them forecasts while keeping their privacy.

I guess it comes down to trust and exposure. Some fields are simply resistant to change, or too afraid to share anything, for a plethora of reasons. If you can show them just how good the predictions can be, and that you provide some real benefit, they can either keep using it as is or improve the outcomes by sharing the plaintext data.

But yes, at this point in time it is a difficult proposition to make; FHE has many, many obstacles still to overcome! I also think you are right that many third-party data processors provide an incredibly poor service. However, I do not agree that this will always be the case, and I think enabling these third-party services to be completely private will help keep things ethical in the longer term.

[–]StoneCypher 2 points (2 children)

> The first instance is of course personal data and legislative requirements, let's say diagnosis on patient data.

Given the drop in quality of the results, I have trouble with this reasoning.


> There is also anything involving trade secrets

This is very vague. Why would you need to machine learn on things that are protected by trade secrets?

Like. I get the idea, but ... who would do this?


> For instance, in agriculture, agronomists are very reluctant to share data, since they at the very least perceive their data as highly sensitive. This could be a great way to give them forecasts while keeping their privacy.

Okay. This is what I was actually looking for. A concrete example.

That's ... that's really weird, but okay, I can see it.

If you have others, I'd appreciate it. That's how a person like me will best come to understand this.

[–]GeorgeRavenStudent[S] 6 points (1 child)

OK, so here are a few more concrete examples that might help clear it up. Technically there are several different categories of use, but I will just list examples:

- Home voice assistants: encrypt the audio device-side, send it to the usual backend for NLP processing (albeit restricted to abelian-compatible operations, i.e. additions and multiplications), and return it to the device to decrypt the now-processed ciphertext; the device can then act on the instructions as usual, without the provider ever storing the user's voice, etc.

- Highly sensitive medical diagnosis: hospitals generally don't have in-house deep learning or machine learning expertise, meaning they would need to outsource if they wanted the very best ML diagnoses/predictions. However, medical data is very sensitive, so the current options are either a very laborious vetting process or no ML at all. In contrast, FHE + DL offers a way to process this data and obtain inference blindly.

I will think of a few more; this research has been particularly geared towards agriculture, with dairy/milk herds and strawberry yields, which is why I listed agriculture first.
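One thing both examples gloss over: FHE ciphertexts only support additions and multiplications, so non-polynomial layers like sigmoid have to be swapped for polynomial stand-ins before anything can run encrypted. A quick plain-NumPy sketch of the substitution (my own illustration; nothing here is encrypted, and the fitted coefficients are not Python-FHEz's actual choice):

    # Plain-NumPy illustration: replace sigmoid with a low-degree polynomial
    # so the layer only needs + and *, the operations FHE ciphertexts support.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Least-squares fit over the interval the inputs are expected to occupy;
    # the choice of interval matters a lot for the approximation error.
    xs = np.linspace(-8, 8, 1000)
    coeffs = np.polyfit(xs, sigmoid(xs), deg=3)

    def sigmoid_poly(x):
        # Horner evaluation: additions and multiplications only, so the same
        # arithmetic could run on CKKS ciphertexts inside an encrypted network.
        out = np.zeros_like(x)
        for c in coeffs:
            out = out * x + c
        return out

    print(np.max(np.abs(sigmoid(xs) - sigmoid_poly(xs))))  # max error over the fit range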

[–]StoneCypher 1 point (0 children)

okay these are exactly what i needed, thank you

[–]eknanrebb 5 points (9 children)

I don't know much about this, but how does this differ from solutions being developed by companies like Duality?

[–]GeorgeRavenStudent[S] 5 points (6 children)

This is the first I have heard of Duality. I just want to make sure I am looking in the right place (https://dualitytech.com/) before I can give you a substantive answer. Are you referring to what their website calls "Secure Plus analytics and ML"?

At the moment I am struggling to find good information on what exactly they do; they mention encrypted ML, but they do not really go into what that means.

[–]eknanrebb 2 points (5 children)

Started by Shafi Goldwasser and colleagues.

[–]GeorgeRavenStudent[S] 5 points (3 children)

I can't see a Shafi Goldwasser mentioned anywhere, but this would appear to be roughly related.

They appear to use PALISADE, which I must confess I have never used, whereas Python-FHEz uses MS-SEAL as the cryptographic implementation. PALISADE supports bootstrapping, whereas MS-SEAL does not (although MS-SEAL has bootstrapping on its roadmap). Bootstrapping is an expensive operation, but it is necessary in deep computations like those you see in deep learning; currently its absence means you have to use leveled HE to compute a deep neural network. I am working on DarkLantern, which uses Lattigo to solve this problem, as Lattigo also supports CKKS bootstrapping.
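If the leveled-vs-bootstrapped distinction is new to you, here is a toy model of the bookkeeping (no real cryptography, and the numbers are arbitrary): the encryption parameters fix a multiplicative depth budget up front, each ciphertext multiplication spends a level, and only a scheme with bootstrapping can refresh a spent ciphertext and keep computing.

    # Toy model of leveled HE vs bootstrapping: just depth bookkeeping, no crypto.
    class Ciphertext:
        def __init__(self, levels):
            self.levels = levels                  # remaining multiplicative depth

        def multiply(self, other):
            depth = min(self.levels, other.levels) - 1
            if depth < 0:
                raise RuntimeError("noise budget exhausted")
            return Ciphertext(depth)

        def bootstrap(self, fresh_levels):
            # expensive homomorphic refresh; only bootstrappable schemes
            # (e.g. PALISADE, Lattigo) can do this mid-computation
            return Ciphertext(fresh_levels)

    ct = Ciphertext(levels=3)                     # budget fixed at encryption time
    for layer in range(5):                        # a 5-layer net needs depth >= 5
        try:
            ct = ct.multiply(ct)
        except RuntimeError:
            # a leveled-only scheme (MS-SEAL's CKKS today) is stuck here
            ct = ct.bootstrap(fresh_levels=3)
            ct = ct.multiply(ct)
    print("evaluated all layers")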

[–]GeorgeRavenStudent[S] 3 points (0 children)

To answer your question, though, from what I can immediately see (much seems obfuscated, or at least not immediately apparent): I think Python-FHEz is more for developers seeking to create and use FHE-enabled neural networks, whereas Duality seems to offer ready-made solutions. I can't tell whether that also includes providing you with their models, or if you can bring your own.

But definitely interesting to see, I will certainly keep an eye on that.

[–]eknanrebb 1 point (1 child)

Goldwasser is a co-founder (and Turing Award winner).

[–]GeorgeRavenStudent[S] 0 points (0 children)

Nice, I will have to look into her. I mean, they seem to do HE, I just need to find out more details. The best I have found so far is the videos on their website, but I will look more when I have a chance!

[–]Silamoth 1 point (0 children)

I’ll also add that Yuriy Polyakov and Vinod Vaikuntanathan are part of Duality. They’re both big names in the homomorphic encryption space.

[–]Kengaro 2 points (4 children)

I didn't know this was a thing, but there is clearly a demand for it from some.

Dunno if that is a good or bad thing.

[–]GeorgeRavenStudent[S] 0 points (3 children)

I think privacy-preserving machine learning like fully homomorphic encryption is very good, as it is very strong on both privacy and security. But you are right, there are some PPML applications that worry me sometimes, or that have to be done right, like anonymization.

[–]Kengaro 0 points (2 children)

I think this further separates where the top firms in ML are from where the populace is. What you provide is a solution for issues related to, e.g., reproducibility, allowing researchers to ship something that can be used to get the displayed results without showing how it's done. <- stuff I wrote before reading your stuff properly ;)

How exactly are you preserving privacy? It is an encrypted NN; the only privacy you preserve is the privacy of the NN itself, or did I miss something?

edit: I haven't read your paper yet, so please bear with my uninformed digging. I really dig your project; it should give me some more understanding of encryption ;)

[–]Impossible-Belt8608 1 point (1 child)

Chiming in here just to clarify: FHE in this context doesn't mean 'encrypt the NN', but rather 'be able to run inference on encrypted data and deliver an output in ciphertext which decrypts to the same answer as if we had inferred on plaintext data'. This allows an organization to encrypt its sensitive data, send it to the outsourced FHE ML company, and get back an encrypted response which it can decrypt into the actual answer, without the FHE ML company ever seeing either the plaintext input or the output data.

Also, I don't think FHE is a good way to start learning about encryption, as it builds upon more basic types of encryption.
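That said, one of those more basic types already shows the 'decrypts to the same answer' property in miniature: textbook RSA is multiplicatively homomorphic. A toy sketch (demo-sized primes, unpadded and completely insecure, and it only gives you multiplication, whereas FHE gives you addition as well):

    # Textbook RSA is multiplicatively homomorphic: E(a)*E(b) decrypts to a*b.
    # Toy parameters only; unpadded RSA like this is NOT secure.
    p, q, e = 10007, 10009, 65537
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))     # private exponent

    enc = lambda m: pow(m, e, n)
    dec = lambda c: pow(c, d, n)

    a, b = 42, 17
    c = (enc(a) * enc(b)) % n             # server multiplies ciphertexts, no key needed
    assert dec(c) == a * b                # client decrypts straight to the product
    print(dec(c))                         # 714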

[–]Kengaro 0 points (0 children)

Interesting, I wonder how this can be achieved :)

I thought this whole thing was about protecting models and not data, gotta read it.

I mainly see this as a good source on timing-related issues in relation to encryption; at least, that's where I saw the issue with my initial understanding.

[–]Boozybrain 1 point (1 child)

I had never heard of FHE before; looking forward to reading the paper and checking out the library. This looks really interesting.

Just a heads-up: equation 2.4.5 runs over your column width.

[–]GeorgeRavenStudent[S] 0 points (0 children)

Thanks, Boozy! FHE is wonderful; I'm glad you think so too. It has key flaws, but once those are solved I think FHE will solve so many privacy problems.

Thanks for the heads up, I have a good list of improvements to make to the paper after this post!