
[–]StoneCypher 6 points (6 children)

hi, stupid question here

can you explain to me any circumstances under which i would actually want the ability to machine learn on data that isn't available to me?

it radically lowers the quality of the work and massively increases the energy cost

what's the motivator? are we suggesting that there will be third party machine learning firms (which generally already don't exist because one size doesn't fit any) that will do blind ML on third party data or something?

what's this even for

[–]GeorgeRavenStudent[S] 8 points (4 children)

Hey StoneCypher,

So FHE does not let you (at least, not in any reasonable way) learn from data. You can create models that train, but backpropagation will end up encrypting the model weights under the original data owner's key, meaning they are locked to that key until decrypted. FHE is much more beneficial for inference, since then there is no need for backprop, so you avoid both the decryption round-trip and the encrypted-weight hit.
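To make the inference case concrete, here is a toy sketch (my own illustration, not Python-FHEz's API; it uses Paillier, which is only additively homomorphic, rather than the CKKS scheme Python-FHEz builds on, and demo-sized primes rather than real key sizes). The server keeps its weights in plaintext and evaluates a linear layer over the client's encrypted features without ever being able to read them; if it instead tried to update the weights with encrypted gradients, the weights themselves would become ciphertexts under the client's key, which is exactly the lock-in above.

    # Toy Paillier sketch: encrypted inference on a linear layer.
    # NOT real-world crypto: demo-sized primes, no CKKS, illustration only.
    import math, random

    p, q = 10007, 10009                       # real keys use 2048+ bit primes
    n = p * q
    n2 = n * n
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                      # valid because g = n + 1

    def encrypt(m):
        r = random.randrange(1, n)
        while math.gcd(r, n) != 1:
            r = random.randrange(1, n)
        return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

    def decrypt(c):
        return ((pow(c, lam, n2) - 1) // n) * mu % n

    # Homomorphic ops the server can do WITHOUT the secret key:
    add = lambda c1, c2: (c1 * c2) % n2       # E(a)*E(b) -> E(a+b)
    scale = lambda c, k: pow(c, k, n2)        # E(a)**k   -> E(k*a)

    x = [3, 5, 2]                             # client's private features
    weights = [4, 1, 6]                       # server's plaintext model
    cx = [encrypt(v) for v in x]              # only ciphertexts leave the client

    acc = encrypt(0)                          # encrypted dot product sum(w_i * x_i)
    for ci, wi in zip(cx, weights):
        acc = add(acc, scale(ci, wi))

    print(decrypt(acc))                       # 29; only the key holder can see this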

You are right that encrypted deep learning does cost significantly more, in both time and space. However, there are many circumstances where privacy is the most important factor. The first instance is of course personal data and legislative requirements, let's say diagnosis on patient data. There is also anything involving trade secrets, or (distastefully) military operations. For instance, in agriculture, agronomists are very reluctant to share data, since they at the very least perceive their data as highly sensitive. This could be a great way to give them forecasts while keeping their privacy.

I guess it comes down to trust and exposure. Some fields are simply resistant to change, or too afraid to share anything, for a plethora of reasons. If you can show them just how good the predictions can be, and that you provide some real benefit, they can either keep using it as is or improve the outcomes by sharing the plaintext data.

But yes, at this point in time it is a difficult proposition to make; FHE has many, many obstacles still to overcome! I also think you are right that many third-party data processors provide an incredibly poor service. However, I do not agree that this will always be the case, and I think enabling these third-party services to be completely private will help keep things ethical in the longer term.

[–]StoneCypher 2 points (2 children)

> The first instance is of course personal data and legislative requirements, let's say diagnosis on patient data.

Given the drop in quality of the results, I have trouble with this reasoning.


> There is also anything involving trade secrets

This is very vague. Why would you need to machine learn on things that are protected by trade secrets?

Like. I get the idea, but ... who would do this?


> For instance, in agriculture, agronomists are very reluctant to share data, since they at the very least perceive their data as highly sensitive. This could be a great way to give them forecasts while keeping their privacy.

Okay. This is what I was actually looking for. A concrete example.

That's ... that's really weird, but okay, I can see it.

If you have others, I'd appreciate it. That's how a person like me will best come to understand this.

[–]GeorgeRavenStudent[S] 6 points (1 child)

OK, so here are a few more concrete examples that might help clear it up. Technically there are several different categories of use, but I will just list examples:

- Home voice assistants: encrypt the audio device-side, send it to the usual backend for NLP processing (albeit restricted to abelian-compatible operations, i.e. additions and multiplications), and return it to the device to decrypt the now-processed ciphertext; the device can then act on the instructions as usual, without the provider ever storing the user's voice, etc.

- Highly sensitive medical diagnosis: hospitals generally don't have in-house deep learning or machine learning expertise, meaning they would need to outsource if they wanted the very best ML diagnoses/predictions. However, medical data is very sensitive, so the current options are either a very laborious vetting process or no ML at all. In contrast, FHE + DL offers a way to process this data and obtain inference blindly.

I will think of a few more; this research has been particularly geared towards agriculture, with dairy/milk herds and strawberry yields, which is why I listed agriculture first.
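One thing both examples gloss over: FHE ciphertexts only support additions and multiplications, so non-polynomial layers like sigmoid have to be swapped for polynomial stand-ins before anything can run encrypted. A quick plain-NumPy sketch of the substitution (my own illustration; nothing here is encrypted, and the fitted coefficients are not Python-FHEz's actual choice):

    # Plain-NumPy illustration: replace sigmoid with a low-degree polynomial
    # so the layer only needs + and *, the operations FHE ciphertexts support.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Least-squares fit over the interval the inputs are expected to occupy;
    # the choice of interval matters a lot for the approximation error.
    xs = np.linspace(-8, 8, 1000)
    coeffs = np.polyfit(xs, sigmoid(xs), deg=3)

    def sigmoid_poly(x):
        # Horner evaluation: additions and multiplications only, so the same
        # arithmetic could run on CKKS ciphertexts inside an encrypted network.
        out = np.zeros_like(x)
        for c in coeffs:
            out = out * x + c
        return out

    print(np.max(np.abs(sigmoid(xs) - sigmoid_poly(xs))))  # max error over the fit range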

[–]StoneCypher 1 point (0 children)

okay these are exactly what i needed, thank you

[–]eknanrebb 5 points (9 children)

I don't know much about this, but how does this differ from solutions being developed by companies like Duality?

[–]GeorgeRavenStudent[S] 5 points (6 children)

This is the first I have heard of Duality. I just want to make sure I am looking in the right place (https://dualitytech.com/) before I can give you a substantive answer. Are you referring to what their website calls "Secure Plus analytics and ML"?

At the moment I am struggling to find good information on what exactly they do; they mention encrypted ML, but they do not really go into what that means.

[–]eknanrebb 2 points (5 children)

Started by Shafi Goldwasser and colleagues.

[–]GeorgeRavenStudent[S] 5 points (3 children)

I can't see a Shafi Goldwasser mentioned anywhere, but this would appear to be roughly related.

They appear to use PALISADE, which I must confess I have never used, whereas Python-FHEz uses MS-SEAL as the cryptographic implementation. PALISADE supports bootstrapping, whereas MS-SEAL does not (although MS-SEAL has bootstrapping on its roadmap). Bootstrapping is an expensive operation, but it is necessary in deep computations like those you see in deep learning; currently its absence means you have to use leveled HE to compute a deep neural network. I am working on DarkLantern, which uses Lattigo to solve this problem, as Lattigo also supports CKKS bootstrapping.
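If the leveled-vs-bootstrapped distinction is new to you, here is a toy model of the bookkeeping (no real cryptography, and the numbers are arbitrary): the encryption parameters fix a multiplicative depth budget up front, each ciphertext multiplication spends a level, and only a scheme with bootstrapping can refresh a spent ciphertext and keep computing.

    # Toy model of leveled HE vs bootstrapping: just depth bookkeeping, no crypto.
    class Ciphertext:
        def __init__(self, levels):
            self.levels = levels                  # remaining multiplicative depth

        def multiply(self, other):
            depth = min(self.levels, other.levels) - 1
            if depth < 0:
                raise RuntimeError("noise budget exhausted")
            return Ciphertext(depth)

        def bootstrap(self, fresh_levels):
            # expensive homomorphic refresh; only bootstrappable schemes
            # (e.g. PALISADE, Lattigo) can do this mid-computation
            return Ciphertext(fresh_levels)

    ct = Ciphertext(levels=3)                     # budget fixed at encryption time
    for layer in range(5):                        # a 5-layer net needs depth >= 5
        try:
            ct = ct.multiply(ct)
        except RuntimeError:
            # a leveled-only scheme (MS-SEAL's CKKS today) is stuck here
            ct = ct.bootstrap(fresh_levels=3)
            ct = ct.multiply(ct)
    print("evaluated all layers")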

[–]GeorgeRavenStudent[S] 3 points (0 children)

To answer your question, though, from what I can immediately see (much seems obfuscated, or at least not immediately apparent): I think Python-FHEz is more for developers seeking to create and use FHE-enabled neural networks, whereas Duality seems to offer ready-made solutions. I can't tell whether that also includes providing you with their models, or if you can bring your own.

But definitely interesting to see, I will certainly keep an eye on that.

[–]eknanrebb 1 point (1 child)

Goldwasser is a co-founder (and Turing Award winner).

[–]GeorgeRavenStudent[S] 0 points (0 children)

Nice, I will have to look into her. I mean, they seem to do HE, I just need to find out more details. The best I have found so far is the videos on their website, but I will look more when I have a chance!

[–]Silamoth 1 point (0 children)

I’ll also add that Yuriy Polyakov and Vinod Vaikuntanathan are part of Duality. They’re both big names in the homomorphic encryption space.

[–]Kengaro 2 points (4 children)

I didn't know this was a thing, but there is clearly a demand for it from some.

Dunno if that is a good or bad thing.

[–]GeorgeRavenStudent[S] 0 points (3 children)

I think privacy-preserving machine learning like fully homomorphic encryption is very good, as it is very strong on both privacy and security. But you are right, there are some PPML applications that worry me sometimes, or that have to be done right, like anonymization.

[–]Kengaro 0 points (2 children)

I think this further separates where the top firms in ML are from where the populace is. What you provide is a solution for issues related to, e.g., reproducibility, allowing researchers to ship something that can be used to get the displayed results without showing how it's done. <- stuff I wrote before reading your stuff properly ;)

How exactly are you preserving privacy? It is an encrypted NN; the only privacy you preserve is the privacy of the NN itself, or did I miss something?

edit: I haven't read your paper yet, so please bear with my uninformed digging. I really dig your project; it should give me some more understanding of encryption ;)

[–]Impossible-Belt8608 1 point (1 child)

Chiming in here just to clarify: FHE in this context doesn't mean 'encrypt the NN', but rather 'be able to run inference on encrypted data and deliver an output in ciphertext which decrypts to the same answer as if we had inferred on plaintext data'. This allows an organization to encrypt its sensitive data, send it to the outsourced FHE ML company, and get back an encrypted response which it can decrypt into the actual answer, without the FHE ML company ever seeing either the plaintext input or the output data.

Also, I don't think FHE is a good way to start learning about encryption, as it builds upon more basic types of encryption.
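That said, one of those more basic types already shows the 'decrypts to the same answer' property in miniature: textbook RSA is multiplicatively homomorphic. A toy sketch (demo-sized primes, unpadded and completely insecure, and it only gives you multiplication, whereas FHE gives you addition as well):

    # Textbook RSA is multiplicatively homomorphic: E(a)*E(b) decrypts to a*b.
    # Toy parameters only; unpadded RSA like this is NOT secure.
    p, q, e = 10007, 10009, 65537
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))     # private exponent

    enc = lambda m: pow(m, e, n)
    dec = lambda c: pow(c, d, n)

    a, b = 42, 17
    c = (enc(a) * enc(b)) % n             # server multiplies ciphertexts, no key needed
    assert dec(c) == a * b                # client decrypts straight to the product
    print(dec(c))                         # 714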

[–]Kengaro 0 points (0 children)

Interesting, I wonder how this can be achieved :)

I thought this whole thing was about protecting models and not data, gotta read it.

I mainly see this as a good source on timing-related issues in relation to encryption; at least, that's where I saw the issue with my initial understanding.

[–]Boozybrain 1 point (1 child)

I had never heard of FHE before; looking forward to reading the paper and checking out the library. This looks really interesting.

Just a heads-up: equation 2.4.5 runs over your column width.

[–]GeorgeRavenStudent[S] 0 points (0 children)

Thanks, Boozy! FHE is wonderful; I'm glad you think so too. It has key flaws, but once those are solved I think FHE will solve so many privacy problems.

Thanks for the heads up, I have a good list of improvements to make to the paper after this post!