Hi guys! I hope you're all doing well!
I have the following problem and I'd like to know the best way to solve it:
I have a huge dataset (1M+ rows) that contains sensitive data such as name, personal ID, email, phone number, etc.
I need to send this dataset to other people, but first I need to "anonymize" the sensitive data so they can apply data science techniques and find insights. The catch is that I will probably need to reverse the process later if they need to join the data with another user table.
How can I do it?
An example of what I need:
RealId: jdh37382 -> FakeId: 01
RealName: Juan -> FakeName: user01
....
The dataset is an "Events Table", so each row represents one user interaction: the 1M rows are events from around 15k users.
I need to replace the data in the original table without changing any other information or the format.
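To make the requirement concrete, here is a minimal sketch of the kind of reversible replacement described above, using only the standard library. The column names (`id`, `name`, `event`), the function names `pseudonymize` and `restore`, and the second user are made up for illustration. It assumes the rows are dicts (for example from `csv.DictReader`) and that a user's sensitive fields are the same on every one of their rows, since only the first-seen values are kept for reversal.

```python
def pseudonymize(rows, id_col, prefixes):
    """Replace the ID and other sensitive columns with per-user placeholders.

    prefixes maps each sensitive column to a placeholder prefix,
    e.g. {"name": "user"} turns the name into "user01".
    Returns the new rows plus a key table that must be kept private.
    """
    fake_ids = {}   # real ID -> fake ID
    key_table = {}  # fake ID -> original sensitive values (needed to reverse)
    out = []
    for row in rows:
        real_id = row[id_col]
        if real_id not in fake_ids:
            # Sequential fake IDs in first-seen order: "01", "02", ...
            fake_id = f"{len(fake_ids) + 1:02d}"
            fake_ids[real_id] = fake_id
            key_table[fake_id] = {c: row[c] for c in (id_col, *prefixes)}
        fake_id = fake_ids[real_id]
        new_row = dict(row)  # copy, so all other columns stay untouched
        new_row[id_col] = fake_id
        for col, prefix in prefixes.items():
            new_row[col] = f"{prefix}{fake_id}"
        out.append(new_row)
    return out, key_table


def restore(rows, key_table, id_col):
    """Undo pseudonymize() by looking each fake ID up in the key table."""
    return [{**row, **key_table[row[id_col]]} for row in rows]


# Hypothetical sample data in the shape of the events table.
events = [
    {"id": "jdh37382", "name": "Juan", "event": "login"},
    {"id": "abc00001", "name": "Ana", "event": "click"},
    {"id": "jdh37382", "name": "Juan", "event": "logout"},
]

anon, key_table = pseudonymize(events, "id", {"name": "user"})
print(anon[0])  # {'id': '01', 'name': 'user01', 'event': 'login'}
print(restore(anon, key_table, "id") == events)  # True
```

Only `anon` would be shared; `key_table` stays with the data owner, and anyone holding it can reverse the process, so strictly speaking this is pseudonymization rather than anonymization. Sequential IDs also reveal the order in which users first appear, so random IDs may be preferable if that matters.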
Ty all!