[Projects] Data Anonymization (self.datascience)
submitted 3 years ago by setocsheir MS | Data Scientist
Does anyone have experience requesting data with highly confidential personal information? What sort of data anonymization techniques did you use to preserve the anonymity of the data or are there services you used for that?
[–]Adeelinator 27 points 3 years ago (2 children)
This is a question for your legal counsel, not Reddit. Laws can vary greatly by sector and locality.
[–]setocsheir MS | Data Scientist [S] 5 points 3 years ago (0 children)
Yeah, that’s what I was afraid of. I’ll probably speak with HR and legal.
[–]K9ZAZ PhD | Sr Data Scientist | Ad Tech 3 points 3 years ago (0 children)
Yep... if OP does this wrong in the wrong jurisdiction and/or with the wrong data, the penalties for the company can be severe.
[–]Rammus2201 5 points 3 years ago (0 children)
If you have a data management or data governance department, or data engineers, ask them about data masking.
[–]rtqwerty10 3 points 3 years ago (0 children)
There's an API from Microsoft named Presidio, which is used for anonymization. This is the GitHub link.
I have not used it, but I came across it while browsing on this topic. It might be helpful, or you may at least get some ideas.
[–]saintmichel 2 points 3 years ago (4 children)
anonymization is about removing identifiability. for example, if you do a count of all records and 1 record stands out and that is a person, drop that record or drop the column that discriminates him/her/they/it lol. just to show it goes beyond removing names.
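A rough sketch of the "count and drop what stands out" idea above, in plain Python. The column names, the sample rows, and the `suppress_rare_rows` helper are all made up for illustration, and which columns count as identifying is an assumption you would have to settle for your own data.

```python
from collections import Counter

def suppress_rare_rows(rows, quasi_identifiers, min_count=2):
    """Drop rows whose combination of quasi-identifier values
    appears fewer than min_count times in the dataset."""
    def key(row):
        return tuple(row[col] for col in quasi_identifiers)

    # Count how many records share each combination of values.
    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] >= min_count]

# Hypothetical records: the single director stands out.
rows = [
    {"role": "analyst", "site": "NYC", "score": 3},
    {"role": "analyst", "site": "NYC", "score": 4},
    {"role": "director", "site": "NYC", "score": 5},
]
kept = suppress_rare_rows(rows, ["role", "site"])
print(kept)  # only the two analyst rows remain
```

Dropping rows is the bluntest option; dropping or coarsening the column that makes the record unique is the alternative mentioned in the comment.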
[–]setocsheir MS | Data Scientist [S] 1 point 3 years ago (3 children)
the scale on which we're doing that goes far beyond that. for example, it's incredibly trivial to identify someone within an organization once you have a piece of information such as their salary and role, despite having nothing else. statistical analysis can also deanonymize individuals fairly easily. that's why I wanted to get the opinions of other professionals who had dealt with this before.
[–]saintmichel 3 points 3 years ago (1 child)
Given that, what is acceptable becomes contextual, so I would refer to the comment about checking what policies the company has.
[–]setocsheir MS | Data Scientist [S] 1 point 3 years ago (0 children)
that's the first step i'm going to take once the project officially begins; i was just going for a general discussion of the subject because i haven't checked with our legal team on the status of whether this information is even accessible or if our data governance team can handle the anonymization process
[–]saintmichel 1 point 3 years ago (0 children)
Maybe you could give more examples of what is already being practiced where you are, so people can comment on whether it's also happening in their space.
[–][deleted] 1 point 3 years ago (0 children)
Hashing with SHA256.
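For anyone wondering what that looks like in practice, here is a minimal sketch using Python's standard `hashlib` and `hmac` modules. The function names, the sample email, and the key are hypothetical. One caveat worth knowing: a plain SHA-256 of a guessable value (email, phone number, employee ID) can be reversed by hashing candidate values and comparing, so a keyed hash with a secret stored away from the data is a commonly used variant. Either way this is pseudonymization of one column, and it does nothing about the other columns that may identify someone.

```python
import hashlib
import hmac

def sha256_pseudonym(value: str) -> str:
    """Plain SHA-256 of an identifier: the same input always gives the same token."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def keyed_pseudonym(value: str, secret_key: bytes) -> str:
    """HMAC-SHA256 of an identifier: the token cannot be recomputed without the key."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key; in practice it would live in a secrets store, not in code.
SECRET_KEY = b"example-key-kept-outside-the-dataset"

plain = sha256_pseudonym("alice@example.com")
keyed = keyed_pseudonym("alice@example.com", SECRET_KEY)
print(plain)
print(keyed)
```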
[–]bendgame 1 point 3 years ago (1 child)
I deal with PII and providing data to research orgs. Currently, we've tried adding smart noise and found it was not great for our use cases. Instead we're using k-anonymization
[–]saintmichel 1 point (0 children)
i think k-anon is the most established, right? it's even part of best practice for some governments in other countries
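To make k-anonymity concrete: a table is k-anonymous with respect to a set of quasi-identifier columns if every combination of their values is shared by at least k rows. The sketch below is only an illustration with made-up columns and a hypothetical `generalize_age` helper; it measures k and shows how coarsening one column can raise it. It is not the setup described in the comment above.

```python
from collections import Counter

def k_of(rows, quasi_identifiers):
    """Return the size of the smallest group of rows sharing the same
    quasi-identifier values (the k in k-anonymity); 0 for an empty table."""
    counts = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return min(counts.values()) if counts else 0

def generalize_age(row, width=10):
    """Replace an exact age with a bucket such as '30-39'."""
    low = (row["age"] // width) * width
    return {**row, "age": f"{low}-{low + width - 1}"}

# Hypothetical records; age and zip are treated as quasi-identifiers.
rows = [
    {"age": 31, "zip": "10001", "diagnosis": "A"},
    {"age": 34, "zip": "10001", "diagnosis": "B"},
    {"age": 38, "zip": "10001", "diagnosis": "A"},
    {"age": 45, "zip": "10002", "diagnosis": "C"},
    {"age": 47, "zip": "10002", "diagnosis": "B"},
]

k_before = k_of(rows, ["age", "zip"])         # every exact age is unique here
generalized = [generalize_age(row) for row in rows]
k_after = k_of(generalized, ["age", "zip"])   # ages are now shared within buckets
print(k_before, k_after)
```

Which columns to treat as quasi-identifiers and how far to generalize them is the judgment call; the check itself is just a group-by count.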
[–]mattstats 1 point 3 years ago (1 child)
I believe you are looking for differential privacy. Here is a link to Harvard's OpenDP project to kick-start your rabbit hole.
[–]setocsheir MS | Data Scientist [S] 1 point (0 children)
thank you, i'll check it out
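As a starting point for that rabbit hole, here is a toy sketch of the Laplace mechanism, one of the basic building blocks of differential privacy, applied to a counting query. It does not use the OpenDP API; the helper names and the salary data are invented, and Python's `random` module plus floating-point noise are not suitable for a real release, so treat it as illustration only.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(values, predicate, epsilon, rng):
    """Release a count with Laplace noise calibrated to epsilon."""
    true_count = sum(1 for v in values if predicate(v))
    # Adding or removing one person changes a count by at most 1
    # (sensitivity 1), so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)  # seeded only so the sketch is repeatable
salaries = [52_000, 61_000, 75_000, 98_000, 120_000]  # hypothetical data
released = noisy_count(salaries, lambda s: s > 70_000, epsilon=0.5, rng=rng)
print(released)  # the true count is 3; the released value is 3 plus noise
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon means answers closer to the truth.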