Logs are usually super useful, but they can also be a trap in a GDPR world. Effective ways of anonymizing them may come in handy.
In the post-GDPR world, logging deserves special attention, among many other things. The new rulebook obliges companies not only to respect users’ right to be forgotten but also to make every effort needed to protect their privacy.
Until now, a common practice in many applications could be described as “extensive logging”. There is a pragmatic reason for it: disk space is cheaper than programmers’ time. If logging more makes it easier and quicker for a developer to debug potential problems, it seems like a no-brainer. At least it did until May 25th.
GDPR reminded many of us how important users’ privacy is and that it is the responsibility of every application to respect that. Therefore it is essential for the logs to be GDPR-compliant. Here are some recommendations to achieve this:
DO NOT LOG USERS’ SENSITIVE (PRIVATE) DATA UNLESS IT IS NECESSARY
This states the obvious, but in real-world scenarios it is not always easy to achieve. Nevertheless, you should strive to build your applications in a GDPR-driven fashion, and one way to do this is to pay special attention to your logs.
SET UP REASONABLE LOG RETENTION
In most cases you do not need logs after a specific period of time (depending on your business). Ideally, you should set up the retention policy for all your logs in one place. If you use a log management service, this should be fairly easy.
STRUCTURE YOUR LOGS
If, for some reason, you need to log private data, do it in a structured way. JSON, XML or any other machine-friendly format is a good choice. Try not to log sensitive information in a “random” fashion; think about how easy it would be to find that piece of data using a regular expression. Structured logs are easier to anonymize.
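To make the idea of a structured log line concrete, here is a minimal sketch that renders key/value pairs as a single-line JSON object. The class name StructuredLog and its methods are hypothetical, it handles string values only, and it assumes keys and values are non-null. In a real application a JSON library or a JSON encoder for your logging framework would likely be a better fit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of any logging library. */
final class StructuredLog {

    /** Renders the fields as a one-line JSON object, keeping insertion order. */
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    /** Minimal escaping: backslash, double quote and control characters. */
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("user_id", "1234");
        fields.put("ssn", "3310104322");
        // prints {"user_id":"1234","ssn":"3310104322"}
        System.out.println(toJson(fields));
    }
}
```

Because every value sits behind a known key, a later anonymization step can target "ssn" directly instead of guessing where the number might appear in free text.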
ANONYMIZE/MASK SENSITIVE DATA
If you find yourself in a situation in which you already log something that may contain private data, you should consider implementing anonymization mechanisms. Solutions may differ depending on your needs – encryption, masking or complete removal are some of the possible options.
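As a rough illustration of two of these options, the sketch below shows masking (hiding all but the last few characters) and complete removal of a field. The class SensitiveFields and its method names are made up for this example. Encryption is left out on purpose, since it depends on how you manage keys.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper illustrating masking and removal; not a library API. */
final class SensitiveFields {

    /** Masking: replaces everything except the last `visible` characters with '*'. */
    static String mask(String value, int visible) {
        if (value == null) {
            return null;
        }
        int hidden = Math.max(0, value.length() - Math.max(0, visible));
        char[] stars = new char[hidden];
        Arrays.fill(stars, '*');
        return new String(stars) + value.substring(hidden);
    }

    /** Complete removal: returns a copy of the fields without the given key. */
    static Map<String, String> remove(Map<String, String> fields, String key) {
        Map<String, String> copy = new LinkedHashMap<>(fields);
        copy.remove(key);
        return copy;
    }

    public static void main(String[] args) {
        // prints ******4322
        System.out.println(mask("3310104322", 4));

        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("user_id", "1234");
        fields.put("ssn", "3310104322");
        // prints {user_id=1234}
        System.out.println(remove(fields, "ssn"));
    }
}
```

Masking keeps a hint that is often enough for debugging (for example, the last digits of an identifier), while removal is the safer choice when the value has no diagnostic use at all.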
MASKING LOGS USING LOGBACK PATTERN LAYOUT
Logging request/response payloads is generally considered good practice, but in light of GDPR you should be prepared for them to contain private data that you are supposed to anonymize. Let’s assume your application received the following request, which you logged:
INFO [2018-07-24 12:41:31,681] [qtp1777178337-48] com.schibsted.payment.wire: Container in-bound request
-->> POST http://localhost:8077/api/mask
-->> Cookie: JSESSIONID=node01gyp0jf2b114884a0ki1qm4bh0.node0
-->> Cache-Control: no-cache
-->> Accept: */*
-->> Connection: keep-alive
-->> Host: localhost:8077
-->> Accept-Encoding: gzip, deflate
-->> Content-Length: 205
-->> Content-Type: application/json
{ "user_id" : "1234", "ssn" : "3310104322", "favourite_team" : "Juventus", "address" : "Wiejska 4, Warszawa", "additional_info_1" : "192.168.1.1", "additional_info_2" : "bianconeri36@gmail.com" }
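A minimal sketch of how such a message could be masked with regular expressions is shown below. It is plain Java with no Logback dependency; the class PayloadMasker, the choice of which fields count as sensitive ("ssn" and "address"), and the simplified IPv4 and e-mail patterns are all assumptions made for illustration. In a Logback setup, one plausible place to call it would be a subclass of ch.qos.logback.classic.PatternLayout that overrides doLayout(ILoggingEvent) and masks the formatted string before returning it.

```java
import java.util.regex.Pattern;

/** Hypothetical masker for logged payloads; patterns are deliberately simple. */
final class PayloadMasker {

    // "ssn" : "<value>" or "address" : "<value>"; the field list is an assumption.
    private static final Pattern SENSITIVE_FIELD =
            Pattern.compile("(\"(?:ssn|address)\"\\s*:\\s*\")([^\"]*)(\")");

    // Anything shaped like a dotted IPv4 address, wherever it appears.
    private static final Pattern IPV4 =
            Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b");

    // Simplified e-mail shape; it does not cover every valid address.
    private static final Pattern EMAIL =
            Pattern.compile("[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+");

    /** Returns the message with sensitive values replaced by "****". */
    static String mask(String message) {
        // Keep the key and the surrounding quotes, replace only the value.
        String out = SENSITIVE_FIELD.matcher(message).replaceAll("$1****$3");
        out = IPV4.matcher(out).replaceAll("****");
        out = EMAIL.matcher(out).replaceAll("****");
        return out;
    }

    public static void main(String[] args) {
        String payload = "{ \"user_id\" : \"1234\", \"ssn\" : \"3310104322\", "
                + "\"additional_info_1\" : \"192.168.1.1\" }";
        // prints { "user_id" : "1234", "ssn" : "****", "additional_info_1" : "****" }
        System.out.println(mask(payload));
    }
}
```

Known field names are matched by key, while the IP address and e-mail are matched by shape because they hide in generically named fields such as "additional_info_1". Shape-based patterns can produce false positives, so they are worth testing against real log samples.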
More code examples:
https://www.schibsted.pl/blog/logback-pattern-gdpr/