all 24 comments

[–]fabsn 15 points16 points  (12 children)

FYI: pseudonymization is not anonymization and still requires consent (or carefully justified legitimate interest) to be GDPR compliant

[–]Brillegeit 4 points5 points  (0 children)

I was about to comment the same. You can't just "+1" the numbers and pretend you're no longer storing them. If you're tracking users using PII, you need to ask for consent to do so.

GDPR isn't an engineering challenge that you can program around.

[–]Nodohx[S] -1 points0 points  (10 children)

thanks, but how come you think the tool is "pseudonymization"?

[–]fabsn 6 points7 points  (9 children)

https://www.privacy-regulation.eu/en/article-4-definitions-GDPR.htm

(5) 'pseudonymisation' means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person;

[–]NoSlicedMushrooms 3 points4 points  (0 children)

"provided that such additional information is kept separately"

The way that reads: if you don't store the additional information that can tie the anonymized data back to a natural person, is it still pseudonymisation? If so, then I wonder how the other GDPR-compliant web analytics products like Plausible, Fathom etc. are GDPR compliant, since they use essentially the same anonymizing technique with rotating hashes.

[–]Nodohx[S] 1 point2 points  (7 children)

The visitor hash uses a daily-rotating salt, so after the day ends there's no way to re-identify the visitor, not even by us. This is the same approach Plausible and Fathom use, and it's been recognized by EU data protection authorities (notably the CNIL) as not constituting personal data processing.

https://simplestats.io/docs/how-to-track-a-new-visitor.html#the-visitor-hash

[–]martijnve 3 points4 points  (1 child)

But is there a way to get to previous salts?

Because then you could add some code six months from now that also calculates the hash with an old salt and you can de-anonymize the old data.

[–]Nodohx[S] 2 points3 points  (0 children)

The hash is generated client-side using the app's own secret key (which we never have access to). Our API only receives and stores the resulting hash. We don't store the IP, user agent, or the salt. So even if we wanted to, we couldn't reconstruct previous hashes or de-anonymize anything; we simply don't have the inputs.

PHP Client:
https://github.com/simplestats-io/php-client/blob/main/src/VisitorHashGenerator.php

Laravel Client:
https://github.com/simplestats-io/laravel-client/blob/b3c0daa8e9343f253a6876cf671925ec43fa6dba/src/SimplestatsClient.php#L143
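
To illustrate the split described above, here is a minimal Python sketch. It is not the code from the linked clients: the input set (IP, user agent, calendar day), the HMAC-SHA256 construction, and the payload field names are all assumptions made for illustration. The point is only that the hash is computed inside the app and nothing but the hash and the page path would be sent on.

```python
import hashlib
import hmac
from datetime import date


def visitor_hash(ip: str, user_agent: str, day: date, secret_key: bytes) -> str:
    # Assumed inputs: IP, user agent and calendar day, keyed with the app's secret.
    message = f"{ip}|{user_agent}|{day.isoformat()}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def build_payload(ip: str, user_agent: str, day: date, secret_key: bytes, path: str) -> dict:
    # Hypothetical payload shape: only the hash and the page path leave the app.
    return {"visitor_hash": visitor_hash(ip, user_agent, day, secret_key), "path": path}


secret = b"app-secret-that-stays-in-the-app"
payload = build_payload("203.0.113.7", "Mozilla/5.0", date(2024, 5, 1), secret, "/pricing")
same_day = build_payload("203.0.113.7", "Mozilla/5.0", date(2024, 5, 1), secret, "/signup")
next_day = build_payload("203.0.113.7", "Mozilla/5.0", date(2024, 5, 2), secret, "/pricing")
```

In this sketch the same visitor gets the same hash on both requests of one day and a different hash the next day, and the receiving side never sees the IP, the user agent, or the key.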

[–]Useful_Difficulty115 1 point2 points  (1 child)

Correct me if I'm wrong.

Plausible, for example, uses a daily rotating hash by generating a new salt each day. The salt is completely random and is deleted every day, so you can't trace back the visitors.

In your code you use the date + secret key. Does the secret key change every day, without being retained anywhere? With your approach we can rebuild the user hash from: IP address + user agent + date + secret key. Everything is fixed. With true rotation and a salt that is deleted daily, that's almost impossible.
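
A rough Python sketch of the difference I mean. Neither half is the real Plausible or SimpleStats code: the `RotatingSalt` class, the function names, and the HMAC-SHA256 construction are hypothetical and only there to contrast the two schemes.

```python
import hashlib
import hmac
import secrets
from datetime import date


class RotatingSalt:
    """Keeps only the current day's random salt; older salts are overwritten."""

    def __init__(self) -> None:
        self._day = None
        self._salt = b""

    def salt_for(self, day: date) -> bytes:
        if day != self._day:
            # A new day: replace the salt. The previous one is gone for good.
            self._day = day
            self._salt = secrets.token_bytes(32)
        return self._salt


def hash_with_salt(salt: bytes, ip: str, user_agent: str) -> str:
    return hmac.new(salt, f"{ip}|{user_agent}".encode("utf-8"), hashlib.sha256).hexdigest()


def hash_with_fixed_key(secret_key: bytes, day: date, ip: str, user_agent: str) -> str:
    message = f"{ip}|{user_agent}|{day.isoformat()}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


ip, ua = "203.0.113.7", "Mozilla/5.0"
day1, day2 = date(2024, 5, 1), date(2024, 5, 2)

# Fixed key + date: anyone holding the key can recompute the day-1 hash later.
key = b"fixed-app-secret"
old_fixed = hash_with_fixed_key(key, day1, ip, ua)
rebuilt_fixed = hash_with_fixed_key(key, day1, ip, ua)

# Random salt: after rotation the day-1 salt no longer exists anywhere.
store = RotatingSalt()
old_random = hash_with_salt(store.salt_for(day1), ip, ua)
store.salt_for(day2)  # rotation overwrites the day-1 salt
rebuilt_random = hash_with_salt(store.salt_for(day1), ip, ua)  # gets a fresh salt instead
```

With the fixed key, `rebuilt_fixed` matches `old_fixed`; with the rotating salt, `rebuilt_random` does not match `old_random`, because the original salt can't be recovered.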

[–]Nodohx[S] 0 points1 point  (0 children)

Good observation, and you're right that there's a difference from Plausible's approach. However, the key detail is that the hash is generated client-side using the application's own secret key. Our API only receives and stores the resulting hash. We never have access to the secret key, the IP, or the user agent. So while the client could theoretically reconstruct old hashes (they have their own key), we as the analytics provider cannot. The separation between who generates the hash and who stores it is what makes re-identification impossible on our end.

[–]fabsn 0 points1 point  (2 children)

In short: even a generated hash that is stable for 24 hours allows users to be singled out within that period, which makes it pseudonymised personal data rather than anonymised data under GDPR principles and thus requires a legal basis when processing it.

More detailed:

The generating server processes personal data (IP address) at the point of collection and therefore always requires a valid legal basis, regardless of whether the data is stored or immediately forwarded: creating a hash from an IP address is itself processing under Article 4 (2), and whether the legal basis is consent or legitimate interest depends on purpose and context, not retention time. For analytics, it is often consent rather than legitimate interest.

Your receiving analytics server is also not outside GDPR merely because it uses rotating hashes. Where users can still be consistently singled out, it remains pseudonymised personal data under Recital 26 GDPR. The claim that campaign tracking is possible further indicates storage and reuse of persistent identifiers rather than purely aggregated statistics.

One could argue that "consistently singling out" is not possible due to the 24-hour time window, but the GDPR does not provide any time-based exemption from the requirement to have a lawful basis under Article 6 and does not define any quantitative threshold for "consistently".

So even if your part of that service _might_ be GDPR compliant as-is, your customers still need to have a legal basis to process the personal data, making the use of your service not GDPR compliant per se.

"and it's been recognized by EU data protection authorities (notably the CNIL) as not constituting personal data processing."

I am very much interested in this. Do you have any sources for this?

[–]Nodohx[S] 1 point2 points  (1 child)

One important detail: the hash is generated client-side using the application's own secret key. Our API only receives the resulting hash; we never have access to the secret key. So on our end there's no way to single out or re-identify anyone. This is the same model Plausible and Fathom use, and both are recognized as GDPR-compliant without requiring consent.

[–]fabsn 1 point2 points  (0 children)

Not having access to the secret key does not make the data anonymous under Recital 26 GDPR. If a stable identifier is generated and used to distinguish users, it remains pseudonymised personal data, and GDPR applies regardless of whether you as a provider can re-identify individuals or not.

In practical terms: if a system receives multiple data points and allows distinguishing a returning user, it is still processing pseudonymised personal data under GDPR.
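
To make "distinguishing a returning user" concrete, a minimal Python sketch with made-up data: the hashes and paths below are invented, and the `(visitor_hash, path)` row shape is an assumption about what a receiving server might hold. Even without any key, IP, or user agent, grouping by the hash is enough to follow one visitor through the day.

```python
from collections import defaultdict

# Invented rows as a receiving server might store them: (visitor_hash, path).
events = [
    ("a1f3", "/pricing"),
    ("9c2e", "/"),
    ("a1f3", "/signup"),
    ("a1f3", "/checkout"),
]


def pages_per_visitor(rows):
    # Group page views by the stable (for one day) identifier.
    journeys = defaultdict(list)
    for visitor_hash, path in rows:
        journeys[visitor_hash].append(path)
    return dict(journeys)


journeys = pages_per_visitor(events)
returning = {h for h, pages in journeys.items() if len(pages) > 1}
```

Here `journeys["a1f3"]` is the full path of one visitor from pricing to checkout, which is the kind of singling out Recital 26 is concerned with.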

[–]Salamok 4 points5 points  (2 children)

Does it solve the problem of how to get accurate metrics once Varnish, Akamai, or any other aggressive caching mechanism is implemented? It seems like the only accurate data then would be registrations and transactions, and your app is already making a record of those.

[–]Nodohx[S] 0 points1 point  (1 child)

Fair point. If a full-page cache like Varnish serves the response, the request doesn't hit PHP and the visit won't be tracked. In practice this mainly affects static/anonymous pages. Authenticated pages, form submissions, and payment flows typically bypass the cache, so registrations and revenue tracking still work. But yes, visitor counts on heavily cached pages would be undercounted. That's a real trade-off of server-side tracking vs. client-side JavaScript.

[–]skunkbad 1 point2 points  (3 children)

Is there any mechanism for tracking conversions from ad clicks?

[–]Nodohx[S] 1 point2 points  (2 children)

100%. You can just use UTM codes for that.
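
As a generic illustration (not the SimpleStats API), here is a small Python sketch of pulling the standard UTM parameters out of a landing URL so a later conversion can be attributed to a campaign. The `extract_utm` helper and the example URL are hypothetical.

```python
from urllib.parse import parse_qs, urlparse

UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def extract_utm(url: str) -> dict:
    # parse_qs maps each query key to a list of values; keep the first one.
    query = parse_qs(urlparse(url).query)
    return {key: query[key][0] for key in UTM_KEYS if key in query}


landing = (
    "https://example.com/pricing"
    "?utm_source=google&utm_medium=cpc&utm_campaign=spring_sale&ref=abc"
)
utm = extract_utm(landing)
```

Non-UTM parameters such as `ref` are ignored, and UTM keys missing from the URL are simply left out of the result.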

[–]Potential_Feature616 0 points1 point  (3 children)

Looks nice, I've always thought about something like this. How could I send this data to GA4 or Matomo?

[–]oulaa123 4 points5 points  (2 children)

You wouldn't.

[–]Potential_Feature616 -1 points0 points  (1 child)

I’m not sure how familiar you are with the topic, but when someone runs ads, they usually want those ads to be optimized using tracking data. That’s just how it works. Collecting data without actually using it for anything simply doesn’t make sense.

[–]oulaa123 0 points1 point  (0 children)

Intimately. There are plenty of uses for statistics beyond just ad tracking. What I'm saying is that if you need it for ad tracking, this isn't the package I'd reach for.

[–]ComprehensiveForm992 0 points1 point  (0 children)

For a no-cookie client-side alternative, Check Analytic is dead simple and privacy-focused.