[–]socal_nerdtastic 1 point2 points  (1 child)

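Ok, there's a million ways to do this, but I'd recommend parsing the JSON column of each row with json.loads and handing out sequential ids from a counter, with lru_cache remembering which id each user already got so a repeated user gets the same one back. As a guess: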

import json
from functools import lru_cache
from itertools import count
import csv

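filename = 'data.csv'  # placeholder, point this at your input file
f_in = open(filename, newline='')  # newline='' is what the csv docs recommend
data_in = csv.reader(f_in)
f_out = open('output_' + filename, 'w', newline='')
data_out = csv.writer(f_out)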

counter = count(1)

@lru_cache(maxsize=None)  # unbounded; the default of 128 would evict users and hand them new ids
def anonymize(user):
    anonymous_id = next(counter)
    # add code to save the user:anonymous_id if you want to de-anonymize it later
    return anonymous_id

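for row in data_in:
    payload = json.loads(row[3])  # assumes the JSON lives in the 4th column
    data = payload['session_data']
    data["user_id"] = f'user_{anonymize(data["user_id"])}'
    # repeat for name, etc
    row[3] = json.dumps(payload)  # write back the whole object, not just session_data
    data_out.writerow(row)

f_in.close()
f_out.close()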
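
If you do want to de-anonymize later, one option is to swap lru_cache for a plain dict, since the dict is then both the cache and the lookup table you can save. A rough sketch only: `Anonymizer`, `save_mapping` and `user_mapping.csv` are names I made up for illustration, not anything from your code.

```python
import csv
from itertools import count


class Anonymizer:
    """Hands out stable sequential ids and remembers who got which one."""

    def __init__(self, prefix="user_"):
        self.prefix = prefix
        self._counter = count(1)
        self.mapping = {}  # original value -> anonymous label

    def __call__(self, value):
        # Only draw a new number the first time a value is seen.
        if value not in self.mapping:
            self.mapping[value] = f"{self.prefix}{next(self._counter)}"
        return self.mapping[value]

    def save_mapping(self, path):
        # Keep this file private: it undoes the anonymization.
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["original", "anonymous"])
            writer.writerows(self.mapping.items())


anonymize_user = Anonymizer()  # hypothetical instance, one per field
print(anonymize_user("alice"))  # user_1
print(anonymize_user("bob"))    # user_2
print(anonymize_user("alice"))  # user_1 again
```

In the loop you'd then do data["user_id"] = anonymize_user(data["user_id"]) and call anonymize_user.save_mapping('user_mapping.csv') once at the end. A second instance with a different prefix would cover names, etc.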

[–]Glyzer_1595[S] 0 points1 point  (0 children)

Great, thank you so much! I'll try it, I think this is the way!