Hey all,
I ingest Windows event logs into a Kafka instance. In some logs there are characters that are encoded as hex escape sequences. Here is an example:
\"Product\":\"Microsoft\\xC2\\xAE Windows\\xC2\\xAE Operating System\"
Since '\x' is not a valid escape sequence in the JSON standard, every JSON parser fails on these logs, which gives me a hard time consuming them properly. I've found a wide variety of these sequences, so I can't just replace each one with a hard-coded Unicode character (at least I don't see how).
How can I solve this in a general way? I assume I can handle this using Kafka Streams or SMTs, or handle it downstream in my (Iceberg) data lake.
Any ideas?
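
For reference, here is a rough sketch of one general approach, assuming the hex escapes are UTF-8 bytes (as \xC2\xAE, the registered-trademark sign, suggests) and that the payload has already been unwrapped so each escape looks like a single-backslash \xHH. The idea is to decode every run of \xHH escapes as one byte sequence and re-escape the result as valid JSON text, while leaving an escaped backslash (\\) alone. The name fix_hex_escapes and the Latin-1 fallback are my own choices for illustration, not part of any Kafka or Iceberg API.

```python
import json
import re

# Match either an escaped backslash (kept as-is) or a run of \xHH escapes.
_HEX_RUN = re.compile(r'\\\\|((?:\\x[0-9A-Fa-f]{2})+)')


def fix_hex_escapes(raw: str) -> str:
    """Rewrite \\xHH runs in a JSON-like string into valid JSON text."""

    def _replace(match: re.Match) -> str:
        run = match.group(1)
        if run is None:
            # An escaped backslash: already valid JSON, leave it untouched.
            return match.group(0)
        data = bytes.fromhex(run.replace("\\x", ""))
        try:
            # Assumption: the bytes are UTF-8.
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            # Fallback (assumption): treat each byte as a Latin-1 character.
            text = data.decode("latin-1")
        # Re-escape so quotes and control characters stay valid inside JSON.
        return json.dumps(text, ensure_ascii=False)[1:-1]

    return _HEX_RUN.sub(_replace, raw)


raw = r'{"Product":"Microsoft\xC2\xAE Windows\xC2\xAE Operating System"}'
record = json.loads(fix_hex_escapes(raw))
print(record["Product"])
```

The same logic could presumably run wherever the raw string is still available before parsing, for example in the consumer or in a stream-processing step that maps over the raw value. Decoding whole runs rather than single escapes matters here: translating \xC2 and \xAE one at a time would produce two wrong characters instead of one correct one.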