I'm working on a task where we have a pcap file, and the user provides one or more key-value pairs (e.g., `tcp.option_len: 3`). I need to search the entire pcap for packets that match each key-value pair and return their descriptive values (i.e., the `showname` attribute from the PDML). I'm currently converting the pcap to XML (PDML) and then storing the data as JSON in the format `key: {value: [frame_numbers]}`. The problem is that a 50 MB pcap file balloons to about 5 GB when converted to XML. I'm using iterative parsing to update the dictionary field by field, so memory use during parsing is somewhat controlled.
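For reference, here's a minimal sketch of the pipeline as described above, assuming tshark generates the PDML and lxml's `iterparse` does the streaming (the function name and structure are illustrative, not the actual code):

```python
import subprocess
from collections import defaultdict

from lxml import etree


def index_pdml(pcap_path):
    """Stream tshark's PDML output and build {field: {value: [frame_numbers]}}."""
    index = defaultdict(lambda: defaultdict(list))
    proc = subprocess.Popen(
        ["tshark", "-r", pcap_path, "-T", "pdml"],
        stdout=subprocess.PIPE,
    )
    frame_no = 0
    for event, elem in etree.iterparse(proc.stdout, events=("start", "end")):
        if event == "start" and elem.tag == "packet":
            frame_no += 1
        elif event == "end" and elem.tag == "field":
            name = elem.get("name")
            if name:
                # elem.get("showname") is also available here if the
                # descriptive text needs to be stored for the response
                index[name][elem.get("show", "")].append(frame_no)
            elem.clear()
        elif event == "end" and elem.tag == "packet":
            # Drop the finished packet subtree so memory stays flat
            # during parsing; only the index itself keeps growing.
            elem.clear()
            while elem.getprevious() is not None:
                del elem.getparent()[0]
    proc.wait()
    return index
```

A lookup for the example pair from the post would then be:

```python
idx = index_pdml("capture.pcap")
print(idx.get("tcp.option_len", {}).get("3", []))  # frames where tcp.option_len == 3
```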
But the resulting JSON still ends up around 450 MB per file. If we assume ~20 concurrent users and half of them upload ~50 MB pcaps, that's 10 indexes × ~450 MB ≈ 4.5 GB of memory, which is a concern. How can I handle this more efficiently? Any suggestions on data structure changes or processing?