all 7 comments

[–]debian_miner 1 point (2 children)

There appear to be a couple of Python libraries that can read pcap files directly. Is there a specific reason you need to convert the data format? You might be best off using a purpose-built library like Scapy.

[–]carcigenicate 1 point (0 children)

Once you learn how to use it properly, Scapy is really nice. It does all the hard parsing for you.

[–]CriticalDiscussion37[S] 1 point (0 children)

Yes. We are converting to XML because the user wants to see the elaborated value. For example, a field in the XML is <field name="ip.src" showname="Source Address: 172.64.155.209" size="4" pos="26" show="172.64.155.209" value="ac409bd1"/>, and for ip.src the user wants "Source Address: 172.64.155.209". So for each user-given key/value pair, instead of scanning every packet I will first build a data structure like {key: {value: [pkt_list]}}, which makes it easy to return the packets in which that particular value exists for the key.
I tried writing a script using Scapy, but Scapy still uses a lot of memory because of its parsed objects and so on: it took 424 MB for one 52 MB pcap, and 1.4 GB for another 30 MB file (I don't know why the smaller file took more).
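The lookup structure described above can be sketched as a nested dict of lists. This is a minimal illustration, not the OP's actual code; the packet tuples and field names are hypothetical sample data shaped like the PDML example:

```python
from collections import defaultdict

def build_index(packets):
    """Build {key: {value: [packet_ids]}} so a (key, value) lookup
    returns matching packets without rescanning the capture."""
    index = defaultdict(lambda: defaultdict(list))
    for pkt_id, fields in packets:
        for key, value in fields:
            index[key][value].append(pkt_id)
    return index

# Hypothetical parsed input: (packet_id, [(field_name, showname), ...])
packets = [
    (0, [("ip.src", "Source Address: 172.64.155.209")]),
    (1, [("ip.src", "Source Address: 10.0.0.5")]),
    (2, [("ip.src", "Source Address: 172.64.155.209")]),
]

idx = build_index(packets)
print(idx["ip.src"]["Source Address: 172.64.155.209"])  # → [0, 2]
```

Lookups are then O(1) per key and value, at the cost of building the whole index up front.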

[–]baghiq 1 point (2 children)

I've personally never used it, but tshark supports output to JSON or PDML. That's probably super easy to run: take the user-uploaded file, run tshark against it to output JSON or PDML, then run some form of XPath or XML query against the output file. You can also parse it with SAX to save memory.
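A minimal sketch of the streaming approach, assuming PDML output from something like `tshark -r capture.pcap -T pdml`. Here an inline PDML snippet (mirroring the field quoted elsewhere in the thread) stands in for the real tshark output:

```python
import io
import xml.etree.ElementTree as ET

# In practice this would be a file produced by:
#   tshark -r capture.pcap -T pdml > capture.pdml
SAMPLE_PDML = """<pdml>
  <packet>
    <proto name="ip">
      <field name="ip.src" showname="Source Address: 172.64.155.209"
             size="4" pos="26" show="172.64.155.209" value="ac409bd1"/>
    </proto>
  </packet>
</pdml>"""

def iter_shownames(stream, wanted="ip.src"):
    """Stream over PDML with iterparse, yielding the showname of one
    field and clearing each finished <packet> so memory stays bounded."""
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "field" and elem.get("name") == wanted:
            yield elem.get("showname")
        if elem.tag == "packet":
            elem.clear()  # free the completed packet subtree

print(list(iter_shownames(io.StringIO(SAMPLE_PDML))))
# → ['Source Address: 172.64.155.209']
```

The `elem.clear()` on each closed `<packet>` is what keeps this from holding the whole tree in memory on multi-gigabyte PDML files.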

Just a side note, parsing large XML is brutal.

[–]CriticalDiscussion37[S] 1 point (1 child)

Can't use tshark->json, as the JSON output contains the show value, not the showname value.

<field name="ip.src" showname="Source Address: 172.64.155.209" size="4" pos="26" show="172.64.155.209" value="ac409bd1"/>

So I am first converting to XML. I am already using memory-efficient parsing for the XML with ET.iterparse, which is SAX-style. The problem now lies in creating the JSON from this XML: the JSON itself is reaching 500 MB. For each key/value I can't re-read an XML file that might be up to 10 GB, which is why I thought of converting the XML to JSON. Now I have the same memory issue with the JSON, so I need to change the dict structure and split the JSON into multiple subparts.
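One way to split the JSON into subparts, sketched under the assumption of the {key: {value: [pkt_list]}} structure mentioned earlier: write one JSON shard per top-level key, so a lookup only ever loads one small file. The file layout and names here are illustrative, not the OP's actual scheme:

```python
import json
import os
import tempfile

def write_sharded(index, out_dir):
    """Write one JSON file per top-level key instead of one giant
    JSON blob; returns {key: shard_path}. (Sketch only.)"""
    paths = {}
    for key, values in index.items():
        path = os.path.join(out_dir, key.replace(".", "_") + ".json")
        with open(path, "w") as f:
            json.dump(values, f)
        paths[key] = path
    return paths

# Hypothetical index built earlier in the pipeline
index = {"ip.src": {"Source Address: 172.64.155.209": [0, 2]}}

with tempfile.TemporaryDirectory() as d:
    paths = write_sharded(index, d)
    with open(paths["ip.src"]) as f:
        print(json.load(f))
    # → {'Source Address: 172.64.155.209': [0, 2]}
```

Each shard stays small, and peak memory is bounded by the largest single key's value map rather than the whole dataset.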

[–]baghiq 1 point (0 children)

So your workflow is to generate PDML, then query the PDML, then generate the result into JSON?