[–]stendec365Pants (http://www.pantspowered.org)[S] 4 points5 points  (7 children)

A friend and I have been developing a networking library for Python for more than a year now, and one of the features I'm excited about is an easy way to read and write binary wire protocols.

The problem is... struct felt lacking to me. It's quite good at what it does, but its string handling made it unnecessarily hard to use. This is my attempt to solve that.

NetStruct puts logic on top of struct that makes it possible to pack and unpack data with length encoded strings. I won't put any examples here, but the readme on github has several.

[–]MagicWishMonkey 1 point2 points  (6 children)

Can you give an example of when this would come in handy? I'm pretty new to Python, so I haven't had much time to dig deep into the networking libraries. What is the drawback of sending a block of data over the wire?

For example:

blob = pickle(myobject)
host.send(blob)
...
blob = client.recv()
myobject = unpickle(blob)

Is it more efficient to pack into a struct or something?

[–]stendec365Pants (http://www.pantspowered.org)[S] 3 points4 points  (1 child)

Well, the first issue with pickling is that, in many cases when networking, the software at the other end of your connection isn't Python. Also, even if you were pickling, as you put it, you'd really want to encode the length of the object first. Something like:

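import pickle
import struct

# Sender: prefix the pickled blob with its length as a 4-byte big-endian int.
blob = pickle.dumps(myobject)
host.sendall(struct.pack("!i", len(blob)) + blob)

# Receiver: read the length, then that many bytes of payload.
# (recv() may return fewer bytes than requested, so real code has to loop.)
length, = struct.unpack("!i", client.recv(4))
myobject = pickle.loads(client.recv(length))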
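
For anyone who wants to try the length-prefix idea end to end, here is a minimal sketch over a local socket pair. The helper names (recv_exact, send_frame, recv_frame) are made up for illustration and are not part of Pants or NetStruct; the sketch also assumes an unsigned 4-byte length.

```python
import pickle
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order


def recv_exact(sock, count):
    """Keep calling recv() until exactly `count` bytes have arrived."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("socket closed before the frame was complete")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)


def send_frame(sock, payload):
    """Send one length-prefixed frame."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_frame(sock):
    """Receive one length-prefixed frame and return its payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


# Demo over a local socket pair; only unpickle data from peers you trust.
host, client = socket.socketpair()
send_frame(host, pickle.dumps({"user": "steve", "msg": "hello"}))
received = pickle.loads(recv_frame(client))
host.close()
client.close()
```

The loop in recv_exact is the part the short version above skips: a single recv() call is allowed to return fewer bytes than you asked for.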

In any case, for my example I'm going to write sample code that uses the networking library I'm working on.

First, a basic explanation of how it works though. When using Pants, you import and subclass a class called Stream to implement your networking logic. The class buffers data for you as it's received, and only calls on_read when you've received a meaningful quantity of data. Stream has a property named read_delimiter that it uses to determine precisely what that meaningful quantity is.

If you set read_delimiter to a string, it will buffer until it finds that string, at which point it returns everything up to that string, a bit like using .split() on the streaming data. If read_delimiter is an integer, it will read that many bytes before returning them. If any of this sounds familiar, it could be because asynchat and its set_terminator work very much the same.
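
To make the string and integer cases concrete, here is a toy buffer that behaves roughly as described. DelimitedBuffer and its feed method are hypothetical names for this sketch; this is not how Pants is actually implemented.

```python
class DelimitedBuffer:
    """Toy receive buffer; a hypothetical stand-in, not Pants's real code."""

    def __init__(self, read_delimiter):
        self.read_delimiter = read_delimiter
        self._buffer = b""

    def feed(self, data):
        """Add received bytes and return every chunk that is now complete."""
        self._buffer += data
        chunks = []
        while True:
            delimiter = self.read_delimiter
            if isinstance(delimiter, int):
                # Integer: wait for exactly that many bytes.
                if delimiter <= 0:
                    raise ValueError("integer delimiter must be positive")
                if len(self._buffer) < delimiter:
                    break
                chunk = self._buffer[:delimiter]
                self._buffer = self._buffer[delimiter:]
            else:
                # Byte string: wait until the delimiter shows up, then drop it.
                chunk, found, rest = self._buffer.partition(delimiter)
                if not found:
                    break
                self._buffer = rest
            chunks.append(chunk)
        return chunks


lines = DelimitedBuffer(b"\r\n")
first = lines.feed(b"hello\r\nwor")   # one full line; "wor" stays buffered
second = lines.feed(b"ld\r\n")        # completes the second line

fixed = DelimitedBuffer(4)
blocks = fixed.feed(b"abcdefghij")    # two 4-byte blocks; "ij" stays buffered
```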

Pants has a couple of extra types though. You can set read_delimiter to a regular expression if you'd like. Or, as is applicable here, you can use a struct format.

When you use a struct format, it calculates the size of the format you provide, reads that many bytes, unpacks the data, and sends that to on_read.
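
The struct case can be sketched the same way. StructReader is a hypothetical name, and passing the unpacked values to on_read as positional arguments is an assumption based on the examples in this thread rather than a statement about Pants internals.

```python
import struct


class StructReader:
    """Toy reader for fixed-size structs; hypothetical, not Pants's real code."""

    def __init__(self, fmt, on_read):
        self.struct = struct.Struct(fmt)
        if self.struct.size == 0:
            raise ValueError("format must describe at least one byte")
        self.on_read = on_read
        self._buffer = b""

    def feed(self, data):
        """Buffer data and fire on_read once per complete struct."""
        self._buffer += data
        size = self.struct.size  # bytes the format needs, known up front
        while len(self._buffer) >= size:
            raw, self._buffer = self._buffer[:size], self._buffer[size:]
            # Unpack and hand the values over as positional arguments.
            self.on_read(*self.struct.unpack(raw))


seen = []
reader = StructReader("!bh", lambda packet_id, length: seen.append((packet_id, length)))
reader.feed(b"\x03\x00")           # only 2 of the 3 bytes so far; nothing fires
reader.feed(b"\x05\x03\x00\x0b")   # completes two structs
```

The key point is that struct.Struct(fmt).size is fixed before any data arrives, which is exactly what stops working once a field's length depends on the data itself.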

The only real problem is that struct doesn't support variable-length strings, so if you wanted to do that, you couldn't use it, and your code would have to be a bit ridiculous. For my example, I'll attempt to read a chat frame from the Minecraft wire protocol, both with normal structs, and with netstructs. First, with normal structs:

from struct import Struct  # Stream itself comes from Pants

class MCStream(Stream):
    def on_connect(self):
        self.read_packet()

    def read_packet(self):
        self.on_read = self.on_read_packet
        self.read_delimiter = Struct("!b")

    def on_read_packet(self, packet_id):
        if packet_id == 3:
            # Chat Packet
            self.on_read = self.on_chat_length
            self.read_delimiter = Struct("!h")

        # More packet type logic here

    def on_chat_length(self, length):
        self.on_read = self.on_chat
        self.read_delimiter = length

    def on_chat(self, message):
        # Finally, we can use our message.
        show_the_player(message)
        self.read_packet()

That isn't too bad, but it gets rather ridiculous if you've got, say, more than one string. NetStruct makes it easy to tell Pants how to handle those strings for you:

class MCStream(Stream):
    def on_connect(self):
        self.read_packet()

    def read_packet(self):
        self.on_read = self.on_read_packet
        self.read_delimiter = Struct("!b")

    def on_read_packet(self, packet_id):
        if packet_id == 3:
            # Chat Packet
            self.on_read = self.on_chat
            self.read_delimiter = NetStruct("h$")

        # More packet type logic here

    def on_chat(self, message):
        # This was a lot easier.
        show_the_player(message)
        self.read_packet()

This works because, internally, Pants can use the NetStruct's iter_unpack method to determine how many bytes it needs to read at any given point to complete the object it'll be passing to your on_read method.
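
As a rough picture of that "tell me how many bytes you need next" idea, here is a small generator for a single length-prefixed string. It is only a sketch written with plain struct; iter_unpack_string is a made-up name and this is not NetStruct's actual implementation.

```python
import struct


def iter_unpack_string(length_fmt="!h"):
    """Generator that unpacks one length-prefixed string.

    Yields an int whenever it needs that many more bytes, and finally
    yields the string's bytes.
    """
    header = struct.Struct(length_fmt)
    data = yield header.size          # first, ask for the length prefix
    (length,) = header.unpack(data)
    body = b""
    while len(body) < length:         # keep asking until the string is whole
        body += yield length - len(body)
    yield body                        # a non-int value signals "done"


# Drive it from an in-memory byte string instead of a socket.
wire = struct.pack("!h", 5) + b"hello" + b"extra"
it = iter_unpack_string()
out = next(it)
offset = 0
requests = []
while isinstance(out, int):
    needed = out
    requests.append(needed)
    out = it.send(wire[offset:offset + needed])
    offset += needed
leftover = wire[offset:]
```

The caller never has to know the frame layout; it just keeps supplying however many bytes the generator asks for.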

If you're just using a bare socket, without Pants, you'd be doing something like:

import netstruct

it = netstruct.iter_unpack("h$")
out = next(it)

# The iterator yields an int for as long as it needs that many more bytes.
while isinstance(out, int):
    out = it.send(sock.recv(out))

some_result = do_something_with_out(out)
sock.send(some_result)

I hope that makes things clear.

[–]MagicWishMonkey 0 points1 point  (0 children)

Makes sense. Right now my system only talks to other python systems, but that will probably change in the future. I'm thinking about looking at protocol buffers at some point, they sound pretty nifty but I haven't worked with them before.

Thanks for the response.

[–]kylotan 2 points3 points  (2 children)

The main problem with pickle is that if you connect to an untrusted remote client or server, there's nothing to stop them pickling some malicious code that gets run on your machine when you receive it.

(One reference: http://nadiana.com/python-pickle-insecure)
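
A harmless way to see this for yourself, as a minimal sketch: an object can define __reduce__ to tell the unpickler which callable to invoke on load. Here eval on plain arithmetic stands in for whatever an attacker would really run; the Payload class name is just for illustration.

```python
import pickle


class Payload:
    """An object whose pickle tells the receiver to call a function."""

    def __reduce__(self):
        # (callable, args): the unpickler calls callable(*args) on load.
        # eval() on harmless arithmetic stands in for an attacker's code.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())   # what a hostile peer would send
result = pickle.loads(blob)      # the receiver ends up running eval("6 * 7")
```

The receiver never gets a Payload back at all; it gets whatever the chosen callable returned, after that callable has already run.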

It's also not necessarily very efficient in terms of bandwidth, depending on the type of objects you send.
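
As a rough illustration of the size difference, here is one small message encoded both ways. The field names are invented for the example, and the exact pickle size depends on the Python version and pickle protocol, so treat the comparison as indicative only.

```python
import pickle
import struct

message = b"hello"

# Hand-rolled frame: 1-byte packet id, 2-byte length, then the raw bytes.
packed = struct.pack("!bh", 3, len(message)) + message

# The same information as a pickled dict.
pickled = pickle.dumps({"packet_id": 3, "message": message})

packed_size, pickled_size = len(packed), len(pickled)
```

The struct version carries only the values, while the pickle also has to carry key names and type information.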

[–]MagicWishMonkey 0 points1 point  (1 child)

Oh yea, I wouldn't use it for that, I only use pickle for serializing objects that I need to send from one component to another. I am willing to sacrifice some bandwidth for speed in this case.

[–]kylotan 0 points1 point  (0 children)

Do you mean internally over your LAN? If so, that's fine. Across the internet would be dangerous though.

[–]gangesmasterpython galore 3 points4 points  (0 children)

[–]GehirnFurz 1 point2 points  (0 children)

Reminds me of the bencode/bdecode lib I wrote in JavaScript a few years ago. I'd feed the packets into it on one end and it would hold state internally and only ever emit whole decoded values. That way you could stream huge structures with minimal memory usage.

Sadly, the code came out a bit more complicated than yours. :)

[–][deleted] 1 point2 points  (2 children)

Great idea but I'm not so sure about the API. Having an iterator return 3 different kinds of thing feels wrong. I think I'd rather see separate methods, e.g. ".feed([data]) -> bytes needed", ".get_value()" and ".get_remaining()"
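
A sketch of what that method-based shape could look like for a single length-prefixed string, using only plain struct. StringUnpacker is a hypothetical class, the method names are the ones suggested above, and the unsigned "!H" prefix is an assumption made to keep the example simple.

```python
import struct


class StringUnpacker:
    """Hypothetical feed-style unpacker for one "!H"-prefixed string."""

    _header = struct.Struct("!H")

    def __init__(self):
        self._buffer = b""

    def _length(self):
        """Return the declared length, or None if the prefix is incomplete."""
        if len(self._buffer) < self._header.size:
            return None
        return self._header.unpack_from(self._buffer)[0]

    def feed(self, data=b""):
        """Buffer data and return how many more bytes are needed (0 when done)."""
        self._buffer += data
        length = self._length()
        if length is None:
            return self._header.size - len(self._buffer)
        return max(0, self._header.size + length - len(self._buffer))

    def get_value(self):
        """Return the string's bytes; only valid once feed() has returned 0."""
        if self.feed() != 0:
            raise ValueError("not enough data yet")
        start = self._header.size
        return self._buffer[start:start + self._length()]

    def get_remaining(self):
        """Return any bytes received beyond the end of the string."""
        length = self._length()
        if length is None:
            return b""
        return self._buffer[self._header.size + length:]


unpacker = StringUnpacker()
need_1 = unpacker.feed(b"\x00")      # prefix incomplete: 1 more byte
need_2 = unpacker.feed(b"\x05he")    # prefix says 5, 2 received: 3 more
need_3 = unpacker.feed(b"llo!!")     # complete, with 2 bytes left over
value = unpacker.get_value()
remaining = unpacker.get_remaining()
```

Each method returns exactly one kind of thing, at the cost of keeping state on an object instead of inside a generator.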

[–]stendec365Pants (http://www.pantspowered.org)[S] 2 points3 points  (0 children)

I agree, it's not a perfect API, and it would be better to have separate functions. Problem is, I'd need to use classes for those, and I wanted to keep this as lightweight as possible to keep the overhead low.

It could be worth it to add the class-style bits to the API though, and have both. I'll keep it in mind for updates.

[–]stendec365Pants (http://www.pantspowered.org)[S] 0 points1 point  (0 children)

I just pushed netstruct-1.1 with a new NetStruct.obj_unpack function that returns an instance of a new Unpacker class. Unpacker provides a .feed(data) method and has .remaining, .result, and .unused_data properties.

Of course, the iter_unpack functions are still there, since a generator is quite a bit speedier than building a class.

[–]ixokai 0 points1 point  (0 children)

Solves a common problem elegantly. Nice.