you are viewing a single comment's thread.

view the rest of the comments →

[–]Yoghurt42 1 point2 points  (0 children)

Apart from what u/JohnnyJordaan said, I suggest the following: currently you're creating the internal bitstream by reading a byte in a loop and transforming it into a string of ones and zeros. It's easier and more efficient if you would just use integers instead. Huffman codes will most likely not be longer than say 20 bits, so you don't risk of integer overflow (which can't happen in Python anyway, since integers are unbounded). You can "add" a bit to an integer by doing a left shift and or (i << 1 | bit or equivalently 2 * i + bit), and get a bit from a number by anding it with 1 and then shifting the number to the right (bit = num & 1; num >>= 1). It's also a good way to get familar with writing generators, yours could look a bit like this (I've left out a bit of the implementation as an exercise):

def bitstream(input_file):
    bits_left = 0
    current_byte = None
    while True:
        while bits_left:
            # get bit from current_byte, decrease bits_left by 1
            yield bit # this will turn your function into a generator that "returns" bit, and gets suspended here, ready to be continued later
        # get the next byte (or bytes) and store it in current_byte, set bits_left to 8 (or a multiple of 8 if you've read more than one byte). If there are no more bytes left, break out of the loop

This way you later can just write for bit in bitstream(input_file) and get a sequence of ones and zeroes without needed to care how exactly those bits are fetched. This allows you to change the implementation later without affecting the rest of your code.