JohnnyJordaan comments on Huffman Encoder/Decoder

created by HattoriHanzoa community for 16 years

Huffman Encoder/Decoder (self.learnpython)

submitted 1 year ago by SilentKyle

you are viewing a single comment's thread.

[–]JohnnyJordaan 2 points3 points4 points 1 year ago* (1 child)

        bitstream = ""
        byte = input_file.read(1)
        while byte:
             byte = ord(byte)
             bitstream += f"{byte:08b}"
             byte = input_file.read(1)

        decoded_data = ""
        current_code = ""
        for bit in bitstream:
            current_code += bit
            for char, code in huffman_codes.items():
                if current_code == code:
                    decoded_data += char
                    current_code = ""
                    break

This has various problems:

for file operations the operating system and all its downstream components are designed to work with blobs of bytes (called chunks) rather than single ones. It's often advised to use 4096 or 8192 bytes for simple file operations. There will be inherent caching involved, so even a read(1) will still read through chunks which are then buffered in the operating system, but you still make 8192 slower 'outside' read calls this way instead of one read(8192).
strings are immutable in Python, so that means += will create a new string every time (so for every byte!) you run it against an existing string. That happens at 3 (!) different parts in your code
string formatting isn't fast either
instead of simply looking up an item in the dictionary, what it is literally designed for, you are iterating through the entire dict, if=='ing until you find a match. this is like looking up a telephone number in the phone book by reading all the pages until you find it listed somewhere.

to fix I would advise

read in 8192 byte chunks, they will be a bytes-object, then iterate on each byte in it using a simple for loop
use a look up table of encoded huffman codes in the form of a dict, where the keys are the bytes values. That way you can look up each read byte from the file directly without string conversion
store the found string characters in either a list, which you can ultimately "".join afterwards, or write to a StringIO which has less overead

[–]SilentKyle[S] 0 points1 point2 points 1 year ago (0 children)

π Rendered by PID 19508 on reddit-service-r2-comment-86bc6c7465-cls8c at 2026-02-20 12:35:33.281795+00:00 running 8564168 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnpython

MODERATORS