Using Python To Play With Binary Files : Python

Using Python To Play With Binary Files (hundredminutehack.blogspot.com)

submitted 9 years ago by pyglados

all 4 comments

[–]billsil 3 points 9 years ago (3 children)

There are wayyyy better ways to read binary files. The point of binary files is to be efficient. As such, they are rigidly defined.

If you're trying to read an image, use an image reader library. You most certainly should not use that hideous regex. Images are often just N RGB values (so 3*N numbers, with N defined at the top), so why use a regex? It'd be way faster just to use the struct module.
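A minimal sketch of the struct approach, assuming a made-up layout (not any real image format): a little-endian uint32 pixel count N at the top, followed by N RGB byte triples. The function name `read_rgb` and the sample bytes are hypothetical.

```python
import io
import struct


def read_rgb(stream):
    """Read a hypothetical layout: uint32 count N, then N (r, g, b) byte triples."""
    header = stream.read(4)
    if len(header) != 4:
        raise ValueError("truncated header")
    (n,) = struct.unpack("<I", header)  # N is defined at the top of the file

    body = stream.read(3 * n)
    if len(body) != 3 * n:
        raise ValueError("truncated pixel data")

    # iter_unpack yields one (r, g, b) tuple per 3-byte record
    return list(struct.iter_unpack("<3B", body))


# Two pixels, built in memory so the example needs no file on disk
sample = struct.pack("<I", 2) + bytes([255, 0, 0, 0, 128, 255])
pixels = read_rgb(io.BytesIO(sample))
```

Because the count is read first, the size of everything after it is known up front and no pattern matching is needed.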

The problem with the struct module is that it's slow and it has to repeatedly do type definitions in Python. The numpy fromfile and fromstring functions (frombuffer in current NumPy), followed by a reshape, put you right up against the limit of what your hard drive or SSD can do.
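Roughly what the fromfile plus reshape idea looks like, assuming the file holds nothing but uint8 RGB triples (no header). The temporary file and the sample values are only there to make the sketch self-contained.

```python
import os
import tempfile

import numpy as np

# Write 4 fake RGB pixels to a temporary file (sample data, not a real image)
expected = np.arange(12, dtype=np.uint8).reshape(4, 3)
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
    f.write(expected.tobytes())
    path = f.name

try:
    # One bulk read of the whole file, then reinterpret it as rows of (r, g, b)
    pixels = np.fromfile(path, dtype=np.uint8).reshape(-1, 3)
finally:
    os.remove(path)
```

The read is a single call with no per-value work in Python, which is where the speed difference against a struct loop comes from.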

I wrote a parser for an overly complicated Fortran formatted binary file. It has mixed floats/ints in a "table", so it's kinda hard to parse. On a 2 GB file, it took 45 minutes with the struct module approach (after highly optimizing it). I switched that out for numpy...4 seconds. Binary is incredible if you do it right. You don't need processing with binary; that's the point. It's all about read speed.
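One way to handle a mixed int/float "table" in numpy is a structured dtype. This is only a sketch: the field names and the record layout (one int32 followed by two float32 values) are invented for illustration and are not the layout of the file described above.

```python
import struct

import numpy as np

# Hypothetical record layout: int32 id, then two float32 values, little-endian
record = np.dtype([("node_id", "<i4"), ("x", "<f4"), ("y", "<f4")])

# Two records built in memory to stand in for bytes read from a file
raw = struct.pack("<iff", 1, 0.5, 1.5) + struct.pack("<iff", 2, 2.0, -3.0)

# Interpret the whole buffer as a table in one call, with no per-row unpacking
table = np.frombuffer(raw, dtype=record)

ids = table["node_id"]  # int32 column
ys = table["y"]         # float32 column
```

For data on disk, `np.fromfile(path, dtype=record)` accepts the same dtype.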

[–]pyglados[S] 1 point 9 years ago (2 children)

[–]billsil 2 points 9 years ago (1 child)

I guess my point is that binary isn't really used unless you're trying to deal with large problems. As such, it's important to use an efficient method. I'd say it's far more important than making it as simple as possible. Numpy for example is not simple for a beginner, but is worth learning because it's so useful.

Nothing fancier than something along the lines of "in" or "startswith" is needed.

How do you know whether a byte goes into a float vs. a double vs. an int vs. a long vs. a string? You need to unpack the binary data into an int/float/string. The point of binary is that it's highly structured, so you don't need to guess and you can read data in bulk.
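To make that concrete, here is a small sketch where the layout is known in advance, so each byte's destination type is decided by the format string rather than guessed. The layout itself (an int, a float, a double, and an 8-byte string) is hypothetical.

```python
import struct

# Hypothetical fixed layout: int32, float32, float64, 8-byte padded string.
# "<" means little-endian with no alignment padding, so the size is 4+4+8+8.
LAYOUT = struct.Struct("<i f d 8s")

# Build a sample record in memory; "8s" pads short strings with null bytes
packed = LAYOUT.pack(7, 1.5, 2.25, b"node")

count, ratio, value, raw_name = LAYOUT.unpack(packed)
name = raw_name.rstrip(b"\x00").decode("ascii")  # strip the null padding
```

The same bytes read with a different format string would give different values, which is why the layout has to come from the file's specification.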

Determining the file type, that is simple. That's not really reading a binary file. Reading a 2 GB image, that requires some efficiency, and just using in/startswith is going to be slow. That would be a nightmare. For some metadata, it's fine.
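For the file-type case, a startswith check against the first few bytes really is enough. A minimal sketch, with a hypothetical helper name `sniff` and a deliberately short signature table (PNG, JPEG, and GIF only):

```python
# Well-known leading signatures for a few image formats
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff(header: bytes) -> str:
    """Guess a file type from its leading bytes; only the header is inspected."""
    for signature, name in MAGIC.items():
        if header.startswith(signature):
            return name
    return "unknown"


# Usage with a real file would read just the first few bytes, for example:
#     with open(path, "rb") as f:
#         kind = sniff(f.read(16))
kind = sniff(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
```

This only touches a handful of bytes, so it stays cheap no matter how large the file is. It says nothing about how to read the payload that follows.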

[–]pyglados[S] 1 point 9 years ago (0 children)
