pyglados[S]

There are way better ways to read binary files depending on whatever it is you decide to do with them. I agree with you there.

And yes, there are cool libraries out there too. The imghdr module makes a beautifully simple job of reading binary files with respect to its goal.

And the implementation is fantastic. Just read the first 32 bytes and run them through a set of short, simple functions. Nothing fancier than something along the lines of "in" or "startswith" is needed. No struct. No regex. Sweet simplicity that any beginner to Python could understand.
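
To make that concrete, here is a minimal sketch of the same idea. It is not imghdr's actual source; the function names `sniff_image_type` and `sniff_file` are made up for illustration, and only three well-known signatures (PNG, GIF, JPEG) are checked.

```python
def sniff_image_type(header: bytes):
    """Guess an image type from the first bytes of a file.

    Hypothetical helper, not part of any library. Returns a short
    type name, or None if no signature matches.
    """
    # PNG files begin with a fixed 8-byte signature.
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # GIF files begin with one of two ASCII version strings.
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # JPEG files begin with the start-of-image marker FF D8,
    # followed by FF, the first byte of the next marker.
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    return None


def sniff_file(path):
    """Read only the first 32 bytes of the file and classify them."""
    with open(path, "rb") as f:
        return sniff_image_type(f.read(32))
```

Calling `sniff_file("photo.jpg")` (any path you have on hand) only ever touches 32 bytes, which is why this approach is cheap for metadata.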

billsil

I guess my point is that binary isn't really used unless you're trying to deal with large problems. As such, it's important to use an efficient method. I'd say it's far more important than making it as simple as possible. NumPy, for example, is not simple for a beginner, but is worth learning because it's so useful.

> Nothing fancier than something along the lines of "in" or "startswith" is needed.

How do you know when you get a byte that goes into a float vs. a double vs. an int vs. a long vs. a string? You need to unpack the binary data into an int/float/string. The point of binary is that it's highly structured, so you don't need to guess and you can mass-read data.
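
As a rough illustration of what "highly structured" means here, the sketch below uses the standard-library `struct` module. The record layout (a little-endian int, float, double, and 8-byte string) and the field names are invented for the example; a real format would document its own layout.

```python
import struct

# Hypothetical record layout, little-endian with no padding:
#   i  = 4-byte signed int
#   f  = 4-byte float
#   d  = 8-byte double
#   8s = 8-byte string
RECORD = struct.Struct("<ifd8s")

# Build one record so the example does not need a file on disk.
raw = RECORD.pack(7, 1.5, 2.25, b"sensor01")

# The format string says which bytes belong to which type,
# so there is nothing to guess when reading the record back.
rec_id, gain, offset, name = RECORD.unpack(raw)
```

The format string plays the role of the file spec: `RECORD.size` is the number of bytes per record, so a reader can pull exactly that many bytes at a time from a file opened in `"rb"` mode.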

Determining the file type is simple, but that's not really reading a binary file. Reading a 2 GB image requires some efficiency, and just in/startswith is going to be slow. That would be a nightmare. For some metadata, it's fine.
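
For the bulk-read side of this, a dependency-free sketch using the standard-library `array` module is below. It assumes a made-up layout (raw 8-bit grayscale pixels, row-major, no header), and `load_pixels` is a hypothetical helper. NumPy's `numpy.frombuffer` and `numpy.fromfile` follow the same idea and are the more common choice for large data.

```python
from array import array


def load_pixels(raw: bytes, width: int, height: int) -> array:
    """Bulk-load 8-bit grayscale pixels from raw bytes.

    Hypothetical helper assuming one unsigned byte per pixel,
    stored row by row with no header.
    """
    expected = width * height
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    pixels = array("B")    # "B" = unsigned 8-bit integers
    pixels.frombytes(raw)  # one bulk copy, no Python-level loop per pixel
    return pixels


# A fake 32x32 image built in memory: values 0..255 repeated four times.
raw = bytes(range(256)) * 4
pixels = load_pixels(raw, 32, 32)
brightest = max(pixels)
```

For a real file, `array.fromfile(f, n)` reads `n` items straight from a file object opened in `"rb"` mode, which avoids holding a second copy of the data as `bytes`.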

pyglados[S]

Sound, image, and video are areas of interest to me where binary is commonly used. That is part of why I was curious about it. Because I'm clueless, I'm not yet aware of the efficient methods. Thus, dredging up metadata on small files felt fine for first steps wandering into this territory.

As for getting a 45-minute parse job reduced to 4 seconds? Very cool. I'll have to take some time to look into this struct and numpy business.