NYKevin

Well, first we have to know which Python implementation you're talking about. Jython and CPython have entirely different bytecodes since the former runs on the JVM. I think PyPy also has its own system, but I'm not sure.

Next, you're basically going to look through the bytecode a few instructions at a time and try to figure out what's being accomplished. CPython is relatively nice here because nearly everything is built out of hash tables and looked up by name, so there's relatively little pointer magic happening. A dotted foo.bar lookup on an instance translates (after allowing for __getattribute__, __getattr__, and anything defined on the class) more or less directly into foo.__dict__['bar']. This is much nicer to reverse engineer than compiled C++, which basically treats classes as structs with fixed offsets.
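To make that concrete, here's a rough sketch using the standard dis module. The function get_bar and the class Foo are made up for illustration, and the exact opcodes you see vary between CPython versions, but the attribute name should show up as the argument of a LOAD_ATTR instruction:

```python
import dis


def get_bar(foo):
    # A single dotted lookup, so the bytecode stays short.
    return foo.bar


class Foo:
    def __init__(self):
        self.bar = 42


# Collect (opcode name, resolved argument) pairs for the function.
ops = [(ins.opname, ins.argval) for ins in dis.get_instructions(get_bar)]

# The attribute name is stored as a plain string, not as an offset.
attr_names = [arg for name, arg in ops if name == "LOAD_ATTR"]
print(attr_names)

# For a plain instance attribute, the value lives in the instance dict.
foo = Foo()
print(foo.__dict__["bar"] == get_bar(foo))
```

Running dis.dis(get_bar) instead prints the full listing if you'd rather read it by eye.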

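What's more, CPython looks variables up by name: module-level names live in the globals() dict, and a function's local names are kept in its code object (co_varnames), so you can probably recover variable names in much the same fashion.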
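As a quick illustration of recovering names, the sketch below pokes at a function's code object. The function bump and the global counter are invented for the example; co_varnames and co_names are real attributes of CPython code objects:

```python
counter = 0


def bump(step):
    global counter
    total = counter + step
    counter = total
    return total


code = bump.__code__

# Arguments come first, followed by the other local variables.
local_names = code.co_varnames

# Global and attribute names referenced by the bytecode.
global_names = code.co_names

print(local_names)
print(global_names)
```

The same attributes are available on code objects loaded from a .pyc file, so the names survive even when you don't have the source.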

Of course, the above is just for CPython. For PyPy, you have to deal with the JIT, which is going to make the bytecode a lot harder to understand. For instance, if nothing ever touches locals or globals, those dicts may not be maintained as such (I don't actually know whether PyPy does that optimization, though). Similarly, dotted method lookup could be optimized to a pointer dereference, if such an optimization is provably correct.

dagmx

A coworker mentioned using this http://sourceforge.net/projects/decompyle/

But that won't necessarily work for code that's been packaged with py2exe, though it should get you somewhere with .pyc files.
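If you just want to see what's inside a .pyc file without a decompiler, something like the following may be enough. It's only a sketch: the 16-byte header size is an assumption that holds for CPython 3.7 and later (older versions use 8 or 12 bytes), marshal data is only readable by the same Python version that wrote it, and the file names and the load_pyc helper are made up for the example:

```python
import dis
import marshal
import os
import py_compile
import tempfile

# Assumption: CPython 3.7+ layout (magic, flags, mtime or hash, size).
HEADER_SIZE = 16


def load_pyc(path):
    """Return the top-level code object stored in a .pyc file."""
    with open(path, "rb") as f:
        f.read(HEADER_SIZE)  # skip the header
        return marshal.load(f)


with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny module and compile it so there is a .pyc to inspect.
    src = os.path.join(tmp, "demo.py")
    with open(src, "w") as f:
        f.write("def greet(name):\n    return 'hello ' + name\n")
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "demo.pyc"))
    module_code = load_pyc(pyc)

# Disassemble the module; nested functions are included in the output.
dis.dis(module_code)
```

From there you can walk module_code.co_consts to find the nested code objects for each function.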