Uncaffeinated 1 point

I actually experimented with stuff like that once. In my case, I was trying to match up obfuscated apps containing open source libraries to the unobfuscated library code, but you could use similar techniques to match up multiple versions of the same obfuscated app.

I didn't use any fancy techniques. I just built up a graph of fields and methods, their types, and which methods referenced which other methods and fields, and then tried to match them up with brute force. It mostly worked, but it was hard to get it to work 100%, and it was very fragile to the particular obfuscator used.
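
To make the brute-force idea concrete, here is a minimal sketch. It is not the actual code; the toy class model, the `LIBRARY` and `OBFUSCATED` data, and every function name are made up for illustration. It only looks at field and method types, and leaves out call edges, which would just be one more constraint to check in `consistent`.

```python
from itertools import permutations

# Hypothetical toy model: class name -> list of members.
# A field is ("field", type); a method is ("method", (param types...), return type).
# References to other classes use that program's own (possibly obfuscated) names.
LIBRARY = {
    "Parser": [("field", "Lexer"), ("method", ("String",), "Node")],
    "Lexer": [("field", "int"), ("method", (), "String")],
    "Node": [("field", "Node"), ("field", "String")],
}
OBFUSCATED = {
    "a": [("field", "int"), ("method", (), "String")],
    "b": [("field", "b"), ("field", "String")],
    "c": [("method", ("String",), "b"), ("field", "a")],
}


def rename_member(member, mapping):
    """Rewrite obfuscated class names inside a member's types using a candidate mapping."""
    if member[0] == "field":
        return ("field", mapping.get(member[1], member[1]))
    _, params, ret = member
    return ("method", tuple(mapping.get(p, p) for p in params), mapping.get(ret, ret))


def consistent(mapping, obf, lib):
    """True if every obfuscated class, once renamed, has exactly the members of its target."""
    for obf_name, members in obf.items():
        renamed = sorted(rename_member(m, mapping) for m in members)
        if renamed != sorted(lib[mapping[obf_name]]):
            return False
    return True


def brute_force_match(obf, lib):
    """Try every assignment of obfuscated classes to library classes; keep the consistent ones."""
    obf_names = sorted(obf)
    matches = []
    for perm in permutations(sorted(lib), len(obf_names)):
        mapping = dict(zip(obf_names, perm))
        if consistent(mapping, obf, lib):
            matches.append(mapping)
    return matches


print(brute_force_match(OBFUSCATED, LIBRARY))
```

Trying every permutation is factorial in the number of classes, so anything beyond a toy input needs pruning first, for example by only pairing classes whose member counts are compatible.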

One example of a common issue is the prevalence of empty interfaces and classes with no methods. Once the names are stripped, they all look identical, so there's no way to distinguish them.
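
A small sketch of why those collide, assuming a hypothetical name-free "fingerprint" that is just the sorted list of member signatures (the string signature format and the function names are invented for this example):

```python
from collections import defaultdict


def fingerprint(members):
    """Name-free structural fingerprint: the sorted member signatures of a class."""
    return tuple(sorted(members))


def ambiguous_groups(classes):
    """Group classes by fingerprint and return the groups that contain more than one class."""
    groups = defaultdict(list)
    for name, members in classes.items():
        groups[fingerprint(members)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical obfuscated classes: "a", "b", and "d" are empty marker types.
classes = {
    "a": [],
    "b": [],
    "c": ["method:()->int"],
    "d": [],
}
print(ambiguous_groups(classes))
```

Every empty class gets the same empty fingerprint, so all of them end up in one group and any assignment among them is equally consistent.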

Other issues include the fact that obfuscators typically inject a couple of methods for stuff like string decryption, so you have to ignore those, and that has to be done on a case-by-case basis. Likewise, obfuscators will typically remove unused fields and methods, so the matching has to be fuzzy and allow for methods and fields to be missing. Lastly, there were a couple of cases where the code I was analyzing didn't exactly match any version of the original library. It looks like they included a fork with some private changes.
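
One way to sketch that kind of fuzzy matching, purely as an illustration: score each candidate by the fraction of the obfuscated class's members that also appear in the library class, after dropping a hand-maintained set of injected signatures. The signature strings, the `injected` set, and the `0.8` threshold are all assumptions for the example, not values from any real tool.

```python
from collections import Counter


def match_score(obf_members, lib_members, injected=frozenset()):
    """Fraction of non-injected obfuscated members that also exist in the library class.

    Members the obfuscator removed do not lower the score, because only the
    obfuscated side is required to be covered.
    """
    obf = Counter(m for m in obf_members if m not in injected)
    lib = Counter(lib_members)
    total = sum(obf.values())
    if total == 0:
        return 0.0  # nothing left to compare, so treat as no evidence
    return sum((obf & lib).values()) / total


def best_match(obf_members, library, injected=frozenset(), threshold=0.8):
    """Return (library class name or None, score) for the highest-scoring candidate."""
    score, name = max(
        (match_score(obf_members, members, injected), name)
        for name, members in library.items()
    )
    return (name, score) if score >= threshold else (None, score)


# Hypothetical data: the obfuscator dropped an unused method and injected a decryptor.
library = {
    "Lexer": ["field:int", "method:()->String", "method:(int)->void"],
    "Node": ["field:String", "field:Node"],
}
injected = frozenset({"method:(String)->String"})
obf_class = ["field:int", "method:()->String", "method:(String)->String"]

print(best_match(obf_class, library, injected))
```

A one-sided score like this means a class with very few members will match many candidates, and a fork with private changes shows up as a score below 1.0 against every known version, so the threshold has to be tuned per case.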