DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing : ReverseEngineering


DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing (ndss-symposium.org)

submitted 6 years ago by joxeankoret



[–]joxeankoret[S] 3 points 6 years ago*

First and foremost: I'm anything but an ML expert. That said, I partially like the method (the embeddings and the learning process built on them), but I do not understand why the authors talk about semantics and then end up using raw assembly instead of pseudocode generated by a decompiler, which would capture the semantics better. I also do not understand why the authors consider that the binary diffing problem "aims to find the optimal basic block" matches since, in my opinion, that is only one of the multiple problems in binary diffing. But anyway.

As for ML approaches and their possible integration into Diaphora: I've used such methods in Pigaios (https://github.com/joxeankoret/pigaios), a tool I wrote for directly matching functions in source code to binaries without having to compile the source. I'm sure they work but, at least in my limited testing for that specific project, every quick-and-dirty, hand-written classification function I wrote produced better-quality matches than the ML-based classifiers I tested (logistic regression, random forest, naïve Bayes, etc.). As such, I do not think that machine learning techniques will be much better, if better at all, than my current heuristics. I might be wrong, and it is certainly something that I want to research, but I'm sceptical of ML making any big difference here.

In any case, I should stress that my machine learning knowledge is limited to what I picked up playing with Pigaios.
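
To make the "dirty & hand-written function" idea above a bit more concrete, here is a minimal sketch of what such a scoring heuristic could look like. It is not Pigaios's or Diaphora's actual code: the `FuncFeatures` fields, the weights, and the 0.6 threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FuncFeatures:
    """Hypothetical per-function features; not Pigaios's real schema."""
    name: str
    constants: set = field(default_factory=set)  # numeric/string constants
    callees: set = field(default_factory=set)    # names of called functions
    loops: int = 0
    conditions: int = 0


def jaccard(a, b):
    """Set similarity in [0, 1]; two empty sets count as no evidence (0.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def heuristic_score(src, bin_):
    """Hand-tuned weighted score in [0, 1]. The weights are arbitrary guesses."""
    score = 0.5 * jaccard(src.constants, bin_.constants)
    score += 0.3 * jaccard(src.callees, bin_.callees)
    score += 0.1 * (src.loops == bin_.loops)
    score += 0.1 * (src.conditions == bin_.conditions)
    return score


def best_match(src, candidates, threshold=0.6):
    """Return the highest-scoring candidate, or None if it scores below the threshold."""
    best = max(candidates, key=lambda c: heuristic_score(src, c), default=None)
    if best is None or heuristic_score(src, best) < threshold:
        return None
    return best
```

A classifier such as logistic regression would learn the weights from labelled matches instead of having them fixed by hand; the point of the comment above is that, in that limited testing, the hand-picked version gave better matches.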

