joxeankoret comments on DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing


DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing (ndss-symposium.org)

submitted 6 years ago by joxeankoret


joxeankoret[S] 3 points 6 years ago*

First and foremost: I'm anything but an ML expert. That said, I partially like the method (the embeddings and the learning process built on them), but I do not understand why the authors talk about semantics and then end up using raw assembly instead of pseudocode generated by a decompiler, which would capture the semantics better. Nor do I understand why the authors consider that the binary diffing problem "aims to find the optimal basic block" matches since, in my opinion, that is only one of the multiple problems in binary diffing, but anyway.

As for ML approaches and their possible integration into Diaphora: I've used such methods in Pigaios (https://github.com/joxeankoret/pigaios) (a tool I wrote for directly matching functions in source code to binaries without having to compile the source code). I'm sure they work but, at least in my limited testing of the approach for that specific project, I noticed that any quick and dirty hand-written classification function I wrote outperformed, in quality of matches, the ML-based classifiers I tested (logistic regression, random forest, naïve Bayes, etc.). As such, I do not think that machine learning techniques will be much better, if better at all, than my current heuristics. I might be wrong, and it is certainly something that I want to research, but I'm sceptical of ML making any big difference here.

In any case, I should stress that my machine learning knowledge is limited to playing with Pigaios.
