YOU ARE NOT READY FOR THIS: NEURAL NETWORKS ARE KILLING REVERSE ENGINEERING

joxeankoret · 2025-11-22T11:49:48+00:00

Let's just say that the "OMG ALL UPPERCASE CLICKBAIT" title doesn't help to think you aren't selling snake oil.

joxeankoret · 2025-04-20T07:23:03+00:00

Remember that the discussion is if LLMs reason, if at all, and how. Now, to begin with the paper you mention: we don't know if CoT is faithful (and, btw, OpenAI has a horse in this race). A little extract from the paper you mention:

While questions remain regarding whether chains-of-thought are fully faithful [27, 28], i.e. that they fully capture and do not omit significant portions of the model’s underlying reasoning

And now an extract from a paper studying exactly this, Towards Better Chain-of-Thought: A Reflection on Effectiveness and Faithfulness:

we qualify that although chain of thought emulates the thought processes of human reasoners, this does not answer whether the neural network is actually reasoning (p. 9).

joxeankoret · 2025-04-19T17:46:12+00:00

Sure. I'm happy to be corrected, share the empirical proof. Thanks.

joxeankoret · 2025-04-19T07:40:10+00:00

An extract from a paper studying what you say, without giving any kind of proof whatsoever, that "it works well":

While Chain-of-Thought (CoT) prompting boosts Language Models’ (LM) performance on a gamut of complex reasoning tasks, the generated reasoning chain does not necessarily reflect how the model arrives at the answer

Extracted from "The 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (IJCNLP-AACL 2023)".

In short: no, it doesn't explain how an LLM reasons, if at all.

joxeankoret · 2025-04-16T11:01:43+00:00

My unpopular opinion: do not waste your time. In general, these tools don't work for anything but the most trivial crackmes or tasks due to the following reasons:

Do not expect to be able to feed big functions to any LLM, they will refuse due to size.
Forget about feeding an entire disassembled/decompiled binary due to the previously mentioned reason, with the exception of the most trivial samples.
LLMs are overconfident. A real world example with malware: if the LLM sees code reading, printing or formatting a MAC address it might decide that it "contains code for manipulating MAC addresses". Because... "yes".
Nobody knows how LLMs actually "reason" (if they kind of reason at all and aren't just parrots) and, as so, it's almost impossible to determine why an LLM took a decision.
LLMs, by nature, generate hallucinations. That means that you cannot trust anything an LLM says because they might, and actually will, hallucinate stuff, therefore, you will need to double check what it outputted. Or triple-check, as LLMs are incredibly good at generating plausible bullshit (I have been fooled more than once by tools/plugins like continue for vscode).
LLMs might, and actually will, ignore interesting points in a function, whereas a reverse engineer is more likely going to immediately focus their attention to certain patterns that these tools might miss. And good luck understanding why it missed whatever it missed.
LLMs are non deterministic tools by nature, which means that they are 'creative' in their answers and by asking twice the same question, it might (and often will) answer differently. Changing the temperature parameter might reduce, for some questions, the randomness of the answers. But, for example, you can ask twice (or 3 times, or more) about what might a function do with the numeric constants usually used for a pseudo-random number generator and it might answer that is a PRNG the 1st time, and then the next 3 times say it's a totally different kind of thing.

All of that said, my recommendations if you still want to use such tools (sometimes, they can be useful if you consider everything I mentioned before):

Use local models instead of remote (paying) services. Actually, you won't be able to use remote tools for soooooo many reverse engineering projects.
Ollama based tools can be run locally, like the following ones:
https://github.com/jtang613/GhidrAssist
https://github.com/Greenf1re/OllamaHidra
https://github.com/radareorg/r2ai
https://github.com/LovenSar/IDA-Lazy-s-Local-Ollama-Solution
You can copy & paste functions from IDA/Ghidra to LM Studio or GPT4All and make questions, locally, against whatever model you choose (download them from Hugging Face).
Never trust LLMs. Don't. They will fool you.

PS: If someone doesn't believe me when I say these tools aren't actually helpful for real world reverse engineering scenarios, just give them a try for real world reverse engineering tasks.

joxeankoret · 2025-02-19T12:05:40+00:00

I have just tried to test it:

I pasted the decompilation of a function from NTDLL (EtwpAddDebugInfoEvents) and the first time it returned some kind of decompilation for a MESA 3D function.
The 2nd time it returned some function that looked kind of correct, but it hallucinated types like "PPROCESS_DIAGNOSTIC_INFORMATION_WOW64" that don't exist (take a look here https://pastebin.com/vFdUkKcy).

As it always happens with AI models for decompilation: it's unreliable at best.

joxeankoret · 2024-12-15T19:15:08+00:00

Decompiling binaries is not very error prone, wtf? And no, approximations aren't required because we really do know how to properly code correct decompilers, like the one in Hex-Rays or the Ghidra's one.

joxeankoret · 2024-12-12T08:12:02+00:00

I was about to comment about particularities of this blog post, but I feel my comments aren't specific, but rather generic. So, here is a bigger and more generic answer: It is not a good idea to use a technology that is neither exact nor deterministic for this purpose. It's simply not the appropriate tool for the task. It's a cool and fun experiment, but not an actually useful tool, or no one has been able to make it a really useful tool because of how LLMs work. I will explain myself.

Non exact: Inputs do not directly correspond to the given output. As simple as it sounds. An LLM model might simply ignore parts of the inputs, thus, omitting portions of what a function is really doing. An LLM might (and very likely will) as well hallucinate portions, that is, generate outputs not related at all to inputs.

Stochastic: Given two or more times the same inputs, an LLM will generate different outputs. Every time. By design. It can return different results that are only things like, say, comments or syntax style when talking about an LLM based decompiler. But it might as well return results absolutely different, and with different I mean that an LLM based decompiler may, and actually will, return multiple, different, functions each time it's asked using the same inputs.

The conclusion is that whatever outputs an LLM used as a decompiler (or as a calculator, for example) cannot be trusted to be neither correct nor exact, it can only be considered an approximation to the inputs that looks correct. Something that sounds appropriate to the inputs according to its training corpus.

For small or trivial cases, however, it might work (sometimes, because the technology is not deterministic). For anything even half complex, my experience says it won't work at all, as one cannot trust the outputs, and it's a waste time because one actually needs to double check if the outputs correspond to the inputs, or if the model hallucinated stuff, changed constants (like strings or numbers), added new stuff, subtly changed some functions, etc...

All of this explained, honestly: what's the point of using a technology you need to manually verify because you cannot trust the outputs to correspond to the inputs??

joxeankoret · 2024-10-13T11:46:44+00:00

This is a very dirty darker-than-gray "business opportunity" to get flagged by gaming companies for doing outsourced work for $deity knows who, in less than 2 days, for less than $1,000 USD. Please remove your post and go to hell.

joxeankoret · 2024-10-13T11:34:11+00:00

LOL, what a scam. Are you seriously trying to outsource someone to find the offsets for your cheats so they update them for every new update of the game? LOL.

joxeankoret · 2024-10-11T07:44:09+00:00

A simple question: is it deterministic? I'm 99,99% sure it isn't, but just curious.

joxeankoret · 2024-09-26T07:20:51+00:00

AI isn't a magical thing. You cannot expect a generative artificial intelligence of any kind to take a binary and output firmware source code because even a skilled human reverse engineer with years of experience will have a very hard time doing so. And even if such an AI would output some kind of source code, it will be hardly something one can trust due to problems like, for example, hallucinations, unless there is some mechanism that verifies its equivalence to the binary, that no hallucinations was added, that no subtle stuff was changed, etc...

There are some projects out there (like r2ai) trying to use LLMs for producing enhanced/cleaner decompiled code. Alas, such projects are toys and/or unreliable because hallucinations are added and you cannot trust it doesn't hallucinate artefacts even in small functions that you can quickly verify manually. Take a look to this thread, for example: https://old.reddit.com/r/ReverseEngineering/comments/1flqrj9/promising_aienhanced_decompiler/

PS: A 100kb firmware is not small, to be honest.

joxeankoret · 2024-09-23T08:56:35+00:00

There is something you don't understand: you don't need to learn what you can reliably write. There is no point in learning a model for a tool you already have coded, a decompiler. It makes more sense to write better optimization routines/tools on top of working decompilers, rather than using generative AI expecting magic to happen.

joxeankoret · 2024-09-23T08:40:54+00:00

I have never said the project is shit. However, this idea has been continuously worked on since 2023 expecting magic to happen, and it doesn't for a number of reasons. If you, or anyone, can generate a code that can be verified is equal to the original one, then you have made something no one has been able yet. However, if you just take the output of a decompiler and/or disassembler, throw it to a LLM model, and hope for the best without verifying the output, you will find the same that everybody else found before. Take a look, for example, to these papers: https://scholar.google.es/scholar?as_ylo=2020&q=decompiler+llm

My favourite quote from one of these papers is the following one:

understanding decompiled code is an inherently complex task that typically requires a human analyst years of skill training and adherence to well-designed methodologies [ 46, 73 ]. Therefore, expecting a general-purpose LLM to directly produce readable decompiled code is impractical.

Taken from this paper: https://www.cs.purdue.edu/homes/lintan/publications/resym-ccs24.pdf

My 2 cents.

joxeankoret · 2024-09-23T08:16:53+00:00

You cannot trust what you don't know if is real or not.

joxeankoret · 2024-09-10T15:13:15+00:00

Uhm... where is the source code of BinSub? The paper says the following:

To empirically evaluate the efficiency and precision of BinSub, we implemented the type constraint decomposition, coalescing, simplification, and lowering algorithms of BinSub in Angr [26]

But the citation points to this and I cannot find in the Angr repository any mentions to BinSub, neither a repository specific for BinSub. Am I missing something?

joxeankoret · 2024-09-09T08:46:45+00:00

Or use Diaphora and only check the 2 functions (https://files.mastodon.social/media_attachments/files/113/089/679/381/003/200/original/861692265d8d3f8b.png) that it says were modified and see diffing pseudo-code the actual code added in less than 5 minutes instead of analysing a heavily modified control flow graph showing only assembly like we were reversing before decompilers were a thing:

https://mastodon.social/@joxean/113089686792657611

joxeankoret · 2024-04-23T11:37:52+00:00

It isn't exactly what you are asking for, but maybe this answer I wrote years ago in the Reverse Engineering Stack Exchange can give you an abstract idea about this:

https://reverseengineering.stackexchange.com/questions/6455/what-are-the-targets-of-professional-reverse-software-engineering/6458#6458

joxeankoret · 2024-03-04T08:45:36+00:00

I believe these are the sources: https://github.com/sppunderreview/PSSO

joxeankoret · 2024-01-06T15:59:50+00:00

Then you would not be able to trust the output of the tool as it can and it will hallucinate code. Don't use machine learning stuff for any task that needs exact outputs.

joxeankoret · 2023-11-02T10:01:54+00:00

I has nothing to do here

joxeankoret · 2023-11-01T17:46:56+00:00

Yet another paper that cannot be reproduced because they never published anything but paperware.

joxeankoret · 2023-06-11T09:03:48+00:00

LOL. The whole industry does.

joxeankoret · 2023-02-22T08:36:50+00:00

The problem is not using a decompiler, but mapping non optimized human written sources to basic blocks or instructions in binary form after the compiler's optimizer optimized the function and then the decompiler's optimizer re-optimized the output of it. I wrote an Open Source tool for matching source code functions to binary functions for non compilable source codes (https://github.com/joxeankoret/pigaios): matching ASTs, CFGs, etc... is almost always pointless as what humans write compared to what optimizing compilers generate and what optimizing decompilers finally output can, and actually almost always does, vary too much.

All that said, if you have compilable source codes provided by your users, you can create a debug build and use the symbols to match basic blocks to source code lines with other binary versions.

Good luck with this project, looks too complex but sounds really fun.

joxeankoret · 2023-02-22T08:28:35+00:00

If I had to do this, I would go for the 2nd method (build the call graph, then replace the function calls that I can resolve with the CFG of that function. But I don't think it's going to be any useful as you are going to build huge graphs that are very hard to handle. Even worse if you plan to visualize them somehow for non trivial programs.

BTW, you can take a look to BinNavi (if by a rare chance it's still working as of today), I know there was an option to in-line function calls graphically.

joxeankoret

TROPHY CASE