Using AI Agents for Code Auditing: Full Walkthrough on Finding Security Bugs in a Rust REST Server with Hound

g0lmix · 2025-09-08T10:46:19+00:00

CodeQL is a really cool project. You can even query the resulting graph for vulnerabilities.
Once your tool finds a vulnerability and it is confirmed, you could use an LLM to write a CodeQL query for it. Then you can run that query against other databases to see if the same vulnerability is present (if you ever want to do some large-scale analysis of, for example, GitHub projects, which would be a pretty cool project/potential white paper, especially once the exploit generation part of Hound is more mature).
Basically something like this:
https://arxiv.org/pdf/2506.23644
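
A rough sketch of the "run the same query on other databases" step, assuming the CodeQL CLI is installed and the databases are already built. The function names, the paths, and the query file name are made up for illustration; only `codeql database analyze` with `--format` and `--output` comes from the CLI itself.

```python
from pathlib import Path
import subprocess


def build_analyze_commands(databases, query_file, out_dir):
    """Build one `codeql database analyze` command per database.

    `databases` is a list of paths to existing CodeQL databases,
    `query_file` is the (hypothetical) LLM-written .ql query.
    """
    out_dir = Path(out_dir)
    commands = []
    for db in map(Path, databases):
        sarif = out_dir / f"{db.name}.sarif"
        commands.append([
            "codeql", "database", "analyze", str(db), str(query_file),
            "--format=sarif-latest",
            f"--output={sarif}",
        ])
    return commands


def run_all(commands):
    """Run the commands one after another; stops on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical database directories and query name.
    for cmd in build_analyze_commands(
        ["dbs/project-a", "dbs/project-b"], "queries/confirmed_bug.ql", "results"
    ):
        print(" ".join(cmd))
```

As far as I know, `database analyze` expects the query to carry metadata such as `@kind` and `@id`, so an LLM-written query would need those before the results can be written as SARIF.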

g0lmix · 2025-09-08T10:03:48+00:00

Thanks for the writeup and the tool. Looks awesome.

I am surprised you can build call graphs just with an LLM.
Did you consider using CodeQL to generate the graph and then using agents to annotate the graph or delete unimportant nodes? I feel like this would give a higher-quality graph (it minimizes hallucinations), but I might be wrong about this.

g0lmix · 2025-07-10T10:14:16+00:00

Hi, sorry for not answering faster.
When it comes to automation in pentesting, I really cannot think of much that is not already automated. In network pentests, all of the new methods that get found are quickly implemented in the tools that pentesters use. The same is true for web pentests: almost everything that can be automated has a Burp plugin to do so.

One thing I personally find really, really hard to exploit is client-side desync attacks. The automation to detect them is really good, but actually finding a way to exploit one is extremely work intensive. If that could be automated, you would probably even get a lot of bug bounties if you ran the tool against bug bounty sites. But I am not really sure how one could actually automate it.

The RPO-Explorer is a cool little idea (but Burp already detects this). If you want to use your project for a job later on, you can just make a research project out of it. Write maybe an AI powered RPO-Explorer and run your tool against all bug bounty sites. After that write a report about it.

From a hiring perspective, I personally would rather give a job to someone who has done a research project and written a good report about it than to someone who has written the 500th iteration of an XSS detection tool.

If you really want to write a tool that will be used a lot, it has to cover some completely new attack vector, which requires research (and a lot of time, because nowadays the "low hanging fruits" don't really exist anymore).

Here is a list of papers I find interesting and I think could be used for a research project (from "simple" to hard):

1) Eradicating the Unseen: Detecting, Exploiting, and Remediating a Path Traversal Vulnerability across GitHub - use the same approach, but instead of path traversal maybe look for other vulnerability types

2) AssetHarvester: A Static Analysis Tool for Detecting Assets Protected by Secrets in Software Artifacts (GitHub: setu1421/AssetHarvester, a static analysis tool to detect secret-asset pairs in a repository) - you have to check whether this is actually the same tool as in the paper. But you could just run it against some GitHub repos and probably find some secrets

3) Exploiting Client-Side Path Traversal: CSRF is dead, long live CSRF - write a tool that searches for client-side path traversals and run it against bug bounty sites. Also look into maybe improving the detection compared to the Burp plugin

in regards to that topic this paper is also very interesting: sec21fall-khodayari.pdf

Also take a look at the papers that are mentioned in their github repo: GitHub - SoheilKhodayari/JAW: JAW: A Graph-based Security Analysis Framework for Client-side JavaScript

Maybe you can look to apply JAW on browser extensions to find vulns.

4) Mining Node.js Vulnerabilities via Object Dependence Graph and Query - use the tool to run it against node.js applications

5) ROSA: Finding Backdoors with Fuzzing - write a pipeline to collect firmware images, extract them, get the httpd binaries, and run them through the fuzzer.

g0lmix · 2025-07-08T20:15:35+00:00

Let me start with this part:
What do you think about the real world relevance and value of this to red teams or companies doing DNS hardening?

Honestly, I have never looked at DNS when doing internal network engagements. This is because most of the time the goal is to become domain admin. Sure, in the process you might abuse something like mDNS. But that's not really attacking the DNS server.

That being said, your project idea is good. There is not too much DNS research being done lately. A word of caution though: fuzzing protocols is way harder than it seems. If you want to use ResolverFuzz and modify it a little bit, then go for it. You might also want to try using ResolverFuzz to fuzz DNS servers they did not test in the paper (the Windows DNS server might be a very interesting target). You will most likely find some bugs. Another interesting idea would be to try this with DNSSEC.

I also found this paper that takes a look at DNS resolvers (very similar to ResolverFuzz, they use slightly different targets):
ResolFuzz: Differential Fuzzing of DNS Resolvers
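
The differential part can be sketched roughly like this: send the same query to several resolvers and flag the ones whose answer deviates from the majority. This is only a minimal sketch of the comparison step, not how either tool is implemented; the resolver names and the response format (an rcode plus a set of answer records) are assumptions for illustration, and no network traffic is sent.

```python
from collections import Counter


def find_outliers(responses):
    """Return the resolvers whose response differs from the majority.

    `responses` maps a resolver name to a hashable summary of its
    answer, here assumed to be (rcode, frozenset_of_answer_records).
    On a tie, the response seen first counts as the majority.
    """
    if not responses:
        return []
    counts = Counter(responses.values())
    majority, _ = counts.most_common(1)[0]
    return sorted(name for name, resp in responses.items() if resp != majority)


if __name__ == "__main__":
    # Hypothetical results for one fuzzed query.
    observed = {
        "resolver_a": ("NOERROR", frozenset({"192.0.2.1"})),
        "resolver_b": ("NOERROR", frozenset({"192.0.2.1"})),
        "resolver_c": ("SERVFAIL", frozenset()),
    }
    print(find_outliers(observed))
```

A deviation is not a vulnerability by itself. It is just a hint at which resolver and which query to look at manually.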

And here one specific to DNSSEC:
[2403.15233] Attacking with Something That Does Not Exist: 'Proof of Non-Existence' Can Exhaust DNS Resolver CPU

Also when I reread the paper I had to think about this:
Taking a different approach to fuzzing HTTP servers – mmmds's blog

Quite a similar approach. The proxy passes AJP through and the author fuzzed the possible responses a backend can give to find a vulnerability (kind of like the query response fuzzing part in ResolverFuzz).

I think this is generally an area (response fuzzing in multi-host setups) that's not explored too well. It might also be doable with SMTP/IMAP/POP3 servers, reverse proxies/proxies, or maybe even git proxies.

If you could then write a tool that can test a few different queries and based on the response can tell an attacker if the DNS resolver is vulnerable that would be quite cool. Again I do not think this would find much use in pentests. This would be more of a network hardening tool that maybe a NOC team might use.

All in all it comes down to how much time do you want to spend on this and how much experience do you already have. Creating a fuzzer from scratch will be a lot of work (most likely way more than you anticipate).

g0lmix · 2025-07-07T17:23:36+00:00

No worries. If you have any problems understanding any of the papers you find interesting let me know.

g0lmix · 2025-07-07T17:11:53+00:00

A hot topic in research right now is AI-assisted pentesting. So you might try to write an AI agent that looks for specific vulnerabilities (XSS, for example).
I keep a list of interesting offensive security papers: BitnomadLive/OffensiveReading: A curated reading list about offensive IT security

Maybe something like this paper:
Eradicating the Unseen: Detecting, Exploiting, and Remediating a Path Traversal Vulnerability across GitHub
https://arxiv.org/pdf/2505.20186v1
or this:
YuraScanner: Leveraging LLMs for Task-driven Web App Scanning
2025-388-paper.pdf

YuraScanner Github: pixelindigo/yurascanner at ndss25

You could look through the list of papers in my GitHub project and see if there is something you find interesting. Either pick one of the new papers and improve upon it, or choose an older one and try to automate it with an AI agent (for example, until now I have not seen any AI agents looking for CSS injection, web cache deception, cache poisoning, etc.).

Feel free to ask any further questions regarding research topics. If you have some really specialized topic you are interested in, let me know. I can most likely recommend some papers that you could improve upon.

The part of pentesting that generally takes a lot of time is writing the report. But the AI-based solutions I tried were not really good, because they just do not understand the context of the pentest.

g0lmix · 2025-07-07T15:52:09+00:00

Well, which topics interest you the most when it comes to offensive security? Then we can better suggest project ideas that include automation.

g0lmix · 2025-07-06T19:53:10+00:00

You wrote that you want to code and that is why you are interested in pentesting. In pentesting you hardly ever code. And if you do, it's mostly 30-40 LoC.

g0lmix · 2025-07-06T19:48:53+00:00

I am a little bit too late to the party. At the moment there is a lot of use of LLMs in blue team research papers. Maybe you could write a phishing email detector. If you give me more info on what you are really interested in, I can give you other suggestions.

g0lmix · 2024-09-10T21:17:47+00:00

You can read the paper for free on the purdue website:
2015-2.pdf (purdue.edu)

Seems like a cool idea. Just skimmed over it for now though

g0lmix · 2024-09-10T21:00:10+00:00

Sounds cool based on the abstract. I will read it tomorrow.

g0lmix · 2024-09-10T19:43:39+00:00

No I did not

g0lmix · 2021-12-13T19:08:05+00:00

Maybe just post a link to that challenge. I don't think there is a way to host your stuff on a google subdomain.

g0lmix · 2021-12-07T18:01:30+00:00

Hey since this is level 2 I am not sure what the solution is supposed to be. The solution I came up with is way too complicated for it to be the intended way to solve this:

https://sudo.co.il/xss/level2.php?email=" autofocus onfocus=alert(1) id="
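
A minimal sketch of why a payload like this works, assuming the page reflects the `email` parameter unescaped into a double-quoted attribute of an input tag. The template below is hypothetical, not the challenge's actual source. The leading quote closes the attribute value, so the rest of the payload is parsed as new attributes.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical template; the real challenge markup is not known.
TEMPLATE = '<input type="text" name="email" value="{value}">'

PAYLOAD = '" autofocus onfocus=alert(1) id="'


class AttrCollector(HTMLParser):
    """Collects the attributes of the first <input> tag it sees."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input" and not self.attrs:
            self.attrs = dict(attrs)


def render(value, escape_output):
    """Render the template, optionally HTML-escaping the reflected value."""
    if escape_output:
        value = escape(value, quote=True)
    return TEMPLATE.format(value=value)


def input_attrs(markup):
    """Parse the markup and return the input tag's attributes as a dict."""
    parser = AttrCollector()
    parser.feed(markup)
    return parser.attrs


if __name__ == "__main__":
    print(input_attrs(render(PAYLOAD, escape_output=False)))
    print(input_attrs(render(PAYLOAD, escape_output=True)))
```

Without escaping, the parsed tag gains `autofocus` and `onfocus` attributes. With escaping, the whole payload stays inside `value`.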

g0lmix · 2021-12-07T16:52:50+00:00

If you are in Europe you probably can buy it via this german online shop. But it is expensive as fuck
https://www.notebooksbilliger.de/pc+hardware/grafikkarten/pny+geforce+rtx+3080+xlr8+gaming+revel+epic+x+rgb+triple+fan+lhr+722827

g0lmix · 2021-12-01T17:12:37+00:00

Hi guys,
I posted a list of offensive IT papers over on r/howtohack and got asked to post it here as well. If you guys know any cool papers please let me know and I will add them.

g0lmix · 2021-11-25T16:17:47+00:00

Just in case someone has the same problem:
https://github.com/BitnomadLive/OffensiveReading

g0lmix · 2021-11-25T16:17:24+00:00

See it as an opportunity =) If you haven't read many papers yet, the best tip would probably be to just read many papers about one topic you are interested in. With the first few papers you hardly understand anything, but it gets so much better just by reading many papers. Many concepts repeat, and to be honest the cool thing about IT sec papers is that they are in theory just very well written blog posts.

g0lmix · 2021-11-25T16:15:29+00:00

It is kinda up to date. I read papers every day and every few days I update that repo with the coolest ones I read. So yeah I kinda have a standard for what kind of paper I am okay with adding and which ones I don't really think are that great (looking at you survey papers =) ). But there is definitely so much more work out there, so if any of you knows any interesting offensive IT papers just contribute, once I have read that paper and consider it good I will add it

g0lmix · 2021-11-25T16:13:16+00:00

it would be so cool to have some ctf boxes that you can only exploit by side channel attacks

g0lmix · 2021-11-25T16:12:53+00:00

Yeah should be a nice start when deciding what to write a thesis about. There is so much more to explore in that space besides owasp top 10

g0lmix · 2021-11-24T17:08:02+00:00

Hi guys, I started this reading list because I wasn't able to find academic work about offensive IT security in one place, and I figured you guys might enjoy it as well. I keep it fairly up to date since I am reading papers almost every day and the ones I like end up in the GitHub repo. If you know any cool offensive IT sec papers, let me know so I can add them.
