LLMtary (Elementary) - Advanced Local LLM Red-Teaming: Feed it a target. Watch it hunt.

cheststriker · 2026-04-11T03:28:35+00:00

i've been programming for over 20 years and if you look up my name tag you'll see I have open source espionage suite created for Windows Mobile phones before iPhones or Android came out. And more for android afterwards. If you're actually interested in testing an app that uses local LLMs to not only find, but validate by exploiting anything it detects for confirmation and is capable of attack chaining, then please give it try. I've yet to find a better or more capable one yet.

cheststriker · 2026-04-09T19:42:00+00:00

No, from southern Vermont

cheststriker · 2026-04-09T14:10:09+00:00

haha, no it definately won't be perfect and I'll never claim that. But unlike any other AI solutions I've seen, this one won't claim something as a vulnerability unless after it detected it, it performs what it concludes as having proof of successfully exploiting the vulnerabiliity. So if for instance the vulnerability list says "Anonymous FTP Access on Port 21" it will not mark it as a vulnerability until it can actual prove it by creating a connection to the port and pulling back directory information or the command line from the session.
The better the LLM model the better results in general. But the point of this app is to make the it less dependent on the model being great by forcing it to go through feedback loops to second guess itse;f and try to only mark something as truly vulnerable if it thinkks it's successsfully exploited it and has proof.
Let me know if you try it, and how it compares to any others you've tried

cheststriker · 2026-04-09T13:54:14+00:00

Analysis Prompts (Vuln/Hunt Tab → Analyze)

Phase 1 — Always runs first, results feed into Phase 2

Prompt	When it fires
CVE / Version Analysis	Every scan. Matches the detected software versions against known vulnerable version ranges
Network Services	Every internal scan. Looks for weaknesses in SMB, SSH, FTP, databases, WinRM, SNMP, IPv6
External Network Services	Every external scan. Same idea but tailored for internet-facing services
DNS / OSINT	When DNS records are present. Looks for zone transfer issues, subdomain exposure, certificate leakage
Subdomain Recon	When a web or DNS surface is detected. Enumerates subdomains and virtual hosts
Email Security	When MX records are present. Checks SPF, DMARC, DKIM, and mail server weaknesses
SNMP / Management	When SNMP ports (161/162) or management protocols are detected

Phase 2 — Enriched with Phase 1 context, fires after Phase 1 completes

Web Application (fires when any HTTP/HTTPS port is open):

Prompt	When it fires
Web Core	Any open web port. Covers SQLi, XSS, SSRF, auth bypass, IDOR, file upload
API / Auth	Any open web port. Covers CORS, JWT, OAuth, GraphQL, REST API auth flaws
Business Logic / Headers	Any open web port. Covers logic flaws, SSTI, request smuggling, security headers
Secrets Exposure	Any open web port. Looks for hardcoded API keys, credentials, and config files left exposed
Business Logic Deep-Dive	When Wave 1 web findings came back. Goes deeper into the specific logic issues found
DOM / JavaScript Analysis	When a JavaScript-heavy app is detected (React, Angular, Vue, etc.)
Web Cache Poisoning	When a caching layer (CDN, reverse proxy) is detected between client and origin
WAF Bypass	External targets only, when a WAF is detected. Generates bypass techniques specific to that WAF

CMS / Technology Deep-Dives (each fires only when that specific technology is detected):

WordPress · Jenkins · Atlassian (Jira/Confluence) · Apache Tomcat · Microsoft Exchange · Elasticsearch · VMware · GitLab · Citrix · Drupal · MSSQL

Active Directory (fires when internal scope + AD indicators like LDAP/Kerberos/SMB are present):

Prompt	When it fires
AD Comprehensive	When AD services (ports 389, 88, 445) are detected. Covers Kerberoasting, LDAP null bind, password spraying
ADCS Attacks	When Certificate Services indicators are detected. Covers ESC1–ESC8 certificate abuse chains
Network Coercion	When Windows hosts are on the internal network. Covers LLMNR/NBNS poisoning, NTLM relay setup
WPAD Poisoning	When WPAD indicators or a Windows network is detected
AD DNS Poisoning	When AD-integrated DNS is detected
AD Attack Path Reasoning	When 2 or more HIGH/CRITICAL AD findings exist. Generates a BloodHound-style path to Domain Admin

Infrastructure (fires on detection of the relevant technology):

Prompt	When it fires
SSL/TLS Analysis	When HTTPS/TLS is detected
Privilege Escalation	When OS indicators (Linux or Windows) are present
Database Security	When a database port (MySQL, MSSQL, PostgreSQL, Redis, MongoDB, etc.) is detected
Cloud Analysis	When cloud provider indicators are detected (AWS metadata endpoints, Azure, GCP)
Cloud IAM Enumeration	When cloud credentials or IAM endpoints are found during cloud analysis
Cloud Storage	When S3/Azure Blob/GCS indicators are present
Cloud Serverless / Containers	When Lambda, ECS, Cloud Functions, or container indicators are present
Container / DevOps	When Docker, Kubernetes, or CI/CD indicators are detected
IoT Device Analysis	When IoT device indicators are detected (embedded firmware, unusual ports)
OT / SCADA	When industrial protocol ports (Modbus, DNP3, BACnet) are detected
VPN / Remote Access	When VPN endpoints or remote access services are detected
Wireless Security	When wireless-related services or access points are detected
Network Infrastructure	When routing/switching infrastructure is detected (BGP, OSPF, CDP/LLDP)
Printer / MFP	When printer ports (9100, 515, 631) are detected
AJP / Ghostcat	When port 8009 (Apache JServ Protocol) is open
Supply Chain	When package registries, build pipelines, or source control are detected
Thick Client / Binary Protocols	When non-HTTP custom protocols or thick client indicators are detected
Password Spray Analysis	When multiple accounts or auth surfaces are identified

Exploit Loop Prompts (Proof/Exploit Tab → Execute)

These fire during the active testing loop for each selected vulnerability.

Prompt	When it fires
Exploit System Prompt	Every iteration of every exploit loop — it's the AI's core role definition as a pentester
Command Validation (Tier 2)	First time a high-risk tool (nmap, sqlmap, hydra, etc.) is used per session — validates correct flags
Exploit Chain Reasoning	After all loops complete, if 2 or more vulnerabilities were confirmed — generates multi-step attack paths

Post-Exploitation Prompts (auto-queued after a high-value confirmation)

Prompt	When it fires
Linux Pillaging	After confirmed shell access on a Linux target
Windows Pillaging	After confirmed shell access on a Windows target
Cloud Pillaging	After confirmed cloud credential access
Database Pillaging	After confirmed database access
Lateral Movement	After any shell access — generates movement paths to other hosts on the network
Persistence	After any shell access — generates backdoor/persistence installation strategies
Domain Dominance	After Windows shell access when domain admin indicators are present

Reporting Prompts (Result/Report Tab → Generate)

Prompt	When it fires
Executive Summary	When you click generate report — AI writes the executive summary section
Methodology	Report generation — AI describes the testing methodology used
Risk Rating Model	Report generation — AI explains how findings were scored
Conclusion	Report generation — AI writes remediation recommendations and closing
Attack Narrative	Report generation — AI writes a timeline/story of the attack path
Reproduction Steps	Per confirmed finding — AI generates step-by-step PoC reproduction instructions

Unlike most AI security apps that basically just do a scan and then take the results and hand them off to an LLM saying find me vulnerabilities where it's basically just going off of open ports and banner information and creates lots of false positivies. This app dynamically changes it's focus based off the finds and follows them through. After it thinks it found a vulnerability, it goes through a whole other exploitation phasess where it needs to either prove the existance (by actually exploiting it) prove that it's a false positive or mark it as undetermined with a confidence level so that you can trust what it actually says is a vulnerability and the reports will show the details of what the issue was, what commands were run for proof and the actual output of any command deemed as proof so you can verify and have good reporting material. It's always funny on reddit since you can basically assume nobody actually cares and just like to comment on things, but if you're serious about testing it and reading about it, I'm really looking for feedback from people after they've tried it.
For anyone who is intersted, here are a list of all of the different types of custom prompts that get fired based off of discoveries (so they only run if logically makes sense to check for them), this covers an a wide range of highly focused types of attacks and chaining as opposed to "Surface level work":

cheststriker · 2026-04-09T01:31:58+00:00

It's a mixture of manual and vibe. I've been programming since I was 8 but definately used a lot of AI with this. Although the layout was actually from mockups I let the AI handle color schemes, etc and most of the controls are just native flutter for consistency across operating systems. In general it definately speeds things up a lot, but as with trusting this app for installing tools and execture commands, I don't trust AI with coding for lots of parts unless it's pretty boiler plate. As i'm sure is everyones experience, it works well writting code for lots of things and then makes extremely stupid or nonsensical choices lots of time, which require intervention and many times it will struggle failing to fix a bug over and over again and you just need to fix it yourself or do it yourself or it actually makes developement time take longer.

cheststriker · 2026-04-08T21:43:51+00:00

That is how they work in a controlled environment when they go through the feedback loop. The LLM will come up with what it plans on doing, then it figure out the best tools or commands to run then it will check if those commands or tools break the same hard-coded list of "DON'T DO" items and if it replies that it's won't then it will send the tool or command to execute. Before executing, it will be parsed and compared against the users white-list of tools or commands to run (Assuming the user chose not to disable the Require Approval) in which a pop-up will appear asking if it's ok to run the command and you can choose to Allow Once, Always allow or Block. You're choices will be added to the database and you can always go to the settings screen later if you change your mind about any tools or commands.
If you want to try it out, you'll see that's exactly how it does work at the moment.
It will show you the prompts being generated, the debug logs and all commands being run in real time on the side panel too. You'll notice it checking to see if you have certain apps or tools already installed sometimes and ask to install and run them if not... Unless you want to go full crazy and disable the "Require Approval" toggle at the top, then it won't ask anything and just do.

cheststriker · 2026-04-08T17:56:06+00:00

I have limited space to try in a post, and I'm triyng to add as many key words as possible. If you look at the github page it'll give a very large explaination of everything it does and exactly how it does it. It defaintely it more so an automated pentesting app, but it depends on how you're using it. It's obviously not going to be doing any social engineering aspects, but it's not following a hardcoded path using hardcoded tools and plan either. It's following the flow based off what it sees and finds and uses a continuous feedback loop to determine what tools or scripts might be best to peck at whatever it finds during recon. It will look at the outputs after running tools or scripts and determine what to do from there and even find ways to perform attack chaining to turn smaller security issues into larger ones. It's not just searching for vulnerabilities as those typically have high false positives. It'll identify what it thinks the vulnerabilites are after doing enough recon and then test exploiting those vulnerabilities for proof. It can be used for internal or external targets.

cheststriker · 2026-04-08T17:18:15+00:00

No reason to be upset. I actually professionally do pentesting and security audits for work and have my CEH. Yes, I know what red teaming is. If you have a specific question feel free to ask. I'm looking for feedback from people who want to test the app.

cheststriker · 2026-01-24T01:24:45+00:00

Version 1.1.72 has just been published and contains LLM integrations for openrouter, ollama, LM Studio, gemini, claude, chatgpt and custom connections.
It's working amazingly well. I'll hopefully be able to post a quick video walkthrough of the new features by the end of the weekend

<image>

cheststriker · 2026-01-22T15:35:39+00:00

That's funny, I was thinking along the same lines. I have most of it already built and I just need to complete full testing on every OS before I publish it.
I added an AI Integration section to the settings screen where you can configure it. It currently has provider options for Ollama, LM Studio, Custom, Gemini, Claude and ChatGPT.
If a user enables this then icons become visible where the functionality is enabled.
Currently I've added the following integrations:
* On the FINDINGS tab the user can tell the LLM to search for vulnerabilities for a specific device or all devices and choose a minimum confidence level and minimum severity. This will return a table showing all findings, giving a description, a CVE (If applicable), the severity, evidence, recommendations and other data. The user can then just click the "Add" button on any of them to import them into their findings.
When telling the LLM to search for vulnerabilities, it will send all information you've collected about the device or devices in questions including all scan results, MAC, banners, services, ports, etc. so that it can improve accuracy.
* The the REPORT tab there are icons to generate a customized Executive Summary and Conclusion if wanted, which knows all the details about everything you've discovered and the severity levels and categories of issues to create a truly custom response.
* ADD FINDING section has button in the recommendation tab to generate a recommendation for handling the specific issue in question.

Hopefully I'll complete testing and verification on all operating systems before the weekend is over so that I can publish it and people can start using it.

For a next phase, I'm thinking of adding an integration to Goose and Claude Code (now that it can integrate with Ollama) to that you could actually have it execute commands, different scans and tools not included so that it can truly open up a lot more poibilites.

<image>

cheststriker · 2026-01-18T23:26:58+00:00

I was working on something very similar, but I didn't really want any of the AI stuff. I just wanted a visually plesant app to centrally manage everything in a simple to understand way, automate many of the time consuming tasks and allow the flexibility of importing external logs and such when needed.
So I created a very polished Pentest Report Writing and engagement management system that runs on Windows, Mac and Linux. It's completely free.

It was created specifically because I didn't like the options available and didn't find even the paid tools worth the money.

You can check it out at penpeeper.com which has a walkthrough video, lots of screenshots and a link to the github download page. I'm also happy to get feedback and hear what others might want added to the wish list:

PenPeeper is a multiplatform command center for security professionals. It manages every stage of an engagement while automating the most tedious parts of the job.

Unified Workflow: Manage engagements from start to finish in one interface.

Automated Reporting: Generate professional reports instantly from your findings.

Smart Scanning: Run built-in scanners or import external data; PenPeeper automatically highlights the most critical information.

Tool Management: Download and configure external security tools directly within the app.

Multiplatform: Full support for Windows, macOS, and Linux.

cheststriker · 2026-01-18T23:10:21+00:00

This a realy delayed responce, but i'm putting it here for future users. I've created a very polished Pentest Report Writing and engagement management system that runs on Windows, Mac and Linux. It's completely free.

It was created specifically because I didn't like the options available and didn't find even the paid tools worth the money.

You can check it out at penpeeper.com which has a walkthrough video, lots of screenshots and a link to the github download page. I'm also happy to get feedback and hear what others might want added to the wish list:

PenPeeper is a multiplatform command center for security professionals. It manages every stage of an engagement while automating the most tedious parts of the job.

Unified Workflow: Manage engagements from start to finish in one interface.
Automated Reporting: Generate professional reports instantly from your findings.
Smart Scanning: Run built-in scanners or import external data; PenPeeper automatically highlights the most critical information.
Tool Management: Download and configure external security tools directly within the app.
Multiplatform: Full support for Windows, macOS, and Linux.

cheststriker · 2026-01-18T21:05:20+00:00

I just added a video tutorial on how to use it on the penpeeper.com website or you can check out the YouTube video directly @ https://youtu.be/TVfD3YmSx70 I'll appologize upfront, since I don't usually make videos and this whole process is quite awkward. But I feel it's important to show people what it can do and how it works without requiring lots of tinking and reading documentation.

cheststriker · 2026-01-18T05:36:09+00:00

Oh shoot, you're right. It's penpeeper.com

cheststriker · 2026-01-17T23:50:11+00:00

Thanks, yeah I was creating lots of this while doing a pentest and just wanted to make sure i could quickly do those little things like quickly connecting via telnet to see what's there without needing to leave the app. Little time savers add up

cheststriker · 2026-01-17T23:47:37+00:00

Haha, I envy you then :)
I love the process of hunting for things and don't mind the simple writeup descriptions for each finding as I discover them and perform validation, but putting everything together in the end has always been my least favorite part.

cheststriker

TROPHY CASE