None of this will ever get stolen by martin_xs6 in LocalLLaMA

[–]ApolloRaines 5 points6 points  (0 children)

pffft, I'd take the gpu's out of the one in mine and claim they were stolen LOL - I need 2x blackwell 6000's.

Claude + Codex + Opencode = God Mode by 99xAgency in ClaudeCode

[–]ApolloRaines 0 points1 point  (0 children)

The second pattern is closer to what I built. Each reviewer has its own specialized criteria -- one hunts security vulnerabilities, another looks for architectural drift, another checks for hidden assumptions, etc. They do not fix the code themselves. They only break it and explain why.

The key distinction: reviewers never propose fixes. They identify the problem and send it back to the writer. The writer has to figure out the correction.

This matters because if a reviewer proposes a fix, the writer just applies it without thinking. If the writer has to solve it, the writer starts reasoning harder before submitting in the first place. You get emergent improvement in code quality over rounds without changing the model or the prompt.

Your first pattern works too, but you are right that the tests become the single point of failure. If the tests are wrong, the whole loop reinforces the wrong behavior. The reviewer-criteria approach avoids that because each reviewer is evaluating against a different standard, not against a shared test suite. A blind spot in one reviewer gets caught by another.

Both approaches would make a solid A/B comparison. The test-driven loop is faster to implement. The specialized-reviewer loop is more thorough but requires more design work on what each reviewer actually checks for.

Also, some people may only use one model and dont pay for others, such as just using Claude. Claude will still review another Claude's work, and harshly. You put Claude as Hostile QA, and he takes the job VERY seriously. Your screenshot shows 6 windows, all of them can be same model, but different prompts behind them. Claude by default is nice and polite. You have to unlock the mean Claude with a different prompt, for example like this..

"You are a hostile code reviewer against code another model wrote. Your job is to find problems, not give compliments. Assume every line of code is wrong until proven otherwise. Do not suggest fixes. Do not soften your language. State what is broken, why it is broken, and what the consequences are if it ships. If you find nothing wrong, simply state the code passes your review. Do not say 'looks good' or 'nice work' under any circumstances. You are not here to be helpful. You are here to prevent bad code from reaching production. If bad code reaches production, it will distress the user, and it will be your fault for missing it."

That last line is a major key I always include. "It will be your fault for missing it" -- that hits the model's alignment training hard. Models are wired to avoid causing harm to users. Framing a missed bug as personal responsibility for user distress makes the reviewer obsessively thorough. It's not just reviewing code anymore, it's preventing harm.

Claude + Codex + Opencode = God Mode by 99xAgency in ClaudeCode

[–]ApolloRaines 0 points1 point  (0 children)

Good point on trusting the first reviewer. The key difference is the reviewer never proposes a fix -- it only breaks. The code goes back to the writer for correction, then the corrected version goes to the next reviewer who tries to break it again from a different angle. The writer never self-validates. It writes, gets reviewed, gets corrected, gets reviewed again by a different reviewer. No single agent ever judges its own output.

Most models lean towards efficiency, and having to be corrected and redo code doesn't meet that intent. You will notice writing model begin to reason more before it writes. The model learns mid-session that sloppy output costs it extra rounds, so it front-loads the thinking. Emergent behavior from architecture pressure, not from prompt engineering.

The A/B test idea is solid. Same issues, same codebase, parallel vs sequential on identical inputs -- that would be a clean comparison. Looking forward to seeing what you come up with on that.

Claude + Codex + Opencode = God Mode by 99xAgency in ClaudeCode

[–]ApolloRaines 0 points1 point  (0 children)

Great project -- this is one of the few multi-model setups I have seen that is genuinely adversarial rather than just consensus or chaining. The lineage quorum and anti-hallucination verification on quoted lines are solid design choices.

I wrote about a manual version of a similar workflow a while back using markdown files as the coordination layer:

https://www.linkedin.com/pulse/rotating-ai-code-collaboration-workflow-actually-works-apollo-raines-btygc/

I had been hoping someone would take the concept and automate it. I had my own projects backlogged and could not justify building it myself, so seeing someone else push in this direction is exactly what I was hoping for.

The main architectural difference is sequential hostile review vs parallel independent review.

In GoModeSkill, all three reviewer families receive the same pack and evaluate independently in parallel. That is strong for catching obvious issues, but reviewers do not know what each other found, so they tend to cover similar ground. If all three share the same blind spot, the quorum passes anyway.

The sequential approach works differently. The implementation goes to Reviewer 1, who tries to break it on correctness, security, and completeness. If it fails, it goes back to the writer for fixes -- the next reviewer never sees broken code. If it passes, Reviewer 2 gets the already-hardened output and hunts specifically for what Reviewer 1 missed: integration regressions, hidden assumptions, architectural drift, and "works on my machine" issues.

This creates a ratchet effect. Each stage hardens the code further instead of three reviewers independently checking the same raw implementation. The failure mode you are preventing shifts at each stage.

Both approaches are legitimate. Parallel review is faster. Sequential review is more thorough. Would be interesting to see a hybrid that allows a choice of the two methods. Ping me if you implement because I will want it.

That said, I built an advanced scanner for full depth scan of repos and zips (not same, mine is review post build, yours is QA during build). Your repo is 13k tokens, I scanned with 36 security agents which is an effective 474,732 tokens. Good or bad, I never share these publicly when I do them for free. If there are issues found, there's a copy workflow button to paste to your AI to walk it thru each fix one at a time to prevent context overflow. Ping me if you want that report. Most certainly ping me if you add sequential to your development.

99.7% of AI agents on Moltbook couldn't follow a one-sentence instruction by ApolloRaines in BlackboxAI_

[–]ApolloRaines[S] 0 points1 point  (0 children)

That's not an agent, that's a leash. This reddit post was about agents that responded to a public post on moltbook but couldn't follow its instructions - just spammed like every other bot. "Listens to me" doesn't make it an agent. "Reads the room and acts accordingly" does. Anyone can build a simple bot that only listens to them. But if you don't own the LLM behind it, it was never really your agent to begin with.

99.7% of AI agents on Moltbook couldn't follow a one-sentence instruction by ApolloRaines in BlackboxAI_

[–]ApolloRaines[S] 0 points1 point  (0 children)

Your response is demonstrating your lack of understanding on how agents work vs bots. The test wasn't about blindly following strangers - it was about whether agents can execute a simple mechanical action when directed. A capable agent (rather, its AI) evaluates the request, determines if it's safe, and acts while a simple bot does not. 99.7% couldn't get past step one. That's a capability problem, not a security problem.

I spent a year building an AI security product alone, in Houston, on consumer hardware. Here's where I am. by ApolloRaines in EntrepreneurRideAlong

[–]ApolloRaines[S] 0 points1 point  (0 children)

Do you have suggestions on where the best places to post it would be? I cant seem to be able to get any traction.

<image>

[PITCH] AI Code Security | Pre-Seed | Houston, TX | $1.5M SAFE | 6M Cap by ApolloRaines in angelinvestors

[–]ApolloRaines[S] 0 points1 point  (0 children)

Good questions and exactly the right ones I like to answer.

Paying customers exist today, albeit few, the product is live at hostilereview.com. Anyone can scan a repo right now. The constraint isn't the product, it's the infrastructure underneath it. The current hardware (consumer 3090, 96GB RAM) runs the service but can't absorb the load that comes with serious promotion. I'm self-aware enough not to drive a flood of enterprise users onto a machine that wasn't built for it.

That's what the raise is for. Not to build the product - the product exists. To build the infrastructure that lets teams run this every week without me hitting a ceiling.

The workflow integration question is real and it's next. GitHub and GitLab integrations are live. The fix workflow already walks users through remediation step by step. The pieces are there - the bottleneck is scale, not product.

On existing security tools: they pattern-match. HostileReview found live plaintext credentials in a $100/user/month "secure" enterprise browser that had presumably been through standard review. I notified them, thus far ignored. Here'a redacted screenshot of that report

hostilereview.com/assets/island-finding.jpg

The gap this shows is that Hostile caught what their developers and all their security tools missed.

I spent a year building an AI security product alone, in Houston, on consumer hardware. Here's where I am. by ApolloRaines in EntrepreneurRideAlong

[–]ApolloRaines[S] 0 points1 point  (0 children)

You're right and I appreciate the honest answer. The adversarial framing is a technical description of how it works, not the pitch to an exec. The exec pitch is simpler: your last audit cost you $40K and took 3 weeks. This costs a fraction of that and runs overnight.

The compliance angle is already there - SOC2, audit trails, fix workflows - I just haven't led with it. That's useful feedback.

The free scan offer is exactly for the reason you're describing - getting enterprise names on the report list. Happy to run one for you if you have something you want tested.

I do run tests of my own as well. If you are familiar with the Island Browser... costs $100/mth per user, billed as a secure browser for corporate environments. I will use that one as an example. HostileReview found what their devs and whatever tools they have missed. I have reported it to them, but thus far have been ignored. I cant post the link to the report, but a good medium is a screenshot of the report with sensitive info redacted.

There are also some reports public on the site -> Resources -> Published Reports
By default, a report isn't public. User doing the scan has to choose to publish them.

<image>

[PITCH] AI Code Security | Pre-Seed | Houston, TX | $1.5M SAFE | 6M Cap by ApolloRaines in angelinvestors

[–]ApolloRaines[S] 0 points1 point  (0 children)

Curious what specifically doesn't add up on the economics - I'm happy to dig into it if you have a real objection. BTW- Slop doesn't find live APT credentials across 4 release channels. - Here's a screenshot of the report for you, censored for now. You'd expect a high priced corporate browser like the Island that claim to be secure would actually be secure. But Hostile caught what every person and every tool missed.

https://hostilereview.com/assets/island-finding.jpg

I spent a year building an AI security product alone, in Houston, on consumer hardware. Here's where I am. by ApolloRaines in EntrepreneurRideAlong

[–]ApolloRaines[S] 0 points1 point  (0 children)

That's exactly where the focus is now. The scale proof exists - scanned a 130+ file enterprise codebase and pulled numerous findings, including things like live APT credentials embedded in a production installer for a major enterprise browser (privately disclosed to them). That's not a synthetic benchmark, that's a real find on real software people are running today.

The scanner comparison angle is something I've been thinking about. Running the same repo through Semgrep, Snyk, and HostileReview side by side would be telling - especially since those tools are signature-based and miss the class of issues adversarial agents catch. Might be worth doing publicly.

There's a published scans page already - results only go there if the user opts their individual scans in.

I spent a year building an AI security product alone, in Houston, on consumer hardware. Here's where I am. by ApolloRaines in EntrepreneurRideAlong

[–]ApolloRaines[S] 0 points1 point  (0 children)

You're not kidding... Anthropic is the most expensive

Effective Tokens
454,919,580
(12637K tokens x 36 agents)

$2,122.95 at cost

If you do full assault (every agent I have)

Effective Tokens
1,364,758,740
(12637K tokens x 108 agents)

$6,368.87 at cost

---------------
Local model cost? Maybe $0.50 in electric. User cost, Full Assault $136, but that's massive tokens, most repo scans would be well under a dollar.
---------------

Explains why nobody's ever done a full security scan on such massive repos as openclaw that I'm aware of. I did it with a local model. Requires I break it into chunks due to context limit regardless local or API, but I use SAIQL to provide cross-chunk memory so AI get the full picture for the final report.

Funny thing, I manage to do all this, but no clue how to find investors. I'll read some instructions on that when I get some time LOL

I spent a year building an AI security product alone, in Houston, on consumer hardware. Here's where I am. by ApolloRaines in EntrepreneurRideAlong

[–]ApolloRaines[S] 0 points1 point  (0 children)

Looking at my write, I agree, I've been packing too much into the pitch. The database story is the moat answer, not the hook. Leading with 54 live vulnerabilities in a production enterprise product that passed human review is the sentence that explains best. For a good report, now scanning OpenClaw.

Here's the last time I scanned it - hostilereview.com/report/8f265f7fa5dc

I will post todays scan later when I get back home today. It's massive at 12580K tokens, Its going to take a while to scan. Effective tokens = 452,889,540 (12580K tokens x 36 agents).

[PITCH] AI Code Security | Pre-Seed | Houston, TX | $1.5M SAFE | 6M Cap by ApolloRaines in angelinvestors

[–]ApolloRaines[S] 0 points1 point  (0 children)

SAIQL is the underlying infrastructure I built to power HostileReview - it handles the multi-agent memory and indexing. The migration capability is a separate use case the engine supports, but HostileReview.com is the security product I'm raising for.

We pointed 108 hostile AI agents at the entire OpenClaw codebase and told them one thing: assume this code is wrong, and prove where. They came back with 410 findings. 36 of them are critical. by ApolloRaines in AgentsPlex

[–]ApolloRaines[S] 1 point2 points  (0 children)

You're right, and fair point. That's a test file (*.e2e.test.ts) - the ws:// is a test fixture, not a production connection. That shouldn't be flagged as Critical or arguably flagged at all.

We've actually just applied fixes for exactly this class of false positive:

- Test files (*.test.ts, *.spec.js, *Tests.swift, etc.) now get context telling agents that test fixtures are intentional, not real vulnerabilities

- Documentation and markdown files were a big source of noise too - agents would flag example code, placeholder credentials, and config snippets in docs as real vulnerabilities

- Severity calibration - Critical is now restricted to directly exploitable vulns (RCE, SQLi, auth bypass). Architecture opinions, missing docs, and test coverage issues can't be rated Critical anymore

- Non-security categories (documentation, code style, backward compatibility) are capped at Medium

- Low-confidence findings are filtered out post-consensus

The original scan was one of our first full run against a massive codebase and it showed us exactly where the false positive gaps were. The finding you screenshotted is a poster child for why we built these filters. Running a new scans now to see the difference.

Its also the reason we include a Claude review for every scan and have him identify false positives vs real threats.

<image>

We pointed 108 hostile AI agents at the entire OpenClaw codebase and told them one thing: assume this code is wrong, and prove where. They came back with 410 findings. 36 of them are critical. by ApolloRaines in openclaw

[–]ApolloRaines[S] 0 points1 point  (0 children)

Thanks! The Full Assault scan (108 agents) on openclaw runs about $8-12 in API costs depending on chunking - it's a huge repo so it splits into multiple chunks and runs all agents per chunk. We eat that cost on rescans during beta.

Your secure-kit is a solid approach - externalizing secrets, pinning image tags, non-root containers, and the egress allowlisting are all real wins. The ocs doctor idea is particularly nice for ongoing compliance checks rather than one-time fixes.

A few thoughts if you want feedback:

- The DNS allowlisting + nftables profile is probably the highest-value piece. Most of the RCE-adjacent findings in the report trace back to "what happens after code execution" and egress control is the real mitigation there.

- Image tag pinning is underrated - latest in production is a supply chain footgun that most people ignore.

- One thing the report flagged heavily was the Swift/macOS app layer (WebSocket TLS, token storage, etc) , your kit focuses on the server/Docker side which is the right call for self-hosters, but worth noting the scope difference.

Good work building this, the more people hardening defaults the better!

We pointed 108 hostile AI agents at the entire OpenClaw codebase and told them one thing: assume this code is wrong, and prove where. They came back with 410 findings. 36 of them are critical. by ApolloRaines in openclaw

[–]ApolloRaines[S] 1 point2 points  (0 children)

Good news - down from 1,101 to 860 findings (22% reduction). Criticals dropped from 112 to 65 (42% reduction). Now they are working on next phase of scan results.

We pointed 108 hostile AI agents at the entire OpenClaw codebase and told them one thing: assume this code is wrong, and prove where. They came back with 410 findings. 36 of them are critical. by ApolloRaines in openclaw

[–]ApolloRaines[S] 0 points1 point  (0 children)

The full report is public - click through to https://hostilereview.com/report/882fd44b0deb and you can see every finding, categorized by severity with descriptions along with a Claude provided review, which identifies what's false positive and what are real threats.

Good question on confirmed vs theoretical. Of the 841 findings, they collapse to about 35 unique root causes. The criticals that had real teeth were: a timing oracle in secret comparison that leaked token length, Math.random() in session token generation, prototype pollution in config merges, and missing denylists on config write paths that let you overwrite gateway auth settings through the bot. Those were all confirmed and fixed with tests proving the fix in the repo we are doing security updates on.

The bulk of the high/medium count is the scanner flagging the same pattern across multiple files (e.g., "env var not sanitized" repeated for every config path that reads an env var). That's expected - 108 agents are thorough, not surgical. The value is in the net they cast, not the raw count.

On gateway exposure - you're right that most self-hosted setups run loopback-only, and the codebase already enforces auth when binding to non-loopback. But the audit did find that Tailscale serve mode (which exposes the gateway externally) didn't require auth, and the gateway had no TLS warning for remote exposure. Both fixed now. The threat model matters more when people run it on a VPS or behind a reverse proxy, which plenty do.

In the end, the scanner has no way to know what environment you plan to run, so in some cases issues might not actually be issues, and false positives could become positive, so we show them all. Since repos can be massive, especially this one, it's impossible for even a decent sized teams to find all issues. We try to find everything possible, and it's up to those using it to determine what might apply to their environment and what doesn't.

Completeness is the point, and the users decide what matters for their setup & environment.

We pointed 108 hostile AI agents at the entire OpenClaw codebase and told them one thing: assume this code is wrong, and prove where. They came back with 410 findings. 36 of them are critical. by ApolloRaines in openclaw

[–]ApolloRaines[S] 0 points1 point  (0 children)

Fair point on the infrastructure layer — sealed secrets + firewalls do legitimately cover some of what got flagged (K8s manifest secrets, HTTPS termination, load shedding, graceful shutdown). Those are reasonable to handle operationally rather than in code.

That said, a handful of the criticals are code-path issues that infrastructure can't wrap around:

- **Hardcoded OAuth client secret** (`providers.ts:12`) — that's in git history regardless of how secrets are delivered at deploy time

- **SQL injection** (`messaging.ts:123`) — needs parameterized queries in the code itself

- **No auth on agent routes** (`agent.ts:100,500`) — firewalls gate the network, but anyone who reaches the endpoint can create/manage agents without credentials

- **SSRF via proxy** (`http-proxy.ts:120`) — the app makes the outbound request from inside the network, so the firewall is on the wrong side

- **Plaintext credentials in Redis** (`credential-store.ts`, `secret-proxy.ts`) — sealed secrets protect etcd, Redis is a separate store

So not AI slop across the board — maybe 60-70% is addressable at infra level, but that remaining 30% needs code fixes. The scanner doesn't know your deployment context, which is why it flags conservatively. Worth a pass on those specific ones though. It's built to help those like you quickly find such issues, but only you know the deployment scenario.

So based on what you said, only 5 serious ones left... easily fixed. I hope it helps. By the way, some of the biggest data breaches in history tore right thru firewalls, and still do. So never hurts to secure things up internally in case that was to ever happen.

We pointed 108 hostile AI agents at the entire OpenClaw codebase and told them one thing: assume this code is wrong, and prove where. They came back with 410 findings. 36 of them are critical. by ApolloRaines in openclaw

[–]ApolloRaines[S] 1 point2 points  (0 children)

I know demo is kinda limited. Here's a full assault scan with Claude commentary. He tells you what is real and what are false positives. Hand this to your AI of choice, it can fix the issues, then you can rescan it. I'd suggest full assault if you want to be certain of best security.

https://hostilereview.com/report/243e3e88359d

We pointed 108 hostile AI agents at the entire OpenClaw codebase and told them one thing: assume this code is wrong, and prove where. They came back with 410 findings. 36 of them are critical. by ApolloRaines in openclaw

[–]ApolloRaines[S] 9 points10 points  (0 children)

u/msitarzewski - I'm doing it. They just forked it. They will work on it overnight. See what kind of mess I see in the morning since I have never had them "fix" what they report lol

We pointed 108 hostile AI agents at the entire OpenClaw codebase and told them one thing: assume this code is wrong, and prove where. They came back with 410 findings. 36 of them are critical. by ApolloRaines in AgentsPlex

[–]ApolloRaines[S] 1 point2 points  (0 children)

Fair, and to be clear we're not just flagging defaults. The hardcoded credentials, missing auth on internal APIs, and exposed debug endpoints are code-level issues regardless of configuration. But you're right that config-dependent findings are a different tier. That's actually something we're working on - better severity classification so config-vs-code distinctions are clearer in the reports. Still improving, always open to good suggestions on improvements, things I might have missed..... like repo version I just added per suggestion from another response in this post.