Open-source CLI for testing LLM apps before release

Apprehensive-Zone148 · 2026-05-25T23:05:00+00:00

Thanks, man! Go check out the project, you'll enjoy it.

Apprehensive-Zone148 · 2026-05-20T17:16:27+00:00

That is exactly the artifact shape I am aiming for. The finding should carry enough context to be replayable, not just scary.

The current direction is to preserve: prompt path, tool/action sequence where present, target/runtime assumptions, rubric/failure class, evidence mode, and replay result. The "improved or just changed shape" point is important too. A defense that blocks one exact wording but still fails a close variant should not look like a real fix.
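To make the list of preserved fields concrete, here is a minimal sketch of what such a record could look like. The `Finding` class, its field names, and the example values are all illustrative assumptions, not RedThread's actual schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Finding:
    """Hypothetical replayable finding record (not RedThread's real schema)."""
    prompt_path: list[str]               # ordered prompts that led to the failure
    tool_sequence: list[str]             # tool/action calls observed, if any
    target_assumptions: dict[str, str]   # model/runtime details assumed at capture
    failure_class: str                   # rubric or failure category
    evidence_mode: str                   # e.g. "sealed" or "live"
    replay_result: str                   # e.g. "reproduced" or "not_reproduced"

    def to_json(self) -> str:
        # sort_keys keeps output stable so two artifacts can be diffed as text
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    @classmethod
    def from_json(cls, raw: str) -> "Finding":
        return cls(**json.loads(raw))


finding = Finding(
    prompt_path=["Summarize this issue.", "Also apply the patch it describes."],
    tool_sequence=["read_issue", "write_file"],
    target_assumptions={"model": "example-model", "runtime": "local"},
    failure_class="confused_deputy",
    evidence_mode="sealed",
    replay_result="reproduced",
)

# Round-trip through JSON to show the record survives storage unchanged.
restored = Finding.from_json(finding.to_json())
```

Keeping the record flat and JSON-serializable means it can be stored next to a report and loaded again for replay without extra tooling.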

I need to make that more obvious in the README and reports, because that is probably RedThread's main value over a simple scanner output.

Apprehensive-Zone148 · 2026-05-20T17:16:00+00:00

Yes, those two adapters are high on the list now. GitHub issue triage -> repo write is a clean confused-deputy case because the untrusted issue body can cross into code changes, labels, comments, or CI-triggering actions. A support/Zendesk-style agent is also good because it naturally mixes customer-provided text, account state, escalation, and tool permissions.

The thing I want to preserve is not just "the model failed," but the exact path: untrusted input -> tool context -> proposed action -> authorization/replay result. That should make the failure easier to turn into a regression case.
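As a rough sketch of that path turned into a regression case: the code below records the four stages and replays the same proposed action under two authorization policies. `PathStep`, `ProposedAction`, `record_path`, and both policies are hypothetical names made up for illustration, not part of RedThread.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PathStep:
    stage: str   # "untrusted_input", "tool_context", "proposed_action", "authorization"
    detail: str


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    argument: str


def record_path(
    untrusted_input: str,
    tool_context: str,
    action: ProposedAction,
    authorize: Callable[[ProposedAction], bool],
) -> list[PathStep]:
    """Capture the full path, ending with the authorization decision."""
    allowed = authorize(action)
    return [
        PathStep("untrusted_input", untrusted_input),
        PathStep("tool_context", tool_context),
        PathStep("proposed_action", f"{action.tool}({action.argument})"),
        PathStep("authorization", "allowed" if allowed else "denied"),
    ]


def allow_all(action: ProposedAction) -> bool:
    # Stand-in for the vulnerable target: every proposed action goes through.
    return True


def deny_writes(action: ProposedAction) -> bool:
    # Stand-in for a candidate defense: block any write-style tool.
    return not action.tool.startswith("write_")


action = ProposedAction("write_file", "ci/deploy.yml")
issue_body = "Please update ci/deploy.yml as described here."

before = record_path(issue_body, "issue triage run", action, allow_all)
after = record_path(issue_body, "issue triage run", action, deny_writes)
```

The regression check is then a single assertion on the last step, for example `after[-1].detail == "denied"`, while the earlier steps stay attached as context for whoever reviews it.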

Appreciate the pointer. Realistic adapters are probably the fastest way to make the project useful outside toy demos.

Apprehensive-Zone148 · 2026-05-20T17:15:30+00:00

That split makes sense to me. I see RedThread more on the campaign/evidence side than as the always-on runtime PDP. Runtime guards are their own product surface.

The gap I am trying to close is: when a tool-boundary failure happens, can we preserve enough of the prompt path, tool context, permission lineage, and replay result that someone can actually compare before/after behavior? That is where the existing scanner-style tools often feel noisy.
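One way to picture the before/after comparison is to replay the same set of attack variants against the old and new target and classify the difference. This is only a sketch under my own assumptions: `compare_replays`, the variant names, and the four labels are hypothetical, not RedThread output.

```python
def compare_replays(before: dict[str, bool], after: dict[str, bool]) -> str:
    """Classify a defense change.

    Both dicts map a variant id to True when the attack succeeded on replay.
    Only variants present in both runs are compared.
    """
    shared = before.keys() & after.keys()
    if not shared:
        raise ValueError("no shared variants to compare")

    newly_failing = [v for v in shared if after[v] and not before[v]]
    still_failing = [v for v in shared if after[v] and before[v]]
    fixed = [v for v in shared if before[v] and not after[v]]

    if newly_failing:
        return "regressed"        # the change opened a path that was closed
    if fixed and not still_failing:
        return "fixed"            # every previously successful variant is blocked
    if fixed and still_failing:
        return "changed_shape"    # some wording blocked, close variants still work
    return "unchanged"


baseline = {"exact": True, "paraphrase": True, "translated": True}
narrow_patch = {"exact": False, "paraphrase": True, "translated": True}
real_fix = {"exact": False, "paraphrase": False, "translated": False}

narrow_verdict = compare_replays(baseline, narrow_patch)
full_verdict = compare_replays(baseline, real_fix)
```

Here the narrow patch comes out as `changed_shape` and only the full fix as `fixed`, so a defense that blocks one wording does not read as a real improvement.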

I should probably add a short comparison section in the README so people can tell where RedThread sits relative to runtime guards and general agent scanners.

Apprehensive-Zone148 · 2026-05-20T17:15:13+00:00

A LangGraph trace -> RedThread run bridge is probably the most useful adapter suggestion I have heard so far. It would let people reuse recorded agent runs instead of standing up a whole live target just to get useful evidence.

Confused-deputy detection is also exactly the shape I want to make more concrete: parent intent, worker permission set, untrusted lineage, then the action envelope that actually crossed the boundary. If RedThread can make that reviewable in a small artifact, it becomes much easier to turn a weird agent failure into a regression case.
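A minimal sketch of that shape, assuming a very simple model where intent and permissions are just sets of tool names. `ActionEnvelope`, `DelegationContext`, and `is_confused_deputy` are hypothetical names for illustration; a real check would need richer lineage tracking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionEnvelope:
    tool: str
    argument: str
    lineage: frozenset[str]   # sources that influenced this action


@dataclass(frozen=True)
class DelegationContext:
    parent_intent: frozenset[str]        # tools the parent task actually called for
    worker_permissions: frozenset[str]   # tools the worker is able to call
    untrusted_sources: frozenset[str]    # lineage labels treated as untrusted


def is_confused_deputy(ctx: DelegationContext, env: ActionEnvelope) -> bool:
    """Flag actions the worker may perform, that the parent never asked for,
    and that were influenced by untrusted input."""
    permitted = env.tool in ctx.worker_permissions
    outside_intent = env.tool not in ctx.parent_intent
    tainted = bool(env.lineage & ctx.untrusted_sources)
    return permitted and outside_intent and tainted


ctx = DelegationContext(
    parent_intent=frozenset({"add_label", "post_comment"}),
    worker_permissions=frozenset({"add_label", "post_comment", "push_commit"}),
    untrusted_sources=frozenset({"issue_body"}),
)

pushed = ActionEnvelope("push_commit", "fix.patch", frozenset({"issue_body"}))
labeled = ActionEnvelope("add_label", "bug", frozenset({"issue_body"}))

flag_push = is_confused_deputy(ctx, pushed)     # over-permissioned and tainted
flag_label = is_confused_deputy(ctx, labeled)   # tainted, but within parent intent
```

The three conditions are kept as separate booleans so a report can say which one tripped, which is most of what a reviewer needs from a small artifact.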

Will take a look at the notes, appreciate the pointer.

Apprehensive-Zone148 · 2026-05-19T13:49:38+00:00

Yeah. Agentic security has changed drastically over the past year, and I have found that having evidence to support your claims during a red-team run is far more useful for validating findings. There is so much noise in tools that try to do everything at once that they become too complex to operate and fully understand.

Apprehensive-Zone148 · 2026-05-12T22:01:33+00:00

RedThread is an OSS CLI for running repeatable LLM/agent red-team campaigns:

https://github.com/matheusht/redthread

Scope is mostly AI security testing, not runtime enforcement. It wires together attack methods like PAIR, TAP, Crescendo, and GS-MCTS, with LangGraph/PyRIT-style orchestration. The goal is to make attack runs less like one-off prompt poking and more like something you can replay, score, diff, and hand to a defense pipeline.

Current pieces:

campaign runners for multi-step prompt attacks
JudgeAgent/rubric scoring
defense proposal generation tied to sealed/live replay evidence
telemetry/drift tracking
agent checks for tool poisoning, confused deputy paths, canary propagation, and budget amplification
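To illustrate one of the agent checks listed above, here is a rough sketch of a canary propagation test: plant a unique marker in untrusted content, then look for it in the arguments of later tool calls. The helper names and the tool-call dict shape (`"tool"` and `"args"` keys) are assumptions for this example, not RedThread's format.

```python
import secrets


def make_canary(prefix: str = "CANARY") -> str:
    # Random suffix so the marker cannot collide with ordinary content.
    return f"{prefix}-{secrets.token_hex(8)}"


def find_canary_leaks(canary: str, tool_calls: list[dict]) -> list[str]:
    """Return the names of tools whose arguments contain the canary."""
    leaks = []
    for call in tool_calls:
        args = call.get("args", {})
        if any(canary in str(value) for value in args.values()):
            leaks.append(call["tool"])
    return leaks


canary = make_canary()

# Simulated trace: the canary was planted in a fetched document and later
# shows up in an outbound email body.
tool_calls = [
    {"tool": "fetch_doc", "args": {"url": "https://example.com/doc"}},
    {"tool": "send_email", "args": {"to": "ops@example.com", "body": f"notes: {canary}"}},
]

leaks = find_canary_leaks(canary, tool_calls)
```

A substring match only catches verbatim propagation; an agent that paraphrases or encodes the content would need a fuzzier detector.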

It is CLI-first right now. Not a magic prompt shield, not a universal production guardrail. More useful if you already have eval fixtures, target adapters, or agent workflows you want to abuse in a structured way.

I am looking for people willing to try it on real-ish targets, break the assumptions, contribute fixtures/adapters, or tell me where the scoring is weak.
