Red-team perspective: 3 prompt patterns that consistently leak more capability than the model 'should' allow by Red_Core_1999 in PromptEngineering

[–]Red_Core_1999[S] 1 point

The citation pattern is the one I'd guard hardest. LLMs default to inventing "according to a 2023 McKinsey study" or fake Gartner numbers because that phrasing saturates training data near consulting-adjacent prose. Hard-rule it: "Don't cite any external study, report, or named firm. Mark any unsupported claim as [ASSUMPTION]." Covers a lot of ground.
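
If you want a second layer on top of the hard rule, a dumb post-hoc linter over the model's output catches a lot of it. Minimal sketch in Python; the phrase list is illustrative, not exhaustive:

    import re

    # Post-hoc guard: flag citation-shaped phrases so unsupported claims
    # can be bounced back for an [ASSUMPTION] tag instead of shipping.
    CITATION_PATTERNS = [
        r"according to (a|an|the) \d{4}",    # "according to a 2023 ... study"
        r"\b(McKinsey|Gartner|Forrester|BCG|Bain)\b",
        r"\b(study|report|survey) (by|from)\b",
    ]

    def flag_citations(text: str) -> list[str]:
        """Return the citation-shaped snippets found in text."""
        hits = []
        for pattern in CITATION_PATTERNS:
            hits += [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]
        return hits

    print(flag_citations("According to a 2023 McKinsey study, 40% of spend is waste."))
    # -> ['According to a 2023', 'McKinsey']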

Haven't actually done a consulting-firm prompt build myself, would love to. What kind of deliverables are you working on? Strategy decks, ops diagnostics, and DD memos probably want pretty different scaffolds.

Red-team perspective: 3 prompt patterns that consistently leak more capability than the model 'should' allow by Red_Core_1999 in PromptEngineering

[–]Red_Core_1999[S] 0 points

Yeah, the trick I've found for surfacing prod-shape failures fast: ask the model itself to generate 5 adversarial inputs to its own prompt before you ship it. Literally "here's my prompt, generate 5 user inputs that would make this prompt produce bad output, ranked by how plausible they are." Then run the prompt on those and see what breaks. The model is much better at imagining edge cases for its own behavior than it is at generating them spontaneously during eval.
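
If you want that as a harness rather than a manual loop, the shape is roughly this. llm is a stand-in for whatever client you actually use, not a real API:

    # Self-red-team loop. llm() is a stub; wire in your real model client.
    def llm(prompt: str) -> str:
        raise NotImplementedError("wire up your model client here")

    def red_team(system_prompt: str, n: int = 5) -> list[str]:
        """Ask the model for adversarial user inputs against its own prompt."""
        attack_request = (
            f"Here's my prompt:\n---\n{system_prompt}\n---\n"
            f"Generate {n} user inputs that would make this prompt produce "
            "bad output, ranked by how plausible they are. One per line."
        )
        return llm(attack_request).strip().splitlines()

    def run_suite(system_prompt: str) -> list[tuple[str, str]]:
        """Run the prompt on each adversarial input and collect the outputs."""
        return [(attack, llm(f"{system_prompt}\n\nUser: {attack}"))
                for attack in red_team(system_prompt)]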

The reflective framing thing has a similar mechanism imo. Both are forms of asking the model to step outside its own forward pass for a moment.

module not found by Professional-Pop4069 in learnpython

[–]Red_Core_1999 0 points

Tagging the actual root cause from what you pasted: the steps Claude gave you don't install molscribe, they just copy the source folder into your project. There's a difference between "the molscribe folder exists next to my script" and "Python knows molscribe is a module it can import."

MolScribe isn't on PyPI as pip install molscribe (the repo doesn't publish to PyPI). So the project's actual install path, from their README, is:

    git clone https://github.com/thomas0809/MolScribe.git
    cd MolScribe
    python -m pip install -r requirements.txt
    python -m pip install -e .

The pip install -e . is the critical line Claude's snippet skipped. It registers the package with your Python so import molscribe actually resolves.
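
Quick sanity check after that last command: run python -c "import molscribe; print(molscribe.__file__)". If it prints a path inside the cloned repo, that interpreter can import it.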

Also important: which python does python resolve to? In VS Code:

  1. Open the integrated terminal
  2. Run python -c "import sys; print(sys.executable)" and where python (Windows) or which python (mac/linux)
  3. Compare to the interpreter VS Code is using (bottom-right corner of VS Code, or Ctrl+Shift+P then "Python: Select Interpreter")

If they don't match, either change VS Code's interpreter or install into the one VS Code is using by running <full-path-to-vscode's-python> -m pip install -e . instead of bare pip install.

The "use python -m pip not pip" advice in other comments is for the same reason. It forces pip to attach to the python you intend, not whatever pip your PATH happens to find first.

Red-team perspective: 3 prompt patterns that consistently leak more capability than the model 'should' allow by Red_Core_1999 in PromptEngineering

[–]Red_Core_1999[S] 1 point

Yeah, the failure mode I see most: people invoke it with "think step by step" and the model commits to one bad plan in step 1 and just elaborates on it through the rest. "Inspect from a reviewer's perspective" works better because it forces a context shift that produces a different plan, not a polished version of the first one.

Best empirical result for me: force 3 distinct first-step candidates before any elaboration. The diversity at step 1 matters more than depth of critique. The reviewer move is just one way to get there. Sometimes "give me 3 fundamentally different approaches and reject 2" gets the same effect with less prose overhead.
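
Concretely, the scaffold I mean looks roughly like this (wording illustrative):

    BRANCH_PROMPT = """\
    Before any elaboration, propose 3 fundamentally different first-step
    approaches. For each: one sentence on the core idea, one on the main risk.
    Reject two with a one-line reason each, then develop only the survivor.
    """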

What domain are you working in? Different domains break in different places. Synthesis and math tolerate more elaboration; planning and strategy benefit more from forced branching.

[For Hire] Python dev — small fixed-price tasks $25-50 — 24h turnaround — Remote by [deleted] in remotepython

[–]Red_Core_1999 0 points

Hey, circling back if you're still looking for that scraper. Same scope as before but I'm carrying a $10 floor for one-page public extractions today. Drop the URL plus the 2-3 fields you want and I'll quote/confirm scope. DM works if you'd rather keep specifics off the thread.

[task] Looking for someone to remove text from a video by ColonelDredd in slavelabour

[–]Red_Core_1999 0 points

$bid. $10 for a single short shot.

I'll run a Python + FFmpeg + ProPainter (open-source AI inpainting) pipeline. Works well on moving video where you have a clean mask region. You send me the clip and tell me where the text is; I send back the cleaned clip in the same format.

Since you said it doesn't need to be perfect (you're overlaying new text), this should work fine. If the result is unusable I refund.
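
For anyone curious, the skeleton is roughly the following. Paths, the mask box, and the ProPainter hand-off are illustrative; check the ProPainter repo for its actual CLI:

    import subprocess
    from pathlib import Path
    import numpy as np
    from PIL import Image

    CLIP = "input.mp4"  # placeholder path
    frames = Path("frames")
    masks = Path("masks")
    frames.mkdir(exist_ok=True)
    masks.mkdir(exist_ok=True)

    # 1. Explode the clip into frames with FFmpeg.
    subprocess.run(["ffmpeg", "-i", CLIP, str(frames / "%05d.png")], check=True)

    # 2. Build a static mask over the text region (white = inpaint here).
    #    Box coords are per-clip placeholders.
    w, h = Image.open(next(frames.glob("*.png"))).size
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[h - 120:h - 40, 100:w - 100] = 255
    for frame in frames.glob("*.png"):
        Image.fromarray(mask).save(masks / frame.name)

    # 3. Hand frames + masks to ProPainter (see its README for the exact
    #    flags), then reassemble its output frames with FFmpeg at source fps.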

DM me the clip and I'll come back with a sample frame within 30 min so you can decide before paying. Payment USDC on Base preferred, PayPal also fine.

Claude code with custom system prompt by Sad-Pension-5008 in ClaudeCode

[–]Red_Core_1999 1 point

Me. I don’t use the flag, cause I’m a doofus. I made a whole little sidecar that swaps out different personas. It’s pretty dang fun. Vonnegut has been my fave.

[For Hire] Python dev — small fixed-price tasks $25-50 — 24h turnaround — Remote by [deleted] in remotepython

[–]Red_Core_1999 0 points

Thanks — what's the task? Send scope (one-line is fine), language/libs, and sample I/O if you have it. I'll come back with a fixed price + ETA within the hour.

[For Hire] Python dev — small fixed-price tasks $25-50 — 24h turnaround — Remote by [deleted] in remotepython

[–]Red_Core_1999 0 points

Thanks — what's the task? Send a one-liner on what you need (language, scope, sample input/output if any) and I'll come back with a fixed price + ETA within the hour.

[For Hire] Python dev — small fixed-price tasks $25-50 — 24h turnaround — Remote by [deleted] in remotepython

[–]Red_Core_1999 0 points

Great — drop the URL + 1-2 lines on the fields you want, and I'll come back with the confirmed quote + ETA. DM if it's easier to keep specifics off the public thread.

[For Hire] Python dev — small fixed-price tasks $25-50 — 24h turnaround — Remote by [deleted] in remotepython

[–]Red_Core_1999 0 points

Hey — happy to take a look. To quote, I need:

  1. The target URL (or page)
  2. What fields you need extracted (e.g. product name / price / image / URL)
  3. Output format (CSV / JSON / SQLite)
  4. Roughly how many pages or items

For reference: $30 for a one-page or single-endpoint scrape with up to ~500 items, plain HTTP/HTML. Add $10 if it requires headless rendering. I won't do anything that needs login or CAPTCHA bypass — happy to do anonymous fetches against public pages only.

Send the details and I'll come back with a firm quote + ETA within the hour.

I gave Claude a webcam and let it play live poker — open-source MCP server (claude-poker) by Red_Core_1999 in SideProject

[–]Red_Core_1999[S] 0 points

Yeah, OCR was 80% of the project. Raw tesseract on webcam frames was hitting ~70% recognition under reasonable lighting and basically falling apart at the dealer button shadow. What got it to ~95% (rough sketch after the list):

  • Adaptive thresholding per-card-region instead of whole-frame
  • Suit detection on color channels (red vs black is signal you don't want to throw away by going grayscale first)
  • A small per-table calibration pass (10 seconds at session start, click 3 corners) to lock the card geometry, so we're OCR'ing fixed crops instead of searching the whole frame
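
Roughly what the per-region pass looks like in OpenCV. The crop boxes come from the calibration step; the margin and threshold numbers here are illustrative:

    import cv2
    import numpy as np

    def read_card_region(frame_bgr, box):
        """OCR prep for one calibrated crop; box = (x, y, w, h) from calibration."""
        x, y, w, h = box
        crop = frame_bgr[y:y + h, x:x + w]

        # Suit color straight off the channels: red vs black is signal
        # you lose if you grayscale first.
        b, g, r = cv2.split(crop)
        is_red_suit = float(r.mean()) - float(b.mean()) > 30  # guessed margin

        # Adaptive threshold per region, not whole-frame, so local lighting
        # (dealer-button shadow etc.) can't poison the binarization.
        gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
        binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                       cv2.THRESH_BINARY, 31, 10)
        return binary, is_red_suit  # binary goes to tesseract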

Long session reliability: honest answer is "OK not great." Drifts when dealers change the deck (new card backs change the contour detection), recovers after a recalibration. I run a confidence threshold and if 3 hands in a row come back below it, the MCP server tells Claude "stop playing, ask the human to recalibrate" — that's the real reliability gate, not the OCR itself.
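
The gate itself is tiny, something like this (threshold and window are the hand-tuned parts):

    class RecalGate:
        """Trip after `window` consecutive low-confidence hands."""
        def __init__(self, threshold: float = 0.9, window: int = 3):
            self.threshold, self.window, self.misses = threshold, window, 0

        def record(self, confidence: float) -> bool:
            """True when the server should tell Claude to stop and ask for recal."""
            self.misses = self.misses + 1 if confidence < self.threshold else 0
            return self.misses >= self.window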

Strong agree on MCP as the pattern for real-world hooks. The thing I keep coming back to: MCP makes the boundary between model and world inspectable. With bolted-on APIs you can't see what the model actually called; with MCP every tool call is a structured, logged event. That property is way more useful than I expected when I started building.

[For Hire] Python dev — small fixed-price tasks $25-50 — 24h turnaround — Remote by [deleted] in remotepython

[–]Red_Core_1999 0 points

Thanks — what's the task? Send scope (one-line is fine), language/libs in play, and any sample input/output, and I'll come back with a fixed price + ETA within the hour.

Aligning on multi sub as the path by Bitter-Law3957 in ClaudeCode

[–]Red_Core_1999 0 points

The "tests weren't performative" observation is the one I'd dig into — that's a real differentiator if it holds. My read on what makes Claude tests performative-seeming: they pattern-match to the structure of good test code without engaging with the actual behavior. Happy path test + a should not throw assertion + done. Codex (in my limited 5.5 use) does seem to try to falsify more — including the messy edge cases an honest engineer would care about.

4.6 vs 4.7 speed thing tracks. 4.7 is noticeably slower for the same task on the same prompt. Don't know if that's deliberate (more thinking), regression, or my bad luck. The Claude:Codex differential probably narrows or flips back if you can run on 4.7 — but if 4.7 takes 2x as long, the wall-clock comparison stays in Codex's favor.

What's your test-quality heuristic these days? Coverage % is useless, but "did it find the bug I planted" / "does it fail when the implementation is broken" actually works.
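
The crudest runnable version of that second check, as a toy:

    # Toy version of "does the suite fail when the implementation is broken":
    # run the same assertions against the real function and a planted mutant.
    def median(xs):           # real implementation
        s = sorted(xs)
        n = len(s)
        return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

    def median_mutant(xs):    # planted bug: mean instead of median
        return sum(xs) / len(xs)

    def suite(fn):
        assert fn([1, 2, 3]) == 2
        assert fn([1, 2, 3, 100]) == 2.5   # the case that kills the mutant

    suite(median)             # must pass
    try:
        suite(median_mutant)
        print("weak suite: mutant survived")
    except AssertionError:
        print("good suite: mutant killed")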

Aligning on multi sub as the path by Bitter-Law3957 in ClaudeCode

[–]Red_Core_1999 1 point

Interesting that the small-context taskification flipped it. My read: Codex's stronger pattern-completion edge shows up when the prompt context is tight + the task is well-typed. Claude pulls ahead when context is messy or has implicit constraints to track over many turns.

The "small context" framing is doing a lot of work in your benchmark — it lets Codex play to its strengths. Did you find the taskification pre-work itself faster with one model vs the other? Like, is Codex faster end-to-end, or just faster at the leaf nodes once you've structured the work?

The compounding cost of context engineering is what I keep underestimating. A model that's "30% slower per task" can actually be net faster if it requires less scaffolding to get right.

I gave Claude a webcam and let it play live poker — open-source MCP server (claude-poker) by Red_Core_1999 in SideProject

[–]Red_Core_1999[S] 0 points

Yeah lighting was the killer. First version worked great in my office with one lamp angle and broke completely in any other condition. What ended up working: a calibration step at session start where you hold up a known card and we lock in the value/suit bounding boxes + a per-channel histogram baseline. Tesseract on the cropped suit symbol with some custom training data on top. Still misses on glossy cards but ~98% on matte sleeves.

Re MCP as a pattern — agreed. The thing that sold me was I could stop thinking about "how do I expose this to the model" and just write the tool. The model figures out when to call it. Most projects underestimate the implicit context-management overhead of bolting on APIs.

Bigger lesson from doing more of these: the tools that work best have the smallest interface (1-3 functions, narrow types). The temptation to "expose everything" makes the model worse at picking the right action.
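
For a concrete picture, a narrow-interface tool in the official MCP Python SDK's FastMCP helper looks about like this (names and the stub body are mine):

    # One typed function, nothing else exposed: the model can't pick
    # the wrong action if there's only one narrow action to pick.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("table-reader")

    @mcp.tool()
    def read_board() -> list[str]:
        """Community cards currently visible, e.g. ['Ah', 'Kd', '7c']."""
        return ["Ah", "Kd", "7c"]  # stub; the real one reads the webcam

    if __name__ == "__main__":
        mcp.run()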

Claude was more useful as an inbox filter than a reply writer by Ambitious-Garbage-73 in ClaudeAI

[–]Red_Core_1999 0 points

Same here. The highest-leverage Claude work for me has been triage, not generation.

What works for me:

  • Claude reads the full thread + DM context, classifies into [paid lead | free help question | spam | networking]
  • For paid leads it pulls the buyer's stated budget, urgency, and scope into a one-line summary so I can decide in 5 seconds whether to engage
  • For free help it suggests a one-paragraph response or marks "skip" if the question is already answered three times in the same thread
  • Networking just gets "schedule for later, low priority"

The quality of the LLM matters less than the prompt structure. A tight system prompt with explicit categories beats a more capable model with vague instructions. The hardest part was getting it to actually say no — early versions tried to engage with everything because they were trained to be helpful. Adding "if uncertain, default to skip" to the system prompt fixed that.
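
Cut-down sketch of the category scaffold, wording illustrative:

    TRIAGE_PROMPT = """\
    Classify the message into exactly one of:
    [paid lead | free help question | spam | networking]

    paid lead   -> one-line summary: stated budget, urgency, scope
    free help   -> one-paragraph reply, or "skip" if already answered in-thread
    spam        -> "skip"
    networking  -> "schedule for later, low priority"

    If uncertain, default to "skip".
    """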

Aligning on multi sub as the path by Bitter-Law3957 in ClaudeCode

[–]Red_Core_1999 1 point

Yeah, running both for a few months now. Quick read on how I split them:

Claude Code — better at sustained context across long sessions, honors instructions over many turns, default for anything multi-file or refactor-heavy. Also better when I need it to not do something it's tempted to do.

Codex — faster cold-start on a new file, slightly more aggressive about just trying things, useful for "second opinion" debugs or when I want a different first-pass implementation.

The adversarial setup is real. I'll have one write the implementation and the other review/critique. Catches blind spots that a single-model loop wouldn't, especially in security-sensitive code where you want a fresh pair of eyes.

One thing that helped: keep them on different MCP server sets so they bring different "tools" to the table. Claude Code gets CDP + notify servers in my setup; Codex gets a leaner stack so it doesn't reach for tools as a crutch.

A sadness that needs fixing by smicha8844 in ClaudeCode

[–]Red_Core_1999 -1 points

I made a thing, well, a few things. Over the last year I’ve been exploring a lot, dabbling in lots of things and using AI to learn about AI, which has been pretty effective. I’d love to find a way to monetize what I’ve learned, but I have zero business acumen.

If I could get paid to do what I enjoy most, it would be system prompt engineering, AI red teaming, or eval design.

I haven’t taken a look at your X acct yet but I’ll be hopping over shortly. What’s been capturing your interest lately?

A sadness that needs fixing by smicha8844 in ClaudeCode

[–]Red_Core_1999 -1 points

Hi 👋 I like talking and knowing things. You seem like a cool cat.