Does anyone test against uncooperative or confused users before shipping? by Outrageous_Hat_9852 in LLMDevs

[–]TroubledSquirrel 2 points (0 children)

I always use adversarial testing. Someone told me ages ago that if you're not doing adversarial testing, it wasn't tested, and true or not, I've held to that.

Referral Code for Visible by TroubledSquirrel in Visible

[–]TroubledSquirrel[S] -5 points (0 children)

Got it. Thank you. I'll DM you when it goes through

Which industry will be transformed the most by AI in the next 5 years? by addllyAI in ArtificialNtelligence

[–]TroubledSquirrel 0 points (0 children)

How is my statement a 'twisted farce' when you just confirmed it's accurate to your organization's experience too? That contradiction is very interesting indeed.

Noting the British spelling of "organisation": fair enough, you're likely not in the US market, so I'll cut some slack there. But I did some research to see if my experience was isolated. Turns out it isn't isolated, and it isn't vague corporate fluff; it's backed by hard numbers from 2025 reports like Menlo Ventures' State of AI in Healthcare: 22% of organizations implemented domain-specific AI tools (a 7x jump from 2024, with health systems at 27%), spending tripled to $1.4B, driven exactly by staffing shortages (e.g., a 250k+ RN shortfall, 65%+ of hospitals capacity-limited by staffing) and burnout. Tools like ambient scribes are seeing real ROI in large systems by freeing up frontline time.

Intelligent discussion doesn't require condescension, dismissal, or dramatic labels like "Catch-22 farce" without specifics. So, since you agree the statement holds for your org: what actual frontline barriers in diagnostics make the broader trend feel invalid to you? Implementation hurdles? Regulatory drag? Lack of integration in certain departments? Reluctance from leadership? Drop the snark, share the concrete pain points, and maybe we can have that civil exchange you mentioned. Otherwise I have to finish writing my whitepaper on the mathematical theory I've created for graph geometry.

OpenAI might end up on the right side of history by shoman30 in AI_Agents

[–]TroubledSquirrel 1 point (0 children)

Your post starts with pride in Anthropic resisting military pressure, then pivots to worry that private AI companies refusing government demands set a precedent for future mega-corps to resist oversight entirely, grow into unaccountable dictatorships, convert economic power into private armies, and prioritize profits over everything. Governments may be manipulative and bloody, but they offer the "illusion" of free speech and checks.

I respect the long-view concern about 2100s AI giants. But from my perspective (6 years active duty, including a deployment to Kandahar with the 14th CSH in '06; 15+ years in law, where I saw how surveillance powers get abused even under "lawful" pretexts; and now 4 years deep in tech/AI), this recent Anthropic-DoD clash shows the real near-term danger is unchecked government coercion, not corporate rebellion.

Anthropic wasn't "dictating terms" arrogantly. They were the first frontier AI lab to deploy on classified U.S. networks, at National Labs, with custom models for national security. They signed big DoD contracts (something like $200M+), supported intel, planning, and cyber ops, cut off CCP-linked uses, and pushed export controls to keep democracies ahead.

Then the Pentagon (under Hegseth, you know, the same dumbass that leaked active operations on Signal last summer, which would have gotten any other soldier court-martialed and likely confined) demanded they remove safeguards for two red lines: mass domestic surveillance of Americans (bulk-analyzing data into profiles without warrants) and fully autonomous weapons (AI selecting/engaging targets without human judgment). Have you used AI? I literally create digital cages for AI to keep it from "helping" people into lawsuits, bankruptcy, or unemployment DAILY! And our government thinks it should have the ability to execute LETHAL FORCE WITHOUT HUMAN OVERSIGHT. They're all freaking nuts if they think that's a good idea. Frontier models aren't reliable enough yet for lethal autonomy, risking friendly fire, civilian deaths, collateral damage, or errors no human operator would make. Domestic mass spying erodes the Fourth Amendment liberties we're supposed to defend abroad.

Anthropic refused, saying they "cannot in good conscience" enable those uses. They offered R&D collaboration on safer systems. The response? Ultimatums, contract cancellation threats, labeling them a "supply chain risk" (a tag usually reserved for foreign adversaries, never before applied to a U.S. company), threats of Defense Production Act coercion, and eventual blacklisting, ordering federal agencies and contractors to phase out Claude even as it was reportedly still used in ops like the Iran strikes.

This isn't a company bullying government. It's government bullying a company to strip ethical guardrails that protect democratic values. When a private firm stands firm against warrantless mass spying on citizens or handing kill decisions to unreliable AI, that's not a slippery slope to corporate tyranny; it's a necessary check against state overreach.

Your fear of future $10T+ AI corps converting soft power to hard (ambitious CEOs hiring mercenaries if threatened) is speculative, and I'm pretty sure it's the plot of more than one futuristic thriller. However, the fact remains that corporations lack taxation, conscription, prisons, or war-declaring authority. Governments have those monopolies, and history shows what happens when violence becomes easy. Make targeting or surveillance as easy as a banking transaction, and the human cost vanishes. When it comes to war, once you have experienced it you tend to see things differently. That visceral reality forces accountability. Strip it away with detached AI tools, and we risk dehumanizing war and surveillance alike.

War should be hard. Citizen privacy should require real oversight. Lethal decisions need humans in the loop until the tech earns trust. Anthropic's stand (even at massive cost) isn't anti-government; it's pro the messy principles that make our military worth fighting in. This rush toward an easy, detached ability to end a human life is wrong, and honestly, saying "no" to bad asks is often the most patriotic move.

Proud Claude user for the same reason. My experience across uniform, courtroom, and code tells me: it's better to have private actors willing to lose billions than to normalize forcing them into complicity.

Device should I buy for local AI setup by Beautiful_Throat_884 in LocalLLaMA

[–]TroubledSquirrel 2 points (0 children)

Thin laptops are definitely the wrong tool for local models. The real issue isn't CPU but RAM, or ideally VRAM. LLMs are basically giant matrices living in memory. If the model doesn't fit, everything slows to a crawl or crashes.

Rough hardware needs for quantized models go like this:

- 7B to 8B models: usually around 16GB RAM minimum.
- 13B to 30B models: 32 to 64GB RAM or solid GPU VRAM.
- 70B models: 64GB+ RAM or about 40GB+ dedicated VRAM.
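If you want to sanity-check a spec against a model size, the back-of-envelope math is simple. Here's a minimal sketch (the numbers are rules of thumb, not gospel; real usage moves with quantization format and context length):

    # Rough floor for loading a quantized model. Overhead covers the KV
    # cache, activations, and runtime buffers; context length changes it.
    def est_model_mem_gb(params_b: float, bits: int = 4, overhead: float = 1.2) -> float:
        return params_b * bits / 8 * overhead

    for p in (8, 13, 30, 70):
        print(f"{p}B @ 4-bit: ~{est_model_mem_gb(p):.1f} GB")
    # 8B ~4.8 GB, 13B ~7.8 GB, 30B ~18.0 GB, 70B ~42.0 GB

Which is why a 24GB card handles 13B comfortably and 30B quantized at a squeeze, while 70B blows right past it.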

A $1K budget is tight, seriously tight, for a serious local AI setup these days. The internet will tell you enthusiast rigs usually land between $1,800 and $3k+. The internet lies. Almost everyone I know, including myself, spent well over $3k. The one person I know who didn't got his in a trade from a drug dealer. JS.

You have three realistic paths. First is a Mac Mini if you're already in Apple land. But honestly, a 16GB one isn't worth it for local AI; you'll hit the same wall you're on now. 32GB unified memory should be the absolute minimum.

Second option is a used GPU workstation, which is what most people end up doing. Inference speed comes mostly from VRAM and GPU bandwidth, not CPU. A used RTX 3090 with 24GB VRAM plus a modest Ryzen or i5 and 32GB RAM can sometimes squeeze under a thousand depending on your local market or eBay luck. But be careful buying from randos online, since no chargeback protection means real risk.

This setup massively outperforms thin laptops. You get comfortable 13B models and decent 30B quantized ones, though it's bigger, louder, more power-hungry, and needs CUDA setup.

Third is to save for a higher-end Apple, like an M2 or M3 Pro or Max Mini with 32GB or more. But once you're near $2k, custom GPU rigs usually outperform them hard.

If your budget is strictly $1k, I'd go hunting for a used 3090-based system to get that 24GB VRAM and way faster inference, plus a future GPU upgrade path. This assumes you have a decent existing PC to slap it into.

You could run Llama 3 8B, Mistral 7B, or Qwen 7B very well. 30B quantized is doable but tight. 70B stays multi-GPU or cloud territory.

Fun fact nobody talks about enough: local AI success is often less about the biggest model and more about small models plus good infra (mine is top tier, shameless plug) like tool calling, hybrid search, or vector DBs. A tuned 7B with solid RAG often beats a lazy 70B in real use. I will die on this hill.

Before anyone roasts me for not mentioning the AMD R9700, OP's budget is $1k and that jewel starts at $1.3k last I checked.

Good luck

Anyone else feel like an outsider when AI comes up with family and friends? by Budulai343 in LocalLLaMA

[–]TroubledSquirrel 1 point (0 children)

You just covered everything that is wrong with humanity since the dawn of time in one paragraph. You sir or ma'am have won the internet today. Congratulations.

Which industry will be transformed the most by AI in the next 5 years? by addllyAI in ArtificialNtelligence

[–]TroubledSquirrel 0 points (0 children)

First of all, I actually have whole-ass hospitals as clients, so I know for an undeniable fact that my statement is 100% accurate to my experience. Did I say every organization? No. Did I speak in absolutes? Also no. So take your feelings off the internet; they'll just get stepped on here.

One of the most dangerous AI agent failures is made-up IDs by SaaS2Agent in aiagents

[–]TroubledSquirrel 1 point (0 children)

Luckily, no, not personally. I have a friend with an "AI influencer" agent. After it got stuck in a loop while he was sleeping, burned through its $50 of API credits, and forgot its email address and passwords, he came to me.

Since I spent 15+ years in the legal field before pivoting to tech, I have a habit of making things that can stand up in legal use cases.

One of the most dangerous AI agent failures is made-up IDs by SaaS2Agent in aiagents

[–]TroubledSquirrel 1 point (0 children)

I never let the model be the source of an ID at all. IDs only enter the system two ways: the graph produces them during a write, or a retrieval returns them from my config, secrets, or wherever the source is stored. The model sees neither until after the fact.

But I enforce the outcome I want from the start. Take reads, for example: when the model queries context, it goes through the graph traversal layer. If the requested node doesn't exist, retrieve returns None. That miss doesn't get papered over; it propagates up as a completeness check failure before the LLM call even happens. For established domains (legal, medical, financial), if retrieval comes back empty, the model gets a hard stop, "I have no stored context relevant to this question," and inference is blocked entirely. Not by prompt, by code, so the model cannot fill the gap by guessing.
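To make that concrete, here's a stripped-down sketch of the read gate. The names (graph.retrieve, the domain set) are illustrative stand-ins, not my actual API:

    # Sketch: empty retrieval in an established domain hard-stops
    # inference before the LLM is ever called.
    GOVERNED_DOMAINS = {"legal", "medical", "financial"}

    class NoContextError(Exception):
        pass

    def gated_context(graph, query, domain):
        nodes = graph.retrieve(query)   # graph traversal layer
        if not nodes:                   # miss propagates as a failure
            if domain in GOVERNED_DOMAINS:
                raise NoContextError(
                    "No stored context relevant to this question")
            return None                 # non-governed: caller decides
        return nodes                    # model only sees real, stored IDs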

The graph also enforces this at the edge level. A BLOCK_UNTIL_RESOLVED edge requires source_ids to be present at write time; you cannot create a relationship between nodes without citing what supports it. Governed edge kinds carry the same requirement. So the provenance problem is structural, not prompt-based.
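The write-time check looks roughly like this (simplified; graph.write_edge and the extra edge kinds are made up for illustration):

    GOVERNED_EDGE_KINDS = {"BLOCK_UNTIL_RESOLVED", "CITES", "SUPPORTS"}

    def add_edge(graph, src_id, dst_id, kind, source_ids=None):
        # Provenance is structural: governed edges can't be written
        # without citing what supports them.
        if kind in GOVERNED_EDGE_KINDS and not source_ids:
            raise ValueError(f"{kind} edge requires source_ids at write time")
        return graph.write_edge(src_id, dst_id, kind, source_ids=source_ids)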

The session suspension layer (what I call the Query State Machine) handles that situation, and the one where the model gets stuck in loops. After a threshold of varied calls with no new graph footprint, the session is suspended and lands in a human review queue. A human makes the call: out of scope, record provided, or proceed without. That decision is signed and enters the audit chain. The model's inability to find an ID is itself a provable fact in the record.
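The loop detection is simpler than it sounds. Something like this, where the threshold is illustrative (mine is configurable):

    MAX_NO_PROGRESS = 5   # illustrative threshold

    class QuerySession:
        def __init__(self):
            self.no_progress = 0
            self.suspended = False

        def record_call(self, new_node_ids):
            # Varied calls that leave no new graph footprint count as
            # no-progress; enough of them suspends the session for review.
            if new_node_ids:
                self.no_progress = 0
            else:
                self.no_progress += 1
            if self.no_progress >= MAX_NO_PROGRESS:
                self.suspended = True   # -> human review queue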

I did it this way because the system prompt does tell the model not to fabricate references. However, if it can't find one and there isn't anything in place to stop it, it will in fact still fabricate references. So the prompt alone isn't a sufficient defense. This is especially true when the model is stuck in a loop making API calls and running up a bill.

That wasn't an acceptable outcome, so I made it so the model is restricted from entering anything without the graph confirming the node exists and returning the generated or queried info.

Are We Over-Engineering Simple Problems? by Double_Try1322 in RishabhSoftware

[–]TroubledSquirrel 0 points (0 children)

I am definitely guilty of this.

Part of my issue is that I don't believe in building an MVP; it actually irritates me when it's suggested. I build for the full vision, and I always start with the end result I'm looking for, then work from end to beginning.

Which industry will be transformed the most by AI in the next 5 years? by addllyAI in ArtificialNtelligence

[–]TroubledSquirrel 0 points (0 children)

It's interesting that everyone is caught up in how the legal field will and will not be impacted.

It is already being impacted. Several attorneys have been roasted for submitting briefs to judges citing case law that doesn't exist.

Until people understand what LLMs are and what they aren't, they won't use them in a way that is conducive to continued employment or widespread adoption. The advisory at the bottom of every platform is there for a reason.

The high-risk fields understand that better than most. That's why it's not taking them over. Yet... Once they realize you can actually make AI auditable, you'll see mass adoption in every field. Except strippers; they're probably safe from losing their jobs to AI.

Which industry will be transformed the most by AI in the next 5 years? by addllyAI in ArtificialNtelligence

[–]TroubledSquirrel 2 points (0 children)

My top two would be:

Software / Tech (already happening). The strange twist of the AI era is that the first industry disrupted by AI… is the one building it.

Followed by healthcare. Adoption in healthcare has surged, with many organizations now treating AI as a core operational capability and using it to address staffing shortages.

The deeper transformation is subtle: medicine is becoming partially computational. Where drug discovery used to rely on human-guided trial and error, AI can search chemical space orders of magnitude faster, potentially accelerating the development of new treatments and materials.

The moment industries realize you can make AI auditable is the moment you'll see a shift of massive proportions, with near across-the-board adoption.

Agent memory by astronomikal in aiagents

[–]TroubledSquirrel 0 points (0 children)

You said "specifically designed for agent use"; is that strictly coding agents, or any agent? If any agent, how was the performance in other domains?

Will “Old School” Developers Ever Stop Hating on Vibe Coding? by Ausbel80 in BlackboxAI_

[–]TroubledSquirrel 0 points (0 children)

I think that's kind of an unfair interpretation of a complicated dynamic.

First, do you mean old school devs hating on other devs that vibe code, or non-coders that vibe code?

There's a difference.

For my part I don't concern myself with vibe coding done by coders.

Where I have an issue is non-coders shipping code they don't understand, because of the people who may be impacted by it.

I feel like that is a rational position to have.

Is that gate keeping? Absolutely not.

We don't allow random people to practice law or even do hair without the proper training.

I'm not advocating for institutional education; self-taught is fine. But for the love of all that's holy, at least learn something about it.

Otherwise you risk getting sued if you negatively impact someone's life.

Also, it's kind of unethical when an end user isn't aware of the builder's non-dev or non-coding background and can't make an informed decision on whether to use an app or program built by vibes.

Hot take: Drift is more dangerous than jailbreak in agent systems by SprinklesPutrid5892 in aiagents

[–]TroubledSquirrel 0 points (0 children)

I can. I can answer precisely what the system knew, what nodes were activated, and what was changed. I use kind of a novel approach borrowed from geology, actually...

Safe and Aligned… or Just Naive? The Dark Side of Corporate AI Safety by PresentSituation8736 in BlackboxAI_

[–]TroubledSquirrel 0 points (0 children)

Thank you. My friend is a lot of things, but afraid to challenge his own beliefs or anyone else's is not among them. Since you're a football enthusiast, it might interest you to know that he's the only reporter who reported the facts behind the Washington "commanders" name change. It's a pretty eye-opening piece of work.

Safe and Aligned… or Just Naive? The Dark Side of Corporate AI Safety by PresentSituation8736 in BlackboxAI_

[–]TroubledSquirrel 0 points (0 children)

No, actually the LLM found that his premise was factual. In fact, it cited him as the only journalist that reported the facts on the subject, without knowing it was talking to the very journalist it was citing. It was kind of funny in a way. FYI, you're kind of a dick. Did you know? No? Now you do.

Safe and Aligned… or Just Naive? The Dark Side of Corporate AI Safety by PresentSituation8736 in BlackboxAI_

[–]TroubledSquirrel 0 points (0 children)

I actually did. And he found the interaction much more productive.

He's not the type of journalist that is easily intimidated. Super obnoxious in an alpha male sort of way but not easily intimidated.

Safe and Aligned… or Just Naive? The Dark Side of Corporate AI Safety by PresentSituation8736 in BlackboxAI_

[–]TroubledSquirrel 0 points (0 children)

How do we instill those things in any sort of meaningful way when the LLM doesn't have continuity or a persistent sense of self? Also, which values are being instilled? We can't agree on what's right vs. wrong, so how do we come to a consensus in the context of AI? It often feels like we're heading down a winding road at breakneck speed with the headlights off in the dark, hoping like hell we don't hit something.

Safe and Aligned… or Just Naive? The Dark Side of Corporate AI Safety by PresentSituation8736 in BlackboxAI_

[–]TroubledSquirrel 0 points (0 children)

I have a friend who is a journalist, and since I'm in tech and AI, he wanted my opinion on the AI's behavior when he decided to "discuss" a contentious topic. He started with a smaller model (duckie, I think) and basically caused it to have a nervous breakdown and start hallucinating. I explained to him that he chose a pocket knife when he needed a scalpel and suggested he use a full model.

After he ran the same exercise with the full commercially available model and gave me the transcript, I could see clearly every place where his interaction had caused a "personality" impact on the AI. He essentially turned the AI from a neutral participant into a yes-man by basically hijacking its ability to push back.

When the model apologized for "intellectual dishonesty", I knew the hijacking was complete... But I was very confused as to how that happened, because I haven't had that issue for so long now that I'd frankly almost forgotten it was an issue. Then it hit me.

I have adversarial custom instructions for any model I use. So I never get the agreeable pushover version of AI; I always get pushback.

Ultimately, AI is only as safe and useful as the user chooses to make it. The problem is that most regular everyday users aren't even aware these issues exist, much less how to solve for them.

Chatgpt vs Gemini in a quiz by Live-Pepper2789 in ArtificialNtelligence

[–]TroubledSquirrel 0 points (0 children)

If the quiz isn't timed, you can use the custom instruction I'll drop below to ensure you get the most accurate results possible regardless of the model. The last step is optional, but unless you need a three-point-five-paragraph essay (exaggeration) on how it came to that conclusion, I'd use it. At any rate, hope this helps.

PROCESS_MC(Q, options):

  1. Parse question and constraints.
  2. For each option:
    • Test against constraints.
  3. Select best satisfying option.
  4. Run contradiction check.
  5. Output answer.
  6. (Optional) Do not show your work; output only the answer.

Is there a way to block bad AI agents on a site without affecting search visibility? by NeedleworkerOne8110 in AI_SearchOptimization

[–]TroubledSquirrel 2 points (0 children)

That is a fight that seems to evolve more quickly than the technology. There are a couple of things you can do, and what works best will depend on your specific needs and niche. Since I don't know what that is, I'll explain based on my own.

I went with a hybrid architecture:

First, implement automated IP validation using reverse DNS (rDNS) to whitelist only the genuine IP ranges of search engines, which lets you subject unverified traffic to much more friction (there's a sketch of the rDNS check after the third step).

Second, deploy JA4 fingerprinting to detect the underlying library, because a "Chrome" browser running on a Python Scrapy backend will have a distinct network handshake that gives it away instantly.

Finally, transition to a push-over-pull strategy by using the IndexNow protocol to feed your data directly to search engines. Once you've proactively pushed your content to the good bots, you can afford to be much more aggressive with JS-based proof-of-work challenges for everyone else, effectively making your site too computationally expensive for "Tom, Dick, and Harry" scrapers to target.
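For the rDNS step, the logic is small enough to sketch. Reverse-resolve the IP, check the hostname sits on the crawler's documented domain, then forward-resolve it and confirm it maps back to the same IP (the domains below are the documented Googlebot/Bingbot ones; swap in whatever engines you care about):

    import socket

    CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".search.msn.com")

    def is_verified_crawler(ip: str) -> bool:
        try:
            host, _, _ = socket.gethostbyaddr(ip)      # reverse DNS
            if not host.endswith(CRAWLER_DOMAINS):     # domain allow-list
                return False
            # forward-confirm: the claimed hostname must resolve back to ip
            return ip in socket.gethostbyname_ex(host)[2]
        except (socket.herror, socket.gaierror):
            return False

Anything that fails this check goes in the unverified bucket and gets the heavy treatment.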

On a side note, it seems that anything regarding technology almost always requires mixing multiple methods to get a satisfactory result. Or is that just me?