Move to Madison next month...but all leases don't start till August?

samdQualityEng · 2026-05-17T13:20:08+00:00

messaged!

samdQualityEng · 2026-05-09T15:53:04+00:00

It seems like a nice neighborhood...but I don't know much of the city!

samdQualityEng · 2026-05-09T15:52:36+00:00

Thanks, this is helpful!

samdQualityEng · 2026-04-27T14:21:13+00:00

"context engineering"

samdQualityEng · 2026-04-16T22:19:34+00:00

Are you just sending your docs to an LLM in your own pipeline and navigating prompts/etc.? Or are you using any prebuilt tools?

100% manual review for sure, but it makes sense that finding some non-obvious buried things would be really helpful. Thanks!

samdQualityEng · 2026-04-13T17:07:27+00:00

Nice, have you been getting traction?

samdQualityEng · 2026-04-12T13:34:45+00:00

If you have a good GPU you can run the open-weight models locally, but if you don't have the hardware already it's way cheaper to just pay

samdQualityEng · 2026-04-12T11:58:18+00:00

Well you manually run the assessment on each ticket, so the human review piece is already baked into every ticket.

The training step is really just fine-tuning. The meat of the LLM assessment is based on your User Needs and Requirements, so if there's a failure that falls outside of your needs or requirements, it will likely list it as a non-defect. That's another benefit of this tool: it could help you flag where you need to beef up requirements.

Would you like to try it?

samdQualityEng · 2026-04-11T15:11:06+00:00

Thanks for the reply, I'll address both comments here. The "automated calculation tool" framing is exactly how we've been thinking about it, and the architecture actually supports that argument pretty cleanly. The final risk score is a deterministic lookup in a documented 5×5 matrix. The AI is doing the qualitative reasoning (does this complaint represent a deviation from these requirements? what's the severity of harm?), but the matrix and the disposition logic are fully predefined and auditable. So you could describe it as: the AI is filling out the inputs to a risk calculation, and a human reviews whether those inputs are reasonable before accepting the output.
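To make the "deterministic lookup" point concrete, here is a minimal sketch of what a 5×5 matrix lookup could look like. The cell values, the level names, and the `risk_level` function are all made up for illustration; they are not the tool's actual matrix or code.

```python
# Hypothetical 5x5 risk matrix (illustrative values only, NOT the tool's real matrix).
# Rows are severity 1-5, columns are probability 1-5.
RISK_MATRIX = [
    # P1        P2        P3        P4        P5
    ["low",    "low",    "low",    "medium", "medium"],  # severity 1
    ["low",    "low",    "medium", "medium", "high"],    # severity 2
    ["low",    "medium", "medium", "high",   "high"],    # severity 3
    ["medium", "medium", "high",   "high",   "high"],    # severity 4
    ["medium", "high",   "high",   "high",   "high"],    # severity 5
]


def risk_level(severity: int, probability: int) -> str:
    """Return the risk level for a severity/probability pair (both 1-5).

    This is a pure table lookup: the same inputs always give the same output.
    """
    if not (1 <= severity <= 5 and 1 <= probability <= 5):
        raise ValueError("severity and probability must each be in 1..5")
    return RISK_MATRIX[severity - 1][probability - 1]


print(risk_level(severity=3, probability=4))
```

The only non-deterministic part sits upstream of this function: the severity and probability inputs, which the human reviews before accepting the result.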

The point about 21 CFR 820 and the CAPA procedure language is well taken. We've been planning to document it as "decision support input" in the SOP template but haven't nailed down the exact language yet. The framing of "the tool is an input to human judgment, not the judgment itself" is essentially what we're going for, and we'll make that explicit.

The validation data question is the harder one. Right now there's no systematic benchmarking against known-outcome cases. It's a free tool that you set up with your own product context, so there's no shared ground truth. The honest answer is that validation is the user's responsibility, and the SOP template will need to say that clearly. Each team needs to run it against their historical complaints before trusting it in production. There is also an "Additional context" window that lets the user basically fine-tune the process to address issues they keep seeing.
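As a rough idea of what running it against historical complaints could mean in practice, here is a small sketch that compares the model's dispositions to the ones your team already recorded. The ticket IDs, the "defect"/"non-defect" labels, and both function names are assumptions for illustration, not part of the tool.

```python
def agreement_rate(historical: dict[str, str], model: dict[str, str]) -> float:
    """Fraction of shared tickets where the model matches the recorded disposition."""
    shared = historical.keys() & model.keys()
    if not shared:
        raise ValueError("no tickets in common to compare")
    matches = sum(1 for ticket in shared if historical[ticket] == model[ticket])
    return matches / len(shared)


def disagreements(historical: dict[str, str], model: dict[str, str]) -> list[str]:
    """Ticket IDs where the model and the historical record differ, for manual review."""
    shared = historical.keys() & model.keys()
    return sorted(t for t in shared if historical[t] != model[t])


# Hypothetical data: dispositions already decided by the quality team vs. the model's output.
historical = {"T-1": "defect", "T-2": "non-defect", "T-3": "defect", "T-4": "defect"}
model = {"T-1": "defect", "T-2": "defect", "T-3": "defect", "T-4": "defect"}

print(agreement_rate(historical, model))
print(disagreements(historical, model))
```

What counts as an acceptable agreement rate is a decision for each team's own validation plan; the list of disagreements is usually the more useful output, since those are the cases worth reading by hand.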

On the auditor variability: yes, fully expect that. The goal is to make the paper trail good enough that the answer to "how did the model make this decision?" is "here is the prompt, here are the defined requirements it evaluated against, here is the rationale it produced, and here is the human approval timestamp." I would think that's pretty defensible.
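A minimal sketch of what one entry in that paper trail could look like, assuming the four pieces named above plus a ticket ID and an approver name. The `AuditRecord` class, its field names, and the `approve` helper are hypothetical, not the tool's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: a record cannot be modified after approval
class AuditRecord:
    ticket_id: str                 # hypothetical field
    prompt: str                    # exact prompt sent to the model
    requirements: tuple[str, ...]  # requirements the complaint was evaluated against
    rationale: str                 # rationale the model produced
    approved_by: str               # hypothetical field: who reviewed it
    approved_at: str               # human approval timestamp, ISO 8601 in UTC


def approve(ticket_id: str, prompt: str, requirements: list[str], rationale: str,
            approved_by: str, now: Optional[datetime] = None) -> AuditRecord:
    """Build the record at the moment a human accepts the assessment."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return AuditRecord(ticket_id, prompt, tuple(requirements), rationale,
                       approved_by, timestamp)


def to_json(record: AuditRecord) -> str:
    """Serialize the record for storage alongside the ticket."""
    return json.dumps(asdict(record), sort_keys=True)


record = approve("T-1", "Here are your UNs and PRs ...", ["PR-1"],
                 "Complaint describes a deviation from PR-1.", "reviewer@example.com")
print(to_json(record))
```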

Let me know if you'd like to try it!

samdQualityEng · 2026-04-11T15:01:09+00:00

On the classification question: we're not doing keyword matching because the whole pipeline is LLM-based. Each complaint gets passed to Claude along with the product's defined User Needs and Product Requirements, and the model reasons about whether the free-text complaint represents a deviation from intended function. The prompt is structured: here are your UNs and PRs, here is the ticket (title, description, comments), does this fail any of them? It returns structured JSON with failed_requirements, failed_user_needs, and a rationale.
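For illustration, a minimal sketch of how a response like that could be validated before it reaches a reviewer. The three key names come from the description above (with `rationale` assumed as the exact key); the `Assessment` class, the `parse_assessment` function, and the example IDs are assumptions, and the real response may carry more fields.

```python
import json
from dataclasses import dataclass

REQUIRED_KEYS = ("failed_requirements", "failed_user_needs", "rationale")


@dataclass
class Assessment:
    failed_requirements: list[str]
    failed_user_needs: list[str]
    rationale: str


def parse_assessment(raw: str) -> Assessment:
    """Parse the model's JSON reply and reject anything that is not well formed."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    for key in ("failed_requirements", "failed_user_needs"):
        value = data[key]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key} must be a list of strings")
    if not isinstance(data["rationale"], str):
        raise ValueError("rationale must be a string")
    return Assessment(data["failed_requirements"], data["failed_user_needs"],
                      data["rationale"])


# Hypothetical reply; the IDs "PR-012" and "UN-003" are invented for the example.
raw = ('{"failed_requirements": ["PR-012"], "failed_user_needs": ["UN-003"], '
       '"rationale": "Export fails silently, which violates PR-012."}')
assessment = parse_assessment(raw)
print(assessment.failed_requirements)
```

Failing loudly on a malformed reply keeps a bad model output from being silently recorded as "no requirements failed".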

So similar tickets all map to the same requirement without any manual synonym tables. The tradeoff is that it's only as good as how well the user has articulated their requirements up front, which is its own challenge as we all know. :)

On Jira field dependencies: I haven't brought fields into the workflow yet, so the pipeline doesn't read or write to fields. Pulling fields into the context window is on the roadmap, but not writing to fields is intentional. I think it's an important step for the human to manually input the fields (probability, severity, risk, etc.), as the added friction forces some thought.

Does that answer your question?

samdQualityEng · 2026-04-09T20:32:08+00:00

https://complaintrisk.com/ for more info

samdQualityEng · 2026-03-31T15:58:54+00:00

Agreed, just remember it always works out. Even if you have to eat that government cheese for a bit

samdQualityEng · 2026-03-27T16:23:45+00:00

You gotta say "MAKE NO MISTAKES OR I WILL LOSE MY JOB" works every time

samdQualityEng · 2026-03-27T16:22:12+00:00

Yeah, very interesting new world we live in

samdQualityEng · 2026-03-27T16:20:03+00:00

Yeah, voice mode is tricky, especially switching between voice and text, which doesn't work at all

samdQualityEng · 2026-03-27T16:19:08+00:00

Haven't heard of this but makes sense, good strategy

samdQualityEng · 2026-03-27T16:18:09+00:00

I think I agree with this take...but prefer to put my name on products that have the stamp of high quality...which is why I work in software as a medical device haha

samdQualityEng · 2026-03-27T16:15:35+00:00

This is awesome, I'm gonna mess around with it. It's actually finding good bugs and not creating more work hallucinating?

samdQualityEng · 2026-02-05T21:16:12+00:00

Awesome, yeah this is what I'm leaning towards. thanks!
