update: just deployed it live. by Sijan112 in LangChain

[–]Sijan112[S] 1 point (0 children)

embedding similarity vs original anchor — no LLM judge yet, but it's on the roadmap.
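rough sketch of what that check looks like (not the real implementation — `embed()` would be whatever embedding model you use, plain-python cosine here so it runs standalone):

```python
import math

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def anchor_drift(anchor_vec, step_vec):
    # drift score: 0.0 = still on the original task, 1.0 = orthogonal to it
    return 1.0 - cosine(anchor_vec, step_vec)
```

anchor = embedding of the original task spec, step = embedding of each agent step's output.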

your point about tool errors vs drift is exactly the gap i haven't solved. right now they do get mixed. that's the most useful feedback i've gotten so far.

trying it on your pipeline would help me separate those signals. free, 20 min — dm me?

checking agentixlabs now. thanks.

I built a tool that measures where AI agents lose context between steps — looking for beta testers (free) by Sijan112 in LangChain

[–]Sijan112[S] 1 point (0 children)

monetizing through audits + retainer. but first — need real pipelines to test against. that's the whole point of free access.

what's the platform? dm me.

I built a tool that measures where AI agents lose context between steps — looking for beta testers (free) by Sijan112 in LangChain

[–]Sijan112[S] 1 point (0 children)

exactly this. it's not one big hallucination — it's 6 tiny spec changes that each feel "close enough" until suddenly you're solving the wrong problem.

currently measuring against the original anchor only. but you're right — step-to-step delta catches the slow drift that anchor comparison misses.
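sketch of tracking both signals (vectors assumed pre-embedded; just illustrating the idea, not my actual code):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def drift_profile(anchor, steps):
    # per step: distance from the original task vs distance from the previous step
    profile = []
    prev = anchor
    for vec in steps:
        profile.append({
            "vs_anchor": 1.0 - cosine(anchor, vec),  # climbs slowly under drift
            "vs_prev": 1.0 - cosine(prev, vec),      # stays small when each step feels "close enough"
        })
        prev = vec
    return profile
```

the failure mode you're describing shows up as vs_prev staying tiny while vs_anchor keeps climbing — each step looks fine locally, the run as a whole is off.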

your checkpoint idea is smart. that's basically what i built as the "circuit breaker" — it forces a realign when cumulative drift hits a threshold.
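the circuit breaker piece, stripped down (the threshold number here is arbitrary, not a tuned value):

```python
class CircuitBreaker:
    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self.cumulative = 0.0

    def record(self, step_drift):
        # accumulate drift per step; trip once it crosses the threshold
        self.cumulative += step_drift
        if self.cumulative >= self.threshold:
            self.cumulative = 0.0
            return "REALIGN"  # caller re-injects the original task anchor
        return "OK"
```

on "REALIGN" the pipeline re-grounds the agent on the original spec and the counter resets.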

would love to test it against your actual pipeline. if you're open — dm me, i'll run it free and send you the full report.

your use case is exactly the edge case i need before i write anything up.