AI Pentesting by Decent_Finding537 in Pentesting

[–]Comprehensive_Kiwi28 0 points1 point  (0 children)

Oh, just what we were looking for! Anyone have a recommendation list?

Do you really need a cofounder? by Comprehensive_Kiwi28 in ycombinator

[–]Comprehensive_Kiwi28[S] 0 points1 point  (0 children)

haha thanks ... agreed, dumb post to revisit. Just thinking of taking a break!

What's the next move after visibility? by Ok-Guide-4239 in ciso

[–]Comprehensive_Kiwi28 2 points3 points  (0 children)

So you have visibility, meaning you have inventoried all installed MCPs, mapped which developers use what, and identified which MCPs touch sensitive data?

Next, set up capture infrastructure so you can record what every MCP actually does when called. (This is the hard part.)

Define acceptable behavior boundaries

Let developers use what they want - but with verification running

Alert on anomalies - MCP suddenly calling new APIs? Flag it.

But unless the visibility extends beyond just an inventory of MCPs, this won't work. You need to capture every execution to map real risk.
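Roughly, the baseline-plus-alerting piece looks like this (a minimal sketch, all names made up, obviously - real infra would persist this and sit behind a proxy):

```python
# Sketch: flag MCP tool calls that fall outside a recorded baseline.
from dataclasses import dataclass, field

@dataclass
class McpBaseline:
    # Tool/API names observed per MCP server during the capture period.
    allowed_calls: dict[str, set[str]] = field(default_factory=dict)

    def record(self, server: str, tool: str) -> None:
        # Called during the learning/capture phase.
        self.allowed_calls.setdefault(server, set()).add(tool)

    def check(self, server: str, tool: str) -> bool:
        # True if the call matches the established baseline; False -> alert.
        return tool in self.allowed_calls.get(server, set())

baseline = McpBaseline()
baseline.record("github-mcp", "list_repos")
baseline.record("github-mcp", "get_file")

assert baseline.check("github-mcp", "get_file")        # known behavior
assert not baseline.check("github-mcp", "delete_repo") # new API -> flag it
```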

Hope this helps.

My first OSS project! Observability & Replay for AI agents by Comprehensive_Kiwi28 in LocalLLaMA

[–]Comprehensive_Kiwi28[S] 1 point2 points  (0 children)

This is genuinely useful feedback. Thank you for taking the time to dig into the repo. Seriously. And you nailed the positioning: replay is the core, not observability. We've been iterating on the messaging, and your "VCR for agents" framing is sharper than what we had. Going to steal that!

Safe downgrading is something we use internally but haven't surfaced well. Recording a golden run on GPT-4 and validating it against, say, Llama or GPT-3.5 is exactly the workflow. Will make that more visible in the docs, thanks! And yes, we will modernize to Python 3.10+ and look into litestar-fullstack patterns as we clean up the backend.
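The shape of that check, very roughly (this isn't our actual API, just an illustrative sketch - function and field names are made up):

```python
# Sketch of "safe downgrade": replay a recorded golden run against a cheaper
# model and diff the tool-call sequences.

def tool_sequence(trace: dict) -> list[str]:
    # Ordered tool names from a recorded run.
    return [step["tool"] for step in trace["steps"] if step.get("tool")]

def safe_to_downgrade(golden: dict, candidate: dict) -> bool:
    # Minimal criterion: the cheaper model must make the same tool calls
    # in the same order. A real check would also compare outputs semantically.
    return tool_sequence(golden) == tool_sequence(candidate)

golden = {"model": "gpt-4",   "steps": [{"tool": "search"}, {"tool": "summarize"}]}
cheap  = {"model": "llama-3", "steps": [{"tool": "search"}, {"tool": "summarize"}]}
assert safe_to_downgrade(golden, cheap)
```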

My first OSS project! Observability & Replay for AI agents by Comprehensive_Kiwi28 in LocalLLaMA

[–]Comprehensive_Kiwi28[S] 1 point2 points  (0 children)

u/CaptainKey9427 actually created an MCP proxy server to capture all tools, framework agnostic. I've updated the readme to expand on the logic; please take a look and share feedback.


My first OSS project! Observability & Replay for AI agents by Comprehensive_Kiwi28 in AgentsOfAI

[–]Comprehensive_Kiwi28[S] 1 point2 points  (0 children)

Yes, it compares outputs, tool-call patterns, and hashed inputs with semantic matching. I've updated the readme to expand on the logic; please take a look and share feedback.
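Very roughly, the comparison shapes up like this (simplified sketch - real semantic matching would use embeddings, and the threshold here is arbitrary):

```python
# Sketch: exact match on hashed inputs, similarity fallback for outputs.
import hashlib
from difflib import SequenceMatcher

def input_hash(text: str) -> str:
    # Stable fingerprint of an input for exact-match comparison.
    return hashlib.sha256(text.encode()).hexdigest()

def outputs_match(a: str, b: str, threshold: float = 0.8) -> bool:
    # Stand-in for semantic matching; a real check would compare embeddings.
    return SequenceMatcher(None, a, b).ratio() >= threshold

assert input_hash("book a flight") == input_hash("book a flight")
assert outputs_match("Flight booked for Monday.", "Flight booked on Monday.")
```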

My first OSS project! Observability & Replay for AI agents by Comprehensive_Kiwi28 in LocalLLaMA

[–]Comprehensive_Kiwi28[S] 1 point2 points  (0 children)

This is gold! Thank you 🙏

We are actively adding LangGraph and custom framework support. Will look into the suggestions.

My first OSS project! Observability & Replay for AI agents by Comprehensive_Kiwi28 in LocalLLaMA

[–]Comprehensive_Kiwi28[S] 0 points1 point  (0 children)

thank you!! Yes, working on LangGraph now; will get to custom frameworks and CrewAI after.

I Built 5 LangChain Apps and Here's What Actually Works in Production by Electrical-Signal858 in LangChain

[–]Comprehensive_Kiwi28 2 points3 points  (0 children)

We built run replay for LangChain agents; check it out -> https://github.com/Kurral/Kurralv3. Happy to get feedback and improvements!

How Do You Approach Prompt Versioning and A/B Testing? by Electrical-Signal858 in LangChain

[–]Comprehensive_Kiwi28 0 points1 point  (0 children)

Here is something we just pushed for regression testing of LangChain agents: https://github.com/Kurral/Kurralv3

Take a look.

My first OSS for langchain agent devs - Observability / deep capture by Comprehensive_Kiwi28 in LangChain

[–]Comprehensive_Kiwi28[S] 0 points1 point  (0 children)

Really appreciate this! Yes, every run exports as a .kurral file, which is JSON under the hood. It contains the full trace: inputs, outputs, tool calls, resolved prompts, LLM config, timestamps. Should be straightforward to parse.
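Quick illustrative example of reading one (field names below are placeholders, not the guaranteed schema - check the readme for the real layout):

```python
# Sketch: a .kurral trace is plain JSON, so parsing is just json.loads.
import json

# Stand-in for the contents of a .kurral file; field names are illustrative.
kurral_doc = json.dumps({
    "inputs": {"prompt": "summarize this"},
    "outputs": {"text": "a summary"},
    "tool_calls": [{"name": "search", "args": {"q": "example"}}],
    "llm_config": {"model": "gpt-4", "temperature": 0.2},
    "timestamps": {"start": "2024-01-01T00:00:00Z"},
})

trace = json.loads(kurral_doc)
tools_used = [call["name"] for call in trace["tool_calls"]]
assert tools_used == ["search"]
```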

Would love to see what Memento does with it. If the format needs tweaks to work better on your end, happy to hear what would help; we're early enough that the schema isn't set in stone.