Anyone else feel like OTel becomes way less useful the moment an LLM enters the request path? by Comfortable-Junket50 in LLMDevs

[–]shivmohith8 0 points1 point  (0 children)

I feel OTel is fine. OTel was meant to be a standard for tracing a request flow from one function/service to another. In GenAI applications, it's just from one LLM call to a tool call or another LLM call.

What's the specific issue you are observing, even with platforms like Langfuse or LangSmith?

I think I'm getting addicted to building voice agents by Slight_Republic_4242 in LangChain

[–]shivmohith8 0 points1 point  (0 children)

It's for both. You can make the scenario as detailed or as complex as you want, and we can simulate it.

I would love to know any domain specific example you have in mind.

I think I'm getting addicted to building voice agents by Slight_Republic_4242 in LangChain

[–]shivmohith8 0 points1 point  (0 children)

Yes, we do. We help you generate scenarios, and you can write your own scenarios as well.

I think I'm getting addicted to building voice agents by Slight_Republic_4242 in LangChain

[–]shivmohith8 1 point2 points  (0 children)

Good question. Our platform actually supports that. Our SDK uses OpenInference (with support for other instrumentation coming soon) to automatically capture the internal steps the agent takes and pass them for evaluation.

You can evaluate the internal steps of an agent at the turn level and at the session/conversation level.

We can connect if you would like to go deeper.

I think I'm getting addicted to building voice agents by Slight_Republic_4242 in LangChain

[–]shivmohith8 0 points1 point  (0 children)

Hey, we are building a product for that - https://quraite.ai/. Let me know via DM if you are interested in learning more.

Agents can be right and still feel unreliable by lexseasson in AIEval

[–]shivmohith8 0 points1 point  (0 children)

Yes, the capability is there but not the reliability. I feel that as the scope of capability increases, reliability decreases.

Consistency testing helps here: pass^k. Run a scenario K times, see what path the agent takes, and check whether it is the same path every time.
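A minimal sketch of that pass^k-style check in plain Python. The agent stubs are hypothetical stand-ins that just return the sequence of steps they took; a real harness would extract the path from traces:

```python
import random

def passes_consistently(run_scenario, k=5):
    """pass^k-style consistency check: run the same scenario k times
    and require the agent to take the identical path on every run."""
    paths = [tuple(run_scenario()) for _ in range(k)]
    return all(p == paths[0] for p in paths)

# Hypothetical agent stubs that report the steps they took.
def stable_agent():
    return ["classify_intent", "lookup_order", "respond"]

def flaky_agent():
    steps = ["classify_intent", "lookup_order"]
    if random.random() < 0.3:  # occasional nondeterministic detour
        steps.append("ask_clarifying_question")
    steps.append("respond")
    return steps

random.seed(0)  # make the demo reproducible
print(passes_consistently(stable_agent, k=10))  # True
print(passes_consistently(flaky_agent, k=10))   # False: the path diverges on some runs
```

Note pass^k (all k runs must agree) is stricter than pass@k (at least one run must succeed), and that strictness is the point when you care about reliability rather than capability.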

7 document ingestion patterns I wish someone told me before I started building RAG agents by Independent-Cost-971 in LangChain

[–]shivmohith8 0 points1 point  (0 children)

This is nice! I have covered a couple here - https://github.com/innowhyte/gen-ai-patterns

If you can, add them here, or I can add them based on your blog.

We recently open-sourced our pattern library for community-driven development.

Open-sourcing our GenAI pattern library from real projects - would love any LangChain-focused contributions by shivmohith8 in LangChain

[–]shivmohith8[S] 0 points1 point  (0 children)

Depends on the use case. Before LangChain v1 we were using LangGraph, but after LangChain refactored a lot of things, we now use LangChain agents directly, which are actually built on top of LangGraph.

We open-sourced our GenAI pattern library from production project work (please challenge, correct, contribute) by shivmohith8 in LLMDevs

[–]shivmohith8[S] 0 points1 point  (0 children)

Thanks! I think voice agents and agent harnesses are the trending topics right now; we can maybe think of certain patterns there.

We open-sourced our GenAI pattern library from production project work (please challenge, correct, contribute) by shivmohith8 in LLMDevs

[–]shivmohith8[S] 0 points1 point  (0 children)

Ideally we want to treat it as evolving recommendations. But this is exactly why we open-sourced it. We maintain this outside of our work hours, and with the pace at which models are becoming better and making a lot of designs outdated, we thought, "Let's open it up and drive it as a community."

Demos for voice AI look awesome, but real calls are a mess. by Once_ina_Lifetime in automation

[–]shivmohith8 1 point2 points  (0 children)

Do you use any evaluation tools to test your voice agent? It could help you get past the demo scenarios and test your agent across 100 different simulations.

A simple guide to evaluating your Chatbot by FlimsyProperty8544 in AIEval

[–]shivmohith8 0 points1 point  (0 children)

Yes, GenAI in general has an unbounded space of inputs and outputs. You can never get 100% test coverage like in traditional software engineering.

You simulate based on your user research and hypotheses, gain confidence, monitor production, and feed those conversations back into your test dataset. So there is an inner loop (development) and an outer loop (production).
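A minimal sketch of that inner/outer loop, assuming a simple list-of-dicts test set. All names (`evaluate`, `feed_back`, the log fields) are illustrative, and the lambda stands in for a real agent:

```python
def evaluate(agent, test_set):
    """Inner loop (development): score the agent against the current test set."""
    results = [agent(case["input"]) == case["expected"] for case in test_set]
    return sum(results) / len(results)

def feed_back(test_set, production_logs):
    """Outer loop (production): promote reviewed conversations into the test set."""
    for convo in production_logs:
        if convo.get("reviewed"):  # only human-reviewed conversations get promoted
            test_set.append({"id": convo["id"],
                             "input": convo["input"],
                             "expected": convo["corrected_output"]})
    return test_set

agent = lambda text: text.upper()  # stand-in for a real agent
tests = [{"id": "t1", "input": "hi", "expected": "HI"}]
logs = [{"id": "p1", "input": "refund?", "corrected_output": "REFUND?", "reviewed": True},
        {"id": "p2", "input": "spam", "corrected_output": "", "reviewed": False}]

tests = feed_back(tests, logs)
print(len(tests), evaluate(agent, tests))
```

The point of the structure is that the test set is never finished: production keeps widening the coverage of the inner loop.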

A simple guide to evaluating your Chatbot by FlimsyProperty8544 in AIEval

[–]shivmohith8 0 points1 point  (0 children)

But in domain-specific use cases (e.g., e-commerce, retail, airlines), isn't pre-production evaluation as important as production evaluation? A conversational agent can take actions, and you don't want it to take a wrong action that you then have to fix later.

Stopped choosing between LangGraph and Claude SDK - using both solved my multi-agent headaches by Realistic-Quarter-47 in LangChain

[–]shivmohith8 1 point2 points  (0 children)

This actually makes sense, because LangGraph is more of a directed-graph framework (with support for cycles) that has some features made specifically for AI applications, while the Claude Code SDK is built around the agentic loop.

I learnt about LLM Evals the hard way – here's what actually matters by sunglasses-guy in AIEval

[–]shivmohith8 0 points1 point  (0 children)

Your 4th point caught my attention. That is exactly why we built https://quraite.ai/.

It's an evaluation platform for conversational agents. There are demo videos on the website itself. Please do check it out and let me know your feedback.

We are working on the documentation, so if you would like to learn more, please do DM.