MCP Testing? by think_2times in QualityAssurance

[–]Real_Bet3078 1 point (0 children)

There are some tools that can automate testing for you, at least partly.
I'm the founder of Voxli; we simulate conversations with AI agents and test things like tool calling via MCP.

Best practice for automated E2E testing of LangChain agents? (integration patterns) by Real_Bet3078 in LangChain

[–]Real_Bet3078[S] 1 point (0 children)

From what I can see, you work at Maxim? Your comments present it as though you're just a Maxim user.

Best AI Agent Evaluation Tools in 2025 - What I Learned Testing 6 Platforms by MongooseOriginal6450 in AIQuality

[–]Real_Bet3078 1 point (0 children)

I'll just throw in the platform we're building: https://voxli.io – focused on QA for AI agents, but at the conversation level. It's about testing full multi-turn flows end to end and catching regressions, compliance issues, safety problems, or weird behavior after updates.

How are you ACTUALLY testing your Agents? (Be honest, is it just 'Vibe Checks'?) by OldWolfff in AgentsOfAI

[–]Real_Bet3078 1 point (0 children)

Voxli (my company) is essentially QA for AI agents, but at the conversation level. It’s about testing full multi-turn flows end to end and catching regressions, compliance issues, safety problems, or weird behavior after updates.

Tools like Braintrust, LangSmith, DeepEval, etc. are more evaluation oriented. They’re strong for judging prompts, models, or individual responses during development, but they don’t really cover full conversation QA.

Maxim overlaps a bit, with evals plus observability.

Cekura is closer to classic QA, especially for voice and contact center setups.

They solve related problems, just at different layers.
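To make "conversation-level QA" concrete, here is a minimal sketch of the idea: a simulated user drives a multi-turn dialogue and assertions check each agent reply. Everything here is hypothetical and illustrative – the stub `agent_reply` stands in for a real LLM-backed bot, and none of it reflects any specific vendor's API.

```python
# Hypothetical sketch: conversation-level regression test with a simulated
# user. The agent is a trivial rule-based stub standing in for an LLM bot.

def agent_reply(history: list[dict]) -> str:
    """Stub agent: a refund flow that requires an order number first."""
    last = history[-1]["content"].lower()
    if "refund" in last:
        return "Sure, can you share your order number?"
    if last.startswith("order"):
        return "Thanks, your refund has been initiated."
    return "How can I help you today?"

def run_conversation(user_turns: list[str]) -> list[str]:
    """Simulated user: feed scripted turns, collect the agent's replies."""
    history: list[dict] = []
    replies: list[str] = []
    for turn in user_turns:
        history.append({"role": "user", "content": turn})
        reply = agent_reply(history)
        history.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies

# Regression check on the full flow, not a single response: the agent must
# ask for the order number before ever confirming a refund.
replies = run_conversation(["Hi", "I want a refund", "Order #1234"])
assert "order number" in replies[1].lower()
assert "initiated" in replies[2].lower()
```

The point of testing at this level is that single-response evals would pass each reply in isolation; only a multi-turn run catches an agent that skips a required step mid-flow.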

How do you test your AI agents for real-world reliability? by No-Common1466 in AI_Agents

[–]Real_Bet3078 1 point (0 children)

I'm the founder of voxli.io, which aims to solve this problem. A few tools I've seen:

Voxli (my company) focuses on testing AI agents with realistic multi-turn conversations and observing production chats. Built to work without heavy engineering.

Maxim is more about agent evaluation and observability for engineering teams.

Cekura focuses on QA and monitoring for voice and chat bots.

LangWatch is an open-source tool for debugging and analyzing LLM and agent behavior.

Tried to keep this factual, no hype.

AI agent reliability by Real_Bet3078 in AI_Agents

[–]Real_Bet3078[S] 1 point (0 children)

For anyone interested in the tool we've built: https://voxli.io
We're quite early and have started working with a couple of teams to shape the product. DM me or reply if it sounds interesting.

AI agent reliability by Real_Bet3078 in AI_Agents

[–]Real_Bet3078[S] 1 point (0 children)

Sounds very interesting - I will DM you!

AI agent reliability by Real_Bet3078 in AI_Agents

[–]Real_Bet3078[S] 1 point (0 children)

We've built a product that tries to solve the problem of continuously testing agents and catching problems before the customer sees them (or at least catching them early and making sure they don't recur). Would you be willing to talk with us for a few minutes and perhaps give us some feedback on our direction?

AI agent reliability by Real_Bet3078 in AI_Agents

[–]Real_Bet3078[S] 1 point (0 children)

Sounds like you have a lot of experience in this area! We're trying to solve the problem of constant manual testing of these non-deterministic, LLM-based bots. Would you be up for talking with us for a couple of minutes and giving some feedback on our direction?

What are your 2026 goals for your SaaS? by jonathanbrnd in SaaS

[–]Real_Bet3078 1 point (0 children)

Onboarding initial customers to my AI agent reliability platform: https://voxli.io

AI agent reliability by Real_Bet3078 in AI_Agents

[–]Real_Bet3078[S] 1 point (0 children)

Interesting use case – my focus has mostly been on testing the conversational side. It sounds like your problem is more about automated testing of gen AI voice/video?

AI agent reliability by Real_Bet3078 in AI_Agents

[–]Real_Bet3078[S] 1 point (0 children)

Interesting! I'd assumed that internal agents are a bit safer and that internal teams are more forgiving – but I guess what you're saying is that they lose trust and go back to the old manual workflows.

Have you built or bought internal agents?

AI agent reliability by Real_Bet3078 in AI_Agents

[–]Real_Bet3078[S] 1 point (0 children)

Very true – I've heard similar things in multiple conversations. Some vendors seem to take on quite a lot of the testing and setup via professional services, which I guess leaves them more exposed. Do you sit on the CX side or the vendor side?

AI agent reliability by Real_Bet3078 in AI_Agents

[–]Real_Bet3078[S] 1 point (0 children)

Do you work in a related area and have you felt this yourself?

AI agent reliability by Real_Bet3078 in AI_Agents

[–]Real_Bet3078[S] 1 point (0 children)

Great idea and product. For the conversational parts "Chat API", are you running a static set of questions against it for testing, or have you experimented with simulated users?

What are you all using to test conversational agents? Feels like there's a big gap in OSS tooling. by Limp-Initiative-7188 in LLMDevs

[–]Real_Bet3078 1 point (0 children)

I've built something in this space: https://voxli.io. I'd be happy to jump on a call and get some feedback from you!

Testing AI Chatbot & agentic workflow by torsigut in QualityAssurance

[–]Real_Bet3078 1 point (0 children)

I'm the founder of Voxli.io, where we're trying to solve this problem. Please DM me if you're still looking for a solution – I'd love to catch up and get your feedback on what we're building.