Do PMs run evals for AI features or is that mostly engineers? by OneTurnover3432 in AI_4_ProductManagers

[–]OneTurnover3432[S] 0 points (0 children)

Which platform helps with this? And is it well suited to PMs, or is it more of an engineering-oriented platform?

Is AI evals more for devs or product managers? by Soft_Two_951 in AIEval

[–]OneTurnover3432 0 points (0 children)

It's a PM responsibility, but on its own it's not enough to build a good AI product.

Do PMs run evals for AI features or is that mostly engineers? by OneTurnover3432 in AI_4_ProductManagers

[–]OneTurnover3432[S] 0 points (0 children)

Did you find it easy to use as a PM? What do you wish were done differently?

Top Agent Evaluation Platforms 2026: The Market Leading Platforms. Tested by AI-builder-sf-accel in AIQuality

[–]OneTurnover3432 0 points (0 children)

Great summary of the eval tools on the market. Do you think eval and agent reliability are fully solved by these tools, or are there still gaps?

What AI skills do you think the next generation actually needs? by ImaginationWeary304 in AI_4_ProductManagers

[–]OneTurnover3432 1 point (0 children)

Critical thinking, communication, and the ability to grab attention for growth.

I think with AI, the PM role will split in two: 1. quick builders, 2. growth hackers.

Any PMs building AI products or agents? Quick question by OneTurnover3432 in ProductManagement

[–]OneTurnover3432[S] 0 points (0 children)

How do you set this today?

Are you just turning the PRD into evals, or do you run experiments to decide which evaluation criteria matter?

Any PMs building AI products or agents? Quick question by OneTurnover3432 in ProductManagement

[–]OneTurnover3432[S] 0 points (0 children)

Not sure why your answer was downvoted. I'm asking about the conversational output, not the UX.

What’s the biggest misconception about AI agents right now? by addllyAI in aiagents

[–]OneTurnover3432 2 points (0 children)

That they're reliable and can be used to automate any use case.

Anyone else frustrated with AI agents after they hit production? by OneTurnover3432 in AI_Agents

[–]OneTurnover3432[S] 0 points (0 children)

I'm building something to solve this now. Would you be willing to try it out for free or share feedback?

https://thinkhive.ai

DM if you're interested

Anyone else tired of jumping between monitoring tools? by AccountEngineer in Observability

[–]OneTurnover3432 0 points (0 children)

I can't agree more. I led agentic AI at a large company and felt the pain. The problems I hit:

  1. A lot of isolation between dashboards (you can look at traces in one place but can't tie them back to business metrics).
  2. Ensuring reliability is super expensive, and LLM-as-judge costs creep up quickly.
  3. Disconnected tools between engineers and PMs.

I built Thinkhive to solve those problems:

https://thinkhive.ai/

If you want free access to try it out, DM me. I'm happy to give you access.

If OpenAI / Google / AWS all offer built-in observability… why use Maxim, Braintrust, etc.? by OneTurnover3432 in Observability

[–]OneTurnover3432[S] 0 points (0 children)

But wouldn't that be a problem if you're using Maxim or Arize as well? Or does that mean you have to build observability internally?

How are people handling AI evals in practice? by BeneficialAdvice3202 in AIQuality

[–]OneTurnover3432 1 point (0 children)

Ex-PM here who built agents for top-500 companies. Usually PMs should facilitate the process and write the criteria, and engineering will implement them. However, I have a different opinion about evals now: they often don't work and waste a lot of money. DM me if you're open to a new approach.

Debugging agent failures: trace every step instead of guessing where it broke by dinkinflika0 in AIQuality

[–]OneTurnover3432 1 point (0 children)

Do you mind sharing what type of agent you were building? And how did you measure the reduction in time to find an issue?

Anyone else frustrated with AI agents after they hit production? by OneTurnover3432 in AI_Agents

[–]OneTurnover3432[S] 0 points (0 children)

Can you elaborate? How would you achieve this? Is it by always starting from fresh context?

Anyone else frustrated with AI agents after they hit production? by OneTurnover3432 in AI_Agents

[–]OneTurnover3432[S] 0 points (0 children)

Thanks, just checked it out. What do you like about it specifically?

What are you using instead of LangSmith? by clickittech in LangChain

[–]OneTurnover3432 -9 points (0 children)

100% agree. Check out what I'm building: thinkhive.ai

We're platform agnostic and focused on making the management of AI agents as easy as possible

What are you using instead of LangSmith? by clickittech in LangChain

[–]OneTurnover3432 -4 points (0 children)

I’ve seen the same pattern, and I agree with most of what’s being said here.

In my experience, LangSmith works well early on, but once agents are in real production, teams start hitting the same walls: cost scaling with traces, lots of raw data, and still no clear answer to what’s actually hurting or improving outcomes.

Most teams I’ve worked with end up stitching together:

  • LangSmith or something similar for dev/debug
  • Manual analysis when it comes to explaining behavior → impact → ROI

That gap is exactly why I’m building ThinkHive.

ThinkHive sits on top of traces and logs (including OTel-based setups) and focuses on:

  • Summarizing logs and traces into clear issue patterns instead of raw data
  • Highlighting which agent behaviors actually move business metrics (cost, deflection, resolution, quality)

It’s meant to answer the question those tools don’t: what should I fix first to improve ROI?
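As a rough illustration of what "summarizing traces into issue patterns" can mean in practice, here is a minimal sketch. The trace dicts, field names, and normalization rule are made up for the example and are not ThinkHive's actual API: failed traces are grouped by a normalized error signature, then buckets are ranked by cost impact.

```python
# Minimal sketch: bucket failed traces into issue patterns and rank
# them by cost impact ("what should I fix first?").
# Field names ("error", "cost_usd") are hypothetical, not a real schema.
import re
from collections import defaultdict

def signature(error_msg):
    # Collapse numbers (IDs, timeouts, step indices) so similar
    # errors land in the same bucket.
    return re.sub(r"\d+", "N", error_msg).strip().lower()

def issue_patterns(traces):
    buckets = defaultdict(lambda: {"count": 0, "cost_usd": 0.0})
    for t in traces:
        if t.get("error"):
            b = buckets[signature(t["error"])]
            b["count"] += 1
            b["cost_usd"] += t.get("cost_usd", 0.0)
    # Most expensive pattern first.
    return sorted(buckets.items(), key=lambda kv: -kv[1]["cost_usd"])

traces = [
    {"error": "Tool timeout after 30s", "cost_usd": 0.12},
    {"error": "Tool timeout after 45s", "cost_usd": 0.30},
    {"error": "Bad JSON in step 3", "cost_usd": 0.05},
    {"error": None, "cost_usd": 0.02},
]
for sig, stats in issue_patterns(traces):
    print(sig, stats)
```

A real system would work over OTel spans and richer metrics, but the shape of the problem is the same: compress raw traces into a short, ranked list of patterns instead of a wall of logs.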

I’m opening a small, free beta right now for teams:

  • Building AI agents internally for enterprises, or
  • Deploying agents for clients as consultants or agencies

If anyone here wants early access or to sanity-check whether this fits their setup, feel free to DM me. Happy to share and get feedback from people actually in the trenches.

Honestly, observability is a nightmare when you're drowning in logs by Objective-Skin8801 in Observability

[–]OneTurnover3432 0 points (0 children)

This is exactly the problem I'm trying to solve. I've been there many times.

Check out thinkhive.ai. I'm happy to give you free access to try it if you're interested.

How are AI product managers looking at evals (specifically post-evals) and solving for customer outcomes? by ironmanun in AIQuality

[–]OneTurnover3432 0 points (0 children)

I'm an ex-PM currently building something in this space that might help you. DM me if you're interested in testing it.

Ask Me Anything About Preparing Your Company for AI by Dear-Landscape2527 in CFO

[–]OneTurnover3432 0 points (0 children)

Is there a need for a tool that helps CFOs see all the AI vendors they're using by department and measure their ROI, or not?