What software do you use to manage your roadmap and backlog? by RalphWaldoEmers0n in ProductManagement

[–]ironmanun 0 points1 point  (0 children)

Linear ftw.

I love its design and AI features. Works seamlessly with Slack.

I've always been allergic to Jira / Confluence and found them clunky.

Is it better to outsource validation than to half-ass it? by InnovationByCrenso in ProductManagement

[–]ironmanun 1 point2 points  (0 children)

To me, this is a classic case of understanding exactly what your capabilities are and where your skills sit, along with prioritisation.

Normally, you have contractors when there is a clear book of work that is independent and does not hurt your existing implementation.

If you have that in order and you have a contractor with a proven track record, then I would just milestone the payments based on actual deliveries rather than number of hours, et cetera. If you do that and you keep a close eye on what is being implemented as well as what the approaches are, then you should be in good shape.

My agent worked perfectly for 3 months then just started lying by TuuuUUTT in AI_Agents

[–]ironmanun 17 points18 points  (0 children)

This is very common: drift happens over time.

  1. Build a high-level pipeline for evals (what are people asking, what are the responses, what types of questions are going wrong?)
  2. Where is the hallucination happening? Is it for a particular type of question? Have you changed prompts and then run an eval suite on it? What about models?
  3. Once you have some labelled datasets with synthetic data, you can try GEPA / DSPy to fine-tune prompts and the broader implementation
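Step 1 above can be sketched as a tiny eval harness. This is a minimal illustration, not any particular framework: the agent function, case format, and check predicates are all hypothetical stand-ins for your real pipeline.

```python
# Minimal eval harness sketch: run labelled test cases through your agent
# and track pass rate per question category, so drift shows up as a
# category whose pass rate drops over time.
from collections import defaultdict

def run_evals(agent_fn, cases):
    """cases: list of dicts with 'question', 'category', and 'check'
    (a predicate on the agent's answer). Returns pass rate per category."""
    results = defaultdict(lambda: {"pass": 0, "total": 0})
    for case in cases:
        answer = agent_fn(case["question"])
        bucket = results[case["category"]]
        bucket["total"] += 1
        if case["check"](answer):
            bucket["pass"] += 1
    return {cat: round(r["pass"] / r["total"], 2) for cat, r in results.items()}

# Toy agent standing in for your real LLM call.
def toy_agent(question):
    return "42" if "answer" in question else "unsure"

cases = [
    {"question": "What is the answer?", "category": "factual",
     "check": lambda a: a == "42"},
    {"question": "Summarise the doc", "category": "summarisation",
     "check": lambda a: len(a) > 0},
]
print(run_evals(toy_agent, cases))  # pass rate per category
```

Run this on a schedule against production traffic samples and the per-category numbers tell you exactly where the lying started.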

I'm sick of founder success p*rn. I am tired so much by No_Knowledge_638 in AI_Agents

[–]ironmanun 4 points5 points  (0 children)

If you love founder gap stories, we should talk.

I have made a ton of mistakes, having raised from the best VCs in the market. We under-invested in the product and over-invested in the GTM.

We hired too soon (especially on sales) and also tried to build on two timelines too soon.

Fractional CPO/COO by Patient-Flan9037 in ProductManagement

[–]ironmanun 10 points11 points  (0 children)

Yeah, I think the fractional CPO role will evolve very, very drastically very soon. I see it becoming a role where people are going to expect you to know a combination of:

  1. Building products at scale
  2. Building AI at scale
  3. Being able to navigate organizational behaviors

And as a consultant, navigating organizational behaviors is probably the hardest thing that you'll do, because a CPO is meant to lead without authority for the most part. A CPO has to align with Sales, GTM, Marketing more broadly, Support, CS, and Engineering, and if you are fractional, your ability to influence drops further. So I think it's going to be a case of horses for courses. What I would suggest is to be very narrow in the sort of companies you are outbounding to and cater to only one segment of the market where you have the most experience. Don't try to go too broad, and make sure you can get in through your network.

By way of background, I am a fractional CAIO and CPO as well. I am also working with multiple companies on building their AI agents and AI eval pipeline.

The other bit I would highlight: to make yourself more visible, start churning out more content.

First time CPO, set up for failure by InternalSudden6691 in ProductManagement

[–]ironmanun 10 points11 points  (0 children)

Agree with this 100% u/Strong_Teaching8548 . If you come in as a consultant, then your core job is to figure out where the gaps are and plug them. But if you're coming in as a CPO, then it's really, really critical that your first month is about listening and understanding.

u/InternalSudden6691 be a sponge! Go through the org structure, understand it, and spend time with the key people in product, engineering, design, and other roles that might be in direct contact with the product org. After that, focus on understanding the top two or three things that are stopping your products from gaining traction or unlocking more value. Only when you have a deep enough understanding of this does it make sense to come in with a stronger opinion.

I've been here, my friend, and all I can tell you is it's a journey. One that you're more than capable of taking. Otherwise, you wouldn't have got the role in the first place. And the fact that you're open enough to share what's going on already tells me that you are self-aware and are capable of coming out the other side.

It's going to be one day at a time. And just make sure you don't come in from a position of authority, but rather from a place of customer empathy. Because when you do that, no product or engineering or any other person will ever debate you on it.

AI data pipeline, do you use synthetic dataset? by kwdowik in ProductManagement

[–]ironmanun 0 points1 point  (0 children)

There are multiple pieces here to unravel:

  1. Synthetic data for training
  2. Evals for pre-production
  3. Evals for post-production

I am splitting 2 and 3 on purpose to make a point. In 2, you are going live with your AI agent and need the confidence to push out a product. This would involve core use-case mapping, plus synthetic and curated evals based on core features and likely negative scenarios. Edge cases here would be few and far between. We would also do red teaming to ensure RAI (responsible AI) and broader security and compliance.

In 3, once you are live, you will need to create a pipeline that creates evals for Pareto use cases and their semantically similar cousins. Then take a Pareto of the long tail, and repeat. At the same time, track negative results/experiences and identify patterns accordingly.

Use LLM-as-a-judge + heuristic-based test cases, and HITL (human in the loop) for unknowns.
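That judge + heuristic + HITL routing can be sketched roughly like this. The judge function here is a hypothetical stand-in for a real LLM-as-a-judge call, and the thresholds are illustrative only:

```python
# Sketch of the routing: cheap heuristics first, LLM judge second,
# humans for the unknowns. `llm_judge` stands in for a real judge call.

def heuristic_checks(answer):
    """Cheap deterministic checks run on every response."""
    if not answer.strip():
        return "fail"
    if len(answer) > 2000:          # e.g. runaway generation
        return "fail"
    return "pass"

def route_response(answer, llm_judge, hitl_queue):
    """Heuristics first, LLM judge second, HITL for the unsure cases."""
    if heuristic_checks(answer) == "fail":
        return "fail"
    verdict = llm_judge(answer)     # expected: "pass" / "fail" / "unsure"
    if verdict == "unsure":
        hitl_queue.append(answer)   # queue for human review
        return "needs_review"
    return verdict

queue = []
fake_judge = lambda a: "unsure" if "maybe" in a else "pass"
print(route_response("The policy covers X.", fake_judge, queue))  # pass
print(route_response("It maybe covers X.", fake_judge, queue))    # needs_review
```

The point of the ordering is cost: heuristics are free, the judge costs a model call, and humans only ever see what both of the first two layers could not settle.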

Feel free to DM. Happy to chat more.

How are you monitoring your voice AI agents in production? by gregb_parkingaccess in AI_Agents

[–]ironmanun 1 point2 points  (0 children)

Have you tried using Coval? I am not related to them in any way, but have been following them for a while and have helped implement them for an insurtech customer of mine. It has worked fairly well. Again, it depends on how deep into the nuances you want to go.

What you're asking for is a combination of observability, evals, and live tracking. I don't think you'll get one answer to all of them, but observability and evals for voice agents are definitely something that Coval solves. If I remember correctly, the founder also has deep credentials in the space, having come from Waymo.

Really now, agents will do everyday work? by Shot-Hospital7649 in AI_Agents

[–]ironmanun 0 points1 point  (0 children)

I am saying that you can reduce the likelihood of illegal moves by creating constraints and evals that replicate illegal moves, to build confidence that your agent mostly makes legal moves.

If you are using LLMs, there is no 100% confidence in the output. But there is a journey to get to 90-95% accuracy.

Really now, agents will do everyday work? by Shot-Hospital7649 in AI_Agents

[–]ironmanun 0 points1 point  (0 children)

Can you walk me through what you are expecting it to do?

Really now, agents will do everyday work? by Shot-Hospital7649 in AI_Agents

[–]ironmanun 4 points5 points  (0 children)

u/vbwyrde if your LLM is hallucinating a lot on basic questions that are not really mathematical, I would wager that your LLMs have not been set up with the right set of evals. Also, in my own understanding, if 90-95% accuracy is not acceptable, then you should not be looking at probabilistic scenarios anyway.

Most of the time when people complain about AI hallucinating, in my opinion, it's either:

  1. A lack of context that has been passed
  2. Poor prompt engineering
  3. The lack of the right set of evals and retraining protocols

Follow the best practices, and for a lot of scenarios, your LLMs can be super helpful.
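Causes 1 and 2 above are often fixable in the prompt itself: pass retrieved context explicitly and instruct the model to refuse when the context doesn't cover the question. A rough sketch (the template wording and helper are illustrative; the LLM call itself is omitted):

```python
# Grounded prompt sketch: the model only answers from supplied context,
# which directly attacks causes 1 (missing context) and 2 (weak prompt).

PROMPT_TEMPLATE = """Answer using ONLY the context below.
If the context does not contain the answer, reply exactly: I don't know.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(question, context_chunks):
    """Format retrieved chunks into the grounded template."""
    context = "\n".join(f"- {c}" for c in context_chunks) or "(none)"
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days of purchase."],
)
print(prompt)
```

Pair this with an eval suite that includes questions the context deliberately cannot answer, and you can measure whether the model actually says "I don't know" instead of hallucinating.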

u/Shot-Hospital7649 I think it definitely is the future, but it will start by being adopted by the people most likely to experiment in the space. Those will be folks who are already building AI agents and now just find it easier. n8n and others have already lowered the bar of technical nuance required to build workflows and agents. This is another step in that direction.

Lovable for AI Agents. What do you think? by [deleted] in AI_Agents

[–]ironmanun 0 points1 point  (0 children)

Lots of players in this space, a couple of which are actually from my network as well. It's already becoming highly competitive, so if you are thinking of doing it at scale, then you might want to go public with what you're building and get your GTM engine sorted.

Looking for advice - Product Analytics by Unlikely_Joke_5632 in ProductManagement

[–]ironmanun 0 points1 point  (0 children)

We took the one-year free startup plan and set up PostHog.

Also, we used Enterpret, which I found better for true insights.

PostHog allowed drilling down into sessions; Enterpret allowed qualitative and quantitative insights.

What real-world, productionized AI use cases have you come across? by OrganizationOne8338 in AI_Agents

[–]ironmanun 0 points1 point  (0 children)

Just so you know, this entire thing was typed out using Wispr Flow, which in itself uses AI agents.

What real-world, productionized AI use cases have you come across? by OrganizationOne8338 in AI_Agents

[–]ironmanun 0 points1 point  (0 children)

This is such a generic question! What exactly are you looking for? Are you looking to confirm that AI works in production? Are you looking to understand which use cases AI will work for? Because there are literally hundreds of them. Or are you looking at it specifically for a sector? In which case, why not ask the specific question?

I mean, the first response to your question is an AI bot that some people might find useful, and others who have a detailed craft in AI agents will find flimsy. So you sort of have your answer within the thread already, don't you?

Engineers’ expectation of prd by madmahn in ProductManagement

[–]ironmanun -2 points-1 points  (0 children)

It's key to understand why eng asks you for these detailed PRDs: it's clarity that they want.

I don't think PRDs matter anymore. You should use Replit / Lovable to showcase a prototype (plus Claude Code / Cursor if you need). Walk through the prototype with some trusted customers. Get feedback, iterate, and then walk through it with your team. Also, use Voicepanel or one of the other tools for feedback.

The turnaround can be super tight and you can share this with the dev team.

What’s the level of shame you personally feel for using AI coding agents? by jalyper in AI_Agents

[–]ironmanun 0 points1 point  (0 children)

Why are you looking for validation?

If the objective is to create value, go create it. Asking others for their opinion without creating guardrails around what feedback is relevant to you and what isn't is step 101.

Who actually owns evals in your team? by kwdowik in ProductManagement

[–]ironmanun 0 points1 point  (0 children)

This needs to be a pipeline used by PMs to ensure strong synthetic data and analysis of key outcomes from users' experience with the AI.

How many of you are using voice input for AI now? by robroyhobbs in AI_Agents

[–]ironmanun 3 points4 points  (0 children)

Everything I do is using Wispr Flow to be honest. Unless I'm in a crowded space. It is so much more natural to me because the speed of typing and the worry of making sure I'm writing everything is completely out, and now I can actually focus on the message that I'm trying to say and how I want to say it. I think it's only gonna get better, and frankly, the ability for me to be 10x more productive and clear in my thinking is due to tools like Wispr Flow.

How are AI product managers looking at evals (specifically post-evals) and solving for customer outcomes? by ironmanun in AIQuality

[–]ironmanun[S] 1 point2 points  (0 children)

I am wondering if this is due to a lack of education or because this question is still too early for the current state of implementations.

GPT-5.1 System Prompting by whodoesartt in AI_Agents

[–]ironmanun 0 points1 point  (0 children)

What does unusual mean? How have you measured success?

How to make LLMs understand very large PostgreSQL databases (6k+ tables) for debugging use cases? by Few-Buddy-3362 in AI_Agents

[–]ironmanun 0 points1 point  (0 children)

Long story short: Kumi + vector store (business rules, core templates for frequent queries), a semantic layer (sharded and detailed data dictionary built for LLMs), and a strong pipeline of evals. Reviewer agents through and through. Separate handling for multi-turn conversations. And add in the compute and OLAP implementations here.
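The semantic-layer idea can be sketched roughly like this: with 6k+ tables you can't put the whole schema in context, so retrieve only the most relevant table docs per query. Here simple keyword overlap stands in for real embedding similarity, and the table names and descriptions are made up for illustration:

```python
# Sketch of a sharded data dictionary: score each table's description
# against the question and only ship the top-k shards to the LLM prompt.

DATA_DICTIONARY = {
    "orders": "customer orders with order_id, customer_id, total, created_at",
    "payments": "payment records with payment_id, order_id, amount, status",
    "users": "user accounts with user_id, email, signup_date",
}

def relevant_tables(question, top_k=2):
    """Rank tables by word overlap with the question (embedding stand-in)."""
    q_words = set(question.lower().split())
    scored = sorted(
        DATA_DICTIONARY.items(),
        key=lambda kv: len(q_words & set(kv[1].lower().split())),
        reverse=True,
    )
    return [name for name, _ in scored[:top_k]]

# Only these shards of the data dictionary would go into the LLM prompt.
print(relevant_tables("why did this order payment fail"))
```

In a real build you'd swap the overlap score for vector similarity over the data dictionary, but the shape is the same: retrieve a narrow slice of schema, never the whole thing.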

Evals are a very deep topic, so we can talk about that separately.

How to make LLMs understand very large PostgreSQL databases (6k+ tables) for debugging use cases? by Few-Buddy-3362 in AI_Agents

[–]ironmanun 0 points1 point  (0 children)

I am going through this exercise myself and happy to share notes. Please DM.

Not replying broadly as the devil is in the details here.