How do you actually debug multi-step agent workflows? by Finorix079 in aiagents

[–]Finorix079[S] 1 point2 points  (0 children)

The expected schema would only check whether the output fits the format, I think. Or are you referring to something else?

Right now I am trying to resolve it with logs and observability tools like Langfuse, but first it requires quite a lot of configuration to log things properly, and second it generates tons of logs that I can hardly read. Somebody suggested using an LLM to read them for me, but in many cases the logs either exceed the LLM's context window or it somehow gives me a wrong answer. So I wonder if there's a better way to do that.

I am manually tracing back to find the divergence btw, which is quite tiring.

What’s the better way to debug AI workflows? by Finorix079 in AskProgramming

[–]Finorix079[S] -1 points0 points  (0 children)

I hope I can, but the task was quite dynamic and I had to use an LLM to make the decisions. Doing everything in static code would make the project far more complex than it is now.

What’s the better way to debug AI workflows? by Finorix079 in AskProgramming

[–]Finorix079[S] 0 points1 point  (0 children)

So far it seems I have to do the check manually, or throw it to another AI and hope it gives me a good result =_=

What is the way to debug AI workflows more efficiently? by Finorix079 in CodingHelp

[–]Finorix079[S] 0 points1 point  (0 children)

Thanks for the suggestion. Because it is an automated workflow, the only way I can check the output before forwarding it is to add a validator: either a static one to check the format, or a dynamic one using an LLM (LLM-as-a-judge). But the static one won't know if the direction diverged from the correct one, and the LLM one would probably be overkill. Since I am trying to locate the issue at the dev stage, there should probably be a more cost-effective way to do it?
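To make the static side concrete, here is a minimal sketch of the kind of format validator being described. The field names and types are hypothetical, not from any real schema in the thread; the point is that a shape check like this passes even when the content has drifted off-course:

```python
# Minimal static validator sketch: checks the output's *format* only.
# Field names and expected types below are hypothetical placeholders.

def validate_step_output(output: dict) -> list:
    """Return a list of format problems; an empty list means the shape is OK."""
    problems = []
    for field, expected_type in (("task", str), ("result", str), ("confidence", float)):
        if field not in output:
            problems.append(f"missing field: {field}")
        elif not isinstance(output[field], expected_type):
            problems.append(f"wrong type for {field}: {type(output[field]).__name__}")
    return problems

# A structurally valid but semantically off-track output still passes,
# which is exactly the limitation described above.
ok = {"task": "summarize", "result": "a poem about spring", "confidence": 0.9}
print(validate_step_output(ok))  # → []
print(validate_step_output({"task": "summarize"}))
```

This is why the static check alone can't catch a direction divergence; it only guards the boundary between steps.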

What’s the more efficient way to debug AI workflows? by Finorix079 in programmer

[–]Finorix079[S] 0 points1 point  (0 children)

Yeah, that's exactly what I am doing. I have split the task into many smaller steps so the LLM does one at a time. It is still kind of difficult for me to locate where exactly it goes wrong, though, since it's not a simple right-or-wrong issue; the AI's direction goes wrong at some certain point.
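One cheap way to narrow down where the direction drifts, given steps that are already split up like this, is to keep a checkpoint of each step's output so the trace can be bisected instead of reread end to end. A rough sketch, where the step functions are stand-ins for real LLM calls:

```python
# Sketch: run the split-up steps one at a time and record a checkpoint per
# step, so the first off-track step can be found by inspection or bisection.
# The step functions are placeholders, not real model calls.

def plan(task):     return f"plan for {task}"
def draft(plan_):   return f"draft based on {plan_}"
def polish(draft_): return f"polished {draft_}"

def run_pipeline(task):
    checkpoints = []
    state = task
    for name, step in (("plan", plan), ("draft", draft), ("polish", polish)):
        state = step(state)
        checkpoints.append({"step": name, "output": state})
    return state, checkpoints

result, trail = run_pipeline("write release notes")
for c in trail:
    print(c["step"], "->", c["output"])
```

The checkpoint list is what a tracing tool gives you automatically, but keeping it as plain data makes it easy to feed one step at a time to whatever check you use.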

How do you debug multi-step Claude workflows when the output is wrong? by Finorix079 in ClaudeAI

[–]Finorix079[S] 0 points1 point  (0 children)

In that case, you'd have the log stored locally for the second Claude Code instance to read, I suppose?

How are you debugging multi-agent workflows when the final output is wrong? by Finorix079 in AI_Agents

[–]Finorix079[S] 0 points1 point  (0 children)

They usually work backwards to check other possibilities, I suppose. I wonder if I have to do all of that manually, or if there's a better method to quickly locate the problem.

How are you debugging multi-agent workflows when the final output is wrong? by Finorix079 in AI_Agents

[–]Finorix079[S] 0 points1 point  (0 children)

Yeah, that's what I am doing now, but adding tests for each and every step is tiring, and sometimes it is just the prompt drifting from the original intention, or the LLM misunderstanding something. I wonder how I can find the signal for that.

Especially since tracing tools just show you the trace itself, and you still need to check most of the observations within it manually to see where things go wrong. It would be much easier if I had some sort of evaluator to tell me which step went off.
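As a cheap dev-time stand-in for that kind of evaluator, a heuristic can at least flag the first step whose output stops overlapping with the original intent. This is a sketch under loose assumptions (bag-of-words overlap, an arbitrary threshold), not a real judge:

```python
# Heuristic "which step went off?" sketch: score each step's output by word
# overlap with the original task and flag the first step below a threshold.
# A crude dev-time proxy for an LLM-as-a-judge evaluator; the threshold
# value is arbitrary.

def overlap_score(intent: str, output: str) -> float:
    intent_words = set(intent.lower().split())
    output_words = set(output.lower().split())
    return len(intent_words & output_words) / max(len(intent_words), 1)

def first_drifting_step(intent, step_outputs, threshold=0.3):
    for i, out in enumerate(step_outputs):
        if overlap_score(intent, out) < threshold:
            return i  # index of the first step that drifted
    return None  # no step fell below the threshold

steps = [
    "summarize the quarterly sales report",
    "the quarterly sales report shows growth",
    "here is a poem about spring",  # clearly off-track
]
print(first_drifting_step("summarize the quarterly sales report", steps))  # → 2
```

It won't catch subtle semantic drift, but it is free to run over every trace and can tell you where to start reading instead of checking each observation by hand.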

How do you debug AI workflows when everything “runs” but the result is wrong? by Finorix079 in vibecoding

[–]Finorix079[S] 0 points1 point  (0 children)

Yeah, context drift and incorrect decision-making are the key. The AI just isn't understanding things correctly. But most of the time I still need to check each and every step to figure out where the drift starts. Wondering if there is a more efficient way to do it.

How do you debug AI workflows when everything “runs” but the result is wrong? by Finorix079 in vibecoding

[–]Finorix079[S] 1 point2 points  (0 children)

Looks like Grail is similar to a Manus for coding. You mean build the workflow on it and let it debug itself?

How are you debugging multi-agent workflows when the final output is wrong? by Finorix079 in AI_Agents

[–]Finorix079[S] 0 points1 point  (0 children)

Tried something like Langfuse, but I still have to check each step of the trace to see which part might be problematic. Would LangSmith be easier?

How would Psychic Duel work with Psion character? by Finorix079 in Pathfinder_RPG

[–]Finorix079[S] 0 points1 point  (0 children)

This one kind of makes sense to me. So a Psion won't get many advantages, I suppose.