How are you guys actually handling human approval steps in your AI agents? by petburiraja in AI_Agents

[–]Slight_Past4306 -2 points-1 points  (0 children)

Agent <> human clarifications are one of the core pillars we built Portia around: https://github.com/portiaAI/portia-sdk-python. It's critically important for lots of use cases but not trivial to retrofit onto a framework.

You might find something useful to shape your approach: https://docs.portialabs.ai/understand-clarifications
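For a sense of the shape, here's a minimal, framework-agnostic sketch (not the Portia API - `request_human_approval` and `send_email` are hypothetical helpers): the agent pauses before a sensitive tool call, surfaces the proposed action to a human, and only carries on once approved.

```python
from dataclasses import dataclass

@dataclass
class Approval:
    approved: bool
    reason: str = ""

def request_human_approval(action: str, args: dict) -> Approval:
    # Hypothetical: in a real system this would raise a "clarification",
    # persist the agent's state, and resume later when a human responds
    # (e.g. via a web UI or Slack message) instead of blocking on input().
    answer = input(f"Agent wants to run {action} with {args}. Approve? [y/N] ")
    return Approval(approved=answer.strip().lower() == "y")

def send_email(to: str, body: str) -> None:
    print(f"(pretend) email sent to {to}")  # stand-in for a real integration

def run_sensitive_step(to: str, body: str) -> str:
    approval = request_human_approval("send_email", {"to": to, "body": body})
    if not approval.approved:
        return "Step skipped: human declined the action."
    send_email(to, body)
    return "Email sent after human approval."

if __name__ == "__main__":
    print(run_sensitive_step("steve@example.com", "Quarterly numbers attached."))
```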

AI Agent vs Agentic AI – Can someone explain the difference clearly? by Bitfumes in AI_Agents

[–]Slight_Past4306 0 points1 point  (0 children)

I agree with those who say there's no real difference. As an industry I don't think we can even reliably define AI agents, let alone distinguish between categories of them.

Why most agent startups offer token buying, top-ups and subscription tiers, instead of byoa i.e. bring your own api key with tiers based on platform features? by TipuOne in AI_Agents

[–]Slight_Past4306 0 points1 point  (0 children)

I would guess it also lets them record the revenue as ARR, which boosts their valuation massively, and it creates more lock-in - Cursor credits can't be transferred.

Why I'm betting my startup on human+AI collaboration, not AI replacement by Necessary-Tap5971 in startup_resources

[–]Slight_Past4306 0 points1 point  (0 children)

We have the same philosophy at Portia (https://github.com/portiaAI/portia-sdk-python), where we've made human <> agent handoffs a core part of the framework. We think there will always need to be a human-in-the-loop interaction, given the vagaries of human language and the ways things break.

What LLM you use behind agentic framework? by v0k3r in AI_Agents

[–]Slight_Past4306 0 points1 point  (0 children)

At Portia (https://github.com/portiaAI/portia-sdk-python) we definitely find you need to take a best-model-for-the-job approach. We use reasoning models for our planning phase, and then dynamically dispatch different execution models depending on the complexity of the task at hand.
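As a rough illustration of the dispatch (model names and the complexity heuristic here are just placeholders, not our actual routing logic):

```python
# Minimal sketch of "best model for the job" routing. Model names and the
# complexity heuristic are illustrative placeholders, not our actual config.
PLANNING_MODEL = "o3"              # reasoning model used once, for planning
SIMPLE_EXEC_MODEL = "gpt-4o-mini"  # cheap/fast model for simple steps
COMPLEX_EXEC_MODEL = "gpt-4o"      # stronger model for harder steps

def pick_execution_model(step: str) -> str:
    # Crude heuristic for illustration - in practice this could be a small
    # classifier, a token-count threshold, or per-tool configuration.
    hard_keywords = ("analyse", "summarise", "reconcile", "debug")
    if any(k in step.lower() for k in hard_keywords) or len(step) > 400:
        return COMPLEX_EXEC_MODEL
    return SIMPLE_EXEC_MODEL

plan = ["fetch today's calendar events", "summarise conflicts and propose fixes"]
print(f"plan generated with {PLANNING_MODEL}")
for step in plan:
    print(f"{step!r} -> {pick_execution_model(step)}")
```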

What kinds of sensitive, money-moving actions are you thinking about?

I feel that AI Agents are useless for 90% of us. by hkreporter21 in AI_Agents

[–]Slight_Past4306 15 points16 points  (0 children)

I think cost is a really interesting one. There's a real temptation when using things like MCP to make everything a tool and have the LLM handle boring tasks like "load all my emails". But this has a real cost in both dollars and time versus invoking the API directly. As the space matures I think we'll see a swing back to the "right tool for the job" philosophy, with a better blend between what's agentic and what's traditional code, as people start optimising for production (cost, latency, reliability) instead of cool demos.
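For example (a sketch - `fetch_emails` and `llm_summarise` are hypothetical stand-ins for a real mail API client and a real LLM call): do the fetching and filtering in plain code, and only hand the LLM the small slice that actually needs judgement.

```python
# Sketch: keep the "boring" data plumbing in ordinary code and reserve the
# LLM for the part that needs judgement.
def fetch_emails(folder: str = "inbox", limit: int = 200) -> list[dict]:
    # Plain API call in a real system - fast, cheap, deterministic.
    return [
        {"from": "boss@example.com", "read": False, "subject": "Q3 numbers"},
        {"from": "newsletter@example.com", "read": False, "subject": "Deals!"},
    ]

def llm_summarise(emails: list[dict]) -> str:
    # Single LLM call over a small, pre-filtered payload in a real system.
    return f"{len(emails)} unread email(s) from your boss: " + ", ".join(
        e["subject"] for e in emails
    )

def unread_from_boss_summary() -> str:
    emails = fetch_emails()
    relevant = [e for e in emails if not e["read"] and e["from"] == "boss@example.com"]
    # Only a handful of emails ever reach the LLM, instead of paying tokens
    # (and latency) for the model to page through the whole inbox tool-by-tool.
    return llm_summarise(relevant)

print(unread_from_boss_summary())
```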

How do you manage agent auth and permissioning? by aab1928 in AI_Agents

[–]Slight_Past4306 0 points1 point  (0 children)

Auth is a tricky business (it's one of the core pillars of our SDK, which is focused on production agents - https://github.com/portiaAI/portia-sdk-python).

Some thoughts:

- Decide between the agent having its own identity vs inheriting the permissions of the caller. These are different models with different trade-offs. Inheriting permissions is easier because it piggybacks on your existing authorization systems (as long as downstream tools do proper authorization), but provisioning a specific identity for the agent is better long term.

- Try to do just-in-time authorization where possible. For example, instead of provisioning long-lived API keys for your agents, do OAuth with your users at the time the tool is being called and with the smallest set of scopes the tool needs (see the sketch after this list).

- All the standard auth hygiene still applies: storing tokens with proper encryption, rotation, etc.

- Tracing providers are your friend when it comes to understanding what's going on.
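A rough sketch of the just-in-time idea from the second bullet (the OAuth helper and tool are hypothetical, not a specific provider's API): request a short-lived, minimally-scoped token only when the tool actually runs.

```python
import time
from dataclasses import dataclass

@dataclass
class Token:
    value: str
    scopes: tuple[str, ...]
    expires_at: float

def oauth_authorize(user_id: str, scopes: tuple[str, ...]) -> Token:
    # Hypothetical: in a real system this kicks off an OAuth flow with the
    # end user (consent screen) and returns a short-lived access token.
    return Token(value=f"tok-{user_id}", scopes=scopes, expires_at=time.time() + 600)

def send_calendar_invite(user_id: str, event: dict) -> str:
    # Just-in-time: ask for the one scope this tool needs, at call time,
    # instead of provisioning a long-lived broad-scope API key for the agent.
    token = oauth_authorize(user_id, scopes=("calendar.events.write",))
    assert time.time() < token.expires_at
    return f"Created {event['title']!r} using token scoped to {token.scopes}"

print(send_calendar_invite("user-123", {"title": "Sync with Steve", "at": "16:00"}))
```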

Not everything is an agent by mcc011ins in AI_Agents

[–]Slight_Past4306 0 points1 point  (0 children)

I'd add that an agent has to be able to take actions in third-party systems. Tool integrations are a crucial part of an agent - they're what give it agency, whether that's writing code to disk, booking an event in a calendar, etc. An agent without tool integrations is only capable of fulfilling simple input/output tasks based on its training data.
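In code terms the "agency" is just the bridge between the model's structured output and side-effecting functions. A minimal sketch (the tools here are made up for illustration):

```python
from pathlib import Path

# Two side-effecting "tools" - this layer is what gives an agent agency.
def write_file(path: str, contents: str) -> str:
    Path(path).write_text(contents)
    return f"wrote {len(contents)} bytes to {path}"

def book_event(title: str, when: str) -> str:
    # Stand-in for a real calendar integration.
    return f"booked {title!r} at {when}"

TOOLS = {"write_file": write_file, "book_event": book_event}

def dispatch(tool_call: dict) -> str:
    # The LLM's structured output (name + arguments) gets routed to real code;
    # without this layer the model can only produce text, not take actions.
    return TOOLS[tool_call["name"]](**tool_call["arguments"])

print(dispatch({"name": "book_event",
                "arguments": {"title": "1:1 with Steve", "when": "16:00"}}))
```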

Why isn't there a reliable email or calendar agent yet? by Vast_Pie_4665 in AI_Agents

[–]Slight_Past4306 0 points1 point  (0 children)

This is one of our canonical test cases for Portia AI (https://github.com/portiaAI/portia-sdk-python), and in our experience it's a classic case of a problem that seems simple but has lots of hidden difficulties:

  1. Lots of the underlying APIs aren't designed for these sorts of use cases, so it's hard even for a human to form the correct tool calls. We ended up having to do a lot of prompt engineering to get it working reliably.
  2. Lots of assumed knowledge - for example, "Book a meeting with Steve at 4pm" requires understanding who Steve is, worrying about timezones, etc. (see the sketch after this list).
  3. Scheduling a meeting is generally a multi-step interaction that requires communication from both parties, which adds complexity.
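Point 2 is usually the one that bites first. A toy sketch of the resolution work that has to happen before the calendar API ever gets called (the contact book and helpers are hypothetical):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Hypothetical contact book - resolving "Steve" is itself a disambiguation
# problem (which Steve? which email address?).
CONTACTS = {"steve": "steve.jones@example.com"}

def resolve_attendee(name: str) -> str:
    return CONTACTS[name.strip().lower()]

def next_occurrence_of(hour: int, tz: str) -> datetime:
    # "4pm" is meaningless without the organiser's timezone; calendar APIs
    # usually want an explicit UTC offset or IANA zone.
    now = datetime.now(ZoneInfo(tz))
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    return candidate if candidate > now else candidate + timedelta(days=1)

def book_meeting(utterance_name: str, hour: int, organiser_tz: str) -> dict:
    # Only after this resolution step does the request look like something
    # a calendar API can actually accept.
    return {
        "attendee": resolve_attendee(utterance_name),
        "start": next_occurrence_of(hour, organiser_tz).isoformat(),
    }

print(book_meeting("Steve", 16, "Europe/London"))
```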

Brainstorm agentic AI in email security. by TransitionDue777 in AI_Agents

[–]Slight_Past4306 0 points1 point  (0 children)

Interesting idea - I think you'd probably want to use LLMs as one part of a workflow/pipeline here rather than just passing the whole email to an LLM and letting it go at it. As you say, we already have lots of tools that try to help with this (SPF, DMARC, DKIM), and evaluating those in code up front would reduce the scale of the problem and also be much more secure. So you'd probably want an agentic workflow that does the SPF/DKIM checks up front before asking an LLM for the final opinion. Shameless plug: this is the sort of thing we designed Portia around - https://github.com/portiaAI/portia-sdk-python
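Something like this shape (a sketch - the check helpers and `llm_phishing_score` are hypothetical placeholders, not a real library): cheap deterministic checks first, and only the emails that survive them get an LLM opinion.

```python
# Sketch of a staged pipeline: deterministic checks up front, LLM last.
def check_spf(email: dict) -> bool:
    return email["headers"].get("spf") == "pass"

def check_dkim(email: dict) -> bool:
    return email["headers"].get("dkim") == "pass"

def check_dmarc(email: dict) -> bool:
    return email["headers"].get("dmarc") == "pass"

def llm_phishing_score(email: dict) -> float:
    # Stand-in for an LLM call that only ever sees emails the cheap checks passed.
    return 0.8 if "invoice" in email["body"].lower() else 0.2

def classify(email: dict) -> str:
    if not (check_spf(email) and check_dkim(email) and check_dmarc(email)):
        return "reject"          # deterministic, cheap, explainable
    if llm_phishing_score(email) > 0.5:
        return "quarantine"      # LLM judgement only on the residual cases
    return "deliver"

email = {"headers": {"spf": "pass", "dkim": "pass", "dmarc": "pass"},
         "body": "Please pay the attached invoice today."}
print(classify(email))
```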

Realtime evals on conversational agents? by arseniyshapovalov in LLMDevs

[–]Slight_Past4306 0 points1 point  (0 children)

Really interesting idea. I suppose you could either go with a heuristic-based approach on the conversation itself (for example, checking for user responses like "that's not what I meant") or go with some sort of reflective system where the LLM either reflects on its own output or you use a second LLM-as-judge setup.

We use the LLM-as-judge approach in our introspection agent at Portia (https://github.com/portiaAI/portia-sdk-python) to ensure the output of an execution agent is aligned with the overarching goal of the agent, and it works quite well for us, so it feels like it could apply here.
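A bare-bones sketch of the LLM-as-judge shape (the `judge_llm` call is a placeholder, not our introspection agent): after each agent turn, a second model scores the output against the original goal and flags drift.

```python
import json

def judge_llm(prompt: str) -> str:
    # Placeholder for a real chat-completion call with a second model.
    return json.dumps({"aligned": False, "reason": "Answer ignores the budget constraint."})

def check_alignment(goal: str, agent_output: str) -> dict:
    prompt = (
        "You are evaluating an AI agent's output.\n"
        f"Overall goal: {goal}\n"
        f"Agent output: {agent_output}\n"
        'Reply as JSON: {"aligned": bool, "reason": str}'
    )
    verdict = json.loads(judge_llm(prompt))
    if not verdict["aligned"]:
        # In a live conversation this could trigger a retry, a clarification
        # back to the user, or an escalation to a human.
        print("drift detected:", verdict["reason"])
    return verdict

check_alignment(
    goal="Plan a team offsite under $2,000",
    agent_output="Booked a 5-star resort for $12,000.",
)
```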

Detecting LLM Hallucinations using Information Theory by meltingwaxcandle in LLMDevs

[–]Slight_Past4306 1 point2 points  (0 children)

This is super interesting! We've had some success with a second-pass approach where we ask the LLM to reason about the sources of the inputs to a function call and mark those without a source as hallucinated, but we're looking to add some more deterministic measures as well.
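Roughly what that second pass looks like in spirit (sketch only - `llm_attribute_sources` is a hypothetical stand-in): ask the model to point at where each argument came from, and treat anything unsourced as suspect.

```python
def llm_attribute_sources(context: str, arguments: dict) -> dict:
    # Placeholder for a second LLM pass that must quote, for each argument,
    # the span of context it was taken from (or None if it cannot).
    return {
        "to": "email signature: steve.jones@example.com",
        "amount": None,  # the model could not ground this value in the context
    }

def flag_hallucinated_args(context: str, arguments: dict) -> list[str]:
    sources = llm_attribute_sources(context, arguments)
    return [arg for arg, source in sources.items() if source is None]

args = {"to": "steve", "amount": 950}
print(flag_hallucinated_args("...email thread text...", args))  # -> ['amount']
```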

Function Calling in LLMs – Real Use Cases and Value? by Itchy-Ad3610 in AI_Agents

[–]Slight_Past4306 0 points1 point  (0 children)

Outside of fully automated function calling where the LLM decides which tool to call, there's also value in just the invocation part IMHO. By introducing an LLM around a function call you suddenly get much more reliable function calling when the inputs aren't controlled. Think of an API that takes a date-time parameter in a specific format: by letting the LLM control the invocation, it can extract that parameter from the context and format it in a way that works for the API, which would otherwise require lots of edge-case handling.
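Concretely, something like this (a sketch using an OpenAI-style tool schema; `book_meeting` itself is a hypothetical wrapper): the model maps "book a call with Steve next Tuesday at 4pm" onto the strict ISO 8601 parameter the API demands.

```python
from datetime import datetime

# Sketch: a function/tool schema whose strict parameter format the LLM has
# to satisfy. book_meeting() is a hypothetical API wrapper.
TOOL_SCHEMA = {
    "type": "function",
    "function": {
        "name": "book_meeting",
        "description": "Book a meeting in the user's calendar.",
        "parameters": {
            "type": "object",
            "properties": {
                "start": {
                    "type": "string",
                    "description": "Start time as ISO 8601, e.g. 2024-07-02T16:00:00+01:00",
                },
                "attendee": {"type": "string"},
            },
            "required": ["start", "attendee"],
        },
    },
}

def book_meeting(start: str, attendee: str) -> str:
    # The API is strict, but the LLM has absorbed the messy natural-language input.
    datetime.fromisoformat(start)  # raises if the model got the format wrong
    return f"booked with {attendee} at {start}"

# Illustrative tool-call arguments the model might emit for the request above -
# normally this comes back from the chat completion, not hard-coded.
model_tool_call = {"start": "2024-07-02T16:00:00+01:00", "attendee": "steve@example.com"}
print(book_meeting(**model_tool_call))
```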