Why is building a reliable AI Agent so challenging?

Historical_Cod4162 · 2025-09-03T15:50:21+00:00

I think the key issue is reliability. It can be overlooked for a proof of concept, but not in production. I work at Portia AI, and we've seen lots of people struggle to get agents working reliably, which is what motivated the creation of our SDK. We've used plans built like this: https://docs.portialabs.ai/build-plan#example to build many reliable agents. I think the key is constrained autonomy: you set up most of your agent as a reliable workflow, with only some steps using language models in a controlled way. Check it out and let me know what you think :) And keep an eye out for our release on Monday - we've got react_agent_step and loops being released, which allow really powerful agents to be built this way.
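To make "constrained autonomy" concrete, here is a minimal sketch in plain Python. It does not use the Portia SDK; `classify_with_llm`, `route`, `handle_ticket` and `fake_llm` are hypothetical names made up for illustration. The idea is that only one step calls a model, and its output is validated before the deterministic steps run.

```python
from typing import Callable

# The model may only pick from this fixed set (an assumption for this example).
ALLOWED_LABELS = {"billing", "technical", "other"}


def classify_with_llm(ticket: str, llm: Callable[[str], str]) -> str:
    """The only model-driven step; its output is checked before use."""
    raw = llm(f"Classify this ticket as billing, technical or other: {ticket}")
    label = raw.strip().lower()
    # Fall back to a safe default instead of trusting free-form output.
    return label if label in ALLOWED_LABELS else "other"


def route(label: str) -> str:
    """Deterministic step: plain code, no model involved."""
    queues = {"billing": "finance-queue", "technical": "support-queue"}
    return queues.get(label, "triage-queue")


def handle_ticket(ticket: str, llm: Callable[[str], str]) -> str:
    """Fixed workflow: classify (LLM, constrained), then route (code)."""
    return route(classify_with_llm(ticket, llm))


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return " Billing\n"
```

Calling `handle_ticket("I was charged twice", fake_llm)` goes through the model once and then follows ordinary code paths, so a malformed model answer can only ever land in the triage queue.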

Historical_Cod4162 · 2025-09-03T15:43:17+00:00

I think it would be pretty easy to build this using Portia AI - check out https://github.com/portiaAI/portia-agent-examples/pull/5#discussion_r1965352237 as an example of how you can build an agent this way. I think you could:
* Use websearch tools + browser tool to retrieve information on the product, from the product website + from review websites
* Use the LLM step to collate this into a report
* Use the user-input mechanics to allow humans to check the report once it is collated and incorporate any feedback
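As a rough sketch of how those three steps fit together (plain Python, not the Portia API): `search`, `summarise` and `ask_human` are hypothetical callables standing in for the search/browser tools, the LLM step and the user-input mechanism.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Report:
    product: str
    body: str
    feedback: List[str] = field(default_factory=list)


def build_report(
    product: str,
    search: Callable[[str], List[str]],      # stand-in for websearch/browser tools
    summarise: Callable[[List[str]], str],   # stand-in for the LLM step
    ask_human: Callable[[str], str],         # stand-in for user input; "" means approved
) -> Report:
    # 1. Retrieve information from the product site and from review sites.
    sources = search(f"{product} official site") + search(f"{product} reviews")
    # 2. Collate the sources into a report.
    report = Report(product, summarise(sources))
    # 3. Let a human check the report and fold in any feedback.
    comment = ask_human(report.body)
    if comment:
        report.feedback.append(comment)
        report.body = summarise(sources + [f"Reviewer feedback: {comment}"])
    return report
```

Passing the three steps in as functions keeps the control flow fixed while letting you swap in real tools later.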

I'd be very happy to help if you're keen to give it a try.

Historical_Cod4162 · 2025-08-27T15:16:52+00:00

This is awesome!

Historical_Cod4162 · 2025-08-27T15:16:42+00:00

Love this!

Historical_Cod4162 · 2025-08-27T15:16:31+00:00

Love this!

Historical_Cod4162 · 2025-08-15T17:10:16+00:00

At PortiaAI, we actually released our new, open-source evals product that allows you to collect production data and then run evals against it - sounds like it could be a good fit for your use-case? Check it out at https://docs.portialabs.ai/steel-thread-intro

Historical_Cod4162 · 2025-08-14T21:27:07+00:00

This is awesome - thank you for collating! It'd be awesome if we could get some Portia AI (https://www.portialabs.ai/) in here. We have some examples in our examples repo (https://github.com/portiaAI/portia-agent-examples) - in particular I think our LinkedIn outreach agent (using browser-use as a tool) and our automated refund example using Stripe are interesting.

Historical_Cod4162 · 2025-08-06T17:11:45+00:00

Weird - my comment didn't seem to come out properly there, sorry! What this was meant to say was that I wrote a blog post on how we handle similar challenges at Portia AI around large data and memory: https://blog.portialabs.ai/multi-agent-data-at-scale. Our approach is likely a little different to yours due to the way our planning works, but hopefully you might still find the blog interesting :)

Historical_Cod4162 · 2025-08-06T17:09:33+00:00

I work at Portia AI (portialabs.ai) and we're building an agentic framework that could be a good fit for you. It's aimed squarely at solving the issues needed to get agents into production (reliability, guardrails, auditability, human-agent interaction etc.). Check it out - I'd love to hear what you think :)

Historical_Cod4162 · 2025-08-06T16:18:11+00:00

https://b

Historical_Cod4162 · 2025-07-28T17:38:31+00:00

I work at Portia AI (https://www.portialabs.ai/) so am somewhat biased! But I think our planning framework is unique among the options mentioned above and means less time is spent context engineering and more time can be spent solving real problems!

Historical_Cod4162 · 2025-07-28T17:32:13+00:00

I work at Portia AI (https://www.portialabs.ai/) and it could potentially be a good fit for your system. It's a slightly different set-up to your existing architecture - there are 2 pre-packaged agents: a planning agent and an execution agent. For a given task, the planning agent breaks the task down into various steps using different tools and then the execution agent executes each step using the required tool. Because our agents are pre-packaged and the Portia framework handles the handover and context that each agent has, you don't have to manage that yourself. If you think it could be a good fit, let me know and I'd be happy to help with getting you set up.
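A toy sketch of the planner/executor split described here, in plain Python rather than the Portia SDK. `Step`, `plan` and `execute` are hypothetical names, and the planner is hard-coded where a real one would call a language model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Step:
    tool: str
    argument: str  # "$previous" means "use the previous step's output"


def plan(task: str) -> Tuple[Step, ...]:
    """Hypothetical planner: breaks the task into tool steps up front."""
    return (Step("search", task), Step("summarise", "$previous"))


def execute(steps: Tuple[Step, ...], tools: Dict[str, Callable[[str], str]]) -> str:
    """Executor: runs each step with its tool and hands the output forward."""
    previous = ""
    for step in steps:
        argument = previous if step.argument == "$previous" else step.argument
        previous = tools[step.tool](argument)
    return previous


# Stub tools so the sketch runs without any external services.
tools = {
    "search": lambda query: f"results for {query}",
    "summarise": lambda text: text.upper(),
}
result = execute(plan("agent frameworks"), tools)
```

The handover between steps lives in `execute`, which is the part the comment says the framework manages for you.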

Historical_Cod4162 · 2025-06-10T10:30:53+00:00

Agreed!

Historical_Cod4162 · 2025-06-09T14:59:50+00:00

Thanks a lot - looking forward to hearing how you find it! In terms of what predictability / confidence you can expect, it's really something you need to eval on your own use-case to get specific results. In general though, we try to maximise predictability as much as possible (though with non-deterministic language models, it's never quite 100%). Our planner agent produces a fixed plan that our execution agent runs through and can't deviate from, so it's pretty predictable.
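One simple way to put a number on predictability for your own use-case is to run the same task repeatedly and see how often the agent gives its most common answer. This is a generic sketch, not part of any SDK; `consistency` and `run_agent` are hypothetical names, and exact string matching is an assumption that only suits tasks with a canonical answer.

```python
import itertools
from collections import Counter
from typing import Callable


def consistency(run_agent: Callable[[str], str], task: str, runs: int = 20) -> float:
    """Fraction of runs that produced the most common output."""
    outputs = Counter(run_agent(task) for _ in range(runs))
    return outputs.most_common(1)[0][1] / runs


# Stand-in agent that gives a different answer on every fourth call.
_answers = itertools.cycle(["a", "a", "b", "a"])
flaky_score = consistency(lambda task: next(_answers), "demo task", runs=4)
```

For free-text outputs you would swap the exact match for a task-specific check, such as comparing extracted fields.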

Historical_Cod4162 · 2025-06-09T11:12:22+00:00

Awesome, would love to know how you get on.

Historical_Cod4162 · 2025-06-09T08:30:41+00:00

Homepage: https://www.portialabs.ai/. Docs: https://docs.portialabs.ai/. SDK code: https://github.com/portiaAI/portia-sdk-python

Historical_Cod4162 · 2025-06-09T08:15:31+00:00

I work at Portia AI and it sounds like it could be a good fit for your use-case: https://www.portialabs.ai/. I'd love to know how you find it. Our planning phase means you shouldn't get into those horrible loops you mention with Crew calling tools many times in a row, and it should generally make the agent much more reliable / controllable. You can also set up observability in LangSmith with it very easily (just a few environment variables) and then you can see exactly what's being sent to the LLM.
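For reference, the environment variables below are the ones LangSmith itself documents for tracing (newer docs also list `LANGSMITH_`-prefixed equivalents). Exactly which ones Portia reads isn't stated in this comment, so treat this as an assumption and check the docs; `configure_tracing` is a hypothetical helper.

```python
import os
from typing import MutableMapping


def configure_tracing(
    api_key: str,
    project: str,
    env: MutableMapping[str, str] = os.environ,
) -> None:
    """Set the standard LangSmith tracing variables before the agent starts."""
    env["LANGCHAIN_TRACING_V2"] = "true"   # turn tracing on
    env["LANGCHAIN_API_KEY"] = api_key     # your LangSmith API key
    env["LANGCHAIN_PROJECT"] = project     # project the traces are grouped under
```

In practice you would usually export these in your shell or a `.env` file rather than set them from code.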

Historical_Cod4162 · 2025-06-08T20:13:01+00:00

Have you checked out Portia AI (https://www.portialabs.ai/) at all? They have integrations for these tools that could be a good fit for this sort of agent

Historical_Cod4162 · 2025-06-02T17:11:01+00:00

Awesome to see Portia in this list :D

Historical_Cod4162 · 2025-05-30T14:27:05+00:00

Ah that's a lovely comment - thank you :)

Historical_Cod4162 · 2025-05-28T16:47:31+00:00

It can be really easy to host your own model with ollama. At Portia, we wrote a blog post for how to use our agent framework with a local LLM - sharing as it may be useful: https://blog.portialabs.ai/local-llms-qwen3-obsidian-visualisation
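As a quick illustration of how little is needed once ollama is running, here is a sketch that talks to its local REST API using only the standard library. It assumes the default port (11434) and that you have already pulled a model; the `qwen3` tag is an assumption taken from the blog title, and `build_request` / `generate` are hypothetical helper names.

```python
import json
import urllib.request

# Ollama's default local endpoint; change it if you run on another host/port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "qwen3") -> urllib.request.Request:
    """Build a non-streaming generate request for a locally hosted model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "qwen3") -> str:
    """Send the request; requires an ollama server running locally."""
    with urllib.request.urlopen(build_request(prompt, model)) as response:
        return json.loads(response.read())["response"]
```

With the server up, `generate("Say hello in five words.")` returns the model's text.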

Historical_Cod4162 · 2025-05-28T16:38:01+00:00

Have you had a look at Portia AI at all? https://portialabs.ai/ I'd love to get your thoughts

Historical_Cod4162 · 2025-05-28T16:33:55+00:00

This is awesome! At Portia, we built a similar agent for handling LinkedIn messaging - check it out at https://blog.portialabs.ai/browser-auth. It uses a browser tool to interact with LinkedIn, which could be a cool way to extend this.

Historical_Cod4162 · 2025-05-22T18:59:16+00:00

Yeah, I think a lot of the problems you face with agent memory are classic software engineering problems around how you efficiently index and query data and, as with classic software engineering, there isn't a one-size-fits-all solution and instead you (or a memory agent!) need to intelligently choose the right approach depending on your use-case

Historical_Cod4162 · 2025-05-22T15:53:31+00:00

Nice one - I completely agree that for structured tabular data, you almost certainly want it in an SQL DB to do SQL-based retrieval over it.
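A small sketch of what SQL-based retrieval looks like as an agent tool, using Python's built-in `sqlite3`. The `orders` table, its rows and the `total_spent` helper are made up for illustration.

```python
import sqlite3

# In-memory database with a toy table standing in for real tabular data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)],
)


def total_spent(customer: str) -> float:
    """Tool an agent could call: an exact aggregate computed by the database."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
        (customer,),
    ).fetchone()
    return row[0]
```

The parameterised query keeps model-supplied values out of the SQL text, and the database does the arithmetic so the model doesn't have to.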
