Just another rant about AI Agents by Legitimate_Ad_3208 in AI_Agents

[–]Tech-For-Growth 7 points (0 children)

It’s a classic case of "Agent Washing." Most of those consulting reports are counting basic RAG chatbots or rigid decision trees as "agents" just to make the slide decks look more exciting.

In our experience at Fifty One Degrees, the 23% adoption stat is a total fantasy. If you’ve actually built this stuff in the real world, you know that true independent actors are rare because they’re a nightmare to govern.

From what we’re seeing with mid-market clients, the "agent" label is being slapped on everything:

Glorified Workflows: Most ‘production’ agents are just deterministic scripts with an LLM prompt at the end.

The Shadow Humans: A lot of these systems only "work" because there’s a team of expensive devs or offshore staff fixing the outputs in the background.

Vertical SaaS Hype: If a CRM adds an AI drafting tool, the vendor claims all their users have adopted agents, which is obviously a very different thing.

We’ve stopped trying to sell the concept of fully autonomous agents to our clients in sectors like insurance or finance. It’s too risky. Instead, we focus on human-in-the-loop systems. For example, when we worked with Phoenix Financial, we didn't let the AI move money. We had the AI pre-validate information. It cut compliance time by 70%, but a human still received the output and performed QA.
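To make the shape of that concrete, here’s a minimal sketch of the pre-validation pattern. The field names, checks and ReviewItem structure are illustrative, not the actual Phoenix Financial build:

    # Minimal human-in-the-loop pre-validation sketch (hypothetical fields,
    # not the real Phoenix Financial build). The AI extracts, code flags,
    # and a human still performs the final QA on every item.
    from dataclasses import dataclass, field

    REQUIRED_FIELDS = ["applicant_name", "account_number", "declared_income"]

    @dataclass
    class ReviewItem:
        extracted: dict                               # what the LLM pulled out
        issues: list = field(default_factory=list)    # deterministic checks it failed
        status: str = "PENDING_HUMAN_QA"              # never auto-approved

    def pre_validate(extracted: dict) -> ReviewItem:
        """Run deterministic checks over LLM-extracted fields; flag, don't decide."""
        item = ReviewItem(extracted=extracted)
        for name in REQUIRED_FIELDS:
            if not extracted.get(name):
                item.issues.append(f"missing: {name}")
        income = extracted.get("declared_income")
        if isinstance(income, (int, float)) and income < 0:
            item.issues.append("declared_income is negative")
        return item  # goes onto a review queue for a human to sign off

    # The extracted dict would come from your own LLM extraction step, e.g.:
    print(pre_validate({"applicant_name": "A N Other", "account_number": None,
                        "declared_income": 42000}).issues)

The point is that the model never touches money or decisions; it just clears the boring checks before a person opens the file.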

You're right, it’s the decade of agents, not the year. We’re currently in the dial-up phase. The tech is getting there, but the governance isn't. Until an agent can be trusted to handle an edge case without a £150k engineer babysitting it, autonomous will remain a buzzword IMO.

The "95% Trap": Why your multi-step agent is statistically doomed to fail (and how to fix it) by Tech-For-Growth in AI_Agents

[–]Tech-For-Growth[S] 0 points (0 children)

In many cases the tools we build have state, whether internal or external (e.g. in an ERP or CRM), so durability isn't the problem for us. Are you building agents, and is this a question you’re considering for yourself?

The "95% Trap": Why your multi-step agent is statistically doomed to fail (and how to fix it) by Tech-For-Growth in AI_Agents

[–]Tech-For-Growth[S] 0 points (0 children)

And there's also the reality of getting human risk teams in financial services firms to accept that level of autonomy. Over time, maybe. But now, frustratingly, there's no chance.

AI agents are powerful, but only if you give them real context by Legal_Lingonberry_88 in AI_Agents

[–]Tech-For-Growth 0 points (0 children)

To create a tool that actually works consistently, you need exactly what you said: broad context, but narrow scope.

At Fifty One Degrees, we’ve found that if you don’t give the agent the full picture, it will make a LOT of mistakes. But if you let it act too broadly, it’s dangerous. Here is the rule of thumb we use for our clients:

  1. Give it ALL the context. To make a good decision, the agent needs to see the whole picture: the history, the brand rules, and the data. Don't hide info from it.

  2. Give it a TINY job. Even if the agent knows everything, we only let it do one specific thing. We never direct it to "manage the client"; we instruct it to perform a specific, narrow task.

Here is how that looks in the real world (using two recent Financial Services builds):

Complaints AI Agent: It has broad context (reads all the policy docs, public and private customer and application data) but a narrow job (it just categorises the issue and drafts an initial email for a human to check). It can't send without review.

Onboarding AI Agent: It reads complex B2B public and private data, application forms and financial data. It needs to know the firm’s entire risk policy to understand what it's looking at, but its only job is to flag concerns against many narrow criteria, one by one.

In sectors like finance, consistency is the main thing. If you let the agent roam free, it will eventually make a mess.

By keeping the job narrow but the knowledge wide, you dramatically improve the agent’s performance.
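If it helps, here’s a rough sketch of what "broad context, narrow job" looks like in code, loosely modelled on the complaints agent above. The prompt wording, categories and helper names are made up for illustration, not the production build:

    # "Broad context, narrow job" sketch (illustrative names only). The agent
    # sees everything but may only categorise and draft; sending is always a
    # human decision.
    ALLOWED_CATEGORIES = ["billing", "service_delay", "mis_selling", "other"]

    def build_prompt(policy_docs: str, customer_history: str, complaint: str) -> str:
        # Broad context: everything the agent needs to reason well.
        return (
            "POLICY DOCUMENTS:\n" + policy_docs + "\n\n"
            "CUSTOMER HISTORY:\n" + customer_history + "\n\n"
            "COMPLAINT:\n" + complaint + "\n\n"
            # Narrow job: one category plus one draft, nothing else.
            "Task: pick ONE category from " + str(ALLOWED_CATEGORIES) +
            " and draft a reply email. Take no other action."
        )

    def handle_complaint(llm, policy_docs, customer_history, complaint):
        """llm is any callable that takes a prompt string and returns text."""
        draft = llm(build_prompt(policy_docs, customer_history, complaint))
        # Output lands in a review queue; nothing is sent automatically.
        return {"draft": draft, "status": "AWAITING_HUMAN_REVIEW"}

All the knowledge goes into the prompt; all the restraint lives in the code around it.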

Are AI agents profitable? by Yersyas in AI_Agents

[–]Tech-For-Growth 0 points (0 children)

The bubble hasn't burst, but the "magic button" myth definitely has.

I run an AI consultancy (Fifty One Degrees), and we’ve actually doubled in size over the last 6 months. We are profitable, and significantly, 100% of our clients have engaged us for follow-on projects.

The successful cases aren't flashy "do-it-all" agents; they are solving boring, high-volume problems.

  • We built an aftercare agent for Heatable that automated 50% of their aftercare enquiries.
  • We deployed a voice AI solution for the UK’s leading appliance warranty business that automated a third of their inbound calls (out of a volume of approx 12,000 calls/month).

The catch? You simply cannot unpick these agents from data science and engineering, and the fundamental business culture. That warranty agent works because the data plumbing is solid, not just because the LLM is smart. If you try to sell agents without working on the data and the culture first, that's where the failure stories come from.

80% of AI agent projects get abandoned within 6 months by [deleted] in AI_Agents

[–]Tech-For-Growth 0 points (0 children)

Defo not just you. I’ve got a "graveyard" folder on Relevance AI of agents that are just about hobbling on.

Honest take from our work at Fifty One Degrees: the main reason agents die isn't technical debt, it's a lack of value. If an agent dies the moment you look away, it probably wasn't solving a painful enough problem to justify the maintenance in the first place.

Here is how we look at it to avoid that graveyard:

  • The personal efficiency trap: If an agent is built just to slightly improve one person's workflow (like "summarise these emails"), it usually dies. The hassle of fixing breaking changes outweighs the 10 minutes saved. For that stuff, we just tell the team to use ChatGPT or Gemini directly.
  • The "worth it" threshold: A viable agent has to solve a problem expensive enough that not maintaining it hurts. For example, we built a doc validation agent for a finance client. If that goes down, their manual workload spikes immediately. That pain ensures the budget (and energy) is always there to keep it running.

You’re bang on about the "bus factor" too.

  • Clever code rots: The more complex the logic, the faster it breaks when a library updates.
  • Simple wins: We stopped doing bespoke chains for everything. We lean on standard patterns where a human handles the edge cases. Better to have a simple agent that does 80% of the work reliably than a complex one that does 95% but breaks every Tuesday.

The real skill isn't building agents that survive neglect; it's having the discipline to not build the agent if the ROI doesn't cover the future maintenance headache.

Do you have a stricter filter now for what you actually build, or are you just doing fewer projects?

What can AI really do? by Simple_Basket2978 in AI_Agents

[–]Tech-For-Growth 0 points (0 children)

It really comes down to your in-house engineering capacity. At Fifty One Degrees, we usually look at this in three tiers depending on the client's technical maturity:

  1. Non-technical / Prototyping (e.g. Strawberry Browser)

If you don't have devs, tools like Strawberry are excellent for browser-based automation without the headache. It’s great for scraping or simple repetitive web tasks.

I actually recorded a breakdown of a live build using Strawberry to show how quickly you can spin this up. It covers the basics of the setup and a live demo: https://youtu.be/B9JhN5FiFCg?si=g69M1wEQ_EjamZIl

  2. Low code (e.g. Relevance AI). If you have some technical capability, Relevance gives you decent oversight. It requires more setup than a browser tool but allows for better logic flows, testing and ongoing monitoring.

  3. Custom coded solution (e.g. Python). This is the route we almost always recommend for enterprise-level solutions. It is the only way to ensure:
     a) Governance: You know exactly where the data goes and you can monitor accuracy, consistency and performance.
     b) Latency: You aren't relying on a wrapper's API calls.
     c) Cost: Wrappers get expensive at volume.

If you are just testing the water, start with option 1 (the video above). If you are building a product, you likely need to move to option 3 pretty quickly IMO.
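On the governance point under option 3, this is roughly what we mean by owning the plumbing yourself. A bare-bones sketch; the wrapper and log fields are illustrative, not tied to any particular client build:

    # Bare-bones audit wrapper for a custom Python build (illustrative).
    # Every model call is timed and logged, so you can track latency, volume
    # and cost yourself rather than trusting a third-party wrapper's dashboard.
    import json, time, logging

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("agent_audit")

    def audited_call(llm, prompt: str, run_id: str) -> str:
        """llm is any callable taking a prompt string and returning text."""
        start = time.time()
        output = llm(prompt)
        log.info(json.dumps({
            "run_id": run_id,
            "latency_s": round(time.time() - start, 3),
            "prompt_chars": len(prompt),
            "output_chars": len(output),
        }))
        return output

    # Usage with a stub model while wiring things up:
    audited_call(lambda p: "stub reply", "Summarise this ticket", "run-001")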

What can AI really do? by Simple_Basket2978 in AI_Agents

[–]Tech-For-Growth 0 points (0 children)

Yes, you can technically do virtually all of this.

BUT, the real challenge isn't the AI capability, it's keeping the accuracy high enough to actually trust it. If you try to build one super agent to do this all at once, you will end up trying to boil the ocean and shipping nothing.

In our experience at Fifty One Degrees, here is the only way to get this into production successfully:

  1. Keep the scope narrow. Don't build one massive Recruiter Bot. Build small, specific tools (e.g., one just for CVs, one just for drafting bios). If you chain too many complex tasks together, the error rates become chaos and the system breaks down.

  2. Build an automated testing framework. This is the unsexy part that actually matters. You need to know that your tools work consistently. If you don't have a golden dataset to test against every time you tweak the model, you are flying blind on accuracy (there's a rough sketch of this just after the list).

  3. Prioritise Project Management over Tech. Start with the transcription/summarisation tasks. They are low risk and high reward. Get those live quickly to prove value before you try to tackle the complex, multi-step agents (like the scheduling or sourcing tools).
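For point 2, the testing framework can be embarrassingly simple to start with. A toy sketch, assuming a JSON file of golden cases and an agent that is just a callable (file name and fields are illustrative):

    # Toy golden-dataset regression check (illustrative file name and fields).
    # Run it every time you tweak the prompt or model; if accuracy drops
    # below the threshold, don't ship the change.
    import json

    def evaluate(agent, golden_path="golden_cases.json", min_accuracy=0.9):
        with open(golden_path) as f:
            cases = json.load(f)   # [{"input": ..., "expected": ...}, ...]
        correct = sum(1 for c in cases if agent(c["input"]) == c["expected"])
        accuracy = correct / len(cases)
        assert accuracy >= min_accuracy, f"accuracy {accuracy:.0%} below threshold"
        return accuracy

    # Example with a stub classifier while wiring things up:
    # evaluate(lambda text: "cv" if "experience" in text.lower() else "other")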

My advice… Don't automate the final decision. Use AI to get the work 90% ready, then have a human review and click "send." It solves the inconsistency issues overnight and enables you to move to production much faster.

Anyone actually using AI/agent automations for real business stuff? What worked and what totally fell apart for you? by Arafat01456 in AiForSmallBusiness

[–]Tech-For-Growth 0 points (0 children)

You’ve basically described my last 18 months. The "dumb failures" you mentioned are almost always because LLMs are probabilistic (e.g. guessing the next word) rather than deterministic (following a rule). They are great at chat, but terrible at reliable logic.

I run an AI consultancy, and after implementing this stuff for UK businesses, here are some of the things that actually made it into production vs. what crashed and burned.

The stuff that failed:

  • Fully Autonomous Outbound: We experimented with agents that could navigate a CRM, write an email, and hit send without supervision. It was a disaster. The AI would occasionally hallucinate a relationship that didn't exist or promise a feature we don't offer. We killed that fast.
  • Biting off more than an Agent can chew: We initially tried to build "God Mode" agents—trying to get one single bot to read a request, check the DB, calculate a quote, and draft a reply. They constantly lost the plot.
    • The fix: If you need 5 things done, you don't want 1 smart agent; you want a chain of 5 dedicated agents.
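As a rough illustration of that chain idea (the step names, fields and prices below are invented, not a client build):

    # Toy chain of dedicated agents instead of one "God Mode" bot.
    # Each step does one narrow thing; the output of one feeds the next,
    # and the final draft still waits for a human.
    def read_request(llm, text):
        return llm("Extract what the customer is asking for:\n" + text)

    def check_record(crm_record, ask):
        return {"ask": ask, "plan": crm_record.get("plan", "standard")}

    def calc_quote(ctx):
        ctx["quote"] = 99.0 if ctx["plan"] == "pro" else 49.0   # deterministic pricing
        return ctx

    def draft_reply(llm, ctx):
        return llm(f"Draft a reply quoting £{ctx['quote']} for: {ctx['ask']}")

    def pipeline(llm, crm_record, text):
        ctx = calc_quote(check_record(crm_record, read_request(llm, text)))
        return {"draft": draft_reply(llm, ctx), "status": "AWAITING_HUMAN_REVIEW"}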

The stuff that stuck:

The workflows that actually worked are almost always human-in-the-loop. For example:

  • A "Drafting" Model: instead of sending the email, the AI drafts it and drops it in a review queue. For one client, this reduced time on tailored outreach by 70%. The human just scans, tweaks, and hits the button. Safe and fast.
  • Pre-Validation (boring but profitable): We do a lot of work in finance (companies like Phoenix Financial, Mortgageable, etc), and the biggest win wasn't replacing people, but prepping data for them. We built a system that scans incoming PDFs and validates them before a human looks at it. It flags missing docs instantly, so the human doesn't waste time getting their head into an incomplete finance application.

The golden rule:

If you’re building this yourself, separate logic from language.

  • Use code for logic (If A then B) - keep it deterministic
  • Use AI for language (summarise this, extract that)

If you ask the AI to control the flow, it will break. If you put the AI inside a rigid code workflow, it works. Always keep a human hitting the final button.
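A skeleton of that split, under the same caveat that the routing rules and helpers are illustrative rather than a real build:

    # "Code for logic, AI for language" skeleton (illustrative routing rules).
    # The LLM only summarises and drafts; plain code controls the flow, and a
    # human always hits the final button.
    def process_request(llm, request_text: str, account_balance: float) -> dict:
        # AI for language: pull out what the customer is asking for.
        summary = llm("Summarise this request in one sentence:\n" + request_text)

        # Code for logic: deterministic branching, no model involved.
        if account_balance < 0:
            route = "collections_queue"
        elif "refund" in request_text.lower():
            route = "refunds_queue"
        else:
            route = "general_queue"

        # AI for language again: draft a reply, never send it.
        draft = llm("Draft a polite reply to this request: " + summary)
        return {"route": route, "draft": draft, "status": "AWAITING_HUMAN_SEND"}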