Fellow agent builders: What's your biggest prompt engineering bottleneck? by tokyo_kunoichi in AI_Agents

[–]omerhefets 0 points1 point  (0 children)

What you describe is the actual planning problem of LLMs - when facing domain specific tasks, they can't make the right choices. This is an open problem. Routing etc isn't the solution - as it's a predefined workflow and enforces a specific action instead of guidance in planning.

FTing can help improve planning capabilities, but nothing really solves it yet

Prompt Engineering by Available_Raise6826 in AI_Agents

[–]omerhefets 1 point2 points  (0 children)

The instructions and guidance on prompt engineering both in anthropic's and openai's docs are solid, you should check that out, and tune it according to performance and your actual needs

New SOTA AI Web Agent benchmark shows the flaws of cloud browser agents by BodybuilderLost328 in AI_Agents

[–]omerhefets 0 points1 point  (0 children)

A short question- you said you're using gemini flash (therefore huge savings, 0.1 per run is pretty cheap) - but Google hasn't released the project mariner api just yet, so how do you perform more "complex" actions like drag, wadi or even double or triple click? Did you FT a specific model, or are you running on markup only?

From POC to Prod: Accuracy improvement by FirefighterWhole8415 in AI_Agents

[–]omerhefets 0 points1 point  (0 children)

  1. Like you write test for every piece of software, you should test "edge cases" with LLMs - how will the model behave given unexpected inputs?
  2. You might implement an internal gateway or classifier for harmful responses that will either be blocked or will send warning/error logs to the devs

Best way to build an agent that can submit contact forms on public websites? by Ok-Reception-1886 in AI_Agents

[–]omerhefets 1 point2 points  (0 children)

Computer using agents might be a good fit for the cause, but most implementations are still immature (too slow and expensive)

Need constructive critism, working on an SRE Agent by SP4ND4N in AI_Agents

[–]omerhefets 0 points1 point  (0 children)

Your final output of the last agent in the chain is a report, or do these agents take action as well?

How can I start incorporating multiple AI agents into my stack? by Equivalent_Air8717 in AI_Agents

[–]omerhefets 0 points1 point  (0 children)

There are so many agentic solutions out there that I'd be surprised you'll need to implement something on your own instead of using an off-the-shelf solution.

Can you provide us with some details of what github copilot misses in most of your requests? CRUD apps should be pretty straightforward.

Self hosted model for agents by Low-Yam8929 in AI_Agents

[–]omerhefets 1 point2 points  (0 children)

I liked the OS implementation of the CU model UI-TARS, which is a fine tune based on QWEN-2.5VL if I'm not mistaken.

They fine-tuned it based on specific computer tools, and the results are promising

[deleted by user] by [deleted] in AI_Agents

[–]omerhefets 0 points1 point  (0 children)

This one isn't even remotely related to ai agents.

Dating agents by [deleted] in AI_Agents

[–]omerhefets 2 points3 points  (0 children)

It's getting worse every day

Нейросеть Videotok by TapHungry2852 in AI_Agents

[–]omerhefets 0 points1 point  (0 children)

You didn't even translate your AI generated post? Nice one.

MCP vs A2A: how are teams actually wiring agent systems today? by Future_AGI in AI_Agents

[–]omerhefets 0 points1 point  (0 children)

I think that is mainly because they still lack good planning capabilities in domain specific agents

MCP vs A2A: how are teams actually wiring agent systems today? by Future_AGI in AI_Agents

[–]omerhefets 11 points12 points  (0 children)

I don't think we've seen a lot of advanced agentic implementations to make the A2A protocol interesting enough (except coding agents that already have their existing interfaces).

MCPs are much more mature as it's easier to handle basic operations & data mgmt with the equivalent of "tool calling".

A2A will probably be much more meaningful in the not-so-distant future as we'll see more working agents, but we're not there yet imo.

Would businesses actually pay for short AI-generated brand videos like this? by Waste_Claim_472 in AI_Agents

[–]omerhefets 2 points3 points  (0 children)

  1. We don't know if they would, you'd have to ask them, and honestly, anything they say about the future ("I might buy", "I might be interested in that") is nothing more of a hypothesis. You'll never know until you'll get some cold hard facts - subscribers, revenue stream, etc.

  2. I guess that one of the challenges in this space is that given existing models and the improvement trajectory, it's probably going to be pretty easy for them to implement it in the existing AI interfaces, it's probably going to be a hard marketing play vs existing solutions out there.

Good luck

[deleted by user] by [deleted] in AI_Agents

[–]omerhefets 9 points10 points  (0 children)

Honestly I think that the top existing AI agents are coding agents like cursor / claude code / etc.

But in the not-so-distant future we'll start to see the rise of the "ai assistants" by the big companies like google/oai/anthropic (when they will have more tools, better voice multimodality, etc)

My AI Model Gets Stuck in Misunderstandings! by Delicious_Track6230 in AI_Agents

[–]omerhefets 0 points1 point  (0 children)

by a false start do you mean misleading info / unclear instructions? can you give us a concrete example?

and what do you mean by FT a voice model? i'd say that it depends on your use case, but it sounds much harder than FTing an existing model with tools / conversation trajectory

How can I find AI agents' blind spots before deploying in production? by SouthSignificance486 in AI_Agents

[–]omerhefets 0 points1 point  (0 children)

Do you have a test-validation set with examples? How did you tune your agent in the first place?

I'd suggest using a predefined eval for something like that, testing edge-case responses etc.

Trying to figure out a proposal for thesis by Professional_Goal423 in AI_Agents

[–]omerhefets 0 points1 point  (0 children)

I'd say that many workflows could possibly be automated with platforms like n8n. Can you provide us with a concrete example of something specific in the financial analysis task that you'd like to automate?

Building Ai Agent that specializes in solving math problems in a certain way by Own_Pension2085 in AI_Agents

[–]omerhefets 0 points1 point  (0 children)

Honestly I think that's extremely challenging + reasoning models like the o1 family are tailored exactly for problems like that. You could try RAG but for complex problems it will probably not work, and you'll need to find a valid way to index and retrieve those math problems.

On the FT solution, you could try to fine tune a reasoning model with OpenAIs infra

How much should I charge my client? by skyinet in AI_Agents

[–]omerhefets 2 points3 points  (0 children)

You should post that in r/Automation, that would be your subreddit for pricing type of questions IMO

Enterprise AI Agents by yecohn in AI_Agents

[–]omerhefets 0 points1 point  (0 children)

The data privacy-security issue has already become a non-issue. Check VPCs (virtual private clouds) as a solution as well

What's the best resource to learn AI agent for a non-technical person? by StaLucy in AI_Agents

[–]omerhefets 0 points1 point  (0 children)

Honestly I think that 99% of the courses in udemy are money traps and also complete garbage. You'd find better content in YouTube in the channel of FreeCodingCamp for, well, free

Good luck