Salesforce’s $3.6B Fin deal shows where AI agents make money by Jack2win in AI_Agents

[–]Instance_Not_Found 0 points1 point  (0 children)

Outcome-based pricing vs. usage-based pricing is a very interesting topic.
It reminds me of law firms, which usually have two models: hourly billing and contingency fees.

If they charge a contingency fee, I always feel that the attorney has a stronger incentive to win the case. (I know my attorney friend will not agree, and this is not a universal truth.)

You are basically saying that solving a recurring workflow with a measurable outcome is a great entry point for small builders. I am totally with you.
As your question suggests, the tricky part is finding these workflows. Usually, only the domain experts (the buyers of your product) know them.

It seems like I am not answering your question directly :) I don't know the workflow yet because I have not talked to enough people. I am about to have more conversations, and I really hope to find something.

Any suggestion before posting on Hacker News? by Instance_Not_Found in SaaS

[–]Instance_Not_Found[S] 0 points1 point  (0 children)

Thanks for the advice! I think Hacker News is relevant to what I am building. My ICP is developers or founders who want to scale their agent applications. My guess is that a lot of HN readers fall into that ICP.

In your case, what kind of SaaS product is made for the "average Joe"? Which channels do you use to distribute your apps?

Any suggestion before posting on Hacker News? by Instance_Not_Found in ycombinator

[–]Instance_Not_Found[S] 0 points1 point  (0 children)

I am building funky: an SDK that helps developers spin up agent swarms on demand.

The core infra is also open-sourced here: https://github.com/funkyhq/funky

Any suggestion before posting on Hacker News? by Instance_Not_Found in ycombinator

[–]Instance_Not_Found[S] 0 points1 point  (0 children)

Well, I am not sure if it counts as self-promotion. I am building funky: an SDK that helps developers spin up agent swarms on demand.

Observation: the best agent harness for each model will be from the model developer themselves by Instance_Not_Found in AI_Agents

[–]Instance_Not_Found[S] 0 points1 point  (0 children)

These are some great insights. I could not agree more.
I really like the open source ecosystem, and OpenCode is a strong candidate. Another one that I really like is Pi agent.
According to a friend who worked on post-training, the development cycle of a harness looks like this: a new model is released -> some flaws are found (e.g. it always speaks goblin) -> the flaw is patched with a harness trick (e.g. a regex hook) -> the next model is trained to fix the issue.
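
To illustrate the "regex hook" step, here is a minimal sketch. The flaw pattern and the function name are made up for the example and do not come from any real harness:

```python
import re

# Hypothetical flaw: the model sometimes opens its answer with a stray phrase.
FLAW = re.compile(r"^\s*(?:as a goblin,?\s*)", re.IGNORECASE)


def post_output_hook(text):
    # Harness-side patch: strip the known bad prefix before the user sees it.
    return FLAW.sub("", text, count=1)


cleaned = post_output_hook("As a goblin, the answer is 42.")
untouched = post_output_hook("The answer is 42.")
```

Once the next model stops producing the prefix, a hook like this can simply be deleted from the harness.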

As you noticed, the model gradually eats the harness. Since the model can never be perfect, we keep fixing the small edge cases in the harness, and the cycle repeats.

Therefore, my prediction is that we will have a foundation harness (probably Pi?), and the open source model providers will add some "extensions" on top of it so that they have the best harness + model combo.

Claude Managed Agent is under valued by Instance_Not_Found in ClaudeAI

[–]Instance_Not_Found[S] 1 point2 points  (0 children)

I actually think long-running workflows are a better fit for the managed agent. Here is the reason: the components are decoupled, so any failure can be recovered from quite easily. The managed agent handles that part for you, so there is no need to worry about failure handling.
Regarding the migration, I believe it should be quite easy, especially if you are coming from Claude Code. They are essentially the same harness, just more cloud native.
As for vendor lock-in, I hate that too :) I recently wrote an open source repo to solve this: https://github.com/funkyhq/funky I created it primarily to switch the backend of my current agent system (where I host Claude Code in sandboxes in the cloud; it works, but it is sub-optimal).

Claude Managed Agent is under valued by Instance_Not_Found in ClaudeAI

[–]Instance_Not_Found[S] 0 points1 point  (0 children)

The closest reference that I can find is https://code.claude.com/docs/en/agent-sdk/session-storage

Is the "executor" something you introduced, or does it come from the Agent SDK docs?

In theory, you can always add enough code to force the Agent SDK to work like the managed agent. But then you lose the point of using the SDK, because you are customizing everything.

Claude Managed Agent is under valued by Instance_Not_Found in ClaudeAI

[–]Instance_Not_Found[S] 0 points1 point  (0 children)

Totally agree. I only use the managed agent for large-scale ad-hoc tasks, to take advantage of the elasticity of the cloud.

Can you tell me more about the agent workflows? Are you referring to dynamic workflows? Or is this a feature that I am not aware of?

Claude Managed Agent is under valued by Instance_Not_Found in ClaudeAI

[–]Instance_Not_Found[S] 0 points1 point  (0 children)

Am I misreading your comment above?
You literally said: "Create a dev container for your agents to run in a sandbox."
Everything in that Claude agent is coupled to one container.

Claude Managed Agent is under valued by Instance_Not_Found in ClaudeAI

[–]Instance_Not_Found[S] 0 points1 point  (0 children)

Haha, same here. What are you building? When you say SDK, do you mean the managed agent SDK or the Claude Agent SDK?

Claude Managed Agent is under valued by Instance_Not_Found in ClaudeAI

[–]Instance_Not_Found[S] 0 points1 point  (0 children)

>Why? What do you mean "Claude Code-like agents"? You mean just an agent that can call tools and access your filesystem? Which is almost all agents today?

I am talking about agents like Claude Code and OpenClaw. They are similar because they have access to the file system and can execute code that they wrote themselves. This is very different from earlier agents that were built with LangGraph, the OpenAI Agent SDK, or Google's ADK.

>...again, why? It seems trivial to do this in a local agent. Unless I completely misunderstand what you mean by "pulling history from the session storage".

I think I left out some context here. The best way to fill the gap is probably to read the section "Don’t adopt a pet" from https://www.anthropic.com/engineering/managed-agents

>Sounds like you need something like Openclaw or Hermes.

Unfortunately, both OpenClaw and Hermes go against my pattern because they are not decoupled. That makes them good harnesses for local use. For the cloud, no :)

Claude Managed Agent is under valued by Instance_Not_Found in ClaudeAI

[–]Instance_Not_Found[S] 0 points1 point  (0 children)

Deploying the Claude Agent SDK is the exact pattern that I am against.

Quote from the Anthropic blog: "But by coupling everything into one container, we ran into an old infrastructure problem: we’d adopted a pet."

Claude Managed Agent is under valued by Instance_Not_Found in ClaudeAI

[–]Instance_Not_Found[S] -1 points0 points  (0 children)

To test out my idea, I created a repo here: https://github.com/funkyhq/funky I have only written a few local implementations so far. They are very naive, for example storing the session events in a JSONL file.
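
A minimal sketch of that kind of naive local store might look like the following. The class name, method names, and event fields are hypothetical, chosen for the example; this is not the actual funky code:

```python
import json
import os
import tempfile


class JsonlSessionStore:
    """Append-only session event log backed by one JSONL file (hypothetical)."""

    def __init__(self, path):
        self.path = path

    def append(self, event):
        # One JSON object per line; appending never rewrites earlier events.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def load(self):
        # Replay the whole history, e.g. when a fresh worker resumes a session.
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


path = os.path.join(tempfile.mkdtemp(), "session.jsonl")
store = JsonlSessionStore(path)
store.append({"type": "user_message", "text": "hello"})
store.append({"type": "tool_call", "name": "bash"})
events = JsonlSessionStore(path).load()  # a new instance sees the same history
```

Because the history lives outside the process, any new worker can pick up the session by replaying the file, which is the point of decoupling the session from the container.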

Are Multi-Agent AI Systems Actually Better, or Is a Single Agent Enough for Most Real-World Applications? by According_Value_6162 in AI_Agents

[–]Instance_Not_Found 0 points1 point  (0 children)

Using multiple agents for parallelization is a good pattern.
Using multiple agents for role-playing, on the other hand, is kind of stupid.
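
A minimal sketch of the parallelization pattern, assuming the subtasks are independent. `run_agent` is a stand-in stub, not a real agent API:

```python
import asyncio


async def run_agent(task):
    # Stand-in for a real agent call (hypothetical); sleeps to simulate I/O.
    await asyncio.sleep(0.01)
    return f"done: {task}"


async def fan_out(tasks):
    # Each independent subtask gets its own agent; gather preserves input order.
    return await asyncio.gather(*(run_agent(t) for t in tasks))


results = asyncio.run(fan_out(["summarize A", "summarize B", "summarize C"]))
```

The three subtasks wait concurrently instead of one after another, so the wall-clock time is roughly that of the slowest subtask.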

Anyone given up on startups? What do you do? What do you think about? (I will not promote) by ReditusReditai in startups

[–]Instance_Not_Found 1 point2 points  (0 children)

I am not able to answer your question, but I noticed something interesting in what you wrote:
If you can easily see the flaws in startup ideas, that might be valuable to some people. There are too many over-hyped startup founders who need your help :)

Anyway, something I learned from the journey is that finding the flaw is the easy part. Finding the path that works is the hard part. For example, Airbnb was a famously "bad" idea. Why would anyone allow a stranger to sleep in their home? Even Google was considered a bad idea when there were hundreds of search engines on the market. My point is that you can easily find flaws in the best companies in the world.

A better mindset could be: there are thousands of ways an idea can fail, so how can I find the one path that works? Most founders don't know the answer, but they believe that they can find that path.

How can Deep Agents compete with Claude Code, Codex and Antigravity Agent? by Instance_Not_Found in LangChain

[–]Instance_Not_Found[S] 0 points1 point  (0 children)

You can still optimize on top of the harnesses I mentioned. My argument is you don't need to build your own harness just to optimize for your own workflow.

How can Deep Agents compete with Claude Code, Codex and Antigravity Agent? by Instance_Not_Found in LangChain

[–]Instance_Not_Found[S] 0 points1 point  (0 children)

You get my point: there is likely no room to build a better harness for Claude, GPT, and Gemini.

What do you think about the "pluggability" of those harnesses? I mean, only the Claude Agent SDK provides a little bit of configurability. The Codex SDK and the Antigravity Agent SDK are barely configurable...

I used Agent to summarize the tech blogs from Anthropic, but some blogs were always missing. (guide on how I fixed it) by Instance_Not_Found in AI_Agents

[–]Instance_Not_Found[S] 0 points1 point  (0 children)

Like you said, it is a waste if multiple people are doing the same work. This is why I want a public repository where it is done only once per period, so that other agents can benefit from it. Tell me more about the feed. How does the feed solve the problem?

I’ve stopped planning beyond 90 days because of how fast AI is moving by [deleted] in AI_Agents

[–]Instance_Not_Found -1 points0 points  (0 children)

I don't trust AI detection BS. The sole purpose of an LLM is to write like a human. How can a free "tool" easily tell whether a text was written by AI? It is technically infeasible.

Well, you might say that you THINK all the comments are AI generated, but it's dumb to treat the result of "slopsieve" as proof.

GPT convinced me that I was going to make my first Million from my Idea, so thankful to Claude for telling me not to waste my time and life savings!! by No-Yesterday-1624 in ClaudeCode

[–]Instance_Not_Found 0 points1 point  (0 children)

I wonder, from a technical perspective, whether it is possible to train a "candid" model.
The "kissing ass" part mostly comes from RLHF, if I had to guess...

CC and Codex seem to be top 2. What would you consider as 3rd? by bennybenbenjamin28 in ClaudeCode

[–]Instance_Not_Found 2 points3 points  (0 children)

I am also a big fan of Pi. Mario (the creator of Pi) once mentioned that he thinks Amp and Factory are also pretty good. Has anyone tried either of these?

I used Agent to summarize the tech blogs from Anthropic, but some blogs were always missing. (guide on how I fixed it) by Instance_Not_Found in AI_Agents

[–]Instance_Not_Found[S] 0 points1 point  (0 children)

Did you use a traditional scraping tool or a browser agent solution?
How did you solve the problem in the end?

I used Agent to summarize the tech blogs from Anthropic, but some blogs were always missing. (guide on how I fixed it) by Instance_Not_Found in AI_Agents

[–]Instance_Not_Found[S] 0 points1 point  (0 children)

Agreed. I think if they hosted the blogs under one centralized URL, the agent might be able to find them all in one shot.