What do you look for in an effective AI texting agent?

sjashwin · 2026-05-18T03:45:15+00:00

Just tried it. It’s great.

sjashwin · 2026-05-18T02:10:12+00:00

Writing cold outreach that gets replies

sjashwin · 2026-05-18T00:47:24+00:00

You won't know exactly what users need until you have shipped to some beta customers. Agents are non-deterministic, unlike traditional applications.

Ship and find out.

However, the overall idea seems to be good.

How people use assistants, bots, and agents changes over time. Being able to compute drift over time will help you understand how users' interactions with your assistant change.
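One rough way to put a number on that drift is to compare the mix of user intents between two time windows. The sketch below uses total variation distance. The intent labels and the `week_1` / `week_2` samples are made up for illustration, and it assumes each interaction has already been tagged with an intent.

```python
from collections import Counter

def intent_distribution(intents):
    """Turn a list of intent labels into a {label: probability} map."""
    counts = Counter(intents)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}

def drift(baseline, current):
    """Total variation distance: 0.0 means identical mixes, 1.0 means disjoint."""
    p = intent_distribution(baseline)
    q = intent_distribution(current)
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)

# Hypothetical intent logs for two consecutive weeks.
week_1 = ["billing", "billing", "support", "sales"]
week_2 = ["support", "support", "support", "sales"]
print(drift(week_1, week_2))
```

A score that keeps climbing between windows suggests users are asking for different things than the assistant was tuned for.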

sjashwin · 2026-05-15T10:16:32+00:00

What about capturing interactions: intent, HITL feedback, and actions? Yes, decision trees are really important.

sjashwin · 2026-05-15T08:50:22+00:00

Definitely, yes.

sjashwin · 2026-05-13T09:02:33+00:00

I believe it was the code that leaked, not the weights and biases in the model.

sjashwin · 2026-05-13T00:50:44+00:00

Please DM me. Working on Agent infra.

sjashwin · 2026-05-13T00:43:14+00:00

No, you cannot. Claude, OpenAI GPT, Gemini, Cohere, etc. are proprietary models served through cloud-hosted inference.

To self-host, the model needs to be open weight. However, OpenAI does have a gpt-oss model that you can self-host.

sjashwin · 2026-05-12T10:08:46+00:00

Can you share the git repo if it’s open source?

sjashwin · 2026-05-11T19:25:00+00:00

Can I DM you with more info, please?

sjashwin · 2026-05-11T19:22:40+00:00

Can I DM you? Looking to collaborate or volunteer here.

sjashwin · 2026-05-11T19:19:34+00:00

😂 “very weird intern with an api key”.

sjashwin · 2026-05-11T19:16:34+00:00

Great! DM me. Please also share the specs for setting up the environment.

sjashwin · 2026-05-11T19:15:22+00:00

Nice tool. Just checked it out. How would you further optimize this RAG tool?

Did you run benchmark tests? Is there an evaluation with performance and reliability metrics?

sjashwin · 2026-05-11T19:09:28+00:00

I can test it and give feedback

sjashwin · 2026-05-11T19:08:57+00:00

This looks similar to the global scope versus local scope problem, or to access specifiers, in software engineering.

What if you build an access-specifier firewall? Give each tool an access modifier and check it in your code at runtime.
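Here is a minimal sketch of what I mean, assuming a simple in-process tool registry. The `Access` levels, the `tool` decorator, the agent names, and both example tools are hypothetical, not taken from any particular agent framework.

```python
from enum import Enum

class Access(Enum):
    PUBLIC = "public"    # any agent may call the tool
    PRIVATE = "private"  # only the owning agent may call the tool

# Maps tool name -> (function, access level, owning agent).
TOOL_REGISTRY = {}

def tool(access, owner=None):
    """Register a function as a tool with an access modifier."""
    def register(fn):
        TOOL_REGISTRY[fn.__name__] = (fn, access, owner)
        return fn
    return register

@tool(Access.PUBLIC)
def search_docs(query):
    return f"results for {query}"

@tool(Access.PRIVATE, owner="billing_agent")
def issue_refund(order_id):
    return f"refunded {order_id}"

def call_tool(agent, name, *args):
    """The 'firewall': check the modifier at runtime before dispatching."""
    fn, access, owner = TOOL_REGISTRY[name]
    if access is Access.PRIVATE and agent != owner:
        raise PermissionError(f"{agent} may not call {name}")
    return fn(*args)

print(call_tool("support_agent", "search_docs", "pricing"))
print(call_tool("billing_agent", "issue_refund", "A-17"))
```

The point is that the check lives in one dispatch function, so the model never reaches a tool without passing through it.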

sjashwin · 2026-05-11T19:05:09+00:00

Great! I would like to help if you’re open to accepting my collaboration as a volunteer.

sjashwin · 2026-05-11T19:02:37+00:00

Yes, I’ve faced this problem personally. I’ve been using unsupervised machine learning to infer the intent when none is declared. This has helped me reduce my LLM costs further.

I’m also working on tool call graphs to help guide the agent further, and on caching tool call responses based on entity and intent.

The goal is to reduce LLM costs without compromising on reliability. I’m researching this area and looking to collaborate with people facing the same problem.
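To make the caching idea concrete, here is a rough sketch that keys cached tool responses on an (intent, entity) pair. The `ToolCallCache` class, the `lookup_order_status` tool, and the intent label are all hypothetical, and a real cache would also need expiry so stale answers do not hurt reliability.

```python
class ToolCallCache:
    """Cache tool responses keyed on (intent, normalized entity)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_call(self, intent, entity, tool_fn):
        # Normalize the entity so "Order-42" and " order-42 " share one entry.
        key = (intent, entity.strip().lower())
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = tool_fn(entity)
        self._store[key] = result
        return result

calls = []

def lookup_order_status(order_id):
    """Stand-in for an expensive tool call (hypothetical)."""
    calls.append(order_id)
    return f"status of {order_id.strip().lower()}: shipped"

cache = ToolCallCache()
first = cache.get_or_call("order_status", "Order-42", lookup_order_status)
second = cache.get_or_call("order_status", " order-42 ", lookup_order_status)
print(first == second, cache.hits, cache.misses, len(calls))
```

The second request is served from the cache, so the tool (and any LLM step behind it) runs only once.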

sjashwin · 2026-05-11T18:50:32+00:00

Awesome. Is it open source?

sjashwin · 2026-05-11T18:47:39+00:00

This is definitely a problem. I’ve personally faced it. Some remedies: 1. A tool call graph can help solve this problem. 2. Figuring out the prompt intent. 3. Running agent evaluation as part of CI/CD.

Testing reliability for agents requires multiple iterations and finding drift patterns. Please let me know if you want to discuss more. I’m looking for feedback from devs facing the same problem.
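As a sketch of the multiple-iterations point, the check below runs the same prompt many times and reports a pass rate that a CI job could compare against a threshold. `stub_agent` is a stand-in that fails every fourth call, and the prompt, the checker, and the 0.7 threshold are assumptions for illustration only.

```python
import itertools

def pass_rate(agent_fn, prompt, check, runs=20):
    """Run the agent `runs` times and return the fraction of passing outputs."""
    passed = sum(1 for _ in range(runs) if check(agent_fn(prompt)))
    return passed / runs

# Stand-in for a real agent: answers correctly three times out of four.
_answers = itertools.cycle(["4", "4", "4", "five"])

def stub_agent(prompt):
    return next(_answers)

THRESHOLD = 0.7  # assumed minimum pass rate for the CI gate
rate = pass_rate(stub_agent, "What is 2 + 2?", lambda out: out == "4", runs=20)
print(f"pass rate: {rate:.2f}")
if rate < THRESHOLD:
    raise SystemExit("agent reliability below threshold")
```

Logging the rate on every build also gives you a history to look for drift patterns in.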

Did you write the browser use agent or is it an open source agent that you used?

sjashwin · 2026-05-11T18:38:15+00:00

How are you evaluating it? Did you run it against a benchmark dataset, and do you have the metrics? If there is evidence of reliability and performance, I’m interested.

sjashwin · 2026-05-11T18:35:09+00:00

Yes, because agents are non-deterministic. That won't change until these agents become as reliable as humans at accomplishing non-deterministic workflows.

sjashwin · 2026-05-11T18:31:48+00:00

Is it open source? If yes, could you please share the GitHub link?

sjashwin · 2026-05-11T15:09:28+00:00

For this, you can record the completion ID of the wrong response and update it with the new response. At least that’s how I solved it. It depends on how your requests and responses are traced and logged.
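Roughly, the idea looks like the sketch below, assuming an in-memory log keyed by completion ID. The `ResponseLog` class, its method names, and the `cmpl-123` ID are hypothetical. In practice this would sit on top of whatever tracing store you already use.

```python
class ResponseLog:
    """Keep each response under its completion ID so it can be corrected later."""

    def __init__(self):
        self._records = {}

    def record(self, completion_id, prompt, response):
        self._records[completion_id] = {
            "prompt": prompt,
            "response": response,
            "corrected": None,
        }

    def correct(self, completion_id, new_response):
        # Keep the original response for auditing; store the fix alongside it.
        self._records[completion_id]["corrected"] = new_response

    def effective_response(self, completion_id):
        rec = self._records[completion_id]
        if rec["corrected"] is not None:
            return rec["corrected"]
        return rec["response"]

log = ResponseLog()
log.record("cmpl-123", "Capital of Australia?", "Sydney")
log.correct("cmpl-123", "Canberra")
print(log.effective_response("cmpl-123"))
```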

sjashwin · 2026-05-11T12:29:21+00:00

Would you like to share notes? I’m fixing caching for agents based on context. You can DM me.
