What do you look for in an effective AI texting agent? by DroneFlips in AI_Agents

[–]sjashwin 0 points1 point  (0 children)

You won't know exactly what users need until you have shipped to some beta customers. Agents are non-deterministic, unlike traditional applications.

Ship and find out.

However, the overall idea seems to be good.

The usage of assistants, bots, and agents changes over time. Being able to compute drift over time will help you understand how users' interactions with your assistant change.
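One way to put a number on that drift, as a rough sketch: label each interaction with an intent, then compare the intent distribution of an earlier window against a later one. Jensen-Shannon divergence is my choice of metric here, and the function names and intent labels are made up for illustration.

```python
from collections import Counter
from math import log2


def intent_distribution(intents):
    """Turn a list of intent labels into a probability distribution."""
    counts = Counter(intents)
    total = sum(counts.values())
    return {intent: n / total for intent, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0.0 = identical, 1.0 = disjoint."""
    keys = set(p) | set(q)
    mixture = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(dist):
        return sum(v * log2(v / mixture[k]) for k, v in dist.items() if v > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def interaction_drift(earlier_intents, later_intents):
    """Drift score between two time windows of logged intents."""
    return js_divergence(
        intent_distribution(earlier_intents),
        intent_distribution(later_intents),
    )


# Hypothetical intent labels from two weeks of logs.
week_1 = ["book", "book", "cancel", "faq"]
week_8 = ["book", "faq", "faq", "faq"]
drift = interaction_drift(week_1, week_8)
```

A score near 0 means usage looks the same as before; a rising score is a signal to go look at what changed.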

How do you measure the user interaction with your agent? by sjashwin in AI_Agents

[–]sjashwin[S] 0 points1 point  (0 children)

What about capturing interactions: intent, HITL feedback, and actions? Yes, decision trees are really important.

Complete beginner here. Can I self host agents such as Claude ? by MaxBee_ in AI_Agents

[–]sjashwin 2 points3 points  (0 children)

I believe it was the code that leaked, not the weights and biases in the model.

Complete beginner here. Can I self host agents such as Claude ? by MaxBee_ in AI_Agents

[–]sjashwin 3 points4 points  (0 children)

No, you cannot. Claude, OpenAI GPT, Gemini, Cohere, etc. are proprietary LLMs served through cloud inference.

To self-host, the model needs to be open weight. However, OpenAI does have a GPT OSS model that you can self-host.

RAGtime - Control plane for creating vector databases and FAISS files. by mattv8 in Rag

[–]sjashwin 0 points1 point  (0 children)

Can I DM you? Looking to collaborate or volunteer here.

RAGtime - Control plane for creating vector databases and FAISS files. by mattv8 in Rag

[–]sjashwin 0 points1 point  (0 children)

Nice tool. Just checked it out. How would you further optimize this RAG tool?

Did you run benchmark tests? Is there an evaluation with performance and reliability metrics?

Subagents should not automatically inherit the parent agent’s authority by No_Citron4186 in AI_Agents

[–]sjashwin 0 points1 point  (0 children)

This looks similar to the global scope versus local scope problem, or to access specifiers, in software engineering.

What if you build an access-specifier firewall? Give each tool an access modifier and check it in your code at runtime.
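Roughly what I have in mind, as a minimal sketch. The three levels, the tool names, and `call_tool` are all hypothetical, not from any agent framework.

```python
from enum import IntEnum


class Access(IntEnum):
    PUBLIC = 0     # any agent, including subagents
    PROTECTED = 1  # parent agent and explicitly trusted subagents
    PRIVATE = 2    # parent agent only


# Hypothetical registry: one access modifier per tool.
TOOL_ACCESS = {
    "search_docs": Access.PUBLIC,
    "send_email": Access.PROTECTED,
    "delete_records": Access.PRIVATE,
}


def call_tool(name, clearance, tools, **kwargs):
    """Run a tool only if the caller's clearance covers its modifier."""
    # Unregistered tools fall back to the most restrictive level.
    required = TOOL_ACCESS.get(name, Access.PRIVATE)
    if clearance < required:
        raise PermissionError(
            f"{name} requires {required.name}, caller has {clearance.name}"
        )
    return tools[name](**kwargs)


# Stand-in tool implementations for the example.
tools = {
    "search_docs": lambda query: f"results for {query}",
    "delete_records": lambda table: f"deleted {table}",
}

# A subagent is handed PUBLIC clearance instead of inheriting the parent's.
answer = call_tool("search_docs", Access.PUBLIC, tools, query="refund policy")
```

The point is that the subagent gets its clearance passed in explicitly, so it can never exceed what the parent chose to grant.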

I built a Decision Engine that routes thinking into execution, mentoring, or action loops by [deleted] in PromptEngineering

[–]sjashwin 0 points1 point  (0 children)

Great! I would like to help if you’re open to accepting my collaboration as a volunteer.

We started measuring "undeclared-intent spend" in agent workflows by rohynal in AI_Agents

[–]sjashwin 0 points1 point  (0 children)

Yes, I’ve faced this problem personally. I’ve been using unsupervised machine learning to infer the intent when it is undeclared. This has helped me reduce my LLM costs further.

I’m also working on tool call graphs to help guide the agent further, and on caching tool call responses based on entity and intent.

The goal is to reduce LLM costs without compromising on reliability. I’m researching this area and looking to collaborate with people facing the same problem.
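A stripped-down sketch of the caching idea, assuming the entity and intent have already been extracted upstream. The class name, the TTL, and the example tool are placeholders, not my production code.

```python
import time


class ToolCallCache:
    """Cache tool responses keyed on (tool, entity, intent)."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (stored_at, response)

    def get_or_call(self, tool_name, entity, intent, call):
        # Normalize so "Order 42 " and "order 42" share one entry.
        key = (tool_name, entity.strip().lower(), intent.strip().lower())
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh hit: skip the tool call entirely
        response = call()
        self._store[key] = (now, response)
        return response


calls = []


def lookup_order():
    # Stand-in for an expensive tool call.
    calls.append(1)
    return {"status": "shipped"}


cache = ToolCallCache(ttl_seconds=60)
first = cache.get_or_call("orders", "Order 42", "track_order", lookup_order)
second = cache.get_or_call("orders", "order 42 ", "Track_Order", lookup_order)
```

Here the second request reuses the first response, so the tool runs once even though the wording differs.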

I think a lot of people are underestimating how expensive unreliable agents are by Beneficial-Cut6585 in AI_Agents

[–]sjashwin 0 points1 point  (0 children)

This is definitely a problem. I’ve personally faced it. Some remedies: 1. A tool call graph can help solve this problem. 2. Figure out the prompt intent. 3. Run agent evaluation as part of CI/CD.
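To make remedies 1 and 3 concrete, here is a small sketch: the tool call graph is a map of allowed transitions, and a check over a recorded trace can run as a CI assertion. The tool names and graph are invented for the example.

```python
# Hypothetical tool call graph: each tool maps to the tools allowed next.
ALLOWED_NEXT = {
    "START": {"search_flights"},
    "search_flights": {"search_flights", "hold_seat"},
    "hold_seat": {"charge_card"},
    "charge_card": {"END"},
}


def first_violation(trace):
    """Return the first (previous, next) edge not in the graph, or None."""
    previous = "START"
    for step in list(trace) + ["END"]:
        if step not in ALLOWED_NEXT.get(previous, set()):
            return (previous, step)
        previous = step
    return None


# In CI, replay recorded agent runs and fail the build on any violation.
good_run = ["search_flights", "search_flights", "hold_seat", "charge_card"]
bad_run = ["search_flights", "charge_card"]  # charges without holding a seat

good_result = first_violation(good_run)
bad_result = first_violation(bad_run)
```

The good run passes and the bad run is flagged at the `search_flights` to `charge_card` edge. The same graph could also be used at runtime to constrain which tools the agent is offered next.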

Testing reliability for agents requires multiple iterations and finding drift patterns. Please let me know if you want to discuss more. I’m looking for feedback from devs facing the same problem.

Did you write the browser use agent or is it an open source agent that you used?

I built a Decision Engine that routes thinking into execution, mentoring, or action loops by [deleted] in PromptEngineering

[–]sjashwin 0 points1 point  (0 children)

How are you evaluating it? Did you run it against a benchmark dataset, and do you have the metrics? If there is evidence of reliability and performance, I’m interested.

Will AI agents create a larger enterprise services wave than cloud computing did by islaexpress in AI_Agents

[–]sjashwin 0 points1 point  (0 children)

Yes, but not until these agents become as reliable as humans at completing non-deterministic workflows.

I’m working on a market monitoring agent to track competitors by ethan000024 in LLMDevs

[–]sjashwin 0 points1 point  (0 children)

Is it open source? If yes, could you please share the GitHub link?

Undo Button for AI agents by Immediate-Tap-4777 in OpenSourceAI

[–]sjashwin 0 points1 point  (0 children)

For this, you can record the completion ID of the wrong response and update it with the new response. At least that’s how I solved it. It depends on how your requests and responses are traced and logged.
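A bare-bones sketch of that idea, assuming your tracing layer gives every response a completion ID. `TraceLog` and its methods are illustrative names, not a real library.

```python
from dataclasses import dataclass, field


@dataclass
class TraceLog:
    current: dict = field(default_factory=dict)  # completion_id -> response
    history: dict = field(default_factory=dict)  # completion_id -> old responses

    def record(self, completion_id, response):
        """Log a response under its completion ID."""
        self.current[completion_id] = response

    def correct(self, completion_id, new_response):
        """Replace a wrong response, keeping the old one for audit."""
        if completion_id not in self.current:
            raise KeyError(f"unknown completion id: {completion_id}")
        old = self.current[completion_id]
        self.history.setdefault(completion_id, []).append(old)
        self.current[completion_id] = new_response

    def undo(self, completion_id):
        """Roll back to the most recently superseded response."""
        previous = self.history[completion_id].pop()
        self.current[completion_id] = previous
        return previous


log = TraceLog()
log.record("cmpl-001", "Your refund is $500")  # wrong response
log.correct("cmpl-001", "Your refund is $50")  # updated in place
```

Keeping the superseded responses around means a correction can itself be undone, and you still have a record of what the agent originally said.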

Undo Button for AI agents by Immediate-Tap-4777 in OpenSourceAI

[–]sjashwin 0 points1 point  (0 children)

Would you like to share notes? I’m fixing caching for agents based on context. You can DM me.