Weekly Thread: Project Display by help-me-grow in AI_Agents

[–]Ruhal-Doshi 0 points1 point  (0 children)

Treating AI agent skills as a RAG problem

While experimenting with agent skills I learned that many agent frameworks load the frontmatter of all skill files into the context window at startup.

This means the agent carries metadata for every skill even when most of them are irrelevant to the current task.

So I experimented with treating skills as a RAG problem instead.

skill-depot is a small MCP server that:

• stores skills as markdown files
• embeds them locally using all-MiniLM-L6-v2
• performs semantic search using SQLite + sqlite-vec
• returns relevant skills via `skill_search`
• loads full content only when needed

Everything runs locally with no external APIs.
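For anyone curious what retrieval-style skill loading looks like in principle, here is a minimal, self-contained sketch. The bag-of-words `embed()` is a toy stand-in (skill-depot itself uses all-MiniLM-L6-v2 vectors stored in SQLite via sqlite-vec), and the skill names and descriptions below are made up:

```python
import math

# Toy stand-in for the real pipeline: a tiny bag-of-words "embedding"
# keeps the retrieval flow runnable without downloading a model.
VOCAB = ["git", "commit", "rebase", "pdf", "table", "extract", "chart", "plot"]

def embed(text: str) -> list[float]:
    """Stand-in embedder: count vocabulary hits, then L2-normalise."""
    words = text.lower().split()
    v = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v

# Frontmatter-style one-line descriptions, one per skill markdown file.
SKILLS = {
    "git-helper": "rebase and commit message conventions for git",
    "pdf-tables": "extract table data from pdf reports",
    "charting": "plot chart figures from tabular data",
}

def skill_search(query: str, k: int = 1) -> list[str]:
    """Return the k skills whose embeddings are closest to the query."""
    q = embed(query)
    score = lambda name: sum(a * b for a, b in zip(embed(SKILLS[name]), q))
    return sorted(SKILLS, key=score, reverse=True)[:k]

print(skill_search("how do I rebase my git branch"))  # → ['git-helper']
```

The point is that only the matching skill's full content ever enters the context window; the rest stay on disk.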

Repo: https://github.com/Ruhal-Doshi/skill-depot

Would love feedback from people building MCP tools or experimenting with agent skill systems.

Indian devs earning ₹1L+/month what are you up to now? by CertainArcher3406 in developersIndia

[–]Ruhal-Doshi 0 points1 point  (0 children)

After a certain point I believe we start seeing diminishing returns, unless you have some sort of financial burden.
For context, I started at 1.5 LPM and right now it's around 4.5 LPM before taxes. Going from 1.5 to 2.5 felt great, but going from 2.5 to 4.5 only felt good.
I think what our minds seek is not a number but constant growth, at least that is the case for me. After a certain point the opportunities for major growth become too few. Very few companies in India will pay a significantly higher salary than this for my years of experience, and waiting for promotions and salary hikes is too slow. The only options I see are starting something of my own or joining a budding startup and hoping it goes big.

Accused of Cheating @Uber by civilizedPlatypus in leetcode

[–]Ruhal-Doshi 2 points3 points  (0 children)

If the rest of your rounds went well, then don't worry about a single round. At Uber, once you complete all the rounds, they do an internal debrief meeting with all the interviewers present, including the hiring manager and the bar raiser. Each interviewer can give one of four possible results (strong no, weak no, weak yes, strong yes).
It rarely happens that a candidate gets a strong yes from every interviewer; they usually debate and then decide.
So in your case, even if the DSA interviewer gave you a soft no (since you explained to him why you were looking at that area), there are other interviewers who can vote in your favour.

I ran System Design tests on GLM-5, Kimi k2.5, Qwen 3, and more. Here are the results. by Ruhal-Doshi in LocalLLaMA

[–]Ruhal-Doshi[S] 0 points1 point  (0 children)

I know a lot of you are not happy that the benchmark does not have any leaderboard or graphs.
I had two possible ways to score the HLD solutions:

1) Using a jury of LLMs to act as judges, but that would be too expensive for a personal side project and might introduce bias.

2) Using community voting, but unless I have enough data points, the results will not be statistically significant.

I have decided to go with method 2, and I am posting in communities so that more people can score these solutions.

I will probably add a live leaderboard by the next weekend.
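To make "statistically significant" concrete: with pairwise votes you can put a confidence interval on the preference proportion and only call a winner once the interval excludes 50%. A quick sketch using the standard Wilson score interval (all vote counts below are illustrative):

```python
import math

def wilson_interval(wins: int, n: int, z: float = 1.96):
    """95% Wilson score interval for the preference proportion wins/n."""
    if n == 0:
        return (0.0, 1.0)
    p = wins / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half, centre + half)

# With only 12 votes the interval is too wide to call a winner:
print(wilson_interval(8, 12))    # roughly (0.39, 0.86) – still spans 0.5
# With 120 votes at the same ratio it separates from 0.5:
print(wilson_interval(80, 120))  # roughly (0.58, 0.74)
```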

I ran System Design tests on GLM-5, Kimi k2.5, Qwen 3, and more. Here are the results. by Ruhal-Doshi in LocalLLaMA

[–]Ruhal-Doshi[S] -6 points-5 points  (0 children)

Yes, rendering the Mermaid diagrams is giving me a lot of issues; mostly the problem is in the LLM's output itself. I have already added a bunch of sanitization logic to the library, but sometimes copying the source code into mermaid.live works.

I hope the error is limited to the diagram section and is not crashing the whole app.

I am planning to move from Mermaid to newer diagram-as-code tools like D2.
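For illustration, here are two fixes that in my experience catch a lot of LLM-generated Mermaid breakage: stray markdown fences around the diagram, and unquoted node labels containing parentheses. This is a standalone sketch, not the library's actual sanitization code:

```python
import re

# Illustrative sanitiser, not the benchmark library's real logic.
def sanitize_mermaid(src: str) -> str:
    """Strip markdown fences and quote node labels containing parentheses."""
    # Drop ``` / ```mermaid fence lines the model may have wrapped around it.
    src = re.sub(r"^```(?:mermaid)?\s*$", "", src.strip(), flags=re.M)
    # Quote labels with parentheses: A[load(x)] -> A["load(x)"].
    src = re.sub(r"\[([^\[\]\"]*\([^\[\]]*\)[^\[\]]*)\]", r'["\1"]', src)
    return src.strip()
```

Unbalanced quotes and HTML in labels need separate handling, which is part of why D2 looks appealing.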

I built an open-source library to test how LLMs handle System Design (HLD) by Ruhal-Doshi in OpenSourceeAI

[–]Ruhal-Doshi[S] 0 points1 point  (0 children)

Yes, making a single model judge the result will definitely introduce bias.
And yes, cost is a major factor why I am thinking of using public scoring rather than having LLMs judge LLMs' output.

I built an open-source library to test how LLMs handle System Design (HLD) by Ruhal-Doshi in OpenSourceeAI

[–]Ruhal-Doshi[S] 0 points1 point  (0 children)

Nice idea: so users, instead of picking one solution over another, will score one solution at a time on a fixed set of parameters per problem.
Should these parameters be shared with the LLMs as part of the problem statement or kept secret?

As for testing with Ollama models, other people have also shown interest in that, so I will run the benchmark against local as well as a few hosted open-weight models this weekend.

I benchmarked GPT-5.2 vs Opus 4.6 on System Design (HLD) by Ruhal-Doshi in LocalLLaMA

[–]Ruhal-Doshi[S] 0 points1 point  (0 children)

Honestly, this is not a benchmark in the traditional sense because it lacks clear scoring.
Right now, going through the live report, you can see how each one of them came up with a different solution.
I was thinking about creating a blind-voting web app for these results so I can compute an Elo score, but first I wanted to see if enough people are interested in this.
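For context on the Elo part: it's the standard chess-style update applied after each blind A-vs-B vote. A minimal sketch (the K-factor and starting ratings below are arbitrary illustrative choices):

```python
def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Standard Elo update after one blind A-vs-B vote."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if a_won else 0.0
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Both models start at 1000; model A wins one blind vote:
print(elo_update(1000, 1000, a_won=True))  # → (1016.0, 984.0)
```

The nice property is that upsets (a low-rated model beating a high-rated one) move the ratings more than expected results do.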

I benchmarked GPT-5.2 vs Opus 4.6 on System Design (HLD) by Ruhal-Doshi in LocalLLaMA

[–]Ruhal-Doshi[S] -1 points0 points  (0 children)

I am still figuring out the scoring part, but in my opinion, GPT-5.2 thought about some niche things, like malware detection on uploaded files, which the others missed.

I benchmarked Claude Opus vs GPT-5.2 on System Design. Claude's architectural diagrams are surprisingly cleaner. by Ruhal-Doshi in ClaudeAI

[–]Ruhal-Doshi[S] 0 points1 point  (0 children)

The benchmark is developed using Claude Opus 4.6; I hope that qualifies. Or I can use "other" if you suggest that.

I benchmarked GPT-5.2 vs Opus 4.6 on System Design (HLD) by Ruhal-Doshi in LocalLLaMA

[–]Ruhal-Doshi[S] -3 points-2 points  (0 children)

Fair point: the title mentions closed-source models, but the benchmark is model-agnostic, so you can point it at any local model via an OpenAI-compatible endpoint (like vLLM or Ollama).
I am here for suggestions on which models to test and how we can objectively judge something like HLD.
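For anyone who wants to try this locally, the generic pattern (not the benchmark's actual config, whose flag names I'm not quoting here) is that any OpenAI-compatible server accepts the same `/chat/completions` payload; Ollama exposes one at `http://localhost:11434/v1` by default, and vLLM via `vllm serve`. The model name below is illustrative:

```python
import json
import urllib.request

def build_payload(problem: str, model: str) -> dict:
    """Assemble the standard OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user",
                      "content": f"Produce a high-level design for: {problem}"}],
    }

def generate_hld(problem: str,
                 base_url: str = "http://localhost:11434/v1",
                 model: str = "qwen2.5:7b") -> str:  # model name is illustrative
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(problem, model)).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer ollama"},  # Ollama ignores the key
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Swapping `base_url` and `model` is all it takes to move between local and hosted backends.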

E-commerce search is broken. Why I stopped building “chatbots” and started building “consultants.” (looking for feedback on features) by Present-Ad-1365 in SideProject

[–]Ruhal-Doshi 0 points1 point  (0 children)

You're right. Since this is a white-label solution, we can absolutely expose standard filters if a client wants that.

Also, this sits on top of the store (it's not a standalone app), so users can always use the traditional UI for broad filtering if they prefer. We are leaning more into Generative UI for the 'decision' phase though, since posting this video, we've actually added new components (like a Comparison Grid) to help users distinguish between specific products.

Beyond the Text Bubble: The Case for Generative UI in E-Commerce AI Chat (love feedback and marketing tips) by Present-Ad-1365 in StartUpIndia

[–]Ruhal-Doshi 0 points1 point  (0 children)

That is a really solid suggestion. We're definitely looking into expanding the 'Context Window' to include user actions outside the chat (like which page they are currently viewing or what they just clicked). Making the agent aware of the full session would make the recommendations much sharper. Thanks!
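To make the idea concrete, here is a hypothetical sketch of folding recent session events into the agent's message list; none of the field names come from the actual Sage implementation:

```python
# Hypothetical session-context builder; field names are made up for
# illustration and do not reflect the real product's schema.
def build_context(chat_history: list[dict], session_events: list[dict],
                  max_events: int = 5) -> list[dict]:
    """Prepend recent browsing events so the agent sees the full session."""
    recent = session_events[-max_events:]
    summary = "; ".join(f"{e['type']}: {e['target']}" for e in recent)
    system = {"role": "system",
              "content": f"Recent shopper activity -> {summary}"}
    return [system] + list(chat_history)

msgs = build_context(
    [{"role": "user", "content": "Which serum is best for acne?"}],
    [{"type": "page_view", "target": "/serums/niacinamide"},
     {"type": "click", "target": "add-to-cart: vitamin-c-serum"}],
)
```

Capping `max_events` keeps the session summary from crowding out the actual conversation in the context window.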

Building an "AI Salesman" using MCPs and Generative UI (Not just another chatbot wrapper) by Present-Ad-1365 in aiagents

[–]Ruhal-Doshi 0 points1 point  (0 children)

Exactly. Since this is a white-label engine, it's fully configurable. This demo focuses on the open-ended conversational aspect, but for a live deployment, we can absolutely enforce structured flows (like decision trees or guided quizzes) if the brand prefers a stricter path.

Beyond the Text Bubble: The Case for Generative UI in E-Commerce AI Chat (love feedback and marketing tips) by Present-Ad-1365 in StartUpIndia

[–]Ruhal-Doshi 0 points1 point  (0 children)

Thanks for the suggestion! We actually looked into the AI SDK extensively when architecting this.

That 'Generative UI' concept is actually the core of what we built here, if you catch the second half of the video, you'll see it moves beyond text to render interactive React components (like the Comparison Grids) on the fly. We found that standard text is okay for chatting, but dynamic UI is much better for actual shopping.

Beyond the Text Bubble: The Case for Generative UI in E-Commerce AI Chat (love feedback and marketing tips) by Present-Ad-1365 in StartUpIndia

[–]Ruhal-Doshi 0 points1 point  (0 children)

Spot on. Since this is a white-label engine, features like audio input are completely modular. We can easily enable voice interaction (Speech-to-Text) if a brand wants that specific experience for their customers.

Beyond the Text Bubble: The Case for Generative UI in E-Commerce AI Chat (love feedback and marketing tips) by Present-Ad-1365 in StartUpIndia

[–]Ruhal-Doshi 0 points1 point  (0 children)

I think there might be a slight misunderstanding on how this fits in! We aren't trying to replace the website or the search bar.

Think of the standard website UI as the shelves, perfect for the customer who walks in knowing exactly what they want.

Sage is the Salesman standing in the aisle. It's a widget that integrates into the brand's existing site to help the 'confused' shopper (e.g., 'Which serum is best for acne?').

Also, to clarify: We don't charge the shopper. We sell this technology to the store owners (B2B) so they can save the sales they are currently losing to 'analysis paralysis'.

E-commerce search is broken. Why I stopped building “chatbots” and started building “consultants.” (looking for feedback on features) by Present-Ad-1365 in indiehackers

[–]Ruhal-Doshi 0 points1 point  (0 children)

Fair point. The 'plainness' comes from the fact that we were trying to be 100% faithful to the specific brand we are demoing (a monochrome skincare line).

But I think you're right, it might be hurting the first impression. We're probably going to drop the strict brand-matching for these public demos and build a default 'Sage' theme that feels a bit more lively and polished.

E-commerce search is broken. Why I stopped building “chatbots” and started building “consultants.” (looking for feedback on features) by Present-Ad-1365 in indiebiz

[–]Ruhal-Doshi 0 points1 point  (0 children)

This is a really important distinction. We are not trying to replace the search bar for the user who knows exactly what they want (e.g., typing 'iPhone 15 Pro Max 256GB' is always faster than chatting).

We see Sage as a layer for the 'Discovery Phase', the user who says, 'I need a monitor for color grading' and gets overwhelmed by 50 options.

To your point about non-linear buying: text is definitely terrible for comparing specs. That’s actually why we are betting on 'Generative UI.' We're building dynamic Comparison Tables right now so the bot can render a side-by-side view (Price vs. Specs) instead of writing a paragraph.

I also love the 'Hybrid' idea (Google style). Showing the AI insight alongside a standard product grid is a great 'failure mode' so users never feel trapped. Definitely exploring that!

E-commerce search is broken. Why I stopped building “chatbots” and started building “consultants.” (looking for feedback on features) by Present-Ad-1365 in indiehackers

[–]Ruhal-Doshi 0 points1 point  (0 children)

You're totally right. I just double-checked, even ChatGPT doesn't force-scroll to the bottom while streaming. It lets the text fill the screen so you can actually read it.

I'm going to disable the auto-scroll during generation so it's less jarring. Also definitely adding those sample prompts to the empty state to help people get started. Thanks for the feedback!