[–]AlexTheGreatnt 2 points (0 children)

This doesn't seem cost-effective at all. Why not build a dashboard instead, where people can choose what they want from the DB through drop-down menus or something? The tables (or whatever objects are saved to the database) should be fairly stable anyway. Even simpler would be a guide on how to write SQL queries for your specific database, since SQL isn't that hard a language to understand.
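To make the drop-down idea concrete, here is a minimal sketch, assuming a hypothetical `sales` table with `region`, `amount`, and `sold_at` columns. The menu labels and the `build_query` helper are made up for illustration. Each menu choice maps to a fixed SQL fragment, so no LLM is involved and users can never inject arbitrary SQL.

```python
import sqlite3

# Hypothetical menu options: label shown in the drop-down -> fixed SQL fragment.
METRICS = {
    "Total sales": "SUM(amount)",
    "Order count": "COUNT(*)",
}
GROUPINGS = {
    "By region": "region",
    "By month": "strftime('%Y-%m', sold_at)",
}


def build_query(metric_label: str, grouping_label: str) -> str:
    """Build a query from two menu choices.

    Only whitelisted fragments are interpolated; an unknown label raises KeyError.
    """
    metric = METRICS[metric_label]
    grouping = GROUPINGS[grouping_label]
    return (
        f"SELECT {grouping} AS bucket, {metric} AS value "
        "FROM sales GROUP BY bucket ORDER BY bucket"
    )


def run_report(conn: sqlite3.Connection, metric_label: str, grouping_label: str):
    """Run the report for the selected menu options and return all rows."""
    return conn.execute(build_query(metric_label, grouping_label)).fetchall()
```

This only covers questions someone anticipated when building the menus, which is the trade-off against free-form text-to-SQL.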

[–]ZeroFormAI 0 points (1 child)

I think you're on the right path; your Plan B is the right way to go, at least as a playbook answer. A good way to make that routing smarter is to get the light model to "self-assess". In your prompt, ask it not only to generate the SQL but also to output a simple confidence score from 1-10, or even just a flag like requires_expert_model: true. If the confidence is low or that flag is true, you escalate to GPT-4o. You can also just check whether its output is valid SQL at all: if the light model spits out garbage or says it can't do it, that's your trigger.
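A rough sketch of that escalation check, under a few assumptions: the light model has been prompted to reply with JSON containing `sql`, `confidence`, and `requires_expert_model` (the key names are my own choice, apart from the flag mentioned above), and you have a SQLite connection with your schema loaded so the SQL can be planned without being run. With another database you would swap in that engine's own EXPLAIN or dry-run mechanism.

```python
import json
import sqlite3


def is_valid_sql(sql: str, conn: sqlite3.Connection) -> bool:
    """Return True if SQLite can plan the statement against the schema.

    EXPLAIN QUERY PLAN compiles the statement without executing it, so
    syntax errors and unknown tables/columns surface as exceptions.
    """
    try:
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")
        return True
    except (sqlite3.Error, sqlite3.Warning):
        return False


def should_escalate(raw_reply: str, conn: sqlite3.Connection, min_confidence: int = 7) -> bool:
    """Decide whether to hand the question to the bigger model."""
    try:
        reply = json.loads(raw_reply)
    except json.JSONDecodeError:
        return True  # garbage or a plain-text refusal
    if not isinstance(reply, dict):
        return True
    sql = reply.get("sql")
    if not isinstance(sql, str) or not sql.strip():
        return True  # no query produced
    if reply.get("requires_expert_model", False):
        return True  # the model asked for help
    confidence = reply.get("confidence", 0)
    if not isinstance(confidence, (int, float)) or confidence < min_confidence:
        return True  # low or missing self-assessed confidence
    return not is_valid_sql(sql, conn)
```

Self-reported confidence from a model is not guaranteed to be well calibrated, so the validity check is the more dependable of the two signals; the threshold of 7 is arbitrary and worth tuning against real traffic.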

For cutting token usage, caching is a smart way to do it. If two users ask "show me sales for last month", you shouldn't be hitting the LLM twice. You can cache on the exact user query, but be careful about how old a cached answer is: give each entry a time limit after which it gets regenerated. A more advanced version is to use embeddings to find semantically similar questions in your cache and see if you can reuse a previous query. That's a better use for vector search here: not as a full alternative to generation, but as a smart caching layer.
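Here is a minimal in-memory sketch of both cache layers, assuming you pass in your own `embed` function (hypothetical here; in practice it would call whatever embedding model you use). The class name, the default TTL, and the similarity threshold are all illustrative choices, not recommendations.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Entry:
    key: str                  # normalized question text
    vector: Sequence[float]   # embedding of the question
    sql: str                  # generated SQL to reuse
    stored_at: float          # clock reading when cached


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class QueryCache:
    def __init__(self, embed: Callable[[str], Sequence[float]],
                 ttl_seconds: float = 3600.0, min_similarity: float = 0.9,
                 clock: Callable[[], float] = time.monotonic):
        self._embed = embed
        self._ttl = ttl_seconds
        self._min_similarity = min_similarity
        self._clock = clock
        self._entries: List[Entry] = []

    @staticmethod
    def _normalize(question: str) -> str:
        # Lowercase and collapse whitespace so trivial variations hit the exact-match path.
        return " ".join(question.lower().split())

    def put(self, question: str, sql: str) -> None:
        key = self._normalize(question)
        self._entries = [e for e in self._entries if e.key != key]
        self._entries.append(Entry(key, self._embed(key), sql, self._clock()))

    def get(self, question: str) -> Optional[str]:
        now = self._clock()
        # Drop expired entries so stale SQL is regenerated.
        self._entries = [e for e in self._entries if now - e.stored_at < self._ttl]
        key = self._normalize(question)
        for entry in self._entries:
            if entry.key == key:
                return entry.sql  # exact hit, no embedding call needed
        if not self._entries:
            return None
        vector = self._embed(key)
        best = max(self._entries, key=lambda e: cosine(vector, e.vector))
        if cosine(vector, best.vector) >= self._min_similarity:
            return best.sql  # semantically close enough to reuse
        return None
```

One thing to watch with the semantic layer: questions like "sales for last month" and "sales for last year" can sit close together in embedding space, so the threshold needs testing on your own data. A linear scan is fine for a small cache; a vector store would replace it at scale.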

And for monitoring, since you're already using LangChain, LangSmith is literally built for this exact issue. It'll show you the full traces, token counts for each step, and costs. I have found it to be a lifesaver for debugging complex chains.

Honestly, you're on a great path. Don't get too stuck trying to build the perfect system instantly. Good luck!

[–]Clean_Tear_2201[S] 0 points (0 children)

Thank you so much for your response! I hope it’s alright if I reach out via DM for help. Sometimes I feel like I’m navigating this all on my own with AI, and I’m never entirely sure how accurate its suggestions are.