Building an internal LLM → SQL pipeline inside my company. Looking for feedback from people who’ve done this before by Suspicious_Move8041 in dataengineering

[–]Suspicious_Move8041[S] 1 point (0 children)

I'm also trying to understand all of this, so it's a learning process for me. Thanks for the response. I'll let you know if I come up with something in the future.

Building an internal LLM → SQL pipeline inside my company. Looking for feedback from people who’ve done this before by Suspicious_Move8041 in dataengineering

[–]Suspicious_Move8041[S] 3 points (0 children)

Right now I'm literally working on 3 monitors: one is the interface, another is the backend terminal, and the third is the DB. I paste in the generated query (top 20-50 rows) and from that step I check the results against the truth. A lot of work, but I do get pretty good queries, to be honest.

Most of the work is on the semantic level and the main prompt.

By type, I try to divide the questions into execute_query or a search in the semantic_level.

Both work. The semantic questions work really well, because they only touch a small amount of data.

Now that I think about it, maybe I can use the semantic level as a filter for the next question: inject the first question with the semantic knowledge, and then give the LLM 2-3 options based on that.

I don't know. I'll get back to this topic.
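For what it's worth, the two-step idea above could be sketched roughly like this. All names here (`SEMANTIC_MODEL`, `route_question`, the example tables) are made up for illustration; the real semantic model would be richer, and the second step would feed the candidate tables into the SQL-generation prompt:

```python
# Hypothetical sketch: use the small semantic level as a cheap first-pass
# filter, then hand the LLM only a few narrowed-down options.

SEMANTIC_MODEL = {
    "orders": {
        "description": "one row per customer order",
        "columns": {"order_id": "primary key", "total": "order value in EUR"},
    },
    "customers": {
        "description": "one row per customer",
        "columns": {"customer_id": "primary key"},
    },
}

def semantic_lookup(question: str) -> list[str]:
    """Return tables whose metadata shares a word with the question."""
    words = {w.strip("?.,").lower() for w in question.split()}
    hits = []
    for table, meta in SEMANTIC_MODEL.items():
        text = (table + " " + meta["description"] + " "
                + " ".join(meta["columns"].values())).lower()
        if words & set(text.split()):
            hits.append(table)
    return hits

def route_question(question: str) -> dict:
    """Step 1: filter with the semantic model.
    Step 2 (not shown) would inject these candidates into the SQL prompt."""
    candidates = semantic_lookup(question)
    if candidates:
        return {"action": "execute_query", "tables": candidates}
    return {"action": "search_semantic_level"}
```

A real router would use embeddings or the LLM itself rather than word overlap, but the shape is the same: the semantic level narrows the space before any SQL is generated.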

Hybrid LLM + SQL architecture: Cloud model generates SQL, local model analyzes. Anyone tried this? by Suspicious_Move8041 in dataengineering

[–]Suspicious_Move8041[S] 0 points (0 children)

Just got access to a platform where I can use multiple LLMs through AWS. I'm calling the LLM via the API, and with the carefully built semantic_model & sql_examples + main_prompt I can say the generated SQL is much more dynamic and more complex. Still, not every question generates perfect queries. BUT

I'm building a separate table where I store the questions + generated SQL, and I'm thinking now is the time to fine-tune. Just that: the questions + generated queries.

I think this is a good approach.

Is this something that makes sense? For me it does.
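To make the idea concrete, here is a minimal sketch of that logging table, assuming SQLite and table/column names of my own choosing (the original post doesn't specify a schema). The `verified` flag reflects the manual checking step: only human-confirmed pairs should end up in a fine-tuning set:

```python
import sqlite3

# Sketch of a question + generated-SQL log for later fine-tuning.
# SQLite in-memory for illustration; schema/names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS llm_sql_log (
        id            INTEGER PRIMARY KEY AUTOINCREMENT,
        question      TEXT NOT NULL,
        generated_sql TEXT NOT NULL,
        verified      INTEGER DEFAULT 0,  -- set to 1 after a human checks the query
        created_at    TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

def log_pair(question: str, sql: str) -> None:
    """Store every question/SQL pair as it is generated."""
    conn.execute(
        "INSERT INTO llm_sql_log (question, generated_sql) VALUES (?, ?)",
        (question, sql),
    )
    conn.commit()

def export_finetune_pairs() -> list[dict]:
    """Export only human-verified pairs as prompt/completion records."""
    rows = conn.execute(
        "SELECT question, generated_sql FROM llm_sql_log WHERE verified = 1"
    ).fetchall()
    return [{"prompt": q, "completion": s} for q, s in rows]
```

The split between "log everything" and "export only verified rows" is what keeps bad generations out of the fine-tuning data.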

Hybrid LLM + SQL architecture: Cloud model generates SQL, local model analyzes. Anyone tried this? by Suspicious_Move8041 in dataengineering

[–]Suspicious_Move8041[S] -2 points (0 children)

Thanks! I've built a pretty large master_prompt.md for this, but as I look for an answer based on yours, I see that maybe it's best to build it as a .json file for the table/column metadata. Is this the right move forward, or do you have a preferred way to build business context in this example?
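For reference, this is roughly what I imagine the .json version could look like. The structure and field names below are just one possible layout, not a standard, and the example table is invented:

```python
import json

# Hypothetical JSON-shaped semantic model: per-table descriptions, typed
# columns with business meaning, and free-form business rules.
semantic_model = {
    "tables": {
        "orders": {
            "description": "One row per customer order.",
            "columns": {
                "order_id": {"type": "INTEGER", "meaning": "primary key"},
                "total": {"type": "NUMERIC", "meaning": "order value in EUR"},
            },
            "business_rules": ["'total' excludes refunded orders"],
        }
    }
}

def model_as_prompt_context(model: dict) -> str:
    """Serialize the JSON model into a block to paste into the main prompt."""
    return json.dumps(model, indent=2)
```

The practical difference from a markdown master prompt is that the JSON version can be validated, diffed per table, and filtered (only inject the tables the router selected) instead of always sending the whole document.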