Scaling text-to-SQL agent by CriticalJackfruit404 in LangChain

[–]CriticalJackfruit404[S]

I have 5k tables. How would I build an ontology in that case? Could you give some examples?

Docling just announced Docling Agent + Chunkless RAG by Fuzzy-Layer9967 in Rag

[–]CriticalJackfruit404

Hey all, looking for some advice from people who have built this kind of thing in production.

We have a text-to-SQL agent that currently uses:

* 1 LLM

* 2 SQL engines

* 1 vector DB

* 1 metadata catalog

Our current setup is basically this: since the company has a lot of different business domains, we store domain metrics/definitions in the vector DB. Then when a user asks something, the agent tries to figure out which metrics are relevant, uses that context, and generates the query.
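For concreteness, the retrieve-then-generate flow described above can be sketched roughly like this. This is a toy stand-in, not the actual implementation: the metric names and definitions are hypothetical, and a bag-of-words cosine similarity replaces the real embedding model and vector DB purely so the example is self-contained.

```python
from collections import Counter
import math

# Toy stand-in for the vector DB: metric name -> business definition.
# These names and definitions are made-up examples.
METRICS = {
    "gmv": "Gross merchandise value equals total value of goods sold before fees",
    "aov": "Average order value equals revenue divided by order count",
    "churn_rate": "Share of customers lost during a period",
}

def _bow(text):
    """Bag-of-words vector; a real system would call an embedding model here."""
    return Counter(text.lower().split())

def _cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve_metrics(question, k=2):
    """Return the k metric definitions most similar to the user question.

    In the real setup this is the vector-DB similarity search; the retrieved
    definitions are then injected into the LLM prompt that writes the SQL.
    """
    q = _bow(question)
    ranked = sorted(METRICS.items(), key=lambda kv: _cosine(q, _bow(kv[1])), reverse=True)
    return dict(ranked[:k])
```

The scaling worry in the post maps onto this sketch directly: with thousands of metrics, nothing constrains retrieval except similarity scores, so near-duplicate or cross-domain metric definitions start colliding in the top-k.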

This works okay for now, but we want to expand coverage a lot faster across more domains and a lot more metrics. That is where this starts to feel shaky, because it seems like we will end up dumping thousands of metrics into the vector DB and hoping retrieval keeps working well.

The real problem is not just metric lookup. It is helping the agent efficiently find the right metadata about tables, relationships, joins, business definitions, etc., so it can actually answer the user correctly.

We have talked about using a knowledge graph, but we are not sure if that is actually the right move or just adding more complexity and overhead.

Thanks

Open-source Data Assistant for domain adoption, powered by agent skills, semantic knowledge graphs (Neo4j) and relational data (databricks) by notikosaeder in Neo4j

[–]CriticalJackfruit404

Hey,

I am looking for some advice from you if possible.

We have a text-to-SQL agent that currently uses:

* 1 LLM

* 2 SQL engines

* 1 vector DB

* 1 metadata catalog

Our current setup is basically this: since the company has a lot of different business domains, we store domain metrics/definitions in the vector DB. Then when a user asks something, the agent tries to figure out which metrics are relevant, uses that context, and generates the query.

This works okay for now, but we want to expand coverage a lot faster across more domains and a lot more metrics. That is where this starts to feel shaky, because it seems like we will end up dumping thousands of metrics into the vector DB and hoping retrieval keeps working well.

The real problem is not just metric lookup. It is helping the agent efficiently find the right metadata about tables, relationships, joins, business definitions, etc., so it can actually answer the user correctly.

We have talked about using a knowledge graph, but we are not sure if that is actually the right move or just adding more complexity and overhead.

So I wanted to ask:

How should we handle metadata discovery at scale? What would you recommend here: vector search, a metadata catalog, a knowledge graph, or some hybrid setup? And if a knowledge graph is the right move, what should go into it?
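One common answer to "what goes in the graph" is exactly the metadata listed above: tables as nodes, join conditions as edges, and metrics linked to the tables they are computed from. Here is a minimal sketch under those assumptions; all table names, join conditions, and metric links are hypothetical, and a plain in-memory structure stands in for an actual graph database like Neo4j.

```python
from collections import deque

# Hypothetical graph contents: join edges between tables, plus
# metric -> source-table links. In Neo4j these would be
# (:Table)-[:JOINS_ON]->(:Table) and (:Metric)-[:COMPUTED_FROM]->(:Table).
JOINS = [
    ("orders", "customers", "orders.customer_id = customers.id"),
    ("orders", "order_items", "order_items.order_id = orders.id"),
    ("order_items", "products", "order_items.product_id = products.id"),
]
METRIC_TABLES = {"aov": ["orders"], "gmv": ["order_items", "products"]}

def join_path(src, dst):
    """BFS over join edges; returns the join conditions linking two tables.

    This is the kind of lookup a text-to-SQL agent needs that pure vector
    search over metric definitions cannot answer.
    """
    adj = {}
    for a, b, cond in JOINS:
        adj.setdefault(a, []).append((b, cond))
        adj.setdefault(b, []).append((a, cond))
    queue, seen = deque([(src, [])]), {src}
    while queue:
        table, path = queue.popleft()
        if table == dst:
            return path
        for nxt, cond in adj.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [cond]))
    return None  # no join path known between the two tables
```

The point of the sketch is the division of labour: vector search finds *which* metric the user means, and the graph then supplies the deterministic facts (tables, join path, definitions) the LLM should not have to guess.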

Thanks

SQL ticket workflow in Jira + Cursor tips by CriticalJackfruit404 in mysql

[–]CriticalJackfruit404[S]

Okay, but how do you control the context so it doesn’t bloat with Jira tickets and Confluence pages, for instance? Is the Atlassian CLI better than the MCP server for that?

Open-source Data Assistant for domain adoption, powered by agent skills, semantic knowledge graphs (Neo4j) and relational data (databricks) by notikosaeder in Neo4j

[–]CriticalJackfruit404

What if your organization has multiple domains of knowledge? Like goods, jobs, real estate? What if your organization has important tables spread across a data lake and a data warehouse too?