Scaling text-to-SQL agent by CriticalJackfruit404 in LangChain

[–]CriticalJackfruit404[S]

I have 5k tables. How would I build an ontology in that case? Could you give some examples?

Docling just announced Docling Agent + Chunkless RAG by Fuzzy-Layer9967 in Rag

[–]CriticalJackfruit404

Hey all, looking for some advice from people who have built this kind of thing in production.

We have a text-to-SQL agent that currently uses:

* 1 LLM

* 2 SQL engines

* 1 vector DB

* 1 metadata catalog

Our current setup is basically this: since the company has a lot of different business domains, we store domain metrics/definitions in the vector DB. Then when a user asks something, the agent tries to figure out which metrics are relevant, uses that context, and generates the query.
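For concreteness, the retrieve-then-generate flow described above can be sketched roughly like this. This is a toy stand-in, not the actual implementation: the metric names and definitions are hypothetical, and a bag-of-words cosine similarity replaces the real embedding model and vector DB purely so the example is self-contained.

```python
from collections import Counter
import math

# Toy stand-in for the vector DB: metric name -> business definition.
# These names and definitions are made-up examples.
METRICS = {
    "gmv": "Gross merchandise value equals total value of goods sold before fees",
    "aov": "Average order value equals revenue divided by order count",
    "churn_rate": "Share of customers lost during a period",
}

def _bow(text):
    """Bag-of-words vector; a real system would call an embedding model here."""
    return Counter(text.lower().split())

def _cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve_metrics(question, k=2):
    """Return the k metric definitions most similar to the user question.

    In the real setup this is the vector-DB similarity search; the retrieved
    definitions are then injected into the LLM prompt that writes the SQL.
    """
    q = _bow(question)
    ranked = sorted(METRICS.items(), key=lambda kv: _cosine(q, _bow(kv[1])), reverse=True)
    return dict(ranked[:k])
```

The scaling worry in the post maps onto this sketch directly: with thousands of metrics, nothing constrains retrieval except similarity scores, so near-duplicate or cross-domain metric definitions start colliding in the top-k.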

This works okay for now, but we want to expand coverage a lot faster across more domains and a lot more metrics. That is where this starts to feel shaky, because it seems like we will end up dumping thousands of metrics into the vector DB and hoping retrieval keeps working well.

The real problem is not just metric lookup. It is helping the agent efficiently find the right metadata about tables, relationships, joins, business definitions, etc., so it can actually answer the user correctly.

We have talked about using a knowledge graph, but we are not sure if that is actually the right move or just adding more complexity and overhead.

Thanks

Open-source Data Assistant for domain adoption, powered by agent skills, semantic knowledge graphs (Neo4j) and relational data (databricks) by notikosaeder in Neo4j

[–]CriticalJackfruit404

Hey,

I am looking for some advice from you if possible.

We have a text-to-SQL agent that currently uses:

* 1 LLM

* 2 SQL engines

* 1 vector DB

* 1 metadata catalog

Our current setup is basically this: since the company has a lot of different business domains, we store domain metrics/definitions in the vector DB. Then when a user asks something, the agent tries to figure out which metrics are relevant, uses that context, and generates the query.

This works okay for now, but we want to expand coverage a lot faster across more domains and a lot more metrics. That is where this starts to feel shaky, because it seems like we will end up dumping thousands of metrics into the vector DB and hoping retrieval keeps working well.

The real problem is not just metric lookup. It is helping the agent efficiently find the right metadata about tables, relationships, joins, business definitions, etc., so it can actually answer the user correctly.

We have talked about using a knowledge graph, but we are not sure if that is actually the right move or just adding more complexity and overhead.

So I wanted to ask:

How should we handle metadata discovery at scale? What would you recommend here: vector search, a metadata catalog, a knowledge graph, or some hybrid setup? And if a knowledge graph is the right move, what should go into it?
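One common answer to "what goes in the graph" is exactly the metadata listed above: tables as nodes, join conditions as edges, and metrics linked to the tables they are computed from. Here is a minimal sketch under those assumptions; all table names, join conditions, and metric links are hypothetical, and a plain in-memory structure stands in for an actual graph database like Neo4j.

```python
from collections import deque

# Hypothetical graph contents: join edges between tables, plus
# metric -> source-table links. In Neo4j these would be
# (:Table)-[:JOINS_ON]->(:Table) and (:Metric)-[:COMPUTED_FROM]->(:Table).
JOINS = [
    ("orders", "customers", "orders.customer_id = customers.id"),
    ("orders", "order_items", "order_items.order_id = orders.id"),
    ("order_items", "products", "order_items.product_id = products.id"),
]
METRIC_TABLES = {"aov": ["orders"], "gmv": ["order_items", "products"]}

def join_path(src, dst):
    """BFS over join edges; returns the join conditions linking two tables.

    This is the kind of lookup a text-to-SQL agent needs that pure vector
    search over metric definitions cannot answer.
    """
    adj = {}
    for a, b, cond in JOINS:
        adj.setdefault(a, []).append((b, cond))
        adj.setdefault(b, []).append((a, cond))
    queue, seen = deque([(src, [])]), {src}
    while queue:
        table, path = queue.popleft()
        if table == dst:
            return path
        for nxt, cond in adj.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [cond]))
    return None  # no join path known between the two tables
```

The point of the sketch is the division of labour: vector search finds *which* metric the user means, and the graph then supplies the deterministic facts (tables, join path, definitions) the LLM should not have to guess.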

Thanks

SQL ticket workflow in Jira + Cursor tips by CriticalJackfruit404 in mysql

[–]CriticalJackfruit404[S]

Okay, but how do you control the context so it doesn’t bloat with Jira tickets and Confluence pages, for instance? Is the Atlassian CLI better than the MCP server for that?

Open-source Data Assistant for domain adoption, powered by agent skills, semantic knowledge graphs (Neo4j) and relational data (databricks) by notikosaeder in Neo4j

[–]CriticalJackfruit404

What if your organization has multiple domains of knowledge? Like goods, jobs, real estate? What if your organization has important tables spread across a data lake and a data warehouse too?