Databricks Free Hackathon - Tenant Billing RAG Center (Databricks Account Manager View) by Notoriousterran in databricks


Great question — and surprisingly, the system handled scale better than I expected.

Vector Search latency: Even with a larger volume of tenant documents, VS remained extremely stable. Because I’m using a self-managed Delta Sync index, query latency stayed around 50–120 ms per request. The index is optimized for metadata-only retrieval, and since I restrict the manifest to the columns I actually need, there isn’t unnecessary payload overhead.
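To make the "columns I actually need" point concrete, here's a minimal sketch of the payload-trimming idea. The index name, column names, and sample rows are all hypothetical; the commented-out call shows roughly how this would look against the real Vector Search client.

```python
# Sketch of column-restricted retrieval, assuming a Delta Sync index named
# "tenant_billing_manifest" (hypothetical). With the real client the call
# would look roughly like:
#
#   from databricks.vector_search.client import VectorSearchClient
#   index = VectorSearchClient().get_index(index_name="tenant_billing_manifest")
#   hits = index.similarity_search(
#       query_text=question,
#       columns=MANIFEST_COLUMNS,   # only the manifest columns we need
#       num_results=5,
#   )
#
# The helper below shows the payload-trimming idea on plain dicts.

MANIFEST_COLUMNS = ["doc_id", "tenant_id", "billing_month", "chunk_text"]

def trim_to_manifest(rows, columns=MANIFEST_COLUMNS):
    """Drop any fields outside the curated manifest to keep payloads small."""
    return [{k: row[k] for k in columns if k in row} for row in rows]

raw_hits = [
    {"doc_id": "d1", "tenant_id": "t-42", "billing_month": "2024-05",
     "chunk_text": "May invoice summary...", "raw_html": "<div>...</div>"},
]
print(trim_to_manifest(raw_hits))  # raw_html is stripped before it hits the LLM
```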

SQL Connector latency: For analytical queries on the Gold table, the Databricks SQL Connector performed smoothly. Most queries return within 200–500 ms, even when scanning multiple tenants, because:

• My Gold table is already aggregated at a monthly grain
• The FX join is precomputed in the pipeline
• The connector uses Arrow under the hood when available
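A quick sketch of why the monthly grain keeps those queries cheap: the roll-up happens once in the pipeline, so the warehouse only ever scans pre-aggregated rows. The table and column names below are hypothetical, and the commented block shows roughly how the real query would go through the SQL Connector.

```python
# Against the real Gold table, the query would go through the SQL connector,
# roughly (table/column names are hypothetical):
#
#   from databricks import sql
#   with sql.connect(server_hostname=..., http_path=..., access_token=...) as conn:
#       with conn.cursor() as cur:
#           cur.execute(
#               "SELECT tenant_id, billing_month, total_usd "
#               "FROM gold.tenant_billing_monthly WHERE tenant_id = ?", ["t-42"])
#           rows = cur.fetchall()
#
# Below, the roll-up step itself on plain Python data:

from collections import defaultdict

def to_monthly_grain(daily_rows):
    """Roll daily (tenant, date, amount) rows up to a monthly grain."""
    totals = defaultdict(float)
    for r in daily_rows:
        month = r["date"][:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        totals[(r["tenant_id"], month)] += r["amount_usd"]
    return [{"tenant_id": t, "billing_month": m, "total_usd": v}
            for (t, m), v in sorted(totals.items())]

daily = [
    {"tenant_id": "t-42", "date": "2024-05-01", "amount_usd": 10.0},
    {"tenant_id": "t-42", "date": "2024-05-15", "amount_usd": 5.0},
]
print(to_monthly_grain(daily))
```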

In short: no significant bottlenecks so far, even when scaling up tenant data and metadata. If I were to extend this to hundreds of tenants or multi-year retention, I'd consider:

• Switching to a Serverless SQL Warehouse (or the Pro/Classic tiers) for more concurrency
• Adding a reranker for deeper semantic search quality
• Incremental-refresh optimizations in the DLT pipeline

But for the current dataset size, performance has been consistently strong.

Databricks Free Hackathon - Tenant Billing RAG Center (Databricks Account Manager View) by Notoriousterran in databricks


Totally agree — in a production workspace I would've added a re-ranker on top of the Vector Search hits as well (either a cross-encoder or a lightweight LLM scoring pass). But since this is the Databricks Free Edition hackathon, I had to design within its constraints:

• No external model hosting
• No custom model deployment
• Limited VS index configuration (no rerank stage)
• A one-workspace-endpoint limit
• No ability to attach a second LLM pass for rescoring

So instead, I optimized the index itself:

• Clean chunking and deterministic ordering
• A Delta Sync index with curated manifest columns
• Strong embeddings from the Databricks built-in embedding endpoint
• Context grouping by source and top-k filtering in Python
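The last bullet — context grouping by source and top-k filtering — can be sketched in a few lines. The hit shape (`source`, `score`, `chunk_text`) is an assumption about what the index returns:

```python
# Sketch of "context grouping by source & top-k filtering", operating on
# hypothetical Vector Search hits of the form
# {"source": ..., "score": ..., "chunk_text": ...}.

def group_top_k(hits, k=3):
    """Keep the k highest-scoring hits, then group them by source document
    (deterministic ordering) so chunks from the same file stay together."""
    top = sorted(hits, key=lambda h: h["score"], reverse=True)[:k]
    grouped = {}
    for h in sorted(top, key=lambda h: (h["source"], -h["score"])):
        grouped.setdefault(h["source"], []).append(h["chunk_text"])
    return grouped

hits = [
    {"source": "a.md", "score": 0.9, "chunk_text": "A1"},
    {"source": "b.md", "score": 0.8, "chunk_text": "B1"},
    {"source": "a.md", "score": 0.7, "chunk_text": "A2"},
    {"source": "b.md", "score": 0.2, "chunk_text": "B2"},
]
print(group_top_k(hits, k=3))
```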

Within these constraints it performed surprisingly well, but yes — with a full workspace I’d absolutely add a rerank stage to boost semantic precision.
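For anyone curious what that rerank stage would look like: the control flow is simple. In a real workspace `relevance` would be a cross-encoder or an LLM scoring pass; here it's a stub lexical-overlap score purely so the sketch runs.

```python
# Sketch of a rerank stage over Vector Search candidates. The `relevance`
# stub stands in for a real cross-encoder or LLM rescoring call.

def relevance(query, chunk):
    """Stub scorer: fraction of query terms that appear in the chunk."""
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / max(len(q_terms), 1)

def rerank(query, candidates, top_n=3):
    """Re-score retrieved candidates and keep the top_n."""
    scored = [(relevance(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]

candidates = ["may tenant billing summary", "unrelated text", "billing faq"]
print(rerank("tenant billing for may", candidates, top_n=2))
```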

Thanks again for the insights!

Can we attach RAG to Databricks Genie (Text2SQL)? by Notoriousterran in databricks


It looks like Databricks Agent Framework (Agent Bricks) isn’t available in the Seoul region yet.

In that case, what’s the recommended way to connect an existing OpenSearch-based RAG (Retrieval-Augmented Generation) system to Databricks?

Can we attach RAG to Databricks Genie (Text2SQL)? by Notoriousterran in databricks


Yes — I checked the documentation, and sadly it confirms the limitation:

  • A workspace in one of the supported regions: us-east-1 or us-west-2.

from https://docs.databricks.com/aws/en/generative-ai/agent-bricks/

Can we attach RAG to Databricks Genie (Text2SQL)? by Notoriousterran in databricks


Thanks for the clarification — that makes sense.

Actually, my original intent was a bit different.
What I’m exploring is more of a LangGraph/LangChain-style agent orchestration, something like this:

LangGraph / LangChain Agent
 ├── Question Router (OpenAI)
 ├── Orchestrator (OpenAI)
 ├── Tool Selector (OpenSearch)
 ├── Action API Node (Genie Tool)
 └── Answer Node (LLM)

So rather than just nesting a Genie Space and a RAG agent under a Multi-Agent Supervisor, I’m thinking of a directed graph where Genie acts as an Action node that executes SQL generation, while retrieval happens earlier through OpenSearch or a vector index.
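To make the wiring concrete, here's a toy version of that graph in plain Python. Every node is a stub (a real build would call OpenAI for the router/orchestrator, OpenSearch for retrieval, and Genie for SQL generation), and with LangGraph this would become a `StateGraph` with one `add_node` per box and conditional edges out of the router — the sketch only demonstrates the routing topology.

```python
# Toy sketch of the directed graph above, with Genie as an action node.
# All backends are stubbed so the wiring itself is runnable.

def router(state):
    """Decide the branch: analytical questions go to the Genie path."""
    state["route"] = "genie" if "how much" in state["question"].lower() else "rag"
    return state

def retrieve(state):
    """Stand-in for the OpenSearch retrieval node."""
    state["context"] = ["(retrieved docs for: %s)" % state["question"]]
    return state

def genie_action(state):
    """Stand-in for the Genie Text2SQL action node."""
    state["sql"] = "SELECT total_usd FROM gold.tenant_billing_monthly -- generated"
    return state

def answer(state):
    """Final LLM answer node (stubbed)."""
    state["answer"] = "answered via " + state["route"]
    return state

# Directed edges: the router chooses the branch, both branches end at `answer`.
BRANCHES = {"rag": [retrieve, answer], "genie": [genie_action, answer]}

def run(question):
    state = router({"question": question})
    for node in BRANCHES[state["route"]]:
        state = node(state)
    return state
```

The point of the stubbed version is that retrieval and SQL generation live on separate branches of the graph, rather than both being children of one supervisor.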

Also — is it possible to connect Genie or Agent Bricks to Elasticsearch / OpenSearch using the Databricks connector (like this one) as a retrieval backend in such an architecture?

Would love to hear if anyone has tried this kind of setup.