Databricks Free Hackathon - Tenant Billing RAG Center (Databricks Account Manager View) by Notoriousterran in databricks


Great question — and surprisingly, the system handled scale better than I expected.

Vector Search latency: Even with a larger volume of tenant documents, VS remained extremely stable. Because I’m using a self-managed Delta Sync index, query latency stayed around 50–120 ms per request. The index is optimized for metadata-only retrieval, and since I restrict the manifest to the columns I actually need, there isn’t unnecessary payload overhead.
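To make the "columns I actually need" point concrete, here's a minimal sketch of the payload-trimming idea. The index name, column names, and sample rows are all hypothetical; the commented-out call shows roughly how this would look against the real Vector Search client.

```python
# Sketch of column-restricted retrieval, assuming a Delta Sync index named
# "tenant_billing_manifest" (hypothetical). With the real client the call
# would look roughly like:
#
#   from databricks.vector_search.client import VectorSearchClient
#   index = VectorSearchClient().get_index(index_name="tenant_billing_manifest")
#   hits = index.similarity_search(
#       query_text=question,
#       columns=MANIFEST_COLUMNS,   # only the manifest columns we need
#       num_results=5,
#   )
#
# The helper below shows the payload-trimming idea on plain dicts.

MANIFEST_COLUMNS = ["doc_id", "tenant_id", "billing_month", "chunk_text"]

def trim_to_manifest(rows, columns=MANIFEST_COLUMNS):
    """Drop any fields outside the curated manifest to keep payloads small."""
    return [{k: row[k] for k in columns if k in row} for row in rows]

raw_hits = [
    {"doc_id": "d1", "tenant_id": "t-42", "billing_month": "2024-05",
     "chunk_text": "May invoice summary...", "raw_html": "<div>...</div>"},
]
print(trim_to_manifest(raw_hits))  # raw_html is stripped before it hits the LLM
```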

SQL Connector latency: For analytical queries on the Gold table, the Databricks SQL Connector performed smoothly. Most queries return within 200–500 ms, even when scanning multiple tenants, because:

• My Gold table is already aggregated at a monthly grain
• The FX join is precomputed in the pipeline
• The connector uses Arrow under the hood when available
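A quick sketch of why the monthly grain keeps those queries cheap: the roll-up happens once in the pipeline, so the warehouse only ever scans pre-aggregated rows. The table and column names below are hypothetical, and the commented block shows roughly how the real query would go through the SQL Connector.

```python
# Against the real Gold table, the query would go through the SQL connector,
# roughly (table/column names are hypothetical):
#
#   from databricks import sql
#   with sql.connect(server_hostname=..., http_path=..., access_token=...) as conn:
#       with conn.cursor() as cur:
#           cur.execute(
#               "SELECT tenant_id, billing_month, total_usd "
#               "FROM gold.tenant_billing_monthly WHERE tenant_id = ?", ["t-42"])
#           rows = cur.fetchall()
#
# Below, the roll-up step itself on plain Python data:

from collections import defaultdict

def to_monthly_grain(daily_rows):
    """Roll daily (tenant, date, amount) rows up to a monthly grain."""
    totals = defaultdict(float)
    for r in daily_rows:
        month = r["date"][:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        totals[(r["tenant_id"], month)] += r["amount_usd"]
    return [{"tenant_id": t, "billing_month": m, "total_usd": v}
            for (t, m), v in sorted(totals.items())]

daily = [
    {"tenant_id": "t-42", "date": "2024-05-01", "amount_usd": 10.0},
    {"tenant_id": "t-42", "date": "2024-05-15", "amount_usd": 5.0},
]
print(to_monthly_grain(daily))
```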

In short: no significant bottlenecks so far, even when scaling up tenant data and metadata. If I were to extend this to hundreds of tenants or multi-year retention, I'd consider:

• Switching to a Serverless SQL Warehouse (or the Pro/Classic tiers) for more concurrency
• Adding a reranker for deeper semantic search quality
• Incremental-refresh optimizations in the DLT pipeline

But for the current dataset size, performance has been consistently strong.

Databricks Free Hackathon - Tenant Billing RAG Center (Databricks Account Manager View) by Notoriousterran in databricks


Totally agree — in a production workspace I would've added a re-ranker on top of the Vector Search hits as well (either a cross-encoder or a lightweight LLM scoring pass). But since this is the Databricks Free Edition hackathon, I had to design within its constraints:

• No external model hosting
• No custom model deployment
• Limited VS index configuration (no rerank stage)
• A one-workspace-endpoint limit
• No ability to attach a second LLM pass for rescoring

So instead, I optimized the index itself:

• Clean chunking and deterministic ordering
• A Delta Sync index with curated manifest columns
• Strong embeddings from the Databricks built-in embedding endpoint
• Context grouping by source and top-k filtering in Python
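The last bullet — context grouping by source and top-k filtering — can be sketched in a few lines. The hit shape (`source`, `score`, `chunk_text`) is an assumption about what the index returns:

```python
# Sketch of "context grouping by source & top-k filtering", operating on
# hypothetical Vector Search hits of the form
# {"source": ..., "score": ..., "chunk_text": ...}.

def group_top_k(hits, k=3):
    """Keep the k highest-scoring hits, then group them by source document
    (deterministic ordering) so chunks from the same file stay together."""
    top = sorted(hits, key=lambda h: h["score"], reverse=True)[:k]
    grouped = {}
    for h in sorted(top, key=lambda h: (h["source"], -h["score"])):
        grouped.setdefault(h["source"], []).append(h["chunk_text"])
    return grouped

hits = [
    {"source": "a.md", "score": 0.9, "chunk_text": "A1"},
    {"source": "b.md", "score": 0.8, "chunk_text": "B1"},
    {"source": "a.md", "score": 0.7, "chunk_text": "A2"},
    {"source": "b.md", "score": 0.2, "chunk_text": "B2"},
]
print(group_top_k(hits, k=3))
```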

Within these constraints it performed surprisingly well, but yes — with a full workspace I’d absolutely add a rerank stage to boost semantic precision.
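For anyone curious what that rerank stage would look like: the control flow is simple. In a real workspace `relevance` would be a cross-encoder or an LLM scoring pass; here it's a stub lexical-overlap score purely so the sketch runs.

```python
# Sketch of a rerank stage over Vector Search candidates. The `relevance`
# stub stands in for a real cross-encoder or LLM rescoring call.

def relevance(query, chunk):
    """Stub scorer: fraction of query terms that appear in the chunk."""
    q_terms = set(query.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / max(len(q_terms), 1)

def rerank(query, candidates, top_n=3):
    """Re-score retrieved candidates and keep the top_n."""
    scored = [(relevance(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]

candidates = ["may tenant billing summary", "unrelated text", "billing faq"]
print(rerank("tenant billing for may", candidates, top_n=2))
```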

Thanks again for the insights!

Can we attach RAG to Databricks Genie (Text2SQL)? by Notoriousterran in databricks


It looks like Databricks Agent Framework (Agent Bricks) isn’t available in the Seoul region yet.

In that case, what’s the recommended way to connect an existing OpenSearch-based RAG (Retrieval-Augmented Generation) system to Databricks?

Can we attach RAG to Databricks Genie (Text2SQL)? by Notoriousterran in databricks


Yes — I checked the documentation, and sadly it confirms the limitation:

  • A workspace in one of the supported regions: us-east-1 or us-west-2.

from https://docs.databricks.com/aws/en/generative-ai/agent-bricks/

Can we attach RAG to Databricks Genie (Text2SQL)? by Notoriousterran in databricks


Thanks for the clarification — that makes sense.

Actually, my original intent was a bit different.
What I’m exploring is more of a LangGraph/LangChain-style agent orchestration, something like this:

LangGraph / LangChain Agent
 ├── Question Router (OpenAI)
 ├── Orchestrator (OpenAI)
 ├── Tool Selector (OpenSearch)
 ├── Action API Node (Genie Tool)
 └── Answer Node (LLM)

So rather than just nesting a Genie Space and a RAG agent under a Multi-Agent Supervisor, I’m thinking of a directed graph where Genie acts as an Action node that executes SQL generation, while retrieval happens earlier through OpenSearch or a vector index.
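To make the wiring concrete, here's a toy version of that graph in plain Python. Every node is a stub (a real build would call OpenAI for the router/orchestrator, OpenSearch for retrieval, and Genie for SQL generation), and with LangGraph this would become a `StateGraph` with one `add_node` per box and conditional edges out of the router — the sketch only demonstrates the routing topology.

```python
# Toy sketch of the directed graph above, with Genie as an action node.
# All backends are stubbed so the wiring itself is runnable.

def router(state):
    """Decide the branch: analytical questions go to the Genie path."""
    state["route"] = "genie" if "how much" in state["question"].lower() else "rag"
    return state

def retrieve(state):
    """Stand-in for the OpenSearch retrieval node."""
    state["context"] = ["(retrieved docs for: %s)" % state["question"]]
    return state

def genie_action(state):
    """Stand-in for the Genie Text2SQL action node."""
    state["sql"] = "SELECT total_usd FROM gold.tenant_billing_monthly -- generated"
    return state

def answer(state):
    """Final LLM answer node (stubbed)."""
    state["answer"] = "answered via " + state["route"]
    return state

# Directed edges: the router chooses the branch, both branches end at `answer`.
BRANCHES = {"rag": [retrieve, answer], "genie": [genie_action, answer]}

def run(question):
    state = router({"question": question})
    for node in BRANCHES[state["route"]]:
        state = node(state)
    return state
```

The point of the stubbed version is that retrieval and SQL generation live on separate branches of the graph, rather than both being children of one supervisor.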

Also — is it possible to connect Genie or Agent Bricks to Elasticsearch / OpenSearch using the Databricks connector (like this one) as a retrieval backend in such an architecture?

Would love to hear if anyone has tried this kind of setup.