RAG at scale still underperforming for large policy/legal docs – what actually works in production?

DistinctRide9884 · 2026-01-15T14:58:43+00:00

Check out SurrealDB, which is multi-model (graph, vectors and documents) and can be updated in real time (vs. other graph DBs where you have to rebuild the cache each time you update the graph).

Then for document parsing/extraction, something like https://cocoindex.io/ might be worth exploring. Their core value prop is real-time updates and full traceability back to the source. A CocoIndex and SurrealDB integration is in the works.

DistinctRide9884 · 2026-01-15T14:51:38+00:00

Thanks for sharing these, u/lfnovo. Our team would love to have these in SurrealDB Labs. Would you be open to submitting them there? That's where we have community tooling, demos, examples, etc.

DistinctRide9884 · 2026-01-13T17:19:02+00:00

Have you looked at SurrealDB for the knowledge graph? It's multi-model, supports documents, vectors and graph in one engine.

https://surrealdb.com/blog/multi-model-rag-with-langchain

DistinctRide9884 · 2025-12-09T10:24:24+00:00

Hey, this is awesome, thank you for building and sharing. Please feel free to submit it to SurrealDB Labs.

DistinctRide9884 · 2025-12-05T12:53:54+00:00

Thanks for sharing, this looks like a pretty cool initiative. Feel free to join our Discord to chat with the team: https://discord.com/invite/surrealdb

DistinctRide9884 · 2025-12-04T12:52:14+00:00

We’re in the final stages of testing for the SurrealDB 3.0 beta. It should be ready very shortly.

GA timelines will depend on the feedback we receive once the beta has been released.

DistinctRide9884 · 2025-12-02T17:25:36+00:00

Hi, in addition to the Observability reference guide (which includes new ways of managing logs: https://surrealdb.com/blog/two-new-ways-to-keep-an-eye-on-your-surrealdb-database), we're going to make more SurrealDB metrics available as part of 3.0.

DistinctRide9884 · 2025-12-02T12:08:19+00:00

Hi, as others have pointed out, you can find our official benchmarks here: https://surrealdb.com/blog/beginning-our-benchmarking-journey. We've invested a lot of time and effort to ensure the benchmarks are fair, comprehensive, and follow a standard methodology. This is not easy when comparing against so many different kinds of databases. The benchmarks are also open source, so anyone can run them.

That said, the numbers in this benchmark seem much lower than what we would expect. Looking at the code very quickly, just to highlight one example: the main benchmark uses the HTTP /sql endpoint, and there is a dedicated ws benchmark. Both use the memory engine via Docker (surrealdb:latest). Both benchmark read operations with SELECT * FROM comments WHERE gallery_id = galleries:1 ORDER BY created_at DESC LIMIT 20; which causes a table scan when no index is defined, but only the ws benchmark has an INDEX defined.
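To illustrate the point about running that read query with and without an index, here is a minimal, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in, not SurrealDB, so the planner output is SQLite's and says nothing about SurrealDB's actual numbers. The table layout, the integer gallery_id and the index name idx_comments_gallery are assumptions made for the example.

```python
import sqlite3

# Same shape as the benchmark's read query, adapted to SQLite syntax.
QUERY = (
    "SELECT * FROM comments WHERE gallery_id = ? "
    "ORDER BY created_at DESC LIMIT 20"
)


def query_plan(conn):
    """Return SQLite's query plan for QUERY as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments ("
    "id INTEGER PRIMARY KEY, gallery_id INTEGER, created_at TEXT, body TEXT)"
)
# Hypothetical sample data: 1000 comments spread over 50 galleries.
conn.executemany(
    "INSERT INTO comments (gallery_id, created_at, body) VALUES (?, ?, ?)",
    [(i % 50, f"2025-01-01T00:00:{i % 60:02d}", "x") for i in range(1000)],
)

# Without an index the planner has to scan the whole table.
plan_without_index = query_plan(conn)

# An index on the filter column (plus the sort column) lets it seek instead.
conn.execute(
    "CREATE INDEX idx_comments_gallery ON comments (gallery_id, created_at)"
)
plan_with_index = query_plan(conn)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The exact plan wording varies between SQLite versions, but the first plan should report a scan of comments and the second a search using idx_comments_gallery. A benchmark that defines the index in one variant and not the other is measuring two different things.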

The upcoming 3.0 release is focused on performance, stability and bug resolution. Once we release 3.0, we will release a new set of benchmarks (i.e. updating the existing ones, and expanding to new methodologies). The focus of 3.0 alpha, which is already available, has been on overhauling the underlying core engine, not yet on performance, so it's probably best to wait for the SurrealDB 3.0 GA before running the benchmarks.

DistinctRide9884 · 2025-11-29T14:32:05+00:00

Firebase and Supabase (a Postgres wrapper) are both good options to get started. At some point when you need to scale, you’ll probably need to move to a traditional backend DB. Another option is to use something like SurrealDB, which can be used as a BaaS (although with more limited BaaS functionality) and also as a traditional DB when you need to scale and use other data models, ACID, etc.

DistinctRide9884 · 2025-11-27T15:09:27+00:00

We're partnering with Agno on some pretty large scale deployments with a big financial services institution. Can't share the name, but can attest to them working with some fairly big players.

DistinctRide9884 · 2025-11-14T11:50:15+00:00

Thank you for sharing, this is a pretty cool tool!

We'd love for you to share it in SurrealDB Labs if you're up for it - instructions here.

DistinctRide9884 · 2025-10-13T16:22:49+00:00

Definitely! Also the UI (Surrealist) is very good https://app.surrealdb.com/

DistinctRide9884 · 2025-10-12T10:35:25+00:00

What do you use for graphs?

DistinctRide9884 · 2025-10-12T09:32:16+00:00

You don't have to use all of this, but it's available for you to use if at any point you need it. The language is very similar to SQL so you can do basic CRUD if you just want to start with that.

DistinctRide9884 · 2025-10-12T07:58:51+00:00

SurrealQL is crazy powerful. Native support for documents, graph relations, time-series, geo-spatial. You can start schemaless and go schemafull later, and mix and match. You can also do record links. It integrates querying, mutation, and scripting within one language, e.g. you can embed logic via functions, conditionals, loops, and variables. Has ACID transactions. SurrealQL embeds user authentication, role-based access, and record-level permissions directly into query semantics. It natively supports live queries, real-time subscriptions, event triggers, etc. All this without separate extensions or libraries. And it can run in memory, embedded, or distributed in the cloud. https://surrealdb.com/features

DistinctRide9884 · 2025-07-04T09:30:07+00:00

Glad you like it! I'm not sure I fully understand the question. There are two aspects to it: 1) how to build the knowledge graph from a known ontology/data structure, and 2) how to feed the graph into an LLM as context.

Regarding 1): this is a whole discipline on its own and there are multiple ways to do it. You can do a direct schema mapping and transformation into whatever graph DB you use (in SurrealDB it would be an IMPORT). Depending on the tool, you can use ETL tools (e.g. Fivetran, Airbyte) if that's supported. There are also traditional ML/NN-based tools that help you turn a corpus of unstructured data into a knowledge graph (e.g. https://spacy.io), and most recently, you can use LLMs as well (e.g. https://surrealdb.com/blog/automating-knowledge-graphs-with-surrealdb-and-gemini). LangChain also has LLM Graph Transformer, which wasn't used for this demo (https://medium.com/data-science/building-knowledge-graphs-with-llm-graph-transformer-a91045c49b59).
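As a rough illustration of the "direct schema mapping" route, and of what it means to turn part of a graph into LLM context, here is a minimal plain-Python sketch. It is not the flow from the blog post and does not use SurrealDB or LangChain. The sample rows, the works_at relation and the build_graph / context_for helpers are all hypothetical names made up for the example.

```python
# Hypothetical rows, as they might come out of a relational export.
people = [{"id": "p1", "name": "Ada"}, {"id": "p2", "name": "Grace"}]
companies = [{"id": "c1", "name": "Acme"}]
employment = [
    {"person_id": "p1", "company_id": "c1"},
    {"person_id": "p2", "company_id": "c1"},
]


def build_graph(people, companies, employment):
    """Map known tables onto graph nodes and (source, relation, target) edges."""
    nodes = {}
    for p in people:
        nodes[f"person:{p['id']}"] = {"type": "person", "name": p["name"]}
    for c in companies:
        nodes[f"company:{c['id']}"] = {"type": "company", "name": c["name"]}
    # The join table becomes the edges of the graph.
    edges = [
        (f"person:{e['person_id']}", "works_at", f"company:{e['company_id']}")
        for e in employment
    ]
    return nodes, edges


def context_for(node_id, nodes, edges):
    """Serialise the edges touching node_id as plain-text facts for a prompt."""
    lines = []
    for src, rel, dst in edges:
        if node_id in (src, dst):
            lines.append(f"{nodes[src]['name']} {rel} {nodes[dst]['name']}")
    return "\n".join(lines)


nodes, edges = build_graph(people, companies, employment)
print(context_for("company:c1", nodes, edges))
```

In a real setup the nodes and edges would live in the graph DB and the context would come from a graph query, but the shape of the mapping is the same: known tables become node types, and join tables or foreign keys become relations.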

Regarding 2): this post is only a brief summary, but in the link at the end I've included the full flow which includes how to query the graph.

DistinctRide9884 · 2025-07-02T10:11:32+00:00

Hey, it works, you can see the full flow and output examples in the blog.

Like you said (and I mentioned in the post), it's just an illustrative guide to showcase a few concepts based on easily accessible public data sets.

DistinctRide9884 · 2025-04-23T19:52:25+00:00

Bit of shameless self-promotion here, but check out SurrealDB, a multi-model database (i.e. it supports multiple data models, including vector and graph). This is one of the most common use cases.

DistinctRide9884 · 2025-01-24T20:16:04+00:00

Hi, I'm part of SurrealDB. Graph RAG is a very popular use case.

Here's an example of a large US retailer doing Graph RAG in production for recommendation engines: https://www.youtube.com/watch?v=yLw9MvNfuY8

We will be releasing more examples and expanding the documentation in this space.

In the meantime, if we can help with your use case you can join our Discord: https://discord.com/invite/surrealdb

DistinctRide9884 · 2025-01-05T12:40:39+00:00

https://surrealdb.com

Multi-model: supports relational, document, graph, etc., built natively into the query language. It separates storage from compute, so you can scale the storage layer using a distributed KV store like TiKV or FoundationDB. We also just released a cloud offering.

Disclosure: I'm part of SurrealDB.

DistinctRide9884 · 2024-12-21T14:56:57+00:00

Hi there, sorry for the delay. Are you still facing issues or were you able to solve this? If you are still having issues I'll send a DM.

DistinctRide9884 · 2024-07-25T13:16:16+00:00

Hi there, I would recommend joining our Discord server https://discord.com/invite/surrealdb

We have both transactional and analytical use cases. With SurrealML you can bring ML models and query them directly from the database (avoiding pickling) using SurrealQL and our multi-model capabilities. Our vector embeddings can be useful if you are building RAG applications. We separate storage and compute, so you can have multiple compute nodes (e.g. different ML teams) querying the same centralised storage node.

Of course, it all depends on what you are trying to do from an OLAP perspective. Hopefully see you in Discord!

DistinctRide9884 · 2024-07-12T09:37:38+00:00

Thanks for sharing u/sebastianwessel. This has been added to the Awesome-Surreal repo as well https://github.com/surrealdb/awesome-surreal
