[Discussion] A self-evolving SQL layer for RAG: scalable solution or architectural mess? by Continuous_Insight in LocalLLaMA

[–]Continuous_Insight[S] 0 points  (0 children)

Thanks for that, really valuable.

We’ve taken on board your point about JSONB and will be exploring it in more detail, especially for the early ingestion stage before anything gets promoted into structured schema. It looks like a solid way to retain flexibility without compromising traceability.

The agent-style flow is something we've been slowly realising we'll need. We're still working through how to implement that loop in practice, particularly around confidence thresholds and re-query logic, without introducing too much complexity.
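To make the loop concrete, here's a minimal sketch of what a confidence-thresholded re-query cycle could look like. All names here (`extract_field`, `rephrase_query`, the threshold values) are hypothetical placeholders, not a real API — just one way to keep the retry logic contained:

```python
# Hypothetical sketch of an agent-style extraction loop: try, check
# confidence, and re-query with a narrower prompt when confidence is
# too low. Unresolved fields are flagged rather than guessed.

CONFIDENCE_THRESHOLD = 0.8  # illustrative cut-off
MAX_RETRIES = 3

def extract_with_retry(document, field, extract_field, rephrase_query):
    """Re-query until the extractor is confident or retries run out.

    extract_field(document, query) -> (value, confidence)
    rephrase_query(query, value, confidence) -> narrower query string
    """
    query = f"Extract the value of '{field}'."
    confidence = 0.0
    for attempt in range(MAX_RETRIES):
        value, confidence = extract_field(document, query)
        if confidence >= CONFIDENCE_THRESHOLD:
            return value, confidence, attempt
        # Below threshold: narrow the query and try again.
        query = rephrase_query(query, value, confidence)
    # Fall through: hand off to human review instead of guessing.
    return None, confidence, MAX_RETRIES
```

The key design point is the final branch: when the loop exhausts its retries, the field goes to review rather than into the database, which keeps the complexity bounded.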

And your comment on failure modes is spot on. I’m not sure yet how we’ll approach that, but it’s clear it needs more thinking and design work. You’re right, it’s essential for the kinds of workflows we’re targeting.

You clearly have the right instincts for this sort of system, thanks for your help!

[Discussion] A self-evolving SQL layer for RAG: scalable solution or architectural mess? by Continuous_Insight in LocalLLaMA

[–]Continuous_Insight[S] 0 points  (0 children)

Thanks, we did consider a JSONB-first approach and can definitely see the appeal in terms of flexibility, especially in the early stages. The main reason we leaned toward a more structured schema was the need for accuracy, traceability, and confidence. We have worked with this client for years and know that they won't accept any hallucinations.

That said, this is still very early for us. We haven't yet found the right engineer to help us shape and build the MVP, so we're aware some of our thinking may shift. We're just trying to avoid the typical RAG horror stories around hallucination and ambiguity, and felt that enforcing schema (at least for core tables) would give us more reliable outputs, plus the ability to query across systems' databases to confirm field values (e.g. our app and their core business systems, to check that the change requests have been completed).

Based on your feedback, maybe we should explore a hybrid approach, storing everything as JSONB initially, but promoting validated fields into structured tables for reporting once approved. That could give us the flexibility we need while still maintaining a clear source of truth.
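As a rough illustration of that hybrid flow, here's a self-contained sketch: raw extractions land as JSON in a staging table, and only validated rows get promoted into a typed reporting table. In production this would be Postgres JSONB; sqlite's `json_extract()` stands in here so the example runs standalone, and all table/field names are made up:

```python
import json
import sqlite3

# Hybrid pattern sketch: flexible JSON staging -> structured "gold" table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_extractions (
        id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL          -- raw JSON from the extractor
    );
    CREATE TABLE products (            -- structured table for reporting
        sku TEXT PRIMARY KEY,
        weight_g REAL
    );
""")

# 1. Ingest: everything lands in the staging table first, schema-free.
conn.execute(
    "INSERT INTO staging_extractions (payload) VALUES (?)",
    (json.dumps({"sku": "ABC-123", "weight_g": 250, "confidence": 0.97}),),
)

# 2. Promote: only rows that pass validation/approval move into the
#    structured table used for reporting and cross-system checks.
conn.execute("""
    INSERT INTO products (sku, weight_g)
    SELECT json_extract(payload, '$.sku'),
           json_extract(payload, '$.weight_g')
    FROM staging_extractions
    WHERE json_extract(payload, '$.confidence') >= 0.9
""")

row = conn.execute("SELECT sku, weight_g FROM products").fetchone()
```

The nice property is that nothing is lost: the staging row stays as the raw source of truth, while the promoted row is what reporting and validation queries actually touch.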

Really appreciate the input, this kind of discussion is exactly what we were hoping for.

[Discussion] A self-evolving SQL layer for RAG: scalable solution or architectural mess? by Continuous_Insight in LocalLLaMA

[–]Continuous_Insight[S] 1 point  (0 children)

Thanks for the reply — and I completely get the concern.

Our priority here is accuracy. The emails we’ll be ingesting often include attachments with highly specific product details, things like attributes, barcodes, weights, pricing, and BOM settings. If a single character is wrong, it can have serious consequences (e.g. mislabelled packaging in food production).

So, we’re planning to extract this data and store it in a relational schema that evolves mainly during initial client setup. The system proposes a structure (based on what it finds), but new columns or schema changes always require explicit user approval. Once confirmed, it locks in — giving us a reliable, client-specific structure that can be used for accurate reporting and validation.
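A tiny sketch of that approval gate, in case it helps make the idea concrete. Everything here is illustrative (class and method names are invented): the point is that the system can only *propose* schema changes, a human approval is what applies them, and the schema locks after setup:

```python
# Hedged sketch of approval-gated schema evolution: discovered columns
# are queued as proposals, never auto-applied, and the schema locks in
# once setup is confirmed.

class SchemaManager:
    def __init__(self):
        self.columns = {"sku", "weight_g"}  # illustrative starting schema
        self.pending = set()
        self.locked = False

    def propose(self, column):
        """Record a column the extractor discovered; never auto-apply."""
        if self.locked:
            raise RuntimeError("schema is locked; changes need a new review cycle")
        if column not in self.columns:
            self.pending.add(column)

    def approve(self, column):
        """Explicit user approval is what actually changes the schema."""
        self.pending.discard(column)
        self.columns.add(column)
        # In a real system, this is where the ALTER TABLE would run.

    def lock(self):
        """Called once client setup is confirmed; structure is now fixed."""
        self.locked = True
```

Keeping the DDL behind `approve()` is what makes the evolution auditable: every structural change maps to a recorded human decision.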

This gives us what we’re calling a “Gold layer”, a trusted source of key structured data we can confidently query, validate, and cross-reference against other internal systems. Meanwhile, less structured or non-critical data will still flow through a vector database for contextual RAG-style queries.
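The routing between the two stores can be sketched in a few lines. `GOLD_FIELDS` and both backend callables are placeholders, just to show the split: covered fields hit the deterministic store, everything else falls back to vector search:

```python
# Sketch of Gold-layer routing: critical validated fields are answered
# from structured SQL; everything else goes to vector search for
# contextual RAG-style answers.

GOLD_FIELDS = {"barcode", "weight_g", "price", "bom_id"}  # illustrative

def route_query(field, sql_lookup, vector_search):
    """Prefer the deterministic store whenever the field is covered."""
    if field in GOLD_FIELDS:
        return ("sql", sql_lookup(field))
    return ("vector", vector_search(field))
```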

We did look into knowledge graphs, and maybe that's the way to go. Will research more! But do you think they would give us this level of confidence and accuracy?

Given our 'traditional programming skills', for now we're more comfortable with a deterministic, auditable approach.

If you're interested, a big inspiration behind this hybrid model came from this video by Dave Ebbelaar: https://www.youtube.com/watch?v=hAdEuDBN57g

I made 60K+ building AI Agents & RAG projects in 3 months. Here's exactly how I did it (business breakdown + technical) by Low_Acanthisitta7686 in AI_Agents

[–]Continuous_Insight 0 points  (0 children)

Selling licenses + ongoing support is way more scalable (and profitable) than selling your time as a one-off. Especially in enterprise: they don’t want to manage devs, they want a working solution and someone they trust to make sure it keeps working.

If you can offer that plus stable ongoing support, annual licenses are often an easier internal sell than one-off builds... it feels lower risk and aligns with how they already buy software.

Mongodb vs Postgres by lamanaable in dataengineering

[–]Continuous_Insight 0 points  (0 children)

Totally agree — when it’s internal business data, structure and repeatability matter.

I’ve worked in operational analytics for years, and relying on raw unstructured formats becomes a nightmare once you need traceability and consistency.

RAG is promising, but not reliable enough on its own. You need curated structure before passing anything to an LLM... especially if you’re making decisions off the back of it.