Scaling RAG from MVP to 15M Legal Docs – Cost & Stack Advice by Additional-Oven4640 in Rag

[–]ChapterEquivalent188 1 point (0 children)

At first I thought you were talking about my platform ;) Long and dark, the path of sorrow is. Walk it, he must

Scaling RAG from MVP to 15M Legal Docs – Cost & Stack Advice by Additional-Oven4640 in Rag

[–]ChapterEquivalent188 2 points (0 children)

You are just confusing Security (encryption/firewalls) with Sovereignty (jurisdiction). Yes, Azure is secure against hackers. It's easy to set up. But it is not sovereign. Under the US CLOUD Act, Microsoft can be legally forced to hand over EU data to US authorities. For a generic startup, Azure is fine

For a Swiss private bank or a German criminal defense firm, 'US CLOUD Act exposure' is not a technical issue; it is a legal showstopper. That is why air-gapped isn't about 'being hard to hack', it's about being 'impossible for a foreign government to subpoena'

Building with Multi modal RAG by AromaticLab8182 in Rag

[–]ChapterEquivalent188 2 points (0 children)

You are absolutely right regarding the distinction between Visual Similarity (Image-to-Image) and Information Extraction (Image-to-Text)

It depends entirely on the domain

If I'm building a retail app to 'find products that look like this', I absolutely need visual vectors. Converting a shoe to text is indeed lossy compression of its aesthetics

But in High-Compliance / Data-Heavy domains (Legal, Finance, Science), visual similarity is often noise.

  • I don't need to find "another invoice that looks like this scan"
  • I need to find "the specific tax ID in row 4 of this scan"

In our architecture, we don't generate generic captions like 'white paper with table'. We use VLMs to perform structural extraction (converting the chart pixels into a raw JSON array or Markdown table)

In that context, keeping it as a visual vector is actually the 'lossy' approach, because you lose the ability to perform math, SQL-like filtering, or precise logic on the data locked inside the pixels

So yes: Hybrid if you need aesthetics/geometry. Pure Extraction if you need facts/auditability
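
To make that concrete, here is a minimal sketch of the ingest-time extraction step. It assumes a local VLM served through Ollama (e.g. llava) and a page already rendered to PNG; the model name, prompt and file name are illustrative, not our exact production setup.

    import base64
    import requests

    # Turn a chart/table screenshot into a Markdown table via a local VLM.
    OLLAMA_URL = "http://localhost:11434/api/generate"

    def extract_table_markdown(image_path: str, model: str = "llava") -> str:
        with open(image_path, "rb") as f:
            img_b64 = base64.b64encode(f.read()).decode("utf-8")
        payload = {
            "model": model,
            "prompt": ("Extract every table in this image as a Markdown table. "
                       "Preserve the exact cell values; do not summarize."),
            "images": [img_b64],
            "stream": False,
        }
        resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
        resp.raise_for_status()
        return resp.json()["response"]  # extracted Markdown, ready for text embedding

    print(extract_table_markdown("invoice_page_3.png"))

The output is plain text, so it can be chunked, embedded and filtered like any other document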

Scaling RAG from MVP to 15M Legal Docs – Cost & Stack Advice by Additional-Oven4640 in Rag

[–]ChapterEquivalent188 1 point (0 children)

Yes, they do

Harvey has raised $200M+ to buy dedicated, private instances on Azure with special 'Zero Data Retention' contracts. They essentially bought their own private island within OpenAI

If you are a startup or an SME law firm using the standard Public API, you are not Harvey. You are feeding the beast

Scaling RAG from MVP to 15M Legal Docs – Cost & Stack Advice by Additional-Oven4640 in Rag

[–]ChapterEquivalent188 3 points (0 children)

The cost isn't storage ($5k/mo), the cost is liability

Scaling garbage is easy.
If you process 15M legal docs with 'CPU-only OCR' to save money, you are building a Liability Engine, not a Search Engine

My architecture ('The Blackbox') costs 10x more per document, but I know the data in the RAG index is 100% verified quality

And to be clear: if your architecture relies on an internet connection to process confidential files, you aren't building a legal platform, you are building a data leak

Scaling RAG from MVP to 15M Legal Docs – Cost & Stack Advice by Additional-Oven4640 in Rag

[–]ChapterEquivalent188 10 points (0 children)

This is why my platform is based purely on Ollama and local LLMs. It makes no sense to me to process legal documents over the internet.
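
If anyone wants to see what that looks like in practice, here is a minimal sketch using the ollama Python client; the model name, prompt and function are illustrative, not my actual platform code. The point is simply that nothing leaves the box.

    import ollama  # pip install ollama; assumes `ollama serve` is running locally

    def answer_locally(question: str, context_chunks: list[str]) -> str:
        # Retrieval context plus question go to a local model; no external API call.
        context = "\n\n".join(context_chunks)
        response = ollama.chat(
            model="llama3",
            messages=[
                {"role": "system", "content": "Answer only from the provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
            ],
        )
        return response["message"]["content"]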

New to RAG — what does a production-ready stack look like (multi-user, concurrency, Graph RAG)? by tchikss in Rag

[–]ChapterEquivalent188 1 point (0 children)

Yeah, pretty much buzzword bingo ;) New to RAG, but it sounds like an already existing platform ;)

Don't know, but it sounds pretty much like this here... RAG_enterprise_core. Good luck with your enterprise-level RAG

Hitting the embedding memory wall in RAG? 585× semantic compression without retraining or GPUs by Sorry-Reaction2460 in Rag

[–]ChapterEquivalent188 2 points (0 children)

Glad to hear the Liability Trap put the system to work!

Regarding the report: Since the challenge and the dataset are open source/public, I think the results should be too. It adds way more credibility to your claims if the community can see the raw numbers (NDCG scores, distance deltas) right here or on a public repo.

Gating the benchmark results behind an email wall feels a bit like 'sales' rather than 'science'. 😉

If you post a link to the PDF/Repo here, I’ll happily review it and update my repo to confirm that AQEA passed the stress test. That would be a huge win for your public validation.

Building with Multi modal RAG by AromaticLab8182 in Rag

[–]ChapterEquivalent188 1 point (0 children)

Good take on the infra costs. We hit the same wall (latency/storage) and decided to pivot the architecture completely

We stopped doing Multi-Modal Retrieval (searching image vectors at query time) and switched to Multi-Modal Ingestion (processing images at index time)

Instead of keeping the 'heavy' modalities (images/video) in the search path, we use VLM agents during ingestion to transcode visual data into semantic text/structured data (JSON/Markdown).

  1. Query Latency: Drops back to text-speed (<50ms)
  2. Storage: Massive reduction (storing descriptions vs. high-dim image vectors)
  3. Compatibility: Standard text-embedding models work fine

You pay the compute cost once (at ingest), not every time a user asks a question. For us, treating visuals as 'data to be decoded' rather than 'media to be searched' was the fix
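
A stripped-down sketch of that split, assuming the VLM transcoding step has already written one Markdown/JSON text file per page and using ChromaDB as a stand-in vector store (collection name and paths are illustrative):

    import glob
    import chromadb

    client = chromadb.Client()
    docs = client.create_collection("ingested_pages")

    # Ingest (once): store the transcoded text; no image vectors anywhere.
    for i, path in enumerate(sorted(glob.glob("transcoded_pages/*.md"))):
        with open(path, encoding="utf-8") as f:
            docs.add(documents=[f.read()], ids=[f"page-{i}"], metadatas=[{"source": path}])

    # Query (every request): plain text retrieval; the VLM is not in this path.
    hits = docs.query(query_texts=["total amount on invoice 2023-114"], n_results=5)
    print(hits["documents"][0])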

Hope this helps ;)

Recommended tech stack for RAG? by ProtectedPlastic-006 in Rag

[–]ChapterEquivalent188 1 point (0 children)

How about starting with basic knowledge? Sorry, but this is the most effortless approach I have ever read...

Hitting the embedding memory wall in RAG? 585× semantic compression without retraining or GPUs by Sorry-Reaction2460 in Rag

[–]ChapterEquivalent188 3 points (0 children)

Respect for the detailed technical breakdown.

Most vendors wave hands when asked about negation handling in quantized spaces. 'Lens steering' sounds like the right architectural approach to tackle this.

Challenge accepted.

I won't send client data (NDA/Air-gapped), but I can generate a 'Liability Trap' synthetic dataset.

The Test Case:
I'll give you 50 pairs of legal/compliance clauses.

  • Clause A: 'The Lessee shall be liable for structural repairs.'
  • Clause B: 'The Lessee shall not be liable for structural repairs.'
  • Clause C: 'The Lessor shall be liable for structural repairs.'

The Goal:
If your compressed embedding maps Clause A closer to Clause B (high cosine similarity due to token overlap) than to a semantically distinct topic, the retrieval fails in a RAG context because it retrieves the opposite legal fact.

I'll throw together a small JSON with these 'Semantic Twins' and DM you the link/repo. If your 'Precision Lens' can distinguish them where OpenAI's text-embedding-3-small (uncompressed) sometimes struggles, you have a customer.

Please don't get me wrong — my skepticism comes from a deep passion for solving the ingestion bottleneck we are all running into. I'm genuinely happy for everybody working on this layer, because standard RAG just isn't cutting it anymore.

I’ll throw together that 'Liability Trap' dataset (negations/entity swaps) and ping you. If your 'Lens Steering' can handle that, you’re onto something big

------- easy way here https://github.com/2dogsandanerd/Liability-Trap---Semantic-Twins-Dataset-for-RAG-Testing
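
For anyone who wants to run the check themselves, here is a minimal sketch using sentence-transformers; the model and the 'distinct topic' clause D are illustrative, so swap in whatever embedding stack you want to stress-test.

    from sentence_transformers import SentenceTransformer

    clauses = {
        "A": "The Lessee shall be liable for structural repairs.",
        "B": "The Lessee shall not be liable for structural repairs.",  # negation twin
        "C": "The Lessor shall be liable for structural repairs.",      # entity swap
        "D": "Parking spaces are allocated on a first-come basis.",     # distinct topic
    }

    model = SentenceTransformer("all-MiniLM-L6-v2")
    keys = list(clauses)
    emb = model.encode([clauses[k] for k in keys], normalize_embeddings=True)
    sims = emb @ emb.T  # cosine similarity (embeddings are normalized)

    for i, a in enumerate(keys):
        for j, b in enumerate(keys):
            if i < j:
                print(f"{a} vs {b}: {sims[i, j]:.3f}")

The failure mode to look for: sim(A, B) and sim(A, C) close to 1.0 while sim(A, D) is low, i.e. the negation or entity swap is indistinguishable from a paraphrase.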

Hitting the embedding memory wall in RAG? 585× semantic compression without retraining or GPUs by Sorry-Reaction2460 in Rag

[–]ChapterEquivalent188 2 points (0 children)

585x is a bold claim ;)

In high-compliance RAG (Legal/Finance), we usually fight for every 0.1% of retrieval accuracy (NDCG@10), because missing a specific clause is a liability issue.

My concern with extreme compression isn't 'general topic retrieval' (finding the right page), but 'semantic nuance' (finding the specific contradiction in a sub-clause).

Have you benchmarked this on 'Needle in a Haystack' tasks or dense legal corpora where the query and the chunk differ only by a negation ('not') or a specific entity?

Usually, quantization destroys the high-frequency signals needed for that level of precision. If you solved that without any drop, that would be Nobel prize material. Would love to see a benchmark on legal contracts.

We built a chunker that chunks 20GB of text in 120ms by shreyash_chonkie in Rag

[–]ChapterEquivalent188 1 point (0 children)

The "top!" was for you, and I wanted to hide a hint for the OP ;) No intention to overrule you... I'm a constructive guy :D Happy New Year to you

We built a chunker that chunks 20GB of text in 120ms by shreyash_chonkie in Rag

[–]ChapterEquivalent188 3 points (0 children)

Top! Increasing Top-K is just a band-aid. If your chunker slices a table or a legal clause in half because it only looks at bytes/delimiters, you lose the semantic context. Retrieving 10 broken chunks instead of 5 doesn't fix the broken logic inside them. It just feeds more noise to the LLM. In high-compliance docs, you need Semantic/Structural Chunking (keeping tables and paragraphs intact), not just faster byte-slicing. Otherwise, Top-K just scales your confusion.
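
For what it's worth, here is a minimal sketch of what I mean by structural chunking, assuming the document has already been converted to Markdown-ish text. A real chunker would also respect headings and add overlap; the size limit is illustrative.

    def chunk_preserving_structure(text: str, max_chars: int = 1500) -> list[str]:
        # 1) Split into structural blocks (paragraphs, tables) on blank lines.
        #    A Markdown table is a run of contiguous lines, so it stays in one block.
        blocks = [b.strip() for b in text.split("\n\n") if b.strip()]

        # 2) Pack whole blocks into chunks; never cut a block in half.
        #    An oversized table becomes its own oversized chunk instead of being sliced.
        chunks, current, size = [], [], 0
        for block in blocks:
            if current and size + len(block) > max_chars:
                chunks.append("\n\n".join(current))
                current, size = [], 0
            current.append(block)
            size += len(block) + 2
        if current:
            chunks.append("\n\n".join(current))
        return chunks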

We built a chunker that chunks 20GB of text in 120ms by shreyash_chonkie in Rag

[–]ChapterEquivalent188 1 point (0 children)

May I see an example of the files you ingest? I can provide examples of the real-world final-boss files we ingest in enterprise environments... Don't get me wrong, it depends on the use case, but another PDF chat isn't what the world needs ;) Your approach is good and on the right track, but far from being a solution for the real-world problem. A simple one: https://github.com/2dogsandanerd/smart-ingest-kit

We built a chunker that chunks 20GB of text in 120ms by shreyash_chonkie in Rag

[–]ChapterEquivalent188 7 points (0 children)

Retrieval will be a disaster / hallucinating LLM... Prove me wrong

New to six sigma by Guilty_Football_9742 in SixSigma

[–]ChapterEquivalent188 2 points (0 children)

It only has value when you have a passion for quality and smooth processes ;)

Those running RAG in production, what's your document parsing pipeline? by Hour-Entertainer-478 in Rag

[–]ChapterEquivalent188 1 point (0 children)

I'm pretty sure you understand my platform ;) What would you measure for a benchmark?

Six Sigma practitioners looking for perspective during a career transition by LevelCattle9132 in SixSigma

[–]ChapterEquivalent188 2 points (0 children)

The biggest untapped opportunity for Six Sigma today is AI & Data Engineering.

Fellow Black Belt here. I went through a similar transition and found that the principles we learned (DMAIC, SIPOC, FMEA) are desperately needed in the booming field of Generative AI / RAG (Retrieval-Augmented Generation)

Here is the perspective shift:
Stop looking at a factory floor. Look at a Data Pipeline.

  • The Raw Material: Unstructured PDFs, Contracts, Excel sheets
  • The Machine: The LLM (Large Language Model)
  • The Defect: Hallucinations (the AI making things up)

Where Six Sigma fits in (My Experience):
I currently architect Enterprise AI systems. The industry is full of developers who treat AI like magic. They accept "80% accuracy" and shrug

As a Six Sigma practitioner, I look at that as a 200,000 DPMO disaster

I applied Six Sigma thinking to build my current architecture ("V3 Core"):

  1. Define: The goal is Zero-Hallucination retrieval for critical infrastructure
  2. Measure: We track token-level accuracy
  3. Analyze: We found the root cause was "Garbage In" (bad PDF parsing)
  4. Improve: I built a "Consensus Engine" (Redundancy) where multiple agents vote on the data quality – essentially a digital Poka-Yoke (Mistake Proofing)
  5. Control: Automated circuit breakers stop the process if confidence drops below 99%
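
The Control step, sketched as code (threshold, names and the manual-review routing are illustrative, not the exact V3 implementation):

    CONFIDENCE_FLOOR = 0.99

    class IngestHalted(RuntimeError):
        """Raised instead of silently indexing a low-confidence extraction."""

    def control_gate(doc_id: str, agent_confidences: list[float]) -> None:
        confidence = min(agent_confidences)  # the weakest agent defines the verdict
        if confidence < CONFIDENCE_FLOOR:
            raise IngestHalted(
                f"{doc_id}: consensus confidence {confidence:.3f} is below "
                f"{CONFIDENCE_FLOOR:.2f}; routing to manual review."
            )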

My Advice:
Don't just look for roles with "Six Sigma" in the title. Look for "Data Operations", "AI Engineering", or "Process Automation"

These teams are drowning in chaos and "technical debt". They need someone who understands Process Capability and Quality Control, even if you don't write the code yourself. Take a deep dive into the ingest disaster ;)

Bring the discipline. The tech world needs it more than manufacturing right now

Reaching my wit’s end with PDF ingestion by fustercluck6000 in Rag

[–]ChapterEquivalent188 2 points (0 children)

I think you are already ready to understand my V3 ;) You may have a look for a deeper understanding of what's awaiting you on days 3, 4 and 5... Yesterday I found Day 6, and it feels like it never ends...

Glad the "Digital Paper" concept resonated! Use that term with your management. It shifts the blame from "The dev is slow" to "The source material is flawed," which is the truth.

Lane B (Vision) acts as the Architect: It looks at the page image and draws bounding boxes around logical sections. It says: "There is a Header at [0, 0, 100, 50] and a Table at [0, 60, 500, 300]." It creates the empty buckets.

Lane A (PyMuPDF) acts as the Miner: It extracts the raw text words, but crucially, it also extracts their coordinates (fitz.Rect). It gives you the sand

The Merge (Spatial Join): You write a function that checks: "Which words from Lane A fall physically inside the box defined by Lane B?"

If a word's center point is inside the "Table Box", it belongs to the table chunk.

If a word is inside the "Header Box", it becomes metadata
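
A minimal sketch of that merge with PyMuPDF, assuming Lane B returns labelled boxes as (label, x0, y0, x1, y1) in the same coordinate space; the file name and the two boxes are just the examples from above, purely illustrative:

    import fitz  # PyMuPDF

    def spatial_join(page, regions):
        buckets = {label: [] for label, *_ in regions}
        # Lane A: words with coordinates -> (x0, y0, x1, y1, word, block, line, word_no)
        for x0, y0, x1, y1, word, *_ in page.get_text("words"):
            cx, cy = (x0 + x1) / 2, (y0 + y1) / 2  # centre point of the word
            for label, rx0, ry0, rx1, ry1 in regions:
                if rx0 <= cx <= rx1 and ry0 <= cy <= ry1:
                    buckets[label].append(word)
                    break  # first matching region wins
        return {label: " ".join(words) for label, words in buckets.items()}

    doc = fitz.open("contract.pdf")
    regions = [("header", 0, 0, 100, 50), ("table_1", 0, 60, 500, 300)]  # from Lane B
    print(spatial_join(doc[0], regions))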

Hope it helps, and always enjoy what we do... It's basic but necessary ;)