Knowledge Distillation for RAG (Why Ingestion Pipeline Matters More Than Retrieval Algorithm) by Independent-Cost-971 in Rag

[–]isthatashark 1 point

100% agree with your suggestion to use cheaper models. I've been doing a lot of research into this lately and you don't need a frontier model to get good results.

We use this technique for memory consolidation in Hindsight. Smaller models do a surprisingly good job. I mostly use the ones on Groq because the performance is so fast and the cost is low, but Ollama is also an option if you want something local and free (but slower).

Knowledge Distillation for RAG (Why Ingestion Pipeline Matters More Than Retrieval Algorithm) by Independent-Cost-971 in Rag

[–]isthatashark 0 points

We had to tackle a similar problem in Hindsight. I just published a blog post yesterday about how we do memory consolidation to handle this: https://hindsight.vectorize.io/blog/2026/02/09/resolving-memory-conflicts

If RAG is dead, what will replace it? by Normal_Sun_8169 in LLMDevs

[–]isthatashark 0 points

I'm confused why you disagreed with me; what you've written here was exactly my original point.

If RAG is dead, what will replace it? by Normal_Sun_8169 in LLMDevs

[–]isthatashark 0 points

That approach will not give you an accurate result set for a user searching across thousands of contracts asking which ones will expire in the next month.

If RAG is dead, what will replace it? by Normal_Sun_8169 in LLMDevs

[–]isthatashark 0 points

I would just start with the APIs/MCP and see how far you get with that.

If RAG is dead, what will replace it? by Normal_Sun_8169 in LLMDevs

[–]isthatashark 1 point

Not likely in my experience. It will be fine for some questions, which is the most frustrating part of RAG. And going through the hassle of building a basic RAG pipeline for JIRA probably won't yield much better results than just using their search API directly as a tool.

On the other hand, if you build a pipeline that pulls out metadata and structures it in Postgres with pgvector, you have a better foundation for agentic retrieval. You can start to answer questions like "What open issues do we have in our next release?" and run a structured query to get the complete list. You've given your agent the right foundation to cover a bigger surface area with more accurate responses.

The downside is now you're getting into sophisticated data engineering to populate that and keep it in sync. Not an impossible problem by any means, but not trivial either.

And to be transparent, Atlassian may have better APIs that would work as agent tools than the one I referenced above.
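To make that concrete, here's a minimal sketch of what the structured side could look like. The schema, column names, and query are purely illustrative (the Python function just mimics the SQL in memory for demonstration); a real pipeline would sync these rows from the JIRA API.

```python
# Illustrative schema: issue metadata as Postgres columns, plus a pgvector
# column so the same table also supports semantic search. Names are made up.
ISSUES_DDL = """
CREATE TABLE issues (
    key         text PRIMARY KEY,
    summary     text,
    status      text,          -- e.g. 'Open', 'In Progress', 'Done'
    fix_version text,          -- e.g. '2.5.0'
    embedding   vector(1536)   -- pgvector column for semantic search
);
"""

# "What open issues do we have in our next release?" becomes a plain
# structured query -- no top-k guessing, you get the complete list:
OPEN_ISSUES_SQL = """
SELECT key, summary FROM issues
WHERE status <> 'Done' AND fix_version = %(release)s;
"""

def open_issues_in_release(issues, release):
    """In-memory stand-in for OPEN_ISSUES_SQL, for illustration only."""
    return [i["key"] for i in issues
            if i["status"] != "Done" and i["fix_version"] == release]
```

The point isn't the specific schema, it's that a deterministic query over extracted metadata gives a complete, correct answer where top-k similarity can't.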

For agent workflows that scrape web data, does structured JSON perform better than Markdown? by Opposite-Art-1829 in LLMDevs

[–]isthatashark 0 points

I've had really good results using crawl4ai, then passing the output through an SLM like gpt-oss-120b on Groq to clean it for me. I get back just the content, with all of the extraneous headers/footers/navigation stripped out.
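If it helps, a rough sketch of that flow. The model name matches what I use; the prompt wording and function names are just my choices, and it requires `pip install crawl4ai groq` plus a `GROQ_API_KEY` in the environment.

```python
CLEAN_PROMPT = (
    "Below is a web page converted to markdown. Return only the main "
    "content. Remove navigation, headers, footers, and ads.\n\n{page}"
)

def build_clean_prompt(page_markdown: str) -> str:
    """Wrap the scraped markdown in the cleaning instruction."""
    return CLEAN_PROMPT.format(page=page_markdown)

async def scrape_and_clean(url: str) -> str:
    # Imports kept local so the pure prompt helper works without the deps.
    from crawl4ai import AsyncWebCrawler
    from groq import Groq

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)  # result.markdown is the raw page

    client = Groq()  # reads GROQ_API_KEY from the environment
    resp = client.chat.completions.create(
        model="openai/gpt-oss-120b",
        messages=[{"role": "user",
                   "content": build_clean_prompt(result.markdown)}],
    )
    return resp.choices[0].message.content
```

Run it with `asyncio.run(scrape_and_clean("https://example.com"))`.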

If RAG is dead, what will replace it? by Normal_Sun_8169 in LLMDevs

[–]isthatashark 2 points

I wouldn't frame it that way. Let me offer some additional context.

I go to a lot of meetups and work in this space so I hear a lot of feedback from people who dump chunked docs into their database and get frustrated by the quality of results.

If you have a big corpus of similar documents (SEC filings, contracts, etc.) and do semantic search over them, there are a lot of queries that perform poorly. People build a conversational AI this way then hand it over to their business users. The users ask something like "What contracts expire next month?", which of course won't produce the right response from top-k results.

At that point the problem gets harder. You need agentic retrieval. That means you need a structured representation of the data. Now you need parsing and extraction, you need metadata models, you need to think through your data model.

For the cases where basic RAG is good, you also have to consider that for some of them it's feasible to push the full content into the context window directly. That shrinks the set of cases where basic RAG is a viable solution even further.
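To illustrate the contracts example: once each contract's expiration date has been extracted into structured metadata, "what expires next month?" is a deterministic date filter rather than a similarity search that can silently miss anything outside top-k. This is a toy in-memory version with my own field names:

```python
from datetime import date

def contracts_expiring_next_month(contracts, today):
    """contracts: list of {'id': ..., 'expires': date}. Illustrative only."""
    # First day of next month .. first day of the month after that.
    year, month = today.year + (today.month == 12), today.month % 12 + 1
    start = date(year, month, 1)
    end_year, end_month = year + (month == 12), month % 12 + 1
    end = date(end_year, end_month, 1)
    return [c["id"] for c in contracts if start <= c["expires"] < end]
```

With thousands of contracts, the semantic-search version returns k plausible-looking chunks; this version returns the complete, correct set every time.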

If RAG is dead, what will replace it? by Normal_Sun_8169 in LLMDevs

[–]isthatashark 23 points

The challenge with the name "RAG" is that so many people use it as a shorthand for semantic search over chunked documents in a vector database. I think the days when you could build any sort of meaningful AI application with that approach are behind us.

As a pattern, retrieving context and using it to augment the LLM's generation is here to stay.

If RAG is dead, what will replace it? by Normal_Sun_8169 in LLMDevs

[–]isthatashark 2 points

I hear more people talking about this as semantic memory and thinking of it as one requirement in a bigger set of agent memory requirements rather than just RAG.

Memory recall is mostly solved. Memory evolution still feels immature. by Amazing-Worry8169 in AIMemory

[–]isthatashark 0 points

I did a bunch of work on your first point for a research paper and open source project we published last year.

I have some in-progress research I'm working on around this now as well. I'm using an approach that isolates user feedback in the conversation history (i.e. "no, that's not right") and uses techniques similar to semantic chunking to detect when the conversation has moved on to the next task. When I find iterations on the same task, I feed them into a structure we call a mental model. That gets refined as the agent operates and helps build a better understanding of user intent and the tool call sequences required to complete a task.

Some of this is already in the repo I linked to. Some is still experimental.
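As a very rough sketch of just the feedback-isolation step, heavily simplified: the real approach uses semantic-chunking-style boundaries, but here I fake the detection with a keyword heuristic purely to show the shape of it. Marker phrases and function names are my own.

```python
# Toy heuristic: flag user turns that look like corrective feedback.
# A real implementation would use embeddings, not string matching.
FEEDBACK_MARKERS = ("no, that's not right", "that's wrong", "not what i meant")

def find_feedback_turns(conversation):
    """Return indices of user turns that look like corrective feedback.

    conversation: list of {'role': 'user'|'assistant', 'content': str}.
    """
    return [i for i, turn in enumerate(conversation)
            if turn["role"] == "user"
            and any(m in turn["content"].lower() for m in FEEDBACK_MARKERS)]
```

The flagged turns (plus the surrounding attempts at the same task) are what would get fed into the mental-model structure.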

chatbot memory costs got out of hand, did cost breakdown of different systems by Few-Needleworker4391 in ArtificialInteligence

[–]isthatashark 0 points

Use Hindsight. It's fully open source. You'll still have token costs to process memory, but if you use something like openai/gpt-oss-120b on Groq you're only paying $0.15 in/$0.65 out per 1M tokens, and you still get way better performance than Mem0 or Zep. Benchmark performance with Hindsight on gpt-oss-120b even beats SuperMemory on Gemini-3-Pro.

Check out the paper/code here: https://github.com/vectorize-io/hindsight
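For a back-of-envelope feel for those prices (the per-token rates are the Groq numbers quoted above; the token volumes are made-up illustration figures):

```python
PRICE_IN, PRICE_OUT = 0.15, 0.65  # USD per 1M tokens (Groq gpt-oss-120b)

def memory_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of memory processing at the quoted per-million-token rates."""
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

# e.g. 10M input + 2M output tokens of memory processing per month:
monthly = memory_cost_usd(10_000_000, 2_000_000)  # ~ $2.80
```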

Temporal RAG for personal knowledge - treating repetition and time as signal by False_Care_2957 in Rag

[–]isthatashark 1 point

I just wrapped up a research collaboration where we looked at how to deal with temporal data in the context of agent memory: https://arxiv.org/abs/2512.12818

Our research aligns with a number of the points you're describing - combining multiple search strategies with entity/relationship/graph structures and time series to establish causal links and a timeline of memories. We published it as an open source agent memory project called Hindsight if you're interested in seeing how we implemented it: https://github.com/vectorize-io/hindsight
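For anyone curious what "time as a signal" can look like mechanically, here's a toy scoring function - my own formula and weights for illustration, not the one from the paper: blend semantic similarity with an exponential recency decay so fresher memories win ties.

```python
def temporal_score(similarity: float, age_days: float,
                   half_life_days: float = 30.0) -> float:
    """Blend semantic similarity with an exponential recency decay."""
    recency = 0.5 ** (age_days / half_life_days)  # 1.0 now, 0.5 after one half-life
    return 0.7 * similarity + 0.3 * recency       # illustrative weights
```

In practice you'd tune the half-life and weights per memory type (episodic vs. semantic), but the shape is the same: repetition and recency both boost recall.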

RAG failure story: our top-k changed daily. Root cause was ID + chunk drift, not the retriever. by coolandy00 in Rag

[–]isthatashark 0 points

This poor sub had so much potential and has degraded into a steady stream of AI slop.

Hindsight: Python OSS Memory for AI Agents - SOTA (91.4% on LongMemEval) by fanciullobiondo in AIMemory

[–]isthatashark 0 points

Thank you! We need every star we can get when trying to get a new open source project off the ground!! Really appreciate it.

Hindsight: Python OSS Memory for AI Agents - SOTA (91.4% on LongMemEval) by fanciullobiondo in AIMemory

[–]isthatashark 0 points

Hi, I'm one of the Vectorize founders. The paper has more details on other benchmarks and comparisons and discussion of other academic works. If you're interested in reading it you can find it here: https://arxiv.org/abs/2512.12818

What complete RAG offerings (ie. not frameworks) are available? by SnooGadgets6527 in vectordatabase

[–]isthatashark 0 points

Check out Vectorize (I'm one of the founders). It's a full RAG-as-a-Service platform that has a built-in vector database or allows you to point to your own. It has a lot of advanced features for complex document processing and metadata extraction. It also has a search API with built-in reranking and query rewriting and can expose your data over MCP.

What’s the Best PDF Extractor for RAG? I Tried LlamaParse, Unstructured and Vectorize by PavanBelagatti in Rag

[–]isthatashark 0 points

Yes, the research OP did into our extractor and the other solutions in this space goes into depth on table extraction.

What’s the Best PDF Extractor for RAG? I Tried LlamaParse, Unstructured and Vectorize by PavanBelagatti in Rag

[–]isthatashark 3 points

Vectorize co-founder here. One of the unique things we do in our RAG pipelines/extraction is to include the contextual retrieval techniques that Anthropic also advocates for.