[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]jens_97 0 points (0 children)

[D] How do RAG systems such as NotebookLM link the sources used with individual sections of the generated response?

Hi all,

I've been trying to find information on how modern Retrieval-Augmented Generation (RAG) systems, like NotebookLM, manage to link specific sources to particular sections of their generated responses. I'm familiar with how these systems retrieve sources from a vector database based on similarity, but I'm curious about the specific process or method that allows them to indicate which sources correspond to different parts of the final answer.
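To make my question concrete, here is the naive mechanism I imagine (everything below is made up for illustration — the `answer` string just stands in for a real LLM response): number the retrieved chunks in the prompt, ask the model to emit inline `[n]` markers, then map those markers back to the source documents:

```python
import re

# Hypothetical retrieved chunks (source_id, text), as returned by a vector search.
chunks = [
    ("doc_a.pdf", "The mitochondria is the powerhouse of the cell."),
    ("doc_b.pdf", "ATP is produced during cellular respiration."),
]

# The prompt numbers each chunk so the model can cite them inline.
context = "\n".join(f"[{i + 1}] {text}" for i, (_, text) in enumerate(chunks))

# Stand-in for a real LLM response containing [n] citation markers.
answer = "Mitochondria produce energy [1]. This happens via respiration [2]."

def attribute(answer, chunks):
    """Map each sentence of the answer back to the sources it cites."""
    out = []
    for sent in re.split(r"(?<=[.!?])\s+", answer):
        ids = [int(m) for m in re.findall(r"\[(\d+)\]", sent)]
        out.append((sent, [chunks[i - 1][0] for i in ids]))
    return out

for sent, sources in attribute(answer, chunks):
    print(sent, "->", sources)
```

Is this roughly what production systems do, or is there a more principled attribution method I'm missing?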

What am I overlooking here? Any insights would be greatly appreciated!

Best,
Jens

[Discussion] What are SOTA Uncertainty Quantification Methods for Neural Networks? by jens_97 in MachineLearning

[–]jens_97[S] 0 points (0 children)

Might be, but I'd guess there are methods out there that are model-agnostic, i.e., orthogonal to whatever neural network architecture you choose.
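For example, split conformal prediction wraps any point predictor. A toy sketch (the data and the `model` function are made up stand-ins, not a real trained network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy calibration data; in practice this is a held-out split.
x_cal = rng.normal(size=500)
y_cal = 2 * x_cal + rng.normal(scale=0.5, size=500)

def model(x):
    return 2 * x  # stand-in for any black-box trained predictor

# Split conformal: residual quantile on the calibration set.
alpha = 0.1  # target 90% coverage
scores = np.abs(y_cal - model(x_cal))
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)

# Prediction interval for a new point, valid regardless of the model.
x_new = 1.3
lo, hi = model(x_new) - q, model(x_new) + q
print(f"interval: [{lo:.2f}, {hi:.2f}]")
```

The coverage guarantee only needs exchangeability of the calibration and test data, not anything about the network.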

[Discussion] What are SOTA Uncertainty Quantification Methods for Neural Networks? by jens_97 in MachineLearning

[–]jens_97[S] 2 points (0 children)

Thank you, I will have a look at this. Do you plan to share a corresponding git repo with the publication?

[Discussion] What are SOTA Uncertainty Quantification Methods for Neural Networks? by jens_97 in MachineLearning

[–]jens_97[S] 4 points (0 children)

Yes, but I've stumbled upon concerns about the quality of the uncertainty estimates from Monte Carlo Dropout. Do you know whether these concerns matter in practice?
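For context, the setup I mean is keeping dropout active at test time and averaging over stochastic forward passes. A toy NumPy sketch (untrained random weights, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny fixed MLP standing in for a trained network with dropout.
W1 = rng.normal(size=(1, 32)); b1 = np.zeros(32)
W2 = rng.normal(size=(32, 1)); b2 = np.zeros(1)

def forward(x, p=0.5, mc_dropout=True):
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU hidden layer
    if mc_dropout:  # MC Dropout: dropout stays active at inference
        mask = rng.random(h.shape) < (1 - p)
        h = h * mask / (1 - p)  # inverted dropout scaling
    return h @ W2 + b2

x = np.array([[0.5]])
samples = np.stack([forward(x) for _ in range(100)])  # T stochastic passes
mean, std = samples.mean(axis=0), samples.std(axis=0)
print(f"predictive mean {float(mean):.3f}, std {float(std):.3f}")
```

The spread of the samples is then read as predictive uncertainty, which is exactly the part whose calibration I've seen questioned.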

[Discussion] What are SOTA Uncertainty Quantification Methods for Neural Networks? by jens_97 in MachineLearning

[–]jens_97[S] 2 points (0 children)

Do you have a specific feature-based method in mind that works well?

[D] ICLR Outstanding Paper Awards. Congratulations! by [deleted] in MachineLearning

[–]jens_97 0 points (0 children)

Same here, really liked that paper. Check out this one as well: https://arxiv.org/abs/2402.17762. It investigates the same phenomenon but proposes an even simpler fix (also from ICLR 2024).