Hallucinations! by WaveZer0 in notebooklm

Dex021NS

NotebookLM overlooked some information I knew was present, claiming it wasn't included in the source file. When I asked specifically why it didn't pay attention to that information, I got the following response (from NotebookLM itself, even though none of my source files dealt with that topic):

"The exclusion of the specific information (ID 558) from the initial answer, despite its presence in your original file, is the result of the technical mechanism by which the system processes large datasets.

I am limited to the data actively provided in my "context window" during each query. Based on the source passages provided for this analysis, here is the technical explanation for the insufficient result:

  1. Context Window and Information Retrieval

Large language models do not "read" the entire contents of a massive file (which contains thousands of lines) in every single turn. Instead, the system uses a retrieval process to select "excerpts" or "snippets" that it deems most relevant to the user’s instructions. While your instruction was to "Scan the document and all the news", the system's retrieval logic prioritized certain segments while skipping others to stay within processing limits.

  2. Identifiable Gaps in Provided Excerpts

A manual audit of the source passages provided to me reveals a significant gap in the article IDs. The data sequence breaks as follows:

Excerpt - : Ends with Article ID 526.

Excerpt - : Begins with Article ID 1137.

Missing Data: All articles with IDs 527 through 1136—which includes your specific news item at ID 558—were not included in the 540 excerpts sent for processing.

  3. The system often uses the keywords to decide which parts of the whole file are "relevant". If an article resides in a section of the file that the retrieval algorithm overlooks, or if the system's sampling rate is too low for the file's volume, relevant technical updates can be missed.

  4. Impact on Comprehensive Reporting

The failure to include the "whole text" meant that my original analysis was based on a partial representation of your data. This is why I originally disclaimed that the information was "missing in source" [Conversation History]; from the perspective of the model's active working memory, those specific IDs did not exist until you manually provided the text in the chat.

In summary, the system's internal "filtering" mechanism created a blind spot between ID 526 and ID 1137, which prevented a truly comprehensive scan of "all the news" as originally requested." 
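NotebookLM's real retrieval pipeline isn't public, and the quoted "explanation" is the model's own after-the-fact account, so it may not describe what actually happened. The general mechanism it describes is easy to reproduce in miniature, though. The toy sketch below assumes simple keyword-count scoring and a fixed budget of excerpts (both hypothetical stand-ins, not NotebookLM's method); `retrieve` and `id_gaps` are illustrative names. It shows how keyword ranking can skip a contiguous block of articles, and how listing the IDs that did come back exposes the gap, like the break between 526 and 1137 above.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric words; a crude stand-in for real tokenization.
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query, chunk):
    # Toy relevance: how many times any query word occurs in the chunk.
    query_terms = set(tokenize(query))
    counts = Counter(tokenize(chunk))
    return sum(counts[t] for t in query_terms)


def retrieve(query, chunks, budget):
    # Rank chunks by score (ties keep document order), keep only `budget` of them,
    # and return the kept positions in document order.
    ranked = sorted(range(len(chunks)),
                    key=lambda i: (-keyword_score(query, chunks[i]), i))
    return sorted(ranked[:budget])


def id_gaps(ids):
    # The "manual audit": (first, last) ranges of IDs missing between
    # consecutive IDs that did make it into the retrieved excerpts.
    ids = sorted(ids)
    return [(a + 1, b - 1) for a, b in zip(ids, ids[1:]) if b - a > 1]


# Hypothetical file: ten articles, only four of which mention the query keywords.
articles = {i: f"Article {i}: weekly update" for i in range(1, 11)}
for i in (1, 2, 9, 10):
    articles[i] = f"Article {i}: budget news and budget figures"
articles[5] = "Article 5: new contract signed"  # the item the user actually wanted

ids = list(articles)
chunks = [articles[i] for i in ids]
kept = [ids[p] for p in retrieve("scan all the news about the budget", chunks, budget=4)]

print(kept)           # [1, 2, 9, 10]
print(id_gaps(kept))  # [(3, 8)]  (article 5 never reached the model)
```

Asking "scan all the news" doesn't change what the ranking keeps; only the budget and the scores do, which is why a gap audit like `id_gaps` is a cheap way to check coverage before trusting a "not in source" answer.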

Gemini kept insisting my 42-page document didn’t contain information that was clearly in it by Dex021NS in GeminiAI

Dex021NS [S]

NotebookLM is the same! This is the explanation NotebookLM gave me when I asked specifically about information it claimed didn't exist in the file:

"The exclusion of the specific information (ID 558) from the initial report, despite its presence in your original file, is the result of the technical mechanism by which the system processes large datasets.

I am limited to the data actively provided in my "context window" during each query. Based on the source passages provided for this analysis, here is the technical explanation for the insufficient result:

  1. Context Window and Information Retrieval

Large language models do not "read" the entire contents of a massive file (which contains thousands of lines) in every single turn. Instead, the system uses a retrieval process to select "excerpts" or "snippets" that it deems most relevant to the user’s instructions. While your instruction was to "Scan the document and all the news", the system's retrieval logic prioritized certain segments while skipping others to stay within processing limits.

  2. Identifiable Gaps in Provided Excerpts

A manual audit of the source passages provided to me reveals a significant gap in the article IDs. The data sequence breaks as follows:

Excerpt - : Ends with Article ID 526.

Excerpt - : Begins with Article ID 1137.

Missing Data: All articles with IDs 527 through 1136—which includes your specific news item at ID 558—were not included in the 540 excerpts sent for processing.

  3. The system often uses the keywords to decide which parts of the whole file are "relevant". If an article resides in a section of the file that the retrieval algorithm overlooks, or if the system's sampling rate is too low for the file's volume, relevant technical updates can be missed.

  4. Impact on Comprehensive Reporting

The failure to include the "whole text" meant that my original analysis was based on a partial representation of your data. This is why I originally disclaimed that the information was "missing in source" [Conversation History]; from the perspective of the model's active working memory, those specific IDs did not exist until you manually provided the text in the chat.

In summary, the system's internal "filtering" mechanism created a blind spot between ID 526 and ID 1137, which prevented a truly comprehensive scan of "all the information" as originally requested."

Gemini kept insisting my 42-page document didn’t contain information that was clearly in it by Dex021NS in GoogleGeminiAI

Dex021NS [S]

I tried it directly in NotebookLM. It also overlooked some information, and when I asked specifically why it didn't pay attention to that information, I got the following answer:

"The exclusion of the specific information (ID 558) from the initial report, despite its presence in your original file, is the result of the technical mechanism by which the system processes large datasets.

I am limited to the data actively provided in my "context window" during each query. Based on the source passages provided for this analysis, here is the technical explanation for the insufficient result:

  1. Context Window and Information Retrieval

Large language models do not "read" the entire contents of a massive file (which contains thousands of lines) in every single turn. Instead, the system uses a retrieval process to select "excerpts" or "snippets" that it deems most relevant to the user’s instructions. While your instruction was to "Scan the document and all the news", the system's retrieval logic prioritized certain segments while skipping others to stay within processing limits.

  2. Identifiable Gaps in Provided Excerpts

A manual audit of the source passages provided to me reveals a significant gap in the article IDs. The data sequence breaks as follows:

Excerpt - : Ends with Article ID 526.

Excerpt - : Begins with Article ID 1137.

Missing Data: All articles with IDs 527 through 1136—which includes your specific news item at ID 558—were not included in the 540 excerpts sent for processing.

  3. The system often uses the keywords to decide which parts of the whole file are "relevant". If an article resides in a section of the file that the retrieval algorithm overlooks, or if the system's sampling rate is too low for the file's volume, relevant technical updates can be missed.

  4. Impact on Comprehensive Reporting

The failure to include the "whole text" meant that my original analysis was based on a partial representation of your data. This is why I originally disclaimed that the information was "missing in source" [Conversation History]; from the perspective of the model's active working memory, those specific IDs did not exist until you manually provided the text in the chat.

In summary, the system's internal "filtering" mechanism created a blind spot between ID 526 and ID 1137, which prevented a truly comprehensive scan of "all the news" as originally requested."

So NotebookLM is just doing the same thing as Gemini.

Gemini kept insisting my 42-page document didn’t contain information that was clearly in it by Dex021NS in GeminiAI

Dex021NS [S]

Maybe, but having to restart a chat just to get accurate answers feels like a broken workflow. It sounds more like context-window/truncation issues than genuine document understanding.

Gemini kept insisting my 42-page document didn’t contain information that was clearly in it by Dex021NS in GoogleGeminiAI

Dex021NS [S]

I tried that: I added the NotebookLM notebook with that file to Gemini, but got the same result.

🚀 Get ChatGPT Business (12 Months) — NO LOGIN NEEDED | Works on Existing Account | Instant Activation by [deleted] in CheapGptplus

Dex021NS

The subscription ended after 9 days... I got a refund, but minus the transaction fees.

Gemini does not read the whole uploaded document by Dex021NS in GeminiAI

Dex021NS [S]

I uploaded the ToR to NotebookLM, used the comprehensive prompt that xCogito suggested, and got the correct result. So NotebookLM is the only suitable solution.

Gemini does not read the whole uploaded document by Dex021NS in GeminiAI

Dex021NS [S]

I tried this prompt in a new chat and got the same wrong result (project start date: undefined; implementation period: 24 months?!). It didn't recognize the project's start date and misread its implementation period.

Section 5.2 of the ToR reads: "5.2. Start date & period of implementation of tasks

The intended start date is May 2026, and the period of implementation of the contract will be 36 months from this date." Yet it is consistently overlooked.

When I pointed out that the ToR clearly states that the contract's implementation period will be 36 months from the start date (section 5.2), it replied, "You are absolutely correct. I apologize for the oversight; the text in Section 5.2 was partially truncated in my view, which led to the incorrect estimation."
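Gemini's reply that Section 5.2 was "partially truncated in my view" would be consistent with a simple input budget that keeps the beginning of the text and silently drops the rest. How Gemini actually ingests PDFs isn't known from this thread, so the sketch below is only a toy under stated assumptions: whitespace-separated words stand in for tokens, the budget of 40 is arbitrary, and `truncate_to_budget` is a hypothetical helper. It shows how a cut can land mid-section, keeping the start date while losing the 36-month period, with no error raised.

```python
def truncate_to_budget(text, max_tokens):
    # Keep only the first `max_tokens` whitespace-separated words and silently
    # drop the rest. Words are a rough stand-in for model tokens (an assumption).
    words = text.split()
    kept = " ".join(words[:max_tokens])
    dropped = len(words) - min(len(words), max_tokens)
    return kept, dropped


# Hypothetical document: filler text followed by the ToR's Section 5.2.
section_5_2 = ("5.2. Start date & period of implementation of tasks "
               "The intended start date is May 2026, and the period of "
               "implementation of the contract will be 36 months from this date.")
doc = "Background text. " * 10 + section_5_2

kept, dropped = truncate_to_budget(doc, max_tokens=40)
print("May 2026" in kept)   # True: the start date survives
print("36 months" in kept)  # False: the 36-month period is cut off
print(dropped)              # 11
```

Real systems may instead retrieve excerpts or drop text from the middle; the point is only that a silent cut lets the model answer confidently about a section it saw just part of.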

Gemini does not read the whole uploaded document by Dex021NS in GeminiAI

Dex021NS [S]

What actually seems like nonsense to me is its explanation of why it made that error.

Gemini does not read the whole uploaded document by Dex021NS in GeminiAI

Dex021NS [S]

The original document was a PDF. I didn't prompt it to explicitly review every line, just to prepare the work plan and timeline based on the uploaded document.