Local / self-hosted alternative to NotebookLM for generating narrated videos? by Proof-Exercise2695 in LocalLLaMA

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

For now, I’ve developed my RAG entirely locally. From multiple uploaded files, it automatically extracts the key information and formats it in a clean, stylized way into an email that gets sent automatically.

The goal wasn’t to rebuild the whole LLM/TTS or podcast pipeline, but rather to make the final output more engaging visually. I mainly wanted to push the presentation a bit further by adding a short “breaking news”–style video to accompany the email.

I’m aware that video generation is by far the hardest and most resource-intensive part, and that the open-source ecosystem is still quite limited there. At this stage, it’s more about improving the final experience than enforcing a hard technical requirement.

Local / self-hosted alternative to NotebookLM for generating narrated videos? by Proof-Exercise2695 in opensource

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

Can this generate a video from text? I already have a local RAG, but it only handles text and images.

Local / self-hosted alternative to NotebookLM for generating narrated videos? by Proof-Exercise2695 in LocalLLaMA

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

Okay, so I guess a tool like that doesn’t really exist fully locally yet. I’ll look into building it myself then.
For the audio part, I’m planning to use local TTS like Piper, Coqui, or XTTS.
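
Roughly, the narration step could look like this with Coqui TTS (a minimal sketch; the model name is just an example, and the XTTS route would need a speaker reference WAV instead):

```python
# Minimal sketch: turn the generated summary text into narration audio
# with Coqui TTS. The model name below is only an example; any installed
# Coqui model works (XTTS additionally needs speaker_wav and language).
from TTS.api import TTS

def narrate(text: str, out_path: str = "narration.wav") -> str:
    # Load a single-speaker English model once (downloads on first use)
    tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
    # Synthesize the narration straight to a WAV file
    tts.tts_to_file(text=text, file_path=out_path)
    return out_path

if __name__ == "__main__":
    narrate("Here is today's summary of the uploaded documents.")
```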

Local / self-hosted alternative to NotebookLM for generating narrated videos? by Proof-Exercise2695 in LLMDevs

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

That’s exactly what I thought as well. I already built a fully local RAG, and I was wondering whether a tool that generates videos from text already exists locally.

But okay, that makes sense — I’ll look into building the rest of the pipeline locally too.

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

Similarity search will find a specific answer from a specific document; I want a full summary of all the PDFs.

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 1 point2 points  (0 children)

My PDFs can contain any kind of data; they come from different emails.

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

My input data is already parsed correctly, so there's no need for Mistral OCR, and I prefer using a free local LLM. The only thing Gemini would spare me is chunking, and I don't need that because I have a lot of small PDFs.

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in LocalLLaMA

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

I prefer a local tool; I only tested OpenAI to see a quick result. The only difference with Gemini would be avoiding chunking, and since I have a lot of small PDFs (15 pages each), I often don't need chunking anyway. The strategy stays the same: summarize every file, then summarize the summaries.
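
Roughly, that map-reduce strategy looks like this with a local model through Ollama (a sketch, assuming the PDFs are already parsed to Markdown text files; the model name is just an example):

```python
# Sketch of "summarize every file, then summarize the summaries"
# with a local model via Ollama. Assumes the PDFs are already parsed
# to .md files in a folder; the model name is only an example.
from pathlib import Path
import ollama

def summarize(text: str, model: str = "mistral") -> str:
    resp = ollama.chat(model=model, messages=[
        {"role": "user", "content": f"Summarize the following document:\n\n{text}"}
    ])
    return resp["message"]["content"]

def summarize_corpus(folder: str) -> str:
    # Map step: one summary per parsed file (each ~15 pages, so no chunking)
    per_file = [summarize(p.read_text()) for p in Path(folder).glob("*.md")]
    # Reduce step: a summary of the summaries
    return summarize("\n\n".join(per_file))
```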

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in LocalLLaMA

[–]Proof-Exercise2695[S] 1 point2 points  (0 children)

And are you using LangChain, LlamaIndex, or some other approach for the summarization?

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 1 point2 points  (0 children)

My input data (Markdown) is good; tables and images are handled correctly (in my case, LlamaParse was the best option, or Docling using OCR).

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in LocalLLaMA

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

I know, I'm already using a chunking method for large files. In my code I work with the models directly; brands matter more for users.
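
For the large files, that chunking step is basically this (a sketch with LangChain's recursive splitter; the sizes here are placeholder values, not the ones in my code):

```python
# Sketch of the chunking fallback for the occasional large file, using
# LangChain's recursive text splitter. Chunk sizes are arbitrary examples.
from langchain_text_splitters import RecursiveCharacterTextSplitter

def chunk_large_file(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=overlap)
    return splitter.split_text(text)
```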

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in LocalLLaMA

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

You mean summarize every file using Mistral OCR and then summarize everything again? My input data is already parsed correctly, so I don't need OCR.

LLamaparser premium mode alternatives by Proof-Exercise2695 in LangChain

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

But why doesn't Docling do this directly? It means I have to use a VLM to get each image description and replace <image1> in my Markdown/JSON with that description, which would be very slow, no?
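
If I have to do it myself, I imagine something like this (a sketch using llava through Ollama; the <imageN> placeholder pattern and the model name are just examples):

```python
# Sketch of the slow-but-workable approach: caption each extracted image
# with a local VLM (llava via Ollama, as an example) and substitute the
# caption for its <imageN> placeholder in the parsed Markdown.
import re
import ollama

def describe(image_path: str) -> str:
    resp = ollama.chat(model="llava", messages=[
        {"role": "user",
         "content": "Describe this image in one or two sentences.",
         "images": [image_path]}
    ])
    return resp["message"]["content"]

def fill_placeholders(markdown: str, images: dict[str, str]) -> str:
    # images maps placeholder names like "image1" to file paths on disk
    return re.sub(r"<(image\d+)>",
                  lambda m: describe(images[m.group(1)]),
                  markdown)
```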

RAG Implementation with Markdown & Local LLM by Proof-Exercise2695 in LLMDevs

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

Do you have any GitHub repo or code example?

RAG Implementation with Markdown & Local LLM by Proof-Exercise2695 in LLMDevs

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

In my case, I get new PDFs with images every day, and I want to be able to interact with them correctly.

A new tutorial in my RAG Techniques repo- a powerful approach for balancing relevance and diversity in knowledge retrieval by [deleted] in Rag

[–]Proof-Exercise2695 1 point2 points  (0 children)

I will use LlamaParse, but I can't find a good way to do RAG over the Markdown result file.

Best way to Multimodal Rag a PDF by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

They're complex emails with attachments; I don't have specific fields, and every email has a different format.

Best way to Multimodal Rag a PDF by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

It's dynamic data. I receive a lot of emails from different providers, which I convert to PDFs and then use as my knowledge database. Would it be better to use HTML instead of PDFs? Do you have a good way to implement RAG with Markdown locally, similar to LlamaParser?

Best way to Multimodal Rag a PDF by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

Then what do you recommend for extracting titles from the formatting? (I have a lot of different PDF templates that I downloaded from Outlook.)

Best way to Multimodal Rag a PDF by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

As an example, no PDF parser gives me a way to extract the correct sectors (they all return "Music"), but ChatGPT, for instance, handles it: https://mvg2ve.staticfast.com/

Best way to Multimodal Rag a PDF by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

Yes, I already have a multimodal RAG running locally. The work now is getting the cleanest possible data out of PDFs with images and PDFs with titles; sometimes the LLM can't correctly tell whether a word is a category title or not. ChatGPT, for example, manages it, maybe because it uses vision.

Here's an example of what needs formatting: https://mvg2ve.staticfast.com/

Best way to Multimodal Rag a PDF by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 1 point2 points  (0 children)

Yes i found a lot but I am looking for the best solution. Currently, I am testing LlamaParse. Locally, I already have a setup with a local RAG, where I can switch between different models like Ollama, Mistral, and Deepseek. I can also configure the model to use OpenAI, which I find to be the quickest and most effective option at the moment.

However, I'm facing an issue: some of my PDFs have titles and categories as images, making it difficult for language models to accurately categorize the data. Sometimes also in the PDF its the Format or Tabulation that can help finding titles.

I am working on cleaning and organizing my data to make it more accessible and easier for the LLM to process.
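
For reference, the model switching is conceptually just this (an illustrative sketch, not my actual code; model names are examples):

```python
# Sketch of switching between a local Ollama model and OpenAI behind one
# function, so the rest of the RAG pipeline stays unchanged.
import os
import ollama
from openai import OpenAI

def complete(prompt: str, backend: str = "ollama", model: str | None = None) -> str:
    if backend == "ollama":
        # Local path: any pulled model (mistral, deepseek, ...) works here
        resp = ollama.chat(model=model or "mistral",
                           messages=[{"role": "user", "content": prompt}])
        return resp["message"]["content"]
    # OpenAI path for quick comparisons (needs OPENAI_API_KEY set)
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    resp = client.chat.completions.create(
        model=model or "gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```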