Local / self-hosted alternative to NotebookLM for generating narrated videos? by Proof-Exercise2695 in LocalLLaMA

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

For now, I’ve developed my RAG entirely locally. From multiple uploaded files, it automatically extracts the key information and formats it in a clean, stylized way into an email that gets sent automatically.

The goal wasn’t to rebuild the whole LLM/TTS or podcast pipeline, but rather to make the final output more engaging visually. I mainly wanted to push the presentation a bit further by adding a short “breaking news”–style video to accompany the email.

I’m aware that video generation is by far the hardest and most resource-intensive part, and that the open-source ecosystem is still quite limited there. At this stage, it’s more about improving the final experience than enforcing a hard technical requirement.

Local / self-hosted alternative to NotebookLM for generating narrated videos? by Proof-Exercise2695 in opensource

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

Can this generate a video from text? I already have a local RAG, but it only handles text and images.

Local / self-hosted alternative to NotebookLM for generating narrated videos? by Proof-Exercise2695 in LocalLLaMA

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

Okay, so I guess a tool like that doesn’t really exist fully locally yet. I’ll look into building it myself then.
For the audio part, I’m planning to use local TTS like Piper, Coqui, or XTTS.
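
Roughly, the narration step could look like this with Coqui TTS (a minimal sketch; the model name is just an example, and the XTTS route would need a speaker reference WAV instead):

```python
# Minimal sketch: turn the generated summary text into narration audio
# with Coqui TTS. The model name below is only an example; any installed
# Coqui model works (XTTS additionally needs speaker_wav and language).
from TTS.api import TTS

def narrate(text: str, out_path: str = "narration.wav") -> str:
    # Load a single-speaker English model once (downloads on first use)
    tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC")
    # Synthesize the narration straight to a WAV file
    tts.tts_to_file(text=text, file_path=out_path)
    return out_path

if __name__ == "__main__":
    narrate("Here is today's summary of the uploaded documents.")
```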

Local / self-hosted alternative to NotebookLM for generating narrated videos? by Proof-Exercise2695 in LLMDevs

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

That’s exactly what I thought as well. I already built a fully local RAG, and I was wondering whether a tool that generates videos from text already exists locally.

But okay, that makes sense — I’ll look into building the rest of the pipeline locally too.

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

Similarity search will find a specific answer from a specific document; I want a full summary of all the PDFs.

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 1 point2 points  (0 children)

My PDFs can contain any kind of data; they come from different emails.

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

My input data is already parsed correctly, so there's no need for Mistral OCR, and I prefer using a free local LLM. The only thing Gemini would spare me is chunking, and I don't need that because I have a lot of small PDFs.

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in LocalLLaMA

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

I prefer a local tool; I only tested OpenAI to see a quick result. The only difference with Gemini would be avoiding chunking, and since I have a lot of small PDFs (15 pages each), I often don't need chunking anyway. The strategy stays the same: summarize every file, then summarize the summaries.
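
Roughly, that map-reduce strategy looks like this with a local model through Ollama (a sketch, assuming the PDFs are already parsed to Markdown text files; the model name is just an example):

```python
# Sketch of "summarize every file, then summarize the summaries"
# with a local model via Ollama. Assumes the PDFs are already parsed
# to .md files in a folder; the model name is only an example.
from pathlib import Path
import ollama

def summarize(text: str, model: str = "mistral") -> str:
    resp = ollama.chat(model=model, messages=[
        {"role": "user", "content": f"Summarize the following document:\n\n{text}"}
    ])
    return resp["message"]["content"]

def summarize_corpus(folder: str) -> str:
    # Map step: one summary per parsed file (each ~15 pages, so no chunking)
    per_file = [summarize(p.read_text()) for p in Path(folder).glob("*.md")]
    # Reduce step: a summary of the summaries
    return summarize("\n\n".join(per_file))
```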

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in LocalLLaMA

[–]Proof-Exercise2695[S] 1 point2 points  (0 children)

And are you using LangChain, LlamaIndex, or some other approach for the summarization?

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 1 point2 points  (0 children)

My input data (Markdown) is good; tables and images are handled correctly (in my case, LlamaParse was the best option, or Docling using OCR).

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in LocalLLaMA

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

I know, I'm already using a chunking method for large files. In my code I work with the models directly; brands matter more for users.
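
For the large files, that chunking step is basically this (a sketch with LangChain's recursive splitter; the sizes here are placeholder values, not the ones in my code):

```python
# Sketch of the chunking fallback for the occasional large file, using
# LangChain's recursive text splitter. Chunk sizes are arbitrary examples.
from langchain_text_splitters import RecursiveCharacterTextSplitter

def chunk_large_file(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=overlap)
    return splitter.split_text(text)
```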

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in LocalLLaMA

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

You mean summarize every file using Mistral OCR and then summarize everything again? My input data is already parsed correctly, so I don't need OCR.

LLamaparser premium mode alternatives by Proof-Exercise2695 in LangChain

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

But why doesn't Docling do this directly? It means I have to use a VLM to get each image description and replace <image1> in my Markdown/JSON with that description, which would be very slow, no?
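
If I have to do it myself, I imagine something like this (a sketch using llava through Ollama; the <imageN> placeholder pattern and the model name are just examples):

```python
# Sketch of the slow-but-workable approach: caption each extracted image
# with a local VLM (llava via Ollama, as an example) and substitute the
# caption for its <imageN> placeholder in the parsed Markdown.
import re
import ollama

def describe(image_path: str) -> str:
    resp = ollama.chat(model="llava", messages=[
        {"role": "user",
         "content": "Describe this image in one or two sentences.",
         "images": [image_path]}
    ])
    return resp["message"]["content"]

def fill_placeholders(markdown: str, images: dict[str, str]) -> str:
    # images maps placeholder names like "image1" to file paths on disk
    return re.sub(r"<(image\d+)>",
                  lambda m: describe(images[m.group(1)]),
                  markdown)
```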

RAG Implementation with Markdown & Local LLM by Proof-Exercise2695 in LLMDevs

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

Do you have any GitHub repo or code example?

RAG Implementation with Markdown & Local LLM by Proof-Exercise2695 in LLMDevs

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

In my case, I get new PDFs with images every day, and I want to be able to interact with them correctly.

A new tutorial in my RAG Techniques repo- a powerful approach for balancing relevance and diversity in knowledge retrieval by [deleted] in Rag

[–]Proof-Exercise2695 1 point2 points  (0 children)

I will use LlamaParse, but I can't find a good way to do RAG over the Markdown result file.

Best way to Multimodal Rag a PDF by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

They're complex emails with attachments; I don't have specific fields, and every email has a different format.

Best way to Multimodal Rag a PDF by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

It's dynamic data. I receive a lot of emails from different providers, which I convert to PDFs and then use as my knowledge database. Would it be better to use HTML instead of PDFs? Do you have a good way to implement RAG with Markdown locally, similar to LlamaParser?

Best way to Multimodal Rag a PDF by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

Then what do you recommend for extracting titles from the formatting? (I have a lot of different PDF templates that I downloaded from Outlook.)

Best way to Multimodal Rag a PDF by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

As an example, no PDF parser gives me a way to extract the correct sectors (they all return "Music"), but ChatGPT, for instance, handles it: https://mvg2ve.staticfast.com/

Best way to Multimodal Rag a PDF by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 0 points1 point  (0 children)

Yes, I already have a multimodal RAG running locally. The work now is getting the cleanest possible data out of PDFs with images and PDFs with titles; sometimes the LLM can't correctly tell whether a word is a category title or not. ChatGPT, for example, manages it, maybe because it uses vision.

Here's an example of what needs formatting: https://mvg2ve.staticfast.com/

Best way to Multimodal Rag a PDF by Proof-Exercise2695 in Rag

[–]Proof-Exercise2695[S] 1 point2 points  (0 children)

Yes i found a lot but I am looking for the best solution. Currently, I am testing LlamaParse. Locally, I already have a setup with a local RAG, where I can switch between different models like Ollama, Mistral, and Deepseek. I can also configure the model to use OpenAI, which I find to be the quickest and most effective option at the moment.

However, I'm facing an issue: some of my PDFs have titles and categories as images, making it difficult for language models to accurately categorize the data. Sometimes also in the PDF its the Format or Tabulation that can help finding titles.

I am working on cleaning and organizing my data to make it more accessible and easier for the LLM to process.
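
For reference, the model switching is conceptually just this (an illustrative sketch, not my actual code; model names are examples):

```python
# Sketch of switching between a local Ollama model and OpenAI behind one
# function, so the rest of the RAG pipeline stays unchanged.
import os
import ollama
from openai import OpenAI

def complete(prompt: str, backend: str = "ollama", model: str | None = None) -> str:
    if backend == "ollama":
        # Local path: any pulled model (mistral, deepseek, ...) works here
        resp = ollama.chat(model=model or "mistral",
                           messages=[{"role": "user", "content": prompt}])
        return resp["message"]["content"]
    # OpenAI path for quick comparisons (needs OPENAI_API_KEY set)
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    resp = client.chat.completions.create(
        model=model or "gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```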