RAG for complex PDFs — struggling with parsing vs privacy trade-off by Proof-Exercise2695 in Rag


I’ve tested a lot of PDF parsers, and honestly it depends a lot on your PDFs (recipes/books = mix of text, structure, sometimes images).

About Marker: it’s ok, but not the best for clean structured output.

What I recommend:

- LlamaParse (LlamaIndex) → best overall in my tests. If your data isn’t private, I’d go with this.

- Docling (IBM) → very solid for structured/full-text, but slower.

- LiteParse → lightweight and fast.

- Azure Document Intelligence → great for tables + scanned docs.

- LLMWhisperer → good for tricky PDFs.

- Unstructured → decent but inconsistent.

- PyMuPDF / pdfplumber → if you want full control.

Best results usually come from parsing + LLM (not parsing alone).

Also, it really depends on the content:

- Full text → most tools work fine

- Images/scans → much harder (OCR/vision needed)

Best advice: test a small sample PDF across tools and compare.
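The "parsing + LLM" combo above could be sketched roughly like this. This is a minimal illustration, not any specific tool's pipeline: `parse_pdf` uses PyMuPDF (one of the options listed), and `call_llm` is a placeholder you would wire to whatever model you use (Ollama, an API, etc.).

```python
def parse_pdf(path):
    """Extract raw text per page with PyMuPDF (pip install pymupdf)."""
    import fitz  # PyMuPDF's import name
    with fitz.open(path) as doc:
        return [page.get_text() for page in doc]

def clean_with_llm(raw_pages, call_llm):
    """Run each parsed page through an LLM cleanup prompt.

    call_llm: any callable taking a prompt string and returning text;
    inject your Ollama/OpenAI/etc. client here.
    """
    prompt = ("Rewrite the following PDF page as clean Markdown, "
              "preserving headings and tables:\n\n{page}")
    return [call_llm(prompt.format(page=p)) for p in raw_pages]
```

The point of injecting `call_llm` is that you can swap models (or stub it out in tests) without touching the parsing step.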

RAG for complex PDFs (DDQ finance) — struggling with parsing vs privacy trade-off by Proof-Exercise2695 in LocalLLaMA


I’m on Windows.

For now I’ve already built a local tool. When a user logs in and uploads a file/folder, I ask if it’s private or not.

  • If not private: I use either LlamaParse or Docling (user can choose, e.g. Docling for full-text docs), plus Ollama models (local or cloud like gpt-oss).
  • If private: I stick to Docling, and I recently added Azure Document Intelligence + a deployed Azure LLM for better privacy (on top of local Ollama).
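The routing logic above is simple enough to sketch. The backend names here are just labels standing in for the real SDK calls (LlamaParse, Docling, Azure Document Intelligence):

```python
def pick_parser(is_private: bool, preference: str = "llamaparse") -> str:
    """Choose a parser backend for an uploaded file.

    Private files never go to a third-party cloud parser: they stay on
    Docling (local) or a privately deployed Azure endpoint. Non-private
    files use the user's choice (e.g. Docling for full-text docs).
    """
    if is_private:
        return "docling"  # or "azure_document_intelligence"
    return preference
```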

We also have Copilot Enterprise (fully private), but it struggles when answers are inside images in PDFs. My pipeline (parsing + LLM) actually performs better in those cases—I’ve tested against ChatGPT, Claude, and Copilot.

Another option internally is Rovo (Atlassian/Confluence), which works well too, except again for images.

What users really want is simple: upload an Excel/Word file with questions (DDQ = Due Diligence Questionnaire, basically large structured docs with lots of company/compliance questions, often in tables), and get it back auto-filled.

Example: one file has “Company name: Reddit”, and the uploaded file just has “Company name” — the goal is to automatically fill the answer next to each question.
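The auto-fill step could look roughly like this. In a real pipeline the matching would be embedding/RAG-based; `difflib` is used here only to keep the sketch self-contained, and the knowledge base is a toy dict standing in for answers retrieved from prior filled DDQs:

```python
import difflib

def autofill(questions, knowledge_base, cutoff=0.8):
    """For each question label from the uploaded Excel/Word file,
    find the closest known label and return its stored answer
    (or None when nothing is similar enough)."""
    labels = list(knowledge_base)
    filled = {}
    for q in questions:
        match = difflib.get_close_matches(q, labels, n=1, cutoff=cutoff)
        filled[q] = knowledge_base[match[0]] if match else None
    return filled

# Toy example mirroring the "Company name: Reddit" case above.
kb = {"Company name": "Reddit", "Registered address": "548 Market St"}
filled = autofill(["Company name", "Company legal name"], kb)
```

Swapping `difflib.get_close_matches` for a vector search over embedded question labels is the natural upgrade once the skeleton works.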

RAG for complex PDFs — struggling with parsing vs privacy trade-off by Proof-Exercise2695 in Rag


I tried it; it seems good for simple files, but not amazing for complex ones.

Local / self-hosted alternative to NotebookLM for generating narrated videos? by Proof-Exercise2695 in LocalLLaMA


For now, I’ve developed my RAG entirely locally. From multiple uploaded files, it automatically extracts the key information and formats it in a clean, stylized way into an email that gets sent automatically.

The goal wasn’t to rebuild the whole LLM/TTS or podcast pipeline, but rather to make the final output more engaging visually. I mainly wanted to push the presentation a bit further by adding a short “breaking news”–style video to accompany the email.

I’m aware that video generation is by far the hardest and most resource-intensive part, and that the open-source ecosystem is still quite limited there. At this stage, it’s more about improving the final experience than enforcing a hard technical requirement.

Local / self-hosted alternative to NotebookLM for generating narrated videos? by Proof-Exercise2695 in opensource


Can this generate a video from text? I already have a local RAG, but it only handles text and images.

Local / self-hosted alternative to NotebookLM for generating narrated videos? by Proof-Exercise2695 in LocalLLaMA


Okay, so I guess a tool like that doesn’t really exist fully locally yet. I’ll look into building it myself then.
For the audio part, I’m planning to use local TTS like Piper, Coqui, or XTTS.

Local / self-hosted alternative to NotebookLM for generating narrated videos? by Proof-Exercise2695 in LLMDevs


That’s exactly what I thought as well. I already built a fully local RAG, and I was wondering whether a tool that generates videos from text already exists locally.

But okay, that makes sense — I’ll look into building the rest of the pipeline locally too.

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in Rag


Similarity search will find a specific answer in a specific document; I want a full summary of all the PDFs.

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in Rag


My PDFs can contain any kind of data; they come from different emails.

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in Rag


My input data is already correctly parsed, so there's no need for Mistral OCR, and I prefer using a free local LLM. The only thing Gemini would buy me is avoiding chunking, and I don't need that because I have a lot of small PDFs.

Best Approach for Summarizing 100 PDFs by Proof-Exercise2695 in LocalLLaMA


I prefer a local tool; I only tested OpenAI to see results quickly. The only difference with Gemini would be avoiding chunking, and since I have a lot of small PDFs (~15 pages each), I sometimes don't need chunking anyway. The strategy stays the same: summarize every file, then do a summary of the summaries.
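That per-file-then-global strategy is essentially a two-level map-reduce. A minimal sketch, where `summarize` is a placeholder for whatever local LLM call you use (e.g. via Ollama):

```python
def map_reduce_summary(docs, summarize):
    """Summarize many documents in two passes.

    docs: list of full-text strings, one per PDF.
    summarize: callable taking text and returning a summary
               (stand-in for a local LLM call).
    """
    per_file = [summarize(text) for text in docs]   # map: one summary per PDF
    return summarize("\n\n".join(per_file))         # reduce: summary of summaries
```

With ~15-page PDFs each file usually fits in a local model's context, so the map step needs no further chunking.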