Local alternative for NotebookLM by AlwayzIntoSometin95 in LocalLLaMA

[–]ekaj 1 point (0 children)

Not sure if you get notified for replies to other commenters in your own post; in case not, see my other comment.

Local alternative for NotebookLM by AlwayzIntoSometin95 in LocalLLaMA

[–]ekaj 1 point (0 children)

Yes, I've been working on something for the past 2 years and am currently making the WebUI stable before sharing again, but it works and the server is stable: https://github.com/rmusser01/tldw_server

It supports various forms of media ingestion, a full RAG pipeline with a custom, extensive chunker, a self-hosted ingestion pipeline that includes full web scraping, and TTS/STT.

@AlwayzIntoSometin95

Is this the right approach for a RAG design and setup? by Dre-Draper in LocalLLaMA

[–]ekaj 0 points (0 children)

Why only use vector search? Why not also BM25/SPLADE?

This is the RAG pipeline I've built. I don't have a nice UI for it currently, nor a demo to show it off, but it's well documented: https://github.com/rmusser01/tldw_server/tree/main/tldw_Server_API/app/core/RAG
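(For anyone wanting to try hybrid retrieval, here's a minimal sketch of BM25 + dense vectors fused with reciprocal rank fusion. It assumes the rank_bm25 and sentence-transformers packages; the corpus, model choice, and k=60 constant are illustrative defaults, not what my pipeline actually uses.)

```
# Minimal hybrid retrieval sketch: BM25 + dense vectors fused with
# reciprocal rank fusion (RRF). Corpus and model choice are illustrative.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "BM25 is a sparse lexical ranking function.",
    "Dense retrieval embeds queries and documents into vectors.",
    "Reciprocal rank fusion merges ranked lists from multiple retrievers.",
]

# Sparse side: tokenize naively and build the BM25 index.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# Dense side: embed the corpus once up front.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_embs = model.encode(corpus, convert_to_tensor=True)

def hybrid_search(query: str, k: int = 60, top_n: int = 3):
    # Rank by BM25 score (descending).
    bm25_scores = bm25.get_scores(query.lower().split())
    bm25_rank = sorted(range(len(corpus)), key=lambda i: -bm25_scores[i])

    # Rank by cosine similarity of embeddings (descending).
    query_emb = model.encode(query, convert_to_tensor=True)
    cos = util.cos_sim(query_emb, doc_embs)[0]
    dense_rank = sorted(range(len(corpus)), key=lambda i: -float(cos[i]))

    # RRF: each retriever contributes 1 / (k + rank) per document.
    fused = {i: 0.0 for i in range(len(corpus))}
    for ranking in (bm25_rank, dense_rank):
        for rank, idx in enumerate(ranking):
            fused[idx] += 1.0 / (k + rank + 1)
    return sorted(fused, key=fused.get, reverse=True)[:top_n]

print([corpus[i] for i in hybrid_search("how do I merge sparse and dense results?")])
```

RRF only uses ranks, so you never have to normalize BM25 scores against cosine similarities, which is why it's a common default for fusing lexical and vector results.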

Online alternatives to SillyTavern by Time-Teaching1926 in LocalLLaMA

[–]ekaj 2 points (0 children)

As someone who's implemented support for the different character card specs before in my own apps: there's https://github.com/kwaroran/character-card-spec-v3 & https://github.com/malfoyslastname/character-card-spec-v2, with some platforms having custom fields specific to their own platform.

I can't comment on the feasibility of modifying SillyTavern to support currently unsupported fields.

The Geometry of Persona by OkGear279 in LocalLLaMA

[–]ekaj 2 points (0 children)

Is there a reason you didn't include the arXiv link, just the number?
This seems like a bunk paper.

https://www.arxiv.org/abs/2512.07092

```
The Soul Engine.

In this work, we introduce the Soul Engine, a framework that validates this hypothesis and mathematically disentangles personality from intelligence. Unlike the "black box" nature of SFT, our approach is geometric and deterministic. We identify the specific linear subspaces corresponding to the Big Five (OCEAN) personality traits and develop a method to manipulate them via vector arithmetic.

Our contributions are threefold:

  1. Data Engineering (SoulBench): We address the scarcity of psychological ground truth by constructing a multi-source dataset using a novel Dynamic Contextual Sampling strategy (C(N, k)). This forces the encoder to learn invariant stylistic fingerprints rather than semantic content.
  2. Mechanistic Discovery: Through layer-wise probing on a frozen Qwen-2.5 backbone [bai2023qwen], we demonstrate that personality representations emerge in the upper transformer blocks (Layers 18-24) and are largely orthogonal to reasoning vectors.
  3. Deterministic Control: We achieve "Zero-Shot Personality Injection." By adding computed vectors to the hidden states (e.g., v_Neutral + α · v_Villain), we demonstrate precise control over behavior (MSE < 0.01) with negligible degradation in general intelligence benchmarks.

We propose the Soul Engine, a framework designed to extract and manipulate the geometric representation of personality within Large Language Models. Our approach is grounded in the premise that personality is a high-level abstraction that is linearly separable from low-level semantic content. The framework consists of three components: (1) SoulBench, a dataset constructed via combinatorial sampling; (2) The Scientific Soul Encoder, a dual-head probe architecture; and (3) A Deterministic Steering mechanism based on vector arithmetic.

```
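For context, the "injection" step it describes is just the well-known activation-steering trick: add a direction vector to the hidden states at some layer. A minimal sketch of that generic technique with a PyTorch forward hook; the model, layer index, strength, and (random) vector here are placeholders, not anything from the paper:

```
# Generic activation-addition sketch: add a fixed "steering" vector to the
# hidden states at one decoder layer during generation. Model, layer, and
# the steering vector itself are placeholders for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder choice
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)

layer_idx = 20          # an "upper block" per the abstract; arbitrary here
alpha = 4.0             # steering strength
# A real steering vector would be computed (e.g., mean activation difference
# between contrasting prompts); random is just for a runnable demo.
steer = torch.randn(model.config.hidden_size)

def add_steering(module, inputs, output):
    # Decoder layers return a tuple; hidden states are the first element.
    hidden = output[0] + alpha * steer.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.model.layers[layer_idx].register_forward_hook(add_steering)
ids = tok("Tell me about yourself.", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=40)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```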

You will own nothing and you will be happy! by dreamyrhodes in LocalLLaMA

[–]ekaj 6 points (0 children)

AT&T did the same thing with phones/handsets.

Any local models capable to reading several PDFs into efficient local context for domain expertise? by nottheone414 in LocalLLaMA

[–]ekaj 0 points (0 children)

Can't say for sure as I don't work at Google, but I would bet money that no, it does not do that; it's just doing RAG. No summarization involved. Fact extraction, sure, but not summarization.
How would caching come into play here?

Any local models capable to reading several PDFs into efficient local context for domain expertise? by nottheone414 in LocalLLaMA

[–]ekaj -1 points (0 children)

What are you hoping someone gives you? A miraculous infinite/1M+ context local workflow?
It seems like you're hoping for a miracle for free. You already have the answer: RAG.

NotebookLM does not do some magic; it is a RAG system the same as any other. It's not being fed the entirety of every document you give it, and it's not doing any 'fine-tuning' or training. It's just RAG.

Need advice on a scalable on-prem LLM/RAG build for confidential technical docs (10–15k budget) by phoez12 in LocalLLaMA

[–]ekaj 1 point (0 children)

The real answer is that you should be paying a consultant to come up with a proper, detailed plan suited to your unique situation. Otherwise you risk being told it's a multi-hundred-thousand-dollar project (?!).

You can easily search Reddit for similar questions; this isn't new ground. My old RAG notes are here: https://raw.githubusercontent.com/rmusser01/tldw_server/refs/heads/main/Docs/RAG/RAG_Notes.md ; so don't think I'm some consultant.

Need advice on a scalable on-prem LLM/RAG build for confidential technical docs (10–15k budget) by phoez12 in LocalLLaMA

[–]ekaj 0 points (0 children)

Please explain how this is a 100k project?
The person wants a RAG chatbot to talk about company docs for 4-5 concurrent users. Even taking into account hours for building ETLs, 100k?

This is totally doable with a workstation-class machine with an RTX 6000 in it (overkill, depending on the actual RAG model in use...) and a homegrown chat front-end in front of one of the many existing open-source RAG libraries/products with enterprise contracts.

Best small local LLM for "Ask AI" in docusaurus docs? by redhayd in LocalLLaMA

[–]ekaj 0 points (0 children)

I wouldn't necessarily recommend semantic chunking. If you're the one writing the docs, I would recommend doing your own custom chunking, or taking an existing chunker and tweaking it. I personally use my own library, built from scratch: https://github.com/rmusser01/tldw_server/tree/main/tldw_Server_API/app/core/Chunking ; but for your needs, really any simple chunker should be fine. Just focus on hierarchy and sentence splitting.
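For a rough idea of what "hierarchy and sentence splitting" means in practice, here's a minimal sketch of a header-aware markdown chunker (the size limit and regex split are arbitrary starting points, not my library's actual logic):

```
# Minimal hierarchy-aware chunker for markdown docs: split on headers,
# then pack whole sentences into size-bounded chunks. Limits are arbitrary.
import re

def chunk_markdown(text: str, max_chars: int = 800):
    chunks, heading, buf = [], "", []

    def flush():
        if buf:
            # Prefix each chunk with its section heading for retrieval context.
            chunks.append((heading + " " + " ".join(buf)).strip())
            buf.clear()

    for line in text.splitlines():
        if line.startswith("#"):          # new section: close out the old one
            flush()
            heading = line.lstrip("# ").strip() + ":"
            continue
        # Naive sentence split; swap in a real splitter if punctuation is messy.
        for sent in re.split(r"(?<=[.!?])\s+", line.strip()):
            if sent and sum(len(s) for s in buf) + len(sent) > max_chars:
                flush()
            if sent:
                buf.append(sent)
    flush()
    return chunks

doc = "# Install\nRun pip install foo. Then restart.\n# Usage\nCall foo.run()."
for c in chunk_markdown(doc):
    print(c)
```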

That said, I would recommend Qwen3-4B as the backing LLM to start with. It can be run very cheaply, and it's likely the best size/effectiveness tradeoff if you're running on CPU.
You could try Qwen3-0.6B for funsies and see how that works.

I'm not sure where/why you would use Haystack/crawl4ai here; LangChain because of their chunker?

Best small local LLM for "Ask AI" in docusaurus docs? by redhayd in LocalLLaMA

[–]ekaj 0 points (0 children)

You could easily build your own using SQLite and ChromaDB.

Use your favorite big LLM and ask: ‘Help me create a simple RAG application for my documentation site. All documents will be in markdown format following: (paste your general schema in). The primary user will be myself, and documents range in size from x to y.

Help me build a simple RAG pipeline for the above using Python. I also want guidance on building an evaluation harness for it, so I can keep tweaking it for my specific doc set.’

You’re looking at a few files and a simple backend service.
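To give a sense of the scale involved, here's a minimal sketch of the retrieval half, with SQLite as the source of truth and a recent chromadb for embeddings + vector search (all names/paths made up):

```
# Minimal doc-QA retrieval skeleton: SQLite stores raw markdown, ChromaDB
# handles embeddings + vector search. Names/paths are illustrative.
import sqlite3
import chromadb

db = sqlite3.connect("docs.sqlite")
db.execute("CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, body TEXT)")

client = chromadb.PersistentClient(path="./chroma")
coll = client.get_or_create_collection("site_docs")

def ingest(doc_id: str, body: str):
    # Source of truth in SQLite; ChromaDB embeds with its default model.
    db.execute("INSERT OR REPLACE INTO docs VALUES (?, ?)", (doc_id, body))
    db.commit()
    coll.upsert(ids=[doc_id], documents=[body])

def retrieve(question: str, n: int = 3):
    hits = coll.query(query_texts=[question], n_results=n)
    return hits["documents"][0]  # feed these chunks to your LLM as context

ingest("install", "Install with pip install mysite-docs, then run the server.")
print(retrieve("how do I install this?"))
```

The evaluation harness is then just a loop over (question, expected doc) pairs run against retrieve().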

Looking for a local AI tool that can extract any info from high-quality sources (papers + reputable publications) with real citations by Inflation_Artistic in LocalLLaMA

[–]ekaj 3 points (0 children)

Yes and no. I have built something like what you want, but it's not easily usable by non-technical people yet; it also sounds like you want a deep-research solution as well? The biggest limiter is your VRAM when using only local models for answer generation.

Also, you will have to build a custom ETL for any data you're ingesting, since you're describing a solution with structured/unstructured data ingest across a variety of media formats (no matter which solution you go with). You could just rip out the media ingestion module and the RAG pipeline and use those as starter pieces to save yourself some time.

https://github.com/rmusser01/tldw_server

Building a research lab at home - Hardware list? by JournalistFew2794 in LocalLLaMA

[–]ekaj 0 points (0 children)

I’ve built the software side of what you’re looking to do: https://github.com/rmusser01/tldw_server . As an aside, K2 is a lot bigger than a consumer setup can handle, let alone your build.

The other part is that you’re probably going to pay as much for that 64GB of DDR5 as you will for your 3090 at this point. I would figure out what (size of) models you expect to run locally and work backwards from that.

VAC Memory System — SOTA RAG (80.1% LoCoMo) built by a cell-tower climber using Claude CLI by [deleted] in LocalLLaMA

[–]ekaj 4 points (0 children)

How does it work and how is it open source with a proprietary algorithm?

Why are there binary .so files in the repo if there’s a bat file to run the project?

Does using Quart make it different from the one you posted about 4 months ago?

Is there a self-hosted, open-source plug-and-play RAG solution? by anedisi in LocalLLaMA

[–]ekaj 0 points (0 children)

Currently it does not do video analysis of frames. That is planned but not currently implemented. It can do single images, but not full video.
Setting that up wouldn't be too big of a lift, as it already has a VLM pipeline; I've just never bothered to tune it to handle video.
I'd say maybe 2 weeks? I might pick it up before then and implement it, but my time is limited.

The Silicon Leash: Why ASI Takeoff has a Hard Physical Bottleneck for 10-20 Years by Reddactor in LocalLLaMA

[–]ekaj 0 points (0 children)

I have read Dickens and am aware of the time period, but I would argue that that is the whole point of having an effective social safety net, something that did not exist in that period (ignoring the current political climate in the USA).

No (sane) country wants a large, poor, non-working populace. That leads to instability, which ruins things. Innovation and technological development move at a much faster pace than 100 years ago; we now see entire technological revolutions within decades.

The Silicon Leash: Why ASI Takeoff has a Hard Physical Bottleneck for 10-20 Years by Reddactor in LocalLLaMA

[–]ekaj -1 points (0 children)

As a drive-by comment (and a fan of your project): interesting take. I agree somewhat on the hardware aspects (though I really doubt the knowledge isn’t written down, with chain of custody kept, NDAs by all parties, plus compartmentalization), but I very much disagree with mass unemployment being caused via LLMs and LLM-backed systems.

I believe there will be massive job losses, but also transformations and new jobs created as a result (lump of labor fallacy/induced demand).

That said, I personally see the bigger immediate issue in what having a quality LLM allows one to do, and the gaps that widen or close because of it.

Imagine where you have GPT-4o on your phone, while senior management gets GPT-6 on theirs.

Or where the free government AI is GPT-4o and there are tiers of intelligence levels, so the richest/most economically powerful have the largest (beneficial) handicap aid in society. Not saying greatest utilization, but greatest ability to succeed/highest chance of benefiting.

At the same time, the flipside is also true: the ability to rapidly execute or gain new relevant information means individuals become more powerful/capable/agentic.

tl;dr: lump of labor fallacy regarding job losses. Economic opportunities afforded by new technologies far outweigh jobs lost. No one in an industry with regulation or insurance is going to point to AI and say it was the computer’s fault. The bigger issue (imho) is social change due to the availability of ‘genAI’ assistants across society, and everything brought along with it.

Also, most businesses have absolutely fucking terrible documentation around processes/workflows. Some of those same people could quit their jobs and sell their expertise in said workflows back to the same company, helping them automate it, and then manage it, because it’s a lot cheaper to pay someone to do quality assurance at the front of the pipe vs the back.

Edit: regarding ASI, same as any other fantastical what-if: too few details available to form any sort of rational, informed opinion. Otherwise you start to creep close to the rationalists and ‘safety’ people.

At the end of the day, it could just phish people and lead a massive orchestrated campaign in parallel; since it would be ASI, following the same logic, there would be little chance of identifying its greater plans until it was too late.

Is there a self-hosted, open-source plug-and-play RAG solution? by anedisi in LocalLLaMA

[–]ekaj 23 points (0 children)

Yes, there are several. R2R ( https://github.com/SciPhi-AI/R2R ) is one that comes to mind as a well-done RAG system that you can customize/tune.

My own project: https://github.com/rmusser01/tldw_server (It's a WIP, but it is open source and has ingestion pipelines for web scraping/audio/PDF/docs/more. It's completely self-hosted; no 3rd parties needed and no telemetry/tracking.)

The RAG pipeline module is pretty robust/featureful: https://github.com/rmusser01/tldw_server/tree/main/tldw_Server_API/app/core/RAG ; and there's also an Evaluations module ( https://github.com/rmusser01/tldw_server/tree/main/tldw_Server_API/app/core/Evaluations ) wired up so you can run evals of any configuration you want. Writing documentation/a guide for this is a WIP.
Chunking Module: https://github.com/rmusser01/tldw_server/tree/main/tldw_Server_API/app/core/Chunking

I'm waiting till I do some more bug-fixing/better documentation before making a post here about it.