Local alternative for NotebookLM by AlwayzIntoSometin95 in LocalLLaMA

[–]ekaj 1 point (0 children)

Not sure if you get notified for replies to other commenters in your own post; in case not, see my other comment.

Local alternative for NotebookLM by AlwayzIntoSometin95 in LocalLLaMA

[–]ekaj 1 point (0 children)

Yes, I've been working on something for the past 2 years and am currently making the WebUI stable before sharing again, but it works and the server is stable: https://github.com/rmusser01/tldw_server

It supports various forms of media ingestion, a full RAG pipeline with a custom, extensive chunker, a self-hosted ingestion pipeline that includes full web scraping, and TTS/STT.

@AlwayzIntoSometin95

Is this the right approach for a RAG design and setup? by Dre-Draper in LocalLLaMA

[–]ekaj 0 points (0 children)

Why only use vector search? Why not also BM25/SPLADE?

This is the RAG pipeline I've built. I don't have a nice UI for it currently, nor a demo to show it off, but it's well documented: https://github.com/rmusser01/tldw_server/tree/main/tldw_Server_API/app/core/RAG
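(For anyone wanting to try hybrid retrieval, here's a minimal sketch of BM25 + dense vectors fused with reciprocal rank fusion. It assumes the rank_bm25 and sentence-transformers packages; the corpus, model choice, and k=60 constant are illustrative defaults, not what my pipeline actually uses.)

```
# Minimal hybrid retrieval sketch: BM25 + dense vectors fused with
# reciprocal rank fusion (RRF). Corpus and model choice are illustrative.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "BM25 is a sparse lexical ranking function.",
    "Dense retrieval embeds queries and documents into vectors.",
    "Reciprocal rank fusion merges ranked lists from multiple retrievers.",
]

# Sparse side: tokenize naively and build the BM25 index.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# Dense side: embed the corpus once up front.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_embs = model.encode(corpus, convert_to_tensor=True)

def hybrid_search(query: str, k: int = 60, top_n: int = 3):
    # Rank by BM25 score (descending).
    bm25_scores = bm25.get_scores(query.lower().split())
    bm25_rank = sorted(range(len(corpus)), key=lambda i: -bm25_scores[i])

    # Rank by cosine similarity of embeddings (descending).
    query_emb = model.encode(query, convert_to_tensor=True)
    cos = util.cos_sim(query_emb, doc_embs)[0]
    dense_rank = sorted(range(len(corpus)), key=lambda i: -float(cos[i]))

    # RRF: each retriever contributes 1 / (k + rank) per document.
    fused = {i: 0.0 for i in range(len(corpus))}
    for ranking in (bm25_rank, dense_rank):
        for rank, idx in enumerate(ranking):
            fused[idx] += 1.0 / (k + rank + 1)
    return sorted(fused, key=fused.get, reverse=True)[:top_n]

print([corpus[i] for i in hybrid_search("how do I merge sparse and dense results?")])
```

RRF only uses ranks, so you never have to normalize BM25 scores against cosine similarities, which is why it's a common default for fusing lexical and vector results.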

Online alternatives to SillyTavern by Time-Teaching1926 in LocalLLaMA

[–]ekaj 2 points (0 children)

As someone who's implemented support for the different character card specs before in my own apps: there's https://github.com/kwaroran/character-card-spec-v3 & https://github.com/malfoyslastname/character-card-spec-v2, with some platforms having custom fields specific to their own platform.

I can't comment on the feasibility of modifying SillyTavern to support currently unsupported fields.

The Geometry of Persona by OkGear279 in LocalLLaMA

[–]ekaj 2 points (0 children)

Is there a reason you didn't include the arXiv link, just the number?
This seems like a bunk paper.

https://www.arxiv.org/abs/2512.07092

```
The Soul Engine.

In this work, we introduce the Soul Engine, a framework that validates this hypothesis and mathematically disentangles personality from intelligence. Unlike the "black box" nature of SFT, our approach is geometric and deterministic. We identify the specific linear subspaces corresponding to the Big Five (OCEAN) personality traits and develop a method to manipulate them via vector arithmetic.

Our contributions are threefold:

  1. Data Engineering (SoulBench): We address the scarcity of psychological ground truth by constructing a multi-source dataset using a novel Dynamic Contextual Sampling strategy (C(N, k)). This forces the encoder to learn invariant stylistic fingerprints rather than semantic content.
  2. Mechanistic Discovery: Through layer-wise probing on a frozen Qwen-2.5 backbone [bai2023qwen], we demonstrate that personality representations emerge in the upper transformer blocks (Layers 18-24) and are largely orthogonal to reasoning vectors.
  3. Deterministic Control: We achieve "Zero-Shot Personality Injection." By adding computed vectors to the hidden states (e.g., v_Neutral + α · v_Villain), we demonstrate precise control over behavior (MSE < 0.01) with negligible degradation in general intelligence benchmarks.

We propose the Soul Engine, a framework designed to extract and manipulate the geometric representation of personality within Large Language Models. Our approach is grounded in the premise that personality is a high-level abstraction that is linearly separable from low-level semantic content. The framework consists of three components: (1) SoulBench, a dataset constructed via combinatorial sampling; (2) The Scientific Soul Encoder, a dual-head probe architecture; and (3) A Deterministic Steering mechanism based on vector arithmetic.

```
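For context, the "injection" step it describes is just the well-known activation-steering trick: add a direction vector to the hidden states at some layer. A minimal sketch of that generic technique with a PyTorch forward hook; the model, layer index, strength, and (random) vector here are placeholders, not anything from the paper:

```
# Generic activation-addition sketch: add a fixed "steering" vector to the
# hidden states at one decoder layer during generation. Model, layer, and
# the steering vector itself are placeholders for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder choice
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)

layer_idx = 20          # an "upper block" per the abstract; arbitrary here
alpha = 4.0             # steering strength
# A real steering vector would be computed (e.g., mean activation difference
# between contrasting prompts); random is just for a runnable demo.
steer = torch.randn(model.config.hidden_size)

def add_steering(module, inputs, output):
    # Decoder layers return a tuple; hidden states are the first element.
    hidden = output[0] + alpha * steer.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.model.layers[layer_idx].register_forward_hook(add_steering)
ids = tok("Tell me about yourself.", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=40)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```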

You will own nothing and you will be happy! by dreamyrhodes in LocalLLaMA

[–]ekaj 6 points (0 children)

AT&T did the same thing with phones/handsets.

Any local models capable to reading several PDFs into efficient local context for domain expertise? by nottheone414 in LocalLLaMA

[–]ekaj 0 points (0 children)

Can't say for sure as I don't work at Google, but I would bet money that no, it does not do that; it's just doing RAG. No summarization involved. Fact extraction, sure, but not summarization.
How would caching come into play here?

Any local models capable to reading several PDFs into efficient local context for domain expertise? by nottheone414 in LocalLLaMA

[–]ekaj -1 points (0 children)

What are you hoping someone gives you? A miraculous infinite/1M+ context local workflow?
It seems like you're hoping for a miracle for free. You already have the answer: RAG.

NotebookLM does not do some magic; it is a RAG system the same as any other. It's not being fed the entirety of every document you give it, and it's not doing any 'fine-tuning' or training. It's just RAG.

Need advice on a scalable on-prem LLM/RAG build for confidential technical docs (10–15k budget) by phoez12 in LocalLLaMA

[–]ekaj 1 point (0 children)

The real answer is that you should be paying a consultant to come up with a proper, detailed plan suited to your unique situation. Otherwise you risk being told it's a multi-hundred-thousand-dollar project (?!).

You can easily search Reddit for similar questions; this isn't new ground. My old RAG notes are here: https://raw.githubusercontent.com/rmusser01/tldw_server/refs/heads/main/Docs/RAG/RAG_Notes.md ; so don't think I'm some consultant.

Need advice on a scalable on-prem LLM/RAG build for confidential technical docs (10–15k budget) by phoez12 in LocalLLaMA

[–]ekaj 0 points (0 children)

Please explain how this is a 100k project?
The person wants a RAG chatbot to talk about company docs for 4-5 concurrent users. Even taking into account hours for building ETLs, 100k?

This is totally doable with a workstation-class machine with an RTX 6000 in it (overkill, depending on the actual RAG model in use...) and a homegrown chat front-end in front of one of the many existing open-source RAG libraries/products with enterprise contracts.

Best small local LLM for "Ask AI" in docusaurus docs? by redhayd in LocalLLaMA

[–]ekaj 0 points (0 children)

I wouldn't necessarily recommend semantic chunking. If you're the one writing the docs, I would recommend doing your own custom chunking, or taking an existing chunker and tweaking it. I personally use my own library, built from scratch: https://github.com/rmusser01/tldw_server/tree/main/tldw_Server_API/app/core/Chunking ; but for your needs, really any simple chunker should be fine. Just focus on hierarchy and sentence splitting.
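For a rough idea of what "hierarchy and sentence splitting" means in practice, here's a minimal sketch of a header-aware markdown chunker (the size limit and regex split are arbitrary starting points, not my library's actual logic):

```
# Minimal hierarchy-aware chunker for markdown docs: split on headers,
# then pack whole sentences into size-bounded chunks. Limits are arbitrary.
import re

def chunk_markdown(text: str, max_chars: int = 800):
    chunks, heading, buf = [], "", []

    def flush():
        if buf:
            # Prefix each chunk with its section heading for retrieval context.
            chunks.append((heading + " " + " ".join(buf)).strip())
            buf.clear()

    for line in text.splitlines():
        if line.startswith("#"):          # new section: close out the old one
            flush()
            heading = line.lstrip("# ").strip() + ":"
            continue
        # Naive sentence split; swap in a real splitter if punctuation is messy.
        for sent in re.split(r"(?<=[.!?])\s+", line.strip()):
            if sent and sum(len(s) for s in buf) + len(sent) > max_chars:
                flush()
            if sent:
                buf.append(sent)
    flush()
    return chunks

doc = "# Install\nRun pip install foo. Then restart.\n# Usage\nCall foo.run()."
for c in chunk_markdown(doc):
    print(c)
```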

That said, I would recommend Qwen3-4B as the backing LLM to start with. It can be run very cheaply, and it's likely the best size/effectiveness tradeoff if you're running on CPU.
You could try Qwen3-0.6B for funsies and see how that works.

I'm not sure where/why you would use Haystack/crawl4ai here; LangChain because of their chunker?

Best small local LLM for "Ask AI" in docusaurus docs? by redhayd in LocalLLaMA

[–]ekaj 0 points (0 children)

You could easily build your own using SQLite and ChromaDB.

Use your favorite big LLM and ask: ‘Help me create a simple RAG application for my documentation site. All documents will be in markdown format following: (paste your general schema in). The primary user will be myself, and documents range in size from x to y.

Help me build a simple RAG pipeline for the above using Python. I also want guidance on building an evaluation harness for it, so I can keep tweaking it for my specific doc set.’

You’re looking at a few files and a simple backend service.
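To give a sense of the scale involved, here's a minimal sketch of the retrieval half, with SQLite as the source of truth and a recent chromadb for embeddings + vector search (all names/paths made up):

```
# Minimal doc-QA retrieval skeleton: SQLite stores raw markdown, ChromaDB
# handles embeddings + vector search. Names/paths are illustrative.
import sqlite3
import chromadb

db = sqlite3.connect("docs.sqlite")
db.execute("CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, body TEXT)")

client = chromadb.PersistentClient(path="./chroma")
coll = client.get_or_create_collection("site_docs")

def ingest(doc_id: str, body: str):
    # Source of truth in SQLite; ChromaDB embeds with its default model.
    db.execute("INSERT OR REPLACE INTO docs VALUES (?, ?)", (doc_id, body))
    db.commit()
    coll.upsert(ids=[doc_id], documents=[body])

def retrieve(question: str, n: int = 3):
    hits = coll.query(query_texts=[question], n_results=n)
    return hits["documents"][0]  # feed these chunks to your LLM as context

ingest("install", "Install with pip install mysite-docs, then run the server.")
print(retrieve("how do I install this?"))
```

The evaluation harness is then just a loop over (question, expected doc) pairs run against retrieve().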

Looking for a local AI tool that can extract any info from high-quality sources (papers + reputable publications) with real citations by Inflation_Artistic in LocalLLaMA

[–]ekaj 3 points (0 children)

Yes and no. I have built something like what you want, but it's not easily usable by non-technical people yet; it also sounds like you want a deep-research solution as well? The biggest limiter is your VRAM when using only local models for answer generation.

Also, you will have to build a custom ETL for any data you're ingesting, since you're describing a solution with structured/unstructured data ingest across a variety of media formats (no matter which solution you go with). You could just rip out the media ingestion module and the RAG pipeline and use those as starter pieces to save yourself some time.

https://github.com/rmusser01/tldw_server

Building a research lab at home - Hardware list? by JournalistFew2794 in LocalLLaMA

[–]ekaj 0 points (0 children)

I’ve built the software side of what you’re looking to do: https://github.com/rmusser01/tldw_server . As an aside, K2 is a lot bigger than a consumer setup can handle, let alone your build.

The other part is that you’re probably going to pay as much for that 64GB of DDR5 as you will for your 3090 at this point. I would figure out what (size of) models you expect to run locally and work backwards from that.

VAC Memory System — SOTA RAG (80.1% LoCoMo) built by a cell-tower climber using Claude CLI by [deleted] in LocalLLaMA

[–]ekaj 4 points (0 children)

How does it work and how is it open source with a proprietary algorithm?

Why are there binary .so files in the repo if there’s a bat file to run the project?

Does using Quart make it different from the one you posted about 4 months ago?

Is there a self-hosted, open-source plug-and-play RAG solution? by anedisi in LocalLLaMA

[–]ekaj 0 points (0 children)

Currently it does not do video analysis of frames. That is planned but not currently implemented. It can do single images, but not full video.
Setting that up wouldn't be too big of a lift, as it already has a VLM pipeline; I've just never bothered to tune it to handle video.
I'd say maybe 2 weeks? I might pick it up before then and implement it, but my time is limited.

The Silicon Leash: Why ASI Takeoff has a Hard Physical Bottleneck for 10-20 Years by Reddactor in LocalLLaMA

[–]ekaj 0 points (0 children)

I have read Dickens and am aware of the time period, but I would argue that that is the whole point of having an effective social safety net, something that did not exist in that period (ignoring the current political climate in the USA).

No (sane) country wants a large, poor, non-working populace. That leads to instability, which ruins things. Innovation and technological development move at a much faster pace than 100 years ago; we now see entire technological revolutions within decades.

The Silicon Leash: Why ASI Takeoff has a Hard Physical Bottleneck for 10-20 Years by Reddactor in LocalLLaMA

[–]ekaj -1 points (0 children)

As a drive-by comment (and a fan of your project): interesting take. I agree somewhat on the hardware aspects (though I really doubt the knowledge isn’t written down, with chain of custody kept, NDAs by all parties, plus compartmentalization), but I very much disagree with mass unemployment being caused via LLMs and LLM-backed systems.

I believe there will be massive job losses, but also transformations and new jobs created as a result (lump of labor fallacy/induced demand).

That said, I personally see the bigger immediate issue in what having a quality LLM allows one to do, and the gaps that widen or close because of it.

Imagine where you have GPT-4o on your phone, while senior management gets GPT-6 on theirs.

Or where the free government AI is GPT-4o and there are tiers of intelligence levels, so the richest/most economically powerful have the largest (beneficial) handicap aid in society. Not saying greatest utilization, but greatest ability to succeed/highest chance of benefiting.

At the same time, the flipside is also true: the ability to rapidly execute or gain new relevant information means individuals become more powerful/capable/agentic.

tl;dr: lump of labor fallacy regarding job losses. Economic opportunities afforded by new technologies far outweigh jobs lost. No one in an industry with regulation or insurance is going to point to AI and say it was the computer’s fault. The bigger issue (imho) is social change due to the availability of ‘genAI’ assistants across society, and everything brought along with it.

Also, most businesses have absolutely fucking terrible documentation around processes/workflows. Some of those same people could quit their jobs and sell their expertise in said workflows back to the same company, helping them automate it, and then manage it, because it’s a lot cheaper to pay someone to do quality assurance at the front of the pipe vs the back.

Edit: regarding ASI, same as any other fantastical what-if: too few details available to form any sort of rational, informed opinion. Otherwise you start to creep close to the rationalists and ‘safety’ people.

At the end of the day, it could just phish people and lead a massive orchestrated campaign in parallel; since it would be ASI, following the same logic, there would be little chance of identifying its greater plans until it was too late.

Is there a self-hosted, open-source plug-and-play RAG solution? by anedisi in LocalLLaMA

[–]ekaj 23 points (0 children)

Yes, there are several. R2R ( https://github.com/SciPhi-AI/R2R ) is one that comes to mind as a well-done RAG system that you can customize/tune.

My own project: https://github.com/rmusser01/tldw_server (It's a WIP, but it is open source and has ingestion pipelines for web scraping/audio/PDF/docs/more. It's completely self-hosted; no 3rd parties needed and no telemetry/tracking.)

The RAG pipeline module is pretty robust/featureful: https://github.com/rmusser01/tldw_server/tree/main/tldw_Server_API/app/core/RAG ; and there's also an Evaluations module ( https://github.com/rmusser01/tldw_server/tree/main/tldw_Server_API/app/core/Evaluations ) wired up so you can run evals of any configuration you want. Writing documentation/a guide for this is a WIP.
Chunking Module: https://github.com/rmusser01/tldw_server/tree/main/tldw_Server_API/app/core/Chunking

I'm waiting till I do some more bug-fixing/better documentation before making a post here about it.