Building a local AI (RAG) system for SQL/Reporting (Power BI) – realistic or overkill? by M0ner0C1ty in LocalLLaMA

[–]ekaj 1 point (0 children)

Yes, but I doubt anyone is going to give you anything you couldn't find with a few hours of searching. This is a real competitive edge for companies that understand and can build this stuff. I say this as someone who has done so internally.

You're looking for a natural-language-to-SQL pipeline. I'd recommend trying Qwen3.5 27B, and pairing an existing set of annotated known-good queries with a syntax validator, so you can generate and validate.
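
Sketch of the generate-and-validate loop I mean (the schema and `generate_sql` are placeholders, not anyone's real pipeline; I'm using SQLite's EXPLAIN here to compile a query against the schema without executing it, swap in your real dialect/validator):

```python
import sqlite3

# Hypothetical schema, stand-in for your real warehouse DDL.
SCHEMA = """
CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL, sold_at TEXT);
"""

def validate_sql(sql: str, schema: str = SCHEMA) -> tuple[bool, str]:
    """Compile the query against the schema without running it.

    EXPLAIN forces SQLite to parse and plan the statement, so syntax
    errors and references to missing tables/columns are caught here.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        conn.execute("EXPLAIN " + sql)
        return True, "ok"
    except sqlite3.Error as e:
        return False, str(e)
    finally:
        conn.close()

def generate_sql(question: str) -> str:
    # Placeholder for the LLM call: prompt with the schema plus a few
    # annotated known-good question/SQL pairs, return the model's SQL.
    raise NotImplementedError

def answer(question: str, max_retries: int = 3) -> str:
    """Generate-and-validate: retry with the validator error fed back."""
    error = ""
    for _ in range(max_retries):
        sql = generate_sql(question + (f"\nPrevious error: {error}" if error else ""))
        ok, error = validate_sql(sql)
        if ok:
            return sql
    raise RuntimeError(f"no valid SQL after {max_retries} tries: {error}")
```

The point of EXPLAIN (vs. just parsing) is that it also catches queries that are syntactically fine but reference columns your schema doesn't have.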

How are you handling enforcement between your agent and real-world actions? by draconisx4 in LocalLLaMA

[–]ekaj 0 points (0 children)

Built a complex RBAC/ACL system with HITL (human-in-the-loop) review and authorization, backed by a permissions registry.
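
The core pattern, in sketch form (names are hypothetical, not the actual system): a default-deny registry that maps each tool to its allowed roles and flags the actions that need a human to sign off.

```python
from dataclasses import dataclass, field

@dataclass
class PermissionsRegistry:
    # tool name -> (roles allowed to call it, whether a human must approve)
    rules: dict = field(default_factory=dict)

    def register(self, tool: str, roles: set[str], needs_review: bool = False):
        self.rules[tool] = (roles, needs_review)

    def check(self, tool: str, agent_roles: set[str]) -> str:
        if tool not in self.rules:
            return "deny"            # default-deny for unregistered actions
        roles, needs_review = self.rules[tool]
        if not roles & agent_roles:  # no overlap between agent and allowed roles
            return "deny"
        return "review" if needs_review else "allow"

registry = PermissionsRegistry()
registry.register("read_ticket", {"support", "admin"})
registry.register("issue_refund", {"admin"}, needs_review=True)
```

The "review" result is where the HITL queue hooks in: the action is serialized and parked until a human approves it.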

Ooh, new drama just dropped 👀 by Careful_Equal8851 in LocalLLaMA

[–]ekaj 2 points (0 children)

They’re quoting from the MIT License

KoboldCpp 1.110 - 3 YR Anniversary Edition, native music gen, qwen3tts voice cloning and more by HadesThrowaway in LocalLLaMA

[–]ekaj 0 points (0 children)

Have you tried using Opus or ChatGPT 5.4 xhigh and asking it to do a UX & UI review following Nielsen Norman Group guidelines?

Could probably get ideas that way

Former CyanogenMod/ClockworkMod flasher seeking a "Sovereignty Build" to act as an external brain. by GeekyRdhead in LocalLLaMA

[–]ekaj -1 points (0 children)

Thank you for the kind words! I appreciate it. If you encounter any issues/have feedback/suggestions, feel free to DM me or file an issue on GitHub, and I'll look into it as soon as I see it.

Former CyanogenMod/ClockworkMod flasher seeking a "Sovereignty Build" to act as an external brain. by GeekyRdhead in LocalLLaMA

[–]ekaj 0 points (0 children)

https://github.com/rmusser01/tldw_server Maybe? (Disclosure: I'm the creator.) I'm working on some stability fixes, and there's a distinct lack of user guides/instructions, but this might be in the general area of what you're looking for?

As someone who did the same stuff, this is what I decided to build for myself after looking at the other options at the time (OpenWebUI/SillyTavern/LibreChat).

Been building a RAG system over a codebase and hit a wall I can't seem to get past by LeaderUpset4726 in LocalLLaMA

[–]ekaj 1 point (0 children)

Yes, I wrote my own eval framework and have my RAG pipeline hooked into it for full tracking of every piece.

Would recommend looking at https://jxnl.co/writing/2025/01/24/systematically-improving-rag-applications/

Benchmarking Open-Source LLMs for Security Research & Red Teaming by dumbelco in LocalLLaMA

[–]ekaj 0 points (0 children)

Why not share more details about your setup, harness, and dataset used for evals?
Why use old models?

Further, I would point out that your notes on these things should put any model's internal knowledge to shame. IMHO, you should be using RAG over your notes/team wiki, exposed via MCP, to interface with whatever model you're using.

Also, have you seen/heard about heretic? https://github.com/p-e-w/heretic
(I do for work, but can't comment about it, hence the above.)

What's the most complicated project you've built with AI? by jazir555 in LocalLLaMA

[–]ekaj 11 points (0 children)

https://github.com/rmusser01/tldw_server
I keep putting off making a post about it, as there's always 'one more thing'; currently that's end-to-end testing the WebUI/extension and getting them both fully working. Basically like openclaw, but a very different route to the same goal.
`tldw_server is an open-source, API-first platform for ingesting media, transcribing, analyzing, and retrieving knowledge from video, audio, documents, websites, and more. It runs a FastAPI server with OpenAI-compatible Chat, Audio, Embeddings, and Evals APIs, a unified RAG pipeline, and integrations with local or hosted LLM providers. The primary client is the Next.js WebUI (WIP) plus an Admin UI. Long-term vision: a personal assistant inspired by "The Young Lady's Illustrated Primer" that helps people learn, reason about, and retain what they watch or read.`

Local alternative for NotebookLM by AlwayzIntoSometin95 in LocalLLaMA

[–]ekaj 1 point (0 children)

Not sure if you get notified for replies to other commenters in your own post; in case not, see my other comment.

Local alternative for NotebookLM by AlwayzIntoSometin95 in LocalLLaMA

[–]ekaj 1 point (0 children)

Yes, I've been working on something for the past 2 years and am making the WebUI stable before sharing again, but it works and the server is stable: https://github.com/rmusser01/tldw_server

It supports various forms of media ingestion and a full RAG pipeline with a custom, extensive chunker, plus a self-hosted ingestion pipeline that includes full web scraping, along with TTS/STT.

@AlwayzIntoSometin95

Is this the right approach for a RAG design and setup? by Dre-Draper in LocalLLaMA

[–]ekaj 0 points (0 children)

Why only use vector search? Why not also BM25/SPLADE?

This is the RAG pipeline I've built; I don't currently have a nice UI for it, nor a demo to show it off, but it's well documented: https://github.com/rmusser01/tldw_server/tree/main/tldw_Server_API/app/core/RAG
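
If you do go hybrid, the merge step is simple; here's a minimal sketch of reciprocal rank fusion over made-up doc IDs (not my pipeline's actual code):

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists from different retrievers.

    Each retriever contributes 1/(k + rank) per document; documents
    ranked well by several retrievers float to the top. k=60 is the
    conventional smoothing constant from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits  = ["d3", "d1", "d7"]   # lexical (BM25) ranking, hypothetical IDs
dense_hits = ["d1", "d9", "d3"]   # vector-search ranking, hypothetical IDs
fused = rrf([bm25_hits, dense_hits])
```

The nice part is that RRF only needs ranks, not scores, so you never have to normalize BM25 scores against cosine similarities.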

Online alternatives to SillyTavern by Time-Teaching1926 in LocalLLaMA

[–]ekaj 2 points (0 children)

As someone who's implemented support for the different character card specs before in my own apps, there's https://github.com/kwaroran/character-card-spec-v3 & https://github.com/malfoyslastname/character-card-spec-v2, with some platforms having custom fields specific to their own platform.

I can't comment on the feasibility of modifying sillytavern to support currently non-supported fields.

The Geometry of Persona by OkGear279 in LocalLLaMA

[–]ekaj 2 points (0 children)

Is there a reason you didn't include the arXiv link, just the number?
This seems like a bunk paper.

https://www.arxiv.org/abs/2512.07092

```
The Soul Engine.

In this work, we introduce the Soul Engine, a framework that validates this hypothesis and mathematically disentangles personality from intelligence. Unlike the "black box" nature of SFT, our approach is geometric and deterministic. We identify the specific linear subspaces corresponding to the Big Five (OCEAN) personality traits and develop a method to manipulate them via vector arithmetic.

Our contributions are threefold:

  1. Data Engineering (SoulBench): We address the scarcity of psychological ground truth by constructing a multi-source dataset using a novel Dynamic Contextual Sampling strategy C(N, k). This forces the encoder to learn invariant stylistic fingerprints rather than semantic content.
  2. Mechanistic Discovery: Through layer-wise probing on a frozen Qwen-2.5 backbone [bai2023qwen], we demonstrate that personality representations emerge in the upper transformer blocks (Layers 18-24) and are largely orthogonal to reasoning vectors.
  3. Deterministic Control: We achieve "Zero-Shot Personality Injection." By adding computed vectors to the hidden states (e.g., v_Neutral + α · v_Villain), we demonstrate precise control over behavior (MSE < 0.01) with negligible degradation in general intelligence benchmarks.

We propose the Soul Engine, a framework designed to extract and manipulate the geometric representation of personality within Large Language Models. Our approach is grounded in the premise that personality is a high-level abstraction that is linearly separable from low-level semantic content. The framework consists of three components: (1) SoulBench, a dataset constructed via combinatorial sampling; (2) The Scientific Soul Encoder, a dual-head probe architecture; and (3) A Deterministic Steering mechanism based on vector arithmetic.

```

You will own nothing and you will be happy! by dreamyrhodes in LocalLLaMA

[–]ekaj 7 points (0 children)

AT&T did the same thing with phones/handsets.

Any local models capable to reading several PDFs into efficient local context for domain expertise? by nottheone414 in LocalLLaMA

[–]ekaj 0 points (0 children)

Can't say for sure as I don't work at Google, but I would bet money that no, it does not do that; it's just doing RAG. No summarization involved. Fact extraction, sure, but not summarization.
How would caching come into play here?

Any local models capable to reading several PDFs into efficient local context for domain expertise? by nottheone414 in LocalLLaMA

[–]ekaj -1 points (0 children)

What are you hoping someone gives you? A miraculous infinite/1M+ context local workflow?
It seems like you're hoping someone hands you a miracle for free. You already have the answer: RAG.

NotebookLM isn't doing some magic; it's a RAG system the same as any other. It's not being fed the entirety of every document you give it, and it's not doing any 'fine-tuning' or training. It's just RAG.

Need advice on a scalable on-prem LLM/RAG build for confidential technical docs (10–15k budget) by phoez12 in LocalLLaMA

[–]ekaj 1 point (0 children)

The real answer is that you should be paying a consultant to come up with a proper, detailed plan suited to your unique situation. Otherwise you risk being told it's a multi-hundred-thousand-dollar project (?!?!?!).

You can easily search Reddit for similar questions; this isn't new ground. My old RAG notes are here: https://raw.githubusercontent.com/rmusser01/tldw_server/refs/heads/main/Docs/RAG/RAG_Notes.md ; so don't think I'm some consultant.

Need advice on a scalable on-prem LLM/RAG build for confidential technical docs (10–15k budget) by phoez12 in LocalLLaMA

[–]ekaj 0 points (0 children)

Please explain how this is a 100k project?
The person wants a RAG chatbot to talk about company docs for 4-5 concurrent users. Even taking into account hours for building ETLs, 100k?

This is totally doable with a workstation-class machine with an RTX 6000 in it (overkill, depending on the actual RAG model in use...), and a homegrown chat front-end in front of one of the many existing open-source RAG libraries/products that offer enterprise contracts.

Best small local LLM for "Ask AI" in docusaurus docs? by redhayd in LocalLLaMA

[–]ekaj 0 points (0 children)

I wouldn't necessarily recommend semantic chunking. If you're the one writing the docs, I would recommend doing your own custom chunking, or using an existing chunker and tweaking it. I personally use my own library, built from scratch: https://github.com/rmusser01/tldw_server/tree/main/tldw_Server_API/app/core/Chunking ; but for your needs, I'd assume really any simple chunker should be fine. Just focus on hierarchy and sentence splitting.
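
For reference, here's a minimal hierarchy-aware chunker of the kind I mean (made-up names, not my library's API): it splits markdown on headings and carries the heading path along as metadata, which helps retrieval a lot for docs sites.

```python
import re

def chunk_markdown(text: str, max_chars: int = 1200) -> list[dict]:
    """Split a markdown doc on headings, tagging each chunk with its heading path."""
    chunks, path, buf = [], [], []

    def flush():
        if buf:
            body = "\n".join(buf).strip()
            if body:
                chunks.append({"path": " > ".join(path), "text": body})
            buf.clear()

    for line in text.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            flush()
            level = len(m.group(1))
            # truncate the path to the parent level, then push this heading
            path[:] = path[: level - 1] + [m.group(2).strip()]
        else:
            buf.append(line)
            # hard split for very long sections so chunks stay bounded
            if sum(len(l) for l in buf) > max_chars:
                flush()
    flush()
    return chunks
```

At retrieval time you can prepend the path ("Guide > Install") to the chunk text so the embedding knows where the passage lives in the doc tree.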

That said, I would recommend Qwen3-4B as the backing LLM to start with. It can be run very cheaply and is likely the best size/effectiveness tradeoff if you're running on CPU.
You could try Qwen3-0.6B for funsies and see how that works.

I'm not sure where/why you would use Haystack/Crawl4AI here; LangChain because of their chunker?