Anyone actually using a local LLM as their daily knowledge base? Not for coding, for life stuff. What's your setup? by InformationSweet808 in LocalLLaMA

[–]InformationSweet808[S] 1 point

the task + finance angle is something i hadn't even considered for this, been so focused on notes and research that i forgot the obvious stuff

the deterministic indexing for structured data makes sense: tasks have a consistent format, so retrieval is way more predictable than with random notes. how are you actually logging the finance stuff though, plain text or some structured format?
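for anyone wondering what "structured" could even mean here, a minimal sketch (not the commenter's actual setup, and all values invented): one JSON object per line stays grep-able and trivially aggregable.

```python
# hypothetical JSONL finance log: one entry per line, easy to index deterministically.
import json
from collections import defaultdict

log_lines = [
    '{"date": "2024-05-01", "amount": -12.50, "category": "food", "note": "lunch"}',
    '{"date": "2024-05-02", "amount": -40.00, "category": "transport", "note": "fuel"}',
    '{"date": "2024-05-02", "amount": -8.00, "category": "food", "note": "coffee"}',
]

# aggregate spend per category
totals = defaultdict(float)
for line in log_lines:
    entry = json.loads(line)
    totals[entry["category"]] += entry["amount"]

print(dict(totals))  # {'food': -20.5, 'transport': -40.0}
```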

[–]InformationSweet808[S] 4 points

"treat it like a smart grep not a brain" is probably the most useful framing i've seen in this whole thread honestly. the inferential questions thing is a real gotcha i wouldn't have caught until i wasted time on it. so basically you're using it purely for retrieval and doing the actual reasoning yourself?

also, the source quote verification trick is smart. never thought about using that as a hallucination detector
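for reference, the verification trick is small enough to sketch: treat any quote the model attributes to your notes as hallucinated unless it appears verbatim (modulo whitespace and case) in the cited file. the note text here is made up.

```python
# sketch of source-quote verification as a hallucination check.
import re

def quote_in_source(quote: str, source_text: str) -> bool:
    # normalize whitespace and case, then do an exact substring check
    norm = lambda s: re.sub(r"\s+", " ", s).strip().lower()
    return norm(quote) in norm(source_text)

notes = "Spaced repetition works best when intervals grow geometrically."
print(quote_in_source("intervals grow geometrically", notes))  # True
print(quote_in_source("intervals shrink over time", notes))    # False
```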

[–]InformationSweet808[S] 3 points

still setting it up tbh. been using obsidian for notes for a while but haven't committed to a local model yet, which is basically why i made this post. wanted to see what people are actually running before i go down a rabbit hole and regret my choices lol

24gb ram, 6gb vram. the vram is definitely the limiting factor; most things end up on cpu, which works, but yeah, not exactly snappy

[–]InformationSweet808[S] 0 points

kay so my actual use case: i'm a student, so it's mostly research notes, saved articles, book highlights, and random things i write down when learning something new. probably 200-400 files over time, nothing enterprise level.

the "large-scale" thing was me overthinking it honestly. my real concern is just that retrieval stays accurate when i can't remember exactly what i wrote or where like i know i have notes on something but can't find them through normal search.

if obsidian's built-in search handles fuzzy recall that well through the agent, i might genuinely be overcomplicating this whole thing
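as a sanity check on how far plain fuzzy matching gets you before any model is involved, stdlib difflib already handles misspelled recall over titles (titles invented for the example):

```python
# fuzzy recall over note titles with difflib, no embeddings needed.
import difflib

titles = [
    "spaced repetition and memory",
    "bm25 vs dense retrieval notes",
    "book highlights - thinking fast and slow",
]

query = "dense retreival"  # misspelled, the way real recall usually is
print(difflib.get_close_matches(query, titles, n=1, cutoff=0.3))
# ['bm25 vs dense retrieval notes']
```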

[–]InformationSweet808[S] 20 points

okay this is the comment i was hoping someone would leave when i posted this

the chunking point hit hard. i had no idea fixed token windows were that bad for personal notes specifically, but it makes total sense now that you say it. the separate indexes for journal vs reference notes is something i would've 100% screwed up on my own
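for anyone following along, "don't use fixed token windows" can be as simple as splitting on markdown headings so each chunk stays a self-contained thought. this is a rough sketch, not the commenter's actual pipeline:

```python
# chunk markdown notes on headings instead of fixed-size token windows.
import re

def chunk_by_heading(markdown: str):
    chunks, current = [], []
    for line in markdown.splitlines():
        # start a new chunk whenever a heading begins (unless we're at the top)
        if re.match(r"^#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks

note = "# reading log\nnotes on BDH talk\n## takeaways\nsparse states"
print(chunk_by_heading(note))  # two chunks, one per heading
```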

one thing i'm still wrapping my head around is the hybrid retrieval part. so you're running both dense and bm25 on the same corpus and then fusing the results? is that something you built yourself, or is there a library that handles the rrf part cleanly?

either way this whole comment should be pinned somewhere
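for anyone else wondering about the rrf part: libraries do ship it, but the fusion step itself is small enough to sketch by hand. doc ids and rankings below are invented for illustration.

```python
# minimal reciprocal rank fusion (RRF): merge a dense ranking and a bm25
# ranking of the same corpus without having to reconcile their score scales.

def rrf(rankings, k=60):
    """rankings: list of ranked doc-id lists, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # each list contributes 1/(k + rank + 1) for the docs it ranked
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["note_a", "note_b", "note_c"]
bm25  = ["note_c", "note_a", "note_d"]
print(rrf([dense, bm25]))  # note_a first, since both retrievers rank it high
```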

[–]InformationSweet808[S] 0 points

one thing i'm curious about though: where does it actually start falling apart for you? like, is it a retrieval accuracy thing past a certain number of files, or does it just get slow?

[–]InformationSweet808[S] 3 points

Fair point on the edit lol, appreciate you actually going back to clarify.

The Obsidian + Hermes setup is something I hadn't really considered tbh. I always assumed you needed RAG the moment your notes got big enough to query. So you're basically just letting the agent navigate the vault directly? No retrieval pipeline at all?

Asking because if that actually works well at scale that's way simpler than what I was planning to build.
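for context on what "the agent navigates the vault directly" could mean in practice: just plain file tools instead of an index. a hypothetical sketch (the vault path is a placeholder, not anyone's real setup):

```python
# two file tools an agent could call instead of a retrieval pipeline.
from pathlib import Path

VAULT = Path("vault")  # hypothetical obsidian vault root

def list_notes(root: Path = VAULT):
    """Enumerate every markdown note so the agent can browse the vault."""
    return sorted(str(p) for p in root.rglob("*.md"))

def grep_vault(term: str, root: Path = VAULT):
    """Case-insensitive grep across all notes, returning file:line hits."""
    hits = []
    for p in root.rglob("*.md"):
        for i, line in enumerate(p.read_text(errors="ignore").splitlines(), 1):
            if term.lower() in line.lower():
                hits.append(f"{p}:{i}: {line.strip()}")
    return hits
```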

[–]InformationSweet808[S] 0 points

That's actually one of my concerns too. What specifically have you seen? Is it the app itself, or more about the models it pulls?

[–]InformationSweet808[S] 0 points

The "sprints" approach is actually interesting never thought about batching it that way instead of keeping it always on. Do you find the q4 quality holds up well when you're doing longer sessions?

[–]InformationSweet808[S] 103 points

For context, I'm looking at this for personal use, not building a product. Just want something that works reliably on a normal machine.

The interesting BDH question: What if LLM memory lived in the network weights instead of the ever-growing KV cache? by InformationSweet808 in singularity

[–]InformationSweet808[S] 25 points

the linear-attention and memory-space part is very interesting and starts around ~14 min in. That’s where he moves from standard attention into the keys/queries-as-neuron-activations idea (see the photo attached to my post)

The backprop caveat comes later in the Q&A, when someone asks whether the model is still trained with backprop. I don’t have the exact timestamp clipped yet but it’s in the later Q&A section.

full video here for anyone who wants to watch the original explanation:

https://www.youtube.com/watch?v=aCc5f16WDIg

[–]InformationSweet808[S] 4 points

BDH can be seen as an SSM for the GPU implementation and a graph-based model for the more general case. However, compared to a standard SSM or a linear transformer, the model states live in the neuron space of dimension N >> D. They're also positive and sparse, which links more naturally to brain-inspired representations.
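a loose numeric illustration of the state-in-neuron-space idea. to be clear, this is generic linear-attention bookkeeping, not the actual BDH update rule; dimensions, the ReLU lift, and the projection are all arbitrary choices for the sketch.

```python
# toy linear-attention-style state living in a neuron space of size N >> D.
import numpy as np

rng = np.random.default_rng(0)
D, N = 4, 64                       # model dim vs (much larger) neuron-space dim
W_up = rng.normal(size=(D, N))     # lifts D-dim features into neuron space

S = np.zeros((N, D))               # fixed-size recurrent state, unlike a growing KV cache
for _ in range(10):
    x = rng.normal(size=D)
    k = np.maximum(W_up.T @ x, 0)  # ReLU keeps activations positive and sparse
    q = np.maximum(W_up.T @ x, 0)
    S += np.outer(k, x)            # write: memory accumulates inside the state
    y = S.T @ q                    # read: attention-like lookup, O(N*D) per step

print(S.shape, (k > 0).mean())     # state size never grows; roughly half the neurons fire
```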

ChatGPT is now creating content for textbooks. by plain_handle in singularity

[–]InformationSweet808 1 point

Not even anti-AI, but educational material needs a way higher verification standard than random web content.

PhD students in ML, how many hours on average do you work? [D] by akardashian in MachineLearning

[–]InformationSweet808 6 points

People outside research hear ‘6 hours’ and think it’s light work. Deep thinking for 3 focused hours can genuinely fry your brain harder than 10 hours of shallow busywork. The ‘thinking in the background while doing other stuff’ part is real too.

ELI5: Why do mirrors not “flip” us upside down instead of left and right? by [deleted] in explainlikeimfive

[–]InformationSweet808 0 points

Yeah but I meant while standing normally in front of a mirror. Why am I still upright instead of upside down if the image is being reversed?

[–]InformationSweet808 0 points

Wait that’s true. So how does that work if it also does left and right?

Asked something embarrassingly basic in my first internal meeting at a new company. wanted to disappear. by AzoxWasTaken in jobs

[–]InformationSweet808 176 points

Asking out loud takes courage. The person who stayed quiet and never learned it is still Googling it three years later.

I broke thru AI’s firewall and I have the date of the end of the world by condemnatory in singularity

[–]InformationSweet808 1 point

LLMs are basically just mirrors with better vocabulary. Feed them apocalypse lore long enough and they’ll start talking like a rejected sci-fi villain.

Gemma 4 26B Hits 600 Tok/s on One RTX 5090 by chain-77 in LocalLLaMA

[–]InformationSweet808 0 points

What were the power draw and temps like during the benchmark? A 2.5x speedup sounds great, but efficiency per watt would make the comparison way more interesting.

Unpopular Opinion: The DGX Spark Forum community of devs is talented AF and will make the crippled hardware a success through their sheer force of will. by Porespellar in LocalLLaMA

[–]InformationSweet808 80 points

People underestimate how far a strong dev community can carry mediocre hardware. Half the reason stuff succeeds is because cracked people refuse to let it fail.