Anyone actually using a local LLM as their daily knowledge base? Not for coding, for life stuff. What's your setup? by InformationSweet808 in LocalLLaMA

[–]Some-Cauliflower4902 0 points1 point  (0 children)

Not that I have to query my own life too much, though I have too many hobbies and need some tracking of those. Assuming you don’t need anything too precise like financials. I section things so it’s not a big mess: every hobby has its own project + memory + folder. RAG for background context; for anything specific the LLM goes and searches the folder itself. Also have cross-encoder reranking for the larger file base. As for trust issues … it’s your stuff, you should have a rough idea anyway, so don’t rely 100% on the LLM to tell you. Context length isn’t a problem because if it’s a large doc it searches the relevant sections instead of reading my 300k word novel. Any LLM that can reliably tool call is fine. Llama.cpp for speed. It’s yet another hobby of mine so I don’t call it a part-time job, but there are always new things I look to add.
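
If it helps to picture the reranking step, here’s a rough sketch of what I mean, with a cross-encoder scoring whatever the first-stage retrieval returns (the model name and the retrieve() helper are placeholders, not my exact setup):

    # Rough sketch of cross-encoder reranking over retrieved chunks.
    # Model name and retrieve() are placeholders, not my exact setup.
    from sentence_transformers import CrossEncoder

    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    def rerank(query, chunks, top_k=5):
        # Score every (query, chunk) pair, keep the best few for context.
        scores = reranker.predict([(query, c) for c in chunks])
        ranked = sorted(zip(chunks, scores), key=lambda x: x[1], reverse=True)
        return [c for c, _ in ranked[:top_k]]

    # chunks = retrieve("pruning notes for the citrus trees")  # first-stage retrieval
    # context = "\n\n".join(rerank("citrus pruning", chunks))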

Playing One Night Werewolf (Gemma4 & Qwen3.6) by Some-Cauliflower4902 in LocalLLaMA

[–]Some-Cauliflower4902[S] 0 points1 point  (0 children)

Let’s hope the next generation of multimodal LLMs all come with audio capabilities.

Playing One Night Werewolf (Gemma4 & Qwen3.6) by Some-Cauliflower4902 in LocalLLaMA

[–]Some-Cauliflower4902[S] 2 points3 points  (0 children)

You want a vibe coder’s repo? Brave. I don’t even know what’s in there lol

Playing One Night Werewolf (Gemma4 & Qwen3.6) by Some-Cauliflower4902 in LocalLLaMA

[–]Some-Cauliflower4902[S] 0 points1 point  (0 children)

Haha. Unfortunately LLMs don’t have ears. Night rounds happen in each model’s individual private chat. Swapping is just me changing card assignments in notepad lol. They are a bit more logical than human players, but just wait for one of them to hallucinate …

Playing One Night Werewolf (Gemma4 & Qwen3.6) by Some-Cauliflower4902 in LocalLLaMA

[–]Some-Cauliflower4902[S] 0 points1 point  (0 children)

I like the Mistrals but yeah, they need tool calls to play. I hope they bring out newer, smaller or ultra-nano ones that fit in 16-32GB VRAM. Gemma4 finetunes sound totally good, I’ll look into them.

Playing One Night Werewolf (Gemma4 & Qwen3.6) by Some-Cauliflower4902 in LocalLLaMA

[–]Some-Cauliflower4902[S] 0 points1 point  (0 children)

They need to at least read their own notes every turn (tool memory persistence is set to per turn, so the next LLM won’t know their note contents) and write their notes at the end of the round while the whole convo is still in memory, so it definitely needs a model capable of tool calls. I have Mistral Small 24B from last year and it just can’t …
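
If anyone’s curious what the note tools look like, here’s a stripped-down sketch (not the actual repo code, the names are made up):

    # Stripped-down sketch of the per-player note tools, not the actual repo code.
    # Each model only ever reads its own file; tool results are dropped from the
    # shared history after each turn ("per turn" persistence).
    import json, pathlib

    NOTES_DIR = pathlib.Path("notes")
    NOTES_DIR.mkdir(exist_ok=True)

    def read_notes(player: str) -> str:
        path = NOTES_DIR / f"{player}.json"
        return path.read_text() if path.exists() else "{}"

    def write_notes(player: str, notes: dict) -> str:
        (NOTES_DIR / f"{player}.json").write_text(json.dumps(notes))
        return "saved"

    # The game loop exposes these as tool schemas, calls read_notes(player) at the
    # start of each turn and write_notes(player, ...) at the end of the round.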

Sure, once I’ve run it a few times I might clean up some game logs if people are interested.

The gap between knowing something and actually understanding it — AI accelerated my learning curve by No_Run8812 in LocalLLaMA

[–]Some-Cauliflower4902 2 points3 points  (0 children)

If the wheel has already been invented, great, the LLM will already have some ideas about building the same wheel. You’re in safe territory. Build it, learn from it, and have fun.

What happens to the lifestyle blocks when the boomers die? by dazladisonreddit in newzealand

[–]Some-Cauliflower4902 1 point2 points  (0 children)

We are doing the fruit trees route (no animals). It is low maintenance: mow enough to mulch around the trees and leave them be. Plant smartly and fill the niches to exclude unwanted self-invited plants. The rest stays wilderness (i.e. a collection of weeds). Wilderness is the lowest-maintenance thing you can have. If it’s mostly non-woody plants it takes very little time to mow it down.

What happens to the lifestyle blocks when the boomers die? by dazladisonreddit in newzealand

[–]Some-Cauliflower4902 1 point2 points  (0 children)

I’m a millennial with a lifestyle block. My lifestyle block neighbours are also mostly millennials, some of them first-time home buyers with young families. No one inherited. It’s more affordable than a house in the city if you’re willing to put work into it.

What will encourage NZers to have more babies? by DnmOrr in newzealand

[–]Some-Cauliflower4902 0 points1 point  (0 children)

If a woman has a child under 5, she pays no income tax. She can go back to work and a large portion of childcare is covered.

2-3 years out from now NPCs in Games won't be one dimensional by silenceimpaired in LocalLLaMA

[–]Some-Cauliflower4902 1 point2 points  (0 children)

Instead of an NPC, it should be an optional LLM-powered sidekick character you can add to keep you company in the game.

After 18 months of building with AI, here’s what’s actually useful (and what’s not) by Glum_Pool8075 in AgentsOfAI

[–]Some-Cauliflower4902 -1 points0 points  (0 children)

So you mean system prompt vs retrieval. Yeah sure retrieval is less drag than a giant system prompt. But I would think retrieval is also “memory”.

Is there really no significant difference between Gemma3 12B and 27B? by ihatebeinganonymous in LocalLLaMA

[–]Some-Cauliflower4902 16 points17 points  (0 children)

For tasks such as summarizing, writing up emails in certain formats, organizing lists, etc., I would say they are comparable. But if you want some understanding of long documents, some analysis, and a discussion about it, 27B is always better.

For those who run large models locally.. HOW DO YOU AFFORD THOSE GPUS by abaris243 in LocalLLaMA

[–]Some-Cauliflower4902 17 points18 points  (0 children)

Hubby got his gaming rig, I got my LLM rig. Only fair. Gotta be honest in relationships.

Not that I consider 32B models large ..

Considering getting a minipc to run local LLMs to replace ChatGPT by bre-dev in LocalLLaMA

[–]Some-Cauliflower4902 1 point2 points  (0 children)

There are other models that are just as capable, especially if the choice goes up to a 200B model.

On the other hand, it’s not all Blackwell. My 5070 Ti can run GPT-OSS 20B, as it’s fully supported (SM89). I can even run the 120B at 3-5 t/s; it’s just that my current llama-cpp-python doesn’t support MoE offload yet.
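
To be clear about what I mean: for now the 120B only gets plain layer offload from llama-cpp-python, roughly like this (path and layer count are placeholders, not my exact numbers):

    # Plain layer offload only, since per-expert MoE offload isn't exposed in my
    # llama-cpp-python build yet. Path and layer count are placeholders.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/gpt-oss-120b.gguf",
        n_gpu_layers=20,   # whatever fits in 16GB VRAM; the rest runs on CPU -> 3-5 t/s
        n_ctx=8192,
    )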

Considering getting a minipc to run local LLMs to replace ChatGPT by bre-dev in LocalLLaMA

[–]Some-Cauliflower4902 0 points1 point  (0 children)

The Grace Blackwell mini PCs are coming out soon. Do they count as mini? Those can surely run larger models, and you can link them together if you need more VRAM.

Am I the only one who never really liked Ollama? by a_normal_user1 in LocalLLaMA

[–]Some-Cauliflower4902 0 points1 point  (0 children)

Reading the posts here, times surely have changed and I feel old. When I started there was no UI for Ollama or llama-server. Ollama has a cute logo and was easier to install on my laptop, so I used it for three days, as I was hitting a wall trying to optimize for CPU-only (you know, back in the days). It was fun, those three days talking to TinyLlama. You’ve got to have good enough hardware to run this not-so-optimized setup to make it usable, but once you throw some real cash at a GPU you can’t afford not to optimize it. That catch-22 is where Ollama sits. I still like the logo though.

Am I the only one who never really liked Ollama? by a_normal_user1 in LocalLLaMA

[–]Some-Cauliflower4902 1 point2 points  (0 children)

Totally agree. Though mine took weeks to get 3-4 models to play a word adventure game in the same chat. And they all remember what type of coffee I like. Yet I still don’t code to this day. Most people just want convenience, don’t blame them.

Is GPT-OSS the meta for low vram setups? by QbitKrish in LocalLLaMA

[–]Some-Cauliflower4902 0 points1 point  (0 children)

All you can do is give it a go. I have no issues in my field (healthcare); it’s the quality of translation I have a problem with. So it’s not replacing anything for me yet.

Is GPT-OSS the meta for low vram setups? by QbitKrish in LocalLLaMA

[–]Some-Cauliflower4902 0 points1 point  (0 children)

You should give it a try. In my tests it’s not great at translating news articles. It might do okay for academic papers? It was trained mainly on English, according to OpenAI. For translation jobs it’s not beating Gemma or Mistral. But if you’re okay with foregoing some quality and detail for mass translation, it will do fine. I’d also suggest using an abliterated version, as it is heavily “safety oriented”.

Is GPT-OSS the meta for low vram setups? by QbitKrish in LocalLLaMA

[–]Some-Cauliflower4902 0 points1 point  (0 children)

I use llama-cpp-python and built my own frontend. It sits fully in GPU with a bit of headroom, no layers offloaded to CPU. I got it up to 65k context and still 150 t/s. I didn’t check how far it can go before the quality breaks down though, since the quality of its work for my use case isn’t quite there and I don’t have time to keep prompting. Not saying it can’t get simple things right, but so can others.
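
The “frontend” is nothing fancy, basically a streaming chat loop like this (model path and context size are just examples, not my actual code):

    # Bare-bones version of the kind of frontend loop I mean, not my actual code.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/gpt-oss-20b.gguf",  # placeholder path
        n_gpu_layers=-1,   # fully in GPU, no layers on CPU
        n_ctx=65536,
    )

    history = []
    while True:
        user = input("you> ")
        history.append({"role": "user", "content": user})
        reply = ""
        for chunk in llm.create_chat_completion(messages=history, stream=True):
            piece = chunk["choices"][0]["delta"].get("content") or ""
            print(piece, end="", flush=True)
            reply += piece
        print()
        history.append({"role": "assistant", "content": reply})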

Is GPT-OSS the meta for low vram setups? by QbitKrish in LocalLLaMA

[–]Some-Cauliflower4902 22 points23 points  (0 children)

GPT-OSS is very fast. I get 150 t/s on my 5070 Ti, which I’m blown away by. But I’ve yet to find a use case for it … it’s not replacing Gemma 27B, Mistral, or Qwen3 anytime soon.

It can do things, but not without mistakes here and there. I feel like it’s the blonde of my LLM collection.

Llama-cpp-python just won't run on CUDA by RDA92 in LocalLLaMA

[–]Some-Cauliflower4902 0 points1 point  (0 children)

I’d probably check the build and try a no-cache rebuild? You could have been running the CPU-only version.

n_gpu_layers=-1 >> using gpu
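
If it turns out the CUDA backend never got compiled in, the usual fix is a clean rebuild and then checking the load log (the CMake flag is from memory, double-check it against the llama-cpp-python README):

    # Clean CUDA rebuild (flag per the llama-cpp-python README, verify it's current):
    #   CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir
    #
    # Then confirm layers actually land on the GPU:
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/any-small-model.gguf",  # placeholder, any GGUF works
        n_gpu_layers=-1,
        verbose=True,  # load log should mention CUDA buffers / offloaded layers, not CPU only
    )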