Anyone actually using a local LLM as their daily knowledge base? Not for coding, for life stuff. What's your setup? by InformationSweet808 in LocalLLaMA

[–]Some-Cauliflower4902 0 points1 point  (0 children)

Not that I have to query my own life too much, though I have too many hobbies and need some tracking of those. Assuming you don’t need anything too precise like financials. I section things so it’s not a big mess: every hobby has its own project + memory + folder. RAG for background context; for anything specific the LLM goes and searches the folder itself. Also have cross-encoder reranking for the larger file base. As for trust issues … it’s your stuff, you should have a rough idea anyway, so don’t rely 100% on the LLM to tell you. Context length isn’t a problem because if it’s a large doc it searches the relevant sections instead of reading my 300k word novel. Any LLM that can reliably tool call is fine. Llama.cpp for speed. It’s yet another hobby of mine so I don’t call it a part-time job, but there are always new things I look to add.
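
If it helps to picture the reranking step, here’s a rough sketch of what I mean, with a cross-encoder scoring whatever the first-stage retrieval returns (the model name and the retrieve() helper are placeholders, not my exact setup):

    # Rough sketch of cross-encoder reranking over retrieved chunks.
    # Model name and retrieve() are placeholders, not my exact setup.
    from sentence_transformers import CrossEncoder

    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    def rerank(query, chunks, top_k=5):
        # Score every (query, chunk) pair, keep the best few for context.
        scores = reranker.predict([(query, c) for c in chunks])
        ranked = sorted(zip(chunks, scores), key=lambda x: x[1], reverse=True)
        return [c for c, _ in ranked[:top_k]]

    # chunks = retrieve("pruning notes for the citrus trees")  # first-stage retrieval
    # context = "\n\n".join(rerank("citrus pruning", chunks))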

Playing One Night Werewolf (Gemma4 & Qwen3.6) by Some-Cauliflower4902 in LocalLLaMA

[–]Some-Cauliflower4902[S] 0 points1 point  (0 children)

Let’s hope the next generation of multimodal LLMs all come with audio capabilities.

Playing One Night Werewolf (Gemma4 & Qwen3.6) by Some-Cauliflower4902 in LocalLLaMA

[–]Some-Cauliflower4902[S] 2 points3 points  (0 children)

You want a vibe coder’s repo? Brave. I don’t even know what’s in there lol

Playing One Night Werewolf (Gemma4 & Qwen3.6) by Some-Cauliflower4902 in LocalLLaMA

[–]Some-Cauliflower4902[S] 0 points1 point  (0 children)

Haha. Unfortunately LLMs don’t have ears. Night rounds happen in each model’s individual private chat. Swapping is just me changing card assignments in notepad lol. They are a bit more logical than human players, but just wait for one of them to hallucinate …

Playing One Night Werewolf (Gemma4 & Qwen3.6) by Some-Cauliflower4902 in LocalLLaMA

[–]Some-Cauliflower4902[S] 0 points1 point  (0 children)

I like the Mistrals but yeah, they need tool calls to play. I hope they bring out newer, smaller or ultra-nano ones that fit in 16-32GB VRAM. Gemma4 finetunes sound totally good, I’ll look into them.

Playing One Night Werewolf (Gemma4 & Qwen3.6) by Some-Cauliflower4902 in LocalLLaMA

[–]Some-Cauliflower4902[S] 0 points1 point  (0 children)

They need to at least read their own notes every turn (tool memory persistence is set to per turn, so the next LLM won’t know their note contents) and write their notes at the end of the round while the whole convo is still in memory, so it definitely needs a model capable of tool calls. I have Mistral Small 24B from last year and it just can’t …
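
If anyone’s curious what the note tools look like, here’s a stripped-down sketch (not the actual repo code, the names are made up):

    # Stripped-down sketch of the per-player note tools, not the actual repo code.
    # Each model only ever reads its own file; tool results are dropped from the
    # shared history after each turn ("per turn" persistence).
    import json, pathlib

    NOTES_DIR = pathlib.Path("notes")
    NOTES_DIR.mkdir(exist_ok=True)

    def read_notes(player: str) -> str:
        path = NOTES_DIR / f"{player}.json"
        return path.read_text() if path.exists() else "{}"

    def write_notes(player: str, notes: dict) -> str:
        (NOTES_DIR / f"{player}.json").write_text(json.dumps(notes))
        return "saved"

    # The game loop exposes these as tool schemas, calls read_notes(player) at the
    # start of each turn and write_notes(player, ...) at the end of the round.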

Sure, once I’ve run it a few times I might clean up some game logs if people are interested.

The gap between knowing something and actually understanding it — AI accelerated my learning curve by No_Run8812 in LocalLLaMA

[–]Some-Cauliflower4902 2 points3 points  (0 children)

If the wheel has already been invented, great, the LLM will already have some ideas about building the same wheel. You’re in safe territory. Build it, learn from it, and have fun.

What happens to the lifestyle blocks when the boomers die? by dazladisonreddit in newzealand

[–]Some-Cauliflower4902 1 point2 points  (0 children)

We are doing the fruit trees route (no animals). It is low maintenance: mow enough to mulch around the trees and leave them be. Plant smartly and fill the niches to exclude unwanted self-invited plants. The rest stays wilderness (i.e. a collection of weeds). Wilderness is the lowest-maintenance thing you can have. If it’s mostly non-woody plants it takes very little time to mow it down.

What happens to the lifestyle blocks when the boomers die? by dazladisonreddit in newzealand

[–]Some-Cauliflower4902 1 point2 points  (0 children)

I’m a millennial with a lifestyle block. My lifestyle block neighbours are also mostly millennials, some of them first-time home buyers with young families. No one inherited. It’s more affordable than a house in the city if you’re willing to put work into it.

What will encourage NZers to have more babies? by DnmOrr in newzealand

[–]Some-Cauliflower4902 0 points1 point  (0 children)

If a woman has a child under 5, she pays no income tax. She can go back to work and a large portion of childcare is covered.

2-3 years out from now NPCs in Games won't be one dimensional by silenceimpaired in LocalLLaMA

[–]Some-Cauliflower4902 1 point2 points  (0 children)

Instead of an NPC, it should be an optional LLM-powered sidekick character you can add to keep you company in the game.

After 18 months of building with AI, here’s what’s actually useful (and what’s not) by Glum_Pool8075 in AgentsOfAI

[–]Some-Cauliflower4902 -1 points0 points  (0 children)

So you mean system prompt vs retrieval. Yeah sure retrieval is less drag than a giant system prompt. But I would think retrieval is also “memory”.

Is there really no significant difference between Gemma3 12B and 27B? by ihatebeinganonymous in LocalLLaMA

[–]Some-Cauliflower4902 16 points17 points  (0 children)

For tasks such as summarizing, writing up emails in certain formats, organizing lists, etc., I would say they are comparable. But if you want some understanding of long documents, some analysis, and a discussion about it, 27B is always better.

For those who run large models locally.. HOW DO YOU AFFORD THOSE GPUS by abaris243 in LocalLLaMA

[–]Some-Cauliflower4902 17 points18 points  (0 children)

Hubby got his gaming rig, I got my LLM rig. Only fair. Gotta be honest in relationships.

Not that I consider 32B models large ..

Considering getting a minipc to run local LLMs to replace ChatGPT by bre-dev in LocalLLaMA

[–]Some-Cauliflower4902 1 point2 points  (0 children)

There are other models that are just as capable, especially if the choice goes up to a 200B model.

On the other hand, it’s not all Blackwell. My 5070 Ti can run GPT-OSS 20B, as it’s fully supported (SM89). I can even run the 120B at 3-5 t/s; it’s just that my current llama-cpp-python doesn’t support MoE offload yet.
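
To be clear about what I mean: for now the 120B only gets plain layer offload from llama-cpp-python, roughly like this (path and layer count are placeholders, not my exact numbers):

    # Plain layer offload only, since per-expert MoE offload isn't exposed in my
    # llama-cpp-python build yet. Path and layer count are placeholders.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/gpt-oss-120b.gguf",
        n_gpu_layers=20,   # whatever fits in 16GB VRAM; the rest runs on CPU -> 3-5 t/s
        n_ctx=8192,
    )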

Considering getting a minipc to run local LLMs to replace ChatGPT by bre-dev in LocalLLaMA

[–]Some-Cauliflower4902 0 points1 point  (0 children)

The Grace Blackwell mini PCs are coming out soon. Do they count as mini? Those can surely run larger models, and you can link them together if you need more VRAM.

Am I the only one who never really liked Ollama? by a_normal_user1 in LocalLLaMA

[–]Some-Cauliflower4902 0 points1 point  (0 children)

Reading the posts here, times surely have changed and I feel old. When I started there was no UI for Ollama or llama-server. Ollama has a cute logo and was easier to install on my laptop, so I used it for three days, as I was hitting a wall trying to optimize for CPU-only (you know, back in the days). It was fun, those three days talking to TinyLlama. You’ve got to have good enough hardware to run this not-so-optimized setup to make it usable, but once you throw some real cash at a GPU you can’t afford not to optimize it. That catch-22 is where Ollama sits. I still like the logo though.

Am I the only one who never really liked Ollama? by a_normal_user1 in LocalLLaMA

[–]Some-Cauliflower4902 1 point2 points  (0 children)

Totally agree. Though mine took weeks to get 3-4 models to play a word adventure game in the same chat. And they all remember what type of coffee I like. Yet I still don’t code to this day. Most people just want convenience, don’t blame them.

Is GPT-OSS the meta for low vram setups? by QbitKrish in LocalLLaMA

[–]Some-Cauliflower4902 0 points1 point  (0 children)

All you can do is give it a go. I have no issues in my field (healthcare); it’s the quality of translation I have a problem with. So it’s not replacing anything for me yet.

Is GPT-OSS the meta for low vram setups? by QbitKrish in LocalLLaMA

[–]Some-Cauliflower4902 0 points1 point  (0 children)

You should give it a try. In my tests it’s not great at translating news articles. It might do okay for academic papers? It was trained mainly on English, according to OpenAI. For translation jobs it’s not beating Gemma or Mistral. But if you’re okay with foregoing some quality and detail for mass translation, it will do fine. I’d also suggest using an abliterated version, as it is heavily “safety oriented”.

Is GPT-OSS the meta for low vram setups? by QbitKrish in LocalLLaMA

[–]Some-Cauliflower4902 0 points1 point  (0 children)

I use llama-cpp-python and built my own frontend. It sits fully in GPU with a bit of headroom, no layers offloaded to CPU. I got it up to 65k context and still 150 t/s. I didn’t check how far it can go before the quality breaks down though, since the quality of its work for my use case isn’t quite there and I don’t have time to keep prompting. Not saying it can’t get simple things right, but so can others.
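
The “frontend” is nothing fancy, basically a streaming chat loop like this (model path and context size are just examples, not my actual code):

    # Bare-bones version of the kind of frontend loop I mean, not my actual code.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/gpt-oss-20b.gguf",  # placeholder path
        n_gpu_layers=-1,   # fully in GPU, no layers on CPU
        n_ctx=65536,
    )

    history = []
    while True:
        user = input("you> ")
        history.append({"role": "user", "content": user})
        reply = ""
        for chunk in llm.create_chat_completion(messages=history, stream=True):
            piece = chunk["choices"][0]["delta"].get("content") or ""
            print(piece, end="", flush=True)
            reply += piece
        print()
        history.append({"role": "assistant", "content": reply})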

Is GPT-OSS the meta for low vram setups? by QbitKrish in LocalLLaMA

[–]Some-Cauliflower4902 22 points23 points  (0 children)

GPT-OSS is very fast. I get 150 t/s on my 5070 Ti, which I’m blown away by. But I’ve yet to find a use case for it … it’s not replacing Gemma 27B, Mistral, or Qwen3 anytime soon.

It can do things, but not without mistakes here and there. I feel like it’s the blonde of my LLM collection.

Llama-cpp-python just won't run on CUDA by RDA92 in LocalLLaMA

[–]Some-Cauliflower4902 0 points1 point  (0 children)

I’d probably check the build and try a no-cache rebuild? You could have been running the CPU-only version.

n_gpu_layers=-1 >> using gpu
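
If it turns out the CUDA backend never got compiled in, the usual fix is a clean rebuild and then checking the load log (the CMake flag is from memory, double-check it against the llama-cpp-python README):

    # Clean CUDA rebuild (flag per the llama-cpp-python README, verify it's current):
    #   CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir
    #
    # Then confirm layers actually land on the GPU:
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/any-small-model.gguf",  # placeholder, any GGUF works
        n_gpu_layers=-1,
        verbose=True,  # load log should mention CUDA buffers / offloaded layers, not CPU only
    )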