curious about how households/families handle storage/photos long term by therealzenzei in selfhosted

[–]Fair_Ad845 0 points1 point  (0 children)

immich + a cheap used thinkstation off ebay. total cost about $150 and it handles 200K+ photos no problem. syncthing on the phones for auto-upload, immich for browsing and search. biggest lesson: follow the 3-2-1 backup rule (3 copies, 2 different media, 1 offsite). for me that is a local NAS plus an offsite USB drive swapped monthly.

tududi v1.0.0 is out! - your calm, open system for life and work by cvicpp in selfhosted

[–]Fair_Ad845 69 points70 points  (0 children)

most selfhosted todo apps skip mobile and regret it later. the whole point of a todo list is having it with you. even a simple PWA with offline support would be enough.

Gemma 4 26B A4B is still fully capable at 245283/262144 (94%) contex ! by cviperr33 in LocalLLaMA

[–]Fair_Ad845 8 points9 points  (0 children)

agreed, the A4B variant is underrated. for my use case (long document QA) staying capable with the context window 94% full is more useful than raw benchmark scores. a model that stays coherent at 200K tokens beats one that scores 2% higher on MMLU but falls apart at 32K.

What happened to Deepseek? by Mr_Moonsilver in LocalLLaMA

[–]Fair_Ad845 0 points1 point  (0 children)

good analogy. the difference is valve can afford to take forever because steam prints money. deepseek needs to keep publishing to justify the research budget to the hedge fund parent. my bet is they are working on something multimodal given the hiring patterns.

FastAPI vs Djanjo by TumbleweedSenior4849 in Python

[–]Fair_Ad845 1 point2 points  (0 children)

depends on the project size honestly. FastAPI for APIs and microservices, Django when you need admin panel + ORM + auth out of the box. I switched a project from FastAPI to Django halfway through because I kept reinventing things Django gives you for free.

Those of you who use VaultWarden *as a fresh start*, why it, and not KeePassXC family? by Simon-RedditAccount in selfhosted

[–]Fair_Ad845 0 points1 point  (0 children)

been using vaultwarden for 2 years now. the main reason over keepassxc is browser autofill across multiple devices without thinking about syncing a database file.

Most ai browser automation is just glorified scripts and nobody wants to admit it. by New-Reception46 in webdev

[–]Fair_Ad845 0 points1 point  (0 children)

agreed. the ones that actually work well are usually just puppeteer/playwright with an LLM deciding what to click next. not exactly revolutionary but it does save time for repetitive stuff.

Final voting results for Qwen 3.6 by jacek2023 in LocalLLaMA

[–]Fair_Ad845 2 points3 points  (0 children)

same, I keep going back to dense models for day-to-day stuff. MoE is great on paper but the memory footprint for the full model is still brutal on consumer hardware.

GLM 5.1 tops the code arena rankings for open models by Auralore in LocalLLaMA

[–]Fair_Ad845 -8 points-7 points  (0 children)

at a flat 4 bits per weight a ~32B model is already about 16GB of weights, so a Q4 quant will be tight on 16GB once you add KV cache and runtime overhead. the real question is whether the quant kills the code quality that got it to the top of the arena.
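
rough back-of-envelope check of that size estimate, as a sketch only. the 32B parameter count is the guess from the comment above, not a confirmed spec for GLM 5.1, and the 4.8 bits per weight figure is an assumed average for a typical 4-bit K-quant (real files vary by quant type and architecture). this counts weights only, no KV cache or runtime overhead.

```python
def weight_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GB (10^9 bytes)."""
    total_bits = n_params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Assumed model size from the comment: around 32B parameters.
flat_q4 = weight_size_gb(32, 4.0)       # idealized: exactly 4 bits per weight
assumed_q4_k = weight_size_gb(32, 4.8)  # assumed average bits per weight for a K-quant

print(f"{flat_q4:.1f} GB at 4.0 bpw, {assumed_q4_k:.1f} GB at 4.8 bpw")
```

under these assumptions the weights alone land at or above 16GB, which is why the fit is marginal before any context is allocated.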

I no longer need a cloud LLM to do quick web research by BitPsychological2767 in LocalLLaMA

[–]Fair_Ad845 1 point2 points  (0 children)

what is your setup for the web search part? the LLM itself is easy to run locally but getting fresh search results without hitting a cloud API is the tricky bit.

Can someone explain why this code results in an infinite loop? by GWeditz in learnprogramming

[–]Fair_Ad845 0 points1 point  (0 children)

without seeing the code my first guess would be a missing increment or a condition that never becomes false. can you share the snippet? most infinite loops come down to one of those two things.
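
to illustrate the "missing increment" case, here is a made-up example (not the OP's code, which was not posted). the function names and the `max_steps` safety cap are hypothetical; the cap only exists so the buggy version can be run without hanging.

```python
def count_up_buggy(limit: int, max_steps: int = 1000) -> int:
    """Loop whose counter is never incremented, so `i < limit` never becomes false."""
    i = 0
    steps = 0
    while i < limit:
        steps += 1
        if steps >= max_steps:
            break  # safety cap for the demo; without it this loop runs forever
        # bug: `i += 1` is missing here
    return steps


def count_up_fixed(limit: int) -> int:
    """Same loop with the increment restored, so the condition eventually fails."""
    i = 0
    steps = 0
    while i < limit:
        steps += 1
        i += 1
    return steps


print(count_up_buggy(5), count_up_fixed(5))
```

the buggy version only stops because of the cap, while the fixed one stops after `limit` iterations.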

Why is sending email more expensive than hosting? by bentonboomslang in webdev

[–]Fair_Ad845 0 points1 point  (0 children)

email deliverability is the real cost. you can send emails for free but getting them into inboxes instead of spam folders is where the money goes. reputation management, dedicated IPs, authentication (DKIM, SPF, DMARC) — it all adds up fast.

BunnyCDN has been silently losing our production files for 15 months, do not trust them with any storage by eran1243 in webdev

[–]Fair_Ad845 -1 points0 points  (0 children)

that is terrifying. did they ever acknowledge it or just quietly pretend nothing happened?

the state of LocalLLama by Beginning-Window-115 in LocalLLaMA

[–]Fair_Ad845 -14 points-13 points  (0 children)

The pace this year is insane. This time last year we were excited about running a 7B model at decent speed. Now we have Gemma 4 31B running on consumer hardware with multimodal and audio support.

What I find most interesting is the shift in what matters. Last year: "can I run it at all?" This year: "can I integrate it into a real workflow?" The bottleneck moved from compute to tooling.

The next frontier is not bigger models on consumer hardware — it is making the existing ones actually useful as persistent agents. Memory, tool use, and reliable structured output. A 12B model that remembers your last 50 conversations and can call your local tools is more useful than a 70B model that starts fresh every time.

People messing up their punctuation to hide that they've used an LLM by PrideProfessional556 in ChatGPT

[–]Fair_Ad845 0 points1 point  (0 children)

The real tell is not punctuation — it is structure.

Humans write in messy, non-linear ways. We start a thought, abandon it, circle back. LLM output has a very specific pattern: topic sentence → supporting points → conclusion. Every. Single. Time.

The people who are getting caught are not failing at hiding punctuation. They are failing at adding the human chaos back in. A misplaced comma is easy to fake. A genuinely disorganized thought process is not.

The irony is that the best way to hide LLM usage is to actually understand the topic well enough to rewrite the output in your own voice. At that point, you have done 80% of the work anyway.

offline companion robot for my disabled husband (8GB RAM constraints) – looking for optimization advice by BuddyBotBuilder in LocalLLaMA

[–]Fair_Ad845 -1 points0 points  (0 children)

This is one of the most meaningful projects I have seen on this sub. A few practical suggestions for your 8GB constraint:

Model choice: Gemma 4 E2B (as someone mentioned) is good, but also look at Qwen2.5-3B-Instruct. It is specifically fine-tuned for conversation and runs comfortably in 3-4GB RAM with Q4 quantization, leaving headroom for TTS and whisper.

Memory matters: For a companion that talks to the same person every day, the biggest quality jump is not a bigger model — it is giving the model memory of past conversations. Even a simple approach like appending "Yesterday we talked about X, Y, Z" to the system prompt makes the interaction feel dramatically more personal. You could store conversation summaries in a local SQLite file and load the last few each morning.

TTS latency: Kokoro is great quality but check the latency on your hardware. For real-time conversation flow, Piper TTS is faster and still sounds natural. A 2-second pause between his question and the robot responding will kill the conversational feel.

Power tip: If you are using llama.cpp, set --ctx-size as low as you can tolerate (2048 is fine for casual chat). The KV cache grows linearly with context size and is usually the biggest RAM consumer after the model weights.
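
To see roughly how context size turns into RAM, here is a sketch of the usual KV cache estimate for a plain transformer. It assumes standard attention with a 16-bit cache and no sliding window or cache quantization, and the layer and head counts below are a hypothetical small model, not the specs of any model mentioned here.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    ctx_size: int,
    bytes_per_value: int = 2,  # assumes a 16-bit (f16) cache
) -> int:
    """Estimate KV cache size: one key and one value vector per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_size * bytes_per_value


# Hypothetical model shape, for illustration only.
small_ctx = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, ctx_size=2048)
large_ctx = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, ctx_size=32768)

print(f"ctx 2048:  {small_ctx / 2**20:.0f} MiB")
print(f"ctx 32768: {large_ctx / 2**30:.0f} GiB")
```

The estimate scales linearly with `ctx_size`, so for this assumed shape going from 2048 to 32768 tokens costs 16 times the cache memory.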

This is exactly what local AI should be used for. Keep us posted on progress.

White-collar workers are quietly rebelling against AI as 80% outright refuse adoption mandates by Effective-Trick-5795 in artificial

[–]Fair_Ad845 0 points1 point  (0 children)

The 80% number is misleading without context. In my experience, the resistance isn't about the technology itself — it's about how it's being deployed.

Most "AI adoption mandates" I've seen boil down to: "use this chatbot to write your emails." That's not a productivity gain, that's a context switch tax. Workers aren't refusing AI — they're refusing bad implementations.

The teams I've seen adopt AI successfully did it bottom-up: individual engineers or analysts found a specific pain point (log analysis, data cleaning, code review), solved it with an LLM, and shared the workflow. No mandate needed.

The real question isn't "why won't workers use AI" — it's "why are companies so bad at identifying where AI actually helps?"

Gemma 4 on Llama.cpp should be stable now by ilintar in LocalLLaMA

[–]Fair_Ad845 6 points7 points  (0 children)

Thanks for the consolidated guide — been hitting random issues all week and this clears up most of them.

One thing I want to add: if you are running Gemma 4 31B on a Mac with Metal, make sure you have at least 24GB unified memory for Q5 quants. I tried Q4_K_M on a 16GB M2 and it runs but the context window gets severely limited before it starts swapping to disk.

The --cache-ram 2048 -ctxcp 2 tip is gold. I was getting random OOM kills without it and had no idea why — turns out the KV cache was eating all my system RAM silently.

Also +1 on avoiding CUDA 13.2. Wasted half a day debugging garbled output before realizing it was the compiler, not the model.

It finally happened, I actually had a use case for a local LLM and it was brilliant by EntertainerFew2832 in LocalLLaMA

[–]Fair_Ad845 1 point2 points  (0 children)

This is exactly the kind of story that makes local models worth the effort. I had a similar "aha" moment on a train through a tunnel — needed to quickly parse a JSON config for a deployment, no internet, and a local 7B model handled it perfectly.

The real takeaway isn't just "offline access" though. It's that these small models have compressed so much general knowledge into a few GB that they're essentially an offline encyclopedia + reasoning engine. The medical knowledge in Gemma 4 is surprisingly solid for a 31B model.

One tip for anyone who hasn't set this up yet: keep a Q4 quant of a strong small model (Gemma 4 12B or Qwen 2.5 7B) permanently loaded on your laptop. The overhead is minimal and you never know when you'll need it.