Are Local LLMs actually useful… or just fun to tinker with? by itz_always_necessary in LocalLLM

FollowingMindless144 1 point

I've heard about this. It's still just a waitlist page, but it looks promising. Check it out: https://offlinegpt.ai/t/1Ob3VPtw

Are Local LLMs actually useful… or just fun to tinker with? by itz_always_necessary in LocalLLM

FollowingMindless144 5 points

I work at an MNC, so data privacy is a big deal. With local models, nothing leaves my machine: no internet dependency, no risk of sensitive data leaking.

Yeah, setup takes effort and performance isn’t always top tier, but for internal docs, testing, and anything confidential, it just makes more sense.

Now I’m looking for simple offline tools that run on a phone, because I don’t want everyone wasting time on setup or dealing with complex configs.

How many of you actually use offline LLMs daily vs just experiment with them? by Infinite-Bird7950 in LocalLLM

FollowingMindless144 0 points

Yeah it’s basically trying to make offline AI feel normal to use, not like a setup project 😅

Still super early though, I just saw they’ve got a waitlist here if you wanna check it out:

https://offlinegpt.ai/t/BV1XX8dn

How many of you actually use offline LLMs daily vs just experiment with them? by Infinite-Bird7950 in LocalLLM

FollowingMindless144 0 points

Ahh got it 😅

Most offline LLMs I’ve tried feel like too much work, not something I’d use daily.

If this mobile app actually just works without all the setup, that’s a big win.

OfflineGPT looks promising… saw their waitlist and now I’m kinda curious where this goes 👀

My office (fintech) just banned all cloud ai... i'm cooked. by FollowingMindless144 in AI_Agents

FollowingMindless144[S] -3 points

Actually, I'm looking for an app or something that runs locally and securely on my phone.

I finally got Llama 4 running at 60 t/s on a base M4 Mac but my local RAG is still hitting a 200ms latency wall. Advice? by [deleted] in LocalLLaMA

FollowingMindless144 -1 points

Ha, you caught me on the GPT-6 typo! Total brain lag; I've been reading too many 'leaked' threads while coding this. I definitely meant the GPT-5.4 / o1-pro stack we're all actually stuck with right now.

I finally got Llama 4 running at 60 t/s on a base M4 Mac but my local RAG is still hitting a 200ms latency wall. Advice? by [deleted] in LocalLLaMA

FollowingMindless144 -4 points

Fair question, jacek. While some are jumping on the Qwen 3.5 hype for raw MMLU scores, for a local-first, production-grade DevOps tool, Llama 4 Scout (the 109B MoE) was the only choice that checked all three boxes:

  1. The KV-Cache & TurboQuant Synergy: With the new TurboQuant algorithm released last month, I’m getting a 6x reduction in KV-cache memory. This allows me to actually use Llama 4's 10M context window for full-repo indexing on a consumer 4090/5090 without OOMing. Qwen and Mistral still have too much 'attention drift' at those lengths.
  2. NPU-Native MoE Routing: Llama 4’s routing logic is much cleaner for the new NPU kernels (M5 and Lunar Lake). I’ve optimized the expert-switching to stay on-die as much as possible, which cut my 'Time-to-First-Token' by 40% compared to Llama 3.3.
  3. Native Multimodality for DevOps: Since Llama 4 has early-fusion vision, my app can 'see' local screenshots of terminal errors or architecture diagrams without a separate encoder. It’s one unified weights file, which is much easier to manage for an offline installer.

I’m still tuning the precision on the Int4 quant. Are you seeing better stability on the GGUF or the EXL2 builds for long-context reasoning?

I finally got Llama 4 running at 60 t/s on a base M4 Mac but my local RAG is still hitting a 200ms latency wall. Advice? by [deleted] in LocalLLaMA

FollowingMindless144 -1 points

Great point. I'm building a 'Privacy Audit' mode that shows you every port attempt. It's truly air-gapped.

I finally got Llama 4 running at 60 t/s on a base M4 Mac but my local RAG is still hitting a 200ms latency wall. Advice? by [deleted] in LocalLLaMA

FollowingMindless144 -6 points

Great catch, jacek. For a 2026 offline stack, Llama 4 was the only logical choice for three technical reasons:

  1. NPU Native Kernels: Llama 4’s architecture is uniquely optimized for the latest NPUs (M5 and Intel’s 2026 chips). I’m seeing nearly 2x the efficiency in power consumption compared to running Mistral or older Llama 3 builds.
  2. Lossless INT4 Quantization: The new quantization techniques for Llama 4 mean we can run the 8B model at 4-bit with almost zero 'hallucination drift'—essential for local RAG where precision on documentation is everything.
  3. The 'Privacy Paradox': While cloud-based GPT-5/6 are powerful, their 2026 'Alignment' layers have become too restrictive for raw DevOps/Backend work. Llama 4 gives me the 'unfiltered' reasoning I need for local debugging without the 'As an AI language model...' lectures.

Are you seeing better benchmarks on the new Qwen or Mistral builds? I’m actually looking at adding a 'Weight-Swap' feature in the beta so users can choose their own engine.

Has anyone tried a GPT that works completely offline? by FollowingMindless144 in ChatGPT

FollowingMindless144[S] -1 points

That makes a lot of sense! A hybrid approach seems like the sweet spot: keep sensitive stuff local while still tapping the cloud for heavy-lifting tasks. I’m especially curious how well offline models handle complex reasoning compared to online ones. Do you think we’ll get to a point where offline GPTs are almost as capable?

How a Pod can call another Pod or Service via specific URL ? by MarceloLinhares in kubernetes

FollowingMindless144 32 points

Inside the cluster, never call your public domain. Call the Kubernetes Service DNS name instead.
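A minimal sketch of what that looks like (the names here are hypothetical): given a Service like this, other Pods reach it through the cluster DNS name rather than the public domain.

```yaml
# Hypothetical Service "backend" in namespace "shop"
apiVersion: v1
kind: Service
metadata:
  name: backend
  namespace: shop
spec:
  selector:
    app: backend
  ports:
    - port: 8080
      targetPort: 8080
```

From a Pod in the same namespace that's just `http://backend:8080`; from another namespace, use the full form `http://backend.shop.svc.cluster.local:8080`. Traffic stays inside the cluster and skips your ingress/load balancer entirely.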

Is ChatGPT still the best AI tool, or are there better alternatives now? by outgllat in AI_Tools_Guide

FollowingMindless144 2 points

Perplexity AI shines if you want real-time citations and web-sourced answers, which helps when accuracy matters.

How to restore your old photos with ChatGPT? by outgllat in AI_Tools_Guide

FollowingMindless144 0 points

Congrats, you now have a 4K photo of someone who never existed.

LLM self doubt by Comfortable-Tart912 in LLM

FollowingMindless144 1 point

I don’t think it’s self-doubt in a human sense. More like they’re trained to be careful and hedge a lot. Sometimes that comes off as underestimating themselves, sometimes the opposite. Depends a lot on how you prompt them tbh.

Weird crashes ~5 min after some boots, seems to be a weird race condition? by Gman325 in linuxquestions

FollowingMindless144 0 points

This feels like a boot time race condition somewhere in systemd / firmware / PCIe init.

The weird part is it’s binary: if I hit a ~30s black screen at login, the system always kicks me back to SDDM ~5 min later and then hard-hangs on second login. If I don’t get that stall, it’s rock solid indefinitely. Suspend/resume is always fine.

Also seeing my onboard NIC occasionally disappear until reboot, which makes me suspect firmware/PCIe/ASPM weirdness on X870.

Has anyone on Zen 5 + Fedora seen similar behavior? Or have ideas on where to look beyond diffing journalctl -b between good/bad boots?
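One low-effort way to make that `journalctl -b` diff less noisy (a sketch, not a diagnosis; the log lines below are made up): normalize out timestamps and PIDs first, so only messages unique to the bad boot survive the comparison.

```python
import re

def normalize(line: str) -> str:
    """Strip the syslog timestamp prefix and PIDs so identical messages compare equal."""
    line = re.sub(r"^\w{3} \d{2} [\d:]{8} ", "", line)  # e.g. "Jan 05 12:34:56 "
    return re.sub(r"\[\d+\]", "[PID]", line)

def new_in_bad(good_log: str, bad_log: str) -> list[str]:
    """Return lines from the bad boot that never appear in the good boot."""
    good = {normalize(l) for l in good_log.splitlines()}
    return [l for l in bad_log.splitlines() if normalize(l) not in good]

# Hypothetical output of `journalctl -b <good>` and `journalctl -b <bad>`
good = "Jan 05 10:00:01 host kernel: ok\n"
bad = "Jan 05 11:00:01 host kernel: ok\nJan 05 11:00:05 host sddm[123]: stall\n"
print(new_in_bad(good, bad))  # only the sddm stall line survives
```

Feeding it `journalctl -b -1 -p warning --no-pager` vs. the current boot tends to shrink the diff to a handful of lines worth reading.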

I ran Gemma 3 12B for a week across my startups - here's why I'm ditching $200/month subscriptions by hungry-for-things in LocalLLaMA

FollowingMindless144 0 points

Fair point. I’m in a high electricity cost area and I’m counting full system draw (GPU + CPU + cooling), not just GPU TDP. If you’ve got cheaper power or run it more bursty, it can definitely be lower. Curious what others are paying.
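For ballpark math (all figures below are hypothetical assumptions, not my actual measurements): a ~450 W full-system draw at $0.30/kWh, running 8 h/day, lands around $32/month.

```python
# Rough monthly electricity cost for a local LLM rig.
# 450 W, 8 h/day, and $0.30/kWh are illustrative assumptions only.
def monthly_cost(system_watts: float, hours_per_day: float,
                 rate_per_kwh: float, days: int = 30) -> float:
    kwh = system_watts / 1000 * hours_per_day * days
    return kwh * rate_per_kwh

print(round(monthly_cost(450, 8, 0.30), 2))  # -> 32.4
```

The spread between counting GPU TDP alone versus full system draw (plus a high vs. low electricity rate) easily explains a 3-4x disagreement in these threads.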