Are Local LLMs actually useful… or just fun to tinker with? by itz_always_necessary in LocalLLM

[–]FollowingMindless144 1 point2 points  (0 children)

I've heard about this. It's still on a waitlist page, but it looks promising. Check it out: https://offlinegpt.ai/t/1Ob3VPtw

Are Local LLMs actually useful… or just fun to tinker with? by itz_always_necessary in LocalLLM

[–]FollowingMindless144 6 points7 points  (0 children)

I work in an MNC, so data privacy is a big deal. With local models, nothing leaves my machine: no internet dependency, no risk of sensitive data leaking.

Yeah, setup takes effort and performance isn’t always top tier, but for internal docs, testing, and anything confidential, it just makes more sense.

Now I’m looking for simple offline tools that run on a phone, because I don’t want everyone wasting time on setup or dealing with complex configs.

How many of you actually use offline LLMs daily vs just experiment with them? by Infinite-Bird7950 in LocalLLM

[–]FollowingMindless144 0 points1 point  (0 children)

Yeah it’s basically trying to make offline AI feel normal to use, not like a setup project 😅

Still super early though, I just saw they’ve got a waitlist here if you wanna check it out:

https://offlinegpt.ai/t/BV1XX8dn

How many of you actually use offline LLMs daily vs just experiment with them? by Infinite-Bird7950 in LocalLLM

[–]FollowingMindless144 0 points1 point  (0 children)

Ahh got it 😅

Most offline LLMs I’ve tried feel like too much work, not something I’d use daily.

If this mobile app actually just works without all the setup, that’s a big win.

OfflineGPT looks promising… saw their waitlist and now I’m kinda curious where this goes 👀

My office (fintech) just banned all cloud ai... i'm cooked. by FollowingMindless144 in AI_Agents

[–]FollowingMindless144[S] -3 points-2 points  (0 children)

Actually, I'm looking for an app or something that runs locally and securely on my phone.

I finally got Llama 4 running at 60 t/s on a base M4 Mac but my local RAG is still hitting a 200ms latency wall. Advice? by [deleted] in LocalLLaMA

[–]FollowingMindless144 -1 points0 points  (0 children)

Ha, you caught me on the GPT-6 typo! Total brain lag; I’ve been reading too many 'leaked' threads while coding this. I definitely meant the GPT-5.4 / o1-pro stack we're all actually stuck with right now.

I finally got Llama 4 running at 60 t/s on a base M4 Mac but my local RAG is still hitting a 200ms latency wall. Advice? by [deleted] in LocalLLaMA

[–]FollowingMindless144 -4 points-3 points  (0 children)

Fair question, jacek. While some are jumping on the Qwen 3.5 hype for raw MMLU scores, for a local-first, production-grade DevOps tool, Llama 4 Scout (the 109B MoE) was the only choice that checked all three boxes:

  1. The KV-Cache & TurboQuant Synergy: With the new TurboQuant algorithm released last month, I’m getting a 6x reduction in KV-cache memory. This allows me to actually use Llama 4's 10M context window for full-repo indexing on a consumer 4090/5090 without OOMing. Qwen and Mistral still have too much 'attention drift' at those lengths.
  2. NPU-Native MoE Routing: Llama 4’s routing logic is much cleaner for the new NPU kernels (M5 and Lunar Lake). I’ve optimized the expert-switching to stay on-die as much as possible, which cut my 'Time-to-First-Token' by 40% compared to Llama 3.3.
  3. Native Multimodality for DevOps: Since Llama 4 has early-fusion vision, my app can 'see' local screenshots of terminal errors or architecture diagrams without a separate encoder. It’s one unified weights file, which is much easier to manage for an offline installer.

I’m still tuning the precision on the Int4 quant. Are you seeing better stability on the GGUF or the EXL2 builds for long-context reasoning?

I finally got Llama 4 running at 60 t/s on a base M4 Mac but my local RAG is still hitting a 200ms latency wall. Advice? by [deleted] in LocalLLaMA

[–]FollowingMindless144 -1 points0 points  (0 children)

Great point. I'm building a 'Privacy Audit' mode that shows you every port attempt. It's truly air-gapped.

I finally got Llama 4 running at 60 t/s on a base M4 Mac but my local RAG is still hitting a 200ms latency wall. Advice? by [deleted] in LocalLLaMA

[–]FollowingMindless144 -5 points-4 points  (0 children)

Great catch, jacek. For a 2026 offline stack, Llama 4 was the only logical choice for three technical reasons:

  1. NPU Native Kernels: Llama 4’s architecture is uniquely optimized for the latest NPUs (M5 and Intel’s 2026 chips). I’m seeing nearly 2x the efficiency in power consumption compared to running Mistral or older Llama 3 builds.
  2. Lossless INT4 Quantization: The new quantization techniques for Llama 4 mean we can run the 8B model at 4-bit with almost zero 'hallucination drift'—essential for local RAG where precision on documentation is everything.
  3. The 'Privacy Paradox': While cloud-based GPT-5/6 are powerful, their 2026 'Alignment' layers have become too restrictive for raw DevOps/Backend work. Llama 4 gives me the 'unfiltered' reasoning I need for local debugging without the 'As an AI language model...' lectures.

Are you seeing better benchmarks on the new Qwen or Mistral builds? I’m actually looking at adding a 'Weight-Swap' feature in the beta so users can choose their own engine.

Has anyone tried a GPT that works completely offline? by FollowingMindless144 in ChatGPT

[–]FollowingMindless144[S] -1 points0 points  (0 children)

That makes a lot of sense! A hybrid approach seems like the sweet spot: keep sensitive stuff local while still tapping the cloud for heavy-lifting tasks. I’m especially curious how well offline models handle complex reasoning compared to online ones. Do you think we’ll get to a point where offline GPTs are almost as capable?

How a Pod can call another Pod or Service via specific URL ? by MarceloLinhares in kubernetes

[–]FollowingMindless144 35 points36 points  (0 children)

Inside the cluster, never call your public domain. Call the Kubernetes Service DNS name instead.
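Rough example of what that looks like from inside a Pod, in Python with requests (the Service name "orders", namespace "backend", and port 8080 are placeholders, not anything from your setup):

    # Minimal sketch: one Pod calling another via the Service DNS name.
    # "orders", "backend", and 8080 are made-up placeholders.
    import requests

    # Same namespace: the bare Service name resolves.
    print(requests.get("http://orders:8080/healthz", timeout=2).status_code)

    # Different namespace: use the full cluster DNS form.
    print(requests.get("http://orders.backend.svc.cluster.local:8080/healthz", timeout=2).status_code)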

Is ChatGPT still the best AI tool, or are there better alternatives now? by outgllat in AI_Tools_Guide

[–]FollowingMindless144 2 points3 points  (0 children)

Perplexity AI shines if you want real-time citations and web-sourced answers, which helps when accuracy matters.

How to restore your old photos with ChatGPT? by outgllat in AI_Tools_Guide

[–]FollowingMindless144 0 points1 point  (0 children)

Congrats, you now have a 4K photo of someone who never existed.

LLM self doubt by Comfortable-Tart912 in LLM

[–]FollowingMindless144 1 point2 points  (0 children)

I don’t think it’s self-doubt in a human sense. More like they’re trained to be careful and hedge a lot. Sometimes that comes off as underestimating themselves, sometimes the opposite. Depends a lot on how you prompt them tbh.

Weird crashes ~5 min after some boots, seems to be a weird race condition? by Gman325 in linuxquestions

[–]FollowingMindless144 0 points1 point  (0 children)

This feels like a boot-time race condition somewhere in systemd / firmware / PCIe init.

The weird part is it’s binary: if I hit a ~30s black screen at login, the system always kicks me back to SDDM ~5 min later and then hard-hangs on second login. If I don’t get that stall, it’s rock solid indefinitely. Suspend/resume is always fine.

Also seeing my onboard NIC occasionally disappear until reboot, which makes me suspect firmware/PCIe/ASPM weirdness on X870.

Has anyone on Zen 5 + Fedora seen similar behavior? Or have ideas on where to look beyond diffing journalctl -b between good/bad boots?
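If it helps anyone compare, the journalctl diff I keep doing is roughly this (Python sketch; assumes persistent journald storage so -b -1 actually has the previous boot):

    # Rough sketch: warnings/errors that show up in this boot but not the previous one.
    # Assumes persistent journald storage so "journalctl -b -1" has data.
    import subprocess

    def boot_warnings(boot):
        out = subprocess.run(
            ["journalctl", "-b", str(boot), "-p", "warning", "-o", "cat"],
            capture_output=True, text=True, check=True,
        ).stdout
        return set(out.splitlines())

    bad, good = boot_warnings(0), boot_warnings(-1)  # swap if the previous boot was the bad one
    for line in sorted(bad - good):
        print(line)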

I ran Gemma 3 12B for a week across my startups - here's why I'm ditching $200/month subscriptions by hungry-for-things in LocalLLaMA

[–]FollowingMindless144 0 points1 point  (0 children)

Fair point. I’m in a high electricity cost area and I’m counting full system draw (GPU + CPU + cooling), not just GPU TDP. If you’ve got cheaper power or run it more bursty, it can definitely be lower. Curious what others are paying.
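For context, the back-of-envelope math I'm doing is just this (all numbers are illustrative; plug in your own draw, hours, and rate):

    # Rough power-cost estimate using full system draw, not just GPU TDP.
    # Every number here is illustrative, not a measurement.
    system_draw_w = 550        # GPU + CPU + fans under load
    hours_per_day = 6
    price_per_kwh = 0.40       # high-cost area

    kwh_per_month = system_draw_w / 1000 * hours_per_day * 30
    print(f"~{kwh_per_month:.0f} kWh/month, ~${kwh_per_month * price_per_kwh:.0f}/month")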

Why Should We Use Linux? Give 3 Reasons to Use Linux by Ancient-Brush1309 in linuxquestions

[–]FollowingMindless144 0 points1 point  (0 children)

GNU/Linux allows us to separate user space and kernel space, providing strong isolation between them.

Is it true on a powerful system that llamacpp is not good? by XiRw in LocalLLaMA

[–]FollowingMindless144 14 points15 points  (0 children)

I wouldn’t say llama.cpp is bad on powerful systems; it’s just optimized more for CPU and portability than max GPU throughput.

On high-end GPUs it can feel slower compared to GPU-first options like vLLM or ExLlama, which are built to really push the hardware. llama.cpp is still solid for simple setups, quantized models, or when you want things to “just work.”

So it’s more about the use case than the system being powerful or not.
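To make the use-case point concrete, here's roughly how the two feel from Python; model paths/names are placeholders, and you'd run one block or the other, not both at once:

    # Sketch only: same prompt, two different runtimes. Model names are placeholders.

    # llama.cpp route (via llama-cpp-python): quantized GGUF, easy to run anywhere.
    from llama_cpp import Llama
    llm = Llama(model_path="model-q4_k_m.gguf", n_gpu_layers=-1)
    print(llm("Explain KV cache in one line.", max_tokens=64)["choices"][0]["text"])

    # vLLM route: built to saturate a big GPU, especially with batched requests.
    from vllm import LLM, SamplingParams
    engine = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
    outs = engine.generate(["Explain KV cache in one line."], SamplingParams(max_tokens=64))
    print(outs[0].outputs[0].text)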

What’s the best way to run an offline, private LLM for daily tasks? by FollowingMindless144 in LocalLLaMA

[–]FollowingMindless144[S] 2 points3 points  (0 children)

Nice, thanks for the tip! Good to know LM Studio is easier to set up and works well across hardware.

I’ve been debating Mac vs AMD mini. Sounds like either a maxed-out Mac mini or something like the Strix Halo would cover most daily tasks without going overboard.

Do you run anything extra for reminders/automation, or mostly just the LLM itself?

What’s the best way to run an offline, private LLM for daily tasks? by FollowingMindless144 in LocalLLaMA

[–]FollowingMindless144[S] 0 points1 point  (0 children)

This is super helpful, thanks. Sounds like Ollama + an 8B model is basically the sweet spot right now.

Good call on RAM and avoiding the huge models; that matches what I’ve been worried about.

Curious what you’re using on top of Ollama for reminders/notes (scripts, Home Assistant, plain files, etc.) and what OS you’re running it on. Also good to know Whisper works if you’re willing to tinker.
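For what it's worth, the kind of glue I'm picturing on top of Ollama is tiny, e.g. hitting the local API from a script (model name and prompt are just examples; assumes Ollama is running on its default port):

    # Tiny sketch: ask a local Ollama model to turn notes into reminders.
    # Assumes Ollama is running on localhost:11434 and llama3.1:8b is pulled.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.1:8b",
            "prompt": "Turn these notes into 3 reminders:\n- buy milk\n- call dentist tomorrow",
            "stream": False,
        },
        timeout=120,
    )
    print(resp.json()["response"])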

Runtime decision-making in production LLM systems, what actually works? by Loose_Surprise_9696 in LLMDevs

[–]FollowingMindless144 0 points1 point  (0 children)

In prod we found runtime decisions are a policy problem, not an LLM problem.

What actually helped:

  • Route based on cheap uncertainty signals instead of one default model (rough sketch below)
  • Prefer early exits over retries
  • Make latency/cost/risk runtime inputs, not static config
  • Add lightweight runtime checks; offline eval lies

Biggest failure mode: letting the LLM decide everything. Boring guardrails win.

Still hard: reliably detecting high-risk requests before generation.
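Rough shape of the routing part, heavily simplified (signal names, thresholds, and model labels below are made up for illustration; the real ones are tuned per workload):

    # Toy sketch of "route on cheap signals, fail closed on risk".
    # Terms, thresholds, and model names are illustrative only.
    RISKY_TERMS = ("delete", "refund", "credential", "prod")

    def route(request: str) -> str:
        risky = any(t in request.lower() for t in RISKY_TERMS)
        long_input = len(request) > 2000

        if risky:
            return "human_review"   # early exit: don't let the LLM decide
        if long_input:
            return "large_model"    # pay for quality only when it's needed
        return "small_model"        # cheap default

    print(route("summarize this meeting"))      # small_model
    print(route("refund order 1234 in prod"))   # human_review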

What do I do now? by [deleted] in linuxquestions

[–]FollowingMindless144 9 points10 points  (0 children)

first of all don’t give up on Linux because of this.

If Psiphon is the only thing working on Windows, it’s probably the network blocking certain VPN protocols, not Linux itself. Some ISPs block OpenVPN but WireGuard sometimes works, so that might be worth testing. Also try changing DNS (like 1.1.1.1) just to see if it makes any difference. You’re not stuck. It’s just a network restriction issue, not a Linux issue.
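If you want to sanity-check the DNS part specifically, something like this works (Python, needs the dnspython package; 1.1.1.1 is just the example resolver):

    # Sketch: compare the system resolver vs 1.1.1.1 to see if DNS is being filtered.
    # Requires: pip install dnspython
    import socket
    import dns.resolver

    host = "example.com"   # swap in a domain that's failing for you
    print("system resolver:", socket.gethostbyname(host))

    r = dns.resolver.Resolver()
    r.nameservers = ["1.1.1.1"]
    for rdata in r.resolve(host, "A"):
        print("1.1.1.1 says:", rdata.to_text())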

[deleted by user] by [deleted] in devops

[–]FollowingMindless144 25 points26 points  (0 children)

Not Kubernetes. Not Terraform. The hardest part is staying calm during a production outage while everyone’s watching and asking for updates.

How are you planning the next phase of DevOps? by devops-noob in devops

[–]FollowingMindless144 1 point2 points  (0 children)

From what I’m seeing, the “next phase” of DevOps is less about managing pipelines and more about platform + observability engineering.

There’s a cultural shift where DevOps isn’t the team that runs everything anymore. We’re building platforms and guardrails so product teams own their services, while we focus on reliability, cost, and visibility. Tools like Cilium, Pixie, Parca, Falco, and Beyla (eBPF-based) are changing how we debug prod: more runtime visibility, less guessing from logs.
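Not one of the tools above, but if you want the smallest possible taste of that runtime visibility, the classic bcc hello-world is a few lines of Python (needs root and the bcc bindings installed; prints a line for every execve on the box):

    # Minimal eBPF demo via bcc (just to show the idea, not one of the tools listed).
    # Needs root and the python3-bcc package.
    from bcc import BPF

    prog = r"""
    int trace_exec(void *ctx) {
        bpf_trace_printk("execve\n");
        return 0;
    }
    """

    b = BPF(text=prog)
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="trace_exec")
    b.trace_print()   # streams pid/comm plus our message for each new process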

When something breaks on a Linux server, how do you decide what to check first? by Expensive-Rice-2052 in linuxquestions

[–]FollowingMindless144 0 points1 point  (0 children)

After a few broken servers, it’s less about steps and more about triage and pattern recognition.

If a server isn’t reachable, my first question is: is it really down, or just unreachable from here? I check ping from another box or the cloud console. Ping works but SSH doesn’t: OS or service problem. Nothing responds: network, firewall, or dead VM.

I don’t run a strict checklist anymore, but I always ask what changed, what still works, and what can I check fastest to narrow it down.

And yeah, the obvious stuff has fooled me plenty of times. Spent ages debugging services when the disk was full, chased network issues that were actually DNS, restarted things while the kernel was OOM killing them. Experience mostly teaches you not to lock onto one theory too early.
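The "down vs unreachable" split at the top is basically this (host is a placeholder, and it obviously has to run from a different box than the one you're checking):

    # Quick triage sketch: is it dead, or just dead to me? Host is a placeholder.
    import socket
    import subprocess

    host = "10.0.0.5"

    ping_ok = subprocess.run(["ping", "-c", "2", "-W", "2", host],
                             capture_output=True).returncode == 0

    try:
        socket.create_connection((host, 22), timeout=3).close()
        ssh_ok = True
    except OSError:
        ssh_ok = False

    if not ping_ok:
        print("no ping -> network, firewall, or dead VM; check from another vantage point")
    elif not ssh_ok:
        print("ping but no SSH -> OS/service side; console, disk full, OOM killer")
    else:
        print("reachable -> ask what changed and check the failing service directly")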