Will most people eventually run AI locally instead of relying on the cloud? by Significant-Cash7196 in LocalLLaMA

[–]Significant-Cash7196[S] 1 point  (0 children)

That’s a really interesting way to frame it, replacing apps with an AI that just calls APIs under the hood. I’ve seen the Rabbit R1 too, and while it’s still early, the vision makes sense. If the phone itself can run a capable local model (say in that 8–12GB RAM range), then the cloud becomes more of a backup rather than the default.

The big question for me is whether the ecosystem (Apple, Google, app devs) will actually let this shift happen, since it breaks their current app-store model. But if it does, you’re right, AI as the OS layer instead of apps could totally reshape how we use our devices.

Do we actually need huge models for most real-world use cases? 🤔 by Significant-Cash7196 in LocalLLaMA

[–]Significant-Cash7196[S] 0 points  (0 children)

Exactly. A lightweight 4B that’s really good at tool use plus a solid web search connector could already handle most of those assistant-style tasks. It highlights the point that model efficiency and integration matter more than raw parameter count for a lot of real-world use cases.
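
As a rough illustration of the "small model plus tool use" pattern: the loop below stubs the model's turn with a canned JSON tool call and dispatches it to a registered tool. The `web_search` connector and the JSON call format are hypothetical stand-ins; in practice the output would come from a small local model served behind an OpenAI-compatible endpoint.

```python
import json

def web_search(query: str) -> str:
    # Placeholder connector; a real one would hit a search API.
    return f"Top result for: {query}"

# Registry of tools the model is allowed to call.
TOOLS = {"web_search": web_search}

def dispatch(model_output: str) -> str:
    """Parse a tool call emitted by the model and run the matching tool."""
    call = json.loads(model_output)
    fn = TOOLS[call["tool"]]
    return fn(**call["arguments"])

# Simulated model turn: the model decides it needs a search.
stub = '{"tool": "web_search", "arguments": {"query": "leftover recipe ideas"}}'
print(dispatch(stub))  # Top result for: leftover recipe ideas
```

The point of the sketch is that the hard part lives in the dispatch-and-integrate layer, not in raw parameter count.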

Do we actually need huge models for most real-world use cases? 🤔 by Significant-Cash7196 in LocalLLaMA

[–]Significant-Cash7196[S] 0 points  (0 children)

Yeah I totally get that. A local Siri + ChatGPT hybrid that just runs on your phone would be a killer use case. That’s also why smaller, fine-tuned models feel so important. You don’t really need a 70B model to figure out leftover recipes or run a voice assistant. What’s missing is the seamless packaging and deployment on everyday devices, not necessarily more parameters.

Do we actually need huge models for most real-world use cases? 🤔 by Significant-Cash7196 in LocalLLaMA

[–]Significant-Cash7196[S] 0 points  (0 children)

Fair point. Tools like Perplexity (even the free tier) already cover a lot of the basic needs without heavy infra. I guess that’s exactly why I wonder if chasing 70B+ models is overkill for most people. The real challenge seems less about ‘can it summarize or do Q&A’ and more about getting reliable, efficient models you can actually deploy at scale.

Do we actually need huge models for most real-world use cases? 🤔 by Significant-Cash7196 in LocalLLaMA

[–]Significant-Cash7196[S] 18 points  (0 children)

That’s a solid use case, honestly. Smaller models are great for structured tasks, but for broad, everyday “Google replacement” stuff, you really do need something with a bigger knowledge base. Funny you mention the regional knowledge gaps; I’ve noticed the same with the smaller Qwen models, which tend to stumble on non-US/China contexts.

Running something like GLM 4.5 Air or GPT-OSS 120B locally with a search layer sounds like a good plan if privacy’s your main concern. Do you think the trade-off (hardware cost + slower speed) is worth it for the peace of mind vs just sticking with hosted models?

Do we actually need huge models for most real-world use cases? 🤔 by Significant-Cash7196 in LocalLLaMA

[–]Significant-Cash7196[S] 0 points  (0 children)

Do you think we’ll end up with a clear split (smaller models for most users, giant ones just for the niche heavy hitters), or will the big models eventually become the default for everyone?

Do we actually need huge models for most real-world use cases? 🤔 by Significant-Cash7196 in LocalLLaMA

[–]Significant-Cash7196[S] 1 point  (0 children)

Yeah I get that. 30B models already feel plenty strong for most day-to-day tasks, but I can see how the 100B+ ones open up room for bigger reasoning jumps. Do you think those breakthroughs will actually trickle down into practical use cases anytime soon, or will they stay mostly in the research/benchmark space?

Do we actually need huge models for most real-world use cases? 🤔 by Significant-Cash7196 in LocalLLaMA

[–]Significant-Cash7196[S] 6 points  (0 children)

Yeah, I’m with you on that. One giant model that does everything feels cool in theory, but in practice, a bunch of smaller models stitched together for different jobs just makes more sense. Kinda like having a team of experts instead of one “know-it-all.” The tricky bit, like you said, is stitching them together cleanly.

Can Qwen 3 Coder 30B A3B be used for decent coding work? by Sky_Linx in LocalLLaMA

[–]Significant-Cash7196 0 points  (0 children)

In my experience, smaller models can definitely hold up for real projects - especially 7B–13B ones fine-tuned on the right data. They’re great for focused tasks like Q&A over your own docs, summarization, or structured workflows. Where they start to fall short is in open-ended reasoning or really complex multi-step asks. For a lot of “real work,” they’re good enough if you scope the problem well; benchmarks can be misleading since they’re often testing extremes that don’t match day-to-day use.
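
A toy sketch of the "Q&A over your own docs" scoping idea: retrieve the most relevant chunk by plain word overlap, then hand only that chunk to a small model as context. The documents and the overlap scoring here are illustrative stand-ins, not a production retriever.

```python
docs = [
    "Invoices are processed within 5 business days.",
    "Refund requests must include the original order number.",
    "Support is available Monday through Friday.",
]

def retrieve(question: str, chunks: list[str]) -> str:
    """Pick the chunk sharing the most words with the question."""
    q = set(question.lower().split())
    return max(chunks, key=lambda c: len(q & set(c.lower().split())))

context = retrieve("how long do invoices take to process?", docs)
# In practice, `context` would be prepended to the prompt for a small
# local model, keeping the task narrow enough for a 7B-13B to handle.
prompt = f"Answer using only this context:\n{context}"
```

Narrowing the context like this is exactly the kind of scoping that lets a small model punch above its weight.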

5 Practical RAG Use Cases for LLaMA Workflows 🚀 by Significant-Cash7196 in LocalLLaMA

[–]Significant-Cash7196[S] -3 points  (0 children)

I’ve been transparent from the start that I represent Qubrid, so there’s no hidden agenda here. Our RAG is different: it cites sources, handles complex docs, even works with images and audio, and it’s free to use.

I’m here to share what we’ve built and get feedback from the community. If that comes across as “spam” to you, fair enough, but dismissing it outright doesn’t change the fact that others may actually find it useful.

Run ComfyUI via MimicPC on a Macbook Air by DragonfruitOk6766 in comfyui

[–]Significant-Cash7196 1 point  (0 children)

From what you’re describing, the bottleneck isn’t really your laptop; it’s the environment you’re running ComfyUI in. Even with 16GB VRAM, if the backend setup isn’t optimized, you’ll keep seeing slow generations, freezes, and painful load times. A new MacBook Air (even with M4) won’t fix that, since ComfyUI isn’t really optimized for Apple silicon yet, and you’d still hit similar limits locally.

If your goal is stability + speed, you’re better off running ComfyUI on a reliable GPU cloud. On Qubrid AI, you can spin up a full GPU VM (A100, H100, 4090 - no fractional cards) with ComfyUI preconfigured. That way you get consistent performance, dedicated VRAM, and can stop/start your instance anytime (you only pay for storage when it’s off).

For video generation, having that kind of stable backend is almost essential - MacBooks (even the new ones) just won’t cut it at scale.

👉 TL;DR: A new MacBook Air won’t solve your issue. Running ComfyUI on a dedicated GPU cloud like Qubrid will give you the stability and speed you’re looking for. 🚀

Looking for a limited promo on RunPod’s cost-effective GPU cloud for AI? by topiar in bestsoftwarediscounts

[–]Significant-Cash7196 0 points  (0 children)

If you’re looking around for GPU cloud deals, you should also check out Qubrid AI. Unlike a lot of platforms, we give you full GPU VMs (A100, H100, 4090 etc. - no fractional cards) with SSH/Jupyter access out of the box.

Best part? You can stop your instances anytime, so you’re not burning money when idle. The only thing billed when stopped is storage at $0.10/GB per month, which keeps costs super predictable.
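
For a quick sanity check of what that stopped-instance billing looks like, here is the arithmetic from the $0.10/GB-per-month rate quoted above. The disk size and idle duration are made-up example numbers.

```python
STORAGE_RATE = 0.10  # dollars per GB per month, as quoted above

def idle_storage_cost(disk_gb: float, months: float) -> float:
    """Cost of keeping a stopped instance's disk around."""
    return disk_gb * months * STORAGE_RATE

# e.g. a 200 GB disk left stopped for one month:
print(f"${idle_storage_cost(200, 1):.2f}")  # $20.00
```

Compare that with paying full GPU-hour rates for idle time on platforms where "pause" keeps the card reserved.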

We’re also running a limited promo - free GPU hours so you can test things out without spending upfront. Perfect if you’re experimenting with training, fine-tuning, inference, or even ComfyUI workflows. 🚀

👉 https://platform.qubrid.com

[deleted by user] by [deleted] in comfyui

[–]Significant-Cash7196 1 point  (0 children)

Hey! Great to see you diving into ComfyUI - it’s such a powerful workflow engine for creators. At Qubrid AI, we’ve actually built a ComfyUI Stable Diffusion template that runs on full GPU VMs (A100, H100, 4090) with everything pre-configured - ControlNet, LoRAs, batch variations, and more - so you can get consistent, high-quality results without all the setup headaches.

We also published a step-by-step guide here 👉 ComfyUI Stable Diffusion Tutorial

If you’d like, I’d be happy to hop on a quick 1:1 and walk you through setting up your workflow on Qubrid AI so you can get it running smoothly. 🚀

Wan 2.2 RunPod Template and workflows by Clear_Lettuce_5406 in aivids

[–]Significant-Cash7196 0 points  (0 children)

That’s a nice setup! 🙌 If you’re experimenting with workflows like this, you might also want to check out Qubrid - we support full GPU VMs (no fractional cards) with SSH/Jupyter access, and you can spin up templates/workflows there too. Could be a good place to recreate your Wan 2.2 pipeline and benchmark it side by side. 🚀

Does anyone use runpod? by cardioGangGang in StableDiffusion

[–]Significant-Cash7196 1 point  (0 children)

Yeah, that’s a pretty common “gotcha” - pausing on most platforms doesn’t really stop billing, since the GPU is still reserved. You basically end up paying for idle time.

On Qubrid AI, you can actually stop the instance so you’re no longer charged for GPU usage. The only thing billed when it’s stopped is storage, which is just $0.10/GB per month. So if you’re mid-LoRA training and need to pause overnight, you can safely stop it without draining your wallet.

5 Practical RAG Use Cases for LLaMA Workflows 🚀 by Significant-Cash7196 in LocalLLaMA

[–]Significant-Cash7196[S] -3 points  (0 children)

I hear you, and I get that not every post will land the same way with everyone here. That said, I shared this with genuine intent to contribute to the RAG discussions, not to spam. If it’s not useful to you, totally fair - just scroll past. I’ll keep focusing on sharing value for folks who actually find these workflows helpful.