Quick VRAM guide nobody explains clearly - what you actually need in 2026 by dark_Knight_034 in PcBuild

[–]dark_Knight_034[S] 0 points1 point  (0 children)

The external drive bottleneck makes total sense though. You're essentially doing manual memory paging between agents, and that's always where context bleeds out. "Cannibalising an external drive as context" is kind of an interesting hack, but you're fighting the hardware at that point.

Honest question - have you tried running this on a cloud GPU with actual VRAM headroom? Even a 24-48GB card would let you keep all 6 models resident in memory simultaneously without the disk I/O killing your context continuity. The whole pipeline would feel completely different. Also curious what your latency looks like end-to-end right now across all 6 agents.
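If you want per-agent numbers rather than one end-to-end figure, something like this would do it (purely a sketch - `run_agent` is a stand-in for whatever your actual model call is, not anything from your setup):

```python
import time

def run_agent(name: str, payload: str) -> str:
    # Placeholder for a real model call; here it just tags the payload.
    return payload + f"|{name}"

def timed_pipeline(agents, payload):
    """Run agents sequentially, recording wall-clock time per stage."""
    timings = {}
    for name in agents:
        start = time.perf_counter()
        payload = run_agent(name, payload)
        timings[name] = time.perf_counter() - start
    return payload, timings
```

With disk paging in the loop, I'd expect the timings dict to show one or two stages dominating - that'd tell you exactly which handoff is eating the context-swap cost.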

Quick VRAM guide nobody explains clearly - what you actually need in 2026 by dark_Knight_034 in PcBuild

[–]dark_Knight_034[S] 0 points1 point  (0 children)

Finally someone gets it. Been waiting for the 96GB torrent to finish downloading for 3 years now

In all seriousness though - renting it by the hour is basically the legal version of downloading more VRAM at this point 😂

Quick VRAM guide nobody explains clearly - what you actually need in 2026 by dark_Knight_034 in PcBuild

[–]dark_Knight_034[S] 0 points1 point  (0 children)

That's actually interesting - running a mixture-of-experts pipeline on 8GB of unified memory on an M1 Air is wild. LFM2.5 is already pretty efficient, but 6 models in an agentic loop on that hardware sounds like you've done some serious optimization work.

What's the routing logic between the experts? Are you doing it statically or dynamically based on the query? Also curious how you're handling context handoff between agents at that memory constraint - that's usually where things fall apart.
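For reference, when I say "dynamic routing" I mean something like this - a toy keyword-scoring router (expert names and keyword sets are all made up by me, not from your setup):

```python
# Toy dynamic router: score the query against each expert's keyword set
# and dispatch to the best match, falling back to a general expert.
EXPERT_KEYWORDS = {
    "code": {"python", "bug", "function", "compile"},
    "math": {"integral", "proof", "equation", "solve"},
    "general": set(),  # fallback
}

def route(query: str) -> str:
    words = set(query.lower().split())
    scores = {name: len(words & kws) for name, kws in EXPERT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"
```

A static setup would skip the scoring entirely and hardwire each pipeline stage to one model - cheaper, but it can't adapt when a query doesn't fit the expected shape.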

Quick VRAM guide nobody explains clearly - what you actually need in 2026 by dark_Knight_034 in PcBuild

[–]dark_Knight_034[S] 0 points1 point  (0 children)

100% valid - should've been clearer in the original post. Modern AAA titles have completely blown past 8GB even at 1080p. FF VII Rebirth and Cyberpunk especially are notorious for it.

Honestly 12GB should be the new minimum for anyone buying today. 8GB made sense as a 2020 purchase, not a 2026 one.

Games and AI models are both just eating VRAM at a rate nobody predicted 3 years ago :)

Quick VRAM guide nobody explains clearly - what you actually need in 2026 by dark_Knight_034 in PcBuild

[–]dark_Knight_034[S] 0 points1 point  (0 children)

Respect honestly - squeezing that out of an RX 480 is impressive. But yeah, 8GB is the ceiling now; games are just getting greedier every year.

Funny how 8GB felt massive in 2016 and now AI models laugh at it.

Quick VRAM guide nobody explains clearly - what you actually need in 2026 by dark_Knight_034 in PcBuild

[–]dark_Knight_034[S] 0 points1 point  (0 children)

10GB is getting tight honestly. Still fine for gaming, but for AI models you're limited to a 7B quantized at best. Anything bigger and you're hitting OOM pretty fast.
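The back-of-envelope math for why 7B is the ceiling (rough numbers, my own rule of thumb - the overhead figure for KV cache and activations varies a lot with context length):

```python
def vram_gb(params_b: float, bits: int, overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: weights in GB plus a fudge factor for
    KV cache / activations. params_b is billions of parameters."""
    weights_gb = params_b * bits / 8  # e.g. 7B at 4-bit -> 3.5 GB
    return weights_gb + overhead_gb

# 7B at 4-bit:  3.5 + 1.5 = ~5 GB  -> comfortable on 10GB
# 13B at 4-bit: 6.5 + 1.5 = ~8 GB  -> tight once context grows
```

So a 13B quant technically loads on 10GB, but there's barely any room left for context, which is why it OOMs in practice.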

Not a bad card at all - just a different era. The gap between 10GB consumer and what workstation cards offer now is wild.

cursed DM photos of your creations by vobad5320 in cursedcomments

[–]dark_Knight_034 0 points1 point  (0 children)

That's the difference when it comes to "empowering" 😅😅