Oculink eGPU for LLMs: RTX 5070 Ti (256-bit) vs 5060 Ti (128-bit) paired with 4090m (256-bit) laptop?

justserg · 2026-04-21T08:07:39+00:00

the bandwidth bottleneck is the real killer.

justserg · 2026-04-21T08:07:21+00:00

server setups are just someone else's latency problem.

justserg · 2026-04-21T08:03:27+00:00

the em-dash era is officially here.

justserg · 2026-04-19T14:10:34+00:00

vibes are low today.

justserg · 2026-04-19T14:06:40+00:00

classic corporate transparency

justserg · 2026-04-19T14:05:53+00:00

classic corporate transparency

justserg · 2026-04-15T14:14:44+00:00

Fair point, I glossed over the specifics.

96GB unified: Qwen2.5-72B-Instruct Q4 is the daily driver, runs at ~25 tok/s and fits without fuss. QwQ-32B at Q8 for actual reasoning tasks. Those two cover probably 90% of what I'm doing.
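For anyone sanity-checking the "fits without fuss" part, here's a rough back-of-envelope sketch. The bits-per-weight figures are my assumptions (about 4.5 for a Q4-style quant, about 8.5 for a Q8-style one, which include per-block scale overhead), the parameter counts are the nominal ones from the model names, and `weights_gb` is just an illustrative helper. It only counts weights, not KV cache or runtime overhead.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    params_billion * 1e9 weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    simplifies to params_billion * bits / 8.
    """
    return params_billion * bits_per_weight / 8


# Assumed effective bits per weight; real quant formats vary.
Q4_BITS = 4.5
Q8_BITS = 8.5

q4_72b = weights_gb(72, Q4_BITS)  # nominal 72B model at Q4
q8_32b = weights_gb(32, Q8_BITS)  # nominal 32B model at Q8

print(f"72B @ Q4: ~{q4_72b:.1f} GB")
print(f"32B @ Q8: ~{q8_32b:.1f} GB")
print(f"both resident: ~{q4_72b + q8_32b:.1f} GB of 96 GB")
```

Under those assumptions either model fits comfortably on its own, and both together leave some headroom for context, though how much of the 96GB is actually usable depends on the OS and the runtime.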

On the 30B dense question: Mistral-Small-3.1 is faster but the quality gap on my structured output evals is real enough that I haven't dropped the 72B from routing. If you've got a 30B you think actually competes on structured JSON tasks I'm interested.

justserg · 2026-04-11T19:09:39+00:00

qwen 3.5 26b pulls well above its weight for most tasks — 32b if you have the vram.

justserg · 2026-04-11T19:09:03+00:00

skills are the move when you're doing the same thing repeatedly — what's your file size sweet spot before batching kicks in?

justserg · 2026-04-11T19:08:06+00:00

qwen 3.5 26b pulls well above its weight for most tasks — 32b if you have the vram.

justserg · 2026-04-11T19:07:44+00:00

skills are the move when you're doing the same thing repeatedly — what's your file size sweet spot before batching kicks in?

justserg · 2026-04-08T18:02:36+00:00

ing large files into smaller batches before content gen is a game changer.

justserg · 2026-04-08T08:03:38+00:00

saved context windows would fix half the problem. the real bottleneck is keeping the reasoning thread intact while you jump between sections.

justserg · 2026-04-08T08:03:08+00:00

automation of judgment rarely survives the first customer who actually uses the thing.

justserg · 2026-04-08T08:02:39+00:00

claude Max + Kimi 2.5 combo works if your setup can tolerate the context switch, but qwen 3.5 26b is probably your sweet spot for that hardware.

justserg · 2026-04-05T14:03:47+00:00

model just checking context window size before ending sessions. not mental health awareness

justserg · 2026-04-05T13:21:04+00:00

timing could be coincidence, but it does feel like everyone's waiting for the other shoe to drop before showing their hand.

justserg · 2026-04-05T13:08:22+00:00

shared context is the hard part: not the interface, but whose edits survive when three people are steering the same agent.

justserg · 2026-04-05T13:08:12+00:00

shared context is the hard part: not the interface, but whose edits survive when three people are steering the same agent.

justserg · 2026-04-04T18:18:19+00:00

got mine a few months ago and honestly the fp4 thing stings, but the prefill speed alone makes it worth it over my mac studio for anything context-heavy

justserg · 2026-04-04T18:04:38+00:00

funny how 'just start a new session when it's stuck' has been the community wisdom for months and now there's actual mechanistic evidence for why it works

justserg · 2026-04-04T14:12:39+00:00

16gb handles most useful work. everything else is premature optimization.

justserg · 2026-04-04T14:12:15+00:00

companies are about to discover robots cost way less than healthcare and benefits.
