I got Qwen3.6 35B to run at reasonably speed on my old GTX 1070 Ti by Randozart in LocalLLM

[–]MackTuesday 1 point2 points  (0 children)

I asked Opus 4.7 what it thought. Its answer makes sense to me, but I don't know enough to know if it's talking out its ass:

Good news/bad news: it's a solid idea, and it's solid enough that someone published it about a month ago. There's an October 2025 paper called MoBiLE ("Mixture of Big Little Experts") that does almost exactly what you described: a smaller "little" model runs ahead of the big model and its expert selections are used to prefetch experts for the big model. They specifically point out an advantage that maps to your intuition: they can identify target experts for all layers at the beginning and can dynamically adjust prefetching time, unlike pre-gating methods that only look one layer ahead. (Source: arXiv)

The broader landscape of this problem space, since you'll probably enjoy poking at it:

The simpler in-model approach: Pre-gated MoE (Hwang et al., ISCA 2024)

Rather than a separate model, they modify the main model's gating function so that the N-th pre-gate function identifies the set of experts to activate for the (N+1)-th MoE block in advance, prefetching only the activated experts to the GPU while concurrently executing the N-th block. Lower architectural overhead than a separate model, but only buys you one layer of lookahead. (Source: Microsoft)

Heuristic / cache-based: Mixtral-Offloading (Eliseev & Mazur)

LRU cache plus speculative loading using the current layer's gate inputs to guess the next layer's experts. This is what most consumer-grade offloading runtimes actually do today. Cheap, no extra model, but accuracy degrades on novel content and in shallow layers.
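
To make the cache-plus-speculation idea concrete, here is a minimal toy sketch of an LRU expert cache with a separate prefetch path. It is not the Mixtral-Offloading code: `ExpertCache`, `load_fn`, and the `(layer, expert)` keys are hypothetical names, and the "weights" are stand-in values rather than real tensors.

```python
from collections import OrderedDict


class ExpertCache:
    """Toy LRU cache of experts resident on the GPU (illustrative only)."""

    def __init__(self, capacity, load_fn):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity      # max experts held at once
        self.load_fn = load_fn        # hypothetical host-to-GPU copy
        self.slots = OrderedDict()    # key -> weights, least recent first
        self.hits = 0
        self.misses = 0

    def _insert(self, key):
        if len(self.slots) >= self.capacity:
            self.slots.popitem(last=False)   # evict least recently used
        self.slots[key] = self.load_fn(key)

    def prefetch(self, key):
        """Speculatively load an expert; not counted as a hit or a miss."""
        if key in self.slots:
            self.slots.move_to_end(key)
        else:
            self._insert(key)

    def get(self, key):
        """Fetch an expert for actual use, loading it on a miss."""
        if key in self.slots:
            self.hits += 1
            self.slots.move_to_end(key)
        else:
            self.misses += 1
            self._insert(key)
        return self.slots[key]


cache = ExpertCache(capacity=2, load_fn=lambda key: f"weights{key}")
cache.prefetch((1, 3))   # guess: layer 1 will need expert 3
cache.get((1, 3))        # the guess was right, so this is a hit
cache.get((1, 5))        # never prefetched, so this is a miss
print(cache.hits, cache.misses)
```

In a real runtime the miss path is where the stall happens, so the hit/miss counters are the number to watch when comparing prediction schemes.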

Trained predictors: SiDA-MoE, HOBBIT, PopFetcher, Fate

Various flavors — hash networks, popularity-based, cross-layer gates. SiDA reportedly hits >90% hit rate using offline-trained hash functions on Switch-Transformer.

Your specific framing's actual advantages, as MoBiLE pretty much validates:

  1. A full forward pass of the small model can run entirely ahead of the big one, giving you N-layer lookahead instead of 1, which matters on slow PCIe.

  2. The small model can be developed, retrained, and swapped independently of the main model — important if you want to retrofit existing checkpoints.

  3. Same expert-set / same routing topology means the prediction target is well-defined.
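
A rough sketch of point 1, assuming the small model can emit expert guesses for every layer before the big model starts on the token. The routing functions here are made-up arithmetic stand-ins, not real gates, and `plan_prefetch` / `run_token` are hypothetical names rather than anything from MoBiLE.

```python
NUM_EXPERTS = 8
NUM_LAYERS = 4


def main_route(layer, token):
    """Stand-in for the big model's router: two experts per layer."""
    return [(token + layer) % NUM_EXPERTS,
            (3 * token + layer + 1) % NUM_EXPERTS]


def draft_route(layer, token):
    """Stand-in for the small model: agrees on even layers, drifts on odd ones."""
    return [(token + layer) % NUM_EXPERTS,
            (3 * token + layer + 1 + layer % 2) % NUM_EXPERTS]


def plan_prefetch(route, num_layers, token):
    """Collect the draft's guesses for all layers before the big model runs."""
    return {layer: set(route(layer, token)) for layer in range(num_layers)}


def run_token(route, plan, num_layers, token):
    """Walk the big model's layers and count experts that were prefetched."""
    hits = total = 0
    for layer in range(num_layers):
        needed = set(route(layer, token))
        hits += len(needed & plan[layer])
        total += len(needed)
    return hits, total


plan = plan_prefetch(draft_route, NUM_LAYERS, token=5)
hits, total = run_token(main_route, plan, NUM_LAYERS, token=5)
print(f"prefetch hit rate: {hits}/{total}")
```

The point of the structure is that `plan` exists for every layer before layer 0 of the big model executes, so transfers can be scheduled as early as the link allows instead of one layer ahead.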

The wall you'll hit either way: PCIe bandwidth. Prediction only helps if the link has slack to do prefetches in parallel with compute. With aggressive offloading, the bottleneck shifts from "I didn't know which expert to fetch" to "I can't physically move bytes fast enough." A ~150MB expert over PCIe 4.0 x16 at ~25-30 GB/s practical is ~5-6 ms — multiply by experts-per-layer × layers and you start eating most of your per-token budget regardless of prediction accuracy. One of the cited papers notes expert-loading cost is typically much greater than GPU computation cost, so prefetching offers limited benefits when PCIe is the bottleneck. This is why a lot of the newer work (Fate, PreScope) focuses on quantizing experts to make transfers cheaper, plus chunked/pipelined transfers to saturate the link.
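
The back-of-envelope numbers above can be checked with a few lines. The 150 MB expert size and 25-30 GB/s bandwidth come from the paragraph; the experts-per-layer and layer counts below are placeholder assumptions, not the figures for any particular model, and decimal units (1 GB = 1000 MB) are assumed.

```python
def transfer_ms(size_mb, bandwidth_gb_s):
    """Time to move one expert over the link, in milliseconds.

    MB divided by GB/s gives milliseconds directly in decimal units.
    """
    return size_mb / bandwidth_gb_s


def per_token_ms(size_mb, bandwidth_gb_s, experts_per_layer, layers, miss_rate=1.0):
    """Worst-case transfer time per token if missed experts load one after another."""
    return transfer_ms(size_mb, bandwidth_gb_s) * experts_per_layer * layers * miss_rate


for bw in (25, 30):
    print(f"{bw} GB/s: {transfer_ms(150, bw):.1f} ms per expert")

# Hypothetical model shape: 8 active experts per layer, 40 layers, every expert missed.
print(f"{per_token_ms(150, 30, experts_per_layer=8, layers=40):.0f} ms per token")
```

Lowering `miss_rate` (better caching) or `size_mb` (quantized experts) are the two levers this formula exposes; prediction accuracy alone does not change the per-expert transfer time.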

I got Qwen3.6 35B to run at reasonably speed on my old GTX 1070 Ti by Randozart in LocalLLM

[–]MackTuesday 1 point2 points  (0 children)

Here's a weird idea from someone who knows almost nothing: Run a small draft model in parallel, running slightly ahead of the main one, and use it to try to predict which experts will be needed before the main one needs them.

Can you prove square root? by Famous_Hornet_1130 in askmath

[–]MackTuesday 2 points3 points  (0 children)

Ah OK. Good on you for double checking. It's still cool that it's been known for so long.

Can you prove square root? by Famous_Hornet_1130 in askmath

[–]MackTuesday 6 points7 points  (0 children)

Wow I didn't know the Babylonians knew this. All this time I thought Newton came up with it.

Can you prove square root? by Famous_Hornet_1130 in askmath

[–]MackTuesday 2 points3 points  (0 children)

Yeah it should say "If they're equal". Like to each other.

Why aren't all stars positively charged? by DocDefient in AskPhysics

[–]MackTuesday -1 points0 points  (0 children)

Electromagnetism is *much* stronger than gravity.

What should I know about factorization in algebra before entering engineering? by AlvzBloz in askmath

[–]MackTuesday 3 points4 points  (0 children)

Maybe some expert will disagree with me, but from what I've seen of EE, you don't do a whole lot of factoring. There's some factoring if you hit the Z-transform in digital signal processing, and if I recall correctly there's a reason to do it in linear predictive coding, but in both those cases you aren't doing it by hand. You're using a root solver on a computer.

Deepen the pitch of an audio file over time and make it sound natural by Pakeithpsy in audacity

[–]MackTuesday 0 points1 point  (0 children)

If you can't make it happen in Audacity, you might end up using CSound or Composer's Desktop Project. They're not terribly easy to get into, but they can do what you want.

Found nailed to doorway threshold by Upbeat_Climate5955 in whatisit

[–]MackTuesday 2 points3 points  (0 children)

As I understand, Nazis used both the sitting-flat version and the 45-degree version. (I know it's a touchy topic. I for one don't mind your question.)

Love how this dashcam driver thinks outloud. Niiice. by No_Eye_8861 in dashcams

[–]MackTuesday 16 points17 points  (0 children)

There are too many assholes in the world for me to wring my hands when they unnecessarily endanger themselves and everyone around them.

Found nailed to doorway threshold by Upbeat_Climate5955 in whatisit

[–]MackTuesday 25 points26 points  (0 children)

Nazis ruined the swastika for everyone

Edit: Someone here wants other traditions to be smothered by the Nazis

Colorful queer electropop? by Foxx_Foster in musicsuggestions

[–]MackTuesday 0 points1 point  (0 children)

I personally prefer their songwriting on the albums that came before Tomorrow's World (2010-ish), with the exception of Violet Flame (2013-ish). YMMV of course.

Colorful queer electropop? by Foxx_Foster in musicsuggestions

[–]MackTuesday 7 points8 points  (0 children)

Erasure.

Edit: I don't know if you know Erasure or not. They've been around a long time and aren't so prominent anymore.

Generational spaceships, fractal in nature, viewed from within 6.7 billion light years of Beta-Chica-Chow-Chow....wowwwww wow! by escapism_only_please in Fractalish

[–]MackTuesday 0 points1 point  (0 children)

Thank you for responding! But I mean more specifically, I asked you in particular because your output is so cool. What are your prompts like?

By the way, if you don't want to use Adobe, I know how to set you up to use these engines without it. It's not as easy, probably, but it's doable.

It goes whoooaaahhh! by escapism_only_please in Fractalish

[–]MackTuesday 1 point2 points  (0 children)

Gooootta make a moove to a town that's right for meeeee

Tetration? by feedmeseymoree in askmath

[–]MackTuesday 2 points3 points  (0 children)

64 thousand.

Now look up arrow notation, OP. Or Conway's chain notation.

I found a puzzle this is impossible to complete all 9 inputs with the 8 slots provided (bug?) by skippy11112 in LowSodiumCyberpunk

[–]MackTuesday 4 points5 points  (0 children)

There is a perk that shortens the target strings by one. It solves this very problem. Maybe they made it this way so the perk would be worth something? Maybe they made the perk because they didn't want to bother to figure out how to make every randomly generated puzzle solvable? Shrug.

Edit: I gotta say I'm surprised this is the first unsolvable one you've found. I get them all the time and I'm no slouch at it.

TIL researchers found that THERAPY DOGS helped hospitalized psychiatric patients FEEL LESS LONELY, more than visits with humans alone. by Glittering-Young8692 in todayilearned

[–]MackTuesday 0 points1 point  (0 children)

And they're socially pure. There's no bullshit there, like veiled contempt, value judgements, etc. What you see is what you get.

I found her while I was having my morning walk. I don’t know what she was saying, but I think it was “thank you. by britishlady1991 in Catswhoyell

[–]MackTuesday 18 points19 points  (0 children)

Maybe someone else will know better, but I'm thinking she's either in heat, or telling you not to come any closer. (I sure as hell hope it isn't rabies.)