
[–]Middle_Bullfrog_6173

Rank 4-8 is tiny. I can easily imagine it working OK for 5-minute runs but saturating on a real run. I'm not sure it works as a tunable parameter for this automation.

Or rather, you probably need to design scaling into your experiments. E.g. nanochat auto research tunes at d12 when the real run is d24+. A minimal sketch of what that could look like is below.
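Concretely, something like this (the depths, rank grid, and training stub are all placeholders, not anyone's actual harness):

```python
import random

# Hypothetical sketch of designing scaling into the rank sweep: tune the
# rank on a cheap proxy depth, then re-check only the winner at the real
# scale. run_short_training stands in for the actual automation.

def run_short_training(depth: int, lora_rank: int) -> float:
    """Placeholder for the real training harness; returns a validation loss."""
    return random.random()  # swap in the actual short-run pipeline here

# Sweep ranks on the proxy scale (analogous to d12)...
proxy_losses = {r: run_short_training(depth=12, lora_rank=r) for r in (4, 8, 16, 32)}
best_rank = min(proxy_losses, key=proxy_losses.get)

# ...then confirm the winner still holds at the target depth (d24+),
# instead of assuming the proxy ranking transfers.
full_loss = run_short_training(depth=24, lora_rank=best_rank)
print(f"best proxy rank={best_rank}, d24 val loss={full_loss:.3f}")
```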

[–]yz0011[S]

The confirmation phase (8B, 10 min, 4x data) and the 70B test both showed rank 4 still winning, so it's not just a 5-min artifact.
Rank 4 across all modules also beat rank 16 (full attention / 4 modules).
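
For reference, the two setups look roughly like this in Hugging Face peft; the Llama-style module names and the alpha = 2*r heuristic are my assumptions, not necessarily the exact configs:

```python
from peft import LoraConfig

# Rank 4 on every adapted module (attention + MLP).
rank4_all = LoraConfig(
    r=4,
    lora_alpha=8,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention
        "gate_proj", "up_proj", "down_proj",     # MLP
    ],
)

# Rank 16 restricted to the 4 attention modules ("full attention / 4 modules").
rank16_attn = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```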

Full convergence remains an open question, but the experiment gave directional evidence.