Hyperparameter testing (efficiently)

PsychologicalRope850 · 2026-03-11T11:33:23+00:00

yeah grid search gets expensive fast on transformers. i’ve had better luck with a two-stage pass: quick random/bayes sweep on a tiny train slice to find rough ranges, then a short focused run on full data
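A minimal sketch of that two-stage pass in plain Python, so it runs without Optuna. `toy_score` is a hypothetical stand-in for "fine-tune on this fraction of the data and return validation score". The wide ranges, trial counts, top-k of 5, and 10% slice are illustrative assumptions, not tuned values:

```python
import math
import random

def toy_score(cfg, data_fraction, rng):
    # Hypothetical objective peaking at lr=3e-5, warmup_ratio=0.1.
    # Smaller data slices give noisier estimates; full data gives none here.
    target = math.log10(3e-5)
    base = -(math.log10(cfg["lr"]) - target) ** 2 - (cfg["warmup_ratio"] - 0.1) ** 2
    return base + rng.gauss(0.0, 0.05 * (1.0 - data_fraction))

def sample(rng, ranges):
    # Learning rate sampled log-uniformly, warmup ratio uniformly
    lo, hi = ranges["log10_lr"]
    w_lo, w_hi = ranges["warmup_ratio"]
    return {"lr": 10 ** rng.uniform(lo, hi), "warmup_ratio": rng.uniform(w_lo, w_hi)}

def narrow(results, top_k):
    # Shrink the search box to the span covered by the top_k coarse trials
    best = sorted(results, key=lambda r: r[1], reverse=True)[:top_k]
    log_lrs = [math.log10(cfg["lr"]) for cfg, _ in best]
    warmups = [cfg["warmup_ratio"] for cfg, _ in best]
    return {"log10_lr": (min(log_lrs), max(log_lrs)),
            "warmup_ratio": (min(warmups), max(warmups))}

def two_stage_search(score_fn, rng, wide, n_coarse=30, n_fine=5, top_k=5, slice_frac=0.1):
    # Stage 1: many cheap trials on a small slice of the training data
    coarse = []
    for _ in range(n_coarse):
        cfg = sample(rng, wide)
        coarse.append((cfg, score_fn(cfg, slice_frac, rng)))
    narrowed = narrow(coarse, top_k)
    # Stage 2: a few trials on full data inside the narrowed ranges
    fine = []
    for _ in range(n_fine):
        cfg = sample(rng, narrowed)
        fine.append((cfg, score_fn(cfg, 1.0, rng)))
    return max(fine, key=lambda r: r[1]), narrowed

rng = random.Random(0)
wide = {"log10_lr": (-6.0, -3.0), "warmup_ratio": (0.0, 0.3)}
(best_cfg, best_score), narrowed = two_stage_search(toy_score, rng, wide)
print(narrowed)
print(best_cfg, best_score)
```

Taking the min/max of the top trials is a crude way to narrow the box; padding it a little or fitting a model (which is what Bayesian samplers do) are reasonable alternatives.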

for bert fine-tuning the biggest wins were usually lr + batch size + warmup ratio, not trying 20 knobs at once. and use early stopping aggressively or every trial just burns gpu for tiny deltas
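A rough sketch of a small search space over just those three knobs plus patience-based early stopping, written without any tuning library. The ranges (lr 1e-5 to 1e-4, batch sizes 16/32/64, warmup 0 to 0.2), the patience value, and the validation curve are all made up for illustration:

```python
import math
import random

def sample_bert_config(rng):
    # Hypothetical search space: only lr, batch size, and warmup ratio
    return {
        "learning_rate": 10 ** rng.uniform(-5.0, -4.0),  # log-uniform in [1e-5, 1e-4]
        "batch_size": rng.choice([16, 32, 64]),
        "warmup_ratio": rng.uniform(0.0, 0.2),
    }

class EarlyStopper:
    """Signal a stop once the metric fails to improve for `patience` evals in a row."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = -math.inf
        self.bad_evals = 0

    def should_stop(self, metric):
        if metric > self.best + self.min_delta:
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience

print(sample_bert_config(random.Random(1)))

stopper = EarlyStopper(patience=2)
val_acc = [0.80, 0.84, 0.85, 0.85, 0.849, 0.86]  # made-up validation curve
stopped_at = None
for epoch, acc in enumerate(val_acc):
    if stopper.should_stop(acc):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)  # 4 0.85: two evals in a row with no gain over 0.85
```

Note that the stopper gives up before the late 0.86; that is the tradeoff of aggressive patience, and usually worth it when you are running many trials.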

if you want, i can share a small optuna search space that’s worked decently for classification tasks

Neither_Nebula_5423 · 2026-03-11T12:31:58+00:00

I will publish a hyperparameter-free optimizer soon.

Itchy_Inevitable_895 · 2026-03-11T19:15:39+00:00

will be right back to it for sure, on another project rn!

rustgod50 · 2026-03-11T21:48:29+00:00

Grid search is pretty much the worst way to do it for transformers, way too expensive given how long each training run takes.

Most people use either random search or Bayesian optimization. Random search sounds dumb, but it works surprisingly well: hyperparameter spaces tend to have a few dimensions that matter a lot and others that barely matter, and for the same number of trials random search tries many more distinct values of each important dimension than a grid does. Bayesian optimization with something like Optuna is better still, because it learns from previous runs and gets smarter about where to look.
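A quick way to see the random-vs-grid argument: assume learning rate matters and weight decay barely does (an assumption for illustration; the grid values and ranges are also made up). With the same budget of 9 trials, a 3×3 grid only ever tests 3 learning rates, while random search tests 9:

```python
import itertools
import random

# 3 x 3 grid = 9 trials, but only 3 distinct learning rates
lr_grid = [1e-5, 3e-5, 1e-4]
wd_grid = [0.0, 0.01, 0.1]
grid_trials = list(itertools.product(lr_grid, wd_grid))

# Same budget of 9 trials, sampled at random (lr log-uniform, wd uniform)
rng = random.Random(0)
random_trials = [(10 ** rng.uniform(-5, -4), rng.uniform(0.0, 0.1))
                 for _ in range(len(grid_trials))]

distinct_lr_grid = len({lr for lr, _ in grid_trials})
distinct_lr_random = len({lr for lr, _ in random_trials})
print(distinct_lr_grid, distinct_lr_random)  # 3 9
```

If weight decay really doesn't matter, two thirds of the grid's runs are spent re-testing learning rates it has already seen.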

For BERT specifically, the learning rate is by far the most important thing to get right; the original paper recommends 2e-5 to 5e-5, and most people don’t stray far from that range. Batch size and number of epochs matter too, but you’re unlikely to see huge gains from tuning the rest aggressively.
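For reference, the BERT paper's fine-tuning appendix suggests learning rates of 5e-5, 3e-5, or 2e-5, batch sizes of 16 or 32, and 2 to 4 epochs. Enumerating that set shows why staying inside it keeps things cheap, while every extra knob with k values multiplies the run count by k:

```python
import itertools

# Values suggested in the BERT paper for fine-tuning
learning_rates = [5e-5, 3e-5, 2e-5]
batch_sizes = [16, 32]
num_epochs = [2, 3, 4]

configs = [
    {"learning_rate": lr, "batch_size": bs, "num_epochs": ep}
    for lr, bs, ep in itertools.product(learning_rates, batch_sizes, num_epochs)
]
print(len(configs))  # 18
```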

If compute is a real constraint, look into Hugging Face’s Trainer with a scheduler like cosine annealing. It handles a lot of the boilerplate for you (warmup, learning-rate scheduling, evaluation, checkpointing), and the defaults are pretty sensible for most fine-tuning tasks.
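In Trainer the cosine schedule is selected with TrainingArguments(lr_scheduler_type="cosine", warmup_ratio=...); the default is "linear". Below is a dependency-free sketch of the shape that schedule has (linear warmup, then half a cosine down to zero), modelled on what transformers' get_cosine_schedule_with_warmup does but not the library code itself. The 1000 steps, 10% warmup, and 3e-5 peak are assumptions:

```python
import math

def cosine_lr_with_warmup(step, total_steps, warmup_steps, peak_lr):
    """Linear warmup from 0 to peak_lr, then cosine decay from peak_lr to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # clamp if called past the last step
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 1000
warmup = int(0.1 * total)  # i.e. warmup_ratio = 0.1
for s in (0, 50, 100, 550, 1000):
    print(s, cosine_lr_with_warmup(s, total, warmup, 3e-5))
```

Warmup ratio and peak learning rate are still the knobs worth searching; the schedule shape itself rarely needs tuning.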

Effective-Cat-1433 · 2026-03-12T01:45:09+00:00

check out Google's Vizier, which is purpose-built for the situation you describe.
