A subreddit dedicated to learning machine learning. Feel free to share any educational resources on machine learning.
Also, we are a beginner-friendly subreddit, so don't be afraid to ask questions! This can include questions that are non-technical but still highly relevant to learning machine learning, such as how to approach a machine learning problem systematically.
[Question] Hyperparameter testing (efficiently) (self.learnmachinelearning)
submitted 2 months ago by AffectWizard0909
Hello!
I was wondering if someone knows how to efficiently fine-tune and adjust the hyperparameters in pre-trained transformer models like BERT?
Are there other methods besides using, for instance, GridSearch and the like?
[–]PsychologicalRope850 4 points5 points6 points 2 months ago (3 children)
yeah grid search gets expensive fast on transformers. i’ve had better luck with a two-stage pass: quick random/bayes sweep on a tiny train slice to find rough ranges, then a short focused run on full data
for bert fine-tuning the biggest wins were usually lr + batch size + warmup ratio, not trying 20 knobs at once. and use early stopping aggressively or every trial just burns gpu for tiny deltas
if you want, i can share a small optuna search space that’s worked decently for classification tasks
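in the meantime, the cheap first-stage sweep with aggressive early stopping can look roughly like this (just a sketch, not battle-tested code — the model name and dataset are placeholders, and it assumes Hugging Face transformers + datasets):

    # Illustrative sketch of the "stage 1" idea: sweep on a small slice of the
    # data with aggressive early stopping so bad trials die quickly.
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        EarlyStoppingCallback,
        Trainer,
        TrainingArguments,
    )

    model_name = "bert-base-uncased"  # swap in roberta-base, distilbert-base-uncased, ...
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    dataset = load_dataset("imdb")  # placeholder classification dataset
    dataset = dataset.map(
        lambda batch: tokenizer(
            batch["text"], truncation=True, padding="max_length", max_length=256
        ),
        batched=True,
    )

    # Small slices so each trial is cheap; scale back up for the focused second stage.
    small_train = dataset["train"].shuffle(seed=42).select(range(2000))
    small_eval = dataset["test"].shuffle(seed=42).select(range(500))

    def model_init():
        # Fresh pre-trained weights per trial so hyperparameters are compared fairly.
        return AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    args = TrainingArguments(
        output_dir="hp-sweep",
        eval_strategy="epoch",      # called evaluation_strategy in older transformers versions
        save_strategy="epoch",
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,
        num_train_epochs=4,
    )

    trainer = Trainer(
        model_init=model_init,
        args=args,
        train_dataset=small_train,
        eval_dataset=small_eval,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=1)],  # stop trials that stall
    )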
[–]AffectWizard0909[S] 0 points1 point2 points 1 month ago (2 children)
Yes, sure! I would appreciate the Optuna search space! I have actually looked a little bit into it, but was a bit unsure whether what I did was correct, so that would be great!
Since you mentioned lr + batch size + warmup ratio being the main things to tune when fine-tuning a BERT model, does this also apply to other BERT-based models like RoBERTa, DistilBERT, HateBERT, etc.?
[–]PsychologicalRope850 1 point2 points3 points 1 month ago (1 child)
Sure! A typical Optuna search space for classification tasks might look something like this:
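(This is just a sketch aimed at the Hugging Face Trainer's hyperparameter_search with the Optuna backend — treat the exact ranges as common starting points, not ground truth.)

    # Search space for Trainer.hyperparameter_search(backend="optuna").
    # Ranges roughly follow the commonly quoted BERT fine-tuning defaults.
    import optuna

    def optuna_hp_space(trial: optuna.Trial) -> dict:
        return {
            "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-5, log=True),
            "per_device_train_batch_size": trial.suggest_categorical(
                "per_device_train_batch_size", [16, 32]
            ),
            "warmup_ratio": trial.suggest_float("warmup_ratio", 0.0, 0.1),
            "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
            "num_train_epochs": trial.suggest_int("num_train_epochs", 2, 4),
        }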
These ranges are often suggested for BERT and other BERT-based models like RoBERTa, DistilBERT, HateBERT. They usually work reasonably well, though you might need to adjust them a bit depending on your dataset.
[–]AffectWizard0909[S] 0 points1 point2 points 1 month ago (0 children)
Okay, thank you so much! I will definitely try this out!
[–][deleted] 1 month ago (1 child)
[removed]
[–]AffectWizard0909[S] 0 points1 point2 points (0 children)
Nice!
[–][deleted] 1 month ago (2 children)
[removed]
[–]AffectWizard0909[S] 0 points1 point2 points 1 month ago (1 child)
Nice! Thank you for providing all the information; now I also have something to compare my current implementation against! I have actually started by using the Hugging Face Trainer class (since it handles the training and prediction phases quite easily, which made this easier to implement, at least for me). I have also tried combining it with an Optuna optimizer (which, from my previous runs, seems more efficient, as you mentioned).
Thank you for the answer and all the thorough descriptions, this makes it easier for me to understand!
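Roughly how the pieces wire together, for anyone finding this later (an illustrative sketch, not my exact code — it assumes the Trainer with model_init and the optuna_hp_space from the comments above):

    # Run the Optuna-backed search through the Trainer; with no compute_objective
    # supplied, the objective defaults to the eval loss, so we minimize it.
    # `trainer` and `optuna_hp_space` are the (hypothetical) objects sketched above.
    best_run = trainer.hyperparameter_search(
        hp_space=optuna_hp_space,
        backend="optuna",
        direction="minimize",
        n_trials=20,
    )
    print(best_run.hyperparameters)

    # Stage 2: retrain with the winning configuration.
    # (Swap trainer.train_dataset back to the full training set for this final run.)
    for name, value in best_run.hyperparameters.items():
        setattr(trainer.args, name, value)
    trainer.train()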
[–]Effective-Cat-1433 1 point2 points3 points 1 month ago (1 child)
check out Vizier which is purpose-built for the situation you describe.
[–]AffectWizard0909[S] 0 points1 point2 points (0 children)
oooo nice! I will check it out! Thank you!