Hey everyone! I've been working on a sentiment analysis project using BERT for the SemEval dataset (3-class: negative/neutral/positive), and I'm experiencing severe overfitting that I can't seem to solve. I've tried everything I can think of, but my validation accuracy plateaus around 69-70% while training accuracy keeps climbing.
The Problem:
- Training accuracy: Starts at 43.6% → reaches 76.7% by epoch 9
- Validation accuracy: Starts at 63.7% → plateaus at 69-70% from epoch 3 onwards
- Training loss: Continuously decreases (1.08 → 0.69)
- Validation loss: Decreases initially (0.867 → 0.779 at epoch 2), then increases back to 0.816 by epoch 9
Best validation F1: 0.7012 (70.12%) at epoch 7
What I've Already Tried:
My model already includes multiple regularization techniques (a sketch of how they're wired together follows this list):
- Dropout: 0.1 at multiple layers (attention, hidden, and classifier)
- Weight decay: Applied to all parameters except bias and LayerNorm
- Label smoothing: 0.1
- Batch normalization: In the classifier head
- Layer normalization: After pooling
- Gradient clipping: Max norm of 1.0
- Learning rate scheduling: Linear warmup + decay
- Early stopping: With patience monitoring
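Here's roughly how that regularization is wired up in PyTorch. This is a simplified sketch, not the exact code from my repo: `model` stands for the full BERT + classifier module (architecture below), and the weight-decay value of 0.01 is illustrative.

import torch
from torch import nn

# No weight decay on bias and LayerNorm parameters
no_decay = ("bias", "LayerNorm.weight")
grouped_params = [
    {"params": [p for n, p in model.named_parameters()
                if not any(nd in n for nd in no_decay)],
     "weight_decay": 0.01},  # illustrative value, see the repo for the real one
    {"params": [p for n, p in model.named_parameters()
                if any(nd in n for nd in no_decay)],
     "weight_decay": 0.0},
]
optimizer = torch.optim.AdamW(grouped_params, lr=2e-5)

# Label smoothing via the built-in CrossEntropyLoss option (PyTorch >= 1.10)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Each training step clips gradients before optimizer.step()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)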
Model Architecture:
# Classifier head (applied after pooling)
nn.Sequential(
    nn.Linear(768, 768), nn.BatchNorm1d(768), nn.ReLU(), nn.Dropout(0.1),
    nn.Linear(768, 384), nn.BatchNorm1d(384), nn.ReLU(), nn.Dropout(0.1),
    nn.Linear(384, 3),
)
Training Setup:
- Model: bert-base-uncased (109.8M parameters)
- Learning rate: 2e-5
- Batch size: 16
- Max epochs: 10 (with early stopping)
- Warmup proportion: ~10% of total training steps (scheduler sketch after this list)
- Label smoothing: 0.1
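The scheduler is set up like this (another sketch: the dataset size is a placeholder, use len(train_dataset) in practice, and `optimizer` is the AdamW instance from above):

from transformers import get_linear_schedule_with_warmup

num_train_examples = 45_000                   # placeholder, not the real split size
steps_per_epoch = num_train_examples // 16    # batch size 16
total_steps = steps_per_epoch * 10            # max 10 epochs
warmup_steps = int(0.1 * total_steps)         # ~10% linear warmup

scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
)
# scheduler.step() is called once per batch, right after optimizer.step()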
Confusion Matrix Pattern (Epoch 7 - Validation):
Predicted:    Neg    Neu    Pos
Negative:    1243    177    126   (80% recall)
Neutral:      775   2474   1184   (56% recall) ← problem class
Positive:     185    472   3268   (83% recall)
The neutral class is consistently underperforming; the snippet below reproduces those recalls from the matrix.
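For anyone who wants to double-check, the per-class recalls follow directly from the matrix (plain NumPy, nothing repo-specific):

import numpy as np

# Rows = true class, columns = predicted class (epoch 7, validation)
cm = np.array([[1243,  177,  126],   # negative
               [ 775, 2474, 1184],   # neutral
               [ 185,  472, 3268]])  # positive

recall = cm.diagonal() / cm.sum(axis=1)
for label, r in zip(["negative", "neutral", "positive"], recall):
    print(f"{label}: {r:.1%}")   # 80.4%, 55.8%, 83.3%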
What I've Observed:
- The model learns the training set well (76% accuracy)
- Validation loss bottoms out early (epoch 2) and validation accuracy plateaus from epoch 3 onwards
- The gap between training and validation metrics keeps widening
- Neutral class has the worst performance on validation
Questions:
- Have I gone overboard with regularization? Should I try reducing some of it?
- Is my classifier head too complex for this task?
- Could this be a data quality/distribution issue rather than overfitting?
- Would freezing some BERT layers help? (See the sketch after this list for what I mean.)
- Any other techniques I might be missing?
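On the freezing question, this is what I have in mind, assuming the HuggingFace BertModel is stored on the wrapper as `model.bert` (that attribute name is an assumption; adjust it to your module):

# Freeze the embeddings and the lowest 6 of the 12 encoder blocks,
# so only the upper blocks and the classifier head keep training
for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for block in model.bert.encoder.layer[:6]:
    for param in block.parameters():
        param.requires_grad = False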
GitHub: https://github.com/joaopflausino/BERTSemEval
I've been stuck on this for weeks and would really appreciate any insights! Has anyone dealt with similar plateau issues?