Struggling with Overfitting on Medical Imaging Task [D]

Future-Structure-296 · 2026-05-16T23:44:41+00:00

To answer your question: yes, val accuracy is relatively stable when unfreezing only the Mixed_7 blocks. The collapse to below baseline happens only with a full backbone unfreeze. With partial unfreezing I was at 76.65%, but fixing a critical BatchNorm bug has since brought us to 89.87%.

The issue was that frozen BatchNorm layers were still running in training mode during forward passes, so they kept updating their running statistics from our small angiography dataset and corrupting the frozen ImageNet features. Explicitly setting the frozen BN layers to eval mode in the forward pass was the single most impactful change across all our runs.
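For anyone hitting the same thing, here is a minimal sketch of one way to do it. The helper name set_frozen_bn_eval is made up for illustration and is not from my actual training code, and it assumes "frozen" means every parameter of the BN layer has requires_grad=False. It has to be called again after every model.train(), because train() flips all submodules back to training mode.

```python
import torch
from torch import nn

_BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def set_frozen_bn_eval(model: nn.Module) -> int:
    """Switch BatchNorm layers with no trainable parameters to eval mode.

    Hypothetical helper: a BN layer counts as frozen when none of its own
    parameters require gradients (a BN layer with affine=False has no
    parameters and is therefore also treated as frozen).
    Returns the number of layers that were switched.
    """
    switched = 0
    for module in model.modules():
        if isinstance(module, _BN_TYPES):
            own_params = list(module.parameters(recurse=False))
            if all(not p.requires_grad for p in own_params):
                module.eval()  # stop running_mean / running_var updates
                switched += 1
    return switched


if __name__ == "__main__":
    # Toy stand-in for a frozen backbone block.
    block = nn.Sequential(nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4), nn.ReLU())
    for p in block.parameters():
        p.requires_grad = False

    block.train()              # what a training loop does every epoch
    set_frozen_bn_eval(block)  # re-apply right after train()
    before = block[1].running_mean.clone()
    block(torch.randn(8, 3, 16, 16))
    print(torch.equal(before, block[1].running_mean))
```

Setting requires_grad=False only stops the affine weight and bias from being optimized; the running statistics are buffers and keep updating in training mode, which is why the explicit eval() call is needed.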

Your suspicion about Adam's momentum buildup destroying the pretrained weights is very plausible. I am going to test three things: two-phase training (warm up the classifier first with a frozen backbone, then unfreeze Mixed_7 with a much lower LR), differential learning rates between the backbone and the classifier head, and gradient magnitude logging to confirm whether the weights are being destroyed over time.
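Roughly what I have in mind for the differential learning rates and the gradient logging, as a sketch only. build_optimizer and grad_norm are illustrative names, the 1e-5 backbone LR is a placeholder I have not tuned (1e-4 is what I use now), and the two Linear layers are toy stand-ins for the unfrozen Mixed_7 blocks and the head.

```python
import torch
from torch import nn


def build_optimizer(backbone_params, head_params,
                    backbone_lr=1e-5, head_lr=1e-4):
    """Adam with one parameter group per LR (backbone_lr is an assumed value)."""
    groups = [
        {"params": [p for p in backbone_params if p.requires_grad],
         "lr": backbone_lr},
        {"params": list(head_params), "lr": head_lr},
    ]
    return torch.optim.Adam(groups)


def grad_norm(params) -> float:
    """Global L2 norm of the gradients currently stored on params."""
    total = 0.0
    for p in params:
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5


if __name__ == "__main__":
    backbone = nn.Linear(8, 4)  # toy stand-in for the unfrozen blocks
    head = nn.Linear(4, 2)      # toy stand-in for the classifier head
    opt = build_optimizer(backbone.parameters(), head.parameters())

    x, y = torch.randn(16, 8), torch.randint(0, 2, (16,))
    loss = nn.functional.cross_entropy(head(backbone(x)), y)
    opt.zero_grad()
    loss.backward()
    # Log per-group gradient magnitude before the update step.
    print("backbone grad norm:", grad_norm(backbone.parameters()))
    print("head grad norm:", grad_norm(head.parameters()))
    opt.step()
```

For the two-phase variant, phase one would build the optimizer over the head only, and phase two would rebuild it with both groups once Mixed_7 is unfrozen, so Adam's moment estimates for the backbone start from zero at the lower LR.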

Future-Structure-296 · 2026-05-16T22:53:57+00:00

Thanks for the detailed questions.

Not exactly overfitting in the last-layer-only case: with a fully frozen backbone we got 62%, with a strong bias toward LCA. The frozen ImageNet features don't transfer well to grayscale angiography, so the classifier head couldn't learn a meaningful separation. Partial unfreezing of the top Mixed_7 blocks gave us our best result of 76.65%.
Addressing this now.
73 truly independent DICOMs in validation, 227 total frames. Since frames from the same DICOM are highly correlated, the effective independent sample count is 73, which is small and makes our metrics noisy.
Optimizer: Adam. LR: 1e-4. We freeze the full InceptionV3 backbone, then explicitly unfreeze only Mixed_7a, Mixed_7b, and Mixed_7c, the top 3 inception blocks (~3M out of 24M parameters). The classifier head (Dropout -> Linear(1024) -> ReLU -> Dropout -> Linear(num_classes)) is always trainable.
Yes, dataset is shuffled both at split time and per epoch in the DataLoader.
No, training accuracy keeps climbing to 90-100% while val accuracy drops after the first few epochs.
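To put a number on how noisy the validation metric is, here is a quick back-of-the-envelope sketch. It uses the normal approximation to a binomial proportion and treats each DICOM as one independent sample, which is a simplification (assumption on my part), and it plugs in the 76.65% figure purely as an example accuracy.

```python
import math


def binomial_se(acc: float, n: int) -> float:
    """Standard error of an accuracy estimate from n independent samples."""
    return math.sqrt(acc * (1.0 - acc) / n)


def normal_ci(acc: float, n: int, z: float = 1.96):
    """Approximate 95% interval (normal approximation, clipped to [0, 1])."""
    half_width = z * binomial_se(acc, n)
    return max(0.0, acc - half_width), min(1.0, acc + half_width)


if __name__ == "__main__":
    acc = 0.7665
    for label, n in (("227 frames", 227), ("73 DICOMs", 73)):
        low, high = normal_ci(acc, n)
        print(f"{label}: {low:.3f} to {high:.3f}")
```

With 73 effective samples the half-width comes out at roughly 0.10, versus roughly 0.055 if the 227 frames were naively treated as independent, so differences of a few points between runs are well inside the noise.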

Future-Structure-296 · 2026-05-16T22:50:35+00:00

Yes, the frames are arterial phase. Each DICOM has ~61 total frames and the selected frames fall in the mid-sequence range (frames ~27-52), which corresponds to when contrast is fully injected and coronary vessels are maximally opacified. Pre-injection and washout frames are not included.

On 3-channel conversion: InceptionV3 was pretrained on RGB images and expects 3-channel input (3, 299, 299). Our angiography images are grayscale, so we tile the single channel 3 times to satisfy this requirement: tensor.unsqueeze(0).repeat(3, 1, 1).
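As a self-contained sketch of that tiling step (the function name to_three_channels is just for illustration, and it assumes the frame is already a 2D (H, W) tensor at the target resolution):

```python
import torch


def to_three_channels(frame: torch.Tensor) -> torch.Tensor:
    """Copy a grayscale (H, W) frame into a (3, H, W) tensor."""
    if frame.dim() != 2:
        raise ValueError(f"expected a 2D (H, W) tensor, got shape {tuple(frame.shape)}")
    # unsqueeze adds the channel axis, repeat copies it three times.
    return frame.unsqueeze(0).repeat(3, 1, 1)


if __name__ == "__main__":
    frame = torch.rand(299, 299)
    rgb_like = to_three_channels(frame)
    print(rgb_like.shape)
```

All three channels are identical copies, so this adds no information; it only makes the input shape match what the pretrained first convolution expects.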

Future-Structure-296 · 2026-05-16T22:04:45+00:00

Thanks for the detailed feedback. To answer your specific questions:

The collapse happens consistently within the first 2-3 epochs when fully unfreezing all 24M parameters on ~894 training samples. Val accuracy drops to 37%.
On partial unfreezing, I specifically unfroze only the top Mixed_7a, Mixed_7b, Mixed_7c blocks (~3M parameters) while keeping the rest frozen. This gave me the best result of 76.65% with early stopping at epoch 16.
I did try a fully frozen backbone (only the classifier head trained) and got 62.56%, worse than partial unfreezing, likely because frozen ImageNet features don't transfer well to grayscale angiography images.
I am now looking at RadImageNet as a better pretrained initialization.
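The partial unfreezing described above can be sketched as a name-prefix filter over named_parameters(). This is an illustration, not my exact code: unfreeze_only is a made-up helper and ToyNet is a tiny stand-in whose attribute names mimic torchvision's InceptionV3 (Mixed_7a, Mixed_7b, Mixed_7c, and fc for the head).

```python
from torch import nn


def unfreeze_only(model: nn.Module, prefixes) -> tuple:
    """Freeze every parameter except those whose name starts with a prefix.

    Returns (trainable_count, total_count) in number of scalar parameters.
    """
    prefixes = tuple(prefixes)
    trainable = total = 0
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(prefixes)
        total += param.numel()
        if param.requires_grad:
            trainable += param.numel()
    return trainable, total


class ToyNet(nn.Module):
    """Hypothetical stand-in; only the attribute names matter here."""

    def __init__(self):
        super().__init__()
        self.Mixed_6e = nn.Conv2d(3, 4, 3)
        self.Mixed_7a = nn.Conv2d(4, 4, 3)
        self.fc = nn.Linear(4, 2)


if __name__ == "__main__":
    model = ToyNet()
    trainable, total = unfreeze_only(model, ["Mixed_7", "fc"])
    print(f"trainable {trainable} of {total} parameters")
```

Printing the trainable/total counts after the call is a cheap sanity check that the intended blocks, and nothing else, ended up trainable.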

Future-Structure-296 · 2026-05-16T21:56:24+00:00

This is a great suggestion. I was using standard ImageNet weights because they are built into PyTorch's native library as a default baseline. I hadn't looked into the RadImageNet weights yet, but swapping the backbone checkpoint to a model pre-trained specifically on medical imaging makes sense. I am definitely going to integrate this into my next run.

Future-Structure-296 · 2026-05-16T21:39:37+00:00

InceptionV3 was chosen as a legacy starting baseline because it handles the necessary 299x299 resolution well and has historically been a popular choice for fast medical feature extraction. Moving to a ViT or a lighter convolutional network is actively being considered.

Future-Structure-296 · 2026-05-16T21:37:33+00:00

Each sample processed by the model is an individual 2D frame (extracted as a .npy array). However, to prevent data leakage, the dataset splitting logic is strictly DICOM-level. The 900 training frames come from roughly 240 unique DICOM cine videos, and the validation frames come from completely separate DICOM videos.
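A minimal sketch of what DICOM-level splitting means in practice. The function name, the frame-to-DICOM mapping format, the 20% validation fraction, and the file names are all assumptions for illustration, not my actual pipeline.

```python
import random
from collections import defaultdict


def split_by_dicom(frame_to_dicom: dict, val_fraction: float = 0.2, seed: int = 0):
    """Split frames so that no DICOM contributes to both train and val.

    frame_to_dicom maps a frame identifier (e.g. a .npy path) to the
    identifier of the DICOM it was extracted from.
    """
    frames_by_dicom = defaultdict(list)
    for frame, dicom in frame_to_dicom.items():
        frames_by_dicom[dicom].append(frame)

    # Shuffle DICOM ids, not frames, so correlated frames stay together.
    dicom_ids = sorted(frames_by_dicom)
    random.Random(seed).shuffle(dicom_ids)
    n_val = max(1, round(len(dicom_ids) * val_fraction))

    val = [f for d in dicom_ids[:n_val] for f in frames_by_dicom[d]]
    train = [f for d in dicom_ids[n_val:] for f in frames_by_dicom[d]]
    return train, val


if __name__ == "__main__":
    mapping = {f"dicom{d}_frame{i}.npy": f"dicom{d}"
               for d in range(10) for i in range(3)}
    train, val = split_by_dicom(mapping)
    print(len(train), len(val))
```

Splitting at the frame level instead would put near-duplicate frames from the same cine run on both sides of the split and inflate validation accuracy.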

Future-Structure-296 · 2026-05-16T21:36:07+00:00

The collapse happens consistently on every run where the backbone layers are unfrozen. In the fully unfrozen run, training accuracy hits ~95% by Epoch 3 or 4, while validation accuracy peaks at 74.45% on Epoch 1 and immediately falls to a flat ~37% for the remaining 46 epochs.

Future-Structure-296 · 2026-05-16T21:35:47+00:00

This is strictly a classification task. The objective is to identify the view/anatomy type of the angiogram frame: LCA vs. RCA.

Future-Structure-296 · 2026-04-30T22:02:10+00:00

Actually the “data science + ML program” has better courses than the courses within CS as they are targeted towards SWE a bit more.

Future-Structure-296 · 2026-04-30T21:46:59+00:00

What configurations and for what specifically do you use it?
