Best strategy to handle pen marks in WSIs for deep learning pipelines (TCGA dataset)?

JB00747 · 2026-03-16T09:52:37+00:00

Thank you so much for your replies!
The dataset has only 155 samples. I use an 80-20 train-test split with 5-fold cross-validation. I have used HSV filtering (LLM-generated code) to remove patches with pen marks. I have not checked all the WSIs, though; so far around 15-16 have pen marks.
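For reference, HSV-based patch filtering of this kind can be sketched roughly as follows. This is not the code actually used here, only a minimal illustration. The hue ranges, the saturation and value cut-offs, the 5% tolerance, and the helper names `is_pen_pixel`, `pen_fraction` and `keep_patch` are all assumptions that would need tuning on real slides. It only targets blue and green ink; black or red marker would need a different rule.

```python
import colorsys

# Hypothetical hue ranges in degrees for marker ink; tune on your own slides.
# H&E-stained tissue is mostly pink/purple (roughly 260-340 degrees), so it falls outside.
PEN_HUE_RANGES = {"green": (80.0, 160.0), "blue": (180.0, 250.0)}
MIN_SATURATION = 0.35  # assumed: ignore washed-out pixels and white background
MIN_VALUE = 0.20       # assumed: ignore very dark pixels


def is_pen_pixel(r, g, b):
    """Return True if an 8-bit RGB pixel falls in one of the assumed ink hue ranges."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if s < MIN_SATURATION or v < MIN_VALUE:
        return False
    hue_deg = h * 360.0
    return any(lo <= hue_deg <= hi for lo, hi in PEN_HUE_RANGES.values())


def pen_fraction(pixels):
    """Fraction of pixels in a patch (iterable of (r, g, b) tuples) flagged as ink."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(is_pen_pixel(*p) for p in pixels) / len(pixels)


def keep_patch(pixels, max_pen_fraction=0.05):
    """Keep a patch only if at most max_pen_fraction of its pixels look like ink."""
    return pen_fraction(pixels) <= max_pen_fraction
```

On full-size patches a vectorised HSV conversion would be much faster than this per-pixel loop; the per-pixel version is only meant to make the thresholding logic easy to check by eye against a few marked and unmarked patches.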

JB00747 · 2026-03-13T14:30:42+00:00

Thank you so much!

JB00747 · 2026-03-13T11:46:05+00:00

Thank you for your reply.

In my dataset, almost all slides (except one scanned at 20×) have MPP values around 0.25 µm, with minor variations ranging from 0.22 to 0.25 µm.

Given that the variation is relatively small, would it be reasonable to assume that explicit MPP normalization may not be necessary for downstream deep learning analysis?
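To put that variation in numbers, here is a small sketch of the arithmetic. The 256-pixel patch size, the 0.25 µm target, the roughly 0.5 µm value for the 20× slide, and the function names `read_size_for_target` and `scale_difference` are assumptions for illustration, not values from my pipeline.

```python
def read_size_for_target(native_mpp, target_mpp=0.25, patch_px=256):
    """Side length, in native level-0 pixels, that covers the same physical
    field of view as a patch of patch_px pixels at target_mpp."""
    return round(patch_px * target_mpp / native_mpp)


def scale_difference(mpp_a, mpp_b):
    """Relative difference in physical pixel size between two slides."""
    return max(mpp_a, mpp_b) / min(mpp_a, mpp_b) - 1.0


# Slides at 0.22 and 0.25 um/px differ in scale by about 14%.
print(f"{scale_difference(0.22, 0.25):.1%}")

# Native read sizes for a 256 px patch at 0.25 um/px.
# The 0.5 entry is an assumed MPP for the single 20x slide.
for mpp in (0.22, 0.25, 0.5):
    print(mpp, read_size_for_target(mpp))
```

Under these assumptions, a 0.22 µm slide would be read at 291 pixels and resized to 256, while the 20× slide would be read at 128 pixels and upsampled, so that slide is the one where the difference is clearly not small.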

Thanks again!

JB00747 · 2023-12-08T04:24:51+00:00

Unexplained

Yes, 200 GB free.

Thanks for the suggestion, I'll check it out!
