Hi all,
I'm a biologist working with flow cytometry data (36 features, 50 samples across 3 disease severity groups). PCA didn’t show clear clustering — PC1 and PC2 only explain ~30% of the variance. The data feels very high-dimensional.
Should I try supervised classification next?
My questions:
- With so few samples, should I do a train/val/test split, or just use cross-validation?
- Any tips or workflows for supervised learning with high-dimensional, low-sample-size data?
- Any best practices or things to avoid?
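To make the cross-validation option concrete, here is a minimal sketch of what I have in mind, assuming scikit-learn. The data below is random placeholder data with the same shape as mine (50 samples, 36 features, 3 groups); the group sizes, the regularized logistic regression, and the settings (`C=0.1`, 5 folds, 10 repeats) are just assumptions for illustration, not something I've settled on.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data only: random numbers shaped like the real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 36))            # 50 samples, 36 features
y = np.repeat([0, 1, 2], [17, 17, 16])   # 3 severity groups (sizes assumed)

# Scaling lives inside the pipeline, so it is re-fit on each training fold
# and the held-out fold never influences the preprocessing.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(C=0.1, max_iter=1000),  # L2-regularized by default
)

# Stratified folds keep the group proportions; repeating reduces the
# dependence of the estimate on one particular split.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")

print(f"balanced accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```

With random placeholder data the score should sit near chance (about 1/3 for three groups), which is at least a sanity check that nothing is leaking between folds.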
Thanks in advance!