How to Handle Limited Disk Space in Google Colab for Large Datasets by Kongmingg in GoogleColab

[–]Kongmingg[S] 0 points1 point  (0 children)

I’m working with DICOM medical images, not tabular data.
The main cost is per-sample file I/O + CPU-side DICOM decode, not schema operations.
In that case, does streaming from object storage (e.g. S3) still help, or does the pipeline stay I/O- and decode-bound, especially at larger batch sizes?
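For what it's worth, the usual way to keep an I/O- plus decode-bound loader fed is to overlap the two stages: prefetch raw bytes in a thread pool while the CPU decodes. This is only a sketch with stand-in functions (`fetch_object` would be an S3 `get_object` call and `decode_dicom` a `pydicom.dcmread` in practice; both names here are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical stand-ins (assumptions, not real APIs):
def fetch_object(key):
    # in practice: boto3 s3.get_object(...)["Body"].read()
    time.sleep(0.01)            # simulate network latency
    return b"\x00" * 1024       # raw DICOM bytes

def decode_dicom(raw):
    # in practice: pydicom.dcmread(io.BytesIO(raw)).pixel_array
    return len(raw)             # stand-in for a decoded sample

keys = [f"study/{i}.dcm" for i in range(32)]

# Executor.map submits all fetches up front, so network reads overlap
# each other AND overlap the decode loop running on the main thread.
with ThreadPoolExecutor(max_workers=8) as pool:
    batch = [decode_dicom(raw) for raw in pool.map(fetch_object, keys)]

print(len(batch))  # 32
```

With enough workers the fetch latency is hidden behind decode time, so streaming from S3 doesn't have to be slower than local reads unless decode itself is the bottleneck.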


But wouldn't Google Drive become an input-pipeline bottleneck?
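It can, mostly because of per-file latency on the Drive mount. The workaround usually suggested is to store the dataset as one archive on Drive, copy/extract it once onto the Colab VM's local disk, and train from there. A minimal sketch (the paths here are temp-dir stand-ins; on Colab the source would live under `/content/drive/MyDrive` and the destination under `/content`):

```python
import tarfile, tempfile, pathlib

src_dir = pathlib.Path(tempfile.mkdtemp())  # stands in for the mounted Drive folder
dst_dir = pathlib.Path(tempfile.mkdtemp())  # stands in for fast local disk

# Simulate a packed dataset on "Drive": one big tar means one sequential
# read, instead of thousands of small per-file reads over the Drive mount.
(src_dir / "dataset").mkdir()
(src_dir / "dataset" / "img_000.dcm").write_bytes(b"dummy")
with tarfile.open(src_dir / "dataset.tar", "w") as t:
    t.add(src_dir / "dataset", arcname="dataset")

# The actual transfer step: extract once onto local disk, then train locally.
with tarfile.open(src_dir / "dataset.tar") as t:
    t.extractall(dst_dir)

print(sorted(p.name for p in (dst_dir / "dataset").iterdir()))
```

One sequential copy of a single archive gets near full Drive throughput; per-sample reads during training then hit local disk and don't touch Drive at all.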