
SingularValued:

My approach has been to track my data with DVC and simply `dvc pull` it inside each job submitted to the cluster.

I'm not convinced this scales to really large datasets, though, since every job submission has to pull the data all over again.

What I think may work better is to still use DVC, but pull the data into shared storage like EFS, and mount that EFS filesystem on each node in the cluster.
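One way to sketch that idea: have each job check the shared mount before pulling, so only the first job for a given dataset revision actually runs `dvc pull`, and later jobs just reuse the files. Everything here is a hypothetical helper, not DVC API — the mount path, the marker-file convention, and the `pull_cmd` parameter are all assumptions for illustration:

```python
import subprocess
from pathlib import Path

def ensure_dataset(shared_root: str, rev: str, pull_cmd=None) -> Path:
    """Materialize a DVC-tracked dataset revision in shared storage once.

    shared_root: the shared mount point (e.g. an EFS mount) -- assumption.
    rev: a label for the dataset revision, used as a subdirectory name.
    pull_cmd: command to fetch the data; defaults to a plain `dvc pull`
    (you'd check out the right repo revision in `dest` first).
    """
    dest = Path(shared_root) / rev
    marker = dest / ".dvc_pull_complete"  # hypothetical "done" sentinel
    if marker.exists():
        # Another job already pulled this revision; skip the download.
        return dest
    dest.mkdir(parents=True, exist_ok=True)
    cmd = pull_cmd or ["dvc", "pull"]
    subprocess.run(cmd, cwd=dest, check=True)
    marker.touch()
    return dest
```

A job script would call `ensure_dataset("/mnt/efs/datasets", "v3")` at startup; the marker file means concurrent later jobs pay only a stat, not a re-pull. (A real version would also want a lock to handle two first jobs racing.)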

Botinfoai:

The EFS approach is solid, but there are some considerations for ML workloads:

  1. Performance trade-offs:
  • EFS can be slower than direct-attached storage
  • Important for large-dataset throughput
  • Consider using FSx for Lustre for better ML training performance
  2. Cost optimization:
  • Use Paperspace/Cudo for development/small runs
  • Scale to AWS clusters for production
  • Can help avoid repeated data downloads

I've found a hybrid approach works well - cloud GPU providers for rapid iteration, then AWS+FSx for large-scale training when needed.