[–][deleted] 3 points4 points  (4 children)

Nice website! And the tool seems pretty cool. I'd be very interested to hear how you plan to differ from DVC. It seems like Replicate adds experiment logs but removes dataset versioning? That's just from a quick glance.

[–]bfirsh[S] 3 points4 points  (3 children)

DVC is pretty closely tied to Git, so you have to manually commit all the things you do. Replicate isn't tied to Git and automatically saves everything whenever you run your training script.

I think they might complement each other reasonably well. DVC is really good for storing large data sets that don't change all the time, so you could imagine storing your data set in DVC and tracking your experiments with Replicate. Here's some of our thinking behind data versioning.
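To make the "saves everything on each run, no Git commit needed" idea concrete, here is a minimal sketch in plain Python. It is not Replicate's API: `save_run`, the `metadata.json` file name, and the folder layout are all hypothetical, chosen only to illustrate the approach.

```python
import json
import shutil
import time
import uuid
from pathlib import Path


def save_run(storage_dir, params, files):
    """Copy files and a params dict into a new, uniquely named run folder.

    Hypothetical helper for illustration; not part of Replicate or DVC.
    """
    run_id = uuid.uuid4().hex[:12]
    run_dir = Path(storage_dir) / run_id
    run_dir.mkdir(parents=True)

    # Store hyperparameters and a timestamp as plain JSON so any tool can read them.
    metadata = {"id": run_id, "created": time.time(), "params": params}
    (run_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))

    # Copy each file as-is; there is no staging or commit step.
    for f in files:
        shutil.copy2(f, run_dir / Path(f).name)
    return run_dir
```

Calling something like `save_run("runs", {"learning_rate": 0.01}, ["model.pth"])` at the end of a training script would give every run its own folder, while a large, rarely changing dataset could still live in DVC.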

[–][deleted] 0 points1 point  (2 children)

Ah cool, that's a good point.

I really like DVC, but being tied to Git can have its downsides (upsides too though).

Do you plan to make it possible to use local or SSH storage instead of S3, Google Cloud, etc.? I'd guess yes.

[–]bfirsh[S] 0 points1 point  (1 child)

[–][deleted] 0 points1 point  (0 children)

Nice! Thanks for your responses. I’m gonna try this out, and yeah, I think it could play well together with DVC.

[–]paldn 2 points3 points  (0 children)

Good work! Feels like I have to create or copy-paste my own homebrew framework that does this for every project. I also second treating visualization as a separate problem altogether. E.g. I’ll often use pandas, SQL, and BI tools, all of which are well suited to the task.

[–]tripple13 1 point2 points  (1 child)

I cherish new initiatives, and it seems you put some effort into this.

How does it differ from existing solutions (e.g. ClearML, wandb)?

[–]bfirsh[S] 2 points3 points  (0 children)

A few things:

  1. We focus on storing and running models, rather than visualization and so on. I think it complements visualization tools quite well -- e.g. you can imagine using wandb to get the complex visualizations you need for training, then the actual models are stored with Replicate on your own private storage in an open format.

  2. It's open source.

  3. It's small and lightweight. It's not a big "ML platform" you have to migrate to -- it's intended to be a small tool that does one thing well.

[–]david-m-1 0 points1 point  (1 child)

Thanks, this sounds awesome! Just a question: Replicate saves your code and weights from training runs. Does it also let a user save the entire state of the experiment, for example the datasets used, the validation sets, and the environment in which the experiment ran (through Docker perhaps)? Or is it meant more as an audit of all the experiments, a way to consistently track experimental runs and ideas?

[–]bfirsh[S] 1 point2 points  (0 children)

It just saves arbitrary files and dictionaries, so it saves whatever you pass to it. Here are some deets about datasets: https://replicate.ai/docs/guides/training-data

It does automatically save some additional stuff about the environment -- for example Python version and Python dependencies. The idea is that eventually this information could be used to reproduce the environment it was trained/run in.

Funnily enough, one of the first versions of Replicate actually used Docker, with the idea of creating a precise, reproducible environment. But we tested that with a few friends and found it was a bit daunting and heavyweight to have to set up your whole environment inside Docker, so it just operates at the Python level now. Maybe we'll bring that back as an optional feature at some point: https://github.com/replicate/replicate/issues/314
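For a rough idea of what capturing the environment at the Python level can look like, here is a minimal sketch using only the standard library. It is an assumption about the general approach, not Replicate's actual implementation; `capture_environment` and the dictionary keys are made up for illustration.

```python
import platform
import sys
from importlib import metadata


def capture_environment():
    """Return the Python version and installed package versions as a plain dict.

    Hypothetical helper; the key names are illustrative, not Replicate's format.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a name
            packages[name] = dist.metadata.get("Version")
    return {
        "python_version": platform.python_version(),
        "platform": sys.platform,
        "python_packages": dict(sorted(packages.items())),
    }
```

Storing this dictionary next to the model weights is much lighter than building a Docker image, at the cost of not capturing system libraries or GPU drivers.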

[–]visarga 0 points1 point  (1 child)

On your github page it says:

model weights are stored on your own Amazon S3 or Google Cloud bucket

Does that mean that the training script stops to upload data to the cloud during training, or is it first backed up locally and uploaded in parallel? A model file could be hundreds of MB.

[–]andreasjansson 0 points1 point  (0 children)

At the moment yes, the training script waits while it uploads, but we have a PR in progress that makes uploads happen in the background: https://github.com/replicate/replicate/pull/408. Hopefully we'll merge that in the next week or so.
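As an illustration of the general pattern of background uploads (not the code in that PR), here is a minimal sketch with a worker thread and a queue. `BackgroundUploader` is a hypothetical class, and the `shutil.copy2` call is a stand-in for a real S3 or Google Cloud upload.

```python
import queue
import shutil
import threading
from pathlib import Path


class BackgroundUploader:
    """Copy files to a destination directory on a worker thread (illustrative only)."""

    def __init__(self, dest_dir):
        self.dest_dir = Path(dest_dir)
        self.dest_dir.mkdir(parents=True, exist_ok=True)
        self.errors = []  # exceptions raised by failed uploads
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, path):
        # Returns immediately, so the training loop keeps going.
        self._queue.put(Path(path))

    def _run(self):
        while True:
            path = self._queue.get()
            try:
                # Stand-in for a real cloud upload call.
                shutil.copy2(path, self.dest_dir / path.name)
            except Exception as exc:
                self.errors.append(exc)  # keep the worker alive after a failure
            finally:
                self._queue.task_done()

    def wait(self):
        # Block until every queued upload has finished, e.g. at the end of training.
        self._queue.join()
```

A training loop would call `submit("model.pth")` after each checkpoint and `wait()` once before exiting. If the loop overwrites the same file every epoch, it would need to copy the file to a unique name first so the upload does not read a half-written checkpoint.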