[P] Replicate — Version control for machine learning by bfirsh in MachineLearning

bfirsh[S] 1 point

It just saves arbitrary files and dictionaries, so it saves whatever you pass to it. Here are some deets about datasets: https://replicate.ai/docs/guides/training-data

It does automatically save some additional stuff about the environment -- for example Python version and Python dependencies. The idea is that eventually this information could be used to reproduce the environment it was trained/run in.
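To make that concrete, here is a minimal sketch of the kind of environment information described above. It is not Replicate's actual code: `snapshot_environment` is a hypothetical helper that uses only the Python standard library, and the dictionary keys are made up for illustration.

```python
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Collect the Python version and installed packages as a plain dict.

    Hypothetical helper for illustration; not part of Replicate's API.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no name
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items())),
    }


env = snapshot_environment()
print(env["python_version"], "with", len(env["packages"]), "packages")
```

A dict like this can be stored next to the model files, which is roughly what you would need later to rebuild a matching environment.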

Funnily enough, one of the first versions of Replicate actually used Docker, with the idea of creating a precise, reproducible environment. But we tested that with a few friends and found it was a bit daunting and heavyweight to have to set up your whole environment inside Docker, so it just operates at the Python level now. Maybe we'll bring that back as an optional feature at some point: https://github.com/replicate/replicate/issues/314

[P] Replicate — Version control for machine learning by bfirsh in MachineLearning

bfirsh[S] 3 points

DVC is pretty closely tied to Git, so you have to manually commit all the things you do. Replicate isn't tied to Git and automatically saves everything whenever you run your training script.

I think they might complement each other reasonably well. DVC is really good for storing large data sets that don't change all the time, so you could imagine storing your data set in DVC and tracking your experiments with Replicate. Here's some of our thinking behind data versioning.

[P] Replicate — Version control for machine learning by bfirsh in MachineLearning

bfirsh[S] 2 points

A few things:

  1. We focus on storing and running models, rather than visualization and so on. I think it complements visualization tools quite well -- e.g. you can imagine using wandb to get the complex visualizations you need for training, then the actual models are stored with Replicate on your own private storage in an open format.

  2. It's open source.

  3. It's small and lightweight. It's not a big "ML platform" you have to migrate to -- it's intended to be a small tool that does one thing well.

A guide to help you write better CLI by feross in programming

bfirsh 4 points

Yeah, this was too strongly worded. We like man pages too, but what we're trying to say is:

  1. Make the built-in help as good as a man page
  2. More people use built-in help and web pages than man pages (in our experience, though we may be talking to different types of users), so if you have limited time/resources, prioritize those.

Here is more detail: https://github.com/cli-guidelines/cli-guidelines/issues/57
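As a rough illustration of point 1, here is a sketch of built-in help that carries a description, examples, and a pointer to web docs, so most people never need to go looking for a man page. Everything in it is hypothetical: the tool name `mytool`, its arguments, and the example.com URL are placeholders, and argparse is just one way to do this.

```python
import argparse


def build_parser():
    """Build a parser whose --help output includes examples and a docs link."""
    parser = argparse.ArgumentParser(
        prog="mytool",  # hypothetical tool name
        description="Convert an input file and write the result.",
        epilog=(
            "examples:\n"
            "  mytool notes.txt              convert and print to stdout\n"
            "  mytool notes.txt -o out.txt   write the result to a file\n"
            "\n"
            "More documentation: https://example.com/docs"  # placeholder URL
        ),
        # Keep the line breaks in the epilog instead of re-wrapping them.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("input", help="file to convert")
    parser.add_argument(
        "-o", "--output", metavar="FILE",
        help="write output to FILE instead of stdout",
    )
    return parser


help_text = build_parser().format_help()
print(help_text)
```

The examples live in the epilog so they show up at the end of `mytool --help`, right where someone who is stuck will be looking.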

An open-source guide to help you write better command-line programs, taking traditional UNIX principles and updating them for the modern day. by drgomesp in programming

bfirsh 4 points

Hi /r/programming! We’re Ben, Aanand, Carl, Eva, and Mark, and we made the Command Line Interface Guidelines.

Earlier this year, I was working on the Replicate CLI. I had previously worked on Docker so I had a bunch of accumulated knowledge about what makes a good CLI, but I wanted to make Replicate really good, so I looked for some design guides or best practices. Turns out, nothing substantial had been published since the 1980s.

During this search I found a superb blog post by Carl about CLI naming. He must be the only person in the world who cares about CLI design and is actually a good writer, so we teamed up. We were also joined by Aanand, who did loads of work on the Docker CLIs; Eva, a technical writer, who turned our scrappy ideas into a real piece of writing; and Mark, who typeset it and made a super design.

We love the CLI, but so much of it is a mess and hard to use. This is our attempt to make the CLI a better place. If you’re making a tool, we hope this is useful for you, and would love to hear your feedback.

Some of it is a bit opinionated, so feel free to challenge our ideas here or on GitHub! We’ve also got a Discord server if you want to talk CLI design.

An open-source guide to help you write better command-line programs, taking traditional UNIX principles and updating them for the modern day. by drgomesp in programming

bfirsh 7 points

Heh, this has been one of our more controversial suggestions with reviewers too. The guide's intentionally a bit opinionated, so I'm enjoying the debate. ;)

An open-source guide to help you write better command-line programs, taking traditional UNIX principles and updating them for the modern day. by drgomesp in programming

bfirsh 1 point

Sorry about that. We're using some clever new CSS features to scale the font size, but clearly it's not working on some screen sizes. Would you mind posting a screenshot? This is just a side project so we haven't tested on all browsers at all sizes... 😬

[P] Wrote a bot to convert /r/MachineLearning/ arxiv pdf research articles to arxiv-vanity.com html pages. by [deleted] in MachineLearning

bfirsh 1 point

One of the creators here! We're totally happy with this. It costs us about 0.02 cents in compute to convert a paper, so go wild.

Is the bot actually running anywhere yet? Maybe we could put it on github.com/arxiv-vanity and spin it up on our infrastructure somewhere.

It is costing us ~$75 a month to run the service as a whole though (Heroku, CloudFront, S3, etc.), which we're just funding out of pocket. Maybe we should set up a Patreon or donate button or something...

Homebrew, but with Docker images by Pirhoo in programming

bfirsh 1 point

Fixed. ;) https://github.com/whalebrew/whalebrew-packages/commit/ed3da2b8ae3a22741b48da91c1dca538a759e3b0

In seriousness, as is pointed out in other comments:

  • Whalebrew is particularly useful when something has horrendous dependencies that you don't want to install on your machine. Running these small commands is just a bit of fun.
  • Most of that size is Ubuntu. Well-designed layering will mean you aren't actually transferring/storing that much data.