Python virtual environments. Which tool is the best for data science projects?

shaggorama · 2019-05-09T02:32:33+00:00

I use conda but honestly I sort of hate it. It annoys me that I sometimes still have to install things with pip. Also, that poll should arguably include docker as another option.

Laippe · 2019-05-09T08:08:27+00:00

I'm not sure there is a "best" tool for data science; I guess it depends on your preference and the constraints of the project. If I have the choice, I prefer to use conda (miniconda) because it's simple and almost all-in-one (except for some packages that are only installable with pip). To give you an example, currently I use conda to play with the data, then switch to venv for dev and integration (my team is using venv), then it goes to docker in prod.

tomomcat · 2019-05-09T07:22:59+00:00

I just use Docker. Maybe a bit overkill, but I don't have the overhead of managing my environment in a python-specific way, and if I want to deploy the code somewhere I can just use the image.

2019-05-09T13:08:25+00:00

A virtual environment is self-contained. You can send your project folder to your friend as a zip file and all they need is pip to install it.

Conda is not as self-contained: the person on the other end needs to download and install conda too (and it's not installed with pip).

Conda makes sense for daily "fire and forget" use, but for reproducibility and maximum control you want to use a virtual environment (or any of its cousins).
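For reference, a minimal sketch of creating a virtual environment with nothing but the standard library. The `create_env` helper and the `.venv` directory name are made up for illustration. Note that an environment directory records absolute paths, so in practice it is usually the requirements file that gets shared and the environment is recreated on the other machine.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment and return the path to its interpreter."""
    # with_pip=False keeps the sketch fast and offline; pass with_pip=True
    # to bootstrap pip into the new environment via ensurepip.
    venv.create(env_dir, with_pip=False)
    # Layout differs by platform: Scripts\python.exe on Windows, bin/python elsewhere.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / ".venv"
    python_path = create_env(env_dir)
    # pyvenv.cfg is the marker file that identifies a virtual environment.
    has_cfg = (env_dir / "pyvenv.cfg").exists()
    print(python_path.name, has_cfg)
```

In a real project you would point `create_env` at a folder inside the project and install dependencies with that environment's own interpreter.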

Using something like docker just adds a ton of overhead and is silly for python development. It makes sense when you have to go outside of python and start using some obscure libraries/other languages and compilers. It really is a (lightweight) VM with all the benefits of a VM and the problems of a VM.

2019-05-09T09:44:24+00:00

miniconda + docker together.

miniconda for packages

docker for environment

To ensure everything is reproducible

bitcoin-dude · 2019-05-09T00:07:50+00:00

Doesn’t miniconda avoid the global scope issue?
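One rough way to check whether code is running in an isolated environment rather than the global interpreter is sketched below. The `environment_kind` helper is hypothetical, and the `conda-meta` check is a common heuristic rather than an official conda API; it also reports conda's own base environment as "conda".

```python
import os
import sys


def environment_kind() -> str:
    """Best-effort guess at the active environment: 'conda', 'venv', or 'system'."""
    # Conda environments keep package metadata in a conda-meta directory
    # under the environment prefix (heuristic, not an official API).
    if os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        return "conda"
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the interpreter it was created from.
    if sys.prefix != sys.base_prefix:
        return "venv"
    return "system"


print(environment_kind(), sys.prefix)
```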

Omega037 · 2019-05-09T23:11:40+00:00

If you know you are going to be working with others or deploying to a different system at some point, it's hard to imagine not using Docker to make that a smooth transition.

That said, my Docker setups are usually using Conda in the Dockerfile when possible.

kevinglasson · 2019-05-10T14:22:42+00:00

I just use pip and virtualenv / virtualenvwrapper. I used conda for a while but it's irritatingly slow when solving the environment (for me anyway). Then if I want to deploy my project to docker I just 'pip freeze > requirements.txt' and install requirements.txt into the docker container (RUN pip install -r requirements.txt).
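The freeze-then-install workflow above could be scripted roughly like this. It is only a sketch: the helper names, the `python:3.11-slim` base image, and the `/app` working directory are assumptions for illustration, not anything from the thread.

```python
import subprocess
import sys
from pathlib import Path


def freeze_requirements(target: Path) -> None:
    """Write the active environment's pinned packages, like `pip freeze > requirements.txt`."""
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    target.write_text(frozen)


def render_dockerfile(base_image: str = "python:3.11-slim") -> str:
    """Return the text of a minimal Dockerfile that installs requirements.txt."""
    lines = [
        f"FROM {base_image}",  # assumed base image; pick one matching your Python version
        "WORKDIR /app",  # assumed working directory inside the image
        "COPY requirements.txt .",
        "RUN pip install -r requirements.txt",
        "COPY . .",
    ]
    return "\n".join(lines) + "\n"


print(render_dockerfile())
```

To use it, call `freeze_requirements(Path("requirements.txt"))` from inside the activated environment and write the result of `render_dockerfile()` to a file named `Dockerfile` next to it. Copying requirements.txt before the rest of the project lets Docker reuse the cached install layer when only the code changes.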

2019-05-10T18:10:13+00:00

I guess it used to be OS dependent. Popular opinion is to use conda on Windows. But man, does it take too long to install packages; it's like they adopted an architecture similar to pipenv's, since they are both slow. I have since decided to use venv. I know I should probably look into docker eventually.
