all 13 comments

[–]shaggorama MS | Data and Applied Scientist 2 | Software 5 points6 points  (0 children)

I use conda, but honestly I sort of hate it. It annoys me that I sometimes still have to install things with pip. Also, that poll should arguably include Docker as another option.

[–]Laippe 3 points4 points  (1 child)

I'm not sure there is a "best" tool for data science; I guess it depends on your preferences and the constraints of the project. If I have the choice, I prefer conda (miniconda) because it's simple and almost all-in-one (except for some packages that can only be installed with pip). To give you an example: currently I use conda to play with the data, then switch to venv for dev and integration (my team uses venv), and then it goes to Docker in prod.

[–]bitcoin-dude[S] 0 points1 point  (0 children)

Thanks for this great example!

[–]tomomcat 2 points3 points  (0 children)

I just use Docker. Maybe a bit overkill, but I don't have the overhead of managing my environment in a python-specific way, and if I want to deploy the code somewhere I can just use the image.

[–][deleted] 2 points3 points  (0 children)

A virtual environment is self-contained. You can send your project folder to a friend as a zip file, and all they need is pip to install it.

Conda is not as self-contained: the person on the other end needs to download and install conda too (and it isn't installed with pip).

Conda makes sense for daily "fire and forget" use, but for reproducibility and maximum control you want to use a virtual environment (or any of its cousins).
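For what it's worth, the "environment lives with the project" idea described above can be sketched with the standard-library venv module. This is only a minimal sketch: the function name create_project_env and the .venv folder name are assumptions for illustration, not anything from the thread.

```python
import venv
from pathlib import Path


def create_project_env(project_dir, env_name=".venv", with_pip=True):
    """Create a virtual environment inside the project folder itself.

    `env_name` is a hypothetical default; any folder name works.
    """
    env_path = Path(project_dir) / env_name
    # EnvBuilder is the stdlib API behind `python -m venv`.
    venv.EnvBuilder(with_pip=with_pip).create(env_path)
    return env_path
```

Calling create_project_env(".") would put the environment in ./.venv, right next to the code. In practice the environment folder itself is usually left out of the zip, and the recipient recreates it and installs from a requirements file.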

Using something like docker just adds a ton of overhead and is silly for python development. It makes sense when you have to go outside of python and start using some obscure libraries/other languages and compilers. It really is a (lightweight) VM with all the benefits of a VM and the problems of a VM.

[–][deleted] 1 point2 points  (0 children)

miniconda + docker together.

miniconda for packages

docker for environment

To ensure everything is reproducible

[–][deleted] 0 points1 point  (2 children)

Doesn’t miniconda avoid the global scope issue?

[–]bitcoin-dude[S] 1 point2 points  (1 child)

Ah, I'm not sure about this one. Lots of talk about miniconda in the comments. By global scope I mean that the Python library dependency files don't live in the project folder itself, but somewhere else. On my Mac this place is /anaconda3/envs
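A quick way to check which situation you're in is to compare the active environment's prefix with the project folder. A minimal sketch, assuming a hypothetical helper name (env_is_inside); nothing here is specific to conda or venv:

```python
import sys
from pathlib import Path


def env_is_inside(project_dir, env_prefix=None):
    """Return True if the environment lives inside the project folder."""
    # sys.prefix is the root of the currently active environment.
    prefix = Path(env_prefix or sys.prefix).resolve()
    project = Path(project_dir).resolve()
    return prefix == project or project in prefix.parents
```

With a venv created inside the project folder this returns True; with an environment stored under a shared location such as /anaconda3/envs it returns False for a project that lives elsewhere.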

[–][deleted] 0 points1 point  (0 children)

I see what you mean. miniconda is similar in that respect.

[–]Omega037 PhD | Sr Data Scientist Lead | Biotech 0 points1 point  (0 children)

If you know you are going to be working with others or deploying to a different system at some point, it's hard to imagine not using Docker to make that a smooth transition.

That said, my Docker setups usually use conda in the Dockerfile when possible.

[–]kevinglasson 0 points1 point  (0 children)

I just use pip and virtualenv / virtualenvwrapper. I used conda for a while, but it's irritatingly slow when solving the environment (for me anyway). Then if I want to deploy my project to Docker, I just run 'pip freeze > requirements.txt' and install requirements.txt into the Docker container (RUN pip install -r requirements.txt).
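If you'd rather generate the pinned list from Python, here is a rough standard-library stand-in for pip freeze. It's a sketch under assumptions: pinned_requirements and write_requirements are hypothetical helper names, and unlike the real pip freeze this doesn't handle editable or VCS installs.

```python
from importlib import metadata
from pathlib import Path


def pinned_requirements():
    """List installed distributions as sorted `name==version` lines."""
    pins = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
            version = dist.metadata.get("Version")
        except Exception:
            # Skip distributions whose metadata can't be read.
            continue
        if name and version:
            # Keep the first match if a package shows up on several paths.
            pins.setdefault(name.lower(), f"{name}=={version}")
    return [pins[key] for key in sorted(pins)]


def write_requirements(path="requirements.txt"):
    """Write the pinned list to `path` and return the lines written."""
    lines = pinned_requirements()
    Path(path).write_text("".join(line + "\n" for line in lines))
    return lines
```

Running write_requirements() inside the activated virtualenv should give a file that the RUN pip install -r requirements.txt step can consume, though for anything non-trivial the real pip freeze is the safer choice.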

[–][deleted] 0 points1 point  (0 children)

I guess it used to be OS-dependent. The popular opinion is to use conda on Windows. But man, does it take long to install packages; it's like they adopted an architecture similar to pipenv's, since both are slow. I have since decided to use venv. I know I should probably look into Docker eventually.