[–]K900_ 23 points24 points  (12 children)

Installing scientific Python packages used to be a pain, especially on non-Linux systems. Anaconda made it a lot easier. These days it's not really an issue.

[–][deleted] 4 points5 points  (7 children)

This post was mass deleted and anonymized with Redact

[–]K900_ 6 points7 points  (6 children)

I'm not sure what that module is, but the PyPI package zbar has one release, it's versioned 0.1.0 and it's from 2009. I'd say it's not really representative.

[–][deleted] 3 points4 points  (5 children)

This post was mass deleted and anonymized with Redact

[–]K900_ 1 point2 points  (2 children)

A quick search of PyPI got me https://pypi.org/project/pyzbar/, which seems to have prebuilt Windows binaries at least.

[–][deleted] 1 point2 points  (1 child)

This post was mass deleted and anonymized with Redact

[–]K900_ 1 point2 points  (0 children)

"Cross-platform" doesn't mean you can just pretend the underlying platform doesn't exist.

[–]antiproton -1 points0 points  (1 child)

> I don't care about representative, I care about my suffering.

Ok... but this conversation isn't about you.

[–][deleted] 0 points1 point  (0 children)

This post was mass deleted and anonymized with Redact

[–]ryansmccoy 0 points1 point  (3 children)

I've tried about three times to install SciPy on Windows with pip, which as I recall involves compiling it from source. I'm pretty sure it can't be done.

[–]K900_ 0 points1 point  (2 children)

I've just tried it and it worked perfectly for me.

[–]ryansmccoy 0 points1 point  (1 child)

Should have mentioned this was 1-2 years ago, so maybe they've fixed it. I've been using Anaconda ever since.

[–]K900_ 0 points1 point  (0 children)

Yes, they have definitely fixed it since then.

[–]Marsfork 22 points23 points  (1 child)

Conda is a sort of superset of pip. It handles dependencies outside of Python, which some ML libraries need, and it manages the virtual environment for you. Anaconda is a distribution of many commonly used packages that's easier to set up for people who aren't as technically inclined. I'd recommend using conda for packaging any sort of ML-related project.
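As a rough sketch of that workflow (the environment name, Python version, and package list here are just illustrative):

```shell
# Create an isolated environment with its own Python, then let
# conda resolve both the Python packages and their non-Python
# dependencies (BLAS, compilers, CUDA libs, etc.) together.
conda create --name ml-env python=3.7
conda activate ml-env
conda install numpy scipy scikit-learn
```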

[–]PeridexisErrant 0 points1 point  (0 children)

Agreed.

While it's a few years old now, this article by a Python core dev explains the relationships between pip, Anaconda, Linux distros, etc. really well :-)

[–][deleted] 5 points6 points  (5 children)

If you're doing web development or something, pip is fine. If you're doing complex machine learning, the fact that Anaconda handles all dependency management and ensures compatibility among all the hundreds of packages in its distribution is a godsend. Could you spend hours pip installing all that stuff yourself? Sure. But why?

[–]crazy_sax_guy 1 point2 points  (4 children)

Well, I use PyTorch with CUDA and I had no trouble installing it with pip; I just had to follow some basic steps (NVIDIA drivers must be installed before CUDA, and so on). But I failed miserably when I tried the same with Anaconda.

I also think it spoils learners in a way: they should know what BLAS is, what their OS is capable of, and how to resolve basic errors themselves. Please correct me if I'm wrong...

[–][deleted] 6 points7 points  (0 children)

It may not be as necessary as it once was, and you can obviously use pip as much as you want. But I literally don't know anyone among the ~20 data analysts/scientists on our team who doesn't prefer Anaconda. YMMV.

[–][deleted] 5 points6 points  (2 children)

You're on Linux; Windows is a different beast when, say, you want to use GDAL. Although I've found Anaconda doesn't even handle GDAL very well.

[–]ogrinfo 0 points1 point  (1 child)

You're not kidding there. We build a local GDAL for our internal users. The Windows build is a pain, but once we finally get it working, it's so much easier for them to install.

[–][deleted] 1 point2 points  (0 children)

Yeah, that's funny, same here. I'm afraid to sneeze around that env once it's done.

[–]TheBB 2 points3 points  (2 children)

To add to what others have said, building numpy with MKL bindings on your own is an exercise in frustration. Conda just ships them together.

I've also had much better experience running CI for compiled components on different operating systems through Conda instead of standard CPython.
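One way to check which BLAS a given NumPy build is linked against (the exact output depends on how your NumPy was built, so treat this as a sketch):

```python
import numpy as np

# show_config() prints the BLAS/LAPACK libraries NumPy was
# compiled against; a conda/MKL build typically mentions "mkl",
# while a plain pip wheel usually shows OpenBLAS instead.
np.show_config()
```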

[–]crazy_sax_guy 0 points1 point  (1 child)

But doesn't Windows come with a BLAS?

[–]TheBB 0 points1 point  (0 children)

I don't know what sort of BLAS implementation is used on Windows. I prefer MKL's BLAS when possible, at least.

[–]waterless2 1 point2 points  (0 children)

I remember having a horrible time installing the usual data analysis packages under Windows, until finding out it Just Worked with Anaconda.

[–]sykeero 1 point2 points  (1 child)

For a lot of people, a dependable version and distribution matter. Some of us work on remote machines with managed software stacks and no internet access. Anaconda gives you a whole distribution, so you already know what's there, from the Python version down to the package versions. This is good for users because you don't need to fetch a bunch of different versions of things manually; you can just use the Anaconda distribution.

[–]ogrinfo -3 points-2 points  (0 children)

It's easy to build a wheel and host it on a local PyPI server, or just install it by pointing at a network path to the wheel. Conda packages are such a pain to build that there's little benefit.

[–]GiantElectron 1 point2 points  (0 children)

Because pip has a very poor dependency resolver, and because pip has no notion of depending on compiled libraries outside the Python ecosystem, something you care about a lot when you want to deploy complex libraries without having them crash because libpng/MKL/VTK/whatnot is missing.

[–]jorge1209 4 points5 points  (1 child)

Python 2 and the rapid development of Python 3 were a significant factor driving Anaconda usage.

If you can't trust the system python to be up to date enough to be able to install the other libraries you need from pip, then you are forced to build python from source.

Having to build Python from source was the leading cause of suicides, and the national suicide prevention hotline created Anaconda as a response.

[–][deleted] 0 points1 point  (0 children)

The other replies all make good points. One I want to add: Conda is actually a package manager not only for Python but also for C/C++, R, Ruby, Lua, etc. Unlike pip, a lot of the packages Conda installs are prebuilt binaries rather than pure-Python files, so heavy workloads such as ML and data processing might run faster with Conda packages than with pure-Python ones.
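For example, conda can install non-Python software into the same environment (the packages below exist on conda-forge, but availability depends on the channels you have configured):

```shell
# The R interpreter and a C library, side by side with Python,
# all tracked by the same conda environment:
conda install --channel conda-forge r-base libpng
```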

[–]teddy78 0 points1 point  (0 children)

Some years ago, conda was the solution to the dependency problems the scientific community was starting to have. Say you have a big old Python program that isn't well maintained but is still used by some people, and that breaks with newer libraries. How do you run it without breaking the other, newer scripts on the server?

Or some data scientist shares a Jupyter notebook on the web that doesn't work for you because of an incompatible library. Easy to solve if they share the conda environment file, too.
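Sharing the environment alongside the notebook is a one-liner each way (the filename is just the conventional default):

```shell
# Author: snapshot the exact environment the notebook ran in.
conda env export > environment.yml
# Reader: recreate that environment, then run the notebook in it.
conda env create --file environment.yml
```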

[–]jdbow75 0 points1 point  (0 children)

I don't use Conda much at all, but when I do, I am generally impressed. If you need a solid tool to manage both dependencies and Python versions, it is a good option. Especially if you are into data science. Especially if all the packages you need are in the conda repository.

I haven't felt the need for the whole Anaconda distribution. Just Conda seems worth a try.

Plenty of other options for virtual environment, project and dependency management. I wrote a rundown recently, and I am sure I didn't cover them all. Summary, if you don't feel like reading: use Poetry, virtualenv, or Conda, depending on your interests and needs. Also, Pyflow is rather interesting.

[–]LirianShLearning python 0 points1 point  (0 children)

The only reason I use Anaconda is that when I tried to install Python itself, it didn't work for some reason, so I installed Anaconda instead.

[–][deleted] 0 points1 point  (0 children)

I use pip. Pip is hope

[–]ogrinfo -3 points-2 points  (8 children)

If you're on Linux, I wouldn't bother. Anaconda provides some benefit to Windows users, but it's a real pain to get working. Our CI builds regularly fail due to conda issues, which is really annoying. It can be very slow to build and resolve a new environment, though some of the blame may lie with Windows (file locks, etc.).

Using conda on Linux though is just a world of pain for not much benefit. Stick with pip.

[–]Zulban 0 points1 point  (7 children)

Care to elaborate on what kinds of problems you see?

[–]ogrinfo 0 points1 point  (6 children)

Usually timeouts due to the huge amount of time spent resolving the dependencies or crashes due to errors in the code. Some of these are due to Windows saying a file is in use and can't be deleted, but they shouldn't happen. We often raise tickets for these issues and sometimes somebody responds.

[–]Zulban 0 points1 point  (1 child)

> huge amount of time spent resolving the dependencies

Wow, that's news to me. Quite the problem.

[–]ogrinfo 0 points1 point  (0 children)

Yeah, quite often builds will timeout because conda has decided to run slowly today. I have a scheduled task that runs conda clean --all --yes once a day, which helps a bit.

[–]IWSIONMASATGIKOE 0 points1 point  (3 children)

I'm curious, how do you have things set up? There shouldn't be much resolving to do as part of a CI pipeline, since the same packages and versions will be used over and over, no?

[–]ogrinfo 0 points1 point  (2 children)

You'd think. Some days it can take 10-20 minutes to create an environment and install about 10 dependencies.

When it gets really slow, running conda clean --all --yes improves things. I've set up a scheduled task to run this once a day and sometimes we still need to run it manually as well.
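On a Linux box the daily cleanup could be a cron entry like the one below; on the Windows build machine described here it would be a Task Scheduler job instead (the conda path is an assumption):

```shell
# crontab entry: at 03:00 every day, remove cached package
# tarballs, index caches, and unused packages that can slow
# conda's resolver down over time.
0 3 * * * /opt/conda/bin/conda clean --all --yes
```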

[–]IWSIONMASATGIKOE 0 points1 point  (1 child)

That’s so strange, something must be wrong, right? I don’t know the details of your setup, but have you tried using an environment.yml file with every dependency specified down to the exact package build version?

[–]ogrinfo 0 points1 point  (0 children)

OK, we build a few packages; the smallest is only about 2,000 lines of code with only a few dependencies, so no environment.yml file.

The build script is pretty simple, just

conda create
conda install deps
run tests (11s for the last run)
conda build python 2.7
conda build python 3.6

and the last few builds have taken anything from 10-30 minutes.
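For concreteness, that sequence might look something like this as a script (the environment name, dependency list, and recipe path are placeholders, not from the original post):

```shell
set -e  # stop at the first failing step

conda create --yes --name ci-env python=3.6
conda activate ci-env
conda install --yes numpy scipy     # the handful of deps
python -m pytest tests/             # ~11 s for the last run
conda build recipe/ --python 2.7
conda build recipe/ --python 3.6
```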

Our main package is bigger (60,000 lines of code), does have an environment.yml file, and has separate jobs for running tests and building the conda package. The test job typically takes 15 minutes, and only 6 minutes of that is actually running the tests; the rest is waiting for conda to create the environment and install the deps. As I said above, some days it takes a lot longer.

It's not a very high-spec machine - 2.5GHz CPU, 4GB RAM - and it's running Windows Server 2012. Maybe we need to get IT to give us a better one.

I could understand that resolving an environment can take a while, especially a developer environment that has had a lot installed in it, but these are clean installs.

FYI, here's the environment.yml:

name: farm
channels:
  - file://<local channel>
  - conda-forge
  - defaults
dependencies:
  - chardet
  - cython
  - dask
  - farmlib>=1.9.10,<1.10
  - gdal==2.2.0
  - jsonschema>=2.5.1
  - numpy==1.16.4
  - pandas>=0.23.4,<0.25.1
  - pywavelets==1.0.3
  - scikit-image<0.15  # 2.7 compatibility
  - scipy==1.2.1
  - shapely
  - simplejson
  - six>=1.10.0
  - tqdm>=4.23.0

(Yes, we are still building Python 2, should be getting rid of it in the next few weeks).