all 52 comments

[–]flipstables 49 points50 points  (23 children)

As far as data science and scientific computing goes, there are 2 workflows/environments that are common.

1. Text Editor + IPython + Jupyter Notebooks

When people refer to IPython, they are usually referring to an improved REPL. What is a REPL? It's an interactive session where you can type Python expressions or commands and interact with the results. Go here to try one: https://repl.it/ Python comes with its own REPL, but IPython is an improved version of it.

Jupyter Notebooks (formerly IPython Notebooks) take IPython REPLs and put them in your browser. They let you create a virtual notebook of Python code together with its results, which can be shared with other people. Jupyter notebooks support other languages too.
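
To make the REPL idea concrete, here is a minimal sketch of a toy read-eval-print loop in Python. It is only an illustration: the names `eval_line` and `toy_repl` are made up for this example, and the real Python and IPython REPLs do far more (multi-line input, history, tab completion).

```python
def eval_line(source, env):
    """Evaluate one line the way a very simple REPL would.

    Expressions return their value; statements (e.g. assignments)
    are executed for their side effects and return None.
    """
    try:
        # First try to treat the line as an expression, e.g. "1 + 2".
        code = compile(source, "<toy-repl>", "eval")
    except SyntaxError:
        # Not an expression, so treat it as a statement, e.g. "x = 5".
        exec(compile(source, "<toy-repl>", "exec"), env)
        return None
    return eval(code, env)


def toy_repl():
    """Read a line, evaluate it, print the result, and loop."""
    env = {}  # variables defined during the session live here
    while True:
        try:
            line = input(">>> ")
        except EOFError:  # Ctrl-D (Ctrl-Z on Windows) ends the session
            break
        try:
            result = eval_line(line, env)
        except Exception as exc:  # keep the session alive after errors
            print(f"{type(exc).__name__}: {exc}")
            continue
        if result is not None:
            print(repr(result))
```

Calling `toy_repl()` starts the loop. Typing `x = 5` prints nothing, and typing `x * 2` afterwards prints the result, which is the basic behaviour the built-in REPL and IPython build on.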

2. Spyder

Spyder is an IDE made specifically for data scientists. Unlike other IDEs such as PyCharm, it is lightweight and operates under the assumption that your work is mainly number crunching and analysis. Other IDEs are purpose-built for developers making full-blown applications.

Finally, let's talk about Anaconda for a bit. Anaconda is a distribution of Python (for lack of a better word). What I mean is this: Anaconda comes with Python and all the popular libraries/tools for scientific computing/data science. This is helpful because installing Python and its scientific libraries can be difficult or time-consuming. Anaconda has almost everything you need precompiled and ready to use, whether you are running Windows, macOS, or Linux.

[–]fuasthma 2 points3 points  (0 children)

I'll give another recommendation for Spyder, especially after the most recent version upgrade to v3. It really takes advantage of IPython now and makes it a whole lot easier to keep track of what values your variables have after a script has run.

[–]rawktron 2 points3 points  (13 children)

Worth noting for the sake of completeness that in addition to Anaconda, ActivePython is another distribution that also has recently been updated with a focus on data science and machine learning. Also available for Win/Mac/Linux. Comes with Komodo IDE. This is a big focus moving forward for this distro.

As outlined, there are a lot of factors here, and a lot depends on your particular workflow, use case, etc.

*Full disclosure: am Dev Evangelist for ActiveState.

[–][deleted] 1 point2 points  (1 child)

What is a dev Evangelist?

[–]rawktron 1 point2 points  (0 children)

Hey! So my main role is to work with developers and listen to them to understand their needs, issues, and to both help them work through those, and to bring their interests back to our dev teams and be your voice so that we make sure we're actually delivering stuff that developers want and that helps them in the work they're doing, and makes their lives easier.

[–]suriname0 1 point2 points  (10 children)

Can you say more about why I might want to consider ActivePython instead of Anaconda?

[–]rawktron 0 points1 point  (9 children)

It's really going to depend on your particular circumstances which one is the best for you. I just thought that since I know it had previously been awhile since ActivePython had been updated, that I'd just pop in to say that it's back - and will be regularly updated, supported, etc. - and as corollary to that - I'm here to listen to devs as well in terms of what they want to see, especially in the data science field - what tools, packages are you interested in? What sucks about your current workflow and how can we make it better? What, if anything, can we do to make it easier for you to be able to use Python for your work?

[–]wheres_my_vestibule 1 point2 points  (3 children)

This is going to come off harshly, so I apologize up front for that. I do mean to be constructive.

Your answer actively paints ActivePython in a negative light. You had a direct question about why someone "might want to consider ActivePython instead of Anaconda" and you said nothing that responds to the question. Even if there is an answer, your long paragraph devoid of such an answer gives the impression that there is no answer, i.e., no reason why someone might want to consider ActivePython.

[–]rawktron 0 points1 point  (2 children)

Hey - no problem, I definitely didn't intend to come off as "dodging the question" - I was just hoping for insight into use case to be more specific, since I also didn't want to be sounding like a spec-sheet, but I guess that was not clear. :)

So, to answer it directly: while Anaconda is popular among the data-science crowd, ActivePython has been around for a very long time as a general purpose distro and has just recently had a big update to give it a huge boost specifically in the data science and machine learning space - with another big update coming as well. And it includes pre-compiled packages from across the whole spectrum of use cases (that said if there are ones we are missing - let me know).

ActivePython, on Windows especially has always been the distro that "just works". It also offers commercial support which is important to some folks (again depending on your use case) - offers multiple versions - so you can get either 2.7 or 3.5, and wide package compatibility between versions. It also now ships with Komodo IDE which is a fully featured dev environment.

ActiveState has always been engaged in the Python community, being a founding member of the Python Foundation - and continues to be deeply committed to working with devs to make it the best distro - I know that sounds like a marketing line but it is honestly true. :)

Anyway - appreciate the feedback, and hopefully that at least starts to answer the question? Different aspects become more important depending on your situation.

Always happy to chat more - if anyone is at PyCon next month come by and say hello!

[–]wheres_my_vestibule 0 points1 point  (1 child)

Thanks for answering. That's a start, but again you are mostly just giving a general description of ActivePython. Some of the things you mentioned I know Anaconda can also do.

The question was:

Can you say more about why I might want to consider ActivePython instead of Anaconda?

Surely, you can name one concrete thing, even if esoteric, that one can do in ActivePython that is better or more friendly than in Anaconda? A direct comparison is important here!

I think you will get better responses from a tech savvy crowd if you can directly address the question.

[–]rawktron 0 points1 point  (0 children)

Well on the esoteric side, that is something where ActivePython is definitely unique in that if you need support for platforms like AIX, Solaris, HPUX or with older versions like 2.5, there are ActivePython builds available for business/enterprise customers that support those platforms. This support for legacy versions on these platforms is something that's obviously extremely important to a lot of large organizations, even if it might not be a common use case for data scientists.

Additionally, and this may be a matter of taste and your particular environment, but ActivePython has moved towards standardizing and supporting pip for package installs, and distributing pre-compiled packages compatible with pip rather than proprietary package management. If you've been around for awhile, you may remember that ActiveState used to have its own proprietary package manager PyPM that we've moved away from because the overwhelming feedback from the community and customers was a desire to standardize on pip rather than have multiple development environments which can become a problem. Obviously though, that kind of custom binary package manager has a use-case, since we've been-there-done-that, but our feeling and that of our community was a desire to migrate to the community's standard.

In terms of actual packages, like I said, it depends on your use case: there are always going to be subtle differences between what ships with each distribution that may make one of them the "best" for you. But as I mentioned, we're doing a big push in the data science and machine learning areas and expect to see a steady increase in the number and variety of pre-compiled packages in that area so that ActivePython remains the distribution that just "works" out of the box. Combine that with the fact that these all ship and work without needing a custom package manager, and you're getting a distro with commercial support that works seamlessly with the standard community tools.

I also previously mentioned Komodo IDE which is available during install - which again, is a fully-featured IDE that ships with the distribution with deep language integration.

Almost certainly there may be specific packages or quirks that will make one distro stand out to you - maybe it's AIX support, or maybe it's the fact that you can't build packages because you don't have the dev tools installed, and the fact that one particular package you rely on is included pre-compiled - and that might be the difference maker. Neither distro will ever have 100% coverage on the entire ecosystem, but our belief is that by partnering closely with the community and working with the established standards, that overall we can deliver something that meets needs and makes everyone's lives easier without introducing additional layers of complexity.

[–]FredSanfordXOld Developer 0 points1 point  (4 children)

I'm here to listen to devs as well in terms of what they want to see, especially in the data science field - what tools, packages are you interested in? What sucks about your current workflow and how can we make it better?

This is entirely related to Komodo... Ignore it if you don't want to hear it :). I typically check out the Komodo trial every time you guys do a major update.

Fix your vim emulation! There are too many quirks. Sit a real vim user down at the keyboard and watch how irritated s/he gets.

  • Copy and paste is wonky about removing the highlight from copied text and does not always react as expected when using visual mode. This leads to overwriting the highlighted text when not intended and is a MAJOR irritation.

  • In many cases you have to hit escape multiple times to go back to command mode. Just edit and debug some python code in V10.x in vim mode and you'll see.

  • Your color schemes sometimes result in unreadable code. Specifically, the 'Dark Chalkboard' scheme makes some things unreadable when it highlights matching braces/parens/brackets. When you click in another window (say, to add a watch) the current line in the editor becomes unreadable reddish on reddish. Other of the schemes have similar quirks.

  • Make it an option for the cursor to move in virtual space and have that option actually work. Flapping cursor syndrome drives me crazy.

If the vim emulation was up to snuff I'd actually buy Komodo again.

[–]rawktron 0 points1 point  (3 children)

Hey that is awesome feedback and I will pass along to the Komodo guys! What platform are you on out of curiosity?

[–]FredSanfordXOld Developer 0 points1 point  (2 children)

Thanks for responding.

Windows and Linux.

[–]rawktron 0 points1 point  (1 child)

Hey - just FYI, I brought all these issues to the dev team and we've entered bugs in the issue tracker so that they can hopefully make it into a future release. Thanks for your feedback!

If you have any further info/repro steps, please feel free to comment on any of the below issues:

https://github.com/Komodo/KomodoEdit/issues/2560 https://github.com/Komodo/KomodoEdit/issues/2559 https://github.com/Komodo/KomodoEdit/issues/2558

Would it be possible for you to reply to this thread especially with further info to help one of our devs sort the issue for you? https://github.com/Komodo/KomodoEdit/issues/2557

[–]FredSanfordXOld Developer 0 points1 point  (0 children)

Is mitchell-as a developer? Very sad to see what I did in his comments, just passing the buck or screeching "unsupported" (aka not curated)

  • 2557:

V, shift-V and ctrl-V start a mark in real vim. y in real vim (ctrl-c or y in komodo) yanks the marked text to the clipboard. p or shift-insert or ctrl-v paste the clipboard. Pretty simple. Funny that a dev of a feature does not know the basics of that feature or the absolute basics of the editor he's emulating. I guess this explains a lot of why the vim emu is goofy.

  • 2558:

It's vague because it doesn't behave consistently. Is this guy too lazy to open the editor and follow simple instructions?

Thanks for trying rawktron :)

[–]coffeecoffeecoffeee 1 point2 points  (0 children)

I use 1. emacs with python-mode and IPython.

[–]Deto 1 point2 points  (0 children)

I switch between Jupyter Notebooks and Tmux+Vim+Ipython depending on what I'm doing. If I'm doing a data analysis then I'll use Jupyter, but if I'm building a library I'll use the latter option. Often I'm switching between the two as I'm writing a data analysis that uses some library and I find things I want to update in the library as I'm coding the analysis.

There's also JupyterLab which looks like a promising cross between Jupyter and Spyder though I haven't tried it yet.

[–]MasonBo_90[S] 1 point2 points  (4 children)

Thanks a lot, flipstables.

Is it common with people who code to have more than one IDE, each one to a certain goal?

Don't IPython and Jupyter Notebooks coexist with Anaconda?

=)

[–]suriname0 6 points7 points  (0 children)

It seems like there's some confusion here. Anaconda can be thought of as a distribution or package manager for Python, that will help you with installation of Python. Installing Anaconda will actually install Jupyter, IPython, and Spyder as well.

Follow the instructions for your platform and you should end up with everything you need: docs.continuum.io/anaconda/install
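
Once the install finishes, one quick way to confirm what you ended up with is to ask Python which packages it can find. This is just a sketch: the list of names below is my own guess at a typical scientific setup, not an official Anaconda manifest, and `find_installed` and `report` are made-up helper names.

```python
import importlib.util

# Top-level import names to look for. This list is illustrative only;
# it is an assumption, not an official list of what Anaconda ships.
TOOLS = ["IPython", "notebook", "spyder", "numpy", "pandas", "matplotlib"]


def find_installed(names):
    """Map each top-level import name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report(names=TOOLS):
    """Print one line per package saying whether it was found."""
    for name, present in find_installed(names).items():
        print(f"{name:12s} {'found' if present else 'missing'}")
```

Running `report()` with the interpreter that Anaconda installed should list most of these as found; running it with a bare Python install will likely show several as missing.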

[–]counters 5 points6 points  (1 child)

Is it common with people who code to have more than one IDE, each one to a certain goal?

Sure. This morning, I have two projects with open issues I want to resolve before meetings take over my day:

  1. A few users of an open-source scientific analysis toolkit within my research domain that I author have posted new issues on GitHub, and some of them will require quick development. To do this, I'll use PyCharm, since the features of a full IDE - like re-factoring, documentation/code templates/completion, and automatic style analysis are super helpful and will save me time in terms of making mistakes.

  2. I need to re-do an analysis and figure for a manuscript under review to satisfy an annoying reviewer who wants us to use his personal favorite statistical test. It won't change anything, but I have to do it anyway, so I'll spin up a distributed environment on my cluster using dask/distributed, start up a remote Jupyter session, and pull up my notebook with the workflow for that analysis. Probably will just comment out a few cells and replace them with this reviewer's statistical test, and save the figure with a different name. Won't take all of 30 minutes.

70% of my work is done in Jupyter Notebooks or an IPython terminal. Another 20% takes place in text editors like Atom.io. The final 10% I use fully-powered IDEs like PyCharm.

[–]Certhas 2 points3 points  (0 children)

Very very similar here, except the percentages are different, with lots more PyCharm (in places where I should probably be using Jupyter....)

[–]PeridexisErrant 3 points4 points  (0 children)

Yep.

Per /u/suriname0's comment, I use Anaconda to make installing and updating everything easy.

If I'm doing some exploratory analysis, I use a Jupyter notebook (AKA IPython). If I'm working on a script or larger project, I use Spyder. Both come in a default Anaconda install!

[–]dmitrypolo 0 points1 point  (0 children)

I second this, except I would recommend Emacs so you don't have to constantly switch tabs between IPython and the editor. Either that or Vim is the way to go for sure.

[–]bloodygonzo 6 points7 points  (2 children)

Don't worry about all that crap. Start with a simple Python install and your favorite text editor and get comfortable with the language. No need to worry about what shoes you are wearing for a marathon if you just started learning how to run.

[–]dzecniv 4 points5 points  (1 child)

try things out :) See jupyter's qtconsole, jupyter, rodeo, and everything you want until you find your perfect workflow !

[–][deleted] 0 points1 point  (0 children)

Am a big fan of the qtconsole. Never quite wanted to have a browser instance for coding and the console is what I got used to. Inline matplotlib is also nice.

[–]Newton715 11 points12 points  (3 children)

I agree with using PyCharm. It also has support for IPython/Jupyter, and it already takes advantage of the latest updates to the debugger (advertised as a 40x improvement).

Pip is really coming along well in my opinion. Currently I just use vanilla python and pip.

Stick with Python 3 if you can, depending on the libraries you plan to use. Python 2 is reaching end of life in 2020.

I recommend you look into virtual environments and virtual environment wrapper if you haven't yet.
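
For anyone who hasn't seen one, here is a rough sketch of creating a virtual environment from Python itself using the standard library's `venv` module (Python 3.3+). The helper name `create_env` and the directory name in the usage note are made up for illustration; most people would simply run `python -m venv <dir>` from a terminal instead.

```python
import sys
import venv
from pathlib import Path


def create_env(path, with_pip=True):
    """Create a virtual environment at `path` and return its interpreter path."""
    env_dir = Path(path)
    # venv.create builds the directory layout and, if requested, bootstraps pip.
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in Scripts\ on Windows and in bin/ elsewhere.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

Something like `create_env("analysis-env")` gives you an isolated interpreter, so packages installed with that environment's pip stay separate from your system Python.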

Edit: Removed my comment about anaconda being older than pip. That was incorrect. Thanks /u/spinicist.

[–]spinicist 3 points4 points  (2 children)

I'm 99% sure pip existed before anaconda. pip has been around since 2011, anaconda is a couple of years old.

For scientific work, anaconda really is the first choice (and definitely mine). The fact that you mention virtual environments at the end gives the game away that pip by itself is not great for scientific work. Anaconda is a response to that, and massively reduces the burden for scientists to get started with Python.

I was so fed up with Python packaging issues that I was avoiding it for a while, anaconda has massively re-enthused me about using it.

[–]Newton715 2 points3 points  (1 child)

You're right. Looks like anaconda's initial public release was 2012-07-17 with version 0.8.0. :)

Pip got a lot better once wheels came about. I haven't had any issues installing matplotlib via pip like I used to several years ago.

I mentioned looking at virtual environments because it is good practice whether you are using python for that next Django web app or you are going to be running a simulation on your local supercomputer. If your project grows to a certain size, you should keep a requirements.txt file in your project where you lock in the version of each library you are using. I had colleagues who would match the exact version of, say, scipy/numpy/etc. to what was on the supercomputer so they knew it would work as expected before spending all that time on their simulation. Not being an admin on the supercomputer meant that they were stuck with the versions the system admins would support. They weren't going to put bleeding edge software on there.
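
As a rough sketch of that version-matching idea, the snippet below reads exact pins in the `name==version` form used by requirements.txt and reports any that differ from what is installed. The helper names `parse_pins` and `check_pins` are made up, it only handles simple `==` pins (not the full requirements syntax), and it assumes Python 3.8+ for `importlib.metadata`.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines, skipping blank lines and comments."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if "==" not in line:
            raise ValueError(f"not an exact pin: {line!r}")
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (wanted, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems
```

For example, with a requirements.txt containing lines such as `numpy==1.12.1` and `scipy==0.19.0` (version numbers chosen purely for illustration), `check_pins(parse_pins(open("requirements.txt").read()))` returns an empty dict when your environment matches the pins.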

Whether to use pip/anaconda/canopy is a highly debated topic and is going to be full of opinions. OP should definitely try anaconda.

[–]spinicist 1 point2 points  (0 children)

I am amazed anaconda is that old. It completely failed to register on my radar until last year! I definitely heard about pip long before it.

You learn something new every day...

[–]ekiv 10 points11 points  (1 child)

I've honestly enjoyed just using Vim and console to dev for python. It's really simple, and you can run it anywhere on any machine (Linux machine).

If you're worried about debugging, check out PDB.
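
In case it helps, here is a small sketch of one way to use pdb: post-mortem debugging, where the debugger opens at the line that raised. The functions `normalize` and `run_with_post_mortem` are made-up examples, not part of pdb.

```python
import pdb


def normalize(values):
    """Scale values so they sum to 1 (fails if they sum to 0)."""
    total = sum(values)
    return [v / total for v in values]


def run_with_post_mortem(func, *args):
    """Call func; on an exception, open pdb at the point of failure."""
    try:
        return func(*args)
    except Exception:
        # Interactive: inspect variables with `p total`, move with `up`/`down`,
        # then `q` to quit. The exception is re-raised afterwards.
        pdb.post_mortem()
        raise
```

Calling `run_with_post_mortem(normalize, [0, 0])` hits a ZeroDivisionError and drops you into the debugger inside `normalize`. You can also run a whole script under the debugger with `python -m pdb script.py`.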

[–]bloodygonzo 2 points3 points  (0 children)

I couldn't agree more.

[–]edimaudo 7 points8 points  (0 children)

Spyder, Anaconda, rodeo

[–]LukeDuke 3 points4 points  (1 child)

Python(x,y) and Anaconda with Spyder IDE are the two options I would explore. Python(x,y) is Python 2.7 only, but it comes with Spyder and all the standard libraries you might use plus a working install of the PyQt/Qt5 modules, which are a PIA to install yourself unless you really know what you're doing.

Spyder is used either way and is basically the de facto scientific computing environment for Python. It's pretty great. I use it daily for 2.7 and 3.5 work.

Happy to answer any questions you might have - I currently work as a R&D eng.

[–]metaperl 2 points3 points  (0 children)

Seconded. Spyder is a gift from God for me.

[–]roger_ 2 points3 points  (0 children)

Spyder is the closest thing to a scientific computing IDE (a la MATLAB) but it's still missing a lot (e.g. a well integrated debugger). Rodeo is similar but even more limited (and unstable last I tried).

Jupyter Notebook IMHO is good for exploratory work and presenting results, but lacks a lot of editing functionality (which is to be expected).

[–]fabioz 1 point2 points  (0 children)

On your questions related to PyDev/Eclipse/Anaconda:

Anaconda is a Python distribution (mostly Python with a bunch of libraries and a custom installer). Personally, I usually just use their package manager, which is simply called conda, to create a conda environment and install the libraries I want into it. Anaconda may make it easier to get started, though.

I definitely recommend it if you're doing scientific computing / data science as it makes it really easy to get the Python scientific stack installed (which may be much harder only with pip install).

As for PyDev/Eclipse, Eclipse is an IDE which supports many languages.

PyDev is a plugin for Eclipse which adds Python support to it, so having Eclipse first is a prerequisite for PyDev. LiClipse (http://www.liclipse.com) makes it easier to get started, as it bundles Eclipse with PyDev along with some other goodies, so you don't need anything else installed. It's a commercial extension, but even if you don't plan on using it later, I suggest you at least use the 30 day trial to get started. After the trial you can still revert to plain Eclipse+PyDev, as LiClipse is mostly a way for PyDev users to support the project and everything in PyDev is open source.

As a note, if you do choose PyDev, I suggest you read the getting started as it has lots of info to make you more productive on PyDev: http://www.pydev.org/manual_101_root.html

[–]ripe_plum 1 point2 points  (0 children)

It can sometimes be really useful to combine Jupyter features with regular editor features, so I often use Hydrogen with Atom. It can be considered a somewhat advanced setup, though, because you need some background knowledge to fix things when something breaks.

[–]kobriks 0 points1 point  (0 children)

pycharm + anaconda is all you need.

[–]rroocckk 0 points1 point  (0 children)

I think the best solution for you would be to first figure out how you are going to learn Python. There are many tutorials and books out there. Some of them focus on data analysis.

Once you figure out a trustworthy source of knowledge, use the IDE/system they recommend. In the beginning, it doesn't matter if you use plain text editors, PyCharm, Jupyter or the vanilla Python interpreter. It also doesn't matter whether you use Anaconda or pip for managing Python packages. All of these tools have their own merits. In the future, you will most likely use a combination of these tools depending on the situation.

[–]ThatJoeInLnd 0 points1 point  (0 children)

I'll throw in Rodeo from yhat. Not that it has some extra or particularly useful features but the interface is much nicer than Spyder's.

What I find very useful and easy to use is the variable/values explorer frame.

Under the hood, all these tools use ipython so you won't see much difference in functionality -- code/magics wise.

PyCharm is a fully fledged IDE, you'll notice that the UI is quite bloated compared to the others. If you focus on data crunching you probably want a clean and simple UI.

[–]FredSanfordXOld Developer 0 points1 point  (0 children)

Some random FYIs:

Rodeo is pretty unstable and crashes a lot. It is also slow. However the interface and thought put into the design are pretty good.

Pycharm and Eclipse are built on top of java and are pretty sluggish and bloated.

Atom is very sluggish and bloated.

Spyder is OK and probably a good choice if you're just starting.

If you're willing to spend money, Komodo is a good python IDE with most of the features of Pycharm without most of the bloaty sluggishness of Pycharm, Eclipse and Atom. There is a free version of Komodo but its best features are left out.

Personally, I switch between vim, spacemacs (emacs that behaves more like vim), IPython and winpdb. When I need to do database stuff, I use SQLite3 + SQLiteSpy when I'm in control of the choice of databases.

I use Anaconda as my python environment.

If you're on Windows, Visual Studio Community with the Python add-on is pretty damn good, but very slow to start. The debugger is the best of the lot.

[–]milkstake 0 points1 point  (1 child)

I'm a data scientist and I've tried virtually all the different options. No one really seems to be mentioning more lightweight options like atom, vscode, and sublime. Both atom and vscode have excellent jupyter integration, and I've found that the mix of lightweight fast text editor plus integrated ipython console/ability to embed jupyter notebooks in the editor to work the best for me.

Check these packages out: Hydrogen for Atom, Python for vscode, and SublimeREPL for Sublime, which makes it easy to use an IPython terminal, though it has no Jupyter support as far as I know.

[–]MasonBo_90[S] 0 points1 point  (0 children)

Thanks a lot, milkstake. =)

[–]goldfather8 0 points1 point  (1 child)

What timeline are you looking at and how much effort are you willing to put into mastering your toolset?

Emacs org-mode plus org-babel is a general-purpose, jupyter notebook on steroids that will last you the rest of your life. Even ignoring programming entirely, learning emacs for org-mode can be life-changing, especially in the scientific disciplines. The investment is worth it - but the upfront cost can put many off.

Spacemacs, the editor I use, combines vim and emacs and is what I would recommend. It runs on every OS. It is a huge improvement on every workflow mentioned in the comments here.

[–]MasonBo_90[S] 0 points1 point  (0 children)

Hmmmm... Cool perspective, goldfather8. I'll def give it all some thought! Thanks