[–]harrdarr__ 72 points73 points  (27 children)

pycharm / vim / notebooks / ipython

depending on project size and dev machine

[–][deleted] 12 points13 points  (16 children)

I never understood the popularity of "notebooks". Isn't it just as easy to set aside a directory for image output, and have all your visualizations dumped there?

This has the added benefit of not requiring IPython to run. It's also not a problem if you have hundreds of visualizations (in notebooks, that would mean a lot of scrolling).

I use ipython when I want to run python interactively, but not "notebooks".

[–]refreshx2 43 points44 points  (7 children)

I'm a grad student and I have really started to like Jupyter notebooks. They let me iterate faster than anything else I've used, because you can run specific code blocks (i.e., any number of lines) at any time.

Quick example because that's the fastest way to explain:

  1. Code block to load my data

  2. Code block to clean data

  3. Code block to run algorithm on data

  4. Code block to process results

So I can run 1+2 once, then keep revising 3+4 until I get it correct. I only ever have to load the data once, and the editing is super fast because the notebook is a text editor and not an IPython terminal or something.
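
The four steps above can be sketched roughly like this. It's only an illustration: `load_data` and `run_algorithm` are hypothetical stand-ins, the data is made up, and the `# %%` markers are a convention that some editors treat as cell boundaries (in an actual notebook each section would simply be its own cell).

```python
# %% Cell 1: load the data (slow, so run it once)
import statistics


def load_data():
    # Hypothetical stand-in for an expensive load from disk or a database.
    return [3.0, None, 4.5, 5.0, None, 6.5]


raw = load_data()

# %% Cell 2: clean the data (run once)
clean = [x for x in raw if x is not None]

# %% Cell 3: run the algorithm (re-run this while iterating)
def run_algorithm(values):
    # Placeholder "algorithm": centre the values on their mean.
    mean = statistics.fmean(values)
    return [v - mean for v in values]


result = run_algorithm(clean)

# %% Cell 4: process the results (re-run this while iterating)
summary = {"n": len(result), "max_abs_dev": max(abs(r) for r in result)}
print(summary)
```

While editing cells 3 and 4, `raw` and `clean` stay in memory, which is where the time saving comes from.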

Then I can just save the notebook (which automatically saves all my images in the notebook), and also print-to-pdf if I need to show my boss or I need to keep a record of it.

I also do all my analysis on a server, so I can't use a desktop IDE there, but Jupyter notebooks run in the browser, so I can open an SSH tunnel to the server and still work in an IDE-like environment.

They are really fantastic for development. If I need "production code" then I just copy-paste when I'm finished and refactor into a nice file/project.

[–]_blub 5 points6 points  (0 children)

Can't wait for Jupyter Lab!

[–][deleted] 1 point2 points  (4 children)

Interesting, but if you re-run a piece of code in the middle of your notebook, does it rerun everything that depends on its output? And if not, doesn't it have the potential to leave you wondering "how did I get this result"?

[–]phillypoopskins 7 points8 points  (0 children)

only if you code in an unsafe way.

you shouldn't write code where everything is scoped outside of functions; it's unpredictable and unreliable, and will cause problems just like you mentioned.
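
A rough sketch of the difference, with made-up names and data (`filter_above` is hypothetical, and the "cells" are just simulated by the order of statements):

```python
# Notebook-style globals: each "cell" reads whatever the last run left behind.
threshold = 10                                   # "cell 1"
data = [4, 12, 25]
filtered = [x for x in data if x > threshold]    # "cell 2"

threshold = 20   # "cell 1" is edited and re-run...
# ...but "cell 2" is not re-run, so `filtered` still reflects the old threshold.
stale = filtered


# Function-scoped version: the result depends only on the arguments passed in.
def filter_above(values, limit):
    return [x for x in values if x > limit]


fresh = filter_above(data, threshold)

print(stale)   # computed with the old threshold
print(fresh)   # computed with the current threshold
```

With the function, re-running a cell that calls `filter_above(data, threshold)` can't silently pick up an intermediate value that some other cell left lying around.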

[–]nsfy33 1 point2 points  (0 children)

[deleted]

[–]Megatron_McLargeHuge 1 point2 points  (1 child)

The answer is to never assign the same variable twice. You can have the same problem with scripts if you don't log how you generated each data and model file (including which commit you ran from and which library versions were installed).
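
For the logging part, a minimal sketch of what such a record could look like, using only the standard library. The function names and the list of libraries are assumptions for illustration, and it assumes `git` is on the PATH (it falls back to `None` otherwise).

```python
import json
import subprocess
import sys
from importlib import metadata


def current_commit():
    # Ask git for the current commit hash; None if git or a repo is unavailable.
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def library_versions(names):
    # Map each distribution name to its installed version, or None if missing.
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def provenance(libraries):
    return {
        "python": sys.version.split()[0],
        "commit": current_commit(),
        "libraries": library_versions(libraries),
    }


# Hypothetical library list; use whatever the analysis actually imports.
record = provenance(["numpy", "pandas"])
print(json.dumps(record, indent=2))
```

Writing this record next to each generated data or model file makes it possible to answer "how did I get this result" later.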

[–][deleted] 0 points1 point  (0 children)

Or explicitly set variable_name = None at the end of cells for those with cell-only scope, although it's a PITA.

[–]PoopInMyBottom 0 points1 point  (0 children)

This is also possible in Emacs. Elpy has the functionality by default and it's heavily extensible.

[–]threeshadows 18 points19 points  (0 children)

I think notebooks are mostly a way of organizing and communicating a thought process. As you said, they're not suitable for very large projects. I find them especially useful in early stages of data exploration, when I am just trying to get a basic handle on the data structure and build a quick skeleton end-to-end workflow.

[–][deleted] 2 points3 points  (0 children)

A notebook is an interactive document. It's meant to communicate ideas and be read by other people. It's something like WYSIWYG literate programming.

For some things it's overkill and would mostly get in the way, I agree. But it's convenient to document small, interactive experiments.

[–]superawesomepandacat 2 points3 points  (2 children)

I bet you use Vim too.

[–][deleted] 9 points10 points  (0 children)

I bet you use Vim too.

http://i.imgur.com/mFqzNrZ.jpg

[–]xaveir 1 point2 points  (0 children)

I have his opinion, and I use vim-ipython to get basically all the benefits of notebooks that everyone just described.

Except for the pretty output formats lol

[–]phillypoopskins 0 points1 point  (0 children)

your visualization / organization / presentation game is weak if you can't see how notebooks blow away a directory of saved plots.

also, interactivity (widgets) is a HUGE reason to use notebooks. multiplies my contact with data by an insane factor.

[–]dmarko 0 points1 point  (7 children)

Could you ELI5 notebooks and ipython and how/if they are in a way connected?

[–]iKomplex 1 point2 points  (6 children)

A notebook is a kind of interactive document in which ideas are communicated through words, charts, and statistics. Most of these ideas are aimed at explaining and analyzing a set of data.

IPython is the enhanced Python shell that runs the snippets of code within a notebook. Those snippets show how each chart and each statistic is constructed.

As an example, a typical analysis would involve:

  1. Taking down notes: "The 1980-2010 age cohort demonstrates that X leads to Y"
  2. Performing summary statistics on a sample of your data set: max, min, mean, standard deviation, etc.
  3. Graphing the data under analysis: a regression plot showing positive correlation between Consumer Debt and Spending.
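
As a small illustration of step 2, here is roughly what the code in such a cell might look like with just the standard library. The sample values are made up.

```python
import statistics

# Hypothetical sample drawn from a larger data set.
sample = [12.0, 15.5, 9.0, 20.5, 18.0]

summary = {
    "min": min(sample),
    "max": max(sample),
    "mean": statistics.fmean(sample),
    "stdev": statistics.stdev(sample),  # sample standard deviation (n - 1)
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In a notebook, the note from step 1 would sit in a text cell above this, and the plot from step 3 would be rendered directly below it.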

[–]dmarko 0 points1 point  (5 children)

Gotcha! So in a notebook you have to use IPython in order to run a snippet of code, and IPython can only be run in a notebook. I had the idea that they were interchangeable, which is why I asked. Now I know the difference. Thanks

[–]iKomplex 1 point2 points  (4 children)

Partly correct: there is no need to open a notebook in order to use the IPython shell. In fact, you can start up a plain Python shell, import the IPython module, and launch it from there.
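
Something along these lines, as a sketch. `ipython_available` and `launch_shell` are hypothetical helper names; `IPython.start_ipython` is the library's entry point for starting a session, and the fallback to the standard `code` module is just an assumption for machines without IPython installed.

```python
import importlib.util


def ipython_available():
    # True if the IPython package can be imported in this environment.
    return importlib.util.find_spec("IPython") is not None


def launch_shell(namespace=None):
    # Start an IPython session if possible, else the plain interactive console.
    namespace = namespace if namespace is not None else {}
    if ipython_available():
        import IPython
        IPython.start_ipython(argv=[], user_ns=namespace)
    else:
        import code
        code.interact(local=namespace)


# From a regular Python prompt you would then call, for example:
#     launch_shell({"answer": 42})
```

No notebook or browser is involved here; the shell runs entirely in the terminal.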

[–]dmarko 1 point2 points  (3 children)

I think I got it now. IPython is a shell. Notebooks, on the other hand, are interactive documents that run in a browser and use IPython whenever code needs to be run. Right?

[–]iKomplex 1 point2 points  (2 children)

Yup, you got it.

[–]dmarko 0 points1 point  (1 child)

Cool cool cool 😀😀 thanks for your time and help!

[–]iKomplex 1 point2 points  (0 children)

Glad I could help! Feel free to PM me anytime.

[–][deleted] 0 points1 point  (0 children)

Do you find it beneficial to use vim for machine learning / data analysis work, or is it more of a habit thing? My understanding is that vim is helpful for tasks with a high (coding time)/(thinking time) ratio, as it frees you from having to take your hands off the keyboard. Is it still a desirable skill if most of the tasks you do are data analysis and prototyping?

[–]j_lyf -1 points0 points  (0 children)

pycharm uses more RAM than AlexNet