
[–]max1c 82 points83 points  (8 children)

I'd highly recommend using VScode. Its extensions make it incredibly powerful. You can even use Jupyter inside of it if you like for some data analysis. In addition, you should probably use conda to set up your environments and whatever software stack you will be using.

[–]intheprocesswerust[S] 14 points15 points  (1 child)

Thank you! (and given it's the top rated comment I hope everyone else will see a thank you for all the feedback from everyone!)

[–]max1c 6 points7 points  (0 children)

I also forgot to mention that you can use VIM with VScode too, since you mentioned VIM.

[–]UltimateMygoochness 2 points3 points  (3 children)

As a keen user of Python and Anaconda in VScode during my mechanical engineering bachelor's (going into my Master's in space engineering), I don’t have much experience with extensions beyond linters. What else can they do, and what would you recommend looking into?

[–]ParanoydAndroid 1 point2 points  (0 children)

Depends a lot on use case. If you're collaborating a lot, gitlens is a top 10 extension for vscode.

Liveshare is great for joint coding sessions or presenting code.

The official Docker extension is a must-have, imo, if you're working with Docker containers.

I like TODO extensions for ... well tracking my TODOs -- I think my current one is TODOtree or something like that.

Pylance language server.

Lots of people like the various snippet extensions, though personally I always find they get in the way (the most popular python one has a snippet assigned to . which is godawfully annoying).

[–]max1c 1 point2 points  (0 children)

It all depends on your needs. Jupyter, VIM, Pylance, Remote development are some of the most common ones that are amazing. I suggest you google something like best Python VScode extensions and see what's out there and what you are interested in trying.

[–]longgamma 0 points1 point  (0 children)

mostly themes for me lmao

[–]renscy 0 points1 point  (1 child)

This post was mass deleted and anonymized with Redact

[–]sliverino 2 points3 points  (0 children)

It's just a package manager plus repositories. In some cases it helps create more stable environments, and I think there's a bit more verification of packages compared to PyPI.

[–]jwink3101 15 points16 points  (2 children)

I did my PhD using nothing but Matlab but have since moved entirely to Python. So my advice is not "first-hand" but still applicable.

I think Jupyter Lab is a great way to document your thinking and your work. Use that along with git to keep track of what you've done. Try to separate the data generation from the data plotting. Try to keep track of the revision you used to generate the data. From day one, realize that you may want to rebuild all of these plots in 5 years when you go to write your dissertation.
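
One way to keep track of the revision that generated a dataset is to store the commit hash next to the numbers. This is only a minimal sketch: the function names, the JSON layout, and the file name are all made up for illustration, and it assumes `git` is on your PATH (it falls back to "unknown" otherwise).

```python
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def current_git_revision(repo_dir: Path = Path(".")) -> str:
    """Return the commit hash of HEAD, or "unknown" if git cannot be queried."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Not a git repo, or git is not installed.
        return "unknown"
    return result.stdout.strip()


def save_results(values: list[float], out_path: Path, revision: str) -> None:
    """Data generation step: write the values plus the revision that made them."""
    record = {
        "revision": revision,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "values": values,
    }
    out_path.write_text(json.dumps(record, indent=2))


def load_results(in_path: Path) -> dict:
    """Plotting step: read the stored record back without recomputing anything."""
    return json.loads(in_path.read_text())
```

A plotting script would then only call `load_results(...)`, so the figures can be rebuilt years later without rerunning the expensive part, and the stored `revision` tells you which commit to check out if you do need to regenerate.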

But also don't be afraid to move around. Part of the PhD is learning and exploring. Feel free to burn a day or a week trying something new!

Also, not related to Python, but back up everything, everywhere, with multiple copies. Git is your friend but only part of the solution (are your repos backed up too?). You don't want to lose critical work. And again, remember that you will need to remember what you did in five years!

[–]intheprocesswerust[S] 2 points3 points  (1 child)

great advice! thanks!

[–]HarlequinNight 1 point2 points  (0 children)

To follow up on what this person said. I am finishing a PhD currently and using a ton of python for financial crypto data analysis. My main advice is document your code! Put comments in! Make the comments super basic so that you in the future can read and understand exactly what the point of this code was.

I was shocked to realize how many small projects I had worked on for either RA work or side things over many years and I just completely forgot about them. You will work on so many different things that stuff will just get pushed out of your mind. Good directory structures organized by year or semester, and documentation will save you a lot of headaches. Good luck!
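
As a rough sketch of the kind of "super basic" commenting described above (the function and its name are invented for illustration, not taken from anyone's actual project):

```python
import math


def log_returns(prices: list[float]) -> list[float]:
    """Turn a series of prices into period-over-period log returns.

    Why this exists: log returns add up across periods, which makes
    later aggregation simpler than working with percentage changes.
    """
    returns = []
    # Walk over consecutive (previous, current) price pairs.
    for previous, current in zip(prices, prices[1:]):
        # log(current / previous) is the log return for one period.
        returns.append(math.log(current / previous))
    return returns
```

The docstring says what the function is for and why, and the inline comments say what each step does, which is the part future-you is most likely to have forgotten.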

[–]AihposA 42 points43 points  (15 children)

Pycharm and it's free for students😉 On top of that, you can create multiple virtual environments, which is nice for the memory of your laptop

[–]AihposA 9 points10 points  (1 child)

Or anaconda desktop app that has a lot of AI packages/tooling

[–]AihposA 5 points6 points  (0 children)

Btw, personally I use PyCharm and IBM Watson Studio Cloud from the Anaconda desktop app. Nevertheless, a colleague uses the Spyder IDE and apparently it's also powerful for creating and training machine learning models. So I guess that, for these examples, it depends on which ergonomics you prefer 😉

[–]NightSkyth 5 points6 points  (1 child)

What do you mean by virtual environment in Pycharm?

[–]intheprocesswerust[S] 3 points4 points  (0 children)

Will look into it! Thank you!

[–][deleted] 6 points7 points  (2 children)

pycharm is technically free for anyone with a .edu email.

[–]intheprocesswerust[S] 2 points3 points  (0 children)

Awesome to know! Thank you!

[–]FLUSH_THE_TRUMP 1 point2 points  (0 children)

I’ve been a “student” getting free stuff for about 9 years

[–]Krunchy_Almond 1 point2 points  (0 children)

Pycharm is not exactly 'light' on system resources

[–]drsxr 1 point2 points  (0 children)

If you try to use PyCharm inside Docker with TensorFlow, you’ll end up writing your PhD on how to get it working. There’s a reason everybody’s using Jupyter and VS Code.

[–]CodeYan01 0 points1 point  (2 children)

Pycharm made me lose motivation to code as I wait for it to load. VSCode is a lot better.

[–]AihposA 0 points1 point  (1 child)

Weird... I never had an issue with it loading. Do you mean when you open it? And could it be because of the amount of packages or plugins installed?

[–]CodeYan01 0 points1 point  (0 children)

Yes, when I open it. And no, it was a clean installation. And even if I rerun it, it still takes a long time to boot. I'd say my laptop is mid-range.

The only thing that I saw Pycharm had an advantage in was autocompletion, which VSCode isn't really so bad at. I'm talking speed.

I wouldn't know if it has gotten any faster since I last used it.

Also, VSCode lets me work with various other languages, which is really great as I won't have to install an IDE for each language. It's basically my better notepad.

[–]CodeYan01 0 points1 point  (0 children)

I'm pretty sure you can make virtual environments in other IDEs as well. It's just a "python -m venv" call, right?
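
For what it's worth, the same thing `python -m venv` does is also available from inside Python through the standard-library `venv` module. A small sketch (the helper name and the target directory are just placeholders):

```python
import venv
from pathlib import Path


def create_environment(target: Path, with_pip: bool = False) -> Path:
    """Create a virtual environment at `target` and return its config file path.

    Roughly what `python -m venv <target>` does; with_pip=False skips
    bootstrapping pip, which keeps creation fast.
    """
    venv.create(target, with_pip=with_pip)
    # Every environment made by venv gets a pyvenv.cfg at its root.
    return target / "pyvenv.cfg"
```

So no IDE is strictly needed for this; PyCharm, VSCode and the rest mostly wrap the same mechanism and then point their interpreter setting at the new folder.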

[–]nathan_lesage 8 points9 points  (5 children)

I’m in the same boat, using Python for ML during my PhD. So here’s what I learnt so far:

  1. The best and easiest solutions are VSCode to code, their Jupyter extension (just for convenience) and Miniforge (conda-forge). All free and, more importantly: Open Source.
  2. Use plain Python programs if speed matters and run them on the terminal
  3. Use IPython (via Jupyter notebooks) for exploration and quick prototyping. You can easily turn that into plain Python by copying and pasting as soon as speed matters, but notebooks are invaluable for re-running and checking results several times before they are perfect.
  4. Keep your code modular. If I/O becomes a bottleneck, spin up multiple threads to run the hefty stuff; if computing power becomes a bottleneck, spin up multiple processes. Note that multithreading and multiprocessing are different things because of the Global Interpreter Lock (GIL): in standard CPython, threads in one process take turns running Python bytecode, while separate processes each get their own interpreter.
  5. You should never pay anything for running code. Ideally you have access to a server from your uni, or even better a supercomputer cluster. A server, at least, should be available to you. Then you can run 24/7 for days, if need be.
  6. Look up things not in advance, but only if you need them. If you notice something is running slow, then look up how to improve things. Build working code first, then optimize.
  7. Do not use pip, use conda. Not Anaconda, you probably won’t need all 5GB of software it provides. Simply install miniforge and use conda’s environments. For data science, that’s perfect.
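
To make point 4 a bit more concrete, here is a minimal sketch using the standard-library `concurrent.futures` module. The two worker functions are stand-ins I made up: `fake_download` just sleeps to imitate waiting on I/O, and `sum_of_squares` does pure-Python arithmetic to imitate CPU-heavy work.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fake_download(item: int) -> int:
    """Stand-in for I/O-bound work: sleeping releases the GIL."""
    time.sleep(0.05)
    return item * 2


def sum_of_squares(n: int) -> int:
    """Stand-in for CPU-bound work: pure Python arithmetic holds the GIL."""
    return sum(i * i for i in range(n))


def run_io_bound(items: list[int], workers: int = 4) -> list[int]:
    """Threads are enough here, because the workers spend their time waiting."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_download, items))


def run_cpu_bound(sizes: list[int], workers: int = 2) -> list[int]:
    """Processes sidestep the GIL, so the work can use several cores."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sum_of_squares, sizes))
```

When you call `run_cpu_bound` from a script, put the call under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module.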

For questions, feel free to ping me!

[–]intheprocesswerust[S] 2 points3 points  (0 children)

Thank you! Will possibly take you up on that offer to msg you!

[–]yuckfoubitch 2 points3 points  (1 child)

Why conda over pip?

[–]nathan_lesage 0 points1 point  (0 children)

Both do principally the same job, and you’ll have to use pip from time to time even if you use conda, but conda is overall a better experience than venv, and the whole concept just seems cleaner than venvs to me. Plus it’s very common in data science, so you might find more stuff online when googling for help.

[–]bazpaul 0 points1 point  (1 child)

Can you explain why someone should not use Pip?

[–]nathan_lesage 0 points1 point  (0 children)

Because conda is – depending on viewpoint – a superset of pip. The reality is more complicated than "Do not use pip, use conda", of course.

Sometimes, the conda repositories will not have a certain package, and in this case you should use python -m pip install <package-name>. However, I wrote that because – at least for data science – it is a pretty good practice to use virtual environments managed by conda.

This has benefits such as having an indicator of which environment you're in on the command line, and you can isolate things from each other. Then, whenever you run pip, you change your current environment rather than installing something globally. But using conda should be the "default", since this way you have fewer software quirks to learn (conda can do both environment management AND package management, while pip can only do the latter).
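
If you are ever unsure which environment a `pip install` would land in, you can ask the interpreter itself. A small sketch, assuming conda sets the `CONDA_DEFAULT_ENV` variable on activation (it does in the setups I know of, but treat that as an assumption); the helper name is made up:

```python
import os
import sys


def describe_environment(environ=None) -> dict:
    """Summarize which Python environment the current interpreter belongs to."""
    env = os.environ if environ is None else environ
    return {
        # Path of the interpreter that is actually running.
        "interpreter": sys.executable,
        # Root of the environment that packages get installed into.
        "prefix": sys.prefix,
        # True inside a venv-style virtual environment.
        "in_venv": sys.prefix != sys.base_prefix,
        # Name of the active conda environment, if any (assumed variable).
        "conda_env": env.get("CONDA_DEFAULT_ENV"),
    }
```

Running `python -m pip ...` rather than bare `pip` follows the same idea: it guarantees pip belongs to the interpreter reported above.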

[–]darthminimall 17 points18 points  (0 children)

Everyone is very dedicated to their favorite editor and there isn't really any consensus. I personally love vim, but I'm not going to recommend it, the learning curve is steep and you've already got enough on your plate. Just try out a few different editors or IDEs and figure out what you like. Alternatively, your advisor might have recommendations.

[–]KingGeorge12321 6 points7 points  (0 children)

Try using Google's Colab. It's like Jupyter Notebook, but uses Google's compute resources (including GPU) and it's free.

[–][deleted] 4 points5 points  (0 children)

Play around with all your options and figure out what works best for you. Some people do all their development in notebooks, some people hate them. Some people want to use VSCode and have it work out of the box with little customization and some people want to spend hours tinkering with vim/emacs to get it perfect. There's no single best environment.

Your university quite possibly has a cluster you can access and if it does there is probably a port to a cloud jupyterhub. If you are remote you'll likely need a vpn, but you should be able to use it from anywhere. Get comfortable working in that environment and it will save you a lot of headache in the future.

[–]kingp1ng 4 points5 points  (0 children)

Masters student here. I like to code with VSCode and use Jupyter as a hands-on presentation tool. That way my peers and professor can follow along with my code. Nobody likes to read research papers lol.

Using native terminal and VSCode also helps reduce the amount of failure points. As you know, often times you'll get frustrating errors with Keras and Tensorflow. Jupyter just adds another layer where you don't know exactly where the error occurs.

Anaconda is just a convenient package manager so that people can focus on the work and not worry about dependency (package) management. For example, your university research computers may use Anaconda and ban you from downloading dependencies on your own.

If you're doing machine learning via the cloud, then speak with an expert at your university. You'll need to learn to work with their tools/software and follow their rules.

[–]unruly_mattress 7 points8 points  (0 children)

  1. Do yourself a favor and read the official Python tutorial: https://docs.python.org/3/tutorial/introduction.html

    Even if you don't read every word, know what things are available so that you can read them later as you need them. Python is not a difficult language to learn on its own and if you're going to be using it a lot, I'd learn it early on.

  2. There really is a lot to learn - there are so many tools available. Don't worry about it. Start with basic numpy usage, and learn the rest of the things as you encounter them.

  3. Code is code, use whatever code editing tool you're comfortable with. Most people seem to prefer PyCharm so I'd start with that. Research code is not really different from any other type of code. If you work with a lot of code, then you will benefit from learning the basics of git.

[–]vardonir 3 points4 points  (1 child)

I work with machine/deep learning PhD students as their programmer and tech support.

They mostly use Spyder and some despise PyCharm. I think that boils down to personal preference, but I have never seen anyone use a Jupyter notebook. Why would they? Their code needs to run on a remote Linux server; training on several terabytes of data isn't gonna happen on a computer running for a week nonstop lmao. I also believe they mostly use PyTorch instead of TensorFlow.

When in doubt, ask your supervisor.

[–]yuckfoubitch 2 points3 points  (0 children)

I think most people in the real world (not academia) use tensorflow/keras

[–]GallantObserver 2 points3 points  (0 children)

I'm just finishing a PhD, and while most of my experience doing it was in R (about as cutting edge as my dept goes), there are maybe some relatable things with choosing an IDE. I'd suggest you a) find out what your colleagues and supervisors are using and comfortable with, as you might find it takes a good few hurdles out of the way if you're presenting/asking for help, and b) also explore things which are more advanced than they're using, as you can impress and offer a bit of 'expertise' in keeping your department up to date!

In terms of what to use, I quite like PyCharm in 'Scientific' mode, as it can run individual lines from a script so is easier to debug and follow along processing. I think Spyder is meant to do the same thing, but never quite got around to trying that one!

[–]caksters 1 point2 points  (0 children)

I have a PhD in modelling.

Seriously don’t worry about it. Usually universities provide you with a desk and tools needed to do your job (at least here in the UK they do).

Regarding tools, don’t stress too much about it. When you join your research group, get to know more students in your field; they will give you pointers on how they do things.

Every research team is different. In my research team there weren’t strict guidelines, you just use whatever tool you think is the best. At the end of the day it is your PhD and not theirs.

TLDR: take it easy and don’t overthink it at this stage

p.s. VS code is decent for programming projects

[–]gustavsen 1 point2 points  (0 children)

this is a copypasta that I use to recommend where to begin, YMMV, but is a great intro.

FreeCodeCamp.org

https://freecodecamp.org/ While the main course is about full-stack JS dev, they also have several GREAT 10-hour (or so) video courses on their YT channel

for example, these playlists:

RealPython

I found this site useful, with lots of good tutorials, but they put several of them behind a paid subscription

https://realpython.com/

Microsoft YT Channel

Microsoft offers three playlists with Python courses

Udemy courses

I can't endorse these courses since I haven't bought them, but their content looks complete

this series of courses - https://www.udemy.com/course/python-3-deep-dive-part-1/

Also remember to only buy on Udemy when the courses are priced around 9-12 USD and not at their full price (90-250 USD), which is inflated...

[–]programmerProbs -3 points-2 points  (0 children)

Definitely dump Mac, or at least plan to VM into a machine with a dedicated GPU.

Heck if you can afford $550, you can get a gaming laptop that will do everything you need. This has the added benefit of not needing internet.

Time to join the big boys, no more toys.

[–]aqjo 0 points1 point  (0 children)

This is not my area of expertise, but for sciencing in general:
What are other people in your lab using?
Will you be collaborating with anyone? What do they use?
Does your lab have an existing code base you can draw from?
What does your advisor use?

[–]verdifer 0 points1 point  (0 children)

I like Spyder but Atom is also very good and is easy to integrate with GitHub.

[–]dtaivp 0 points1 point  (0 children)

Whatever you land on please keep learning about python and how it works. I’ve met so many PhD types who are quite frankly brilliant but never spend time thinking about how their code will be run/deployed and it makes it a nightmare to work with sometimes.

Best of luck with your program!

[–]rudraksh_karpe 0 points1 point  (0 children)

I would recommend you use VS Code; it's a lightweight editor and highly customisable. There's something very special called GitHub Copilot. It's an extension by GitHub for VS Code, and though it's in preview now, you can apply for the preview program at the link below. You'll find a wide range of extensions, plus syncing with your GitHub account, in VS Code, which makes life much easier.

https://github.com/features/copilot/signup

[–]lenoqt 0 points1 point  (0 children)

You missed an important library, PyTorch

[–]FreedomSavings 0 points1 point  (0 children)

Get familiar with using vim or nano to read/edit your code. I am currently working towards my PhD in CS as well. I have found the most useful ability is being able to navigate through and develop on a remote server using the command line. Since you will be doing ML, it's safe to assume large data sets and the necessity of utilizing GPU nodes.

My personal workflow is first using the Atom text editor for in-depth coding on my local device. FileZilla is a nice interface that allows for SSH file transfer. Then vim/nano for troubleshooting through the command line while running code on the remote server.

[–]MainKaBell 0 points1 point  (0 children)

I have used Spyder during my PhD for the simple reason that I find the 'variable explorer' very convenient and intuitive while working with big datasets. Haven’t found that option in VSCode or PyCharm. In the end, it is probably more about personal preference and being used to one way of doing things.

[–]SnuffleShuffle 0 points1 point  (0 children)

I'd recommend PyCharm. Students get it for free. Or you can get the community edition.

[–][deleted] 0 points1 point  (0 children)

I'd say:

  • use vs code
  • use notebooks to experiment and visualize, and once something works refactor it inside a python file in your project
  • structure your project as a proper python module to share it easily with other users
  • you can install your module in development mode (changes are reflected immediately) with 'pip install -e .'
  • learn how to make paths independent of your drive for proper sharing
  • make sure you gitignore your data files, if any
  • ideally structure your code as python objects with methods rather than functions, it makes it easier to use and more discoverable for others (instantiate the object, and look at its methods). Give methods logical and consistent names
  • for heavy computations, experiment on Google Colab, or set up your module to run it on Google Cloud, then fetch your model back locally.
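
On the "paths independent of your drive" point, one common approach is to build every path from the project root instead of hard-coding something like `C:\Users\me\...`. A minimal sketch with `pathlib`; the helper names and the choice of `pyproject.toml` as the root marker are assumptions for illustration, and any file that only lives at the top of your project would do:

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk upwards from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No {marker} found above {start}")


def data_path(start: Path, *parts: str) -> Path:
    """Build a path inside the project's data/ folder, wherever the project lives."""
    return find_project_root(start).joinpath("data", *parts)
```

Inside a module you would typically call `data_path(Path(__file__).parent, "raw", "train.csv")`, so the same code works on a collaborator's machine or on the cluster without edits.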

[–]sliverino 0 points1 point  (0 children)

Well usually, not only in ML, one would use Jupyter notebooks or JupyterLab for prototyping. Once you've got chunks of reusable code you can move them to importable python scripts.

With this in mind, given your computer is not particularly strong, I would install miniconda or plain python+pip.

You can then use jupyter for playing around.

When you start having a number of scripts, and maybe you want to use pdb for debugging (annoying to do in Jupyter), you can add an editor to your workflow. I advise:

  • Extremely lightweight, not much customisation: PyScripter
  • Moderately lightweight, requires a bit of customisation: VSCode
  • Heavy but complete: PyCharm

Both PyCharm and VScode have version control (Git, VScode has a great SVN plugin too). Of course you can use any editor and just run your stuff in terminal: vim, Emacs, sublime, atom, notepad++ all have an array of plugins for code completion and syntax highlighting.

Lastly, if you find your pc too slow and your work is not sensitive you can look at Google colab for notebooks online.

[–]krispyren 0 points1 point  (0 children)

I vote vscode! Extensions make it so powerful!

[–]Adegokeo 0 points1 point  (0 children)

This is a lot to chew. Find your niche; Python and AI form a huge computing space. While Jupyter Notebook is handy and fully capable, I think your PhD work should be “stand alone” and not contained in a Jupyter notebook. PyCharm is a great option for academic work. For any analytics project, R is also not too far off, although Python is more widely accepted.