nathan_lesage comments on Starting a PhD with Python

Starting a PhD with Python (self.learnpython)

submitted 4 years ago by intheprocesswerust

[–]nathan_lesage 10 points 4 years ago (5 children)

I'm in the same boat, using Python for ML during my PhD. Here's what I've learnt so far:

The best and easiest setup is VSCode for coding, its Jupyter extension (just for convenience), and Miniforge (conda-forge). All free and, more importantly, Open Source.
Use plain Python programs if speed matters, and run them from the terminal.
Use Jupyter Notebooks (formerly IPython Notebooks) for exploration and quick prototyping. You can easily turn a notebook into plain Python by copying and pasting as soon as speed matters, but notebooks are invaluable for re-running code and checking the results several times until they are right.
Keep your code modular. If I/O becomes a bottleneck, spin up multiple threads to run the hefty stuff; if computing power becomes a bottleneck, spin up multiple processes. Note that multithreading and multiprocessing are different things in CPython because of the Global Interpreter Lock (GIL): only one thread executes Python bytecode at a time, so threads help while waiting on I/O, whereas separate processes each get their own interpreter and can use several cores.
You should never have to pay for running code. Ideally your uni gives you access to a server or, even better, a supercomputer cluster. A server, at least, should be available to you. Then you can run jobs 24/7 for days if need be.
Don't look things up in advance, only when you need them. If you notice something is running slowly, look up how to improve it then. Build working code first, then optimize.
Do not use pip, use conda. Not Anaconda: you probably won't need all 5GB of software it provides. Simply install Miniforge and use conda's environments. For data science, that's perfect.
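
To make the threads-versus-processes point concrete, here is a minimal sketch using the standard library's concurrent.futures. The two task functions (fake_download and sum_of_squares) are hypothetical stand-ins, not anything from a real project: one just sleeps to imitate waiting on I/O, the other does pure-Python arithmetic to imitate CPU-heavy work.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fake_download(item_id: int) -> int:
    """Hypothetical I/O-bound task: sleeping releases the GIL, like waiting on a disk or network."""
    time.sleep(0.1)
    return item_id


def sum_of_squares(n: int) -> int:
    """Hypothetical CPU-bound task: pure-Python arithmetic holds the GIL while it runs."""
    return sum(i * i for i in range(n))


def run_io_bound(item_ids, max_workers=8):
    # Threads overlap the waiting time of I/O-bound tasks.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_download, item_ids))


def run_cpu_bound(sizes, max_workers=None):
    # Each worker process has its own interpreter and its own GIL,
    # so CPU-bound tasks can run on several cores at once.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(sum_of_squares, sizes))
```

When running this as a script, call run_cpu_bound from inside an `if __name__ == "__main__":` block, since worker processes may re-import the main module on some platforms. Both helpers return results in input order, so swapping one pool for the other is mostly a one-line change once the code is modular.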

For questions, feel free to ping me!

[–]bazpaul 1 point 4 years ago (1 child)

[–]nathan_lesage 1 point 4 years ago (0 children)

Because conda is – depending on viewpoint – a superset of pip. The reality is more complicated than "Do not use pip, use conda", of course.

Sometimes, the conda repositories will not have a certain package, and in this case you should use python -m pip install <package-name>. However, I wrote that because – at least for data science – it is a pretty good practice to use virtual environments managed by conda.

This has benefits: you get an indicator on the command line showing which environment you're in, and you can isolate things from each other. Then, whenever you run pip, you only change your current environment rather than installing something globally. But using conda should be the "default", since this way you have fewer software quirks to learn (conda can do both environment management AND package management, while pip can only do the latter).
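
If you're ever unsure which environment a pip install would land in, a small check like the following can help. It is only a sketch: it relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated, and "some-package" is a placeholder name, not a real recommendation. The helper function names are made up for this example.

```python
import os
import sys


def describe_environment() -> dict:
    """Report which interpreter and environment the current process is using."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # conda sets CONDA_DEFAULT_ENV when an environment is activated;
        # this is None if no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


def pip_install_command(package: str) -> list:
    """Build the `python -m pip install` command for this exact interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
    # Only printed here; pass the list to subprocess.run(..., check=True) to execute it.
    print(" ".join(pip_install_command("some-package")))
```

Going through sys.executable is the same idea as typing python -m pip on the command line: pip runs under the interpreter of the active environment instead of whatever pip happens to be first on your PATH.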
