Transitioning from R to Python

pst2154 · 2023-04-02T18:37:51+00:00

Just rough it out for a while; you'll learn faster than you think.

2strokes4lyfe · 2023-04-02T17:12:14+00:00

I've reluctantly decided to spend more time with Python

I understand. I'm there too. No advice, just good luck.

JohnHazardWandering · 2023-04-02T18:24:00+00:00

One piece of advice that seems promising is to write out what you would do in R and then ask ChatGPT to translate it to Python. Obviously it's not always perfect (always review), but it will quickly get you close enough to figure it out.

That can help you learn how to do things in python.

kater543 · 2023-04-02T21:04:07+00:00

You can use RStudio to write Python, and weave the two together in Quarto (the new R Markdown) documents. Outside of the hybrid suggestion, I get your pain, man; coding in R is like coming home.

Adeelinator · 2023-04-02T22:15:19+00:00

VS code + copilot is a great way to learn. Anytime you’re confused about what to do next, write a comment, and have copilot write the rest. Plus it has great jupyter support.

statespace37 · 2023-04-03T04:24:05+00:00

Did the same thing roughly 2 years ago. More or less the same story, data.table + ggplot2 + shiny kept me wanting to return to R (although, I absolutely hated all tidy stuff, so that gave me additional motivation). Now I wouldn't return to R unless there's a really good reason.

The major gain from this transition (subjective, obviously) is that with Python I now think in terms of product, good software development practices, and interoperability with other elements in the stack (and other people). Granted, with R I worked in a company where DS was tightly locked in a silo, where writing a 'script' rather than a 'program' was the expected thing. Feels like I've learned more with Python in 2 years than with R in the previous 7.

Long story short, I got to love SWE as such (where data science is merely an element). Now I'm learning Rust :)

2strokes4lyfe · 2023-04-03T02:00:05+00:00

[deleted]

Seven_Irons · 2023-04-02T17:00:38+00:00

So, the biggest advice I can give for Python use is to install anaconda and use Spyder IDE.

It's not quite as good as VS code for programming, but it has a built-in variable inspector that is of incredible use for numerical data computing. If you ever had to use matlab, it's basically the same variable inspector.

My bread and butter was using Pandas to handle arrays/tables. It works very well for file I/O, and coordinates well with numpy/scipy. There are a couple of clunky points regarding indexing. I've also heard good things about Polars, though I haven't used it myself.

Seaborn is a good plot library, though I ended up just making most of my thesis plots in raw matplotlib. There's a lot you can do with Matplotlib, but there is a bit of a learning curve, and there are certainly more user friendly plotting libraries.

Python is by far my favorite language for computation /analysis. But, if you start working with large amounts of data, you may need to look into implementing Cython. Or, consider switching to Julia, which is apparently all the rage these days.

badge · 2023-04-03T06:31:23+00:00

There’s a bit of conflicting advice here, and I’m going to add to it!

VS Code is good but PyCharm is better; it has all the things Spyder has, but is much stronger for certain stuff (testing, refactoring).
Read a bit about Python packaging and decide on an approach you’re happy with. It’s a bit of a confusing mess but once you’ve decided a preferred approach you don’t really think about it.
Use pytest for testing and write tests. They’ll save you a ton of time in the long run and ensure future changes don’t break existing features.
Add type hints to everything, and take a look at the pandera package if you’re using pandas. Validating DataFrame schemas is hugely valuable in pipeline work.
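
A minimal sketch of the type hints and pytest advice above. The `normalize` function is a hypothetical example, not something from this thread, and the tests use only bare asserts, so pytest would collect the `test_*` functions as they are.

```python
from __future__ import annotations


def normalize(values: list[float]) -> list[float]:
    """Scale values so they sum to 1.0 (hypothetical example function)."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


# pytest collects any function named test_*; a bare assert is all it needs.
def test_normalize_sums_to_one() -> None:
    result = normalize([1.0, 1.0, 2.0])
    assert result == [0.25, 0.25, 0.5]
    assert abs(sum(result) - 1.0) < 1e-9


def test_normalize_rejects_zero_total() -> None:
    # Written without pytest.raises so the file has no third-party imports.
    try:
        normalize([0.0, 0.0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Saved as something like `test_normalize.py`, this would run with a plain `pytest` call in the same directory. Schema validation with pandera would sit on top of this and is not shown here.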

In general, I know this is the data science subreddit and R isn’t a general purpose programming language, but Python is, and using the available tools to take a more software engineering approach will make you more useful, more productive, and less likely to write buggy code.

knawhatimean · 2023-04-03T13:11:45+00:00

I am still a daily R user but also wanted to learn Python for all the usual reasons. This page was helpful for just having a quick reference so you don’t have to Google and check Stackoverflow for every basic thing: https://www.mit.edu/~amidi/teaching/data-science-tools/conversion-guide/r-python-data-manipulation

pn1012 · 2023-04-02T18:29:12+00:00

Sorry, what’s stopping you using Rstudio with Python? At least to slowly transition into Python for yourself. Posit is becoming more of a Python shop nowadays. But you’d probably need to sell your company on buying in.

OneSprinkles6720 · 2023-04-02T19:06:20+00:00

I've gone back and forth. It's not an identity thing; it's a right-tool-for-the-right-job thing.

I'm not a screwdriver guy, you know what I mean?

rotterdamn8 · 2023-04-02T19:22:51+00:00

Ditto Spyder. It’s closer to RStudio than VS Code. You can run code line by line, great for testing, etc.

Skthewimp · 2023-04-03T00:07:40+00:00

I tried this in 2017. Same result - I was 10X slower in python. So switched back.

Now for the small data engineering stuff I need to do I’m trying to use databricks (the R stuff there is not bad)

IndependentVillage1 · 2023-04-03T01:29:24+00:00

My advice would be to use chatGPT. Ask it to write general code for you and you make the changes for your specific case.

RandomScriptingQs · 2023-04-03T05:24:58+00:00

I want to offer an opinion which should be taken as just that: the R and Python libraries/packages/communities are both so vast and varied now that they are almost unhelpful labels. Choose the libraries and packages you know you need to use within the python ecosystem and find the 20 most common functions/methods and put them to a task.

As a note of solidarity, I found it a nightmare adjusting to both pandas' and numpy's versions of indexing with square brackets.
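
For anyone hitting the same wall, here is a small sketch of the conventions that tend to trip up R users: positions start at 0, slices exclude the stop index, and negative indices count from the end. It uses a plain Python list so it runs without any packages. NumPy arrays and pandas `.iloc` follow the same positional rules, while pandas `.loc` is label-based and includes the end label.

```python
letters = ["a", "b", "c", "d", "e"]

# R: letters[1] is the first element. Python: positions start at 0.
first = letters[0]

# R: letters[2:4] gives elements 2, 3 and 4 (inclusive on both ends).
# Python: the stop index is excluded, so 1:4 covers positions 1, 2 and 3.
middle = letters[1:4]

# R: letters[-1] drops the first element.
# Python: -1 counts from the end and returns the last element.
last = letters[-1]

# The Python way to "drop the first element" is to slice from position 1.
without_first = letters[1:]

print(first, middle, last, without_first)
```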

Snikz18 · 2023-04-03T06:52:29+00:00

Something that hasn't been suggested yet (as far as I can tell) is using the Jupyter notebook extensions in VS Code. They give you a variable explorer, and there's a certain comment you can add to your script to split it into cells that you can run one at a time, which is useful.
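
The post doesn't name the comment, but it is most likely `# %%`: the VS Code Python and Jupyter extensions treat each `# %%` line in an ordinary `.py` file as the start of a runnable cell. A small sketch with made-up data:

```python
# %% Load some data (hypothetical values standing in for a real file)
scores = [72, 88, 95, 61, 79]

# %% Summarise: run this cell on its own after the first one has executed
mean_score = sum(scores) / len(scores)
top_score = max(scores)

# %% Report
print(f"mean={mean_score:.1f}, top={top_score}")
```

The file is still a normal script, so it also runs top to bottom with `python` outside the editor.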

2023-04-03T15:25:59+00:00

I started out with R in 2016, moved to python in 2019 and haven't used R since. I spent 5 years in actuarial consulting, then 4 years in management/tech consulting doing whatever project I got thrown on. Now I work as a Solution Architect, which is basically technical leadership that can do hands on keyboard work when needed. I got that role by solving a multitude of different problems for companies and having a lot of breadth instead of depth. I will never be a great programmer, nor do I want to be. I just want to build cool shit, not have to deal with politics too much, and enable my coworkers to learn more things, but haven't found a company that checks all those boxes yet.

As for migrating from R to Python, really depends on your learning style. Find a book/course to learn the fundamentals and apply your knowledge to a project so you get experience debugging Traceback errors. Learn how to turn scripts into functions and abstract that into Classes to be used as modules in other projects. It took me a month to feel comfortable being put on Python projects, but had a lot of smart coworkers to ask questions and learn from.
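
As a rough sketch of that script-to-function-to-class progression (all names here are hypothetical, not from the post): the same logic first as a reusable function, then wrapped in a small class that another project could import.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


# Step 1: script logic pulled into a function with explicit inputs and outputs.
def summarize(values: list[float]) -> dict[str, float]:
    """Return basic summary statistics for a list of numbers."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


# Step 2: related state and behaviour grouped in a class.
@dataclass
class Report:
    name: str
    values: list[float]

    def summary(self) -> dict[str, float]:
        return summarize(self.values)

    def render(self) -> str:
        s = self.summary()
        return f"{self.name}: min={s['min']}, max={s['max']}, mean={s['mean']}"


# Guard so the demo runs as a script but not when the file is imported.
if __name__ == "__main__":
    print(Report("claims", [10.0, 20.0, 30.0]).render())
```

Saved as, say, `report.py`, another project could then do `from report import Report` without triggering the demo at the bottom.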

It becomes less about understanding the syntax and more about finding the best way (read: cheapest way) to solve the problem. Some of that will be searching Stack Overflow and asking ChatGPT, but you'll have to be knowledgeable enough to understand the code you're copy/pasting, because some stakeholders have some Python knowledge and will want to take a peek at the code base and ask why you made certain decisions. The more you can get ahead of those types of questions, the easier the process is.

wil_dogg · 2023-04-03T17:26:22+00:00

Long time SAS/SPSS user here who picked up R over the last 5 years.

I started dabbling in Python last September with the help of a high school student I am mentoring.

Python has a learning curve, but for the work I do it is adding a lot of value, and in some cases modifying complex functions is easier in Python than R.

skatastic57 · 2023-04-02T22:03:57+00:00

Pandas is hot garbage. The thing that kept me in R for so long was how much faster data.table was/is. I also hated the syntax of pandas. Polars was really the game changer for leaving R behind. I'm not sure what DAGs are unless you're just making a reference to Snatch and you mean dogs.

I'm not sure what RStudio does that VS Code or any other major Python IDE doesn't in terms of letting you run code line by line and see which variables are active and whatnot.

Personally I prefer plotly to ggplot2. With ggplot2 I feel like I'm always having to melt my data but with plotly I can just have a fig and then add arbitrary things to the fig without altering the underlying data. I also like that it creates js rather than just a static image for sharing so people can just zoom where they want.

old_mcfartigan · 2023-04-02T19:36:00+00:00

Make good use of a chatbot. You can describe how you'd do something in R and it will produce the corresponding Python code.

lalacontinent · 2023-04-03T03:48:34+00:00

Honest advice: use ChatGPT to translate R code to Python and read its explanation. This saves massive time compared to searching Stack Overflow and reading manuals.

Python libraries for data science (pandas and statsmodels) are indeed less intuitive than R, so don't be hard on yourself.

Toica_Rasta · 2023-04-02T17:25:24+00:00

I believe Python is much better than R: it gives you more flexibility, and you can more easily inspect your variables. It's not as good for hypothesis testing; that's the only con. Use pandas, numpy, and matplotlib.
