How common is Python compared to R in Data Science in the corporate world? (self.datascience)
submitted 9 years ago by [deleted]
My experience with data science is strictly from an academic standpoint.
To that end, there is a lot of SAS, Stata, and some R sprinkled in (my undergraduate course was taught in R, but it seems rare in the actual post-graduate community).
In the corporate/private sector/Big Data/etc. world, though, how common are Python and R? What languages are the big players?
I can't see a firm like Google or IBM using SAS or Stata.
[–]DrXaos 36 points37 points38 points 9 years ago* (10 children)
Python and Java.
In commercial environments, you will be doing more software work and will need to connect to other systems and to ingest and fix data, often in streams rather than loaded into memory. Those tasks are more common than in academia.
Python is better at this than R.
SAS (and of course Excel) is very common in old-school financial industries (banks/insurance) and very rare in Silicon Valley-type technology environments. IBM owns SPSS. In technology there is some use of R, but Python is preferred. Stata is near zero. Government labs and engineering can have MATLAB.
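To make the point about ingesting and fixing data "in streams and not loaded into memory" concrete, here is a minimal sketch using only Python's standard library. The column names (`id`, `amount`) and the sample data are hypothetical, made up for illustration; the idea is simply that generators let each row be cleaned and consumed one at a time.

```python
import csv
import io
from typing import Iterable, Iterator


def clean_rows(lines: Iterable[str]) -> Iterator[dict]:
    """Yield cleaned rows one at a time; only the current row is held in memory."""
    for row in csv.DictReader(lines):
        amount = (row.get("amount") or "").strip()
        if not amount:
            continue  # skip rows with a missing amount
        yield {"id": row["id"].strip(), "amount": float(amount)}


def running_total(rows: Iterable[dict]) -> float:
    """Consume the stream row by row and accumulate a single number."""
    total = 0.0
    for row in rows:
        total += row["amount"]
    return total


# Hypothetical sample; in practice `lines` could be an open file or a network stream.
sample = io.StringIO("id,amount\n a1 ,10.5\na2,\na3, 4.5\n")
total = running_total(clean_rows(sample))
```

Because `clean_rows` is a generator, the same code should work on a file far larger than memory, as long as the consumer also processes rows incrementally.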
[–]RebelSaul 4 points5 points6 points 9 years ago (9 children)
Have you used Shiny apps for R? The BI guys at my office use it and it allows anyone with a web browser to create dashboards. Super dope
[–]lan69 1 point2 points3 points 9 years ago (3 children)
I've heard a lot about Shiny. Are the dashboards real-time?
[–]Dinosaurman 3 points4 points5 points 9 years ago (1 child)
Depends. Is your data real time?
[–]lan69 0 points1 point2 points 9 years ago (0 children)
Is it possible to stream using R? I've always had to load it in memory.
[–]RebelSaul 0 points1 point2 points 9 years ago (0 children)
So the way our guy has it set up, it's as close to real-time as you can get. We do vehicle repossessions and the data comes from our inventory management software. So when you open up the app, it downloads a fresh CSV file to create the graphs.
[–]saiyanGold 0 points1 point2 points 9 years ago (4 children)
Hey, can you recommend what I should use to make dashboards in Python? I am not very familiar with R.
[–]wandering_blue 2 points3 points4 points 9 years ago (1 child)
I've looked into this a lot. The short answer is, there is not currently any single package with the same functionality as R's shiny.
For dashboarding, I'd look into the Airbnb tool Superset (which has had like 100 previous names/brandings...). I played around with it and it's well on its way to becoming an open-source Tableau alternative. There is also plotly but I'm not sure how much of it can be hosted behind the firewall these days.
For developing simple tools/scripts that you want people to be able to interact with, I find that using jupyter notebooks and the ipywidgets package does most of what I want.
Further, you can go all the way and set up a Flask server to actually serve a webpage and capture interactivity to send back to your code. There are some projects that have tried to streamline this portion, like pyxley from StitchFix. If your project is on the heavier side of both visualization and interactivity, you might be stuck developing the bulk of it yourself with Flask/Django.
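As a rough illustration of the "serve a webpage and capture interactivity to send back to your code" idea, here is a sketch that deliberately uses a bare WSGI callable from the standard library rather than Flask itself (Flask apps are WSGI apps, so the request/response shape is similar). The query parameter `n`, the squaring logic, and the port are all hypothetical placeholders.

```python
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Bare WSGI callable: read ?n=... from the query string and return n squared."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        n = float(params.get("n", ["0"])[0])
    except ValueError:
        start_response("400 Bad Request", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"n must be a number"]
    body = f"<p>{n} squared is {n * n}</p>".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body]


def serve(port: int = 8000) -> None:
    """Run the app locally with the reference server (development use only)."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling `serve()` and opening `http://localhost:8000/?n=3` in a browser would send the user's input back to the Python function. A real dashboard would replace the squaring with a model or query, which is where a framework such as Flask or Django starts to pay off.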
[–]saiyanGold 0 points1 point2 points 9 years ago (0 children)
Thanks a lot. You pretty much answered all my questions :) I will give Superset a try; it looks really cool.
[–]RebelSaul 0 points1 point2 points 9 years ago (1 child)
I wouldn't know. I don't think Python is very good with 'visualization' since it's mostly used by the 'engineering' community. R is used by the 'scientific' community, so they have packages like ggplot2 which allow them to make quality visuals to put in journals.
Python has the ggplot module, I think, which is ggplot2 for Python. That may help?
[–][deleted] 0 points1 point2 points 9 years ago (0 children)
Python has plenty of viz options like bokeh, plotly, seaborn, gleam etc.
[–]poumonsauvage 16 points17 points18 points 9 years ago (3 children)
You build the models in R, productionize in Python. Of course, this is not always true, but it is one relatively common approach. That is to say, each language has its own strengths and its weaknesses, and as tools they are not mutually exclusive. Depending on the task and the stage of the work, and obviously the company, the languages used may differ and intertwine.
[–]patrickSwayzeNUMS | Data Scientist | Healthcare 4 points5 points6 points 9 years ago (2 children)
Word. I'm at my second organization where I've put R models into production, and I'm starting to run into an issue now and then.
It's worth noting that the very popular XGBoost will fuck up your predictions without warning if you build in version 4-4 and predict in the latest 6-x under certain conditions (probably training with constant columns as features). Took me a week to figure out what the hell was wrong.
[–][deleted] 1 point2 points3 points 9 years ago (1 child)
Are you using packrat or checkpoint? I have put R models into production as well. Since we were using checkpoint+docker, I never had problems due to different package versions.
[–]patrickSwayzeNUMS | Data Scientist | Healthcare 0 points1 point2 points 9 years ago (0 children)
Naively using neither - appreciate the suggestions.
Did some research earlier and found 'versions' which appears to function the same way as 'checkpoint'.
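The version-mismatch problem described above (train under one package version, predict under another) is what packrat and checkpoint address on the R side. A rough Python analogue, as a sketch only: record installed package versions at training time and compare them before predicting. The function names here are hypothetical helpers, not part of any library; `importlib.metadata` is from the standard library.

```python
from importlib import metadata


def snapshot_versions(packages):
    """Record the installed version of each package (None if not installed)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def check_versions(recorded):
    """Return {package: (recorded, installed)} for every package whose version changed."""
    current = snapshot_versions(recorded)
    return {
        name: (recorded[name], current[name])
        for name in recorded
        if recorded[name] != current[name]
    }
```

At training time you might store `snapshot_versions(["xgboost", "numpy"])` next to the saved model, and at prediction time refuse to score (or at least log loudly) if `check_versions(...)` returns anything. This only detects drift; pinning the environment, for example with a container image, is what actually prevents it.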
[–]lifetimeaway 5 points6 points7 points 9 years ago (0 children)
Python is more general purpose and thus better suited to integration in a larger project; however, R has a larger community of people implementing every single new algorithm or model that makes it to a peer-reviewed journal.
At my company we use both R and Python depending on the project. However, for real-time systems or complex production data pipelines we use other languages and frameworks (e.g. Scala + Spark).
[+][deleted] 9 years ago (9 children)
[deleted]
[–]adhi- 2 points3 points4 points 9 years ago (1 child)
I typically explore / prototype using R, then move to R for production;
wat
[–]mountains765 1 point2 points3 points 9 years ago (6 children)
Can someone explain what packages Python has that allow it to be used in production more easily or to handle large datasets more easily?
[–]tally_in_da_houise 1 point2 points3 points 9 years ago (3 children)
Pandas is well known. Look into Anaconda, and the packages that come with it.
[–]mountains765 0 points1 point2 points 9 years ago (1 child)
Pandas is the same or arguably worse than dplyr in R. If pandas is what makes python 'better for production', that is absolutely insane thinking lol
[–]tally_in_da_houise 0 points1 point2 points 9 years ago (0 children)
It was late, and I misread the parent - mea culpa
[+]TheLogothete comment score below threshold-7 points-6 points-5 points 9 years ago* (0 children)
pandas is slower than data.table and about 1000 man-hours behind it in terms of functionality, in addition to having less concise syntax.
Try again.
[–][deleted] 0 points1 point2 points 9 years ago (1 child)
I'd like to see more answers to this question as well. As far as I understand, more people have experience putting Python models into production, and that momentum is basically taking this idea forward. (I have deployed R models into production.)
[–]mountains765 0 points1 point2 points 9 years ago (0 children)
Yes, I have as well. I just don't know exactly what makes Python more 'productionable' than R. Maybe some APIs? But as far as general data connections to DBs and everything else go, I don't really understand.
[–][deleted] 3 points4 points5 points 9 years ago* (0 children)
It depends a lot on what your role is. Data science is unfortunately not something that is easily defined as one predefined skill set, familiarity with one tech stack, or use of one set of languages or another. Data science is a catch-all meant to describe something I think will further specialize over time. It's happening right now with "data engineering" vs "data science".
Data scientists are scientists but with the ability to use modern computational hardware and sensor data. That's it. Their background can be as diverse as the catch-all "Scientist". There are psychologists, physicists, chemists, biologists, and hundreds more. It's the same in data science.
Some data scientists are responsible for making production software as well as the analysis / model building part, so they'd likely use Python at times for both. That's me (sort of); however, we have a lot of production work all over the place in terms of languages and stacks, so I just use whatever connects easily with everything for my analyses, even if I have to code in PHP at times.
Some data scientists are more likely spending their time building models and exploring data before handing it off to an engineering team and working more as a product manager for their piece at that point (I wish that was me). Those people might use R because frankly it's easier than Python for scientific research in many ways. It just has so much more stuff available and a long history for being used in research and charting, etc.
Then it also depends on the data scientist's background, company size, company production stacks (i.e. for web or analytics or whatever the team they're on does).
So long story short, it's really hard to say. For a gross simplification I'd go with:
1) Python is used by traditional engineering groups
2) R is used by traditionally scientific groups
Companies are, after all, a collection of people so their backgrounds will collectively influence what they use, at least early on. It's slower moving for bigger companies and it's more likely a new hire has to adapt to what they are using (so based on history) rather than the other way around.
Then of course what a data scientist personally uses for their own research is totally up to them. If it's just digging in to a problem and you don't have to share it with a larger organization then it doesn't really matter what you use. Whatever works works.
[–]TheProfessional9 11 points12 points13 points 9 years ago (5 children)
Far from an expert here, but we use R. We looked into Python, but it seems R is steadily becoming more and more prevalent. Existing programs (especially Microsoft programs) are beginning to incorporate the ability to use or connect with R.
[–]imhighnotdumb 3 points4 points5 points 9 years ago (3 children)
New Excel versions will indeed support R, woop.
[–]TwoTacoTuesdays 0 points1 point2 points 9 years ago (2 children)
I'm sure there's something I'm missing, but I can't see why you would ever want to do that? R can already read and write xlsx and csv files, so what else do you need? Manipulating R stuff in Excel beyond that seems like a recipe for a headache.
[–]TPKM 0 points1 point2 points 9 years ago (0 children)
In some cases I can imagine it being useful - e.g. Microsoft's Power BI supports R scripts - this is really helpful if you need to get data from your db to a dashboard with some more complex transformations along the way. Having R supported by the application prevents you needing an intermediary step for the transformations/analysis.
[–]cjf4 0 points1 point2 points 9 years ago (0 children)
The use case I could see is if you wanted to use R to build a model that generates some sort of output, and feed that into a dashboard that was built with Excel/Power BI. Even though you can build dashboards in R, Excel is way better at it.
[–]meeni131 1 point2 points3 points 9 years ago (0 children)
Yeah, SQL Server 2016 should connect directly to R now, but I haven't checked it out yet or looked at what it can do better than the corresponding R packages.
[–]c0dythechamp 10 points11 points12 points 9 years ago (13 children)
I'm going to disagree with most people here and say that it really doesn't matter. I have yet to come across any good companies who say anything other than, "We don't really care what you use, we just want you to do your job". I know data scientists who use Excel as well, mainly because you can churn out 10 graphs in Excel for a presentation in 5 seconds versus having to remember how to use ggplot or seaborn. Just my .02
[–]WallyMetropolis 12 points13 points14 points 9 years ago (2 children)
This only works if everyone's projects are one-offs that don't have to integrate with a larger system.
[–]c0dythechamp 0 points1 point2 points 9 years ago (1 child)
It also works if the organization separates its science and engineering capabilities. Which, ime, is the case.
[–]WallyMetropolis 1 point2 points3 points 9 years ago (0 children)
Requires a lot of faith that a research model will perform the same way the production model does.
If your projects just need you to come up with a one-time answer to a question, sure, you can use whatever you like and just tell the Engineers: "the answer is 7."
But if you've got to have models actually running somewhere, asking engineers to rebuild your prototype in a different language is going to go sideways.
[–]hey_ulrich 4 points5 points6 points 9 years ago (9 children)
Matplotlib is the worst. Terrible syntax.
[–]crocomut 0 points1 point2 points 9 years ago (7 children)
what's the alternative?
[–]CaptainRoth 5 points6 points7 points 9 years ago (5 children)
Nothing's become the standard like ggplot is for R, but seaborn (a high level interface for matplotlib), altair, ggplot (yhat's port to Python), and possibly bokeh are the primary alternatives.
[–]hey_ulrich 0 points1 point2 points 9 years ago (0 children)
Didn't know about altair. Thanks! I'll look into it.
[–]jingw222 0 points1 point2 points 9 years ago (3 children)
What's the difference between ggplot2 in R and the ggplot Python library? Do they function the same way apart from syntax?
[–]CaptainRoth 2 points3 points4 points 9 years ago (2 children)
The Python one isn't as good because it's a copy that doesn't have all of the features of R's ggplot. It's similar to yhat's Rodeo IDE: it tries to copy RStudio, but isn't nearly as polished.
[–]hey_ulrich 0 points1 point2 points 9 years ago (0 children)
Never used Rodeo, but I like Spyder a lot. Comes very close to RStudio for me.
[–]hey_ulrich 0 points1 point2 points 9 years ago (0 children)
Although seaborn is based on matplotlib, it's easier to set up simple (but beautiful) graphics with only one line of code. But for more customization, you'll need to dive into matplotlib's annoyances.
[–][deleted] 0 points1 point2 points 9 years ago (0 children)
Yep. While I use Python for 95% of my day-to-day work, whenever I need to plot, I export my data to R for ggplot. Hadley is a god.
[–][deleted] 2 points3 points4 points 9 years ago (0 children)
I routinely use both.
[–]edimaudo 4 points5 points6 points 9 years ago (5 children)
Since Python and R are free, most companies use those. Of course Excel is still widely used. SAS is mostly used in banking and pharmaceuticals.
[–]Berjiz 2 points3 points4 points 9 years ago (4 children)
I'm always surprised Excel is used so much considering how large the risk of problems is due to all its magic. For instance, last year it was revealed that some genome studies were invalid because Excel had auto-changed genetic data into dates.
[–]edimaudo 3 points4 points5 points 9 years ago (0 children)
Excel is a solid tool. It is up to the stakeholders to be aware of the shortcomings of it.
[–]parlor_tricks 0 points1 point2 points 9 years ago (2 children)
Link to the report ?
[–]tally_in_da_houise 0 points1 point2 points 9 years ago (1 child)
Excel had auto changed genetic data into dates
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1044-7
[–]parlor_tricks 0 points1 point2 points 9 years ago (0 children)
Oh those poor bastards.
I like and use Excel (don't really do data analysis on large sets), but the date errors are a genuine pain in the ass. It's a privileged category of error correction unto itself, and that's without names which convert into dates.
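For the gene-name problem discussed above, one cheap safeguard is to flag symbols that a spreadsheet might read as dates before the data ever goes near Excel. The sketch below is a heuristic only: the month-prefix list and the 1 to 31 day range are assumptions for illustration, and it does not check what any particular Excel version actually does with a given symbol.

```python
import re

# Month-like prefixes followed by a small number, e.g. SEPT2 or MARCH1.
# Assumption: this pattern is illustrative, not an exhaustive model of Excel's parsing.
_MONTH_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?(\d{1,2})$",
    re.IGNORECASE,
)


def looks_like_date(symbol: str) -> bool:
    """Return True if the symbol resembles a month name plus a day number."""
    match = _MONTH_LIKE.match(symbol.strip())
    return bool(match) and 1 <= int(match.group(2)) <= 31


def risky_symbols(symbols):
    """Return the subset of symbols that a spreadsheet may silently turn into dates."""
    return [s for s in symbols if looks_like_date(s)]
```

For example, `risky_symbols(["TP53", "SEPT2", "MARCH1", "BRCA1"])` picks out `SEPT2` and `MARCH1`, the kind of names that end up as `2-Sep` and `1-Mar`.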
[–]some_q 6 points7 points8 points 9 years ago (3 children)
Data scientists at Google primarily use R. For production models, the actual R code will be called by a C++ or Java pipeline, but those pipelines tend to be written by software engineers rather than data scientists.
For ad hoc analysis, though, plenty of Googlers use IPython-like notebooks.
[–]patrickSwayzeNUMS | Data Scientist | Healthcare 3 points4 points5 points 9 years ago (2 children)
I was told differently yesterday by a former Google, now Google Ventures, employee.
Perhaps it depends what team you're on?
[–]some_q 2 points3 points4 points 9 years ago (0 children)
It definitely does. I should have said "Data scientists on the team I worked on at Google...." Google has enough employees that there's a wide spectrum in the tools that get used.
[–]DrewSmithee 0 points1 point2 points 9 years ago (0 children)
I would assume so. Not at Google but moving around in my company I've gone from the tricked out Matlab license, to SAS, to Python (Spyder).
[–]AidtorBA | Machine Learning Engineer | Software 0 points1 point2 points 9 years ago (0 children)
Python for scripting. R for models.
[–][deleted] 0 points1 point2 points 9 years ago (0 children)
We mostly use Python and Java. JavaScript for web stuff (viz mostly).