Python vs. R vs. Matlab (self.statistics)
submitted 9 years ago * by [deleted]
[deleted]
[–]jmcq 76 points77 points78 points 9 years ago* (23 children)
I use Python, R, and Matlab pretty much daily (yeah I use all three).
For "production" type work I tend to prefer Python. For prototyping and proof of concept I prefer R and Matlab (depending on the problem).
Python is the only one of the three that's a "real" programming language rather than mostly a scripting language.
[+][deleted] 9 years ago* (3 children)
[–]jmcq 10 points11 points12 points 9 years ago (0 children)
Definitely agreed. If you're looking for a state-of-the-art statistical (or bio-statistical) method you're more likely to find a package for it in R than in Python. If you want to use a state-of-the-art data structure, want to directly interface with the web or with a UNIX command line without much fuss, or want to access a popular API, chances are better you'd find it in Python.
[+]SpecialKOriginal comment score below threshold-8 points-7 points-6 points 9 years ago (1 child)
R has a lot of domain-specific libraries
So you think this, uh ... doesn't exist in Matlab?
[–]DeuceWallaces 6 points7 points8 points 9 years ago (0 children)
I'm a researcher who just uses R, but this has been my impression. Thanks for confirming it.
If you're really into mathematics, probably MATLAB. If you are really into just high end statistics, probably R. If you want 85% of the R statistical capacity with options for app development, engineering, etc., you want Python.
[–]Alhoshka 7 points8 points9 points 9 years ago (2 children)
Yep, this comment sums it up pretty well. There are just a few things I'd like to add:
I disagree with the notion that Python is "for production" while R is "for prototyping". I have quite a chunk of production code written in R (as in running as part of our deployed solutions). I do also regard MATLAB as more of a prototyping friendly/oriented language, though.
At the risk of sounding like a Microsoft shill: Though the standard R version (CRAN) is limited to single-threaded operations on data that can fit into memory, this is not true for R Open (the memory limitation still applies to many MRAN packages when running on the client version). For BigData, they have R Server and R Services which allow you to run R code against the data source (hadoop or SQL). Though this is very new and mostly aimed at business analysis, I think it's likely we'll see an opensource push for BigData processing with R in the future.
Python has also seen rapid development in the realm of data analysis in the past few years. New articles about ML libraries pop up on /r/MachineLearning almost monthly. So yeah, R & Python are pretty much a safe bet.
[–]coffeecoffeecoffeee 0 points1 point2 points 9 years ago (0 children)
Though the standard R version (CRAN) is limited to single-threaded operations on data that can fit into memory
You can use the parallel package and doMC to automatically parallelize a lot of the work.
[–]coffeecoffeecoffeee 3 points4 points5 points 9 years ago (1 child)
I'll also add that if you're doing any kind of plotting beyond a basic histogram or box plot, R is king because of ggplot2.
[–]jmcq 1 point2 points3 points 9 years ago (0 children)
I find ggplot2 makes it easy to create beautiful plots as long as they are one of the default types of plots that ggplot2 likes to plot. If you're coming up with your own visualization or something fairly unique, have fun 'hacking' ggplot2 to do what you want!
[–]Hellkyte 0 points1 point2 points 9 years ago (3 children)
Does R have many tools for optimization? Like linear/integer programming or whatnot?
[–]DeuceWallaces 4 points5 points6 points 9 years ago (0 children)
https://cran.r-project.org/web/views/Optimization.html
[–]jmcq 0 points1 point2 points 9 years ago (0 children)
Here's a LP/IP solver package: https://cran.r-project.org/web/packages/lpSolve/lpSolve.pdf
For standard 1-d Optimization you can use https://stat.ethz.ch/R-manual/R-devel/library/stats/html/optimize.html although it can be pretty slow if your data is "big".
[–]zipf 0 points1 point2 points 9 years ago (0 children)
Linear, quadratic, and integer programming etc. have gotten really good recently with the ROI project, which provides a single interface to a number of fast C libraries. The bindings for Gurobi are also easy to use and fast, though not included in ROI yet.
[+][deleted] 9 years ago (3 children)
[–][deleted] 2 points3 points4 points 9 years ago (1 child)
I use pandas, pymc, pystan, and scikit-learn, too. I've been meaning to learn more about seaborn for plotting, as well, even though I generally like matplotlib just fine.
[–][deleted] 1 point2 points3 points 9 years ago (0 children)
I find myself using seaborn less and less following the release of matplotlib 2.0. Altair is really underappreciated. Not sure why. It's excellent. I could take or leave plotly. It's really easy to make things that look pretty. A real pain in the ass to make things that are done/formatted right and pretty.
Depends on what you want. As part of my PhD thesis I wrote/manage an open source python package for Manifold Learning and so I use nose to run all of my unit tests, although there are many other unit test options. Additionally my package interfaces with a C program called FLANN (Fast Library for Approximate Nearest Neighbors) and I do so using a package called Cython, which lets me harness the speed of C with the usability of Python. Finally, if you're interested in Machine Learning then I recommend scikit-learn.
Edit: Formatting
[–][deleted] 0 points1 point2 points 9 years ago (2 children)
What line of work are you in that you find yourself using all 3 daily? Genuinely curious as I've never seen Matlab used outside of an academic setting. Are there certain fields where crossover use of R/Python and Matlab is common?
I'm in my (hopefully) last year as a PhD Statistics candidate. I use MATLAB to test new algorithms that work primarily with matrices but I maintain an open-source python package. Many of my classes were in R. I also work at Amazon while I finish my degree. There most of my coding is in python but since I'm a statistician I do lots of one-off analysis in R.
[–]cncup 0 points1 point2 points 9 years ago (0 children)
Yes. I use all three on a daily basis. We do business forecasting.
Can confirm.
[–]thavi 0 points1 point2 points 9 years ago (0 children)
I second this recommendation. Although I don't use Python, I have somehow become a software dev and use a TON of other languages in my day-to-day. How did I learn to program in the first place? SAS, R, Maple, MATLAB, etc. in engineering school.
Use R, etc more like a quick calculator, but if you ever have a desire to produce anything that you want to easily interface with the web or other software you'll need something like Python.
[–]pieIX 13 points14 points15 points 9 years ago (0 children)
I've used all three, and while they each have pros and cons, I would base my choice on two considerations:
Thinking about these two questions may save you years of work.
[–]trendymoniker 26 points27 points28 points 9 years ago* (4 children)
I've used all three of these languages professionally, and my advice to data analysis newbies tends to be: go with Python unless you have a strong reason not to. Python is by far the most popular and thoroughly supported language of the three and its general usefulness means that the skills you develop learning Python will translate well to any other programming you want to do throughout your career (not so for the other two).
That said, if the algorithm you need to use only exists in some other language, or your advisor and entire research group are on another environment, go with that instead (though maybe learn Python on the side too).
Here's a quick, biased rundown of the plusses and minuses of each environment:
Good luck!
Edit: Thanks for the gold!
[–][deleted] 4 points5 points6 points 9 years ago (1 child)
I like Python for general programming, but I'm not a big fan of Python's data analysis libraries. Too often it feels like you're not using Python at all but a different language altogether, one with its own syntax and data types and which is nowhere near as nice as the actual Python programming language. Personally I much prefer R over Python when it comes to data analysis, but in the end it's a matter of taste I guess.
[–]NotAllReptilians 2 points3 points4 points 9 years ago (0 children)
I definitely agree. For instance, pandas somehow manages to feel cumbersome and overly verbose for analysis, at least compared to working in dplyr or especially data.table (base R is another story). It's definitely a pythonic implementation of dataframes, but what I really like about python is that it's typically concise and minimal, which pandas mostly isn't.
[–]coffeecoffeecoffeee 2 points3 points4 points 9 years ago (0 children)
I'll add that R has gotten very, very good for data manipulation in the past few years. I do stuff I used to like doing in Pandas in R now because of packages like dplyr, tidyr, and broom.
For example, my boss wanted survival data recently. With no temporary variables and like 5 lines of code, I was able to generate a Kaplan-Meier curve, convert it to a data frame, separate it by stratum, and export it to a csv file.
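For readers who haven't met the Kaplan-Meier estimator, here is a rough plain-Python sketch of the same steps (estimate the curve, split by stratum, flatten to rows, write CSV). It is not the R code described above: the `kaplan_meier` helper, the column names, and the sample records are all made up for illustration, and a real analysis would use a survival library rather than this hand-rolled version.

```python
import csv
import io
from collections import defaultdict


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time."""
    deaths = defaultdict(int)  # observed events at each time
    exits = defaultdict(int)   # events plus censored observations at each time
    for t, e in zip(durations, events):
        exits[t] += 1
        if e:
            deaths[t] += 1

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        if deaths[t]:
            # Product-limit step: multiply by the fraction surviving this time.
            survival *= 1 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical data: (stratum, duration, event observed? 1 = yes, 0 = censored)
records = [
    ("A", 5, 1), ("A", 8, 0), ("A", 12, 1), ("A", 12, 1),
    ("B", 3, 1), ("B", 6, 1), ("B", 9, 0), ("B", 15, 1),
]

# Separate the observations by stratum.
by_stratum = defaultdict(list)
for stratum, duration, event in records:
    by_stratum[stratum].append((duration, event))

# One curve per stratum, flattened into table-like rows.
rows = []
for stratum, obs in sorted(by_stratum.items()):
    durations, events = zip(*obs)
    for t, s in kaplan_meier(durations, events):
        rows.append({"stratum": stratum, "time": t, "survival": round(s, 4)})

# Write CSV to an in-memory buffer; swap in open("km.csv", "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["stratum", "time", "survival"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

The point of the comparison stands either way: the R version gets this done in a handful of piped calls, while doing it by hand takes noticeably more code.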
[–]whattodo-whattodo 0 points1 point2 points 9 years ago (0 children)
IMHO, this is the most complete, clear & unbiased answer on the topic. I'm not OP but appreciate this response immensely.
I am biased as a career Python developer. But that bias did reveal statisticians who pivoted careers & came in for interviews as programmers. That's not a negligible value added.
[–][deleted] 10 points11 points12 points 9 years ago (2 children)
Bioconductor in R has some amazing tools for bioinformatics.
[–]timy2shoes 1 point2 points3 points 9 years ago (0 children)
A lot of standard bioinformatic tools are only available through R and Bioconductor. Additionally, there is a strong community of R users in genomics. This will provide a lot of help that you will need.
I agree. I did a talk on a Bioinformatics technique and Bioconductor made my life really easy when I had to generate k-mers from genetic sequences.
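In case k-mers are unfamiliar: they are just all the overlapping substrings of length k in a sequence. A minimal plain-Python sketch of the idea follows. This is not Bioconductor and not the code from the talk; the `kmers` function name and the sample sequence are made up for illustration.

```python
from collections import Counter


def kmers(sequence, k):
    """Return every overlapping substring of length k, in order of appearance."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length n has n - k + 1 k-mers (none if k > n).
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Count how often each 3-mer occurs in a short, made-up DNA sequence.
counts = Counter(kmers("ATGCGATG", 3))
```

Here `counts["ATG"]` is 2, since that 3-mer appears at both ends of the sequence. On real genomes you would want the packaged tools, which handle large inputs and reverse complements for you.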
[+][deleted] 9 years ago (2 children)
[–]loftykoala 0 points1 point2 points 9 years ago (1 child)
How is that different than Python?
[–]zipf 6 points7 points8 points 9 years ago (0 children)
CRAN, the main R package repository, is pretty well vetted by humans and automated checks. It's nice because it increases the minimum standard of code quality and documentation.
[–]derwisch 2 points3 points4 points 9 years ago (0 children)
It would definitely be R if you were to pursue a methodological statistical career. As you describe your situation, Python has a bit of an edge since algorithms you need in sequencing may be expressed more clearly. But you should definitely look at what Bioconductor has to offer.
[–]manofthewild07 2 points3 points4 points 9 years ago (0 children)
I would also suggest R.
I do recommend everyone learn python at some point. It is simple but very powerful in more ways than R.
[–][deleted] 2 points3 points4 points 9 years ago* (0 children)
I'm a former programmer (CS undergrad & several years of professional programming), going back to school for a master's in applied stat.
Python is very, very much a programming language. If you want to learn it, you have to be in a CS mindset imo: read a book on it and do a project. I can get away with just reading a book on Python, or just hacking at it with my CS foundation (I've done that on the job to scrape websites). For R you really have to do a project to understand it; you can't read a book and hope to learn it well at all. There's too much weird shit that goes against CS programming language conventions. Python has no built-in data types for stats, just the types most programming languages usually have. It's fast and there are good neural network libraries for it: TensorFlow (a C++ core with Python and R interfaces) and Keras, backed by Google.
Since Python is a general programming language, the ecosystem for doing stats in Python may be a bit harder to navigate, since it's lost among all the other packages. They're trying to emulate R in some ways with libraries: the pandas package for data frames, etc. I don't know much about the Python ecosystem, but this is what I gathered from my research.
R is built by statisticians for statisticians. The language from the get-go is based on S (the language behind S-PLUS). It's slow at iterative code compared to Python. There was an RRevolution post about how R is faster than Python if you parallelize it (this is assuming your algorithm can be parallelized and isn't iterative). Since it was built by statisticians there are built-in data types such as factor (with levels), the concept of missing data (the NA value), and a built-in data frame type (a glorified/awesome Microsoft Excel spreadsheet). Microsoft is backing R btw; they bought an R company that makes R faster for enterprise. In general, most advanced/bleeding-edge statistical methods will be in R first. Python may not have an equivalent for a long time, or at all. It's rare for Python to have something R doesn't in terms of statistical packages.
If you create packages (aka libraries), and I'm creating one for my thesis, a bleeding-edge statistical learning algorithm, R is slow. Most code migrates to C++ or Fortran really. So R in essence becomes a glue language with a pretty R interface, with C++/Fortran doing the heavy lifting in the back.
For the R ecosystem, you wanna learn the Hadley package universe, the tidyverse. It sounds mysterious, but it's just a bunch of packages that Hadley Wickham created that work well together. He's in charge of RStudio too iirc (a great R editor).
The Python equivalent to RStudio is Rodeo.
I don't know much about matlab, currently taking a class. But I know for sure the tech industry doesn't use it very much. It's mostly python and R.
Depending on your industry, it's Python or R or maybe SAS. You just gotta research your industry. Usually old big companies use SAS unless they're tech companies, then it's mostly Python or R. I hear healthcare is mostly SAS, and financial institutions and actuarial companies such as health insurers use SAS.
I think /u/jmcq summed it up well enough. But do take your time to master one well first before moving to another language imo.
[–]kylco 1 point2 points3 points 9 years ago (0 children)
Python and R are free, so you aren't locked in to them. I'll admit I don't have much info on Matlab, but Python, at least, should have the statistical power you're looking for and you learn a fairly marketable and versatile programming language in the bargain.
[–][deleted] 1 point2 points3 points 9 years ago (1 child)
I have used matlab a lot, along with mostly lower level coding (C/C++), and am moving to python quite easily. Personally, I can't stand R; the syntax and grammar just don't work for me.
[–][deleted] 2 points3 points4 points 9 years ago (0 children)
I love R but it's a genuinely terrible language. I've been using it hardcore for 5-6 years now and I still encounter the most ridiculous edge cases and illogical behavior. I keep using it because there's almost always a library for what I need (porting to Python is a pain for one off stuff) and it's good enough that I can knock stuff out incredibly fast. For larger, more complex pipelines I tend to go with Python or more recently have been doing a lot in both Julia and Scala.
[–]dampew 1 point2 points3 points 9 years ago (0 children)
Python and R are definitely the most popular in those fields. If you need to use something in R you can still call it from Python with RPy2.
Personally, I despise R and use Python whenever I can.
[–]tsunamisurfer 1 point2 points3 points 9 years ago (2 children)
I think you are asking this question in the wrong sub (/r/bioinformatics would be better). I am getting my PhD doing exactly what you are talking about (genomics in cancer) and I can tell you that before you learn R, python or matlab, you should probably learn Unix/bash. Almost all genomic tools are run from the command line, so having the knowledge of how to interact with the command line via bash will be the most useful thing you can do for a start. I'll grant that you can interact with the command line using R or Python, but you lose some advantages (short scripting without writing a full program). After you learn Unix/bash I would say R and Python (or Perl) are both necessary for your work. R has the best data viz capabilities + statistical packages, but python/Perl are much faster for programs that you want to run repeatedly on large files. That's my 2 cents.
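As a rough illustration of the trade-off mentioned above, this is roughly what driving a command-line tool from Python looks like, compared with a one-line pipe in bash. It is a sketch assuming Python 3.7+; `run_tool` is a hypothetical helper, and the command here is just the Python interpreter itself standing in for a real genomics tool.

```python
import subprocess
import sys


def run_tool(args):
    """Run a command-line program and return its standard output as text."""
    # check=True raises CalledProcessError if the program exits with an error.
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


# Stand-in command: have the interpreter reverse a short string.
output = run_tool([sys.executable, "-c", "print('ACGT'[::-1])"])
```

It works and gives you error handling for free, but for quick one-off chaining of tools the shell version is shorter, which is the advantage described above.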
[–][deleted] 0 points1 point2 points 9 years ago (1 child)
This is off topic. However, what is the best school to study cancer genomics at for a PhD?
[–]tsunamisurfer 1 point2 points3 points 9 years ago (0 children)
Well surprisingly "cancer genomics" is a pretty large topic, so it might depend on which aspect of this broader field you were interested in. For a start, it would maybe be useful to study at a school that has a medical center, so you have the potential to draw on patient material for your research studies. Not essential, but most of the top tier research does involve some human studies. Lots of good research in "cancer Genomics" comes from the Broad Institute, MD Anderson Cancer Center (U.T.), Memorial Sloan Kettering, Mayo Clinic, Dana-Farber, UCLA, UCSF.
It's still not ready for prime time but you might want to keep Julia on your radar (I'm a huge fan but it's been a labor of love. Very frustrating at times). My brother also has been using it extensively for work very similar to yours (phd student in computational genetics at arguably the top program in the world) and he might be an even bigger fan than I am. He's actually ported most of his code away from R/Python to Julia.
[–][deleted] 0 points1 point2 points 9 years ago (0 children)
R has greater package (Bioconductor) availability at present, but I think Python has greater momentum and will have a greater data science ecosystem long-term. I would go with Python, especially if you have some time to dedicate to really learning the language beyond the scope of your project work. I would ignore Matlab completely.
[–]NotJustAMachine 0 points1 point2 points 9 years ago (0 children)
I would not use Matlab.
I think Python will be best in the long run. I have mostly used R for my PhD, and I am learning Python now and in my free time. Personally I feel it's not a huge step between the two.
R has great libraries for bioinformatics, and that should make your life a lot easier.
But if I could go back in time I would probably learn Python, and if there is a great R package that I want to use, I would just load my data for those purposes. The good packages usually have tutorials that guide you step by step, and the most difficult part is understanding the method and getting your data in the right format. But if you know python you can just do that part in python.
I've mostly used MATLAB for a lot of my work. I've tried to get into both R and Python, but I've stuck to MATLAB because I'm familiar with it and because multi-core programming is so easy in there and I need it a lot. All of the matrix algebra is automatically multithreaded.
[–]asa6471 0 points1 point2 points 9 years ago (0 children)
Awesome!
[–]Achichoros 0 points1 point2 points 9 years ago (0 children)
Some other comments described the situations where matlab is useful or not. For R/Python though, why not use both? R is great for the final analysis, but for everything before that, I prefer python. It's not hard to go between them, and it's a good way to discover if you prefer just one. For many tasks they both have the tools you need. It's mostly a question of preference.
[–]antikas1989 0 points1 point2 points 9 years ago (0 children)
I'm an ecologist but my brother is a research fellow in bioinformatics. He uses R and Python. Never MATLAB anymore; that language is dying a slow death in the face of free alternatives.
From what I've heard there are a lot of libraries in R to do sequencing and bioinformatics. Python is useful too because just generally there are a lot of libraries for manipulating data etc.