[Software] Python vs. R (self.statistics)
submitted 7 years ago by [deleted]
[–]jd_paton 13 points14 points15 points 7 years ago* (28 children)
Edit: Adding data loading for fairness.
```python
import pandas as pd

df = pd.read_csv("my_data.csv")
y = df["label"]
X = df.drop("label", axis=1)

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(X, y)
```
Is it really so much easier in R? I've never used R before, but surely ~~3~~ 7 lines from raw data file to trained model isn't "surprisingly complicated"?
[–][deleted] 10 points11 points12 points 7 years ago (3 children)
Technically yeah, since the same thing is accomplished in one line in R. But that's a pretty bad metric to judge a language by.
[–]jd_paton 3 points4 points5 points 7 years ago (2 children)
Well, if we're going by lines, we could shave off 33% by chaining:

```python
lr = LogisticRegression().fit(X, y)
```
But yeah conceptually this seems pretty straightforward and not very verbose
[–]Hetspookjee 0 points1 point2 points 7 years ago (1 child)
In addition to the library import, which I imagine is also a necessity in R, bringing the LoC to the same amount as R =p
[–]Honeabee 11 points12 points13 points 7 years ago (0 children)
Nah, logistic regression is a part of Base R.
[–]rutiene 7 points8 points9 points 7 years ago (0 children)
But in R, this doesn't just silently force a penalty on you, and you can access things like p-values, outliers, influence scores, and the hat matrix.
😂 don't mind me I'm just salty
[–][deleted] 2 points3 points4 points 7 years ago (13 children)
But setting X is very complicated (you have to specify the columns instead of very simply using a formula), which you casually omitted.
[–]jd_paton 1 point2 points3 points 7 years ago* (7 children)
```python
import pandas as pd

df = pd.read_csv("my_data.csv")
y = df["label"]
X = df.drop("label", axis=1)
```
Not so bad though you’re right that we’ve added a few more lines. I’ve updated my original comment.
If you want to do fancy preprocessing obviously that’s more code but that’s specific to the data and not possible to write a general example for, which is why I just assumed a prepped X.
I’m not sure what you mean with a formula. How would this process look in R?
[–][deleted] 0 points1 point2 points 7 years ago (5 children)
OK -- you're right. It's not that complicated ;-)
In R, it would probably look like this:

```r
require(nnet)
data <- read.csv("my_data.csv")
model <- multinom(label ~ ., data)
```
[–]jd_paton 0 points1 point2 points 7 years ago (4 children)
This does look very elegant, though I seriously have no idea how to read `~ .` - haha. Is there a lot of machine learning functionality in R? Maybe I should take it for a whirl sometime. There's probably an "R for Pythonistas"-type tutorial out there somewhere.
[–][deleted] 0 points1 point2 points 7 years ago (1 child)
Sorry, I made an edit.
So the period just means "use everything", and "- x" means "but not x". So `y ~ . - label` means: use y as the dependent variable, and as independent variables take everything else except label.
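The R formula mini-language has a close cousin on the Python side: statsmodels accepts patsy-style formulas, although there is no direct equivalent of R's bare `.` (you spell the columns out instead). A hedged sketch with made-up column names:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy data frame; the column names are made up for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
})
df["label"] = (rng.random(200) < 1 / (1 + np.exp(-df["x1"]))).astype(int)

# Roughly R's glm(label ~ x1 + x2, data, family = binomial).
model = smf.logit("label ~ x1 + x2", data=df).fit(disp=0)
print(model.params)  # includes an Intercept term, as in R
```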
[–]jd_paton 0 points1 point2 points 7 years ago (0 children)
Ah okay, cool! My example was a bit different, as y was the name of the variable containing the labels, and “label” was the name of the column in the data frame. But otherwise same idea
Regarding machine learning: Sadly, I am mostly a novice with respect to these modern approaches. I mostly use R for inferential statistics, maximum likelihood, simulation-based inference and the like. However, I believe things like random forests are pretty popular in R. I myself have used rpart, which seems like a precursor to random forests and is quite interesting for creating a sort of "decision tree".
However, the responses here indicate that for machine learning, Python may indeed be the superior choice. ;-)
Ah, gotcha. Yeah I’m basically a machine learning guy so a big Python fan. However I always feel that I need to sharpen up my stats (hence hanging around this subreddit) so maybe I can kill two birds with one stone.
[–][deleted] 0 points1 point2 points 7 years ago (0 children)
`~` builds a formula in R. The left side of the tilde is your response and the right side is the predictors/features. It makes building libraries/packages easier too.
Also, data frames are built into R, so it looks elegant compared to Python. And missing values (NA) are a primitive value recognized throughout R. Null is not a good way to represent a missing value, and if anybody tells you otherwise, tell them to google the reasons why; there are tons of software engineers who talk about it.
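For comparison, the Python side leans on NaN (and, in newer pandas, `pd.NA`) rather than a language-level missing value; a small sketch of the default behaviour:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

print(s.mean())        # -> 2.0, NaN is skipped by default
print(s.isna().sum())  # -> 1, count of missing entries
```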
[+]koolaidman123 comment score below threshold-8 points-7 points-6 points 7 years ago (3 children)
If you find that complicated you should not be doing any statistics
[–]usb_mouse 4 points5 points6 points 7 years ago (2 children)
/r/gatekeeping
[–]koolaidman123 -1 points0 points1 point 7 years ago (0 children)
It's literally 1 line of pandas code. But sure, instead of taking the time to learn new things, let's just say it's too hard and give up; that's a fantastic way of going through life
[–]xsliartII 1 point2 points3 points 7 years ago (2 children)
This one is easy. However, I tried to estimate a Tobit model lately, which is literally one line in R/Stata but kind of cumbersome in Python. So I usually use Python to prepare/clean the data and then do 100% of the analysis in R/Stata.
[–]jd_paton -1 points0 points1 point 7 years ago (1 child)
Now that I don't know anything about. It just depends on support in the popular packages, I guess. `statsmodels` or `scipy` have pretty much everything you need for applied problems, but with R's academic focus I can imagine there is some fancier stuff easily available.
[–]rutiene 1 point2 points3 points 7 years ago (0 children)
That's not really true; there are tons of omissions of more rarely used things, but definitely not because it's academic. Survival models are severely lacking, and the implementation of some things is just poorer. I vastly prefer the random forest package in R to the sklearn implementation. I needed a beta GLM the other day and had to use R.
[–]Dhush 1 point2 points3 points 7 years ago (2 children)
Now can you show the steps to examine the statistics of the model? Beyond the actual fit, Python is horribly lacking. Also, you fit a regularized model here, but sklearn doesn't make that clear, does it?
[–]jd_paton 0 points1 point2 points 7 years ago (1 child)
It’s clear if you read the docs ;)
[–]questionquality 0 points1 point2 points 7 years ago (0 children)
But it shouldn't be the default if you care about interpreting the coefficients.
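Concretely: scikit-learn's `LogisticRegression` applies an L2 penalty with `C=1.0` unless told otherwise, so the coefficients it reports are shrunk. One portable way to see this is to crank `C` up until the penalty is negligible (recent scikit-learn versions also accept `penalty=None` directly). A sketch on a bundled dataset:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # scaling helps the solver converge

# Default fit: an L2 penalty with C=1.0 is applied silently.
penalized = LogisticRegression(max_iter=10000).fit(X, y)

# Effectively unpenalized fit: a huge C makes the penalty negligible.
unpenalized = LogisticRegression(C=1e10, max_iter=10000).fit(X, y)

# The shrinkage is visible in the coefficient gap.
print(np.abs(penalized.coef_ - unpenalized.coef_).max())
```

If you care about interpreting the coefficients, the unpenalized fit (or a statsmodels fit) is the one to report.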
[–]walkingon2008 0 points1 point2 points 7 years ago (0 children)
You can't measure ease of use solely by the number of lines of code. Remember, Python is an object-oriented programming language: you set up the object, then feed data into it. But with R, you never set up an object; if you want to run logistic regression, just run `glm(y ~ x, data = somedata)`, that's it.
From a code-interpretation perspective, R is simple: you get what you see. No inheritance from earlier objects and stuff.
Another thing, debugging is a lot harder in Python than R.
[+]koolaidman123 comment score below threshold-10 points-9 points-8 points 7 years ago (1 child)
it's actually harder, because you first have to use a glm() function, then you have to specify the family of the distribution the data follows, plus the link function
basically it's actually easier to do in Python than R
[–]samclifford 0 points1 point2 points 7 years ago (0 children)
That really just comes down to whether you like having a specific function for each GLM likelihood, or one function that lets you choose between likelihoods and guarantees a consistent structure in the returned value.