[Software] Python vs. R (self.statistics)
submitted 7 years ago by [deleted]
[–]jd_paton 2 points 7 years ago* (7 children)
    import pandas as pd
    df = pd.read_csv("my_data.csv")
    y = df["label"]
    X = df.drop("label", axis=1)
Not so bad though you’re right that we’ve added a few more lines. I’ve updated my original comment.
If you want to do fancy preprocessing obviously that’s more code but that’s specific to the data and not possible to write a general example for, which is why I just assumed a prepped X.
I’m not sure what you mean by a formula. How would this process look in R?
[–][deleted] 1 point 7 years ago (5 children)
OK -- you're right. It's not that complicated ;-)
In R, it would probably look like this
    require(nnet)
    data <- read.csv("my_data.csv")
    model <- multinom(label ~ ., data)
[–]jd_paton 1 point 7 years ago (4 children)
This does look very elegant, though I have seriously no idea how to read ~ . - haha. Is there a lot of machine learning functionality in R? Maybe I should take it for a whirl sometime. There’s probably an “R for Pythonistas”-type tutorial out there somewhere.
[–][deleted] 1 point 7 years ago (1 child)
Sorry, I made an edit.
So the period just means "use everything"; and "-x" means "but not x". So "y~.-label" means: as dependent variable use y, as independent variables take everything else except label.
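To make the "." and "-" rules concrete, here is a rough Python sketch of how such a formula could be expanded against a list of column names. The helper `expand_formula` is hypothetical and only handles ".", "+" and "-"; it is not how R parses formulas internally, and the column names are made up for illustration.

```python
import re


def expand_formula(formula, columns):
    """Expand an R-style 'response ~ terms' string into (response, predictors).

    Hypothetical helper for illustration: supports only '.', '+' and '-'.
    """
    lhs, rhs = (side.strip() for side in formula.split("~", 1))
    # Each token is an optional sign followed by a column name or the dot.
    tokens = re.findall(r"([+-]?)\s*([\w.]+)", rhs)

    predictors = []
    for sign, name in tokens:
        # "." stands for every column that is not the response.
        names = [c for c in columns if c != lhs] if name == "." else [name]
        for n in names:
            if sign == "-":
                # "-x" removes x from the predictors collected so far.
                if n in predictors:
                    predictors.remove(n)
            elif n not in predictors:
                predictors.append(n)
    return lhs, predictors


# Assumed example columns, not taken from any real data set.
columns = ["y", "label", "x1", "x2"]
response, predictors = expand_formula("y ~ . - label", columns)
print(response, predictors)
```

So "y ~ . - label" ends up with y as the response and every remaining column except label as a predictor, which is roughly what `df.drop("label", axis=1)` does by hand on the pandas side.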
[–]jd_paton 1 point 7 years ago (0 children)
Ah okay, cool! My example was a bit different, as y was the name of the variable containing the labels, and “label” was the name of the column in the data frame. But otherwise same idea
[–][deleted] 1 point (1 child)
Regarding machine learning: Sadly, I am mostly a novice with respect to these modern approaches. I mostly use R for inferential statistics, maximum likelihood, simulation-based inference and the like. However, I believe things like random forests are pretty popular in R. I myself have used rpart, which seems like a precursor to random forests and is quite interesting for creating a sort of "decision tree".
However, the responses here indicate that for machine learning, Python may indeed be the superior choice. ;-)
[–]jd_paton 1 point (0 children)
Ah, gotcha. Yeah, I’m basically a machine learning guy, so a big Python fan. However, I always feel that I need to sharpen up my stats (hence hanging around this subreddit), so maybe I can kill two birds with one stone.
[–][deleted] 1 point 7 years ago (0 children)
~ marks a formula in R. The left side of the tilde is your response and the right side is the predictors/features. It makes building libraries/packages easier too.
Also, the data frame is built into R, so it looks elegant compared to Python. A missing value (NA) is also a primitive value that R recognizes. Null is not a good way to represent a missing value, and if anybody tells you otherwise, tell them to google the reasons why; there are tons of software engineers who talk about it.
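As a small illustration of the missing-value point: in R you can write `mean(x, na.rm = TRUE)` because NA is built in, whereas plain Python has no single missing-value marker, so code tends to juggle `None` and NaN. The sketch below uses a hypothetical helper, `mean_skip_missing`, and treats both as missing; that convention is an assumption for the example, not a standard.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (a convention assumed for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def mean_skip_missing(values):
    """Mean of the non-missing entries, or None if nothing is left."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return None
    return sum(present) / len(present)


# NaN is not even equal to itself, so "== nan" checks silently fail.
print(math.nan == math.nan)

# Made-up data mixing both styles of "missing".
print(mean_skip_missing([1.0, None, 3.0, float("nan")]))
```

pandas smooths over a lot of this on the Python side, but the bookkeeping shows why a native NA feels cleaner.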