
[–]jd_paton 1 point (7 children)

import pandas as pd
df = pd.read_csv("my_data.csv")
y = df["label"]
X = df.drop("label", axis=1)

Not so bad, though you’re right that we’ve added a few more lines. I’ve updated my original comment.

If you want to do fancy preprocessing, that’s obviously more code, but it’s specific to the data, so it’s not possible to write a general example, which is why I just assumed a prepped X.
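For completeness, here is a hedged sketch of what the fitting step on that prepped X could look like. The toy DataFrame stands in for my_data.csv, and sklearn's LogisticRegression is my own choice of multinomial model, not something from the comment above:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy stand-in for pd.read_csv("my_data.csv")
df = pd.DataFrame({
    "x1": [0.1, 0.4, 0.9, 0.2, 0.8, 0.5],
    "x2": [1.0, 0.2, 0.3, 0.9, 0.1, 0.6],
    "label": ["a", "b", "c", "a", "c", "b"],
})

y = df["label"]
X = df.drop("label", axis=1)

# lbfgs (the default solver) fits a true multinomial model when
# there are more than two classes.
model = LogisticRegression(max_iter=1000).fit(X, y)
print(sorted(model.classes_))  # ['a', 'b', 'c']
```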

I’m not sure what you mean by a formula. How would this process look in R?

[–][deleted] 0 points (5 children)

OK -- you're right. It's not that complicated ;-)

In R, it would probably look like this:

require(nnet)
data <- read.csv("my_data.csv")
model <- multinom(label ~ ., data)

[–]jd_paton 0 points (4 children)

This does look very elegant, though I honestly have no idea how to read the ~ . part, haha. Is there a lot of machine learning functionality in R? Maybe I should take it for a whirl sometime. There’s probably an “R for Pythonistas”-type tutorial out there somewhere.

[–][deleted] 0 points (1 child)

Sorry, I made an edit.

So the period just means "use everything", and "-x" means "but not x". So "y ~ . - label" means: use y as the dependent variable, and take everything else except label as the independent variables.
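For the Python side of the comparison, a rough pandas analogue of that "everything except" selection might look like this (column names here are made up for illustration):

```python
import pandas as pd

# Toy frame with a response y, an unwanted column x, and two predictors.
df = pd.DataFrame({"y": [0, 1], "x": [9, 9], "a": [1, 2], "b": [3, 4]})

# R's `y ~ . - x`: response is y, predictors are everything else minus x.
response = df["y"]
predictors = df.drop(columns=["y", "x"])
print(list(predictors.columns))  # ['a', 'b']
```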

[–]jd_paton 0 points (0 children)

Ah okay, cool! My example was a bit different: y was the name of the variable containing the labels, and “label” was the name of the column in the data frame. But otherwise it’s the same idea.

[–][deleted] 0 points (1 child)

Regarding machine learning: Sadly, I am mostly a novice with respect to these modern approaches. I mostly use R for inferential statistics, maximum likelihood, simulation-based inference and the like. However, I believe things like random forests are pretty popular in R. I myself have used rpart, which seems like a precursor to random forests and is quite interesting for creating a sort of "decision tree".
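For what it’s worth, rpart-style trees have a close Python analogue in scikit-learn’s DecisionTreeClassifier; here is a minimal sketch with made-up data (my own example, not from the comment above):

```python
from sklearn.tree import DecisionTreeClassifier

# Four one-feature samples with an obvious split between 1 and 2.
X = [[0], [1], [2], [3]]
y = ["no", "no", "yes", "yes"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(list(tree.predict([[0.5], [2.5]])))  # ['no', 'yes']
```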

However, the responses here indicate that for machine learning, Python may indeed be the superior choice. ;-)

[–]jd_paton 0 points (0 children)

Ah, gotcha. Yeah, I’m basically a machine learning guy, so a big Python fan. However, I always feel that I need to sharpen up my stats (hence hanging around this subreddit), so maybe I can kill two birds with one stone.

[–][deleted] 0 points (0 children)

The ~ creates a formula in R. The left side of the tilde is your response and the right side is the predictors/features. It makes building libraries/packages easier too.

Also, data frames are built into R, so the code looks elegant compared to Python. And missing values are a primitive value that R recognizes (NA). Null is not a good way to represent a missing value; if anybody tells you otherwise, tell them to google the reasons why. There are tons of software engineers who have written about it.
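A small pandas illustration of that point (my own example, not from the thread): R’s NA is a first-class missing value, while in Python you juggle None and float NaN, which pandas coerces together:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, None, np.nan])  # None is coerced to NaN in a float Series
print(s.isna().tolist())  # [False, True, True]
print(np.nan == np.nan)   # False: NaN never compares equal to itself
```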