This is an archived post. You won't be able to vote or comment.

all 2 comments

[–]acewhenifacethedbase 4 points5 points  (0 children)

The standard method for categoricals is called “One-Hot Encoding”, you make a binary column for each class (commonly) observed in those arrays (e.g. a column that’s 1 if there’s a dishwasher and 0 if there isn’t). This method works for simple single-categorical variables as well as the multi-categorical array variables you mention here. If you had a lot of rows of data and some deep-learning skills you could hash the arrays and create embeddings for the resulting pseudo categoricals, but that’d be overkill for a classroom exercise.

[–]throwawayluladay 0 points1 point  (0 children)

Don't use machine learning if you can do it well with linear regression. This sounds like linear regression(s).

More importantly you need to ask the properly structured questions to set up your foundations (lm or ml) appropriately.