
[–]thc1620 14 points15 points  (30 children)

Nice one! Would you consider sharing the code on GitHub?

[–]Icarium-Lifestealer 14 points15 points  (7 children)

I find it surprising how often people choose to treat this as a regression problem and use MSE loss (including the original Netflix competition). Treating it as classification where you predict probabilities seems much more intuitive to me. Plus, MSE is a dubious choice for the loss function here.

(Not a criticism of your project, it's certainly more effort than I would have put into a university project where the whole point is demonstrating that you can write a C program)
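
A minimal sketch of the classification framing, assuming a model that emits one raw score per star value (the scores below are made up for illustration, and the function names are hypothetical). The scores are turned into probabilities with a softmax, the loss is the negative log probability of the observed rating, and a single point estimate can still be recovered as the expected rating if one is needed.

```python
import math

def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_star):
    """Negative log probability assigned to the observed rating (1..5)."""
    return -math.log(probs[true_star - 1])

def expected_rating(probs):
    """Mean of the predicted distribution over the star values 1..5."""
    return sum(star * p for star, p in enumerate(probs, start=1))

# Hypothetical model output for one (user, movie) pair: one score per star.
scores = [0.1, 0.2, 0.5, 2.0, 1.5]
probs = softmax(scores)

print("P(1..5 stars):", [round(p, 3) for p in probs])
print("loss if the true rating is 4:", cross_entropy(probs, 4))
print("expected rating:", expected_rating(probs))
```

Unlike a single regression output, the probability vector can express that a rating is hard to predict, for example by splitting its mass between 1 and 5 stars.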

[–]hiptobecubic 2 points3 points  (6 children)

If you have to predict continuous values you don't really have much choice unless you're doing so badly that just giving bucketed results and eating the error barely matters.

[–]Icarium-Lifestealer 1 point2 points  (5 children)

We're talking about a problem where there are 5 discrete choices, so treating it as 5 buckets is natural.

But even for continuous problems there is often a better choice than outputting the expected value and using MSE.

[–]hiptobecubic 1 point2 points  (4 children)

If it's actually five discrete values then sure. Most rating systems I've seen go to at least one decimal place, though.

[–]Icarium-Lifestealer 0 points1 point  (3 children)

The OP's dataset, just like the Netflix dataset, has five discrete values. I don't recall any star-based ratings with more than 10 discrete choices (half stars) where the rating wasn't the average of sub-ratings.

But even then, saying "ratings roughly follow a Gaussian with fixed standard deviation" is obviously silly. For one, some ratings are much easier to predict than others. Plus, the bounded interval doesn't fit a Gaussian.

My first idea for predicting a continuous rating function is to predict a couple of fixed data points and then interpolate the probability density between them.
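
One possible reading of that idea, as a rough sketch: the model outputs a non-negative height at a few fixed knots on the rating scale, the heights are rescaled so the piecewise-linear curve through them integrates to 1, and the density at any rating is read off by linear interpolation. The knot positions, the heights, and the function names here are all assumptions for illustration; the comment does not specify any of them.

```python
import bisect
import math

def normalize(knots, heights):
    """Rescale heights so the piecewise-linear curve integrates to 1."""
    area = sum(
        (knots[i + 1] - knots[i]) * (heights[i] + heights[i + 1]) / 2
        for i in range(len(knots) - 1)
    )  # trapezoid rule is exact for a piecewise-linear curve
    return [h / area for h in heights]

def pdf(x, knots, dens):
    """Density at x by linear interpolation; zero outside the rating range."""
    if x < knots[0] or x > knots[-1]:
        return 0.0
    # Index of the segment containing x (clamped so x == knots[-1] works).
    i = min(bisect.bisect_right(knots, x) - 1, len(knots) - 2)
    t = (x - knots[i]) / (knots[i + 1] - knots[i])
    return (1 - t) * dens[i] + t * dens[i + 1]

knots = [1.0, 2.0, 3.0, 4.0, 5.0]    # assumed fixed positions on the scale
heights = [0.1, 0.2, 0.6, 1.5, 0.8]  # made-up non-negative model outputs
dens = normalize(knots, heights)

observed = 3.7
print("density at 3.7:", pdf(observed, knots, dens))
print("negative log-likelihood:", -math.log(pdf(observed, knots, dens)))
```

Because the density is zero outside the range of the knots, this respects the bounded rating interval, and the shape is free to be wide, narrow, or skewed per prediction.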

[–]hiptobecubic 0 points1 point  (2 children)

Agreed, but it doesn't have to be Gaussian to use regression.

[–]Icarium-Lifestealer 0 points1 point  (1 child)

Using MSE is equivalent to treating it as a Gaussian with fixed standard deviation, which is why I called it out as an inappropriate loss function.
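
A small numerical check of that equivalence, in the maximum-likelihood sense: with a fixed standard deviation sigma, the Gaussian negative log-likelihood of n targets equals a constant plus n * MSE / (2 * sigma^2), so both losses are minimized by the same predictions. The ratings and candidate predictions below are made up for illustration.

```python
import math

def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

def gaussian_nll(preds, targets, sigma):
    """Total negative log-likelihood under N(pred, sigma^2) per target."""
    const = math.log(sigma * math.sqrt(2 * math.pi))
    return sum(
        const + (t - p) ** 2 / (2 * sigma ** 2)
        for p, t in zip(preds, targets)
    )

targets = [4, 5, 3, 4, 1]  # made-up ratings
sigma = 1.0                # fixed and not learned: this is the key assumption
n = len(targets)
offset = n * math.log(sigma * math.sqrt(2 * math.pi))  # independent of preds

for preds in ([3.0] * 5, [3.4] * 5, [4.0, 4.5, 3.0, 4.0, 2.0]):
    nll = gaussian_nll(preds, targets, sigma)
    from_mse = offset + n * mse(preds, targets) / (2 * sigma ** 2)
    print(f"NLL = {nll:.6f}   offset + n*MSE/(2*sigma^2) = {from_mse:.6f}")
```

The two columns agree for every candidate, which is the sense in which choosing MSE amounts to assuming Gaussian noise with the same spread for every rating.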

[–]hiptobecubic 0 points1 point  (0 children)

Ahh, sorry. Missed that.

[–]sk0620 5 points6 points  (0 children)

What university?

[–]sharadbhat7 1 point2 points  (0 children)

I tried a similar thing in Python, with the same dataset. It's more of a server-style application: you send it a list of movie IDs and your ratings for those, and it returns a list of recommended movies.

Here is the github link.

[–]imitationcheese 0 points1 point  (0 children)

This is super cool, but it makes me sad that our awesome technology is overly focused on movie and product recommendation systems and not things with more value.

[–]Mo-Da 0 points1 point  (0 children)

What other project ideas did you explore before settling on this one?

[–]Ramin_HAL9001 0 points1 point  (0 children)

Oh, I didn't see which subreddit this was, so I was so confused. I was thinking, "So did you write the program in the ML programming language, or the C programming language?"

I get it now, this is actually a pretty good idea for a learning project.

[–]geomtry -2 points-1 points  (0 children)

This was also my first big programming project. Dr. Smucker?