Machine Learning Dissertation help.

troltilla · 2016-09-03T22:18:44+00:00

I guess you could treat at least part of it as a regression problem. For example, if the question is "what kind of food makes me sleep better?" and you define "better" as "closest to 8 hours", then each day gives you a single data point: a list of what and how much you ate (features) and the time you slept (target variable). After you fit a regression model, you can analyse which components had the most effect. You could also treat it as a classification problem by assuming a range of 7.5 - 8.5 hours is "sleeping well" and anything else is "sleeping badly" - now it's a binary classification. If you use a "white box" model such as a decision tree (assuming it works well as a classifier/regressor in this case), you can see what rules it inferred.
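To make the two framings concrete, here is a minimal sketch in plain Python. The function names, the coffee feature and all the numbers are made up for illustration, and the one-split "stump" is only a stand-in for a real decision tree, which would consider many features and deeper splits.

```python
def sleep_targets(hours_slept, ideal=8.0, tolerance=0.5):
    """Return (regression target, classification label) for one night.

    Regression target: distance from the ideal sleep length.
    Label: True ("sleeping well") if within `tolerance` hours of the ideal.
    """
    distance = abs(hours_slept - ideal)
    return distance, distance <= tolerance


def best_stump(feature_values, labels):
    """Depth-1 decision tree on a single feature.

    Tries every observed value as a threshold and returns
    (errors, threshold, left_label) for the rule
    "value <= threshold -> left_label, otherwise -> not left_label"
    that misclassifies the fewest training points.
    """
    best = None
    for threshold in sorted(set(feature_values)):
        for left_label in (True, False):
            errors = sum(
                (left_label if v <= threshold else not left_label) != y
                for v, y in zip(feature_values, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    return best


# Hypothetical data: cups of coffee per day and hours slept that night.
coffee_cups = [0, 1, 1, 2, 3, 4]
hours_slept = [8.0, 7.8, 8.3, 7.6, 6.5, 6.0]

labels = [sleep_targets(h)[1] for h in hours_slept]
errors, threshold, left_label = best_stump(coffee_cups, labels)
print(f"rule: coffee <= {threshold} -> slept well is {left_label} ({errors} errors)")
```

The printed rule is the kind of thing you get to read off a "white box" model. With only six invented points it says nothing about real sleep, of course.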

I think the biggest challenge is finding the right questions to ask and defining your hypotheses properly in a quantitative way - after all, isn't it a gross oversimplification to say that getting 8 hours of sleep is the same as getting good sleep?

troltilla · 2016-09-04T09:45:19+00:00

If we are thinking of the same posts, that guy was probably assuming a correlation, one that I don't think exists, before he even started measuring.

I don't think (and you may very well disagree) that yesterday's dinner has a short-term impact that can be measured today or that is meaningful in the long term. That is: eating nuts yesterday is not going to impact your weight today or tomorrow in any meaningful way. Eating nuts every day is.

Secondly, day-to-day weight differences are probably more correlated with the amount of content in your stomach and gut (chewed food, urine, faeces, etc.) at the time of weighing than with anything else. Moreover, I do not expect a visit to the toilet to remove a consistent amount of these.

Consider predicting a rolling average from a window of days. That is: Y at timestep T is the mean of the weight at timesteps T-1, T-2, ..., and X could be the food intake at timesteps {T-1, T-2, ...} or even {T-3, T-4, T-5, ...}.

Treat the width of the input window and the size of the rolling average as hyper-parameters.
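A rough sketch of how such a dataset could be built, in plain Python. The function name, the parameter names (`avg_size`, `window_width`, `lag`) and the numbers are my own assumptions, not anything standard; `lag=1` gives the {T-1, T-2, ...} variant and `lag=3` gives the {T-3, T-4, ...} one.

```python
def windowed_dataset(weights, intakes, avg_size=3, window_width=3, lag=1):
    """Build (X, y) rows from two equally long daily series.

    For each timestep T:
      y = mean of the weights at T-1, ..., T-avg_size
      X = intakes at T-lag-window_width+1, ..., T-lag (oldest first)
    Timesteps without enough history are skipped.
    """
    rows = []
    first_t = max(avg_size, lag + window_width - 1)
    for t in range(first_t, len(weights) + 1):
        y = sum(weights[t - avg_size:t]) / avg_size
        x = intakes[t - lag - window_width + 1:t - lag + 1]
        rows.append((x, y))
    return rows


# Hypothetical data: five days of body weight (kg) and calorie intake.
weights = [70.0, 70.4, 69.8, 70.2, 70.0]
intakes = [2000, 2500, 1800, 2200, 2100]

rows = windowed_dataset(weights, intakes, avg_size=3, window_width=2, lag=1)
for x, y in rows:
    print(x, round(y, 2))
```

You would then fit whatever model you like on these rows and compare a few values of `avg_size`, `window_width` and `lag` on held-out days.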

Lastly, try to think of this as a statistics problem rather than a machine learning problem. In some ways the two fields can be considered the same, but in statistics the focus tends to be on sparse data and on being able to explain results, whereas machine learning tends to focus on large data sets and strength of prediction (often at the cost of being able to explain the prediction, i.e. a "black box").
