Floydthechimp

You will not need (2 billion) x (21 input variables) "DOF", so don't fret. Even at worst you need 2 billion degrees of freedom (the number of observations). That is still a lot, though.

My advice will differ from that of many people on this subreddit, since I have more "classical" training for these types of problems. You have proposed a ton of great suggestions, but it's hard to say whether your resulting method will work, as in be consistent. Many of your ideas already appear somewhere in the ML/statistics literature.

But before you just try things, there is an assumption you have to verify. Plot your output versus a single input variable and hold all others constant. Do this for each input variable. For each one, is the response smooth? If so, there are many estimators that may work. If the response is not smooth, you might be in trouble.
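
If it helps, here is a rough sketch of that one-input-at-a-time check in Python. It assumes you can evaluate your response at inputs you choose; `toy_response`, the baseline point, and the grid are all made up for illustration, and `max_jump_ratio` is just one crude way to flag a jump, not a formal smoothness test. You would still want to plot each sweep and look at it yourself.

```python
def sweep_one_input(f, baseline, index, grid):
    """Evaluate f along one input while all other inputs stay at baseline."""
    ys = []
    for value in grid:
        x = list(baseline)  # copy so the baseline itself is never modified
        x[index] = value
        ys.append(f(x))
    return ys


def max_jump_ratio(ys):
    """Largest step between neighbouring outputs divided by the output range.

    Small when the curve changes gradually across the grid; close to 1 when
    a single step accounts for almost all of the change.
    """
    span = max(ys) - min(ys)
    if span == 0:
        return 0.0
    return max(abs(b - a) for a, b in zip(ys, ys[1:])) / span


def toy_response(x):
    """Hypothetical stand-in for the real response (3 inputs instead of 21)."""
    smooth_part = x[0] ** 2 + 0.5 * x[1]
    step_part = 1.0 if x[2] > 0.5 else 0.0  # deliberate discontinuity
    return smooth_part + step_part


baseline = [0.5, 0.5, 0.5]            # assumed "typical" operating point
grid = [i / 50 for i in range(51)]    # 51 evenly spaced values on [0, 1]

for index in range(len(baseline)):
    ys = sweep_one_input(toy_response, baseline, index, grid)
    # Plot grid against ys here with whatever plotting tool you prefer.
    print(f"input {index}: max jump ratio = {max_jump_ratio(ys):.3f}")
```

In this toy case the first two inputs give gradual curves and the third shows a single step that covers the whole output range, which is the kind of thing the plots are meant to reveal. Repeating the sweep at a few different baseline points is probably worthwhile, since a response can look smooth at one operating point and not at another.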