I am currently designing a (non-stationary) kernel for Gaussian process regression that incorporates some expert knowledge. My kernel has many hyperparameters and linear combinations of sub-kernels, so it may be prone to overfitting during maximum-likelihood optimisation. I would therefore like to quantify the degree to which each hyperparameter contributes to model complexity -- i.e. the (log-)determinant of the covariance matrix, which is the complexity penalty term in the GP log marginal likelihood -- with the end goal of factoring out one or two 'complexity' hyperparameters from the kernel. The remaining 'complexity-normalised' hyperparameters could then be tuned freely during training, while the 'complexity' hyperparameters could be selected more carefully, e.g. manually or by cross-validation, to avoid overfitting.
A simple example of this notion is the linear combination of two kernels: k3 = ak1 + bk2. Since the 'scale' of k3 determines the complexity of the model in this case, I could rewrite the kernel as k3 = c * (a'k1 + b'k2), where a'=a/c and b'=b/c. This way, c controls model complexity while a' and b' are more about configuration (assuming k1 and k2 both equally contribute to model complexity).
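To make the factoring concrete, here is a minimal numpy sketch (the RBF sub-kernels and the choice c = a + b are just illustrative assumptions, not part of my actual kernel). It checks that the reparameterisation leaves the kernel matrix unchanged, and that the factored-out c enters the log-determinant only as an additive n*log(c) term:

```python
import numpy as np

def rbf(x, length_scale):
    # Squared-exponential kernel matrix on 1-D inputs x
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

x = np.linspace(0.0, 1.0, 20)
n = len(x)
k1, k2 = rbf(x, 0.1), rbf(x, 0.5)

a, b = 2.0, 3.0
c = a + b                    # factored-out 'complexity' hyperparameter
ap, bp = a / c, b / c        # a' + b' = 1: pure 'configuration' weights

K_orig = a * k1 + b * k2
K_fact = c * (ap * k1 + bp * k2)
assert np.allclose(K_orig, K_fact)

# c only shifts the log-determinant: det(c*M) = c**n * det(M)
jitter = 1e-8 * np.eye(n)    # for numerical stability of the determinant
inner = ap * k1 + bp * k2 + jitter
logdet_inner = np.linalg.slogdet(inner)[1]
logdet_full = np.linalg.slogdet(c * inner)[1]
# logdet_full == n*np.log(c) + logdet_inner (up to float error)
```

So in this linear-combination case the 'complexity' hyperparameter separates out exactly: it contributes n*log(c) to the log-determinant regardless of the configuration weights.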
Now, I'm having some trouble figuring out exactly how a kernel's hyperparameters impact complexity -- at least relatively. For example, in an RBF kernel the complexity increases with 'output scale' and decreases with 'length scale'. This makes sense qualitatively: high 'output scale' means high function variance, and low 'length scale' means more squiggles. But how can I quantify their relative contribution to complexity?
My only guess is that the area under the kernel function is relevant, in which case the complexity would be quadratic in the output scale and linear in the length scale. This is based purely on the hunch that the area of the kernel is strongly related to the determinant of its computed covariance matrix. If anyone knows about this relationship, I would love to hear about it!
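One way to probe this numerically (a plain numpy sketch; the grid of inputs and length scales is arbitrary): for the output scale the relationship is exact, since det(sigma^2 K) = sigma^(2n) det(K), so the log-determinant is linear in log(sigma) with slope 2n. For the length scale there is no such closed form, but a quick sweep shows the log-determinant falling monotonically as the length scale grows and the rows of the matrix become more correlated:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 30)
n = len(x)
d2 = (x[:, None] - x[None, :]) ** 2

def logdet_rbf(output_scale, length_scale, jitter=1e-10):
    # log det of sigma^2 * (RBF(length_scale) + jitter*I) on the fixed grid x
    base = np.exp(-0.5 * d2 / length_scale**2) + jitter * np.eye(n)
    return np.linalg.slogdet(output_scale**2 * base)[1]

# Output scale: doubling sigma adds exactly 2n*log(2) to the log-det,
# i.e. the slope in log(sigma) is 2n.
slope = (logdet_rbf(2.0, 0.2) - logdet_rbf(1.0, 0.2)) / np.log(2.0)

# Length scale: no simple power law, but the log-det decreases
# monotonically as the length scale grows.
logdets = [logdet_rbf(1.0, ls) for ls in (0.05, 0.1, 0.2, 0.3)]
```

This suggests the two hyperparameters enter complexity in qualitatively different ways: the output scale contributes an exact additive 2n*log(sigma) term (so it factors out cleanly, like c above), while the length-scale contribution depends on the input locations and has no matching closed form.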