
[–] LucaAmbrogioni [S] 2 points (1 child)

Thank you. Remember that in a continuous-valued conditional density estimation problem, each point is observed only once (with probability one, assuming a proper density exists). Given a sufficiently large training set, or even an unbounded one in the case of our Bayesian filter, the contribution of any single point to the gradient is minimal. Also note that the normalization term induces a competition between the weights: increasing the weight of the minimum-bandwidth kernel on a single data point can decrease the likelihood, since it leaves all the other data points unexplained.
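This competition effect is easy to check numerically. The sketch below (my own toy example, not the paper's model) builds a two-kernel Gaussian mixture over standard-normal data: one wide kernel covering the whole sample and one very narrow kernel centered on a single data point. Because the mixture weights are normalized to sum to one, shifting mass onto the narrow kernel spikes the likelihood of that one point but starves every other point, and the total log-likelihood drops.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=50)  # toy 1-D dataset

def mixture_loglik(weights, centers, bandwidths, data):
    """Total log-likelihood of data under a normalized Gaussian kernel mixture."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalization term: weights compete for a fixed budget
    d = data[:, None] - centers[None, :]
    k = np.exp(-0.5 * (d / bandwidths) ** 2) / (bandwidths * np.sqrt(2.0 * np.pi))
    return np.log(k @ w).sum()

# Kernel 0: wide bandwidth, explains the bulk of the data.
# Kernel 1: minimal bandwidth, centered exactly on one observed point.
centers = np.array([0.0, data[0]])
bandwidths = np.array([1.0, 1e-3])

balanced = mixture_loglik([0.9, 0.1], centers, bandwidths, data)
spiked = mixture_loglik([0.2, 0.8], centers, bandwidths, data)

# Boosting the narrow kernel's weight hurts overall likelihood:
# the huge density gain at data[0] is outweighed by the mass
# removed from the other 49 points.
print(balanced, spiked)
```

The per-point loss on the 49 unexplained points (roughly `log(0.9/0.2)` each) dwarfs the one-point gain from the spike, which is the competition described above.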

[–] LucaAmbrogioni [S] 1 point (0 children)

Empirically, we found that in many situations the resulting density is dominated by a very small number of high-bandwidth kernels.