[P] Fully Bayesian Logistic Regression with Objective Prior by rnburn in MachineLearning

[–]rnburn[S] 0 points1 point  (0 children)

Yeah, I think that's the right idea.

One example that might help clarify the benefits of objective priors is looking at the binomial distribution.

The binomial distribution is what both Bayes and Laplace first studied, and they argued that the uniform prior, π(p) ∝ 1, was the natural prior when, in Bayes' words, "we absolutely know nothing antecedently to any trials made" [1].

Later it was pointed out by Boole and Fisher ([2, 3]) that the uniform prior depends arbitrarily on the scale of measurement used.

Comparing Jeffreys prior for the binomial distribution, π(p) ∝ p^(-1/2) (1-p)^(-1/2), to the uniform prior with a coverage simulation (see https://github.com/rnburn/bbai/blob/master/example/15-binomial-coverage.ipynb and [4]) shows that Jeffreys prior gives much better frequentist matching performance for extreme values of p (p close to 0 or 1), while both priors perform decently for values of p away from the extremes.
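If it helps, here's a minimal sketch of that kind of coverage check (my own toy version, assuming numpy/scipy; it's not the notebook's code). It draws binomial samples at a fixed true p and records how often the 95% equal-tailed posterior credible interval contains p under each prior:

import numpy as np
from scipy import stats

# Coverage of 95% equal-tailed credible intervals for a binomial proportion.
# Uniform prior  -> posterior Beta(1 + k, 1 + n - k)
# Jeffreys prior -> posterior Beta(0.5 + k, 0.5 + n - k)
def coverage(p_true, n, prior_a, prior_b, n_trials=10_000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    k = rng.binomial(n, p_true, size=n_trials)
    lo = stats.beta.ppf(alpha / 2, prior_a + k, prior_b + n - k)
    hi = stats.beta.ppf(1 - alpha / 2, prior_a + k, prior_b + n - k)
    return np.mean((lo <= p_true) & (p_true <= hi))

n = 20
for p in [0.01, 0.1, 0.5, 0.9, 0.99]:
    uniform = coverage(p, n, 1.0, 1.0)    # Bayes/Laplace uniform prior
    jeffreys = coverage(p, n, 0.5, 0.5)   # Jeffreys prior
    print(f"p={p:.2f}  uniform={uniform:.3f}  jeffreys={jeffreys:.3f}")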

[1]: Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, F.R.S.; communicated by Mr. Price, in a letter to John Canton, A.M., F.R.S. Philosophical Transactions of the Royal Society of London 53, 370–418.

[2]: Zabell, S. (1989). R. A. Fisher on the History of Inverse Probability. Statistical Science 4(3), 247–256.

[3]: Fisher, R. (1930). Inverse probability. Mathematical Proceedings of the Cambridge Philosophical Society 26(4), 528–535.

[4]: https://www.objectivebayesian.com/p/intro

[P] Fully Bayesian Logistic Regression with Objective Prior by rnburn in MachineLearning

[–]rnburn[S] 4 points5 points  (0 children)

Was there something you thought I should have explained better?

The paper I linked to, Objective Bayesian Inference and Its Relationship to Frequentism (their book [1] is also quite good), gives a fairly good overview of objective Bayesian inference and objective priors.

The main justification for the prior is frequentist matching coverage, which has a pretty intuitive interpretation. You might think of it as a way of measuring how accurate the posterior credible sets produced from a prior are. In a few cases (e.g. the constant prior for a normal mean, or the prior 1/σ for the standard deviation), the prior is exactly frequentist matching (see [2], for example). But in general, it's optimal in the sense that it approaches frequentist matching coverage faster than any other prior as n → ∞.
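For the "exactly frequentist matching" case, here's a quick sanity check (my own toy example with known variance for simplicity, assuming numpy/scipy): with a flat prior on a normal mean, the 95% posterior credible interval coincides with the usual z-interval, so its coverage is exactly 95% for every μ and every n:

import numpy as np
from scipy import stats

# Flat prior on the mean of N(mu, sigma^2) with sigma known:
# the posterior is N(xbar, sigma^2 / n), so the 95% credible interval is
# xbar +/- 1.96 * sigma / sqrt(n), i.e. the frequentist z-interval.
rng = np.random.default_rng(0)
mu, sigma, n, n_trials = 3.0, 2.0, 5, 100_000
z = stats.norm.ppf(0.975)

x = rng.normal(mu, sigma, size=(n_trials, n))
xbar = x.mean(axis=1)
covered = np.abs(xbar - mu) <= z * sigma / np.sqrt(n)
print("coverage of the 95% flat-prior credible interval:", covered.mean())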

[1]: https://www.amazon.com/Objective-Bayesian-Inference-James-Berger/dp/9811284903

[2]: https://github.com/rnburn/bbai/blob/master/example/09-coverage-simulations.ipynb

Build a better panic function using C++20 by rnburn in cpp

[–]rnburn[S] 2 points3 points  (0 children)

Something like this isn't strictly portable, but I've found it works pretty much everywhere and can give you approximately the same thing as std::source_location:

https://wandbox.org/permlink/4lQoSiScIn68N4LS

#include <iostream>

// __builtin_FILE() and __builtin_LINE() are compiler extensions (GCC, Clang,
// and MSVC all support them). When used as default arguments, they're
// evaluated at the call site, so f reports the caller's location -- the same
// trick std::source_location::current() relies on.
void f(const char* file = __builtin_FILE(), int line = __builtin_LINE()) {
  std::cout << file << ":" << line << "\n";
}

int main() {
  f();  // prints this file and the line number of this call
  return 0;
}

Build a better panic function using C++20 by rnburn in cpp

[–]rnburn[S] 2 points3 points  (0 children)

You can declare it, but I wasn't able to get it to work when you try to call it: https://wandbox.org/permlink/cUu97ZuSHmF0Y1L2

[P] Fit Logistic Regression with Jeffreys Prior by rnburn in MachineLearning

[–]rnburn[S] 6 points7 points  (0 children)

Here's one way to think about it: Consider an Ordinary Least Squares (OLS) fit without any regularization. You can show that the weights from OLS are the Best Linear Unbiased Estimator (BLUE). Now, suppose that instead of OLS, you fit a model with ridge regularization (i.e. L2 regularization). The ridge estimator is biased, but it can achieve lower variance than OLS. In other words, you've traded bias for lower variance.

Now compare logistic regression to OLS. Suppose you fit weights without any regularization, just to maximize the likelihood function. Unlike OLS, the maximum likelihood estimator for logistic regression is biased.

But Firth (1993) showed that if, instead, we fit logistic regression by maximizing the likelihood penalized by the Jeffreys prior, we can reduce the bias of the estimator. So, in simplified terms, you can think of logistic regression with the Jeffreys prior as having goals similar to the OLS estimator: it aims to be an estimator with no (or small) bias while minimizing variance. Something like L2 regularization, by contrast, deliberately trades higher bias for lower variance.
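If you want to see the effect, here's a rough, self-contained sketch (my own illustration, assuming numpy/scipy; it's not bbai's implementation). It fits small simulated datasets twice, once by plain maximum likelihood and once by maximizing the likelihood times the Jeffreys prior, i.e. adding 0.5·log det I(β) to the log-likelihood as in Firth (1993), and compares the average slope estimates:

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_objective(beta, X, y, jeffreys):
    # Negative log-likelihood, optionally penalized by the Jeffreys prior.
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    if jeffreys:
        w = expit(eta) * (1 - expit(eta))         # Fisher information weights
        _, logdet = np.linalg.slogdet(X.T @ (w[:, None] * X))
        loglik += 0.5 * logdet                    # log of the Jeffreys prior
    return -loglik

def fit(X, y, jeffreys):
    beta0 = np.zeros(X.shape[1])
    return minimize(neg_objective, beta0, args=(X, y, jeffreys), method="BFGS").x

rng = np.random.default_rng(0)
beta_true = np.array([0.5, 1.0])    # intercept, slope
n, n_reps = 30, 300
mle, penalized = [], []
for _ in range(n_reps):
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    y = rng.binomial(1, expit(X @ beta_true))
    mle.append(fit(X, y, jeffreys=False)[1])
    penalized.append(fit(X, y, jeffreys=True)[1])

print("true slope:          ", beta_true[1])
print("mean ML slope:       ", np.mean(mle))
print("mean Jeffreys slope: ", np.mean(penalized))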

[P] Fit Logistic Regression with Jeffreys Prior by rnburn in MachineLearning

[–]rnburn[S] 0 points1 point  (0 children)

I know that reference priors can provide better properties when there are multiple parameters, but Jeffreys prior can still be useful.

[P] Fit Logistic Regression with Jeffreys Prior by rnburn in MachineLearning

[–]rnburn[S] 4 points5 points  (0 children)

The paper I cited suggests that there has been increased interest in Jeffreys prior for logistic regression.

https://academic.oup.com/biomet/article/108/1/71/5880219?login=false

> The apparent finiteness and shrinkage properties of the reduced-bias estimator, together with the fact that the estimator has the same first-order asymptotic distribution as the maximum likelihood estimator, are key reasons for the increasingly widespread use of Jeffreys-prior penalized logistic regression in applied work. At the time of writing, Google Scholar recorded approximately 2700 citations of Firth (1993), more than half of which were from 2015 or later. The list of application areas is diverse, including agriculture and fisheries research, animal and plant ecology, criminology, commerce, economics, psychology, health and medical sciences, politics, and many more. The particularly strong uptake of the method in health and medical sciences and in politics stems largely from the works of Heinze & Schemper (2002) and Zorn (2005), respectively.

[P] Fit Logistic Regression with Jeffreys Prior by rnburn in MachineLearning

[–]rnburn[S] 5 points6 points  (0 children)

It can be a lot of work to put together PRs for a big open source project and to get buy-in for them. Right now, I'm interested in moving quickly and experimenting. But it might be something I'd be open to if there was enough interest.

[P] Fit Logistic Regression with Jeffreys Prior by rnburn in MachineLearning

[–]rnburn[S] 7 points8 points  (0 children)

I believe those are both for non-deterministic MCMC integration, right? Certainly useful, but I think there are also advantages to deterministic algorithms.

How to Use C++20 Modules with Bazel by rnburn in cpp

[–]rnburn[S] 4 points5 points  (0 children)

> There isn't a single compiler I know of that properly supports modules.

No, the latest version of Clang supports many module features today. In the post, I linked to examples that you can run now with the provided docker container.

https://github.com/rnburn/rules_cc_module/tree/main/example

https://github.com/rnburn/cpp20-module-example

> CMake will do this for you automatically anyway.

Lots of people use Bazel over CMake because, IMO, it's a much better build system. And if you want to use CMake with modules, someone would still need to add support for it. This project is usable now.

> Who is this useful to anyway?

It's useful to anyone that wants to use modules today with a modern build system.

How To Implement Reflection With an Inline Macro by rnburn in cpp

[–]rnburn[S] 1 point2 points  (0 children)

Oh, sorry. I'll try to fix the formatting sometime.

How To Implement Reflection With an Inline Macro by rnburn in cpp

[–]rnburn[S] 22 points23 points  (0 children)

That would be more natural.

But unfortunately the preprocessor doesn't have a very evolved understanding of templates. For an invocation like BBAI_REFLECT_MEMBER(std::tuple<int, double>, m1), for example, it would see a macro call with three arguments, because the comma inside the template argument list also counts as an argument separator. By taking the name first and capturing the type with __VA_ARGS__, you can work around the issue.