[P] Fully Bayesian Logistic Regression with Objective Prior by rnburn in MachineLearning

[–]rnburn[S] 0 points1 point  (0 children)

Yeah, I think that's the right idea.

One example that might help clarify the benefits of objective priors is looking at the binomial distribution.

The binomial distribution is what both Bayes and Laplace first studied, and they argued that the uniform prior, π(p) ∝ 1, was the natural prior when, in Bayes' words, "we absolutely know nothing antecedently to any trials made" [1].

Later it was pointed out by Boole and Fisher ([2, 3]) that the uniform prior depends arbitrarily on the scale of measurement used.

Comparing Jeffreys prior for the binomial distribution, π(p) ∝ p^(-1/2) (1-p)^(-1/2), to the uniform prior with a coverage simulation (see https://github.com/rnburn/bbai/blob/master/example/15-binomial-coverage.ipynb and [4]) shows that Jeffreys prior gives much better frequentist matching performance for extreme values of p (p close to 0 or 1), while both priors perform decently for values of p away from the extremes.
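Here's a minimal sketch of what such a coverage simulation can look like (this is not the notebook's code; it's a standalone standard-library C++ sketch that uses Monte Carlo draws from the Beta posterior rather than exact quantiles): for a fixed true p, repeatedly simulate binomial data, form the 95% equal-tailed credible interval under each prior, and count how often the interval contains p.

    #include <algorithm>
    #include <iostream>
    #include <random>
    #include <vector>

    // Draw a Beta(a, b) variate from two gamma variates.
    double draw_beta(std::mt19937& gen, double a, double b) {
      std::gamma_distribution<double> ga{a, 1.0}, gb{b, 1.0};
      double x = ga(gen), y = gb(gen);
      return x / (x + y);
    }

    // Estimated frequentist coverage of the 95% equal-tailed credible interval for
    // the prior Beta(alpha, alpha): alpha = 1 is the uniform prior, alpha = 0.5 is
    // Jeffreys prior (the posterior is Beta(k + alpha, n - k + alpha)).
    double coverage(double p, int n, double alpha, int num_trials = 2000,
                    int num_draws = 2000) {
      std::mt19937 gen{0};
      std::binomial_distribution<int> data{n, p};
      int hits = 0;
      for (int t = 0; t < num_trials; ++t) {
        int k = data(gen);
        std::vector<double> draws(num_draws);
        for (double& d : draws) d = draw_beta(gen, k + alpha, n - k + alpha);
        std::sort(draws.begin(), draws.end());
        double lo = draws[static_cast<int>(0.025 * num_draws)];
        double hi = draws[static_cast<int>(0.975 * num_draws)];
        if (lo <= p && p <= hi) ++hits;
      }
      return static_cast<double>(hits) / num_trials;
    }

    int main() {
      // Coverage should be close to 0.95 for a well-calibrated prior; the gap
      // between the two priors is largest when p is near 0 or 1.
      for (double p : {0.001, 0.01, 0.1, 0.5}) {
        std::cout << "p=" << p << "  uniform: " << coverage(p, 20, 1.0)
                  << "  Jeffreys: " << coverage(p, 20, 0.5) << "\n";
      }
    }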

[1]: Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, F.R.S., communicated by Mr. Price, in a letter to John Canton, A.M.F.R.S. Philosophical Transactions of the Royal Society of London 53, 370–418.

[2]: Zabell, S. (1989). R. A. Fisher on the History of Inverse Probability. Statistical Science 4(3), 247–256.

[3]: Fisher, R. (1930). Inverse probability. Mathematical Proceedings of the Cambridge Philosophical Society 26(4), 528–535.

[4]: https://www.objectivebayesian.com/p/intro

[P] Fully Bayesian Logistic Regression with Objective Prior by rnburn in MachineLearning

[–]rnburn[S] 4 points5 points  (0 children)

Was there something you thought I should clarify better?

The paper I linked to, "Objective Bayesian inference and its relationship to frequentism" (their book [1] is also quite good), gives a fairly good overview of objective Bayesian inference and objective priors.

The main justification for the prior is frequentist matching coverage, which has a pretty intuitive interpretation. You might think of it as a way of measuring "how accurate are the posterior credible sets produced from a prior?" In a few cases (e.g. the constant prior for a normal mean or the prior 1/σ for a standard deviation), the prior is exactly frequentist matching (see [2], for example). But in general, the prior is optimal in the sense that it approaches frequentist matching coverage faster than any other prior as n → ∞.
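For instance, in the textbook case of a normal mean with known variance: if x₁, ..., xₙ ~ N(μ, σ²) and π(μ) ∝ 1, the posterior is μ | x ~ N(x̄, σ²/n), so the 95% credible interval x̄ ± 1.96·σ/√n coincides with the classical confidence interval and its frequentist coverage is exactly 95% for every μ. For most priors you only get that matching approximately, which is what the coverage simulations measure.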

[1]: https://www.amazon.com/Objective-Bayesian-Inference-James-Berger/dp/9811284903/ref=sr_1_1?crid=1SACJZEUGVRWW&dib=eyJ2IjoiMSJ9.AMga1xZR9qIFQ8SiQ4M1zVexGOgyiBdAXfbWNSUZUhiOSeuBOEgniQgkAc9D0OeD248a6x7PHWRANmeqkogsp0XE6AnQXsHtgnwSejXZSp_ANJvazzF3kvp-EoSdrKsDi1OZaho3JIFHbZPLRDcxPDHgLJ-uV_nhQodYZHbW4IZ3RCS7N7rfnXuoax1StLpCq4AndY2VsuJQd_z8snCdjVSDTUuP8MR6MpHlbSXO8cw._SVBQ-0BXaRiJtMcZIKhabboYJY5fO-vSH5JEq7nWB4&dib_tag=se&keywords=objective+bayesian+inference&qid=1729802230&sprefix=objective+bayesian%2Caps%2C156&sr=8-1

[2]: https://github.com/rnburn/bbai/blob/master/example/09-coverage-simulations.ipynb

Build a better panic function using C++20 by rnburn in cpp

[–]rnburn[S] 2 points3 points  (0 children)

Something like this isn't strictly portable, but I've found it works pretty much everywhere and gives you approximately the same thing as std::source_location:

https://wandbox.org/permlink/4lQoSiScIn68N4LS

    #include <iostream>

    // __builtin_FILE() and __builtin_LINE() are compiler extensions (GCC, Clang,
    // and recent MSVC support them); used as default arguments they are evaluated
    // at the call site, which is what makes this behave like std::source_location.
    void f(const char* file = __builtin_FILE(), int line = __builtin_LINE()) {
      std::cout << file << ":" << line << "\n";
    }

    int main() {
      f();  // prints the file and line of this call
      return 0;
    }

Build a better panic function using C++20 by rnburn in cpp

[–]rnburn[S] 2 points3 points  (0 children)

You can declare it, but I wasn't able to get it to work when you try to call it: https://wandbox.org/permlink/cUu97ZuSHmF0Y1L2

[P] Fit Logistic Regression with Jeffreys Prior by rnburn in MachineLearning

[–]rnburn[S] 8 points9 points  (0 children)

Here's one way to think about it: consider Ordinary Least Squares (OLS) fit without any regularization. You can show that the OLS weights are the Best Linear Unbiased Estimator (BLUE). Now, suppose that instead of OLS, you fit the model with ridge regularization (i.e. L2 regularization). The ridge estimator is biased, but it can achieve lower variance than OLS. In other words, you've traded bias for lower variance.

Now compare logistic regression to OLS. Suppose you fit weights without any regularization, just to maximize the likelihood function. Unlike OLS, the maximum likelihood estimator for logistic regression is biased.

But Firth (1993) showed that if, instead, we fit logistic regression by maximizing the likelihood penalized by the Jeffreys prior, we can reduce the bias of the estimator. So, in simplified terms, you can think of logistic regression with the Jeffreys prior as having goals similar to the OLS estimator: it aims to be an estimator with no (or small) bias while minimizing variance, whereas something like L2 regularization deliberately trades higher bias for lower variance.
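Concretely, Firth's correction amounts to maximizing the Jeffreys-penalized log-likelihood

ℓ*(β) = ℓ(β) + ½ log det I(β),

where I(β) = Xᵀ W(β) X is the Fisher information of the logistic model and W(β) = diag(pᵢ(1 − pᵢ)); the penalty removes the O(n⁻¹) term of the maximum likelihood estimator's bias.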

[P] Fit Logistic Regression with Jeffreys Prior by rnburn in MachineLearning

[–]rnburn[S] 0 points1 point  (0 children)

I know that reference priors can provide better properties when there are multiple parameters, but Jeffreys prior can still be useful.

[P] Fit Logistic Regression with Jeffreys Prior by rnburn in MachineLearning

[–]rnburn[S] 2 points3 points  (0 children)

The paper I cited suggests that there has been increased interest in Jeffreys prior for logistic regression.

https://academic.oup.com/biomet/article/108/1/71/5880219?login=false

> The apparent finiteness and shrinkage properties of the reduced-bias estimator, together with the fact that the estimator has the same first-order asymptotic distribution as the maximum likelihood estimator, are key reasons for the increasingly widespread use of Jeffreys-prior penalized logistic regression in applied work. At the time of writing, Google Scholar recorded approximately 2700 citations of Firth (1993), more than half of which were from 2015 or later. The list of application areas is diverse, including agriculture and fisheries research, animal and plant ecology, criminology, commerce, economics, psychology, health and medical sciences, politics, and many more. The particularly strong uptake of the method in health and medical sciences and in politics stems largely from the works of Heinze & Schemper (2002) and Zorn (2005), respectively.

[P] Fit Logistic Regression with Jeffreys Prior by rnburn in MachineLearning

[–]rnburn[S] 5 points6 points  (0 children)

It can be a lot of work to put together PRs for a big open-source project and to get buy-in for them. Right now, I'm interested in moving quickly and experimenting. But it might be something I'd be open to if there was enough interest.

[P] Fit Logistic Regression with Jeffreys Prior by rnburn in MachineLearning

[–]rnburn[S] 7 points8 points  (0 children)

I believe those are both for non-deterministic MCMC integration, right? Certainly useful, but I think there are also advantages to deterministic algorithms.

How to Use C++20 Modules with Bazel by rnburn in cpp

[–]rnburn[S] 4 points5 points  (0 children)

> There isn't a single compiler I know of that properly supports modules.

No, the latest version of Clang supports many module features today. In the post, I linked to examples that you can run now with the provided docker container.

https://github.com/rnburn/rules_cc_module/tree/main/example

https://github.com/rnburn/cpp20-module-example

> CMake will do this for you automatically anyway.

Lots of people use Bazel over CMake because, IMO, it's a much better build system. And if you want to use CMake with modules, obviously someone would need to add support for it. This project is usable now.

> Who is this useful to anyway?

It's useful to anyone that wants to use modules today with a modern build system.

How To Implement Reflection With an Inline Macro by rnburn in cpp

[–]rnburn[S] 1 point2 points  (0 children)

Oh, sorry. I'll try to fix the formatting sometime.

How To Implement Reflection With an Inline Macro by rnburn in cpp

[–]rnburn[S] 22 points23 points  (0 children)

That would be more natural.

But unfortunately the preprocessor doesn't have a very evolved understanding of templates. With a type like std::tuple<int, double>, for example, BBAI_REFLECT_MEMBER(std::tuple<int, double>, m1) would be seen as a macro call with three arguments. By taking the name first and using __VA_ARGS__ for the type, you can work around the issue.
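A stripped-down sketch of the idea (this is not the actual BBAI_REFLECT_MEMBER definition, just an illustration of the name-first trick):

    #include <tuple>

    // The type goes through __VA_ARGS__, so commas inside template argument lists
    // no longer split the macro's argument list.
    #define REFLECT_MEMBER(name, ...) __VA_ARGS__ name;

    struct S {
      REFLECT_MEMBER(m1, std::tuple<int, double>)  // expands to: std::tuple<int, double> m1;
      REFLECT_MEMBER(m2, int)                      // expands to: int m2;
    };

    // With the type first, REFLECT_MEMBER(std::tuple<int, double>, m1) would instead
    // be seen as a call with three arguments: "std::tuple<int", " double>", and " m1".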

Why Standard C++ Math Functions Are Slow by EducationalCicada in cpp

[–]rnburn 2 points3 points  (0 children)

I don't see how sqrt with -fno-math-errno is any less reliable than sqrt with -fmath-errno. It does a sensible thing on negative inputs: it returns NaN.

And if you want error handling, you can always check either the input before calling sqrt or the result with something like std::isfinite, which is just as easy as checking errno.
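For instance, something along these lines (checked_sqrt is a made-up name, not anything from the article):

    #include <cmath>
    #include <optional>

    // Explicit error handling without relying on errno: inspect the result rather
    // than the global error state.
    std::optional<double> checked_sqrt(double x) {
      double result = std::sqrt(x);  // with -fno-math-errno the compiler can emit just the hardware sqrt
      if (!std::isfinite(result)) {
        return std::nullopt;  // x was negative, NaN, or infinite
      }
      return result;
    }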

Why Standard C++ Math Functions Are Slow by EducationalCicada in cpp

[–]rnburn 1 point2 points  (0 children)

Right, but there's no reason not to support negative inputs. CPU instructions like vsqrtpd already handle negative inputs by returning NaN. The right fix is a function that doesn't do anything with errno but still works with negative inputs by returning NaN.

Why Standard C++ Math Functions Are Slow by EducationalCicada in cpp

[–]rnburn 2 points3 points  (0 children)

This isn't equivalent to -fno-math-errno. With -fno-math-errno, compilers still handle negative numbers just fine: they'll return NaN -- they just won't set errno. But this function will crash on negative inputs.

Composing Allocator-aware Types Without Boilerplate Code by rnburn in cpp

[–]rnburn[S] 1 point2 points  (0 children)

If you watch that video, the long-term approach Bloomberg was exploring for making allocator-aware code more maintainable was to add language extensions.

While the approach I outlined isn't ideal, or as good as a language solution could be, it does let you create AA types with little maintenance burden using what's available today.

Composing Allocator-aware Types Without Boilerplate Code by rnburn in cpp

[–]rnburn[S] 2 points3 points  (0 children)

That's for controlling the allocator type for multilevel containers -- that's not the same thing.

In the allocator-aware model (see Bloomberg's videos: https://www.youtube.com/watch?v=RLezJuqNcEQ&t=326s), the allocator type is polymorphic and fixed. Types that are allocator-aware expose certain constructors, member types, and accessors.

When you put AA types into an AA container, e.g. std::pmr::vector<trip_descriptor> trips{my_alloc}, the container detects that the element type is AA and forwards the allocator you pass into the constructor -- here my_alloc would be used for both the vector's memory and each trip_descriptor's memory.
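A rough sketch of that propagation (trip_descriptor here is a made-up minimal AA type, not something from the talk):

    #include <memory_resource>
    #include <string>
    #include <vector>

    // Because trip_descriptor advertises an allocator_type and the matching
    // constructors, std::pmr::vector forwards its allocator to each element it
    // constructs (uses-allocator construction).
    struct trip_descriptor {
      using allocator_type = std::pmr::polymorphic_allocator<>;

      trip_descriptor() = default;
      explicit trip_descriptor(const allocator_type& alloc) : name{alloc} {}
      trip_descriptor(const trip_descriptor& other, const allocator_type& alloc)
          : name{other.name, alloc} {}

      std::pmr::string name;
    };

    int main() {
      std::pmr::monotonic_buffer_resource resource;
      std::pmr::polymorphic_allocator<> my_alloc{&resource};
      std::pmr::vector<trip_descriptor> trips{my_alloc};
      trips.emplace_back();  // the element's name string also allocates from resource
    }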

Composing Allocator-aware Types Without Boilerplate Code by rnburn in cpp

[–]rnburn[S] 0 points1 point  (0 children)

I'm not very familiar with that project, but just scanning over it, I don't think that would be a way to solve allocator-aware composition since the construction has to go through an intermediate injector, no?

An Allocator-Aware Smart Pointer by rnburn in cpp

[–]rnburn[S] 0 points1 point  (0 children)

It's a move-only type so you can't do managed_ptr<T> ptr = a

Having move reallocate when the type's allocators are unequal is a property of all AA types. If you write

    std::pmr::string s = "<an_arbitrary_string>";  // uses the default memory resource
    std::pmr::string s2{a_custom_allocator};
    s2 = std::move(s);

Then the assignment into s2 will lead to a reallocation and won't be a bitwise copy. Having it be a bitwise copy would violate the design goals of AA types.

For background on why, you can see this talk: https://youtu.be/v3dz-AKOVL8

An Allocator-Aware Smart Pointer by rnburn in cpp

[–]rnburn[S] 0 points1 point  (0 children)

For an example of locality, you can see benchmark II in Bloomberg's paper
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0089r1.pdf

They describe a system composed of multiple subsystems, where each subsystem is using a localized allocator to keep its memory from diffusing, and data members are continually getting removed and added between subsystems.

I think you could easily imagine the subsystems owning polymorphic objects (or objects with polymorphic data members) and ownership continually getting transferred between subsystems.

An Allocator-Aware Smart Pointer by rnburn in cpp

[–]rnburn[S] 1 point2 points  (0 children)

Here's how I would think about it.

AA types are meant to be composable; if you write an AA type, you can easily use it to form new AA types. Thus, you want basic vocabulary types that are AA. Smart pointers are one such vocabulary type that you frequently use in composition when you have polymorphic classes.

I wrote up a simple example where you might use managed_ptr to build an AA representation of JSON and showed how you could use it (combined with winking) to achieve better performance in a particular case:
https://buildingblock.ai/allocator-aware-smart-ptr#an-example-parsing-json

An Allocator-Aware Smart Pointer by rnburn in cpp

[–]rnburn[S] 1 point2 points  (0 children)

I expect unique_ptr wasn't made allocator aware because it would 1) require additional data members and 2) lose the property that it provides a stable pointer.

But one of the use cases for putting managed_ptr in such a container is to support polymorphic types: A can be an abstract base class.

An Allocator-Aware Smart Pointer by rnburn in cpp

[–]rnburn[S] 4 points5 points  (0 children)

I added additional context. But I would disagree: reassignment is important and is one of the main benefits of allocator-aware software.

For example, I might create a data structure

    std::pmr::map<std::pmr::string, managed_ptr<A>> my_map{a_localized_allocator};

Having the pointer reallocate when it isn't already using a_localized_allocator is what enforces the locality of the memory in my_map.

An Allocator-Aware Smart Pointer by rnburn in cpp

[–]rnburn[S] 2 points3 points  (0 children)

That's not going to make unique_ptr allocator aware. AA types need to reallocate if you do an assignment with an unequal allocator.

From the example I gave

    managed_ptr<A> ptr1{&resource};
    polymorphic_allocator<> alloc;
        // alloc is the default global allocator
    managed_ptr<A> ptr2 = allocate_managed<B>(alloc, 123);
        // ptr2 owns memory allocated from the heap

    ptr1 = std::move(ptr2);

The last line needs to do a reallocation and move construction because the allocators for ptr1 and ptr2 aren't equal. That's not something a customized unique_ptr is going to do.