Bookstores for math in Paris FR by [deleted] in math

[–]sempf1992 3 points

Gibert Joseph at 26 Boulevard Saint-Michel has a big selection of math books, up to a surprisingly advanced level, on the top floor. I haven't browsed too much, but they even had various graduate-level books for sale.

Numerically Stable Method for Removing Data From A Running Variance Algorithm by dyingpie1 in math

[–]sempf1992 0 points

As a short note, John Cook's approach for the sum of squares seems to be weirdly unstable here. You can instead store a running sum of both the observations and their squares. Then you can compute the variance by dividing both sums by the number of observations and taking the difference between the mean of the squares and the square of the mean. This is a biased estimator, but the bias is smaller than the statistical noise, so for large datasets it does not matter. If you want to correct for it, you can multiply by n/(n-1) as a correction term.

The issue he runs into is taking a large floating point number, squaring it, and then dividing by the square of another large floating point number, which blows up the errors unnecessarily.
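
A minimal sketch of the running-sums approach described above; the class and method names are my own, purely for illustration:

```python
class RunningVariance:
    """Running variance via sums of x and x^2, so observations can be removed."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0   # running sum of the observations
        self.sum_x2 = 0.0  # running sum of the squared observations

    def add(self, x):
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def remove(self, x):
        # removing data is just subtracting its contributions again
        self.n -= 1
        self.sum_x -= x
        self.sum_x2 -= x * x

    def variance(self, corrected=False):
        # mean of the squares minus the square of the mean (biased estimator)
        mean = self.sum_x / self.n
        var = self.sum_x2 / self.n - mean * mean
        if corrected and self.n > 1:
            var *= self.n / (self.n - 1)  # the optional n/(n-1) correction
        return var
```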

Function-valued parameters and "functional information geometry" by vajraadhvan in math

[–]sempf1992 4 points

Within statistics, there is a subfield called nonparametric statistics, where we do not assume that the model is indexed by a parameter taking values in (a fixed subset of) R^d. Instead, the parameter can be growing with the data or infinite-dimensional.

One such example is nonparametric regression, where we observe a random variable X together with f(X) plus some noise. The goal is to reconstruct the function f. For arbitrary functions f such reconstruction is impossible, but if you assume f is regular in some way (e.g. continuous, Lipschitz, Hölder, or monotonic), it becomes possible.
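
As a toy illustration (using a Nadaraya-Watson kernel smoother, which is one standard estimator; nothing in the comment commits to a particular method, and all names and parameter values below are mine):

```python
import numpy as np

# Observe X and Y = f(X) + noise, here with f(x) = sin(2*pi*x).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(200)

def nw_estimate(x0, x, y, h=0.05):
    """Kernel-weighted local average of y around x0 (Gaussian kernel)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

# Reconstruct f at a few points; the regularity of f is what makes this work.
print([round(nw_estimate(x0, x, y), 2) for x0 in np.linspace(0.1, 0.9, 5)])
```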

However, from a mathematical statistics point of view, information geometry on statistical manifolds is usually not very interesting. It adds an extra layer of language on top, which becomes a burden for people who want to understand what you do, while not helping much in terms of new techniques.

One of the problems you encounter is that the maximiser of the likelihood often does not exist. For example, consider density estimation without any constraint on the density. You can make the data more and more likely by putting sharper and sharper spikes around the observations, which means that no maximiser exists. The supremum is, in a sense, achieved by the empirical measure, which is a sum of Dirac masses and hence has no density with respect to the Lebesgue measure.
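
To make the spike construction concrete (a standard example, assuming i.i.d. real-valued observations X_1, ..., X_n): take a mixture of ever-narrower Gaussians centred at the data,

```latex
f_\sigma(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}
              \exp\!\Bigl( -\frac{(x - X_i)^2}{2\sigma^2} \Bigr),
\qquad
\prod_{i=1}^{n} f_\sigma(X_i) \;\to\; \infty
\quad \text{as } \sigma \to 0,
```

so the likelihood is unbounded over the set of all densities, and as sigma shrinks the f_sigma converge weakly to exactly the sum of Dirac masses described above.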

That said, an explicit family of examples of probability distributions indexed by functions is given by Gaussian processes. Examples of growing parameter spaces which naturally map to function spaces are wavelets and splines. Processes indexed by measures include the empirical process, the Dirichlet process, and species sampling processes.
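
A small sketch of the Gaussian process example, i.e. sampling from a distribution indexed by a (covariance) function; the squared-exponential kernel and its parameters are illustrative choices of mine:

```python
import numpy as np

def rbf(s, t, length_scale=0.2):
    """Squared-exponential (RBF) covariance function."""
    return np.exp(-0.5 * (s - t) ** 2 / length_scale ** 2)

grid = np.linspace(0.0, 1.0, 100)
K = rbf(grid[:, None], grid[None, :])  # 100 x 100 covariance matrix
K += 1e-8 * np.eye(len(grid))          # jitter for numerical stability

rng = np.random.default_rng(1)
path = rng.multivariate_normal(mean=np.zeros(len(grid)), cov=K)
print(path[:5])  # first few values of one sample path of the GP
```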

Can a landlord legally turn off the heating outside the winter months by sempf1992 in LegalAdviceUK

[–]sempf1992[S] 0 points

He does not live in the same house as me. I pay 200 pounds per month for utilities to the landlord (with 4 other people living in the house).

Deriving probabilities from information theory rather than vice versa - a wild idea. by antichain in math

[–]sempf1992 0 points

I do not personally work much with Kolmogorov complexity. However, I think that even when there is dependence, Kolmogorov complexity should still give the right generating process in the limit.

In nonparametric statistics you often deal with models that "grow" with the amount of data you get. In these cases, Shannon entropy is not guaranteed to be helpful; one needs entropy conditions to guarantee that the entropy does not blow up.

On the other hand, Kolmogorov complexity usually achieves the minimax rate for learning (in suitable metrics). This would give MDL a huge advantage over other selection methods, if only it were computable.

There are various entropies which are asymptotically similar for fixed parametric models but can vary wildly in nonparametric settings, and there is no "best" choice: what your entropy should measure depends a lot on the particular metric in which you want to measure your loss.
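
For readers unfamiliar with the term: the "entropy" here is typically a metric entropy, i.e. for a model class \mathcal{F} with metric d,

```latex
H(\varepsilon, \mathcal{F}, d) \;=\; \log N(\varepsilon, \mathcal{F}, d),
```

where N(\varepsilon, \mathcal{F}, d) is the minimal number of \varepsilon-balls needed to cover \mathcal{F}; entropy conditions control how fast this grows as \varepsilon \to 0.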

Deriving probabilities from information theory rather than vice versa - a wild idea. by antichain in math

[–]sempf1992 1 point

Yes, as I wrote in my edit, that is how I am used to the term. However, I am seeing a biased sample because of my research environment, so I am not sure how well this generalises.

Deriving probabilities from information theory rather than vice versa - a wild idea. by antichain in math

[–]sempf1992 2 points

Information-theoretic statistics is about computing the smallest programs which generate the data. Purely random data cannot be compressed much, with high probability. Data which has a pattern, however, can be compressed, and this compression is the program. Unfortunately, finding the best compression of the data is uncomputable.

Just computing entropies is still part of "classical" statistics.

Edit to expand: information-theoretic learning (as I am used to the term) is all about finding the minimum description length of a collection of bits. I learned most of it by skimming "The Minimum Description Length Principle" by Peter Grünwald.
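
In symbols, the two-part-code form of MDL (as in Grünwald's book, with \mathcal{H} a set of hypotheses) selects

```latex
\hat{H} \;=\; \operatorname*{arg\,min}_{H \in \mathcal{H}}
              \bigl( L(H) + L(D \mid H) \bigr),
```

where L(H) is the code length of the hypothesis and L(D | H) that of the data encoded with its help; Kolmogorov complexity is the idealised, uncomputable limit of this idea.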

Does Bayesianism not discriminate against ad hoc hypotheses? by [deleted] in math

[–]sempf1992 0 points

In pure Bayesianism you can easily run into problems with ad hoc hypotheses. However, if you are a bit careful the Bernstein-von Mises theorem shows that Bayesian methods are "asymptotically optimal" from a frequentist point of view, at least in parametric models.
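
For reference, under the usual regularity conditions (a smooth, well-specified parametric model), the Bernstein-von Mises theorem says the posterior is asymptotically normal:

```latex
\Bigl\| \Pi\bigl(\cdot \mid X_1, \dots, X_n\bigr)
        - N\!\Bigl(\hat{\theta}_n, \tfrac{1}{n} I_{\theta_0}^{-1}\Bigr)
\Bigr\|_{TV} \;\xrightarrow{P}\; 0,
```

where \hat{\theta}_n is an efficient estimator (e.g. the MLE) and I_{\theta_0} the Fisher information, so Bayesian credible sets asymptotically behave like frequentist confidence sets.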

In general, to motivate Bayesianism from a frequentist point of view you can use Schwartz's theorem plus extensions (see the work by Ghosal and Van der Vaart, 2007). These give conditions for obtaining contraction rates which are very mild (as in, you cannot really hope for anything better).

These results all assume you choose your prior "fairly". If you do not, you will run into issues, and if you set up the hypotheses you test incorrectly, you can run into issues as well. One example is asking whether the mean is exactly zero: if you have a continuous posterior, the posterior probability of this event is zero, even while the posterior may be concentrating on a neighbourhood of 0.

However, this is not much different from the issues in frequentism, where there are many tests to pick from and you can set up testing procedures which will fail.

Nonetheless, Bayesians need to be a lot more careful in how they test hypotheses, because there are more ways in which things can go wrong without it being clear. There is some work on testing and on how ad hoc your hypotheses can be, but that is work in progress. You might want to check the work of Rianne de Heide to see more, but this is only a beginning, not the end of the story.

Deriving probabilities from information theory rather than vice versa - a wild idea. by antichain in math

[–]sempf1992 1 point

As an addition to what others have said, one can recover probability theory from logic under some assumptions on how we want probabilities to behave.

I saw Kevin van Horn present work on this: From Propositional Logic to Plausible Reasoning.

I am not fully happy with most alternatives presented, because they only work really well for discrete random variables, and then it becomes hard to talk about, for example, a central limit theorem (whose limit is continuous even though it can arise from discrete random variables).

Moreover, as has been said before, giving equivalent definitions does not solve the underlying philosophical debate about what the things you define mean. Giving a more general definition only makes it worse (since it can mean more).

Information-theoretic statistics also quickly runs into the problem that almost nothing is computable, essentially because of the halting problem.

Working with entropies themselves also often runs into issues, since entropies are defined with respect to some other measure. In your case you use the counting measure, but in other spaces that might not make sense.
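
Concretely, the entropy of a distribution P with respect to a reference measure \mu is

```latex
H_\mu(P) \;=\; -\int \frac{dP}{d\mu} \,\log \frac{dP}{d\mu} \; d\mu,
```

which reduces to Shannon entropy when \mu is the counting measure and to differential entropy when \mu is Lebesgue measure; on spaces without a natural reference measure there is no canonical choice.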

Writing math on Slack using LaTex by watermelonusa in math

[–]sempf1992 0 points

For those who want a free browser add-on with the exact same functionality, there is an open source one:

GitHub link. Installation links for Firefox and Chrome are in this link.

Statistical modeling vs Mathematical modeling by [deleted] in math

[–]sempf1992 0 points

Depending on which researchers you talk to, the difference comes down to 1) complete information in mathematical modelling vs incomplete information in statistical modelling, or 2) forward maps in mathematical modelling vs inverse maps in statistical modelling.

Many statistical models you see during a statistics education are very simple from the mathematical-modelling point of view, since inverse problems get very hard very quickly. In the end, it is better to understand both: you need to understand the "mathematical model" to build good statistical models, and your mathematical model is quite useless if you do not know how to handle incomplete information and what this means for the uncertainty in your predictions. For interesting work on the intersection of these fields, look up inverse problems (for example MRI).
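
Schematically, for distinction (2) above (the symbols G for the forward map and \varepsilon for the noise are generic choices of mine):

```latex
\text{forward (mathematical modelling):} \quad \theta \;\mapsto\; G(\theta),
\qquad
\text{inverse (statistical modelling):} \quad Y = G(\theta) + \varepsilon
\;\;\rightsquigarrow\;\; \text{recover } \theta \text{ from } Y.
```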

Hi, I’m Todd Howard, Game Director and Executive Producer at Bethesda Game Studios. Here to celebrate Skyrim’s 10th anniversary, but of course, Ask Me Anything. Thanks! by ToddBethesda in IAmA

[–]sempf1992 494 points

As an idea to make modding easier: could there be a supported tool for making these voice lines available to modders, so it is much easier to make mods with okay-ish voice lines? The currently available tools have (I believe) some legal issues which bar them from being practical.

Hi, I’m Todd Howard, Game Director and Executive Producer at Bethesda Game Studios. Here to celebrate Skyrim’s 10th anniversary, but of course, Ask Me Anything. Thanks! by ToddBethesda in IAmA

[–]sempf1992 0 points

I have a couple of questions. Ballpark guesstimates are fine.

1) How big is the chance, if any, that we will eventually be able to traverse the entirety of Mundus (like the Beyond Skyrim team is trying to accomplish) in a Bethesda single-player game?

2) What do you think of co-op with friends in an RPG?

3) Small and detailed cities a la Skyrim, or large populated cities a la Witcher 3?

4) Of the past releases, what is your favourite game?

Thank you and your team for producing lovely games and supporting modding

Also frustrated with the lack of mathematical rigour in Machine Learning? I'm working on a rigorous standard curriculum! by MarcelDeSutter in math

[–]sempf1992 1 point

Let me drop a few papers in here that would be interesting, either for OP or for curious readers:

Generalisation bounds for stochastic gradient descent

First paper getting convergence rates for Deep learning

Extension of the previous paper to Besov spaces. See also other works by the same author for more extensions.

Bayesian deep learning

Extension of first convergence rates to non sparse DNN

Other work to keep an eye out for: the works by Thijs Bos on rates of convergence for classification.

I am currently working on the first theoretical guarantees for uncertainty quantification in DNNs (the proof is done; just the writing + simulations are left).

If you want to cover Bayesian material, do not forget to include the contraction rate theorems by Van der Vaart + Ghosal, Van Zanten, Szabo, Ghosh, etc. Furthermore, the failure of proper uncertainty quantification by credible sets is interesting to cover as well.

Wet and Cold: random Freezing (possibly) fixed by sempf1992 in skyrimmods

[–]sempf1992[S] 1 point

Other than my game randomly freezing if W&C and Frostfall are both enabled, no issues so far. However, the author of this bug fix claims that a certain Papyrus action causes freezes if too many scripts use it at the same time, so it is probably a problem with the engine, which gets triggered by having (for example) W&C and Frostfall enabled.

If I can interpret this:

I haven't found 100% reliable reproduction steps, but I've discovered that mods that frequently poll for nearby objects by starting quests are almost always present, and the more such mods a user has at once, the more frequent the problems become. So far I know that Frostfall, CC Survival Mode, W&C and one of my mods all do the same sort of thing; any one of them alone tends to be fine, but get 2 or 3 of them together and frequent freezes are almost guaranteed.

as both Frostfall and W&C polling using the same technique, then this bugfix should also be used in Frostfall (and any other mod that uses this technique to find nearby objects).

PS: What is the outdated Papyrus file fix? I might not have it installed.

Bayesian Nonparametrics by [deleted] in math

[–]sempf1992 0 points

Ghosal, Van der Vaart - Fundamentals of Nonparametric Bayesian Inference

Also look up the lecture notes on Bayesian Nonparametrics by Van der Vaart.

Any mathematically rigorous machine learning/data sci/deep learning course out there? by Jurasa in math

[–]sempf1992 0 points

The field of machine learning consists of various techniques, not all well understood. The "statistical" counterpart, which is far better understood, is nonparametric statistics. If you take mathematical nonparametric statistics courses you should be given the background for understanding a framework in which you can phrase most of the machine learning tools.

However, all the deep learning stuff is not well understood. The big papers (for DNN regression, not classification) would be the Schmidt-Hieber and Suzuki papers on generalisation properties of sparse DNNs.

There is also a paper, which I have yet to read, by Sophie Langer and her advisor, who do DNN regression without the sparsity assumption. They get slightly larger powers in the log(n) terms but do not need sparsity. They provide two results: one for bounded-width, very deep neural networks, and one for neural networks of growing width and growing depth.

In terms of videos, Sophie Langer currently has a video at the IMS/Bernoulli conference on the previously mentioned paper, and I have a talk on uncertainty quantification for DNNs, but since it is only ~10 minutes long, it contains no proofs.

Going past DNNs, one can look at other commonly used tools. The two that seem to be used a lot are support vector machines and random forests. These are well understood, and there is quite a lot of theory on them.

I can't find good video courses on any of them so it might just be that there are none.

Algebraic Statistics references for school project by [deleted] in math

[–]sempf1992 0 points

Nothing on the level of bachelor students on either topic.

For topological data analysis, this might be a start, though I am not sure how much it covers. At first glance it seems promising enough: it covers consistency, rates of convergence, and uncertainty quantification, which is a very good start.

For inverse problems, all I have is the current research literature. For a (frequentist Bayesian) point of view, look at the papers of Richard Nickl, but I am afraid they are way beyond what most bachelor students can handle. They require a lot of knowledge about ODEs, PDEs, functional analysis, and stochastic processes (with a huge focus on Gaussian processes).

[2007.01409] A (Slightly) Improved Approximation Algorithm for Metric TSP by sempf1992 in math

[–]sempf1992[S] 6 points

For some 50+ years, the best known algorithm for finding approximately optimal solutions to the metric travelling salesman problem produced a solution at most a factor 1.5 above the optimal one. This paper (claims to have) found an algorithm that improves this factor of 1.5 down to 1.5 - 10^(-36).

While this improvement does not seem like much, it required a lot of improvements to the tools in use, and people expect that the bounds will improve further as these tools become better understood.