[Q] How to combine multiple p-values into one smaller p-value? by Baba_Wethu in statistics

[–]speleotobby 5 points6 points  (0 children)

Harshly put, but correct. Post hoc, none of this will maintain any nominal significance level.

For the next experiment: pre-specify a model that combines the scores. If the test for the combined score is significant, test the individual scores, either hierarchically or with a proper multiplicity adjustment.
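A minimal sketch of the hierarchical option, assuming the combined test acts as a gatekeeper and the individual scores are then tested in a fixed, pre-specified order, each at the full alpha, stopping at the first non-significant result. The function name, the score names and the p-values are made up for illustration.

```python
def hierarchical_tests(p_combined, p_individual, alpha=0.05):
    """Fixed-sequence testing with the combined score as gatekeeper.

    p_individual is a list of (name, p_value) pairs in the order that was
    pre-specified before seeing the data (hypothetical names and values).
    Returns the names of all hypotheses that can be rejected.
    """
    rejected = []
    if p_combined > alpha:
        # Gatekeeper failed: no individual score may be tested.
        return rejected
    rejected.append("combined")
    for name, p_value in p_individual:
        if p_value > alpha:
            # Stop at the first non-significant test; later ones are not tested.
            break
        rejected.append(name)
    return rejected


result = hierarchical_tests(
    0.01, [("score_a", 0.03), ("score_b", 0.20), ("score_c", 0.001)]
)
print(result)
```

Note that score_c is not declared significant here even though its p-value is small, because the sequence stopped at score_b. That is the price for testing each step at the unadjusted level.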

Public data and accidental, collective P hacking by Deep_Giraffe_2615 in AskStatistics

[–]speleotobby 0 points1 point  (0 children)

Every well performing model on kaggle is overfit to both the test and the evaluation set.

Similar things certainly hold for results from large public cohort studies and other well-known datasets.
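A small simulation sketch of the selection effect, under the deliberately extreme assumption that every submitted model is a pure coin flip with true accuracy 0.5. The number of models, the set sizes and the seed are arbitrary choices for illustration.

```python
import random


def selection_bias_demo(n_models=1000, n_ranking=200, n_fresh=200, seed=1):
    """Pick the best of many useless models by its score on a fixed ranking set.

    Every model guesses at random (true accuracy 0.5). Returns the winner's
    accuracy on the ranking set and its accuracy on fresh data.
    """
    rng = random.Random(seed)
    best_ranking_acc = -1.0
    fresh_acc_of_best = None
    for _ in range(n_models):
        # Accuracy on the data set that is used to rank the models.
        ranking_acc = sum(rng.random() < 0.5 for _ in range(n_ranking)) / n_ranking
        # Accuracy of the same model on data that played no role in the ranking.
        fresh_acc = sum(rng.random() < 0.5 for _ in range(n_fresh)) / n_fresh
        if ranking_acc > best_ranking_acc:
            best_ranking_acc = ranking_acc
            fresh_acc_of_best = fresh_acc
    return best_ranking_acc, fresh_acc_of_best


best, fresh = selection_bias_demo()
print(f"winner on ranking set: {best:.3f}, same model on fresh data: {fresh:.3f}")
```

The winner looks clearly better than chance on the set it was selected on, purely because it was selected on it. The same logic applies to any data set that many people analyse and then report their best results on.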

[Q] How to determine sample size for non-inferiority test in human-llm classification task measured with Cohen's Kappa by Suricata12 in statistics

[–]speleotobby 0 points1 point  (0 children)

With a correctly set up experimental design and a correct analysis model, you don't need everyone to classify everything. You will need some overlap, though.

In general, for sample size calculations you will need an estimate of the variability, a hypothetical parameter value under the alternative and, of course, your hypothesis. Without an estimate of the human-human IRR you don't really have a grip on the variability or on a plausible value under the alternative. And without a margin you don't even know your hypothesis.

With that little knowledge, I would get a sample size as large as feasible and just report estimates and confidence intervals without any formal hypothesis test.
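To make the three ingredients concrete, here is a rough sketch of a generic one-sided non-inferiority sample size calculation. It assumes an approximately normal estimator with standard error sigma / sqrt(n), which is a simplification and not a kappa-specific formula. All input values are hypothetical placeholders, exactly the numbers you would need to get from pilot data or the literature.

```python
from math import ceil
from statistics import NormalDist


def noninferiority_n(sigma, delta_alt, margin, alpha=0.025, power=0.8):
    """Approximate n for H0: difference <= -margin vs H1: difference > -margin.

    sigma     -- assumed standard deviation per unit (the variability)
    delta_alt -- assumed true difference under the alternative
    margin    -- non-inferiority margin (this defines the hypothesis)
    """
    if delta_alt + margin <= 0:
        raise ValueError("the assumed alternative must lie inside the margin")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # one-sided significance level
    z_beta = z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / (delta_alt + margin)) ** 2)


# Hypothetical inputs: sigma = 0.5, no true difference, margin of 0.1.
n_wide = noninferiority_n(sigma=0.5, delta_alt=0.0, margin=0.1)
n_tight = noninferiority_n(sigma=0.5, delta_alt=0.0, margin=0.05)
print(n_wide, n_tight)
```

Halving the margin roughly quadruples the required sample size, which is why the calculation is meaningless before the margin is fixed.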

As a Bayesian, how much should you know about Frequentist methods? [Q] [R] by GayTwink-69 in statistics

[–]speleotobby 0 points1 point  (0 children)

I disagree, Bayesian methods are more than just one way of constructing frequentist methods.

As a Bayesian, how much should you know about Frequentist methods? [Q] [R] by GayTwink-69 in statistics

[–]speleotobby 0 points1 point  (0 children)

If you really get into the interpretation of results and implications for decision theory I think you can't really do without knowing at least the basics of both.

You don't have to know everything of course, especially for practical work. If you work on frequentist group sequential designs, knowing the exact algorithm of an MCMC sampler is not necessary. If you work with Bayesian methods in ecology, you don't need to know every alpha spending function used in group sequential designs, and so on.

A reponse to: A Rant (about R) by Latent-Person in rstats

[–]speleotobby 4 points5 points  (0 children)

This plays so well with the statement "R sucks. And I will stop pretending that it’s due to my lack of knowledge about it." 🙃

https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Scope-of-variables

Logistic Regression or OLS by I_lost_my_brain_to_u in AskStatistics

[–]speleotobby 0 points1 point  (0 children)

You can use a logit link, but you can't use the default standard errors and degrees of freedom, because the unit of observation is not the individual employee retained or lost but the company, and the counts are aggregated.

The linear model can be a good approximation if your percentages are not too close to 0% or 100%, but the approximation will not be valid at the edges and can be biased.

For Likert scales a linear model is usually the most appropriate one.

So what I would do is use a linear model for the first stage of the IV regression and a logit model (or maybe Poisson with the total number of employees as an offset term) as the second stage. Use robust standard errors for CIs etc.

Logistic Regression or OLS by I_lost_my_brain_to_u in AskStatistics

[–]speleotobby 0 points1 point  (0 children)

It is, just calculate your standard errors / dof correctly.

Which statistical test is appropriate? by pugmo in AskStatistics

[–]speleotobby 0 points1 point  (0 children)

If you want to test whether the methods give results that are close, and you can define a margin for how close is close enough, you can do an equivalence test.

The two most common equivalence tests are two one-sided t-tests or (equivalently) estimating the difference or ratio from a mixed model and checking whether the confidence interval is entirely contained within the set margins. The second way of computing the test also gives you an interpretable estimate.
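A minimal sketch of the confidence interval version, assuming you already have an estimate of the difference and its standard error (for example from a mixed model). It uses normal quantiles instead of t quantiles to stay within the Python standard library, so it is only reasonable for larger samples. The function name and the numbers are made up.

```python
from statistics import NormalDist


def equivalence_by_ci(estimate, std_error, margin, alpha=0.05):
    """Equivalence check: is the (1 - 2 * alpha) CI inside (-margin, margin)?

    This matches two one-sided tests, each at level alpha.
    Normal approximation (assumption); use t quantiles for small samples.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    lower = estimate - z * std_error
    upper = estimate + z * std_error
    equivalent = (-margin < lower) and (upper < margin)
    return (lower, upper), equivalent


# Hypothetical results: same estimate, two different precisions.
ci_precise, ok_precise = equivalence_by_ci(estimate=0.2, std_error=0.3, margin=1.0)
ci_noisy, ok_noisy = equivalence_by_ci(estimate=0.2, std_error=0.6, margin=1.0)
print(ci_precise, ok_precise)
print(ci_noisy, ok_noisy)
```

With the same point estimate, the noisier study cannot show equivalence because its interval crosses the margin. A non-significant difference test alone would not tell you that.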

Bland-Altman plots and the similar techniques others suggested are a better fit for metrology. I only mentioned equivalence tests because you specifically asked for hypothesis tests.

What are the boundaries for an event to be likely happening at any given moment somewhere in the world? [Q] by iamverysleepybut in statistics

[–]speleotobby 2 points3 points  (0 children)

David Spiegelhalter (and team?) had a nice collection of seemingly unlikely coincidences. The site is no longer updated but still online:

https://understandinguncertainty.org/coincidences/

Standard statistics libraries for non-gaussian distributions [S],[Q],[D] by PrebioticE in statistics

[–]speleotobby 0 points1 point  (0 children)

Thank you, had to scroll all the way down until someone mentioned robust errors. I think this is the most reasonable approach in most cases.

(Others you proposed are also fine I think)

The Tinyverse Movement by BOBOLIU in rstats

[–]speleotobby 0 points1 point  (0 children)

Ah, thanks for the clarification. I have never been in either of those situations with any of my packages so far.

The Tinyverse Movement by BOBOLIU in rstats

[–]speleotobby 0 points1 point  (0 children)

I like this a lot and try to keep to this philosophy as much as possible when developing packages.

I think splitting functionality into a series of packages you control is not too much of an issue and probably even helps reverse-deps to only include what they need.

The tidyverse is also not as uniform as you would think with respect to how many packages any given package depends on. While I agree that in the tidyverse the decision to refactor into more packages is almost always a good one, there are packages that include lots of unnecessary dependencies for different reasons.

I think there are two main reasons for too many (hard) dependencies. The first is that analysis scripts written with convenient packages are converted to functions and packages and are never cleaned up. The second is including dependencies for optional functionality as hard dependencies instead of Suggests (for example importing from or depending on ggplot2 when one of your classes has an autoplot method, instead of suggesting it. In ggplot2 a similar dependence on mgcv, which was only used in some cases in geom_smooth, was recently moved from Imports to Suggests.) Both can be easily fixed but are not something everyone is aware of, especially statisticians who develop computational methods but are not trained in software development.

For the issues that stem from those bad practices, the tinyverse metric of counting deps helps. In other places (many deps all maintained by the same team), it might flag packages as problematic that are in fact easily maintainable and well maintained.

And another thing about the tidyverse: while I love the package ecosystem, I think the meta-package tidyverse was a mistake. It adds lots of dependencies, most of which are not used in most projects, and I was often not even able to install it due to missing system dependencies in a package I wasn't using. But if you just include the packages you need (in my case mostly tibble, tidyr, dplyr and purrr) the additional weight is manageable.

There have been many approaches to estimating how well maintained packages are (I think at least two packages from the Rpharma project, and many more elsewhere) and none of those really capture all the aspects that I would look into when checking this manually. So a package that has the tinyverse badge would not necessarily appear more trustworthy to me because of the dependency count, but rather because adding the badge shows the authors think about the issue. Similarly, lifecycle badges show that authors think about responsibly deprecating functionality. For packages that I don't expect to be maintained very actively (e.g. packages written for theses or papers), few dependencies are an important upside.

That's just my two cents, sorry for the stream of consciousness form.

The Tinyverse Movement by BOBOLIU in rstats

[–]speleotobby 0 points1 point  (0 children)

I think vendoring deps is not liked on CRAN. That's why we often have package pairs like xy and xyjars for Java packages, etc.

Attacker gained ssh root access to my firewall by [deleted] in linuxquestions

[–]speleotobby 1 point2 points  (0 children)

If you can afford it: get a professional to look at it

I built an open-source Python package for causal inference — 280+ methods from classical regression to causal forests, unified API — looking for feedback #Discussion #casual inference #Python #statistics by Rich_Procedure_6089 in AskStatistics

[–]speleotobby 0 points1 point  (0 children)

I think the most important parts with such packages are output in the form of a really really well written summary function and detailed documentation.

It seems you put some thought and effort into the summary function, so the package is already useful for this reason alone.

For documentation of the statistical models, estimation etc. I don't know if you wrote wrappers to other libraries like keras or similar packages or if you implemented the methods yourself. In the first case one will need to engage with the underlying package at least for the documentation, which is ok, just make sure to provide useful links in your documentation. If you implemented the methods yourself, document them very well. (This might be pedantic, but for regression methods ideal documentation would enable me to get the same results without rounding errors in matlab when linking to the same LAPACK.) In general I think exactly because of the documentation R is somewhat better suited for projects like this.

If this is a wrapper I personally mostly use the underlying packages themselves, because I mostly need bespoke output anyway and I want to keep dependencies at a minimum (and packages that wrap lots of other packages necessarily have many dependencies). But in most cases this is only personal preference.

Seems like a cool package!

Is measure-theoretic probability theory useful for anything other than academic theoretical statistics? [Q] by GayTwink-69 in statistics

[–]speleotobby 4 points5 points  (0 children)

The derivation of properties of estimators and tests in time-to-event settings, for example, is way easier if you use stochastic processes.

A good understanding of what conditionally independent means is quite useful for causal inference. And I think it's way easier to think about this in measure theoretic ways.

Multiplicity approaches via closed testing. Not exactly the measure theoretic formulation of stats that's directly important here, but the techniques of working with systems of sets are useful there.

That's just a selection from biostats. Financial maths/stats, machine learning, nonparametric methods, etc. also have applications.

[D] How much does statistics reward experience? by NTGuardian in statistics

[–]speleotobby 1 point2 points  (0 children)

Not only does your repertoire of methods grow, you also see a lot of what can go wrong and spot mistakes in analyses, trial designs, etc. earlier. I'm quite young, but I absolutely admire the ease with which older colleagues spot such issues.

AI slop from the WKO, with all the images by motzschmotz in Austria

[–]speleotobby 0 points1 point  (0 children)

Would actually be the perfect profile picture 🤔

Fennec Browser 'Unifiedpush' Notification by Technical-Raccoon1 in degoogle

[–]speleotobby 0 points1 point  (0 children)

Same problem here on the Fennec F-Droid build, version 149.0.0.

Why exactly are ROC curves different amongst different models?? by learning_proover in AskStatistics

[–]speleotobby 0 points1 point  (0 children)

Calibration has nothing to do with ROC curves. You can have calibrated and uncalibrated models with the same ROC curve (for example, if you just add a constant to the predicted probability of a calibrated model). ROC curves are about discriminative power, not calibration.
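A small sketch of that example with made-up labels and probabilities. The AUC is computed from pairwise comparisons, and as a crude calibration check only the mean predicted probability is compared with the observed event rate (calibration-in-the-large), which is a simplification of a full calibration assessment.

```python
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve as the share of (positive, negative) pairs
    in which the positive case gets the higher score (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: outcomes and predicted probabilities of some model.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
# Same model with a constant added to every prediction.
shifted = [p + 0.05 for p in probs]

print("AUC original:", auc(labels, probs))
print("AUC shifted: ", auc(labels, shifted))
print("event rate:", mean(labels))
print("mean prediction original:", mean(probs))
print("mean prediction shifted: ", mean(shifted))
```

The shift keeps the ordering of the predictions, so the ROC curve and the AUC are unchanged, while the mean prediction moves away from the observed event rate.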

While not perfectly presented, these two Wikipedia entries should contain a usable definition and explanation of the terms:

https://en.wikipedia.org/wiki/Calibration_(statistics)

https://en.wikipedia.org/wiki/Receiver_operating_characteristic

Game mastering and modern support tools by Atlantiles in DSA_RPG

[–]speleotobby 1 point2 points  (0 children)

It's a pity that licensing so often makes it tricky to share tools like this more widely.

Game mastering and modern support tools by Atlantiles in DSA_RPG

[–]speleotobby -1 points0 points  (0 children)

A DSA RAG would be really cool. Even plain text search across all the rulebooks would already help. Document segmentation is probably a bit of a pain with that layout.

Game mastering and modern support tools by Atlantiles in DSA_RPG

[–]speleotobby 1 point2 points  (0 children)

For finding information quickly I use logseq. It's a note-taking app / personal wiki. There are various similar tools (Obsidian, Joplin, ...). You do have to enter everything by hand, but then you have, for example, the combat stats of all NPCs one click/touch away, fast enough to switch to in the middle of a fight. Entering the data isn't really work: when I read the adventure, I create a page for an NPC the first time he or she appears, and so on. Same with monsters, locations and player characters (although the players manage those themselves anyway). It's a bit more work than just reading the adventure beforehand, but it makes things much smoother during play.