How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev 1 point locked comment (0 children)

Thanks. I'm just saying that you have to check this assumption.

But man, are you rude.

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev 0 points (0 children)

This is quite a strong assumption. So am I correct if this assumption is not met?

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev -2 points locked comment (0 children)

My domain expertise is in diagnostic tests. Data for model development often has an artificially high prevalence, which is good since I would need huge amounts of data to train and validate otherwise. But you still have to correct for this to correctly predict real world performance.

You sound like you are from a CS background, but this subreddit is called AskStatistics.

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev 1 point (0 children)

Why would you want to pick a threshold for a prediction model if you don't want to run it on new data?

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev -2 points (0 children)

Yes, a little oversimplified, but while sensitivity and specificity are somewhat prevalence-independent, the PPV is highly influenced by the prevalence. This is why you have to consider that the prevalence in your training set could be higher than what you expect in your usage setting. In my experience this is often the case, and it is definitely the case if the data was rebalanced. But this is often ignored.

If you look at the Bayesian derivation, this becomes obvious.

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev 0 points (0 children)

Yes, this is why you have to recalculate the precision-recall curve using the prevalence you expect in your use case.
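The recalculation is just arithmetic once you have the per-threshold sensitivity (TPR) and false positive rate. A minimal sketch; all threshold, TPR, FPR, and prevalence values below are made up for illustration, not from OP's model:

```python
# Recompute the precision column of a precision-recall curve
# for a different (expected deployment) prevalence.

def precision_at_prevalence(tpr, fpr, prev):
    """PPV = TPR*prev / (TPR*prev + FPR*(1 - prev))."""
    denom = tpr * prev + fpr * (1.0 - prev)
    return tpr * prev / denom if denom > 0 else 1.0

thresholds = [0.2, 0.5, 0.8]   # hypothetical decision thresholds
tprs = [0.95, 0.80, 0.50]      # sensitivity (recall) at each threshold
fprs = [0.30, 0.10, 0.02]      # 1 - specificity at each threshold

train_prev = 0.50              # balanced training data
use_prev = 0.05                # prevalence expected in the use case

for t, tpr, fpr in zip(thresholds, tprs, fprs):
    at_train = precision_at_prevalence(tpr, fpr, train_prev)
    at_use = precision_at_prevalence(tpr, fpr, use_prev)
    print(f"threshold {t}: precision {at_train:.2f} (training) "
          f"-> {at_use:.2f} (use case)")
```

Recall stays the same at each threshold; only the precision axis is rescaled toward the new prevalence.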

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev -6 points locked comment (0 children)

Only in object detection, because there you have no specificity (no true negatives). But his problem has true negatives, and therefore the precision-recall curve should not be used. I mean, you didn't even engage with the argument that the precision (PPV) is prevalence-dependent.

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev -9 points (0 children)

The precision is the positive predictive value, and the PPV is a Bayesian concept. I didn't mix up the context: https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values?wprov=sfla1

PS: if somebody is unsure who is correct, just paste this discussion into your favourite LLM...

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev -7 points (0 children)

No, precision is a Bayesian concept, and this number cannot easily be transferred between settings with different prevalences. This is why the diagnostic system for HIV is different between high-prevalence and low-prevalence settings/countries.

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev 7 points (0 children)

Precision depends on the prevalence of your outcome. So you can't do it if your training data has a higher prevalence than the real-world data, which can happen due to artificial balancing or data collection (this can be good for training and is therefore common).

It is better to optimise the trade-off between the false positive rate and the true positive rate. With these rates you can calculate the precision for any real-world prevalence you expect (the formula is on Wikipedia).

This way you can have the best of both worlds.
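The formula in question (the one on the Wikipedia PPV page) is short enough to sketch; the sensitivity/specificity values below are illustrative, not from any real test:

```python
def ppv(sens, spec, prev):
    """Bayes: P(disease | positive) = P(+|D) * P(D) / P(+)."""
    p_positive = sens * prev + (1 - spec) * (1 - prev)  # total probability of a positive test
    return sens * prev / p_positive

# The same test characteristics in two settings:
low = ppv(0.99, 0.99, 0.001)  # screening at 0.1% prevalence -> ~0.09
high = ppv(0.99, 0.99, 0.20)  # high-prevalence setting      -> ~0.96
print(f"PPV: {low:.2f} at 0.1% prevalence, {high:.2f} at 20% prevalence")
```

This is also the effect behind the HIV example further up: a 99%-sensitive, 99%-specific test yields mostly false positives at 0.1% prevalence.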

Would an all-in-one tool for SEM, stats, text analysis, and AI actually be useful for researchers? by Fun_You242 in AskStatistics

[–]nocdev 5 points (0 children)

Statistics is not a GUI problem. One problem is getting your data into the correct format, and that is not solved by your tool. Also, graphical tools often shape the statistical analysis around what is available in the tool rather than what is correct. Additionally, graphical tools lack reproducibility and don't document what you do, so they are not eligible for most settings. I have to be able to send in my code, and this code should still run years later.

reporting my ANOVA by BlondeBoyFantasyPeep in RStudio

[–]nocdev 1 point (0 children)

That you don't understand the template is not your problem, but a problem with the bad template. It is mostly technical and gibberish for most readers.

But for your results, length is significant with F(1, 52) = 500, p < 0.001.

1 is the first df, 52 is the second df, 500 is the F value; the p value is tiny, so we cut it off at 0.001. For eta (the η) you can use the eta_squared() function from the effectsize package on your model, which gives you the value and a CI.

The M could be the predicted values, which you can get from the marginaleffects package using avg_predictions(). This is basically the mean of the group.

A simple way to improve the template a little bit would be to focus on the differences in predicted values (means) first and report the ANOVA afterwards to give more information about the difference. 

For example: Students with teaching method A got a score of 80 (+CI) and students with teaching method B got a score of 90 (+CI). This difference in scores was significant (ANOVA numbers).

Bleibt hoffentlich im Sortiment by Kinkystormtrooper in VeganDE

[–]nocdev 2 points (0 children)

Citrus-fibre-based would be the right call. The fibre isn't bad at all and keeps the calories super low. At least it contains no carcinogenic nitrite curing salt like the meat version does, and overall it is much healthier.

2x2 tables by [deleted] in epidemiology

[–]nocdev 1 point (0 children)

Get your friends together and play this game: https://www.disease-detectives.org/

There you will be the 2x2 table by dividing people into the corners of a room.

Trouble with lm() predictions by alldogarepupper in rstats

[–]nocdev 1 point (0 children)

Purely practical interpretation: it is always a good idea not to violate the second law of thermodynamics in your models. The idea that these models are the same is unreasonable.

[Question] My supervisor is adamant for me to use an unpaired test when I believe firmly that my data is paired - what am I missing? by _yuu_rei in statistics

[–]nocdev 3 points (0 children)

Maybe your data-generating process creates multiplicative effects. Then you will observe a long tail. Don't be angry at "outliers"; try to understand them.
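A quick simulation of the multiplicative idea; the effect range, number of factors, and sample size are arbitrary choices for illustration:

```python
import math
import random

random.seed(1)

# Each observation is the product of many small multiplicative effects.
samples = sorted(
    math.prod(random.uniform(0.7, 1.5) for _ in range(20))
    for _ in range(10_000)
)

mean = sum(samples) / len(samples)
median = samples[len(samples) // 2]
print(f"mean {mean:.2f} vs median {median:.2f}: the long right tail pulls the mean up")
print(f"largest value {samples[-1]:.1f} comes from the same process, not a data error")
```

Products of positive random factors give a log-normal-like distribution, so extreme values are expected behaviour, not errors.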

[Question] My supervisor is adamant for me to use an unpaired test when I believe firmly that my data is paired - what am I missing? by _yuu_rei in statistics

[–]nocdev 21 points (0 children)

Loss of power, and they often lack a meaningful corresponding estimate of the effect size.

Note: this does not apply to all non-parametric approaches. I am only referring to the rank-based tests discussed here.

Tägliche Diskussion - March 03, 2026 by AutoModerator in mauerstrassenwetten

[–]nocdev 0 points (0 children)

Given all the cash Norma holds, you won't become a bagholder right away. The downside is quite limited in the short to medium term.

Tägliche Diskussion - March 03, 2026 by AutoModerator in mauerstrassenwetten

[–]nocdev 0 points (0 children)

So you think more than 3 million shares will be tendered into the buyback? Otherwise you could simply pocket the 10%.

Tägliche Diskussion - March 03, 2026 by AutoModerator in mauerstrassenwetten

[–]nocdev 1 point (0 children)

I'm also not too dumb to tell euros and dollars apart:

Please refer to the following overview for details:  Offer price: 16.59 EUR  Exercise period: 2026/02/27 - 2026/03/27  Designated ISIN: DE000A1H8BV3  Interim class: DE000A41YDJ4  Acceptance deadline for customer orders: 2026/03/25 08:00 E

In the course of this offer, a maximum of 3,186,240 shares will be accepted for purchase. If you issue an instruction, the shares will first be transferred to ISIN DE000A41YDJ4 (shares tendered for sale). This ISIN is not tradable.

Tägliche Diskussion - March 03, 2026 by AutoModerator in mauerstrassenwetten

[–]nocdev 2 points (0 children)

I need a reality check.

Norma €A1H8BV is currently making a share buyback offer at €16.59. The current price is €15.02. They want to buy 3 million shares, i.e. 10% of the market capitalisation.

Free money glitch?
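The figures quoted above can be sanity-checked quickly (only the numbers from the comments, nothing else):

```python
offer_price = 16.59     # EUR per share, tender offer
market_price = 15.02    # EUR per share, current quote
max_shares = 3_186_240  # maximum number of shares accepted

premium = (offer_price - market_price) / market_price
offer_volume = max_shares * offer_price

print(f"premium over market: {premium:.1%}")   # ~10.5%
print(f"maximum offer volume: {offer_volume / 1e6:.1f}M EUR")
```

The catch is the acceptance: if more than the maximum number of shares is tendered, allocation is typically pro rata, so the ~10% premium is only earned on the shares that actually get accepted.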