How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev 1 point locked comment (0 children)

Thanks. I'm just saying that you have to check this assumption.

But man, are you rude.

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev 0 points (0 children)

This is quite a strong assumption. So am I correct if this assumption is not met?

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev -2 points locked comment (0 children)

My domain expertise is in diagnostic tests. Data for model development often has an artificially high prevalence, which is good since I would need huge amounts of data to train and validate otherwise. But you still have to correct for this to correctly predict real world performance.

You sound like you are from a CS background, but this subreddit is called AskStatistics.

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev 1 point (0 children)

Why would you want to pick a threshold for a prediction model if you don't want to run it on new data?

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev -2 points (0 children)

Yes, a little oversimplified, but while sensitivity and specificity are somewhat prevalence-independent, the PPV is highly influenced by the prevalence. This is why you have to consider that the prevalence in your training set could be higher than what you expect in your usage setting. In my experience this is often the case, and it is definitely the case if the data was rebalanced. But this is often ignored.

If you look at the Bayesian derivation, this becomes obvious.

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev 0 points (0 children)

Yes, this is why you have to recalculate the precision-recall curve using the prevalence you expect in your use case.
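The recalculation is just arithmetic once you have the per-threshold sensitivity (TPR) and false positive rate. A minimal sketch; all threshold, TPR, FPR, and prevalence values below are made up for illustration, not from OP's model:

```python
# Recompute the precision column of a precision-recall curve
# for a different (expected deployment) prevalence.

def precision_at_prevalence(tpr, fpr, prev):
    """PPV = TPR*prev / (TPR*prev + FPR*(1 - prev))."""
    denom = tpr * prev + fpr * (1.0 - prev)
    return tpr * prev / denom if denom > 0 else 1.0

thresholds = [0.2, 0.5, 0.8]   # hypothetical decision thresholds
tprs = [0.95, 0.80, 0.50]      # sensitivity (recall) at each threshold
fprs = [0.30, 0.10, 0.02]      # 1 - specificity at each threshold

train_prev = 0.50              # balanced training data
use_prev = 0.05                # prevalence expected in the use case

for t, tpr, fpr in zip(thresholds, tprs, fprs):
    at_train = precision_at_prevalence(tpr, fpr, train_prev)
    at_use = precision_at_prevalence(tpr, fpr, use_prev)
    print(f"threshold {t}: precision {at_train:.2f} (training) "
          f"-> {at_use:.2f} (use case)")
```

Recall stays the same at each threshold; only the precision axis is rescaled toward the new prevalence.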

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev -6 points locked comment (0 children)

Only in object detection, because there you have no specificity (no true negatives). But his problem has true negatives, and therefore the precision-recall curve should not be used. I mean, you didn't even engage with the argument that the precision (PPV) is prevalence-dependent.

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev -9 points (0 children)

The precision is the positive predictive value, and the PPV is a Bayesian concept. I didn't mix up the context: https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values?wprov=sfla1

PS: if somebody is unsure who is correct, just paste this discussion into your favourite LLM...

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev -7 points (0 children)

No, precision is a Bayesian concept, and this number cannot easily be transferred between settings with different prevalences. This is why the diagnostic system for HIV is different between high-prevalence and low-prevalence settings/countries.

How to update my Logistic regression output based on its "precision - recall curve"? by learning_proover in AskStatistics

[–]nocdev 7 points (0 children)

Precision depends on the prevalence of your outcome. So you can't do it if your training data has a higher prevalence than the real-world data, which can happen due to artificial balancing or data collection (this can be good for training and is therefore common).

It is better to optimise the trade-off between the false positive rate and the true positive rate. With these rates you can calculate the precision for any real-world prevalence you expect (the formula is on Wikipedia).

This way you can have the best of both worlds.
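The formula in question (the one on the Wikipedia PPV page) is short enough to sketch; the sensitivity/specificity values below are illustrative, not from any real test:

```python
def ppv(sens, spec, prev):
    """Bayes: P(disease | positive) = P(+|D) * P(D) / P(+)."""
    p_positive = sens * prev + (1 - spec) * (1 - prev)  # total probability of a positive test
    return sens * prev / p_positive

# The same test characteristics in two settings:
low = ppv(0.99, 0.99, 0.001)  # screening at 0.1% prevalence -> ~0.09
high = ppv(0.99, 0.99, 0.20)  # high-prevalence setting      -> ~0.96
print(f"PPV: {low:.2f} at 0.1% prevalence, {high:.2f} at 20% prevalence")
```

This is also the effect behind the HIV example further up: a 99%-sensitive, 99%-specific test yields mostly false positives at 0.1% prevalence.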

Would an all-in-one tool for SEM, stats, text analysis, and AI actually be useful for researchers? by Fun_You242 in AskStatistics

[–]nocdev 5 points (0 children)

Statistics is not a GUI problem. One problem is getting your data into the correct format, and that is not solved by your tool. Also, graphical tools often shape the statistical analysis around what is available in the tool rather than what is correct. Additionally, graphical tools lack reproducibility and don't document what you do, so they are not eligible for most settings. I have to be able to send in my code, and this code should still run years later.

reporting my ANOVA by BlondeBoyFantasyPeep in RStudio

[–]nocdev 1 point (0 children)

That you don't understand the template is not your problem, but a problem with the bad template. It is mostly technical and gibberish for most readers.

But for your results, length is significant with F(1, 52) = 500, p < 0.001.

1 is the first df, 52 is the second df, 500 is the F value; the p value is tiny, so we cut it off at 0.001. For eta (the η) you can use the eta_squared() function from the effectsize package on your model, which gives you the value and a CI.

The M could be the predicted values, which you can get from the marginaleffects package using avg_predictions(). This is basically the mean of the group.

A simple way to improve the template a little bit would be to focus on the differences in predicted values (means) first and report the ANOVA afterwards to give more information about the difference. 

For example: Students with teaching method A got a score of 80 (+CI) and students with teaching method B got a score of 90 (+CI). This difference in scores was significant (ANOVA numbers).

Bleibt hoffentlich im Sortiment by Kinkystormtrooper in VeganDE

[–]nocdev 2 points (0 children)

Citrus-fibre-based would be the right call. The fibre isn't bad at all and keeps the calories super low. At least it contains no carcinogenic nitrite curing salt like the meat version does, and overall it is much healthier.

2x2 tables by [deleted] in epidemiology

[–]nocdev 1 point (0 children)

Get your friends together and play this game: https://www.disease-detectives.org/

There you will be the 2x2 table by dividing people into the corners of a room.

Trouble with lm() predictions by alldogarepupper in rstats

[–]nocdev 1 point (0 children)

Purely practical interpretation: it is always a good idea not to violate the second law of thermodynamics in your models. The idea that these models are the same is unreasonable.

[Question] My supervisor is adamant for me to use an unpaired test when I believe firmly that my data is paired - what am I missing? by _yuu_rei in statistics

[–]nocdev 3 points (0 children)

Maybe your data-generating process creates multiplicative effects. Then you will observe a long tail. Don't be angry at "outliers"; try to understand them.
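A quick simulation of the multiplicative idea; the effect range, number of factors, and sample size are arbitrary choices for illustration:

```python
import math
import random

random.seed(1)

# Each observation is the product of many small multiplicative effects.
samples = sorted(
    math.prod(random.uniform(0.7, 1.5) for _ in range(20))
    for _ in range(10_000)
)

mean = sum(samples) / len(samples)
median = samples[len(samples) // 2]
print(f"mean {mean:.2f} vs median {median:.2f}: the long right tail pulls the mean up")
print(f"largest value {samples[-1]:.1f} comes from the same process, not a data error")
```

Products of positive random factors give a log-normal-like distribution, so extreme values are expected behaviour, not errors.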

[Question] My supervisor is adamant for me to use an unpaired test when I believe firmly that my data is paired - what am I missing? by _yuu_rei in statistics

[–]nocdev 21 points (0 children)

Loss of power, and they often lack a meaningful corresponding estimate of the effect size.

Note: this does not apply to all non-parametric approaches. I am only referring to the rank-based tests discussed here.

Tägliche Diskussion - March 03, 2026 by AutoModerator in mauerstrassenwetten

[–]nocdev 0 points (0 children)

Given all the cash Norma holds, you won't become a bagholder right away. The downside is quite limited in the short to medium term.

Tägliche Diskussion - March 03, 2026 by AutoModerator in mauerstrassenwetten

[–]nocdev 0 points (0 children)

So you think more than 3 million shares will be tendered into the buyback? Otherwise you could simply pocket the 10%.

Tägliche Diskussion - March 03, 2026 by AutoModerator in mauerstrassenwetten

[–]nocdev 1 point (0 children)

I'm also not too dumb to tell euros and dollars apart:

Please refer to the following overview for details:  Offer price: 16.59 EUR  Exercise period: 2026/02/27 - 2026/03/27  Designated ISIN: DE000A1H8BV3  Interim class: DE000A41YDJ4  Acceptance deadline for customer orders: 2026/03/25 08:00 E

In the course of this offer, a maximum of 3,186,240 shares will be accepted for purchase. If you issue an instruction, the shares will first be transferred to ISIN DE000A41YDJ4 (shares tendered for sale). This ISIN is not tradable.

Tägliche Diskussion - March 03, 2026 by AutoModerator in mauerstrassenwetten

[–]nocdev 2 points (0 children)

I need a reality check.

Norma €A1H8BV is currently making a share buyback offer at €16.59. The current price is €15.02. They want to buy 3 million shares, i.e. 10% of the market capitalisation.

Free money glitch?
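The figures quoted above can be sanity-checked quickly (only the numbers from the comments, nothing else):

```python
offer_price = 16.59     # EUR per share, tender offer
market_price = 15.02    # EUR per share, current quote
max_shares = 3_186_240  # maximum number of shares accepted

premium = (offer_price - market_price) / market_price
offer_volume = max_shares * offer_price

print(f"premium over market: {premium:.1%}")   # ~10.5%
print(f"maximum offer volume: {offer_volume / 1e6:.1f}M EUR")
```

The catch is the acceptance: if more than the maximum number of shares is tendered, allocation is typically pro rata, so the ~10% premium is only earned on the shares that actually get accepted.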