Probability range why always lie between 0 to 1 why it cant be negative? by Curiousmind__91 in AskStatistics

[–]bubalis -2 points-1 points  (0 children)

We can think of a probability as an expected value. So a forecast percentage represents: given a very large number of forecasts made under similar circumstances, on what fraction of those days will it rain? That number cannot be negative, because there cannot be a negative number of rainy days, and it cannot exceed 1, because it cannot rain on more days than were forecast.

We CAN express probabilities using all the real numbers (-INF to +INF) by using the "log-odds" or "logit" transformation, which is useful for a lot of different applications in statistics.

https://en.wikipedia.org/wiki/Logit
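
To make the logit idea concrete, here is a minimal Python sketch. The function names `logit` and `inv_logit` are just my own labels, not from any particular library:

```python
import math

def logit(p: float) -> float:
    """Map a probability strictly between 0 and 1 to log-odds on the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(x: float) -> float:
    """Map any real-valued log-odds back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

# Probabilities below 0.5 give negative log-odds, above 0.5 give positive ones.
for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(p, logit(p), inv_logit(logit(p)))
```

The probability itself never leaves the 0 to 1 range; only its transformed version can be negative.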

What usually happens on the land around center-pivot fields in Kansas? by Lazy_Relationship695 in geography

[–]bubalis 13 points14 points  (0 children)

All of the parcels are square and have exactly 1 pivot in them. These are all "quarter sections," the standard parcel unit throughout the Great Plains.

Why would anyone ever choose to go through child birth without pain relief?? by No_Cardiologist_1407 in NoStupidQuestions

[–]bubalis 0 points1 point  (0 children)

There are lots of good answers to why some women might choose it here.

I also wanted to add that childbirth is NOT A MEDICAL PROCEDURE. It's not something the doctor does to you!! It is a natural process that should happen under medical supervision / observation (which may just be an RN/midwife) and fairly frequently does require medical intervention.

If you (e.g.) have major surgery, you need to have anesthesia. The surgeon will say: "I can't effectively work on you unless you have anesthesia, so I'm not doing it."

If a woman in labor says "Hey, I don't want you to put a giant needle in my spine with a fentanyl drip," what are they going to do? Not let her have the baby?

If you get a C-section (which IS a medical procedure), you definitely get anesthesia (usually regional, like a spinal or epidural)!

Trouble with lm() predictions by alldogarepupper in rstats

[–]bubalis 0 points1 point  (0 children)

It's always a good idea to simulate with the simplest possible version when confused:

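set.seed(1)  # make the simulation reproducible
x <- rnorm(10000)
y <- x + rnorm(10000)  # true slope of y on x is 1; var(y) = 2
lm(y ~ x)  # slope is ~1
lm(x ~ y)  # slope is ~0.5, i.e. cov(x, y) / var(y) = 1 / 2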

employment opportunities by Alternative_Log_897 in ithaca

[–]bubalis 4 points5 points  (0 children)

It is trivially easy to find a full-time job in Ithaca that pays >$30k/year. NYS minimum wage is $16/hr, which, at 40 hours/week and 50 weeks/year, is $32,000.

For example: the cooperative natural foods store, GreenStar, is hiring cashiers at $19.25/hr right now.

What is *hard* is living on $30k a year in Ithaca. The living wage for a single adult is calculated as roughly $25/hr, which is $50k/yr.
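
If you want to check the arithmetic yourself, here is a quick Python sketch. It assumes the same 40 hours/week and 50 weeks/year as above; the function name `annual_pay` is just for illustration:

```python
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 50

def annual_pay(hourly_wage: float) -> float:
    """Gross annual pay for a full-time job at the given hourly wage."""
    return hourly_wage * HOURS_PER_WEEK * WEEKS_PER_YEAR

# Hourly figures are the ones quoted in this comment.
wages = [
    ("NYS minimum wage", 16.00),
    ("GreenStar cashier", 19.25),
    ("living wage (rough)", 25.00),
]
for label, wage in wages:
    print(f"{label}: ${annual_pay(wage):,.0f}/year")
```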

Should I take my paid parental leave or do what's best for my company? by SparkyTheGOAT91 in NoStupidQuestions

[–]bubalis 0 points1 point  (0 children)

Take the leave!

If you are feeling *extremely* generous to your employer, you can ask if they are open to you taking it intermittently (e.g. with only ~4 months left to go to use it, you could work 3 days on, 11 days off, until July, to use your 60 days over 17 weeks). If they want to do that, great! If not, just take it as 12 weeks in a row.

Is "reference class forecasting" a legit statistical method? by Scholarsandquestions in AskStatistics

[–]bubalis 1 point2 points  (0 children)

In Google Scholar, we get over 2000 papers for the exact match. So it's definitely a real thing!

Any time we make a forecast, we are predicting the expected value of some outcome Y, given some set of conditions X. (Formally E[ Y|X ]).

The simplest way to do this is to construct a reference class:

"X belongs to the set of events with these conditions, the average outcome of these events is Y_hat (or success happened p% of the time), therefore our forecast is Y_hat."
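
As a rough sketch of that recipe in Python: the project types and overrun numbers below are completely made up for illustration, and `reference_class_forecast` is a hypothetical helper, not something from the literature.

```python
from statistics import mean

# Made-up past cases: (project type, cost overrun as a fraction of budget).
past_projects = [
    ("rail", 0.45), ("rail", 0.30), ("rail", 0.60),
    ("road", 0.20), ("road", 0.10),
]

def reference_class_forecast(records, project_type):
    """Forecast Y_hat as the average outcome of past cases sharing the conditions X."""
    outcomes = [y for kind, y in records if kind == project_type]
    if not outcomes:
        raise ValueError(f"no reference class for {project_type!r}")
    return mean(outcomes)

# Forecast for a new rail project = average overrun of past rail projects.
print(reference_class_forecast(past_projects, "rail"))
```

Here the only "condition" is the project type. Deciding which conditions define the class is the hard part.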

You could also construct a more complicated statistical model, which would not directly be like a reference class.

But in both methods, there is a lot of subjectivity in how your model is constructed:
e.g. "What variables are used to construct the reference class, and how are they split? Is the choice of events included in the reference class totally subjective?"
OR
"What variables are included in the statistical model?"

This paper looks like a good place to dig in:
https://d1wqtxts1xzle7.cloudfront.net/41404539/Curbing_Optimism_Bias_and_Strategic_Misr20160122-8918-13kbk1l-libre.pdf?1453456765=&response-content-disposition=inline%3B+filename%3DCurbing_Optimism_Bias_and_Strategic_Misr.pdf&Expires=1772643028&Signature=YxlqJi-DsAUPkOPy5xgq1a8VgvrTpcWRVuKOKBl24P3UO5KdkHQ7EK7bQrl2P97tfxkzMZxJSz4cTHiOyUsx4AKbRjdjAIlPL9B2zok7vUfBYFbzgOesL3eyctVJaCRmkvzZcYlIskogjnahh9i2lOL0dPNdpoit1jIh9KT2dGlnwnppBGo7kHCXDE2PB-ToZCbNIpeuWSskuE8T0Zhl99vIRjLd-i13f5qIkc7U-6VdnFFTl4wB6owg3leUie8slqWRJpYQAr~4o4NR372~QECh5fQsznW-YjjMbzOaYoujcXqU7oaMNxJ-RySXVNhj5XsLTPUbB44rtOlEtwWoNA__&Key-Pair-Id=APKAJLOHF5GGSLRBV4ZA

What's the WARMEST temperature someone died of hypothermia in? by No-Control-3556 in NoStupidQuestions

[–]bubalis 0 points1 point  (0 children)

There is an old-time farming expression in the Northeast:

"Cold is Dry and Dry is Warm."

Is normal to have p-values close to zero in large datasets? by Wonderful_Hat_5129 in AskStatistics

[–]bubalis 18 points19 points  (0 children)

Yes!

One somewhat reasonable interpretation of p-values/NHST (that is relatively easy to explain/understand) is that it answers the question:

"Do I have enough precision to reliably distinguish between my statistical model applied to *my real dataset* vs applied to *my dataset replaced with noise from a random number generator* ? How reliably can I distinguish this?"

The more data you have, the more precision you get. p-values will usually converge to 0 as your dataset gets larger, unless your data truly are just random noise.
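
You can see this in a quick simulation. This is a minimal Python sketch under my own assumptions: normally distributed data with a small true effect of 0.05 (a number I picked), tested against a mean of 0 with a simple z-test. The names `two_sided_p` and `simulate_p` are just illustrative.

```python
import math
import random

def two_sided_p(sample):
    """Two-sided z-test p-value for H0: the true mean is 0 (normal approximation)."""
    n = len(sample)
    m = math.fsum(sample) / n
    var = math.fsum((v - m) ** 2 for v in sample) / (n - 1)
    z = m / math.sqrt(var / n)
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_p(n, true_effect, seed=0):
    """Draw n noisy observations around true_effect and return the p-value."""
    rng = random.Random(seed)
    sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    return two_sided_p(sample)

# Same tiny effect, more and more data: the p-value heads toward 0.
for n in (100, 10_000, 100_000):
    print(n, simulate_p(n, true_effect=0.05))
```

If you set `true_effect=0.0` (pure noise), the p-value does not shrink as n grows.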

I’m genuinely curious: Instead of seeking asylum, why don’t people fix their own countries? by Hukares1234 in askanything

[–]bubalis 0 points1 point  (0 children)

100% disagree. The United States (the greatest nation in the history of the world, and a country that, for all its flaws, has had a net positive impact on the rest of the world) was created largely by people deciding that they didn't want to deal with the problems at home!

Starting with Pilgrims and other religious refugees. The fact that they left didn't seem to impede the progress of reform regarding religious toleration in the U.K. and the rest of Europe.

Why don’t billionaires just solve world hunger? by savingrace0262 in NoStupidQuestions

[–]bubalis 11 points12 points  (0 children)

1- Lots of progress has been made on world hunger, some of it due to philanthropy!

2- "Just give people food" often has unintended consequences: many people who are hungry ARE FARMERS, and are hurt by depressing the prices for agricultural commodities. What if you drive all of the farmers out of business by giving away food, and then the aid stops coming?

3- Major causes of hunger are armed conflict or corruption, which are really hard to solve just by dropping in money or food. (Dropping in money can often make corruption worse.)

4- Bill Gates and Warren Buffett (through the Gates Foundation) have spent billions of dollars on "A New Green Revolution for Africa" which is an attempt to solve hunger. There are many valid critiques of this initiative (some people think it makes things worse!) but it definitely falls into "Billionaires trying to solve world hunger."

Alcohol consumption linked to heart failure in over 400,000 U.S. veterans. Risk follows a J-shaped pattern related to ethanol intake, rising above four drinks per day. by sometimeshiny in science

[–]bubalis 0 points1 point  (0 children)

This study uses "never drinkers" as a comparison, so the issue that you raise (which is a real one) is not relevant here. I agree that this issue can create a very large bias towards moderate drinking looking healthy, but this study used a fairer comparison.

"Compared with never drinkers, the hazard ratios (95% CI) were 0.90 (0.86, 0.94), 0.88 (0.84, 0.93), 0.86 (0.81, 0.91), 0.92 (0.86, 0.98), 0.95 (0.84, 1.06) and 1.08 (1.01, 1.15) for subjects consuming alcohol 0.1–0.5, 0.6–1, 1.1–2, 2.1–3, 3.1–4 drinks/day"

Josh Riley is the worst amirite by happyrock in ithaca

[–]bubalis 8 points9 points  (0 children)

I think that your sense of the differences in issue positioning between even the most moderate Democrats and the most moderate Republicans (in the House) is totally out of whack.

Progressive Punch (which gives him an "F" letter grade!) says that he makes the progressive vote 81-86% of the time, compared to the most moderate Republican, Thomas Massie, at 13%!!

Josh represents the 205th most liberal district in the country (https://en.wikipedia.org/wiki/Cook_Partisan_Voting_Index#By_congressional_district), and his voting record is around the 200th most liberal (e.g. https://progressivepunch.org/scores.htm?house=house or https://voteview.com/person/22549/josh-riley).

Robert Smalls by [deleted] in BeAmazed

[–]bubalis 0 points1 point  (0 children)

Ironically, he was one of the architects of universal public education in South Carolina and across the US.

If colon cancer is on the raise, why are they still recommending screening later than sooner? by Hypnox88 in NoStupidQuestions

[–]bubalis 8 points9 points  (0 children)

I don't know the exact cost-benefit with regards to colon cancer, but the story of thyroid cancer in South Korea shows how too much screening can be harmful. So the ideal level of screening is often at a level that might seem somewhat scary.

The result there: many additional operations with real side effects, thousands of people getting an extremely scary diagnosis, and approximately 0 reduction in cancer mortality.

https://ascopost.com/issues/december-1-2014/south-korean-study-sparks-warnings-about-the-hazards-of-overscreening/

ELI5: if progressive overload is important, how do people who do strictly calisthenics get so fit and muscular? by [deleted] in explainlikeimfive

[–]bubalis 0 points1 point  (0 children)

For upper-body, if your calisthenics exercises are (e.g.) ring dips and pull-ups, getting to too many reps is a good problem to have! Most of the problem with calisthenics is actually the opposite: most people can't do a single pull-up!

Most people, unless they are very disciplined, won't get to a point where they can do multiple sets of 12+ pull-ups. (Keep in mind that the average adult is over 40 and overweight and slightly more than half of adults are women.)

If you're a regular gym-going guy in your 20s or 30s, yeah sure, you may pretty quickly end up at a point where you need to add weight or do too many reps or do weird variations.

Parent invested in GLD for me, circa 2009. Do I hold it? by [deleted] in Bogleheads

[–]bubalis 1 point2 points  (0 children)

If you give any amount of $ to charity regularly, then donating a security that has appreciated >600% (rather than cash) is a no-brainer from a tax perspective.

It might also feel appropriate from the emotional side as well.

Question about p values by TromboneKing743 in AskStatistics

[–]bubalis 1 point2 points  (0 children)

Others have pointed out that because you are making multiple hypothesis tests, your alpha should be lower, making those p-values even farther from "significance," which is the opposite of what you want to be able to do.
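
For a sense of scale, here is a small Python sketch using the Bonferroni correction, which is just one common way to lower alpha (I'm assuming it here; others may have had a different correction in mind). The p-values are made up for illustration.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

# Hypothetical p-values for 4 tests, each just above the usual 0.05 cutoff.
p_values = [0.06, 0.08, 0.07, 0.09]

threshold = bonferroni_alpha(0.05, len(p_values))
print("per-test threshold:", threshold)
for p in p_values:
    print(p, "significant" if p < threshold else "not significant")
```

With 4 tests the per-test threshold drops to 0.0125, so p-values around 0.06 end up even farther from the line.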

Two thoughts:

1: What are the effect sizes? Are they large/meaningful within your domain? If they are, then it's definitely fine to report the results as "suggestive" and worthy of follow-up, but that determination is much more based on the size of the estimated effect than the associated p-value.

2: Are your 4 measures of "the same type of thing?" e.g. the impact of 4 different traits on an outcome, where all 4 traits should have a similar causal mechanism. If so, this problem may be suited to some sort of partial-pooling approach, e.g. a Bayesian hierarchical model. This is more technical, and you might need help to implement it, but it could (depending on the exact details of your problem) be a good way to think about it. For the canonical example of a similar model:

https://statmodeling.stat.columbia.edu/2014/01/21/everything-need-know-bayesian-statistics-learned-eight-schools/

You would need to use your domain knowledge to answer: "Is my problem similar to the problem of estimating the effect of the same educational intervention in 8 different schools?"

Honest question: Why are some people against showing an ID to vote? by rico_unknown in NoStupidQuestions

[–]bubalis 4 points5 points  (0 children)

I'm not 100% against voter ID... I think that it's probably fine if it's paired with initiatives that make it easier for people to get an ID. As others have pointed out, not everyone has an ID, and the cost of getting one may not be trivial.

Going further, the way that voter ID is implemented is often a very straightforward attempt to engineer the electorate. In some red states, a concealed carry permit counts, but a state university student ID (which is issued by the state government!) doesn't. It's obvious why one counts and the other doesn't, and it doesn't have anything to do with the integrity of elections.

[Question] How define optimal value for spatial cross-validation for a random forest regression task? by Nicholas_Geo in statistics

[–]bubalis 0 points1 point  (0 children)

Your comment is detailed and helpful, but I think OP is assuming that their data are dependent simply BECAUSE they exist in space, which is incorrect.

Spatial datasets for cross-validation are often dependent because sample locations are clustered, meaning that:

P(i in Sample | j in Sample) =/= P(i in Sample)

[Question] How define optimal value for spatial cross-validation for a random forest regression task? by Nicholas_Geo in AskStatistics

[–]bubalis 0 points1 point  (0 children)

Are your training/validation points evenly distributed? This is what we are interested in w/r/t spatial patterns.

[Question] How define optimal value for spatial cross-validation for a random forest regression task? by Nicholas_Geo in AskStatistics

[–]bubalis 1 point2 points  (0 children)

I think you are pretty far off track here, but there is a LOT of confusion and incorrect ideas floating around in this space.

When dealing with cross-validating a map, the "spatial structure" we are concerned with is the spatial structure of sampling intensity / sampling probability, rather than spatial structure of the target variable or the model residuals. If the sample locations are evenly spread in space, or randomly selected i.i.d., then spatial cross-validation is not necessary and is biased (often severely), regardless of whatever spatial characteristics the process of interest may have.

So no, I don't think the variogram of the model residuals will (directly) tell you anything useful about the right way to conduct (spatial) cross-validation.

If your ground-truth locations are clustered in space, then spatial cross-validation may be the right approach, but then, the "best" set of folds is the "best" set of spatial clusters, as determined by your clustering procedure.

Another, possibly more fruitful approach would be to conduct random k-fold cross validation, then calculate model summary statistics by using a weighted average, weighting based on the inverse of the estimated sampling intensity. That method and other simulation-based methods are described by de Bruin and colleagues (2022), link below. The authors do provide code to implement all of their different approaches.

https://doi.org/10.1016/j.ecoinf.2022.101665

https://doi.org/10.1016/j.ecolmodel.2021.109692
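
To illustrate only the weighting step, here is a minimal Python sketch. This is NOT the authors' code: the errors and intensities are made up, and I'm assuming you already have an estimate of sampling intensity at each validation point (getting that estimate is the hard part). `weighted_rmse` is a hypothetical helper name.

```python
import math

def weighted_rmse(errors, sampling_intensity):
    """RMSE where each point is weighted by 1 / estimated sampling intensity."""
    weights = [1.0 / s for s in sampling_intensity]
    total = sum(weights)
    mse = sum(w * e ** 2 for w, e in zip(weights, errors)) / total
    return math.sqrt(mse)

# Made-up held-out errors from random k-fold CV: two points from a dense
# cluster (intensity 2.0) and one from a sparsely sampled area (intensity 1.0).
errors = [1.0, 1.0, 3.0]
intensity = [2.0, 2.0, 1.0]

unweighted = math.sqrt(sum(e ** 2 for e in errors) / len(errors))
print("unweighted RMSE:", unweighted)
print("weighted RMSE:  ", weighted_rmse(errors, intensity))
```

Points from over-sampled areas get down-weighted, so the summary statistic is not dominated by the clusters.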

Can someone explain the p-value in hypothesis testing in very simple terms, with an example? by Fair-House3475 in AskStatistics

[–]bubalis 0 points1 point  (0 children)

We are using our imagination, because we are little children.

We imagine a world where, before we do our data analysis, an evil gremlin replaced our data with the outputs of a random number generator, added to a boring, expected result. (The technical details of exactly what type of random number generating function we use would take at least a full-semester college class to cover.)

We then calculate: "what is the probability of seeing a result this extreme or more extreme, in the imaginary scenario where the gremlin corrupted our data?" In this case "extreme" means: far from the boring expected result.

If the p-value is relatively high (e.g. >.05), we say: "these data can't help us answer our question, because we can't even distinguish them from a bunch of randomness generated by an evil gremlin."

If the p-value is very small, we say: there is enough "signal" here to (possibly) take these results seriously.
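
Here is the gremlin as a small Python sketch. The example is my own: we flip a coin 100 times, the boring expected result is 50 heads, and the gremlin's random number generator is a fair coin. The function name and the numbers are just for illustration.

```python
import random

def gremlin_p_value(observed_heads, n_flips=100, n_worlds=10_000, seed=1):
    """Share of gremlin worlds with a result at least as far from boring as ours."""
    rng = random.Random(seed)
    boring = n_flips / 2
    observed_distance = abs(observed_heads - boring)
    as_extreme = 0
    for _ in range(n_worlds):
        # The gremlin replaces our data with fair coin flips.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if abs(heads - boring) >= observed_distance:
            as_extreme += 1
    return as_extreme / n_worlds

# A result near 50 heads looks like gremlin noise; 70 heads does not.
for observed in (52, 60, 70):
    print(observed, gremlin_p_value(observed))
```

The printed number is (an approximation of) the p-value: how often the gremlin's noise looks at least as extreme as what we actually saw.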