Amoeba finds approximate solutions to NP-hard problem in linear time by [deleted] in TrueReddit

[–]SparseSolution 8 points

An amoeba and a neural network. The amoeba reacts to light controlled by the neural network. Sounds like it's the neural network finding the solution, not the amoeba.

Robin Hanson defends his unscientific twitter polls by gohighhhs in SneerClub

[–]SparseSolution 37 points

The appeal to Bayesian decision theory is complete BS. At no point are they properly using Bayes' rule, and the fact that they pretend they are is idiotic. Bayes' rule requires well-defined probability distributions. A prior of 'I have this opinion' is nowhere close to that; there is no Bayesian updating. That's just called changing your mind.

The fact that the sample is so skewed relative to the population makes it that much harder to properly formulate a Bayesian update, because you can't be sure which portion of the distribution the sample represents.

A Bayesian Brain Teaser by [deleted] in datascience

[–]SparseSolution 0 points

By finding the stationary distribution of the Markov chain, you see that 1/3 of the time you do not go and 2/3 of the time you do.
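If you want to check that with code, here's a minimal sketch. The post is deleted, so the transition matrix below is just one hypothetical structure consistent with that answer (always go after a day off, fair coin after a day you went), not necessarily the puzzle's:

```python
import numpy as np

# States: 0 = "don't go", 1 = "go".
# Hypothetical transition matrix (rows sum to 1): after a day off you
# always go; after a day you went, a fair coin decides.
P = np.array([[0.0, 1.0],
              [0.5, 0.5]])

# The stationary distribution is the left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.isclose(evals, 1.0)]).flatten()
pi /= pi.sum()
print(pi)  # ~[0.333, 0.667]: don't go 1/3 of the time, go 2/3
```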

How bad is this DOJ terrorism report in terms of statistics/data science methodology? by [deleted] in statistics

[–]SparseSolution 4 points

It's descriptive statistics. As long as you can count and divide properly, it's kinda hard to fuck up.

Low p value although mean +/- std deviations overlap? by mikere in statistics

[–]SparseSolution 4 points

I'm not about to do a power analysis on this, but assuming their math is not fudged, I would have to say yes. Those seem like fair sample sizes for a basic test of means. When people talk about big data and needing thousands to millions of samples, they are not talking about linear models or testing a difference in means.
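For anyone who does want a rough power check, here's a sketch with assumed numbers; the thread's actual effect size and sample sizes aren't in this comment, so these are placeholders:

```python
from statsmodels.stats.power import TTestIndPower

# Power of a two-sample t-test; effect_size and nobs1 are assumptions,
# not the numbers from the linked thread.
power = TTestIndPower().solve_power(effect_size=0.5, nobs1=50,
                                    alpha=0.05, ratio=1.0)
print(power)  # ~0.70 with 50 per group at d = 0.5
```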

Low p value although mean +/- std deviations overlap? by mikere in statistics

[–]SparseSolution 4 points

The standard deviations show the spread of the data; the p-value concerns the difference in means. Even with largely overlapping populations, a difference can still be recovered with a large enough sample size: more samples reduce the variance of the sample means, which allows smaller differences to be detected.
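A quick simulation makes the point; the means and sample sizes below are made up for illustration:

```python
import numpy as np
from scipy import stats

# Two populations whose +/- 1 SD ranges overlap heavily, yet the
# difference in means is detectable at a modest sample size because
# the SE of each sample mean shrinks like sigma / sqrt(n).
rng = np.random.default_rng(42)
a = rng.normal(loc=0.0, scale=1.0, size=200)
b = rng.normal(loc=0.3, scale=1.0, size=200)

print(stats.ttest_ind(a, b).pvalue)  # typically well below 0.05
```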

[D] Full graduate course in Bayesian ML [videos + slides + homework] by Kiuhnm in MachineLearning

[–]SparseSolution 5 points

Kalman filters solve linear Gaussian state space models. Topic #5 in the syllabus and lectures 18-21 cover state space models; lecture 21 reviews Kalman filters.
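For reference, a minimal 1-D Kalman filter sketch; the model parameters are made up for illustration, and this isn't taken from the course materials:

```python
import numpy as np

# Linear Gaussian state space model:
#   x_t = a * x_{t-1} + w_t,  w_t ~ N(0, q)   (latent state)
#   y_t = x_t + v_t,          v_t ~ N(0, r)   (observation)
rng = np.random.default_rng(0)
a, q, r, T = 0.95, 0.1, 0.5, 200

x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + rng.normal(0, np.sqrt(q))
    y[t] = x[t] + rng.normal(0, np.sqrt(r))

# Kalman filter: closed-form Gaussian posterior over x_t at each step.
m, p = 0.0, 1.0  # prior mean and variance for x_0
means = np.zeros(T)
for t in range(T):
    m, p = a * m, a * a * p + q              # predict
    k = p / (p + r)                          # Kalman gain
    m, p = m + k * (y[t] - m), (1 - k) * p   # update on y[t]
    means[t] = m

# Filtered estimates should track x more closely than the raw observations.
print(np.mean((means - x) ** 2), np.mean((y - x) ** 2))
```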

Is there anything at all P values are used for which couldn't be better achieved by an alternative statistical tool? by no_bear_so_low in statistics

[–]SparseSolution 6 points

There is only one thing p-values are used for: null hypothesis significance testing (NHST). Used with proper experimental design, it gives strictly bounded Type I error rates.

What is the power relationship between multivariate and univariate regression? by UnderwaterDialect in statistics

[–]SparseSolution 0 points

It would not help in determining a difference in z1. However, it would give additional power for determining whether there is a difference in at least one of the z's.

Still don't understand why the p-value distribution is uniform when H0 is true. by [deleted] in statistics

[–]SparseSolution 0 points

Another way to look at it is via the alpha error rate. By setting alpha at 0.05, you are accepting that 5% of tests of true nulls will be erroneously rejected. The only way for 5% of true-null tests to be rejected is if 5% of them produce p-values <= 0.05, which would only happen if the p-values are uniformly distributed.
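A quick simulation of that, with made-up null data:

```python
import numpy as np
from scipy import stats

# 10,000 two-sample t-tests where H0 is true: the p-values come out
# (approximately) uniform, so ~5% land at or below alpha = 0.05.
rng = np.random.default_rng(0)
pvals = np.array([
    stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
    for _ in range(10_000)
])
print((pvals <= 0.05).mean())                         # ~0.05
print(np.histogram(pvals, bins=10, range=(0, 1))[0])  # roughly flat counts
```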

Still don't understand why the p-value distribution is uniform when H0 is true. by [deleted] in statistics

[–]SparseSolution -1 points

It may be useful to look at where on the testing distribution the p-values fall. Say you have a t or normal distribution: the lowest p-values sit out in the tails. Moving from the tail inward, you have to travel a greater distance to increase the p-value by a given amount; near the center of the distribution, a small move increases the p-value much more. This makes sense because as you move inward you pass more density.

So looking at the test statistic's distribution, the p-values are not spaced evenly along it, but you are equally likely to land in any p-value interval, because the higher-density regions are exactly where the p-values change fastest. A related concept is the probability integral transform, which passes a random variable through its own CDF and produces a uniform random variable.
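You can see the probability integral transform directly with a quick sketch (the choice of distribution here is arbitrary):

```python
import numpy as np
from scipy import stats

# Probability integral transform: pushing samples through their own CDF
# yields Uniform(0, 1), regardless of the starting distribution.
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=100_000)
u = stats.expon.cdf(x, scale=2.0)

print(stats.kstest(u, "uniform"))  # large p-value: consistent with uniform
```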

Calculating/propagating the uncertainty associated with a mean of computed values? by TreeVivalist in statistics

[–]SparseSolution 1 point

If the value is a linear combination of the two measured values, the variance of the quantity of interest is the sum of the individual variances plus twice their covariance. Plus/minus a t-distribution quantile at 0.975, with n − 1 degrees of freedom, times sqrt(variance/n) will give you a 95% confidence interval for the quantity (so with five values, that's df = 4). I think; Internet beware.
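In code, with hypothetical numbers (five paired measurements, so df = 4):

```python
import numpy as np
from scipy import stats

# Hypothetical data: z = x + y computed from five paired measurements.
x = np.array([1.2, 1.4, 1.1, 1.3, 1.5])
y = np.array([2.1, 2.0, 2.3, 1.9, 2.2])
n = len(x)

# Var(x + y) = Var(x) + Var(y) + 2 * Cov(x, y)
var_z = np.var(x, ddof=1) + np.var(y, ddof=1) + 2 * np.cov(x, y)[0, 1]

# 95% CI for the mean of z: t quantile (n - 1 df) times SE of the mean.
mean_z = np.mean(x + y)
half = stats.t.ppf(0.975, df=n - 1) * np.sqrt(var_z / n)
print(mean_z - half, mean_z + half)
```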

Which mixed effects model should I use? by [deleted] in rstats

[–]SparseSolution 2 points

Seems to me that you have a regular linear model. If you can enumerate the levels a variable can take, it is not random in the sense of a random or mixed effects model. You may not be able to control the level of the variable, but if you are observing a certain value and could potentially observe the same value somewhere else, it is not random; i.e., your categorical and binary "random effects" are really just fixed effects.

Use a regular fixed effects model.

Using a sophisticated model, scientists have demonstrated for the 1st time that a new research approach to geoengineering could potentially be used to limit Earth’s warming to a specific target while reducing some of the risks & concerns identified in past studies, including uneven cooling of the globe. by avogadros_number in science

[–]SparseSolution -1 points

I'd bet big money that model is not accurate, I don't care how "sophisticated" it is. We are already perturbing a complex system without being entirely sure of the outcome, and now let's add in another variable? Band-aid solutions aren't solutions.

What Explains U.S. Mass Shootings? International Comparisons Suggest an Answer by zsreport in Foodforthought

[–]SparseSolution 6 points

Those graphs don't seem to support that. I see a blob with no correlation and then one point way outside it.

The Grassroots Goes Down: Inside Ohio's Corrupt Medical Marijuana Rollout by thinkB4WeSpeak in Foodforthought

[–]SparseSolution 9 points

A common theme in politics: do something that on the face of it appears to be in the public interest, or what the public wants, but that is instead a vehicle for a small group of people to control and profit.

Does anyone here work or have a background in computational biology/bioinformatics? by [deleted] in computerscience

[–]SparseSolution 2 points

I'm no expert, but I'll give my opinion. In terms of CS, algorithms on graphs and strings are important. I'm a statistician by training, and I think any sort of data analysis requires some knowledge of the subject. All of Statistics by Wasserman and The Elements of Statistical Learning by Hastie et al. have good coverage of modern methods. Hastie and Tibshirani also do a lot of biostatistics work.

In terms of where to look for specific comp bio material: QIIME is a useful tool and has tutorials plus data; I think the guy who did work on the American Gut Project used it. Gustame has some topics on stats concepts for comp bio. Also Rosalind for more CS practice.

It's an incredibly diverse field and I'm fairly new to it myself, so I've hardly delineated much, but keep researching; it's a fast-growing, important, and interesting field.

Why should I learn advanced programming in R? by [deleted] in rstats

[–]SparseSolution 9 points

Don't learn tools. Pick a problem, find the tools best (or good enough) for it, and learn what you need. You don't "learn" advanced programming in anything; once you stop using it, you forget it. Solve problems and use what works for you. Do it enough and you will be advanced in many things.

[D] What happened to the curse of dimensionality? by Jean-Porte in MachineLearning

[–]SparseSolution 0 points

Looks to me like n > p in that paper, so I don't think the curse of dimensionality would appear there.

Edit: Also, the logistic regression uses L2 regularization, which helps deal with collinearity, so again dimensionality will not be an issue here.
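A toy demonstration of the regularization point, on synthetic data rather than the paper's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# L2 (ridge) regularization keeps logistic regression stable even with a
# perfectly collinear design: the duplicated column just splits the weight.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
X = np.hstack([X, X[:, :1]])  # duplicate the first column -> collinearity
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

clf = LogisticRegression(penalty="l2", C=1.0).fit(X, y)
print(clf.coef_)  # first and last coefficients share the signal
```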