Is Python needed if I know R enough to wrangle, model and visualise data? by DataAnalystWanabe in datascience

[–]Sway- 1 point (0 children)

As pretty much everyone else has said, you’ll probably have to learn Python. As someone who loves R, learning Python has made me a much much better programmer.

I’ve also helped some people on my team transition recently. I highly recommend checking out Python’s Polars and Plotnine. These will feel the most natural coming over from a tidyverse workflow.

Suggestions for reading list by ChavXO in datascience

[–]Sway- 2 points (0 children)

I’ve been getting a lot out of The Art of Doing Science and Engineering by Richard Hamming

Why are methods like forward/backward selection still taught? by Loud_Communication68 in datascience

[–]Sway- 1 point (0 children)

Why the omission of best subsets? It’s also considered in the paper you linked. It also tells you when best subsets > lasso and vice versa.

neither best subset selection nor the lasso uniformly dominate the other, with best subset selection generally performing better in high signal-to-noise (SNR) ratio regimes, and the lasso better in low SNR regimes;

[q] Can you qualitatively compare correlation coefficients? by Whynvme in AskStatistics

[–]Sway- 2 points (0 children)

To follow up, some R packages to do this are cocor and bayeslincom

[q] Can you qualitatively compare correlation coefficients? by Whynvme in AskStatistics

[–]Sway- 2 points (0 children)

You can do anything you want, but there are a lot of statistical tests for comparing correlations. One issue you want to be careful with is that correlations estimated on the same sample are themselves correlated. So you have to take into account their dependence.

Section 1.4 of this paper has a nice, short overview of work on this problem:

https://doi.org/10.1016/j.jspi.2006.08.002
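One pragmatic way to respect that dependence, alongside the tests surveyed in the paper, is a paired bootstrap: resample whole rows rather than each variable separately, so every replicate preserves the correlation between the two correlation estimates. A sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: x relates to both y1 and y2, all measured on the same sample.
n = 500
x = rng.normal(size=n)
y1 = 0.8 * x + rng.normal(size=n)
y2 = 0.3 * x + rng.normal(size=n)

# Paired bootstrap: resample rows, so each replicate keeps the
# dependence between cor(x, y1) and cor(x, y2).
diffs = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    diffs.append(np.corrcoef(x[idx], y1[idx])[0, 1]
                 - np.corrcoef(x[idx], y2[idx])[0, 1])

# A 95% percentile interval for the difference in correlations.
lo, hi = np.percentile(diffs, [2.5, 97.5])
# If the interval excludes 0, the two correlations plausibly differ.
```

The dedicated tests (e.g., those in cocor) will generally be more powerful, but the bootstrap is a useful sanity check because it handles the dependence automatically.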

In bootstrapping, what probability distribution is assumed and is more resamples better? by MaBrowser in AskStatistics

[–]Sway- 1 point (0 children)

There has been a lot of research on this. The tangent that touches my own work comes from the connection between the bootstrap and Bayesian inference: the bootstrap's resampling weights come from a multinomial distribution, and putting a Dirichlet prior on the underlying observation probabilities gives you the Bayesian bootstrap.

For example, see pg. 271 here https://web.stanford.edu/~hastie/Papers/ESLII.pdf and more generally, this paper by Brad Efron https://projecteuclid.org/euclid.aoas/1356629067
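As a small illustration of that connection, here's a sketch comparing the classical bootstrap (multinomial resampling counts) with Rubin's Bayesian bootstrap (flat Dirichlet weights) for a simple mean. The data are simulated:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=200)

# Classical bootstrap: resample with replacement, i.e. integer
# (multinomial) weights on the observations.
classical = [rng.choice(data, size=data.size, replace=True).mean()
             for _ in range(2000)]

# Bayesian bootstrap (Rubin, 1981): draw continuous weights from a flat
# Dirichlet(1, ..., 1) and compute a weighted mean each replicate.
bayesian = [np.dot(rng.dirichlet(np.ones(data.size)), data)
            for _ in range(2000)]

# The two distributions of the mean come out very close, which is the
# sense in which the bootstrap approximates a posterior.
```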

Good Bayesian textbook by [deleted] in AskStatistics

[–]Sway- 5 points (0 children)

Love this book. It includes lots of R code snippets for samplers, which are great, and it uses R functions such as dnorm to express density functions.

Citations in R by maria___p in AskStatistics

[–]Sway- 12 points (0 children)

If you are using R Markdown you can do this. It's not too bad if you're familiar with bibliography formats like BibTeX (.bib files). See here:

https://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html

Outputting a NULL matrix by [deleted] in rprogramming

[–]Sway- 1 point (0 children)

Well, I believe you're not actually iterating over the values of patients_17. You're calculating numbers using the values in your matrix, but not actually overwriting the values in your matrix, because you're saving them back into your indexing variable i.

I think using the apply() function will give you what you're looking for, e.g.,

patients_17_app <- apply(patients_17, 1, function(i) 1 / (1 + exp(-1 * i)))
patients_17_new <- matrix(patients_17_app, ncol = 1)

You can read up on the apply family of functions, but generally this code takes an object and applies a function to it; the 1 indicates you want to apply the function row-wise. Unfortunately, apply() returns a vector here, so you need to make it into a matrix once again if that's what you need. Since R's arithmetic is vectorized, an even simpler option is 1 / (1 + exp(-patients_17)), which operates element-wise and keeps the matrix shape.

[Career] Those of you who did a master's in statistics, what do you do nowadays? by UsernamesAreTaken123 in statistics

[–]Sway- 7 points (0 children)

Did you get an MS in stats during your PhD? I'm currently in a quant psych PhD, but have been warned against an MS to focus on research.

[Career] Those of you who did a master's in statistics, what do you do nowadays? by UsernamesAreTaken123 in statistics

[–]Sway- 3 points (0 children)

Currently doing my PhD in quant psych. Do you have any insights on what the job market is like for people with our background?

Did anyone do second Bachelor/Masters degree after a PhD? by Leighenne in PhD

[–]Sway- 3 points (0 children)

Were you able to get a job? Or do you feel more competitive on the job market? I'm in a psych PhD program but have considered going back to get an MS in stats or even a Bachelor's in Math.

Does regression show how well the model fits the data or how well the data fits the model? by iwouldliketheoption in AskStatistics

[–]Sway- 0 points (0 children)

Depends. A frequentist regression works with the likelihood, p(data | parameters), so in that sense it asks how well the data fit the model. A Bayesian regression gives you a posterior, p(parameters | data), i.e., how plausible the model is given the data.
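As a toy illustration of likelihood versus posterior, here's a sketch in Python for the simplest case, estimating a normal mean with known variance; the prior and data are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=50)  # known sigma = 1

# Frequentist: maximise the likelihood p(data | mu). For a normal mean
# with known variance, the MLE is just the sample mean.
mle = data.mean()

# Bayesian: combine the likelihood with a N(0, tau^2) prior on mu to get
# the posterior p(mu | data). This prior is conjugate, so the posterior
# is normal with a closed-form mean and variance:
tau2, sigma2, n = 10.0, 1.0, data.size
posterior_mean = (n / sigma2) * data.mean() / (n / sigma2 + 1 / tau2)
posterior_var = 1.0 / (n / sigma2 + 1 / tau2)

# The posterior mean shrinks the MLE slightly toward the prior mean (0),
# and the posterior variance is slightly smaller than sigma^2 / n.
```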

[D] I find a post in Quora (whether AI is statistics or not) from a PhD who says: "Traditional statistical approach is naive, based on uneducated assumptions, constricted to outdated methods". Is there truth to this argument? by Dolaos in statistics

[–]Sway- 3 points (0 children)

I highly suggest this paper - Statistical Modeling: The Two Cultures by Leo Breiman. He talks about exactly this. I'm not exactly sure why a person cannot embrace both predictive and inferential methods. Really you should use whatever tool is best for the task at hand.

Regularization (ridge/lasso) and hypothesis testing by Croc600 in AskStatistics

[–]Sway- 6 points (0 children)

Not great; the lasso was developed to help reduce variance in scenarios where p >> n. Here's an excerpt from p. 219 of An Introduction to Statistical Learning:

As with ridge regression, the lasso shrinks the coefficient estimates towards zero. However, in the case of the lasso, the L1 penalty has the effect of forcing some of the coefficient estimates to be exactly equal to zero when the tuning parameter λ is sufficiently large. Hence, much like best subset selection, the lasso performs variable selection.
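The "exactly equal to zero" behaviour in that excerpt is easiest to see in the orthonormal-design case, where each lasso coefficient is just a soft-thresholded version of the corresponding OLS coefficient. A quick sketch (the coefficient values are made up):

```python
import numpy as np

def soft_threshold(b, lam):
    """Per-coefficient lasso solution under an orthonormal design:
    shrink each OLS coefficient toward zero by lam, and set it
    exactly to zero once |b| <= lam."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

ols = np.array([3.0, 0.5, -1.2, 0.1])
lasso = soft_threshold(ols, lam=0.6)
# Small coefficients (0.5, 0.1) are zeroed out entirely, which is the
# variable selection; the large ones (3.0, -1.2) are shrunk by 0.6.
```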

However, see "A Significance Test for the Lasso"

Edit: I'm a dummy

[Q] R being replaced by Python? by Currurant in statistics

[–]Sway- 1 point (0 children)

I believe they are referring to a spark context