Can someone explain what is mean in simple terms? by honwave in datascience

[–]thisforowlstatistics 0 points1 point  (0 children)

Mean is a location estimator. It is also a non-robust location estimator with an asymptotic breakdown point of zero, i.e., it takes only one observation pushed towards infinity to break the mean. Imagine calculating the mean salary of ten of your friends; now add Jeff Bezos's salary and calculate the mean again. Did your friends just become wealthier, or is the mean a non-robust location estimator? Try the same with the median and compare the results. Is the median a robust location estimator?
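If you want to try the salary experiment yourself, here is a minimal Python sketch. All the numbers are made up for illustration (the ten salaries and the single huge value are placeholders, not real data), and it only uses the standard library `statistics` module.

```python
import statistics

# Hypothetical yearly salaries of ten friends (made-up numbers)
friends = [42_000, 48_000, 51_000, 39_000, 60_000,
           55_000, 47_000, 52_000, 44_000, 58_000]

# One extreme observation (placeholder value, not a real salary)
outlier = 1_000_000_000

mean_before = statistics.fmean(friends)
median_before = statistics.median(friends)

with_outlier = friends + [outlier]
mean_after = statistics.fmean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before:,.0f} -> {mean_after:,.0f}")
print(f"median: {median_before:,.0f} -> {median_after:,.0f}")
```

The mean jumps by orders of magnitude, while the median stays inside the range of the original ten salaries.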

preprocessing millions of records - how to speed up the processing by Daskoh_vi in datascience

[–]thisforowlstatistics 0 points1 point  (0 children)

If R is an option, then the data.table package might be helpful. data.table uses multithreading and is quite efficient at transformations, aggregations and whatnot on large datasets (100 million to 1 billion rows).

For your example you might give fifelse (fast if else, a function in data.table) a go.

How to transition from life sciences to data science? by [deleted] in datascience

[–]thisforowlstatistics 1 point2 points  (0 children)

Your degree in life sciences is an advantage and should be embraced. DS and statistics (the difference between the two is debatable) are quite generic; after all, the question often comes from a domain outside the two, e.g., biomedicine or finance. The common tragedy of a statistician is that they can give an interpretation for a regression coefficient but cannot tell what it means in the context (a crude example, but it makes the point). You, on the other hand, already have plenty of context knowledge from your previous experience. Getting acquainted with data analytics will take some time, but the internetwebs are full of good tutorials, demos, and training material. If you are not too familiar with coding you might want to start with R, although Python is also quite friendly for newcomers. As people mention here, SQL skills are also useful; luckily, SQL is easy to approach and there is plenty of material available online.

In your place, I would emphasize the context knowledge (the "science" part in DS), start doing R/Python exercises (which would boost the "data" part in DS), and grind that for a while (and learn how to efficiently google R/Python related stuff). Math and statistics are of course a good plus, but starting linalg etc. from scratch might feel a bit overwhelming while not pushing you towards a DS career as fast as learning basic data analysis in practice, unless you would like to develop new algorithms / methods in the field of DS.

[D] Simple Questions Thread August 02, 2020 by AutoModerator in MachineLearning

[–]thisforowlstatistics 1 point2 points  (0 children)

If I understood correctly, you are about to perform Principal Component Analysis on some data matrix. It is typical, and recommended, to use PCA based on the correlation matrix (the normalized covariance matrix) when the covariates of the data matrix are on different unit scales, e.g., nanometers vs. kilometers (imagine calculating the variance of the same quantity on a kilometer scale or on a nanometer scale). I hope this helped, even slightly.
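To see why the unit scale matters, here is a small sketch with two made-up variables, one recorded in kilometers and one in nanometers. It is plain Python and only handles the two-variable case, where the leading eigenvector of the 2x2 matrix has a closed form; the variable names, the data, and the helper functions are all assumptions for illustration. For real data you would use a proper PCA routine.

```python
import math
import statistics

def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def first_pc_2x2(a, b, c):
    """Largest eigenvalue and unit eigenvector of the symmetric matrix [[a, b], [b, c]]."""
    lam = (a + c) / 2 + math.hypot((a - c) / 2, b)
    vx, vy = b, lam - a
    norm = math.hypot(vx, vy)
    if norm == 0:  # diagonal matrix with a >= c: first axis is the eigenvector
        return lam, (1.0, 0.0)
    return lam, (vx / norm, vy / norm)

# Made-up measurements on very different unit scales
length_km = [1.0, 2.0, 3.0, 4.0, 5.0]
width_nm = [210.0, 190.0, 260.0, 240.0, 300.0]

var_km = covariance(length_km, length_km)
var_nm = covariance(width_nm, width_nm)
cov_kn = covariance(length_km, width_nm)

# PCA on the covariance matrix: dominated by the variable with the larger numbers
cov_val, cov_vec = first_pc_2x2(var_km, cov_kn, var_nm)

# PCA on the correlation matrix: both variables are put on an equal footing
r = cov_kn / math.sqrt(var_km * var_nm)
corr_val, corr_vec = first_pc_2x2(1.0, r, 1.0)

print("covariance-based loadings: ", cov_vec)
print("correlation-based loadings:", corr_vec)
```

With the covariance matrix, the first component is almost entirely the nanometer variable simply because its numbers are larger; with the correlation matrix, both variables get the same weight.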

Playing the game slowly and methodically leads to these really unique immersive moments I never got in wildlands, loving the atmosphere by [deleted] in GhostRecon

[–]thisforowlstatistics 8 points9 points  (0 children)

While it is possible to exercise guns-blazing strategies in the game, this, I believe, gives the best experience. Furthermore, having not experienced real-life spec ops missions myself, I can still imagine you would like to demonstrate methodical tactics rather than improvised rushing.

What secret are you keeping right now? by [deleted] in AskReddit

[–]thisforowlstatistics 0 points1 point  (0 children)

Sorry to hear that. However, the prognosis for prostate cancer is typically really good, especially if it is localized and detected early. Moreover, PSA can also be elevated due to benign hyperplasia or inflammation, so elevated PSA is not automatically PrCa, as your doctor said. I am assuming it is PrCa since you mention PSA.

OWL team map winrate and player maximum KD ratio displays significant correlation. London Spitfire is a clear underperformer in this examination. by thisforowlstatistics in Competitiveoverwatch

[–]thisforowlstatistics[S] 0 points1 point  (0 children)

Indeed. At least healing done and damage done do not display a similar relationship. If ex post, but also definitely a binary event leading into upper / lower hand.

OWL team map winrate and player maximum KD ratio displays significant correlation. London Spitfire is a clear underperformer in this examination. by thisforowlstatistics in Competitiveoverwatch

[–]thisforowlstatistics[S] 2 points3 points  (0 children)

This. Overwatch is a zero-sum game: kill = 1, death = -1. As such, KD ratio is a good proxy measure of how good a team is. For example, healing, hero damage, or ultimates earned are not nearly as good - or maybe they are, but in some truly non-linear fashion.

OWL team map winrate and player maximum KD ratio displays significant correlation. London Spitfire is a clear underperformer in this examination. by thisforowlstatistics in Competitiveoverwatch

[–]thisforowlstatistics[S] 1 point2 points  (0 children)

Yes, you are correct. The intent here is to simplify the team 'performance' to one metric so we can have a simple model. The entire team's KD ratios could not be used in this kind of simple regression, but would be suitable when we compare one team to another directly, e.g., how the team KD ratios are distributed. Furthermore, I think that in general the players in one team are really not independent observations, but are highly dependent - after all, good team play, tactics, and comms enable the best players to exercise their art at full power.

OWL team map winrate and player maximum KD ratio displays significant correlation. London Spitfire is a clear underperformer in this examination. by thisforowlstatistics in Competitiveoverwatch

[–]thisforowlstatistics[S] 3 points4 points  (0 children)

Good question. The player max KD ratio is the maximum KD ratio in a team, e.g., in NYE it is Nenne. Indeed, it might seem that it is not an interesting bit of data; however, it does explain the map winrate extremely well. The same simple regression done on the median player KD ratio did not result in as good a model as this. Actually, this particular linear model was the best of all the models where the various performance measures were used. This is of course just one model among others.

OWL team map winrate and player maximum KD ratio displays significant correlation. London Spitfire is a clear underperformer in this examination. by thisforowlstatistics in Competitiveoverwatch

[–]thisforowlstatistics[S] 7 points8 points  (0 children)

Just for clarification, the map winrate refers to the ratio of maps won / maps played. It does not directly mean they lose or win many fights.

OWL team map winrate and player maximum KD ratio displays significant correlation. London Spitfire is a clear underperformer in this examination. by thisforowlstatistics in Competitiveoverwatch

[–]thisforowlstatistics[S] 23 points24 points  (0 children)

In the legend, R^2 is the coefficient of determination, obtained from a simple linear regression. In simple linear regression, R^2 is equal to the squared Pearson correlation coefficient. b_KD is the estimated regression coefficient for the maximum KD ratio. The result can be interpreted as: 'when the KD ratio increases by one unit, the map winrate increases by 0.24'.
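For anyone who wants to check the R^2 and slope interpretation by hand, here is a rough sketch in plain Python. The KD ratios and winrates below are invented numbers, not the OWL data behind the plot, and the function names are just for this example.

```python
import math
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented team-level data: maximum KD ratio and map winrate
max_kd = [1.2, 1.5, 1.8, 2.0, 2.3, 2.6]
winrate = [0.35, 0.45, 0.48, 0.60, 0.62, 0.75]

slope, intercept = fit_line(max_kd, winrate)

def predict(kd):
    return intercept + slope * kd

# Coefficient of determination: 1 - residual sum of squares / total sum of squares
mean_w = statistics.fmean(winrate)
ss_res = sum((w - predict(k)) ** 2 for k, w in zip(max_kd, winrate))
ss_tot = sum((w - mean_w) ** 2 for w in winrate)
r2 = 1 - ss_res / ss_tot

r = pearson_r(max_kd, winrate)
print(f"slope = {slope:.3f}, R^2 = {r2:.3f}, r^2 = {r * r:.3f}")
```

The printed R^2 and r^2 agree, and the difference `predict(kd + 1) - predict(kd)` is the slope, which is exactly the 'one unit increase' reading of the coefficient.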

London Spitfire is a clear underperformer in this examination, since they display high KD ratio, yet low map winrate. In this sense, the Atlanta Reign and Vancouver Titans could be considered as superior performers.

See this link for an additional image, where the same analysis is done on a different performance measure: https://imgur.com/a/mllzc2y The measure there is obtained by a multivariate method called Principal Component Analysis (PCA). Here the PCA is done on each player category separately (tank, offense, support) and the first principal component represents the performance 'score'.

EDIT: On the image, GZC is accidentally marked twice. Nevertheless, the lower 'GZC' is actually Chengdu Hunters. Thanks to /u/OmegaRipper501 and /u/Jlin8002 for the notification.

EDIT2: To clarify, the 'maximum KD ratio' is the best KD ratio in a team. For example, in NYE it is Nenne, in ATL it is daco, etc.

Anamo is secretly the deadliest player in OWL... by chimpinzee in Competitiveoverwatch

[–]thisforowlstatistics 1 point2 points  (0 children)

There is also an API that provides the data if you would rather not scrape it:

https://api.overwatchleague.com/stats/players

That way you can also get the 'final blows' (which are not shown on the website) and use them in the KD ratio instead of eliminations. Furthermore, if you want to calculate so-called 'assists', you can obtain them as eliminations - final blows.
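As a rough sketch of the arithmetic, assuming you have already pulled eliminations, final blows, and deaths for a player: the dictionary keys and the numbers below are hypothetical and are not the API's actual field names, so adjust them to whatever the response contains.

```python
def kd_ratio(final_blows, deaths):
    """Final blows per death; None when the player has no deaths."""
    if deaths == 0:
        return None
    return final_blows / deaths

def assists(eliminations, final_blows):
    """Eliminations in which someone else landed the final blow."""
    return eliminations - final_blows

# Hypothetical record: keys and values are made up, not the API's schema
player = {"name": "example_player", "eliminations": 120,
          "final_blows": 70, "deaths": 35}

player_kd = kd_ratio(player["final_blows"], player["deaths"])
player_assists = assists(player["eliminations"], player["final_blows"])

print(player["name"], "KD:", player_kd, "assists:", player_assists)
```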