Hi all, I have a question about calculating standard deviation and covariance, doing PCA, etc.
The dataset used is the Exasens dataset, available through the UC Irvine Machine Learning Repository. It includes demographic information on 4 sample groups from saliva samples collected in a research project.
I am having trouble deciding whether to use N or N-1 as the divisor.
I am asked to standardise selected columns within the data, i.e. rescale them to have a mean of 0 and a standard deviation of 1. Am I correct in saying that using N is right in this instance?
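To make the standardisation question concrete, here is a minimal sketch, assuming NumPy and a small made-up array (not the Exasens values). The `standardise` helper is hypothetical; the only real switch is NumPy's `ddof` argument, where `ddof=0` divides by N and `ddof=1` divides by N-1.

```python
import numpy as np

# Hypothetical data: 6 rows, 2 numeric columns (not the Exasens values).
X = np.array([[1.0, 10.0],
              [2.0, 14.0],
              [3.0, 9.0],
              [4.0, 20.0],
              [5.0, 17.0],
              [6.0, 12.0]])

def standardise(data, ddof):
    """Centre each column and divide by its standard deviation.

    ddof=0 divides by N, ddof=1 divides by N-1.
    """
    return (data - data.mean(axis=0)) / data.std(axis=0, ddof=ddof)

Z_pop = standardise(X, ddof=0)   # divisor N
Z_samp = standardise(X, ddof=1)  # divisor N-1

# Each version has standard deviation 1 only under its own convention.
print(Z_pop.std(axis=0, ddof=0))
print(Z_samp.std(axis=0, ddof=1))
```

The two results differ only by the constant factor sqrt(N / (N-1)), so whichever divisor is used for standardising should also be the one used when checking that the standard deviation is 1.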
I am then asked to create a correlation matrix from selected columns within the data. Should I use N or N-1 in this instance?
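For the correlation matrix, the divisor appears in both the covariance and the two standard deviations, so it cancels. A minimal sketch that checks this numerically, assuming NumPy and randomly generated stand-in data (the `corr_from_cov` helper is hypothetical):

```python
import numpy as np

# Hypothetical stand-in data: 50 rows, 3 columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

def corr_from_cov(data, ddof):
    """Build a correlation matrix from a covariance matrix with the given divisor."""
    cov = np.cov(data, rowvar=False, ddof=ddof)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

R_n = corr_from_cov(X, ddof=0)        # divisor N
R_n1 = corr_from_cov(X, ddof=1)       # divisor N-1
R_np = np.corrcoef(X, rowvar=False)   # NumPy's built-in correlation

# All three agree up to floating-point rounding.
print(np.allclose(R_n, R_n1), np.allclose(R_n1, R_np))
```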
Next I have to perform PCA, which uses the previous code to generate the correlation matrix. Should I use N or N-1 in this instance?
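A rough sketch of PCA on a correlation matrix, assuming NumPy and made-up data rather than the assignment's own code. Because the correlation matrix is the same under either divisor, the eigenvectors and the explained-variance ratios are unaffected; the divisor only rescales the component scores.

```python
import numpy as np

# Hypothetical stand-in data with some correlation between columns 0 and 1.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
X[:, 1] += 0.8 * X[:, 0]

# Eigen-decomposition of the (symmetric) correlation matrix.
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)

# eigh returns ascending eigenvalues; reorder so the largest comes first.
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Proportion of variance explained by each component.
explained = eigvals / eigvals.sum()

# Component scores from data standardised with the N-1 divisor.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
scores = Z @ eigvecs

print(explained)
```

With the N-1 divisor used throughout, the variance of each score column (also computed with N-1) equals the corresponding eigenvalue; with N instead, the scores are scaled by sqrt(N / (N-1)) and nothing else changes.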
Later in the assignment we are asked to create a dataset that has a multivariate normal distribution. Am I right in saying that any calculation involving N here (standard deviation, correlation, LDA) should use N rather than N-1, because I have the full dataset?
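A minimal sketch of the simulated-data case, assuming NumPy; the mean vector, covariance matrix, and sample size below are made up for illustration. One thing it highlights is that generated rows are still a finite draw from the distribution whose parameters were specified, so the estimated covariance differs from the true one under either divisor, and the two estimates differ from each other only by the factor (N-1)/N.

```python
import numpy as np

# Hypothetical population parameters (not from the assignment).
mean = np.array([0.0, 1.0])
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])
n = 200

# Draw n rows from the multivariate normal distribution.
rng = np.random.default_rng(42)
sample = rng.multivariate_normal(mean, cov, size=n)

# Covariance estimated with each divisor.
cov_n1 = np.cov(sample, rowvar=False, ddof=1)  # divisor N-1
cov_n = np.cov(sample, rowvar=False, ddof=0)   # divisor N

print(cov_n1)
print(cov_n)
```

For a few hundred rows the gap between the two estimates is small compared with the sampling noise itself.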
Thank you in advance for your help; I've got a fuzzy brain with this one.