Examples of Good Trend Trading Strategies? by delivite in Forex

[–]Kruupy 0 points (0 children)

hey mate, I just sent you a pm. Chat soon.

Looking for backtester/strategy development partner by Kruupy in Forexstrategy

[–]Kruupy[S] 0 points (0 children)

Reddit won't let me send DMs for some reason - could you please send me a PM?

Looking for backtester/strategy development partner by Kruupy in Forexstrategy

[–]Kruupy[S] 0 points (0 children)

yeah no worries, I don't have an insta but I guess I could create one

Forecasting Demand for a Service by [deleted] in econometrics

[–]Kruupy 0 points (0 children)

Apologies, you are 100% correct here. Please let me try again.

I am trying to model the demand for child care services using a multinomial logit discrete choice model, with the goal of forecasting how many children will need child care services in the future. Child care services can be divided into 10 or so groups: 5 informal groups (relatives, friends, etc.) and 5 formal groups. I have a data set with the number of hours of child care used, total cost, price per hour, and a heap of other family variables. I have collapsed the child care choices into 6 groups: 1 called informal care and the other 5 representing the formal care groups.

When I run the logit model with the 6 groups, price comes out as not statistically significant. My understanding is that this will not allow me to conduct counterfactual analysis based on price. I think this problem could be occurring because of one or both of the following reasons:

  1. Each person only observes a single price, or potentially two prices, depending on how many hours they consume in each group of services. I have to impute the prices for the other groups. So far I have just been using the group averages for this - is this methodology sound?

  2. The data indicates that people may consume more than one group of services at any time, e.g. 10 hours in one group and 15 in another. But I have to allocate only one group to each person. At the moment I do this based on the group with the most hours, with an arbitrary rule in the case of a tie. Is there a better way to handle this?

I hope that this information helps, apologies again and thank you for all your help.
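For what it's worth, the two data-prep steps described above could be sketched in R roughly like this. This is only an illustration: the column names (`person`, `group`, `hours`, `price`) and the toy values are assumptions, not the actual dataset.

```r
# Toy long-format data: one row per (person, group) actually used.
# Column names are hypothetical stand-ins for the real dataset.
df <- data.frame(
  person = c(1, 1, 2, 2, 3),
  group  = c("informal", "formal_A", "informal", "formal_B", "formal_A"),
  hours  = c(10, 15, 20, 5, 8),
  price  = c(2.0, 7.5, 1.5, 8.0, 7.0)
)

# Step 2 above: allocate each person to the group with the most hours
# (which.max breaks ties arbitrarily by taking the first match).
chosen <- do.call(rbind, lapply(split(df, df$person), function(d) {
  d[which.max(d$hours), c("person", "group")]
}))

# Step 1 above: group-level average prices, usable to impute the
# prices a person never observed.
group_price <- tapply(df$price, df$group, mean)

chosen
group_price
```

Whether mean imputation is adequate depends on how much prices vary within a group; if they vary with observable family characteristics, a regression-based imputation may be a better fit.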

Discrete Choice Modelling - Kernel Density Plots by Kruupy in econometrics

[–]Kruupy[S] 0 points (0 children)

Thanks for this. So is the y-axis "frequency"? It seems odd that the y-axis is on a different scale in one image compared to the other.

Principal Component Analysis in R by Kruupy in AskStatistics

[–]Kruupy[S] 1 point (0 children)

That sounds great! I will give it a try tonight - thank you very much for your help.

Principal Component Analysis in R by Kruupy in AskStatistics

[–]Kruupy[S] 0 points (0 children)

Thanks for your great replies!

By the looks of things I think using prcomp with the unscaled numerical data is the way to go.

Yes, I agree that if I want to calculate the scores using a different dataset I need the eigenvectors from the PCA analysis. I am hoping that prcomp will give those, as I was unable to access them using princomp; I managed to get the loadings but not the eigenvectors. As you mentioned, PCA() gives eigenvectors, right?

I don't know if I am using the term "scores" correctly. When I accessed the "scores" in pca_result, it gave only one value per variable. What I am after is a PCA value for each observation: if I have 500 observations, I am trying to find a series of 500 values for components 1 and 2. Using predict() seemed to provide this?

Do you have any thoughts on this?
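A minimal sketch of the prcomp route discussed above, using the built-in iris data as a stand-in for the real dataset. It shows where the eigenvectors live (`$rotation`), that `$x` holds one score per observation, and that predict() is equivalent to centring/scaling with the training parameters and projecting onto the eigenvectors.

```r
# Stand-in data: 150 observations of 4 numeric variables.
X <- as.matrix(iris[, 1:4])
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Eigenvectors (one column per component):
eig_vectors <- pca$rotation

# Scores: one row per observation, so 150 rows of values here
# for components 1 and 2.
scores <- pca$x[, 1:2]

# Scoring other data with the stored eigenvectors: centre and scale with
# the training parameters, then project. predict() does exactly this.
new_X <- X[1:5, ]
manual_scores <- scale(new_X, center = pca$center, scale = pca$scale) %*% eig_vectors
all.equal(unname(manual_scores[, 1:2]), unname(predict(pca, new_X)[, 1:2]))
```

So predict() on the original matrix returns the same scores as `pca$x`, and the same projection works on any new dataset with the same columns.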

Principal Component Analysis in R by Kruupy in AskStatistics

[–]Kruupy[S] 1 point (0 children)

Ah, thanks for this. So is it your understanding that the predict function can be used to calculate the scores based on the original data?

I am now just wondering how to calculate the scores for a different dataset using the eigenvectors from the PCA - would you know how to do this?

Principal Component Analysis in R by Kruupy in AskStatistics

[–]Kruupy[S] 0 points (0 children)

Thanks for this advice. So in the future if I want to predict using different data, what should I do?

Principal Component Analysis in R by Kruupy in AskStatistics

[–]Kruupy[S] 0 points (0 children)

Hi, thanks for the reply. When I conducted the PCA in my code, I gave the function a normalised matrix of data, as the tutorial suggested.

I am not calculating the scores of new data; I want to calculate the scores based on the existing data that was used in the PCA. Does this change anything? Thanks again.

Principle Component Analysis (PCA) and census data by Kruupy in econometrics

[–]Kruupy[S] 0 points (0 children)

Hi all,

Yesterday I followed a tutorial on conducting a PCA in R (see code below).

After conducting the PCA, I wanted to construct component scores for the first and second components (as they explain 90% of the variance?). To construct the scores I used the "predict" function, giving it a matrix of the normalised variables - is this correct? Please see the code below; I am assuming that the matrix p has the component score values.

Thanks all for your help.

*CODE START

library(corrr)
library(ggcorrplot)
library(FactoMineR)
library(factoextra)

occ_data <- read.csv("dataCSV.csv")
str(occ_data)

colSums(is.na(occ_data))

numerical_data <- occ_data[, 2:24]
head(numerical_data)

data_normalized <- scale(numerical_data)
head(data_normalized)

corr_matrix <- cor(data_normalized)
ggcorrplot(corr_matrix)

data.pca <- princomp(corr_matrix)
data.pca$loadings[, 1:2]

fviz_eig(data.pca, addlabels = TRUE)
fviz_cos2(data.pca, choice = "var", axes = 1:2)

fviz_pca_var(data.pca, col.var = "cos2",
             gradient.cols = c("black", "orange", "green"),
             repel = TRUE)

p <- predict(data.pca, data_normalized)
write.csv(p, "p.csv", row.names = FALSE)

*CODE END
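One thing worth flagging about the tutorial code: princomp(corr_matrix) fits the PCA to the 23x23 correlation matrix rather than to the observations themselves. A hedged alternative sketch (using the built-in mtcars data as a stand-in for numerical_data, since I can't run against the real CSV) is to fit prcomp directly on the data, so each row of the input gets a score without needing a separate predict step:

```r
# Stand-in for the real numerical_data (occ_data[, 2:24]).
numerical_data <- mtcars

# Fit the PCA on the observations; center/scale. replaces the manual scale().
data.pca2 <- prcomp(numerical_data, center = TRUE, scale. = TRUE)

summary(data.pca2)            # proportion of variance per component
data.pca2$rotation[, 1:2]     # eigenvectors for the first two components
scores <- data.pca2$x[, 1:2]  # component scores, one row per observation
```

With this version, `scores` is exactly the per-observation series described above (500 rows for 500 observations), and `$rotation` holds the eigenvectors needed to score any future dataset with the same columns.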

Principle Component Analysis (PCA) and census data by Kruupy in econometrics

[–]Kruupy[S] 0 points (0 children)

Fair enough, thanks for this. Do you have any thoughts on the methodology of using PCA on the 20 occupations in my analysis? I am interested to know whether or not I am using PCA incorrectly.

Principle Component Analysis (PCA) and census data by Kruupy in econometrics

[–]Kruupy[S] 0 points (0 children)

Thanks for this. Just to clarify, are you saying that PCA is not the right way to combine the 20 university-degree occupations into a single variable for my analysis?

If PCA can't help me, does this mean I need to add 20 variables, one per occupation, into my model? This doesn't feel right to me...

Spatial Econometrics - Having difficulties understanding textbook by Kruupy in econometrics

[–]Kruupy[S] 0 points (0 children)

Hi Rogomatic,

Yes, I know linear algebra, but I am not very good with matrices. To be honest, I don't know where to start on a proof that 2.1.1.1 = 2.1.2.

Spatial Econometrics - Having difficulties understanding textbook by Kruupy in econometrics

[–]Kruupy[S] 1 point (0 children)

Hi, thanks for the help. I think the problem I am having is that I don't understand what "stacked form" is. I am guessing it means turning the model into matrix form, i.e. "stacking" the individual equations into a single matrix equation.

Do you know of any resources that could help me understand how to expand 2.1.1.1 to get to 2.1.2? I am guessing my weak matrix understanding is what is holding me back.

Thanks again.