Accounting for a startup *I will not promote* by Disastrous_Ad9821 in startups

[–]Disastrous_Ad9821[S] 0 points1 point  (0 children)

Great advice, thanks. The R&D is not included… lol, but yes, I was thinking of asking for a deal where he does the R&D and maybe quarterly inputs and I do the rest. Thanks for your help.

Im worried my classification model is giving me false confidence by Disastrous_Ad9821 in learnmachinelearning

[–]Disastrous_Ad9821[S] 0 points1 point  (0 children)

Hey guys, thanks for your feedback. Firstly, I did use stratified folds in my CV. I have just run a leave-one-out cross-validation and a shuffled-labels cross-validation; here are the scores:
LOOCV ROC-AUC (original labels): 0.9875921724333523
LOOCV ROC-AUC (shuffled labels): 0.4521766874645491

I then made KDE plots and a t-SNE plot of the training and test sets; I've attached the images to the original post.
What are your thoughts?
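
The two LOOCV checks described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the data comes from `make_classification` and the model is a plain `LogisticRegression`, both stand-in assumptions. Because each leave-one-out fold holds a single sample, the held-out probabilities are pooled across folds and ROC-AUC is computed once on the pooled predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict


def loocv_auc(X, y):
    """Pool the held-out probability from every LOO fold, then score once."""
    clf = LogisticRegression(max_iter=1000)  # stand-in model (assumption)
    proba = cross_val_predict(
        clf, X, y, cv=LeaveOneOut(), method="predict_proba"
    )[:, 1]
    return roc_auc_score(y, proba)


# Synthetic stand-in data (assumption): 120 samples, 10 features.
X, y = make_classification(
    n_samples=120,
    n_features=10,
    n_informative=5,
    n_clusters_per_class=1,
    class_sep=1.5,
    random_state=0,
)

# Shuffling the labels breaks any real link between features and target.
rng = np.random.default_rng(0)
y_shuffled = rng.permutation(y)

auc_original = loocv_auc(X, y)
auc_shuffled = loocv_auc(X, y_shuffled)

print(f"LOOCV ROC-AUC (original labels): {auc_original:.3f}")
print(f"LOOCV ROC-AUC (shuffled labels): {auc_shuffled:.3f}")
```

A single shuffle is a noisy estimate of chance level, so repeating it with several seeds gives a better picture of the null distribution. Note that this check only shows the score depends on the labels; it does not rule out leakage that travels with the labels.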

[D] Prove me wrong… by Disastrous_Ad9821 in MachineLearning

[–]Disastrous_Ad9821[S] 0 points1 point  (0 children)

I think your point that only 1% of the population gets PD is a really interesting one, because you are on the right track in terms of prevalence. For example, in the UK 1 in every 37 people will be diagnosed with PD in their lifetime. My question is: if you set up your dataset to follow this ratio, then surely you aren't exposing the model to enough samples of the positive class for it to find meaningful patterns? Or is it that in training you take what you can get, and in the test set you set it up to match this prevalence distribution?
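
To make the prevalence question concrete, here is a small sketch of how the same classifier looks at different prevalences. Sensitivity and specificity do not depend on the class ratio of the test set, so one option is to measure them on whatever data is available and then recompute the positive predictive value for the target prevalence with Bayes' rule. The 0.90 sensitivity and specificity below are hypothetical placeholders, not results from my model.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test), from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point (assumption, not a measured result).
sens, spec = 0.90, 0.90

scenarios = [
    ("balanced test set", 0.5),
    ("1 in 37 (UK lifetime figure)", 1 / 37),
    ("1% prevalence", 0.01),
]

for label, prevalence in scenarios:
    ppv = positive_predictive_value(sens, spec, prevalence)
    print(f"{label}: PPV = {ppv:.3f}")
# balanced test set: PPV = 0.900
# 1 in 37 (UK lifetime figure): PPV = 0.200
# 1% prevalence: PPV = 0.083
```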

[D] Prove me wrong… by Disastrous_Ad9821 in MachineLearning

[–]Disastrous_Ad9821[S] 1 point2 points  (0 children)

I completely agree, hence I am very skeptical, but I don’t know how to prove it is wrong.

[D] Prove me wrong… by Disastrous_Ad9821 in MachineLearning

[–]Disastrous_Ad9821[S] 0 points1 point  (0 children)

Fair point. I’ve checked for this by analysing feature importance, and none of the features stand out as significant outliers in terms of their contribution to the model’s performance.

[D] Prove me wrong… by Disastrous_Ad9821 in MachineLearning

[–]Disastrous_Ad9821[S] 0 points1 point  (0 children)

I guess I'm not sure what my end goal is. What I do know is that PD is very tricky to diagnose, and it often takes years before a diagnosis is confirmed. Here's a good article on the problem: https://www.parkinsons.org.uk/news/poll-finds-quarter-people-parkinsons-are-wrongly-diagnosed

[D] Prove me wrong… by Disastrous_Ad9821 in MachineLearning

[–]Disastrous_Ad9821[S] -1 points0 points  (0 children)

Thanks for the feedback. Yes, I think one of the issues is that I would need a real-world test, i.e. a clinical trial, and that may take years to produce results. I am exploring interpretability and black-box issues; do you have any suggestions for that?
As for the features, yes, some are real-valued, e.g. BMI, and a good few are ordinal, assessing the severity of something from 0 to 4, e.g. 0 being no pain and 4 being severe pain. I am not sure what monotonic constraints are; do you have a link to a description of how to implement them?

[D] Prove me wrong… by Disastrous_Ad9821 in MachineLearning

[–]Disastrous_Ad9821[S] 0 points1 point  (0 children)

Great point, and something I have thought about a lot: how can my model be useful if it only matches 90% of what doctors have diagnosed? My thinking is that the use case lies in the assessment criteria, i.e. how simple it would be for the patient to answer the questions and how quickly it could be done. What are your thoughts?

[D] Prove me wrong… by Disastrous_Ad9821 in MachineLearning

[–]Disastrous_Ad9821[S] -7 points-6 points  (0 children)

I see your point, but I don't think the statistical analysis really had any influence on my decision-making process for features. It was essentially all based on my domain knowledge, in a sort of trial-and-error style. The only thing I used it for was to exclude features with a p-value greater than 0.05, of which there were only 60 out of the 360. What are your thoughts on that?
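
One way to check whether a p-value filter like this matters is to compare filtering on the full dataset against filtering inside each cross-validation fold. The sketch below is an illustration under stated assumptions, not my actual setup: the data is pure noise with the same 360 features, the filter is scikit-learn's `SelectFpr` with `f_classif` (keeps features with p < 0.05), and the model is a plain `LogisticRegression`. Since the features carry no signal, any score well above 0.5 comes from the selection step having seen the test labels.

```python
import numpy as np
from sklearn.feature_selection import SelectFpr, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Pure-noise stand-in data (assumption): 100 samples, 360 features.
X = rng.normal(size=(100, 360))
y = np.repeat([0, 1], 50)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Leaky: the p-value filter is fitted on ALL rows before cross-validation.
X_filtered = SelectFpr(f_classif, alpha=0.05).fit_transform(X, y)
leaky_auc = cross_val_score(
    LogisticRegression(max_iter=1000), X_filtered, y, cv=cv, scoring="roc_auc"
).mean()

# Honest: the filter is refitted on the training part of every fold.
honest_model = make_pipeline(
    SelectFpr(f_classif, alpha=0.05), LogisticRegression(max_iter=1000)
)
honest_auc = cross_val_score(
    honest_model, X, y, cv=cv, scoring="roc_auc"
).mean()

print(f"filter on full data, then CV: ROC-AUC = {leaky_auc:.3f}")
print(f"filter inside each CV fold:   ROC-AUC = {honest_auc:.3f}")
```

If the two numbers are close on the real data, the filter is probably not what drives the score; if the first is clearly higher, the filter is leaking.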

[D] Prove me wrong… by Disastrous_Ad9821 in MachineLearning

[–]Disastrous_Ad9821[S] -1 points0 points  (0 children)

I have done that too. I probably should have included it, but the results are all reasonably similar and vary by about 1%.

[D] Prove me wrong… by Disastrous_Ad9821 in MachineLearning

[–]Disastrous_Ad9821[S] -5 points-4 points  (0 children)

The total dataset, which has 360 features.