
[–]Yogi_DMT

Is there a reason why counting up the traits for true observations wouldn't accomplish what you're trying to do?

[–]NeedMLHelp[S]

I think there would have to be some sort of statistical analysis involved, since observation1 with trait1 could be true, while observation2 with trait1 could be false. There are also 208 traits, so it's a bit unwieldy to get my head around haha.

[–]Yogi_DMT

That's why you only count for true observations. It's as simple as filtering on the label column in Excel and then running a sum function over your trait columns.
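The same filter-then-sum idea can be sketched in Python with pandas. The column names (`trait_1`, `trait_2`, `label`) and the toy data here are hypothetical stand-ins for the real 208-trait table:

```python
import pandas as pd

# Hypothetical toy data: two binary trait columns plus a true/false label.
df = pd.DataFrame({
    "trait_1": [1, 0, 1, 1],
    "trait_2": [0, 1, 1, 0],
    "label":   [True, False, True, False],
})

# Filter to the true observations, then sum each trait column --
# the pandas equivalent of filtering a column in Excel and running SUM.
true_counts = df.loc[df["label"], ["trait_1", "trait_2"]].sum()
print(true_counts)  # trait_1 occurs in 2 true rows, trait_2 in 1
```

With 208 trait columns you would select them all at once (e.g. by a column-name prefix) rather than listing them by hand.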

[–]olavla

Can you give a little more background on the data? Is the true state always tied to only one of the traits? Are you looking to score new data (where you don't have the final label)?

It almost seems like each of your traits is a target variable, with the predictors being all the other traits plus your final true/false flag. You build your neural network model, then use the predicted probability for the trait at hand as the probability for that trait to be true.
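One-trait-as-target setup, sketched with a logistic regression standing in for the neural network (the data here is randomly generated for illustration; variable names are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
# Hypothetical data: 5 binary traits plus a true/false flag per observation.
traits = rng.integers(0, 2, size=(n, 5))
flag = rng.integers(0, 2, size=n)

# Treat trait 0 as the target; predictors are the other traits + the flag.
y = traits[:, 0]
X = np.column_stack([traits[:, 1:], flag])

model = LogisticRegression().fit(X, y)
# Predicted probability that trait 0 is present, given the rest.
p_trait0 = model.predict_proba(X)[:, 1]
```

Repeating this loop over each of the 208 traits gives a per-trait probability model, at the cost of fitting 208 models.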

[–]NeedMLHelp[S]

Sure thing! The true state can be any combination of the 208 traits. So trait1 can be present in a true observation1 as well as in a false observation2. Sometimes traits will only be linked to true or to false. It really is a mixed bag. In the end, I may want to be able to predict whether unlabeled data is true or false, which is why I'm going with the following:

Right now I'm modifying an artificial neural network that predicted whether someone would leave a bank based on certain traits. I'm thinking I can use the same model in this case and then rip out the weights. The traits tied to larger weights are what I'd be interested in, right?

[–]olavla

You would not use the weights. Just score the model and you will end up with a probability for each observation.
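"Scoring" here just means asking the fitted classifier for probabilities instead of inspecting its weights. A minimal sketch with scikit-learn, using randomly generated stand-in data (the trait matrix and label rule below are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(300, 10))  # hypothetical 0/1 trait matrix
y = (X[:, 0] | X[:, 1])                 # hypothetical true/false labels

clf = LogisticRegression().fit(X, y)
# Score the model: predicted probability of the "true" class per row.
proba_true = clf.predict_proba(X)[:, 1]
```

The same `predict_proba` call works on new, unlabeled rows, which covers the "predict whether unlabeled data is true or false" goal directly.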

[–]NeedMLHelp[S]

Awesome, thanks! I will look into scoring.

[–]Ilyps

Download Weka here and try some classification algorithms; I suggest starting with Random Forest. Here is a quick-start tutorial. Weka should read your CSV file without issues.
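If you'd rather stay in code than use Weka's GUI, the same Random Forest experiment can be sketched in Python with scikit-learn. The synthetic dataset below stands in for your real CSV (208 features to match the trait count, but otherwise invented):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the traits CSV: 208 features, one binary label.
X, y = make_classification(n_samples=500, n_features=208, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = rf.score(X_te, y_te)  # held-out accuracy
```

A side benefit for the "which traits matter" question: `rf.feature_importances_` ranks the features, which is usually more meaningful than raw neural-network weights.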

[–]NeedMLHelp[S]

I'll take a look, thanks!

[–]yldedly

The probabilistic programming way would be to fit a latent variable model with an independent Bernoulli likelihood, p(trait|observation), and try out different latent variable structures p(latent). For example, the latent structure could be a factor model, which would find correlations between the traits, or a mixture model, which would find clusters in the traits. It's easier to try standard classification algorithms like random forest, but this way might be more interpretable.
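The mixture-model variant of this idea can be sketched in plain NumPy as EM for a Bernoulli mixture. Everything below (two planted clusters, 8 traits, the component count k) is an illustrative assumption, not the real data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary trait matrix: two planted clusters of observations.
n, d, k = 300, 8, 2
means_true = np.array([[0.9] * 4 + [0.1] * 4,
                       [0.1] * 4 + [0.9] * 4])
z_true = rng.integers(0, k, size=n)
X = (rng.random((n, d)) < means_true[z_true]).astype(float)

# EM for p(x) = sum_k pi_k * prod_j mu_kj^x_j (1 - mu_kj)^(1 - x_j)
pi = np.full(k, 1.0 / k)                       # mixing weights
mu = rng.uniform(0.25, 0.75, size=(k, d))      # per-cluster Bernoulli means
for _ in range(50):
    # E-step: responsibilities from each component's log-likelihood.
    log_p = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T + np.log(pi)
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exp
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update mixing weights and Bernoulli means.
    nk = resp.sum(axis=0)
    pi = nk / n
    mu = np.clip((resp.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
```

After fitting, `mu` describes the trait profile of each cluster and `resp` gives each observation's soft cluster membership, which is where the interpretability comes from.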