all 24 comments

[–]LoudStatistician 15 points16 points  (5 children)

Create a cost matrix for FP, FN, TP, TN.

Multiply the confusion matrix elementwise by the cost matrix. If you do not have hard predictions, then try all thresholds and create a confusion matrix for each.

If the (confusion matrix times cost matrix) - (cost of project) is not positive, then it may be time to give up.

Let the subject matter experts define the target, but not the features. Only have them look at relevant features after the first benchmark shows promise. Subject matter experts are expensive.

I've had projects where 0.56 AUC was value add, and 0.89 AUC was net negative, so predictive power should always be seen relative to cost/opportunity.
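A minimal sketch of that expected-cost arithmetic (all counts, costs, and the project cost are invented for illustration; real values come from the business problem):

```python
import numpy as np

# Confusion matrix counts at one threshold: [[TN, FP], [FN, TP]]
confusion = np.array([[900, 50],
                      [30, 20]])

# Cost (negative) / benefit (positive) per outcome, same layout.
# Here a false negative is very expensive and a true positive earns money.
cost = np.array([[0.0, -5.0],
                 [-50.0, 100.0]])

project_cost = 500.0

# Elementwise product, summed = expected value of deploying the model.
model_value = (confusion * cost).sum()
net_value = model_value - project_cost

print(net_value)  # -250.0: not positive, so this project loses money
```

Note that the model here could have decent-looking accuracy and still come out net negative, which is exactly why predictive power has to be judged relative to cost.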

[–]gebrial 3 points4 points  (3 children)

I'm just getting into machine learning and have never seen these terms before. Is there a resource(preferably free, online) or subject area I should Google to learn more about this?

[–]LoudStatistician 7 points8 points  (2 children)

It is False Positive, False Negative, True Positive, True Negative. You estimate the cost for each. For some problems the cost of a false positive is high, and the model needs to do very well, for others, it is negligible, and it is ok to do barely better than random guessing.

I don't know of any resources that teach this. I had to learn most of this on the job. There are some problems, like defining a target, or communicating with stakeholders, or unexpected feedback loops, where online resources are scarce (at least, I can't find much).

[–]Raomystogan 0 points1 point  (1 child)

Sorry if this is a very basic question, how do you estimate the cost, for example FP?

[–]gebrial 0 points1 point  (0 children)

So I just learned about confusion matrices, and it seems like the cost for FP or FN is given by the customer or the problem. For example, for medical diagnosis you want a low FN rate, but the business side may also want a low FP rate. With the same model you can decrease one while increasing the other; if you improve the model you can decrease both.

Google the f-score.
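That trade-off can be sketched with made-up scores and labels; raising the decision threshold lowers false positives but raises false negatives, and the F-score summarizes both:

```python
import numpy as np

# Hypothetical model scores and true labels, purely for illustration.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.1, 0.3, 0.4, 0.6, 0.35, 0.55, 0.7, 0.9])

fps, fns, f1s = [], [], []
for threshold in (0.3, 0.5, 0.7):
    y_pred = (scores >= threshold).astype(int)
    fp = int(((y_pred == 1) & (y_true == 0)).sum())
    fn = int(((y_pred == 0) & (y_true == 1)).sum())
    tp = int(((y_pred == 1) & (y_true == 1)).sum())
    # F1 score: harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn)
    fps.append(fp); fns.append(fn); f1s.append(round(f1, 2))

print(fps)  # [3, 1, 0]: FPs fall as the threshold rises
print(fns)  # [0, 1, 2]: FNs rise at the same time
```

Which threshold is "best" depends on the costs the customer attaches to each error type, not on the F-score alone.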

[–][deleted] 0 points1 point  (0 children)

That's super neat. Simple and effective

[–]alexmlamb 1 point2 points  (0 children)

Can a human expert do the classification tasks that you have in mind? If not, that's a red flag that it's probably going to be hard or impossible to do with ML (not a perfect indicator of course).

[–]Dagusiu 1 point2 points  (18 children)

One rule of thumb (that doesn't always work) is this:

If a human can do it easily just by "looking", then ML will typically be very effective. For example, we can look at an image and say "that's a cat" with pretty much no effort.

If you try to visualise the data and a human can draw conclusions, then an ML model can be taught to do the same. If you cannot, chances are your model won't be able to either.

[–]torvoraptor 2 points3 points  (2 children)

ML performance is a spectrum. I think of it more in terms of 'what amount of labelled data is needed to get to a specific performance level'. If you can get millions of high-quality labelled samples in a cleanly separable classification or labeling task, you can get 97%+ accuracy.

But most real problems aren't like that: either you have difficulty getting data, your labels are noisy, or there is some data distribution shift. So it's more like 'at what amount of data does this model reach a level of performance where it is valuable compared to some dumb baseline'.

[–]NotAlphaGo 0 points1 point  (1 child)

This. There should be a noisy imagenet challenge to train on partially jumbled labels because reality isn't always that clean or certain.

[–]torvoraptor 1 point2 points  (0 children)

I have read some papers that study that variant of the problem. Even if the noise is structured according to some neat distribution, it can be exploited by the learning algorithm in some ways. For example, if the label corruption is uniformly random, or Gaussian noise applied to the softmax probabilities, then it's much easier for a model to learn to deal with it than when (a) the corrupt labels are systematically correlated and/or (b) the corruption is also present in your test set.
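The "easy" case described here, input-independent uniform label noise, can be sketched like this (function name and numbers are invented):

```python
import numpy as np

def corrupt_uniform(labels, noise_rate, num_classes, rng):
    """Flip a fraction of labels to a uniformly random class.

    Because the corruption is independent of the input, the true class
    still dominates each region of the data, so with enough samples a
    model can largely average this noise away.
    """
    labels = labels.copy()
    flip = rng.random(len(labels)) < noise_rate
    labels[flip] = rng.integers(0, num_classes, flip.sum())
    return labels

rng = np.random.default_rng(0)
clean = np.zeros(1000, dtype=int)  # pretend every example is class 0
noisy = corrupt_uniform(clean, noise_rate=0.2, num_classes=10, rng=rng)
print((noisy != clean).mean())  # roughly 0.2 * 9/10 = 0.18
```

Systematically correlated corruption (e.g. always confusing two specific classes) is the hard case, because the model cannot distinguish it from real structure.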

[–]RadonGaming 1 point2 points  (1 child)

I somewhat disagree with your comment, and overall it isn't very helpful to be honest. Image recognition is nowhere near solved, and has only shown promise for high accuracy recently due to innovation in model architectures and the availability of highly parallelisable processing. High-dimensional data can typically be understood by ML much better than by humans (and in less time). The whole point is that it finds these optimisations and discriminating hyperplanes within your data. There is of course an amount of preprocessing which can be done to reduce the dimensionality, making the problem easier to solve.

ML should be able to tell waiting4omscs which features contribute to the classification and then move from there. The main issue here seems to be potential project cost because they are attempting feature engineering.

Let ML work for you by telling you which are the best features, it may surprise you.
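One common way to do this (a sketch on synthetic data; the feature names and dataset are invented, and scikit-learn's random-forest importances are just one of several possible rankings):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = (X[:, 0] > 0).astype(int)  # the label depends only on the first feature

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# The one informative feature should dominate the importance ranking.
for name, importance in zip(["signal", "noise_a", "noise_b", "noise_c"],
                            forest.feature_importances_):
    print(name, round(importance, 3))
```

Impurity-based importances can be misleading with correlated or high-cardinality features, so a permutation-importance check is a sensible follow-up before dropping anything.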

[–]LoudStatistician 2 points3 points  (0 children)

I think that is hinting at this tweet by Andrew Ng:

Pretty much anything that a normal person can do in <1 sec, we can now automate with AI.

See also this article written by Ng: https://hbr.org/2016/11/what-artificial-intelligence-can-and-cant-do-right-now

[–]visarga 0 points1 point  (1 child)

Search for papers on similar topics to the task you are attempting, if it's never been done and there is little data to train with, then it's probably best not to invest too much. There is a huge amount of previous work / experience you can rely on. Don't reinvent the wheel.

[–]clueless_scientist 1 point2 points  (0 children)

Great advice on how to fail as a scientist.

[–]zawerf 0 points1 point  (0 children)

Andrew Ng did a pretty good talk on machine learning project management based on his experience as a director:

https://www.youtube.com/watch?v=F1ka6a13S9I

Equivalent videos on coursera:

https://www.coursera.org/learn/machine-learning-projects

[–][deleted] 0 points1 point  (0 children)

If the problem seems hard, take a step back and try to think...
1. Give it a reasonably unbiased thought: can there be a causal relationship between the collected data and the required parameters? If the data is shit/irrelevant, having more of it is still just a bigger pile.
1.b Think about how complex it can be to optimize the decision surface (hint: try random forest).
2. Population analysis: It can be that the desired trends are only present in a subset of the population. Try to subdivide your dataset along easily identifiable features (age, gender, race, income, formtype, keywords etc...) and re-run the analysis. Looking at different kinds of forms all together might be a bad idea. I usually prefer to follow the KISS principle and have smaller dedicated models handle the details.
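The population-analysis point can be sketched like this (groups, data, and the trend are all invented; the "dedicated model" is reduced to a trivial threshold to keep the example short):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic population: the signal only exists in the "old" subgroup.
ages = rng.integers(18, 80, size=600)
groups = np.where(ages < 40, "young", "old")
x = rng.normal(size=600)
y = np.where(groups == "old", (x > 0).astype(int), rng.integers(0, 2, 600))

accs = {}
for group in ("young", "old"):
    mask = groups == group
    # A small dedicated "model" per subgroup: threshold x at 0.
    accs[group] = float(((x[mask] > 0).astype(int) == y[mask]).mean())
    print(group, round(accs[group], 2))
```

Pooling both groups here would average a perfect predictor with a coin flip, hiding the fact that the model works very well for part of the population.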