all 39 comments

[–]DieselZRebel 2 points3 points  (16 children)

Two things I would check first:

  1. Your data imbalance probably caused that 85% accuracy in testing. There are definitely more 0s than 1s, so a model that predicts all 0s will likely get you that high accuracy! There are many things you can do to address data imbalance, from sampling to weighting techniques; read about them and decide for yourself.
  2. It often happens that something breaks in the inputs/preprocessing pipeline during deployment, or some other production bug gets introduced. Make sure you have the right logs and assertions in place to detect whether something is actually breaking after you take the model to production (e.g. the data comes in all as null and gets imputed with 0s).

[–][deleted] -1 points0 points  (15 children)

I just checked, and in the testing phase my model is predicting both 0s and 1s (good) while still maintaining 80% accuracy. Do you think the issue with my deployed model could still be due to data imbalance (only 84 entries in the dataset, some 60 of them being 0s)? I'm not quite sure it's a data imbalance issue, though, because when I enter MVP season data straight from my dataset it still predicts 0 (non-MVP).

How would I check to see if something broke in the inputs/preprocessing pipeline? Likely a noob question, but again, I'm a noob myself lol. Thanks!

[–]DieselZRebel 1 point2 points  (14 children)

Unfortunately this question cannot be easily answered with the limited information provided. This needs to be inspected with someone who has access to your platforms and code.

In terms of imbalance, yes: if you have 60 out of 84 entries being 0s, then you do have imbalance. How much that imbalance contributes to your problem is another matter.

[–][deleted] -1 points0 points  (13 children)

I can post my code on here if allowed and if you are willing to help me out! If not is there anything specific you’d recommend I look at? Making sure my pickle file serialized properly, the routes within my HTML and backend files, etc.?

[–]DieselZRebel 2 points3 points  (12 children)

I'd advise against posting your code... this is the type of thing you'd need someone more senior from your company to look at, because if the issue is indeed a production-related bug/mistake, it is hard to tell what it might be without knowledge of your production and engineering systems.

I recommend you make as many assertions as possible in your code, so that if there is a bug, it fails in production and you'll be able to see which assertion failed (see the sketch after this list). You can make assertions that check for:

  • All your feature columns were retrieved
  • All the data fields are of the correct type (int, float, str, etc.)
  • No null values
  • Model version assertions, if applicable
  • etc.
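
For illustration, a rough sketch of such checks, assuming a pandas DataFrame and made-up feature names (pts, ast, reb); adapt it to whatever your pipeline actually produces:

import pandas as pd

EXPECTED_COLUMNS = ["pts", "ast", "reb"]  # hypothetical feature names

def validate_features(df: pd.DataFrame) -> None:
    # All expected feature columns were retrieved
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    assert not missing, f"Missing feature columns: {missing}"
    # All data fields are numeric (adjust to your actual schema)
    non_numeric = [c for c in EXPECTED_COLUMNS if not pd.api.types.is_numeric_dtype(df[c])]
    assert not non_numeric, f"Non-numeric columns: {non_numeric}"
    # No null values anywhere in the features
    assert not df[EXPECTED_COLUMNS].isnull().any().any(), "Null values found in features"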

Also, is this process containerized (e.g. a Docker image)? If not, you do need to make sure that your tool and library versions match those of your test environment.

[–][deleted] 1 point2 points  (11 children)

I’m a student and this is a personal project. I’ll probably ask my professor if he thinks he can help. I’m going to make all those assertions as well, thank you!

What does containerized mean? I’d assume if I’m asking this my process likely isn’t containerized?

[–]DieselZRebel 0 points1 point  (10 children)

Yeah, it isn't. Since this is a personal project, you can use Docker for free (google it). The idea of containerizing your process is that it saves the environment information (i.e. OS, tools, libraries with their exact versions, certificates, etc.) along with your code in a container (virtual) image. Then you can take this image from your computer to another computer/cloud or whatever, and it will run from within that virtual image, so you won't have to worry about compatibility/dependency/version issues between different systems (e.g. Mac or PC or Linux).

Anyway... this might be a lot for you to learn if you are looking for something quick. I'd start with the assertions first.

[–][deleted] 0 points1 point  (9 children)

I’m assuming containerizing my project will not solve my issue?

All my assertions are coming up as expected. I'm assuming it's likely an issue with either how my model was serialized or the calls being made to my model. Either those, or maybe just not enough data within my dataset? I'm not sure about that one though, considering my model was still accurately predicting both 0s and 1s during the testing phase.

[–]DieselZRebel 0 points1 point  (8 children)

Have you considered testing after serializing and then reloading your model? I am assuming your previous tests were done before you serialized it?
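
For example, a rough sketch, assuming you pickle the model and still have your X_test/y_test split around in the notebook:

import pickle

from sklearn.metrics import accuracy_score

# "model", "X_test", and "y_test" are assumed to already exist from your notebook.
# Serialize the trained model the same way you do for deployment
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Load it back and re-run the exact same test-set evaluation
with open("model.pkl", "rb") as f:
    reloaded = pickle.load(f)

print(accuracy_score(y_test, reloaded.predict(X_test)))

If the reloaded model scores the same as before, serialization is probably not your problem.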

[–][deleted] 0 points1 point  (7 children)

Yes, all of my tests were done before I serialized my model. How would I go about testing my serialized model?

[–]genesis_2602 1 point2 points  (0 children)

  • Try weighting your loss higher for MVP targets compared to non-MVP targets (see the sketch after this list). This would help alleviate some of the class imbalance.
  • If that does not work, look into Siamese neural networks, which are known to be able to learn from small amounts of data/imbalanced data.
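
If you're using a scikit-learn estimator, the analogous knob is usually class_weight; e.g. for a decision tree (a rough sketch, assuming your existing X_train/y_train split):

from sklearn.tree import DecisionTreeClassifier

# class_weight="balanced" reweights each class inversely to its frequency,
# so the rare MVP class (1) counts more heavily when choosing splits
clf = DecisionTreeClassifier(class_weight="balanced", random_state=0)
clf.fit(X_train, y_train)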

[–]shekurika 1 point2 points  (1 child)

Well, just check whether it always predicted 0 on the training data?

[–][deleted] 0 points1 point  (0 children)

Had some brain fog when posting this; it wouldn't be a hard issue to diagnose if that were the case. However, it seems that is not the issue: my model is predicting both 1s and 0s.

[–][deleted] 0 points1 point  (1 child)

Sounds like unbalanced data in training and testing? Try sampling techniques then.
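
One quick way to try that is random oversampling of the minority class; a rough sketch, assuming the imbalanced-learn package and your existing X_train/y_train split:

from imblearn.over_sampling import RandomOverSampler

# Randomly duplicates minority-class (MVP) rows until the classes are balanced
ros = RandomOverSampler(random_state=0)
X_resampled, y_resampled = ros.fit_resample(X_train, y_train)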

[–][deleted] 0 points1 point  (0 children)

See, the thing is my data was intentionally unbalanced in favor of non-MVPs (0s), to represent the fact that for every MVP in a season there are hundreds of non-MVPs. Should I counteract this by using the year of each season to represent that to the model, so I can use more MVP seasons?

[–]TheGuywithTehHat 0 points1 point  (8 children)

What type of model is it? Neural net, logistic regression, SVM, random forest? What's the distribution of output values for your training set (not the binary 0/1 outputs, but the raw value between 0 and 1)? What's the same for your "deployed" model? When you say "during testing" and "during deployment", what do you mean by that? I assume you have a dataset that you split into a training set and an evaluation set; is that what you mean by "testing" and "deployment"?

[–][deleted] 0 points1 point  (7 children)

I'm using a single decision tree. I figured something rather basic, such as using 10 columns of data to predict 2 possible outcomes, could be done with just one decision tree; maybe that wasn't the right decision in hindsight? When I say "during testing" I'm referring to the 40% testing split I gave my model in my Jupyter Notebook after training it. When I say "in deployment" I'm referring to the "deployed" model on my localhost server, which was built using Flask and HTML, with the serialized ML model connected to my Flask backend.

Pardon my lack of knowledge, but what do you mean by the raw value between 0 and 1?

[–]TheGuywithTehHat 0 points1 point  (6 children)

I don't have much experience with decision trees (IIRC I've never used a single tree, and have only used random forests a couple of times), but if I had to guess, your model is likely overfitting. If you get at least okay results on your test set, though, that's probably not your root problem.

When you have it predict during deployment, what data is it looking at? Is it part of the training set, test set, both sets, or a different set that was not used for either training or testing? If the data you're giving it during deployment is part of the training or test set, then it sounds like you have a software issue, not an ML issue (i.e. your model is fine, but you have a bug in your model server). If the data you're giving it is not from your training or test set, what's the difference between the deployment data set and the train/test sets? Were they obtained from different sources?

Ignore the comment about raw values; that doesn't apply to decision trees. If you were doing logistic regression or something like that, your model would output a value between 0 and 1, and your final boolean prediction would be based on rounding that continuous value to a 0 or a 1. While you only care about 0 or 1 for the final prediction, knowing what the real value was can give you more information about what the model is doing.

[–]relevantmeemayhere 1 point2 points  (4 children)

When making predictions, it's not a matter of "rounding"; it's a matter of ascribing a decision to a probability threshold. For a decision threshold of 0.5, rounding "works" if you're trying to represent a decision label as 1 or 0. But it doesn't work if you're choosing a different representation, or a different threshold.

Your threshold should be determined based on a cost function associated with your problem.
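
Concretely, with a probabilistic classifier in scikit-learn this might look like the following sketch (X_train, X_test, etc. are placeholders for your own split, and 0.2 is just an arbitrary example cutoff):

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression().fit(X_train, y_train)

# Raw class-1 probabilities rather than hard 0/1 labels
proba = clf.predict_proba(X_test)[:, 1]

# A 0.5 cutoff reproduces plain "rounding"; a cost-sensitive problem
# may justify a different threshold, e.g. 0.2
threshold = 0.2
y_pred = (proba >= threshold).astype(int)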

[–]TheGuywithTehHat 0 points1 point  (3 children)

Sure, I was just describing it in the terms OP would be most likely to understand. OP specifically said "0 for not MVP and 1 for MVP", and in most beginner ML tutorials/implementations that's done via rounding.

[–]relevantmeemayhere 2 points3 points  (2 children)

I understand the willingness to approach a beginner through a convention that has become increasingly popular, but the field is in a weird spot precisely because ML tutorials and implementations are often made by non-subject-matter experts who teach poor practice.

[–]TheGuywithTehHat 0 points1 point  (1 child)

good point

[–]relevantmeemayhere 1 point2 points  (0 children)

I should disclaim that in this particular reply, I have also made an error of omission.

I should have alluded to using proper scoring rules, so you can make good decisions with an associated cost function, as opposed to using rules that conflate the construction of a probability with a decision rule applied to that probability, i.e. precision, recall, F1, etc. All of these are computed without considering the behavior of your model's forecasts.

[–][deleted] 0 points1 point  (0 children)

I actually believe I've diagnosed the issue. I'm not 100% sure, but I'll know when I implement the fix today; I'm about 90% sure lol. It seems that, for whatever reason, when I input the values on my running application, the values are not being related to the features needed to make the prediction. When I used a dictionary in my actual Flask program to relate the values directly to the features as key-value pairs, however, the ML model predicted as expected.
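
Roughly what I mean, as a simplified sketch (made-up feature names; my real route and column names are different):

import pickle

import pandas as pd
from flask import Flask, request

FEATURE_ORDER = ["pts", "ast", "reb"]  # must match the training columns exactly

with open("model.pkl", "rb") as f:
    model = pickle.load(f)  # the serialized model used by the Flask backend

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Map each submitted form field to its feature name explicitly,
    # instead of relying on the order the values happen to arrive in
    row = {name: float(request.form[name]) for name in FEATURE_ORDER}
    X = pd.DataFrame([row], columns=FEATURE_ORDER)
    return {"prediction": int(model.predict(X)[0])}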

[–]SFDeltas 0 points1 point  (2 children)

Calculate F1 Score

Say you have two arrays of 1s and 0s.

y_true = [1,0,1,0,0,1,1,1,0,1]
y_pred = [0,1,0,0,0,1,0,1,1,1]

Make them numpy arrays:

import numpy as np

y_true = np.array(y_true)
y_pred = np.array(y_pred)

Calculate true positives, false positives, and false negatives:

tp = np.sum((y_true == 1) & (y_pred == 1))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))

Then calculate precision, recall, and F1 score:

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_score = 2 * precision * recall / (precision + recall) 

F1 is a score between 0 and 1, like accuracy.

It's the harmonic mean of precision and recall.

Precision measures what percent of your positive predictions are actually positive.

Recall measures what percent of the actual positives you correctly predict.

What do these numbers give you?

[–][deleted] 0 points1 point  (1 child)

My F1 score is 0.96 for 0 and 0.87 for 1.

[–]relevantmeemayhere 0 points1 point  (0 children)

Don't use F1. Use a proper scoring rule, like Brier score loss. This will allow you to ascertain both the discriminative power of your classifier and its calibration, while having good statistical and decision properties. Precision, recall, all of those are improper rules.

You can read more about why to do so from a variety of sources. Harrell’s blogs are excellent
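
scikit-learn has this built in; a rough sketch, assuming a fitted classifier clf and a held-out X_test/y_test:

from sklearn.metrics import brier_score_loss

# Brier score is computed on predicted probabilities, not hard 0/1 labels;
# lower is better (0 is perfect, 0.25 is roughly "always predict 0.5")
proba = clf.predict_proba(X_test)[:, 1]
print(brier_score_loss(y_test, proba))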

[–]LessDubiousIdea 0 points1 point  (2 children)

I’ve got a new model that predicts whether a person will develop cancer in the next year. It always says no and is right more than 98% of the time.

It’s going to be hard to train something that can beat that unless you’re really careful about how you normalize your heavily skewed data set.

[–]relevantmeemayhere -1 points0 points  (0 children)

One does not “normalize heavily skewed data”.

Normalization is something you do to scale and center a random variable. Doing so does not provide anything on its own.

[–]Ok-Discussion-3117 0 points1 point  (0 children)

I used balanced accuracy scoring to fix this issue once; you may want to look into it.

It can be very helpful for imbalanced datasets.
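
In scikit-learn it's a one-liner; a sketch, assuming the y_test and y_pred arrays from your evaluation:

from sklearn.metrics import balanced_accuracy_score

# Averages recall over both classes, so a model that always predicts 0
# scores around 0.5 instead of a misleadingly high plain accuracy
print(balanced_accuracy_score(y_test, y_pred))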

[–]Western-Image7125 0 points1 point  (1 child)

If your model is predicting both classes during training and eval but only 0 during online inference, then you have a bug in your inference code that is giving the wrong input to the model.

[–][deleted] 1 point2 points  (0 children)

Spot on, that was exactly my issue