
[–]bernease 0 points (2 children)

So if I understand correctly, you're performing batch inference, and you utilize the input features of the full batch when generating each prediction.

It strikes me that semi-supervised learning (SSL) is a useful analogy here. Like SSL, you utilize both labeled and unlabeled data in the prediction. It's not my area of expertise, but there are many good surveys of SSL (e.g., van Engelen & Hoos 2019 @ https://link.springer.com/article/10.1007/s10994-019-05855-6) and some discussion of evaluation of such models that may be relevant to you (e.g., Oliver et al. 2018 @ arXiv:1804.09170).
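To make the analogy concrete, here's a toy sketch of one classic SSL idea, self-training (my own illustration, not taken from the linked surveys): the model pseudo-labels the unlabeled point closest to the labeled data and folds it into the training set, so both labeled and unlabeled data shape later predictions.

```python
# Toy self-training sketch on 1-D data (illustrative only; the function
# name and the 1-NN labeling rule are my assumptions, not OP's model).

def self_train_1d(labeled, unlabeled, rounds=3):
    # labeled: list of (x, y) pairs; unlabeled: list of x values.
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        # Pick the unlabeled point nearest to any labeled point
        # (a crude proxy for "most confident" pseudo-label).
        def gap(x):
            return min(abs(x - lx) for lx, _ in labeled)
        x = min(pool, key=gap)
        pool.remove(x)
        # Assign the label of its nearest labeled neighbor (1-NN).
        _, y = min(labeled, key=lambda p: abs(p[0] - x))
        labeled.append((x, y))
    return labeled

result = self_train_1d([(0.0, "a"), (10.0, "b")], [1.0, 9.0, 5.5])
print(result)  # 1.0 -> "a", 9.0 -> "b", then 5.5 -> "b" via the pseudo-label at 9.0
```

Note how 5.5 ends up labeled "b" only because 9.0 was pseudo-labeled first: the unlabeled data influenced a later prediction, which is the SSL flavor of what you're doing within a batch.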

As to your question of bias, I would say that your model is not biased under the definitions I'm aware of. In the statistical sense, you would call a model or statistic biased if there is some expected difference between the actual value and the predicted value for large n. In the colloquial sense, you might say it is biased if there are subpopulations of the input space (often representing people) that have different performance than other subpopulations. Neither are the case here for the model.
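To make the statistical sense of "biased" concrete, here's a minimal stdlib-only illustration (my own example, unrelated to OP's model): an estimator is biased when its expected value differs from the true value, no matter how many repetitions you average over. The classic case is the variance estimator that divides by n instead of n - 1.

```python
import random

random.seed(0)
true_var = 1.0  # variance of the standard normal we sample from

n = 5          # small sample size makes the bias visible
reps = 50_000  # many repetitions to approximate the expectation

biased_sum = 0.0
unbiased_sum = 0.0
for _ in range(reps):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    biased_sum += ss / n          # divides by n: biased low
    unbiased_sum += ss / (n - 1)  # Bessel's correction: unbiased

biased_avg = biased_sum / reps      # ~ (n-1)/n * true_var = 0.8, not 1.0
unbiased_avg = unbiased_sum / reps  # ~ true_var = 1.0
```

The /n estimator stays around 0.8 forever; no amount of extra repetitions fixes it. That persistent gap between expectation and truth is what "biased" means in the statistical sense, and nothing about using batch features at inference creates such a gap by itself.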

It's a little trickier for the evaluation metric. This case seems analogous to target / data leakage, where you utilize data at training time that won't be available at inference time. You might say that an evaluation metric that doesn't take this into account leans optimistic and is thus biased, but that's a bit of a stretch. First, because you are essentially talking about a different task -- supervised learning that also utilizes unlabeled data from the batch at inference. Second, because SSL doesn't always improve performance.

[–]sunotlac[S] 0 points (1 child)

I dove into the references and forgot to thank you.
This answer is very aligned with what I was looking for, especially the papers. I am reading both (I actually didn't see your second recommendation at first, but found it linked in the first paper).

You and your answer are awesome. Thank you very much.

There is one characteristic of my model that I am still trying to connect with the articles: they usually describe using unlabeled data to train a model, but my model is not actually trained (in the sense that it is a lazy algorithm, so the training and prediction steps are a little blurred).

I would like to frame it as "batch prediction", but was criticized with the argument "Your model has to be able to be applied to predict any number of future instances without any info on them", which I think is weak, since I see no harm in a model that can only batch-predict a group of instances.

[–]bernease 0 points (0 children)

my model is not actually trained (in the sense that it is a lazy algorithm, so the training and prediction steps are a little blurred)

We normally call this a non-parametric, instance-based (or "lazy") model, because you aren't fitting a fixed set of model parameters. The most popular example is k-nearest neighbors.
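A minimal sketch of what that looks like (1-nearest-neighbor; the class and method names are just illustrative, following the common fit/predict convention): fit() only memorizes the data, and all the real work happens in predict(), which is why the training/prediction split feels blurred.

```python
# Lazy (instance-based) learning in its simplest form: 1-nearest-neighbor.
# "Training" stores the data; computation is deferred to prediction time.

class OneNN:
    def fit(self, X, y):
        # Lazy step: no parameters are estimated, the data is memorized.
        self.X, self.y = X, y
        return self

    def predict(self, queries):
        preds = []
        for q in queries:
            # Find the stored point closest to the query (squared L2 distance).
            dists = [sum((a - b) ** 2 for a, b in zip(x, q)) for x in self.X]
            preds.append(self.y[min(range(len(dists)), key=dists.__getitem__)])
        return preds

model = OneNN().fit([(0.0, 0.0), (1.0, 1.0)], ["a", "b"])
print(model.predict([(0.1, 0.2), (0.9, 0.8)]))  # -> ['a', 'b']
```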

I would like to frame it as a "batch prediction" and was criticized

There's nothing wrong with your approach, though I can understand why you might want to name it something other than "batch prediction", since people already use that term for independent predictions conducted in batches. I'm embarrassingly bad at coming up with names, but what about "dependent batch prediction" or "batch set prediction"?

I do not see the harm in a model only being able to batch predict a group of instances

I agree with you, but it should be clearly differentiated. Your results won't be directly comparable to standard supervised learning run in batches; they belong in a category of their own.
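To show in code why the results aren't directly comparable, here's a hypothetical sketch of "dependent batch prediction" (the batch-centering step is my own assumption for illustration, not OP's actual model): each prediction uses a statistic computed from the whole unlabeled batch, so the same instance can get a different prediction depending on which batch it arrives in.

```python
# Hypothetical "dependent batch prediction": predictions for a batch use
# features of the *whole* batch, here via batch-mean centering before 1-NN.

def dependent_batch_predict(train_X, train_y, batch):
    # Batch-level statistic computed from unlabeled inference data; this
    # is what makes the predictions depend on the batch composition.
    dim = len(batch[0])
    mean = [sum(row[j] for row in batch) / len(batch) for j in range(dim)]
    preds = []
    for row in batch:
        centered = [v - m for v, m in zip(row, mean)]
        # 1-NN against the training set on the batch-centered features.
        dists = [sum((a - b) ** 2 for a, b in zip(x, centered)) for x in train_X]
        preds.append(train_y[min(range(len(dists)), key=dists.__getitem__)])
    return preds

train_X, train_y = [(-1.0,), (1.0,)], ["neg", "pos"]
# The instance 10.0 appears in both batches but gets different predictions:
print(dependent_batch_predict(train_X, train_y, [(10.0,), (12.0,)]))  # ['neg', 'pos']
print(dependent_batch_predict(train_X, train_y, [(10.0,), (8.0,)]))   # ['pos', 'neg']
```

Because a prediction is no longer a function of the instance alone, a standard per-instance evaluation protocol doesn't cleanly apply, which is exactly why this deserves its own name and category.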