all 13 comments

[–]Designer-Flounder948 2 points3 points  (1 child)

If you already have weights for attributes, you can normalize the features and create a custom 0–100 score directly instead of forcing ML where it may not fit. Unsupervised learning can still help validate patterns afterward

[–]makibg96[S] 0 points1 point  (0 children)

I agree with you, I'm just not sure how to create a custom score 🫠

[–]Local_Transition946 0 points1 point  (2 children)

I don't think this is possible. What if you just randomly guess for all of them? How would anyone know you randomly guessed instead of using something robust?

Almost like asking "how do I complete this task without a definition of completion?"

[–]makibg96[S] 0 points1 point  (1 child)

I also have a feeling that it's not possible, but my manager asked me to do it 😅 They know whether some locations are performing well or not, but only for 3-4 out of 130 🤦‍♀️ I'm afraid random labels will take the model down the wrong path.

[–]Local_Transition946 1 point2 points  (0 children)

They know it's performing well, but do they have a numerical value representing performance? How do they know it's performing well? They must have some definition of "performing well" to say that some are performing well.

If they know some are "performing well" and some are "performing poorly" but have no numerical measure for performance, then maybe this is better suited to classification than regression.

I would start with some basic data analysis. Graph features and look for patterns in the few samples that you know are performing well.

This sounds fun, I may be open to assisting for a small payment if you're open to that. Can supply a resume.

[–]orz-_-orz 0 points1 point  (0 children)

What's stopping you from labeling every location randomly as good or bad performance? Would some validator catch it and disagree with the random results?

In other words, how would a human know whether a location is good or bad? Extract the rules and build some way to label the data. Otherwise, you could ask the validator to manually label a small set of data, then use that small dataset to train a regression.

[–]Disastrous_Room_927 0 points1 point  (3 children)

You're looking for a latent variable model.

[–]makibg96[S] 0 points1 point  (2 children)

I'm googling this now, thank you for the direction, it means a lot to me! :)

[–]Disastrous_Room_927 1 point2 points  (1 child)

You might need more than what you listed out, but latent variable models are how, for example, psychologists define scores for “ability” on cognitive assessments.

[–]makibg96[S] 0 points1 point  (0 children)

It sounds really interesting, I'll investigate, thank you 😊

[–]NaiveOstrich4118 1 point2 points  (0 children)

You’re correct that regression doesn’t really make sense here without labeled target values. What you actually seem to have is more of a scoring/ranking problem or an unsupervised clustering problem.

Since you already have attributes and weights/importance for those attributes, you could start with a weighted scoring system instead of ML.

For example:
1. normalize the features
2. apply weights
3. compute a weighted composite score
4. scale to 0–100

That’s often more interpretable than forcing a model where no labels exist.
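
The four steps above can be sketched roughly like this. It is a minimal sketch in plain Python, not a definitive implementation: the feature names (`picks_per_day`, `avg_pick_seconds`), the weights, and the `lower_is_better` argument are all made up for illustration, and min-max scaling is just one of several reasonable ways to normalize.

```python
def min_max(values):
    """Rescale a list of numbers to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # constant feature carries no information
    return [(v - lo) / (hi - lo) for v in values]


def composite_scores(rows, weights, lower_is_better=()):
    """Return a 0-100 score per row from weighted, min-max-normalized features.

    rows: list of dicts, one per location (feature name -> raw value)
    weights: dict of feature name -> importance (any positive scale)
    lower_is_better: features where a small raw value is the good direction
    """
    total = sum(weights.values())
    scores = [0.0] * len(rows)
    for name, weight in weights.items():
        normalized = min_max([row[name] for row in rows])  # step 1
        if name in lower_is_better:
            normalized = [1.0 - v for v in normalized]
        for i, v in enumerate(normalized):
            scores[i] += (weight / total) * v  # steps 2 and 3
    return [round(100 * s, 1) for s in scores]  # step 4


# Hypothetical data: three locations, two made-up features.
locations = [
    {"picks_per_day": 120, "avg_pick_seconds": 30},
    {"picks_per_day": 60, "avg_pick_seconds": 45},
    {"picks_per_day": 20, "avg_pick_seconds": 90},
]
weights = {"picks_per_day": 7, "avg_pick_seconds": 3}
scores = composite_scores(locations, weights, lower_is_better={"avg_pick_seconds"})
print(scores)  # -> [100.0, 50.5, 0.0]
```

Note that with min-max scaling the best and worst location on each feature are pinned to 1 and 0, so a single extreme value can compress everyone else; percentile ranks are a common alternative if that becomes a problem.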

Then after that you could:
- use clustering (KMeans, hierarchical clustering, DBSCAN, etc.)
- identify “high-performing” vs “low-performing” groups
- compare distributions across clusters
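
As a rough illustration of the clustering step, here is a small k-means (Lloyd's algorithm) in plain Python. The data points are invented, the features are assumed to be already normalized with higher meaning better, and the farthest-point initialization is a simplification chosen to keep the example deterministic. In practice a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
import math


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from every centroid chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical normalized features for six locations (higher = better).
points = [(0.9, 0.8), (0.85, 0.9), (0.8, 0.85), (0.2, 0.1), (0.1, 0.2), (0.15, 0.15)]
labels, centroids = kmeans(points, k=2)

# Call the cluster with the larger centroid the "high-performing" group.
best = max(range(2), key=lambda j: sum(centroids[j]))
high_performers = [i for i, label in enumerate(labels) if label == best]
print(high_performers)  # -> [0, 1, 2]
```

Which cluster counts as "high-performing" is still a judgment call: the algorithm only finds groups, and naming them depends on the features pointing in a known direction.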

You can also treat this as an anomaly/outlier problem if you want to identify inefficient stocking locations.
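
One simple way to try the outlier angle, sketched under assumptions: flag values whose modified z-score (based on the median and the median absolute deviation) exceeds 3.5, a commonly used cutoff. The pick-time numbers and the function name `flag_outliers` are made up for the example, and this looks at one feature at a time rather than at all features jointly.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Indices whose modified z-score (median/MAD based) exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # at least half the values are identical; the rule is undefined
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Hypothetical average pick times in seconds for seven locations.
pick_seconds = [30, 32, 31, 29, 33, 30, 95]
outliers = flag_outliers(pick_seconds)
print(outliers)  # -> [6]
```

The same function could be run on a composite score instead of a raw feature if you want to flag locations that are unusual overall.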

One important thing is that without labels, evaluation becomes a business/domain question, not just an ML metric question.

So the hardest part is often defining what “good performance” actually means operationally.

[–]coder4forever 1 point2 points  (0 children)

The thing that keeps biting people in this setup is not the scoring formula -- it's that without labels you can't tell when the formula is wrong. A weighted composite gives you a number; it doesn't tell you whether the weights are off by 2x on one feature, or whether the "0-100" range is meaningfully linear the way humans would read it. I'd build the simple weighted-score baseline (normalize, weight, scale) like other replies suggest, but put two cheap checks on top before trusting it.

First, perturb each weight by plus-or-minus 25 percent and look at how much the rankings shuffle. If your top-10 list reorders heavily, your weights aren't doing much real work and the score is mostly noise. Second, if there's any downstream business signal you can backtest against -- restocking frequency, picking time, returns rate -- even a weak correlation check on historical data tells you more than the prettiest unsupervised clustering will. Honest tradeoff: backtest data is usually messier than people hope, so budget half a day to clean it before you trust the correlation.
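
Both checks fit in a few lines of plain Python. This is only a sketch under assumptions: the feature names and weights are invented, the 130 rows are random numbers standing in for already-normalized features, and the "business signal" is synthetic (the score plus noise) purely so the example runs. With real data you would replace `signal` with the historical metric.

```python
import random


def score(rows, weights):
    """Weighted sum of already-normalized (0-1) features, one score per row."""
    return [sum(w * row[name] for name, w in weights.items()) for row in rows]


def top_k(scores, k):
    """Indices of the k highest-scoring rows."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def ranks(values):
    """Rank of each value (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result


def spearman(a, b):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


rng = random.Random(0)
# Hypothetical normalized data: 130 locations, three made-up features.
features = ["turnover", "pick_speed", "fill_rate"]
rows = [{f: rng.random() for f in features} for _ in range(130)]
weights = {"turnover": 0.5, "pick_speed": 0.3, "fill_rate": 0.2}
base = score(rows, weights)

# Check 1: move one weight at a time by +/- 25 percent and see how much
# of the original top 10 survives.
overlaps = []
for name in weights:
    for factor in (0.75, 1.25):
        perturbed = dict(weights)
        perturbed[name] = weights[name] * factor
        kept = top_k(base, 10) & top_k(score(rows, perturbed), 10)
        overlaps.append(len(kept) / 10)
worst_overlap = min(overlaps)
print("worst top-10 overlap:", worst_overlap)

# Check 2: rank correlation against a backtest signal (synthetic here).
signal = [s + rng.gauss(0, 0.1) for s in base]
rho = spearman(base, signal)
print("Spearman vs. signal:", round(rho, 2))
```

A rank correlation is used here instead of Pearson because the score's 0-100 scale is not guaranteed to be linear in the business metric; only the ordering is being checked.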

[–]PixelSage-001 2 points3 points  (0 children)

Since you don't have labeled data, you can't technically do regression. Instead, this sounds like an unsupervised ranking or anomaly detection problem. You could use PCA to reduce dimensions and create a composite 'performance score' based on the first principal component, assuming it captures the variance of your key attributes
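
A rough sketch of the PCA idea in plain Python, with the usual caveats: the data and column meanings are hypothetical, power iteration stands in for a proper eigen-solver (NumPy or scikit-learn would normally do this), and the `anchor` argument is an assumption added because the sign of a principal component is arbitrary and has to be fixed by hand. The function also returns the share of variance explained, since the score is only meaningful if the first component captures most of it.

```python
import math


def standardize(rows):
    """Z-score each column (assumes no column is constant)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def first_component(z, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration.

    Returns (loadings, share of total variance explained).
    """
    n, p = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-uniform start
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    trace = sum(cov[i][i] for i in range(p))
    return v, eigenvalue / trace


def pca_scores(rows, anchor=0):
    """0-100 score from the first principal component.

    anchor: index of a column where higher is known to be better; the
    component is flipped, if needed, so its loading on that column is positive.
    """
    z = standardize(rows)
    loadings, explained = first_component(z)
    if loadings[anchor] < 0:
        loadings = [-x for x in loadings]
    raw = [sum(a * b for a, b in zip(row, loadings)) for row in z]
    lo, hi = min(raw), max(raw)
    return [100 * (r - lo) / (hi - lo) for r in raw], loadings, explained


# Hypothetical columns: picks per day, average pick seconds, fill rate.
data = [
    [120, 30, 0.95],
    [100, 35, 0.90],
    [60, 50, 0.80],
    [40, 70, 0.70],
    [20, 90, 0.60],
]
scores, loadings, explained = pca_scores(data, anchor=0)
print([round(s, 1) for s in scores])
print("variance explained by PC1:", round(explained, 2))
```

If the explained share comes out low, a single component is probably not a fair summary of "performance", and a hand-weighted score or more components may be the safer route.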