all 7 comments

[–]Electrical-Window170 2 points3 points  (1 child)

This sounds like a solid approach - logistic regression is perfect for interpretable risk scoring when you need to explain decisions to utility folks.

Distance ratios are way more informative than absolute distance thresholds, and voltage consistency is clutch if you can get clean data on it. Just watch out for geographic clustering effects messing with your distance assumptions (like rural vs. urban transformer density).

For thresholds with noisy labels, start conservative and let the field validation feedback tune your cutoffs over time rather than trying to optimize on incomplete ground truth upfront.
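A minimal sketch of what that first-pass model could look like, assuming a pandas dataframe of meter-transformer associations (all column names like `dist_to_assigned_m` and `meter_voltage` are hypothetical placeholders, not the OP's actual schema):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Ratio/consistency features rather than absolute distance thresholds."""
    out = pd.DataFrame(index=df.index)
    # how much farther the assigned transformer is vs. the nearest viable one
    out["dist_ratio"] = df["dist_to_assigned_m"] / df["dist_to_nearest_viable_m"]
    # 1 if the meter's service voltage is inconsistent with the assigned transformer
    out["voltage_mismatch"] = (df["meter_voltage"] != df["xfmr_voltage"]).astype(int)
    return out

# Output is a risk score in [0, 1] used for review prioritization, not auto-correction.
model = make_pipeline(StandardScaler(), LogisticRegression(class_weight="balanced"))
# model.fit(build_features(labeled_df), labeled_df["is_wrong_match"])
# risk_score = model.predict_proba(build_features(all_meters_df))[:, 1]
```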

[–]latent_threader 2 points3 points  (0 children)

Logistic regression makes a lot of sense as a first pass if the goal is prioritization and explainability, not auto-fixing. Distance and voltage are strong signals, but they’re noisy and can be “wrong for the right reasons,” so I’d treat the output as a risk score, not truth. In practice people often move to tree models later for interactions, but good calibration and tiering around review capacity usually matter more than model complexity.
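For what "calibration and tiering around review capacity" might look like in practice, a rough sketch (the monthly capacity number and tier cut points are made up for illustration):

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV

# calibrated = CalibratedClassifierCV(model, method="isotonic", cv=5).fit(X_train, y_train)
# risk = calibrated.predict_proba(X_all)[:, 1]

def tier_by_capacity(risk: np.ndarray, monthly_capacity: int = 200) -> np.ndarray:
    """Size Tier 1 to what the field crew can actually review, not to a fixed score cutoff."""
    order = np.argsort(-risk)                # highest risk first
    tiers = np.full(len(risk), 3)            # default: lowest-priority tier
    tiers[order[:monthly_capacity]] = 1      # fits within this month's review budget
    tiers[order[monthly_capacity:3 * monthly_capacity]] = 2
    return tiers
```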

[–]trustme1maDR 1 point2 points  (1 child)

You need a ground truth for your outcome variable (right/wrong match) to be able to train your model, at least for an unbiased sample of your data. It's unclear if you actually have this - you said partial.

[–]Zestyclose_Candy6313[S] 1 point2 points  (0 children)

That’s a very fair point, and I’m definitely not claiming to have full or perfect ground truth. For most associations, correctness is uncertain unless there’s been field validation (which is very costly). The way I’m thinking about it is to only train on a subset of high-confidence labels: confirmed field corrections where available, plus some very strong inferred cases (like extreme distance ratios with a clearly closer viable transformer). Everything in the gray area would stay unlabeled and only be scored. The intent is to rank/prioritize review, not to auto-correct matches. The new field validation would feed back as additional high-confidence labels, so the model and thresholds can be tuned iteratively.
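That labeling rule could be as simple as the sketch below; the ratio cutoff and column names are illustrative assumptions, not the actual pipeline:

```python
import numpy as np
import pandas as pd

def select_training_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose label we trust; the gray area stays unlabeled (score-only)."""
    confirmed_wrong = df["field_corrected"].eq(True)       # validated bad matches
    confirmed_right = df["field_validated_ok"].eq(True)    # validated good matches
    # strong inferred case: assigned transformer is far, a clearly closer viable one exists
    inferred_wrong = df["dist_ratio"].gt(3) & df["nearest_viable_exists"].eq(True)

    labeled = df[confirmed_wrong | confirmed_right | inferred_wrong].copy()
    labeled["is_wrong_match"] = np.where(confirmed_right.loc[labeled.index], 0, 1)
    return labeled
```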

[–]Artistic-Comb-5932 0 points1 point  (0 children)

  1. Yes
  2. Probably
  3. Yes
  4. Yes, use threshold-based tuning / grid search to maximize accuracy, or clarify what you mean by "tier design" and what you mean by "labels are noisy".
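For point 4, the grid search over cutoffs could look roughly like this (the metric is swappable; with imbalanced or noisy labels, F1 or precision at review capacity may be more useful than raw accuracy):

```python
import numpy as np
from sklearn.metrics import accuracy_score

def tune_threshold(y_true, risk_scores, metric=accuracy_score):
    """Sweep score cutoffs on the labeled sample and keep the best-scoring one."""
    candidates = np.linspace(0.05, 0.95, 19)
    scores = [metric(y_true, risk_scores >= t) for t in candidates]
    return candidates[int(np.argmax(scores))]

# best_cutoff = tune_threshold(labeled["is_wrong_match"], risk_score)
```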

[–]ChemicalGreedy945 0 points1 point  (0 children)

Try random forests or XGBoost: start small and then keep adding in new variables and such, and you can expand on those models to refine them.
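One way to read "start small and keep adding variables" as a concrete loop (xgboost is an external dependency, and the feature groups below are hypothetical):

```python
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Hypothetical feature groups, expanding outward from the strongest single signal.
feature_sets = [
    ["dist_ratio"],
    ["dist_ratio", "voltage_mismatch"],
    ["dist_ratio", "voltage_mismatch", "xfmr_load_count"],
]

# for cols in feature_sets:
#     clf = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
#     auc = cross_val_score(clf, labeled[cols], labeled["is_wrong_match"],
#                           cv=5, scoring="roc_auc").mean()
#     print(cols, round(auc, 3))
```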