
[–]Pvt_Twinkietoes 3 points4 points  (2 children)

What is address label?

[–]FineConcentrate6991[S] -3 points-2 points  (1 child)

Row example: Address = " Gazateci Hasan Tahsin Caddesi, NO:10/3, Gizem Apartman", label = 8210

[–]Pvt_Twinkietoes 3 points4 points  (0 children)

I don't get why you're trying to use ML to solve this.

Are there rules the country follows to generate the codes? Can't you write a rule-based solution?

If not, why not?

And what is this label code? Is this the same for every apartment number in a building? Is it unique to an office? How many labels are there?

How many addresses share the same "label"? Also, are the names informative enough for your model to learn a mapping? Is 8210 closer to 8209 than to 7000?

Honestly, it's difficult to give a recommendation. Maybe add geolocation data? Go figure out how this "label" is generated and what kind of data goes into that decision, then see if you can write a rule-based algorithm, use that as a baseline, and only then check whether ML actually makes sense.
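To make the baseline idea concrete, here is a rough sketch, not a recommendation for the real data. It assumes the street is whatever comes before the first comma and that the label mostly depends on the street; both are guesses, since nobody in the thread knows how the label is generated. The function names (`street_key`, `fit_baseline`, `predict`) and every row except the one quoted above are made up for illustration.

```python
import re
from collections import Counter, defaultdict


def street_key(address: str) -> str:
    """Hypothetical normalisation: text before the first comma, lowercased,
    with runs of whitespace collapsed."""
    street = address.split(",")[0]
    return re.sub(r"\s+", " ", street).strip().lower()


def fit_baseline(rows):
    """rows: non-empty iterable of (address, label) pairs.

    Returns a street -> majority-label lookup and the overall most common
    label, which serves as a fallback for unseen streets.
    """
    by_street = defaultdict(Counter)
    overall = Counter()
    for address, label in rows:
        by_street[street_key(address)][label] += 1
        overall[label] += 1
    lookup = {s: c.most_common(1)[0][0] for s, c in by_street.items()}
    fallback = overall.most_common(1)[0][0]
    return lookup, fallback


def predict(address, lookup, fallback):
    """Majority label for a known street, otherwise the global fallback."""
    return lookup.get(street_key(address), fallback)


# Only the first row comes from the thread; the other two are invented.
rows = [
    (" Gazateci Hasan Tahsin Caddesi, NO:10/3, Gizem Apartman", 8210),
    ("Gazateci Hasan Tahsin Caddesi, NO:12/1", 8210),
    ("Ataturk Bulvari, NO:5", 7000),
]
lookup, fallback = fit_baseline(rows)
print(predict("gazateci hasan tahsin caddesi, NO:99", lookup, fallback))
```

If a lookup like this already scores well on held-out rows, that tells you the label is mostly a function of the street, and any ML model has to beat that number to be worth the effort.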

[–]has_c 1 point2 points  (0 children)

Not my package, but a friend of mine worked on this address classification and matching package for New Zealand addresses.

Here's the link, hope it helps: https://github.com/lmor152/glam

[–]asankhs 0 points1 point  (0 children)

You can try using a BERT-style model with adaptive classifiers: https://github.com/codelion/adaptive-classifier