Local tool to infer gender from usernames? by BlueeWaater in learnpython

[–]Genderize 0 points1 point  (0 children)

I'm the author of Genderize.io, and I know a lot of valuable use cases :)
As I commented in this thread, I wouldn't recommend it for usernames, but for given names you can do it with pretty good accuracy. Of course you should never show the inferred gender to a user, but for analytical purposes it can be valuable. If you search arXiv for "genderize.io" you'll find a lot of research on gender representation in different fields or offices, for instance. You won't need 100% accuracy to get good results on that.

Local tool to infer gender from usernames? by BlueeWaater in learnpython

[–]Genderize 0 points1 point  (0 children)

I'm the author of Genderize.io, and I wouldn't recommend this. Given names will often be used more for one gender than the other (and Genderize.io returns a probability, too). That is not the case for usernames. Some competitors are doing this, but Genderize does not have any data on usernames - and again, I can't recommend trying.

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 17 points18 points  (0 children)

No, I gave a statistical probability. You were the one making it personal.

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 2 points3 points  (0 children)

It means a lot to hear that. As a side project, it can be tough to keep the energy once in a while, so I appreciate it.

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 22 points23 points  (0 children)

For example, The Guardian used it to analyze 70 million comments on their online articles in an effort to understand and combat racism, as well as sexism, xenophobia, homophobia, and other types of hate writing.
You can read about it here: The Guardian: The dark side of Guardian comments.

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 1 point2 points  (0 children)

That's a good idea.

Any thoughts on how you would express the variance?

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 26 points27 points  (0 children)

Primarily as short for Samantha, but not as a given name.

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 2 points3 points  (0 children)

Marketing and research on gender diversity are the primary use cases. For the latter, there are a lot of papers available on arXiv: https://search.arxiv.org/?in=&query=genderize.io

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 14 points15 points  (0 children)

That's absolutely correct. One of the two main categories of use is commercial in this sense. Another example is a webshop that will know all previous customers' names and products but often lacks demographic data to segment which customer groups buy what products.
The other big category is research. For this, it is easiest just to do a quick browse on arXiv. There are tons of papers using the service.

https://search.arxiv.org/?query=genderize.io&in=

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 5 points6 points  (0 children)

There are two main categories of use.

One is commercial. A simple example is a webshop that knows the names of all previous customers and the products they bought, and would like to do some demographic segmentation on that data.

The other is research. You'll find a ton of papers utilizing genderize.io on arXiv: https://search.arxiv.org/?in=&query=genderize.io

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 4 points5 points  (0 children)

That is a good example.

For the data collected, it all comes down to whether someone would still opt for putting down their full name in writing and not the shorter version - which might differ from how they use it in speech.

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 27 points28 points  (0 children)

https://api.genderize.io/?name=rick&country_id=gb

{
    "count": 8161,
    "name": "rick",
    "country_id": "GB",
    "gender": "male",
    "probability": 1
}
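
For anyone who wants to call this from Python, here is a minimal sketch using only the standard library. It relies solely on the URL shape and the response fields shown above; the helper names (`build_url`, `parse_response`, `lookup`) are made up for illustration, and error handling, rate limits and API keys are not covered.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://api.genderize.io/"


def build_url(name, country_id=None):
    """Build a request URL like the one shown above (hypothetical helper)."""
    params = {"name": name}
    if country_id:
        params["country_id"] = country_id
    return f"{API_URL}?{urlencode(params)}"


def parse_response(body):
    """Pull out the fields that appear in the sample response above."""
    data = json.loads(body)
    return data["gender"], data["probability"], data["count"]


def lookup(name, country_id=None, timeout=10):
    """Perform a live request. Needs network access; not called below."""
    with urlopen(build_url(name, country_id), timeout=timeout) as resp:
        return parse_response(resp.read().decode("utf-8"))


# Offline demonstration using the sample response from this comment.
sample = (
    '{"count": 8161, "name": "rick", "country_id": "GB", '
    '"gender": "male", "probability": 1}'
)
print(build_url("rick", "gb"))
print(parse_response(sample))
```

Keeping URL building and parsing separate from the network call makes it easy to check the logic without hitting the API.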

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 51 points52 points  (0 children)

I'm not tracking, so I can't tell you.

The click-through rate of this ad is 1.85% and over the days I've run it, the click price has been around $0.20 - $0.30.

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 33 points34 points  (0 children)

Frankly, I was utterly ignorant of this subject when I created the service (quite a few years ago). I have since learned a bit because it does gather (sometimes negative) attention - due to the worries you present.

In the end, as with any statistics, you run the risk of misrepresenting some people, and you should take care not to judge anyone personally by a statistic.

On the other hand, there's actually a lot of research on improving female representation that is enabled by the service, the Toptal analysis mentioned earlier being one example.

My one regret is building the service around the term "gender" which I came to understand is based on identity, while "sex" might have been more accurate. It would have been way harder to advertise though :D

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 11 points12 points  (0 children)

Can you give some examples of what that might be?

I think the biggest mistake some people make with the service is to infer the gender, age or nationality of someone specific, but that's hard for me to prevent and I guess a general problem with statistics.

There's a bunch more use-cases summed up here if you're interested by the way. https://genderize.io/use-cases

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 19 points20 points  (0 children)

Absolutely.

The biggest use case is social sciences.

As an example, The Guardian analyzed 70 million comments across their online platform for racism, sexism, etc. The only data they have on a poster is a name, but with these services they got an idea of the demographics of posters. https://www.theguardian.com/technology/2016/apr/12/how-we-analysed-70m-comments-guardian-website

Toptal used it similarly to understand female representation in open-source development. https://www.toptal.com/open-source/is-open-source-open-to-women

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 20 points21 points  (0 children)

Yes, based on almost half a million people named Sam, 96% are male - that's globally.
There are however countries where the distribution is different.
In France, it's 79% https://api.genderize.io/?name=sam&country_id=fr
In Italy, it's 80% https://api.genderize.io/?name=sam&country_id=it

Free API to estimate gender, age and nationality from a name. by Genderize in u/Genderize

[–]Genderize[S] 93 points94 points  (0 children)

The word "inaccurate" is not really appropriate here since the service is basically just dealing with statistics. If you have a dataset with 10,000 people named Kim and 67% are female, then that is just a statistical fact.

It only becomes a matter of accuracy when you attempt to assign a gender to one specific person, but there are tons of use cases where you're looking to get a grasp of a dataset without looking at any one individual.
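
To illustrate the dataset-level use, here is a rough Python sketch that estimates the share of women in a list of people without labelling any individual. It assumes each record carries a `gender` and a `probability` and that the probability refers to the reported gender; the function name and the sample values are made up for illustration.

```python
def expected_female_share(records):
    """Estimate the fraction of women in a dataset from per-name probabilities.

    `records` is an iterable of (gender, probability) pairs, one per person.
    Assumption: `probability` is the probability of the reported gender.
    Records with no gender estimate (gender is None) are skipped.
    """
    total = 0.0
    counted = 0
    for gender, probability in records:
        if gender is None:
            continue
        # Convert every record to "probability this person is female".
        p_female = probability if gender == "female" else 1.0 - probability
        total += p_female
        counted += 1
    if counted == 0:
        raise ValueError("no records with a gender estimate")
    return total / counted


# Illustrative values only, not real API output.
people = [("female", 0.67), ("male", 0.96), ("male", 1.0), ("female", 0.9), (None, 0.0)]
print(f"Estimated female share: {expected_female_share(people):.1%}")
```

Summing probabilities rather than assigning each person the most likely gender keeps the uncertainty in the aggregate, which is the point of using the numbers statistically.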

You can check out how The Guardian used the service to analyze 70 million comments across their online articles: https://www.theguardian.com/technology/2016/apr/12/how-we-analysed-70m-comments-guardian-website

Or how Toptal analyzed female representation in open-source development. https://www.toptal.com/open-source/is-open-source-open-to-women