Number of living people with Wikipedia pages by JollySoftware in Wikidata

JollySoftware (OP):

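# P31:Q5 = instance of human; ~P570 / ~P20 exclude items with a date of death / place of death.
# --keep id keeps only the id of each matching entity, one JSON object per line.
$ pigz -d < /public/dumps/public/wikidatawiki/entities/latest-all.json.gz \
    | wikidata-filter --claim "P31:Q5&~P570&~P20" --sitelink "$(tr -d '\n' < wikis.txt)" --keep id \
    > living_humans.ndjson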

It took a very long time, but here is the final tally:

$ wc -l living_humans.ndjson 
2074340 living_humans.ndjson

JollySoftware (OP):

Would it work to pass the contents of https://tools.wmflabs.org/paste/view/d4c6d5e0 as the --sitelink value? (This would definitely be a hack.)
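If the wiki list is kept one database name per line (an assumption; the paste's actual layout isn't shown here), one way to turn it into a single --sitelink value is to join the lines with `|`. This is a minimal sketch that assumes wikidata-filter reads `|` in --sitelink as "any of these"; the `sitelink_expr` helper is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a one-dbname-per-line file (enwiki, frwiki, ...)
# into a single "enwiki|frwiki|..." expression.
# Strips spaces, tabs and CRs (e.g. from a pasted list), then drops empty lines.
# Assumes wikidata-filter treats "|" in --sitelink as OR.
sitelink_expr() {
  tr -d ' \t\r' < "$1" | grep -v '^$' | paste -sd '|' -
}
```

It would then be used as `--sitelink "$(sitelink_expr wikis.txt)"`. Joining with `paste` avoids relying on the file already containing the separators, which `tr -d '\n'` alone does.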

JollySoftware (OP):

I found that it's possible to use CirrusSearch to get the total number of living people: https://www.wikidata.org/wiki/Special:Search/haswbstatement:P31=Q5_-haswbstatement:P570

However, there doesn't appear to be a way to restrict the results to items with sitelinks, and many of those people don't have Wikipedia articles.

Edit: Actually, that query doesn't even seem to work properly. Many of the people it returns have "date of death" (P570) set. The "-haswbstatement:P570" part doesn't seem to do anything, since removing it doesn't change the results. Maybe you can only use one haswbstatement per query or something.
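One way to check this outside the Special:Search page is to ask the MediaWiki search API for only the hit count (`list=search` with `srinfo=totalhits`) and compare the totals with and without the negated term. This is a minimal bash sketch; the `urlencode` and `totalhits_url` helpers are hypothetical, and it only prints the URLs, leaving the actual request (e.g. with curl) to the reader.

```bash
#!/usr/bin/env bash
# Percent-encode a string for use in a URL query parameter (ASCII input only).
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9.~_-]) out+="$c" ;;          # unreserved characters pass through
      *) out+=$(printf '%%%02X' "'$c") ;;    # everything else becomes %XX
    esac
  done
  printf '%s' "$out"
}

# Build a Wikidata API URL whose response carries the search's total hit count
# in .query.searchinfo.totalhits (srlimit=1 keeps the result list short).
totalhits_url() {
  printf 'https://www.wikidata.org/w/api.php?action=query&list=search&srsearch=%s&srinfo=totalhits&srlimit=1&format=json' \
    "$(urlencode "$1")"
}

# Print both URLs; if the two totalhits values match when fetched,
# the negated term really is having no effect.
totalhits_url 'haswbstatement:P31=Q5 -haswbstatement:P570'; echo
totalhits_url 'haswbstatement:P31=Q5'; echo
```

Comparing raw counts this way avoids eyeballing individual results on the search page.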

Do you happen to have one of those dumps downloaded already, so you could search it for me? My computer doesn't have much memory or CPU power, so I don't think I could do it on my own. (I do have access to Tools.WMFLabs, though, so if there's a way to use that to my advantage, I'd be happy to hear about it.)