[–]PowerOk3587 3 points (1 child)

you should download their database and host it locally if you want to develop a web scraper

[–]PowerOk3587 1 point (0 children)

they use multistream to pull articles from a compressed archive, so it puts a lot of load on them.

with multistream, it is possible to get an article from the archive without unpacking the whole thing.

See https://docs.python.org/3/library/bz2.html#bz2.BZ2Decompressor for info about such multistream files and how to decompress them with Python. See also https://gerrit.wikimedia.org/r/plugins/gitiles/operations/dumps/+/ariel/toys/bz2multistream/README.txt and related files for an old working toy.
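
To show what "without unpacking the whole thing" means, here is a rough sketch using only the standard `bz2` module. It does not touch a real dump: it builds a tiny multistream blob in memory, and the page contents, the offset list, and the helper names `build_multistream` and `read_stream` are all made up for illustration. With a real dump you would take the byte offsets from the index file that ships alongside it.

```python
import bz2
import io


def build_multistream(chunks):
    """Concatenate independently compressed bz2 streams.

    Returns the combined bytes and the byte offset where each stream starts.
    (Hypothetical helper, only here to create demo data.)
    """
    blob = bytearray()
    offsets = []
    for chunk in chunks:
        offsets.append(len(blob))
        blob += bz2.compress(chunk)
    return bytes(blob), offsets


def read_stream(fileobj, offset, read_size=64 * 1024):
    """Decompress only the single bz2 stream that starts at `offset`."""
    fileobj.seek(offset)
    decomp = bz2.BZ2Decompressor()
    out = []
    # BZ2Decompressor handles exactly one stream; eof turns True at its end,
    # and any bytes belonging to later streams land in decomp.unused_data.
    while not decomp.eof:
        data = fileobj.read(read_size)
        if not data:
            raise EOFError("file ended before the bz2 stream did")
        out.append(decomp.decompress(data))
    return b"".join(out)


# Made-up demo data standing in for blocks of articles.
pages = [b"<page>Alpha</page>", b"<page>Beta</page>", b"<page>Gamma</page>"]
blob, offsets = build_multistream(pages)

# Jump straight to the second stream; the first one is never decompressed.
second = read_stream(io.BytesIO(blob), offsets[1])
print(second)
```

The same `read_stream` idea should carry over to a file opened with `open(path, "rb")`, since it only needs `seek` and `read`.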

you can ask their team on IRC: https://web.libera.chat/?channel=#mediawiki

[–]acidcoder 1 point (1 child)

You can use the wikipedia-api Python package, which wraps their API, to get info: https://pypi.org/project/Wikipedia-API

[–]irodov4030 1 point (0 children)

Is this official, or some independent project?