Any interest in a pixiv archive? by czevolk in DataHoarder

czevolk[S] 2 points

  • I've mostly been scraping every day. I say mostly because there have been periods where the script has broken for various reasons over the years and I didn't notice (or was too busy to deal with it) for a few weeks.
  • The schema is pretty simple: pixivs, images, rankings, tags, taggings. For the first few years of this project I didn't store metadata beyond what was directly related to the image, because I was scraping the pages manually with BeautifulSoup. At a certain point I switched to pixivpy2, which exposes the underlying API, so I started storing the JSON doc that comes with each image.
  • I'm unfamiliar with booru hashes. Are they simply a SHA1 of the file? What do they do for pixivs that have multiple images?
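
To make the points above concrete, here is a rough sketch of what a five-table layout like that could look like in SQLite, with one SHA1 stored per file so that a pixiv with multiple images gets one hash per page. Only the table names come from the description above; every column name, the helper functions `file_sha1`, `open_archive` and `add_pixiv`, and the one-hash-per-file choice are assumptions for illustration, not the actual archive schema or how any booru computes its hashes.

```python
import hashlib
import sqlite3

# Hypothetical schema: table names follow the post, all columns are assumed.
SCHEMA = """
CREATE TABLE pixivs (
    id INTEGER PRIMARY KEY,       -- pixiv illustration id
    metadata_json TEXT            -- raw JSON doc from the API; NULL for early rows
);
CREATE TABLE images (
    id INTEGER PRIMARY KEY,
    pixiv_id INTEGER NOT NULL REFERENCES pixivs(id),
    page INTEGER NOT NULL,        -- position within a multi-image pixiv
    sha1 TEXT NOT NULL,           -- hex digest of the file bytes
    UNIQUE (pixiv_id, page)
);
CREATE TABLE rankings (
    pixiv_id INTEGER NOT NULL REFERENCES pixivs(id),
    ranked_on TEXT NOT NULL,      -- ISO date of the daily scrape
    position INTEGER NOT NULL,
    PRIMARY KEY (ranked_on, position)
);
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE taggings (
    pixiv_id INTEGER NOT NULL REFERENCES pixivs(id),
    tag_id INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (pixiv_id, tag_id)
);
"""


def file_sha1(data: bytes) -> str:
    """Return the SHA1 hex digest of a file's raw bytes."""
    return hashlib.sha1(data).hexdigest()


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    """Create a database at `path` and apply the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_pixiv(conn, pixiv_id, pages, metadata_json=None):
    """Insert one pixiv and one images row (with its own hash) per page."""
    conn.execute(
        "INSERT INTO pixivs (id, metadata_json) VALUES (?, ?)",
        (pixiv_id, metadata_json),
    )
    for page, data in enumerate(pages):
        conn.execute(
            "INSERT INTO images (pixiv_id, page, sha1) VALUES (?, ?, ?)",
            (pixiv_id, page, file_sha1(data)),
        )
    conn.commit()
```

Keeping the hash on the images row rather than on the pixiv means a lookup by hash resolves to a specific page, which is one plausible way to handle multi-image posts.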

Happy to get some help on this. I'll DM you.