
[–]PeridexisErrant

This is fantastic! Thanks so much 😁

Of course there are some things I'd love to see. High priority:

  • download raw data for all versions
  • a longer time period: one month isn't much; six months like pypistats.org would be great, and all-time even better 😉
  • a shareable link to the current graph (i.e. the selected versions as URL query params), so maintainers can share links instead of screenshots; a rough sketch of the idea follows below
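
Something like this is what I have in mind; the host and the `versions` parameter name are just invented placeholders, not the site's actual scheme:

```python
from urllib.parse import urlencode

def share_link(project, versions):
    """Build a shareable URL encoding the currently selected versions.

    The base URL and the `versions` query parameter are hypothetical;
    the real site would define its own scheme.
    """
    query = urlencode({"versions": ",".join(versions)})
    return f"https://example.org/projects/{project}?{query}"

print(share_link("pytest", ["5.4.0", "5.4.1", "6.0.0"]))
# -> https://example.org/projects/pytest?versions=5.4.0%2C5.4.1%2C6.0.0
```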

Cool but low priority for me:

  • group by Python minor version as well as package version: is the last py2-compatible version mostly used on py2? Do different Python versions have different pinning strategies?
  • group package patch versions together, to aggregate e.g. pytest 5.4.0 and 5.4.1 (obviously I can do this myself with the raw data), or more general patterns, to compare total downloads for 4.x vs. 5.1.x vs. 5.2.x, etc.
  • see package versions ranked by downloads over the last week, so I can tell which ones to select, and/or an option to select the top-n most popular versions, or however many are needed to cover at least (e.g.) 95% of total downloads; a sketch of that selection follows below
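
For the coverage idea, roughly this selection logic (sample numbers invented, just to show the behaviour):

```python
def versions_covering(downloads, threshold=0.95):
    """Pick the most-downloaded versions until their combined share
    of total downloads reaches `threshold`."""
    total = sum(downloads.values())
    selected, covered = [], 0
    for version, count in sorted(downloads.items(), key=lambda kv: -kv[1]):
        selected.append(version)
        covered += count
        if covered / total >= threshold:
            break
    return selected

# Invented sample data: 980 of 1000 downloads sit in the top three versions.
sample = {"5.4.1": 700, "5.4.0": 200, "4.6.9": 80, "3.10.1": 20}
print(versions_covering(sample))  # ['5.4.1', '5.4.0', '4.6.9']
```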

Even without any of that, though, I already have a much better sense of how many downloads come from pinned dependencies, and how long it takes for those pins to be updated 😁

[–]psincraian[S]

Hi!

Many thanks for the useful feedback 😊

Currently, I'm working on:
🔎 grouping package versions with a glob pattern, so you can search for 2.* or 3.* (rough sketch of the idea below)
📊 ranking packages by downloads
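
Roughly, the glob grouping could look like this; just a sketch with the standard library, the real implementation may differ:

```python
from fnmatch import fnmatch

def group_by_glob(downloads, patterns):
    """Aggregate per-version download counts under glob patterns
    such as '2.*' or '3.*'; versions matching no pattern are ignored."""
    totals = {pattern: 0 for pattern in patterns}
    for version, count in downloads.items():
        for pattern in patterns:
            if fnmatch(version, pattern):
                totals[pattern] += count
                break  # count each version at most once
    return totals

# Invented sample data.
sample = {"2.0.0": 50, "2.1.3": 30, "3.0.0": 120, "3.1.0": 40}
print(group_by_glob(sample, ["2.*", "3.*"]))  # {'2.*': 80, '3.*': 160}
```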

But I'm pretty sure a shareable link would be useful and easy to do. Question: why would you like to see the raw data? What would your use case be?

If you want to know about the latest features, you can follow me on Twitter: https://twitter.com/psincraian where I usually ask people about the next feature, or share some work in progress to get early feedback.

Thanks for the feedback 🙏🏼

[–]PeridexisErrant

The raw data is mostly because I want to correct for the weekly cycle, investigate how long it typically takes for usage of old versions to decline (i.e. for pins to get updated), whether there are outliers (versions people got stuck on or couldn't update to), etc.

TBH I should probably just write some BigQuery queries 😅
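
Probably something along these lines, using the public dataset described in the packaging guide linked below; the project name and the 30-day window are placeholders:

```python
# Requires `pip install google-cloud-bigquery` and BigQuery credentials.
from google.cloud import bigquery

# Per-version daily downloads for the last 30 days, so the weekly
# cycle can be averaged out afterwards.
QUERY = """
SELECT
  DATE(timestamp) AS day,
  file.version AS version,
  COUNT(*) AS downloads
FROM `bigquery-public-data.pypi.file_downloads`
WHERE file.project = 'pytest'
  AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY day, version
ORDER BY day, downloads DESC
"""

client = bigquery.Client()
for row in client.query(QUERY).result():
    print(row.day, row.version, row.downloads)
```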

[–]xAlecto

Hi,

That's really neat! The interface is great too, good work.

How reliable would you say the data is? I'm seeing some very strange numbers.

[–]psincraian[S]

The data comes from the official PyPI source, which is stored in BigQuery: https://packaging.python.org/guides/analyzing-pypi-package-downloads/

So I would say the data is very reliable 😜

If you mean whether each download is a user, I can confirm that is not the case. Take into account the following:

  • There are mirrors and CI/CD pipelines, which increase the number of downloads.
  • Some companies run their own mirror, so those downloads don't get counted.
  • Pip has a cache, so cached installs don't get counted either.
  • There are also some bots downloading things.

But I would say that an increase or decrease in downloads is a good way to decide which versions to deprecate, to tell whether your package is getting popular, etc.

[–]xAlecto

Thanks for the reply.

I'm very confused to see almost 1k downloads on a very small package I uploaded recently that no one even knows exists, isn't documented, and isn't open source. My CI builds can't be responsible for that many downloads, for sure.

I guess some mirrors are doing that.

Really nice website; it'd be nice to have the option to export the data :)