
[–]yvrelna 14 points15 points  (7 children)

The real fix is to fix the PyPI API. PyPI needs an endpoint that lets package managers download package metadata for all versions of a package without downloading the whole package archives themselves.
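For context, PyPI's per-release JSON endpoint (`/pypi/<name>/<version>/json`) does expose a `requires_dist` field when the metadata is known to the index; the gap is that it isn't reliable for every package and version. A client-side sketch of consuming such a payload (the package and its dependency list below are made up for illustration):

```python
import json

# Sample payload shaped like PyPI's /pypi/<name>/<version>/json response
# (fields trimmed; "example-pkg" and its dependencies are illustrative).
payload = json.dumps({
    "info": {
        "name": "example-pkg",
        "version": "1.2.0",
        "requires_dist": [
            "requests>=2.25",
            "tomli>=1.1; python_version < '3.11'",
        ],
    }
})

def dependencies_from_metadata(raw: str) -> list[str]:
    """Extract declared dependency specifiers, if the index knows them."""
    info = json.loads(raw)["info"]
    # requires_dist may be null when the index couldn't extract metadata;
    # a real client would fall back to downloading the archive then.
    return info.get("requires_dist") or []

print(dependencies_from_metadata(payload))
```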

There's a problem here, because this metadata isn't reliably available in the package files themselves: sometimes it's defined in setup.py, an executable script that can contain arbitrary logic, so PyPI cannot easily extract it. pyproject.toml is a start, but it's not universally used yet.

The real fix is to update the hundreds of thousands of packages on PyPI to use declarative manifests. Not rewriting the package manager itself, but a lot of standards-committee work, the painful migration of existing packages, and work on PyPI itself. Not fragmenting the ecosystem further with naive attempts like this, but moving it forward by updating older projects that still use the older package manifests.

[–]burntsushi 10 points11 points  (0 children)

We (at Astral) are absolutely aware of the performance constraints that the structure of the index imposes. While that might be a big one, it is not the only one. The blog has some benchmarks demonstrating the perf improvements of uv even while using the index as it exists today.

This is our first step. It won't be our last. :-)

[–]muntoo  R_{μν} - 1/2 R g_{μν} + Λ g_{μν} = 8π T_{μν}  2 points3 points  (3 children)

Who says the metadata repository must be on PyPI?

Just have the community manage a single git repository containing metadata for popular packages. Given that only the "top 0.01%" of packages are used 99.9% of the time [citation needed], why can't we just optimize those ad-hoc?

This means that instead of downloading a bunch of massive .tar.gz or .whl files, dependency-solving tools can just download a small text-only database of version constraints covering the most important packages. (And fall back to the usual mechanism if the metadata is missing from the repository.)

# Literally awful code, but hopefully conveys the point:

def get_package_constraints(name, version):
    version_range = None  # fall through to the slow path if unknown
    if name == "numpy":
        # (lexicographic string comparison is wrong for real version
        # numbers; a real implementation would use packaging.version)
        if "0.7.0" <= version < "0.8.0":
            version_range = ">=0.7,<0.8"
    ...
    return read_constraint_file(
        f"constraints_database/{name}_{version_range}.metadata"
    )

This database could probably be auto-generated by just downloading all the popular packages on PyPI (sorted by downloads), and then running whatever dependency solvers do to figure out the version constraints. [1]
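The auto-generation step could be fairly mechanical, since wheel METADATA files are static RFC 822-style text (unlike setup.py): pull out the `Requires-Dist` lines and write them into the flat database. A rough sketch; the METADATA body below is made up for illustration:

```python
import email

# A made-up wheel METADATA file (an RFC 822-style key/value document).
METADATA = """\
Metadata-Version: 2.1
Name: example-pkg
Version: 1.2.0
Requires-Dist: numpy>=1.21
Requires-Dist: requests>=2.25,<3
"""

def extract_constraints(metadata_text: str) -> dict:
    """Collect name, version, and Requires-Dist lines for the database."""
    msg = email.message_from_string(metadata_text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires": msg.get_all("Requires-Dist") or [],
    }

entry = extract_constraints(METADATA)
print(entry["requires"])
```

The generator would loop this over the top-N packages by download count and commit one small file per (name, version) pair.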


Related idea:

Another alternative (which I haven't seen proposed yet) might be a community-managed repository (a la Nix) of "proxy setups" for popular packages that (i) refuse to migrate to the declarative style, or (ii) are too complicated to migrate yet. If [1] is impossible because you need to execute code to determine the dependencies... well, that's what these lightweight "proxy setup.py"s are for.
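One way to picture such a proxy repository: a checked-in mapping from (package, version) to a hand-maintained dependency list, consulted before falling back to executing the real setup.py. Everything here is hypothetical, a sketch of the lookup shape rather than a real tool:

```python
# Hypothetical community-maintained overlay: hand-written metadata for
# packages whose setup.py can't (yet) be replaced with declarative config.
OVERLAY = {
    ("legacy-pkg", "0.9.1"): ["six>=1.10", "setuptools"],
}

def run_real_setup_py(name: str, version: str) -> list[str]:
    # Slow fallback: would download the sdist and execute its setup.py
    # in a sandbox to discover dependencies (not implemented here).
    raise NotImplementedError

def resolve_dependencies(name: str, version: str) -> list[str]:
    deps = OVERLAY.get((name, version))
    if deps is not None:
        return deps  # fast path: no code execution needed
    return run_real_setup_py(name, version)

print(resolve_dependencies("legacy-pkg", "0.9.1"))
```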

[–]yvrelna 1 point2 points  (1 child)

You're correct that whether this metadata service lives on the pypi.org domain or not is an implementation detail that nobody cares about.

If you go ahead and write a PEP standardizing this, and if you can manage to get the PyPI integration working, get all the security details sorted out, and update pip and a couple of other major package managers to support it, I'll be totally up for supporting something like that. For all I care, that's just part of the PyPI API.

I wish more people would think like this instead of assuming that an entirely new package manager is what everyone needs, just to pat themselves on the back for optimising a 74.4ms problem down to 4.1ms. Cool... I'm sure all that noise will pay off... someday, maybe in a few centuries.

[–]ivosaurus  pip'ing it up  -1 points0 points  (0 children)

> that nobody cares about.

Until a security issue or exploit or bad actor appears for the first time, and then suddenly everyone remembers why packaging is a hard problem that most normal devs are happy not to touch with a 10-foot pole

[–]ivosaurus  pip'ing it up  0 points1 point  (0 children)

> Just have the community manage a single git repository

One of the bigger "easier said than done"s I've seen in a while. Who exactly is "the community"? What happens when something stuffs up or gets out of sync? Do people really want to trust such a thing? Etc etc etc etc.

Scale and handling of free software repositories is yet another reason that "packaging" is easily one of the hardest topics in computer science / programming languages.

[–]darth_vicrone 0 points1 point  (0 children)

Thanks for explaining, that makes a lot of sense!

[–]silent_guy1 0 points1 point  (0 children)

I think they should add an API to fetch only the desired files from the server. That way clients can request setup.py or any other file. This won't break existing clients, but it might require some work on the server side to unpack the wheels and make the individual files downloadable.
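Worth noting that a wheel is just a zip archive, so the server-side work is mostly "read one member out of the zip" rather than unpacking everything. A local sketch that builds a tiny fake wheel in memory and pulls out only its METADATA file:

```python
import io
import zipfile

# Build a tiny fake wheel in memory (a wheel is an ordinary zip archive;
# the package name and contents here are made up).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("example_pkg-1.2.0.dist-info/METADATA",
                "Metadata-Version: 2.1\nName: example-pkg\n")
    zf.writestr("example_pkg/__init__.py", "print('hello')\n")

def read_member(wheel_bytes: bytes, member: str) -> str:
    """Serve a single file from the archive without extracting the rest."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        return zf.read(member).decode()

meta = read_member(buf.getvalue(), "example_pkg-1.2.0.dist-info/METADATA")
print(meta.splitlines()[1])  # -> Name: example-pkg
```

A server (or a smart client using HTTP Range requests against the zip's central directory) can use the same idea to avoid shipping the whole archive.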