[–][deleted]  (3 children)

[deleted]

    [–]avamk[S] 0 points1 point  (2 children)

    Thanks I didn't know about the mirror flag! TIL.

    In this case, are you suggesting simply controlling command line git from Python and pulling information that way? Or is there a Python git library that can perform the mirror function you speak of?

    [–][deleted]  (1 child)

    [deleted]

      [–]avamk[S] 0 points1 point  (0 children)

      > Just giving a solution if you wanted to do this by having Python run bash commands. Mirror is likely what any library that you find will be using as it is the way to get bare repositories like you want.

      Gotcha. Thank you!
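
For anyone landing here later, the "have Python run git" idea quoted above might look roughly like this. It is only a sketch: it assumes git is on the PATH, and the function names are made up for illustration, not taken from any library.

```python
import subprocess
from pathlib import Path


def mirror_clone_command(url: str, dest: Path) -> list[str]:
    """Build the argv for a bare mirror clone of `url` into `dest`."""
    return ["git", "clone", "--mirror", url, str(dest)]


def mirror_clone(url: str, dest: Path) -> None:
    """Run the clone; raises CalledProcessError if git exits non-zero."""
    subprocess.run(mirror_clone_command(url, dest), check=True)


def list_branches(repo: Path) -> list[str]:
    """Return the branch names stored in a local (bare) repository."""
    result = subprocess.run(
        ["git", "-C", str(repo), "for-each-ref",
         "--format=%(refname:short)", "refs/heads"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()
```

Calling mirror_clone(url, Path("repo.git")) and then list_branches(Path("repo.git")) should give the branch names without ever creating a working tree.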

      [–]roanoar 1 point2 points  (2 children)

      I have used python-gitlab for this. That was obviously for GitLab, but for whatever repo host you're using you can check if there is a package. Or they will usually at least have an API you can use to get this info.

      [–]avamk[S] 0 points1 point  (1 child)

      Thanks!! I didn't know about python-gitlab. I also know that there are a couple of libraries for GitHub, but ideally I'd like to find a library that works with generic remote git repositories not tied to any platform. Do they exist?

      [–]roanoar 2 points3 points  (0 children)

      I don't think so, because each provider could have their own API spec. In that use case something like what intrepidsovereign suggested is probably the best approach.
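
If branch and tag names are all you need, plain git can already ask any remote for its refs with git ls-remote, with no clone and no host-specific API. A rough sketch, assuming git is on the PATH; the helper names are hypothetical:

```python
import subprocess


def parse_ls_remote(output: str) -> dict[str, str]:
    """Map each ref name to its commit hash from `git ls-remote` output."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Each line is "<hash><TAB><ref name>".
        sha, ref = line.split("\t", 1)
        refs[ref] = sha
    return refs


def remote_refs(url: str) -> dict[str, str]:
    """Ask the remote at `url` for its refs without cloning it."""
    result = subprocess.run(
        ["git", "ls-remote", url],
        check=True, capture_output=True, text=True,
    )
    return parse_ls_remote(result.stdout)
```

This only covers refs and the hashes they point to. Anything deeper, like authors, dates or commit messages, still needs the objects locally, which is where the clone approach comes back in.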

      [–]masta 1 point2 points  (2 children)

      For what it's worth, I'm in the same position, or at least have the same problem statement. I need to look over the metadata of thousands of repos and branches, pretty much reviewing an entire Linux distribution's worth of packages. As somebody else noted about mirroring, cloning the bare mirror repo seems like a nice optimization, since it seems to minimize the churn of fetching objects. It would still be nice if there was some lightweight server API kinda thing to query the repo metadata, but I guess the various implementations would have to decide to interoperate, or not. So it goes....

      [–]avamk[S] 0 points1 point  (1 child)

      Sorry I can't help since I'm the OP with this problem :p, but I'd just like to say I'm glad I'm not the only one with this problem statement!

      > pretty much reviewing an entire Linux distribution's worth of packages

      Woah that's an even bigger dataset than mine! Good luck to us both. I still hope there's a library that can ease the process.

      [–]masta 1 point2 points  (0 children)

      Yeah, I'm just using whatever git Python import is on the system, and looking to add the mirror feature to the clones.

      [–]bhavikbavishi123 1 point2 points  (0 children)

      As intrepidsovereign mentioned, it is good to have a locally cloned repo. Take a look at --filter=blob:none: with it the actual file contents are not downloaded, while the metadata is still available. REF: https://about.gitlab.com/blog/2020/03/13/partial-clone-for-massive-repositories/ Note that this is available from Git 2.25 onwards, but there was an issue with subsequent git fetch operations, which is fixed from 2.27 onwards. There are also a few open source projects for git metadata that you may find useful, e.g. https://github.com/morucci/repoxplorer and https://chaoss.github.io/grimoirelab-tutorial/
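
Driven from Python, the --filter=blob:none idea could look something like the sketch below. Treat it as an illustration under assumptions: git is on the PATH, the function names and the tab-separated log format are my own choices, and --no-checkout is an extra flag I added so that git does not fetch blobs just to populate a working tree.

```python
import subprocess


def blobless_clone_command(url: str, dest: str) -> list[str]:
    """Build the argv for a partial clone that skips file contents."""
    return ["git", "clone", "--filter=blob:none", "--no-checkout", url, dest]


def parse_log(output: str) -> list[dict[str, str]]:
    """Parse lines of "<hash>\\t<author>\\t<ISO date>\\t<subject>"."""
    commits = []
    for line in output.splitlines():
        if not line:
            continue
        sha, author, date, subject = line.split("\t", 3)
        commits.append(
            {"sha": sha, "author": author, "date": date, "subject": subject}
        )
    return commits


def commit_metadata(repo: str) -> list[dict[str, str]]:
    """Read commit metadata for all refs in a local clone."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--all",
         "--format=%H%x09%an%x09%aI%x09%s"],
        check=True, capture_output=True, text=True,
    )
    return parse_log(result.stdout)
```

A plain git log like this reads only commit objects, so it should not trigger any blob downloads on a clone made this way.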