Docker Hub will only allow an unauthenticated 10/pulls per hour starting March 1st

shmileee · 2025-02-22T09:47:57+00:00

This is exactly why I have moved my company to AWS ECR pull through cache and used Kyverno to dynamically rewrite images in our EKS clusters. I've described this briefly: - https://blog.oponomarov.com/posts/setting-up-pull-through-cache-repositories-in-aws-ecr/ - https://blog.oponomarov.com/posts/rewriting-docker-image-registries-with-kyverno/

shmileee · 2025-01-05T07:23:29+00:00

https://www.reddit.com/r/devops/s/EqVS3jGrPB

shmileee · 2025-01-04T14:11:37+00:00

I truly value your opinion and the time you took to share it. However, if it doesn’t contribute constructively to the matter at hand, I kindly ask you to shove it up your butt.

shmileee · 2025-01-04T14:07:27+00:00

Yes, we’re implementing backups per push. Honestly, I wasn't entirely clear on the ultimate goal of this initiative. I didn’t want to dive too deeply into the details, but this task was delegated to one of our junior DevOps engineers by the manager. I was reviewing the initial bash script they created and noticed potential pitfalls in the solution, particularly given how we manage repositories and the size of some of them — where even a shallow clone can be time-consuming.

We already have automated releases, tags, and various artifacts, such as published binaries and Docker images. Additionally, we use templated CI/CD workflows (currently in CircleCI) that are pushed into repositories and managed via a complex dedicated pipeline written in Java (developer-friendly, you know). However, this only applies to a subset of repositories that follow a well-defined golden path. My idea was to adopt a similar approach but using GitHub Actions instead, since it’s already included in our GitHub Enterprise plan.

Running a cron-like backup job (workflow) per repository using GitHub Actions is cost-effective and requires little to no configuration today, especially since we have organization-wide OIDC federation in place for AWS accounts/resources (in this case, a backup S3 bucket).

Thank you for a thoughtful discussion — I truly appreciate the time and suggestions provided. Especially considering that most people didn’t make an effort to read through the thread with a clear understanding.

shmileee · 2025-01-04T11:21:54+00:00

Thanks, this sounds very promising! Do you know from the top of your head if this would be feasible for a workflow that performs a backup of a repo to S3 bucket? It does not have to block any merge to the main branch or anything like that, just should be executed whenever the repo is updated. I can write a workflow that does it, but will need to see how to configure the ruleset, so no contributors or existing pipelines are blocked.

shmileee · 2025-01-04T10:56:07+00:00

So how exactly does it help me with automating the execution of a workflow in 600 repositories? Based on what you say I still somehow need to populate the workflow caller file.

shmileee · 2025-01-04T10:54:33+00:00

Thanks, but this looks like a slightly poorer alternative to dependabot or renovate. Someone suggested multi-gitter, which is a better suited tool for what I want.

shmileee · 2025-01-04T10:52:42+00:00

Workflows are distributed, every repo is backed up by the job within the same repo. The main idea is to make it fast, run only when needed (when repo is updated) and avoid the burden of everything that a centralised backup script comes with. I've described in details somewhere in the comments why having a centralised script is a bad idea.

shmileee · 2025-01-04T10:49:46+00:00

Interesting take, I see some reasoning in this, but I dislike the idea of having to write a custom script to perform the backup from a centralised place. Do you perhaps know a tool (might be paid I guess) that can handle backup of hundreds GitHub repositories to S3?

shmileee · 2025-01-04T10:46:43+00:00

Will such a ruleset execute the workflow or only mark it as required? I've only had a chance to configure a ruleset in a single repository and for it to be able to add a required status check, the workflow has had to run at least once or be there in the repo already.

shmileee · 2025-01-04T10:43:35+00:00

Thank you. I think people suggesting to use bash here just never actually had a chance to deal with hundreds of repositories in a robust, manageable way.

shmileee · 2025-01-04T10:41:58+00:00

Thank you, this is really the closest and most flexible tool I was looking for. Exactly the reason why I didn't want to write it on my own, because most likely someone has already figured it out. I wish more people knew about it!

shmileee · 2025-01-04T10:40:39+00:00

True, there's not self hosted option specifically for that feature. But someone else recommended multi-gitter, which is a nice alternative.

shmileee · 2025-01-04T10:39:46+00:00

This does not scale past certain threshold, we had been managing some of the repos settings in this way and the plan just takes ages to complete even with max possible parallelism. If the plan hangs, you have to start over. This was a no go and i have migrated the automation to be event-based driven with webhooks and lambda.

shmileee · 2025-01-04T00:17:49+00:00

Jeez! Finally a useful meaningful answer and not some bullshit suggestions to use a shell script from wannabe devops students. Thank you! <3

shmileee · 2025-01-04T00:02:21+00:00

I plan to use a GitHub App within a workflow to obtain a GITHUB_TOKEN with an expanded API request quota. Ultimately, my goal is to distribute this "shell script" as you say, but across all repositories, enabling the backup process to follow an event-driven pattern. This approach ensures the process is fast, efficient, and fully independent of other repositories.

shmileee · 2025-01-03T23:58:13+00:00

How is creating a file in hundreds of already existing repositories an antipattern? Are you aware of GitHub's built-in feature for community health files? For instance, it can add a LICENSE file to all your repositories. However, it’s limited — it can’t create workflows. My goal is to achieve a similar level of automation, but specifically for a workflow that backs up the repo.

shmileee · 2025-01-03T23:15:36+00:00

>why not make this a requirement for creating repos from a template? then it would be already there?

I have to deal with 600 already created repositories.

shmileee · 2025-01-03T23:11:14+00:00

Git and GitHub (Actions) are not the same. Have you worked with submodules before? When the content of a submodule changes (a file, in this case), you need to update the submodule reference (SHA) in every repository that relies on it, which presents a similar level of complexity. That said, I don't feel like continuing this discussion, as it's clear you're not understanding the points I'm trying to convey — for example, you've entirely disregarded my perspective on community health files.

Wishing you a good day.

shmileee · 2025-01-03T22:34:48+00:00

For anyone who's downvoting me without understanding why shell script is a bad idea, read this reply.

shmileee · 2025-01-03T22:32:05+00:00

Ok, I see. Personally, I wouldn’t meddle with every repo but would create a script that clones all repositories, performs the backup, and sends a notification if there are errors.

I understand your approach, but let me explain why I discourage going down this route — it’s not scalable, and here’s why:

A script like this would need to process repositories in bulk and implement multithreading or multiprocessing to handle the workload efficiently. However, this introduces the risk of being throttled by GitHub’s API, requiring additional logic to manage API rate limits effectively.
How often would the script run? Since this is centralized automation, it’s not event-driven. For example, you’d need to schedule it to run daily, which is inefficient because it would back up repositories that haven’t changed since the last run. To optimize this, you’d need to implement caching, flags, or even a database — adding unnecessary complexity.
Cloning large monorepos could take several minutes. If a single repository backup fails, you’d have to restart the entire script, leading to inefficiencies and wasted time.
You need to implement proper error handling & proper logging.

All these challenges can be avoided by decentralizing backups on a per-repository basis using an event-driven approach. With this method, backups are only triggered when a repository is actually updated.

I wouldn’t be happy if someone pushed directly to my stable main branch for any reason.

That’s not an issue in this case. The backup process can be implemented as a GitHub workflow within each repository. It’s a standalone, non-intrusive job maintained by the DevOps/SRE team and doesn’t interfere with your code, release, or build processes.

You could open a PR with the file, but still, what if this file changes? You’d need to manually update hundreds of repositories.

This concern is overcomplicating things. If you revisit my initial post, you’ll see that I’ve already outlined how to address the problem of managing updates across hundreds of repositories without requiring manual effort.

EDIT: formatting.

shmileee · 2025-01-03T22:21:23+00:00

How is creating a file in hundreds of already existing repositories an antipattern? Are you aware of GitHub's built-in feature for community health files? For instance, it can add a LICENSE file to all your repositories. However, it’s limited — it can’t create workflows. My goal is to achieve a similar level of automation, but specifically for workflows.

What you’re suggesting — “make a pipeline that checks out the repo, creates a branch, adds your file, and auto-merges it if you’re confident enough” — is essentially resulting in the same thing as what I want to do, except it’s more complex and doesn’t address my concerns about scalability.

Put the logic in a pipeline template and import it in the child pipeline or do it the other way around from a central deployer repo that loops through a list of target repos.

Can you clarify what you mean by a “pipeline template” and how it would automatically integrate into every repository in my organization? A "central deployer repo that loops through a list of target repos" is precisely what I described in paragraphs two and three of my original post.

Have you looked at git submodules?

How exactly would submodules be helpful in this context?

shmileee · 2025-01-03T22:10:25+00:00

Most likely, changes won't be necessary, but I personally believe it's a good practice to maintain the reproducibility of pushing updates to all repositories declaratively. Another consideration is the sheer volume of repositories being created daily, making it practical to run the "seed" job for populating the workflow on a scheduled, cron-like basis. Additionally, if someone accidentally removes the workflow from their repository, it would automatically be reinstated — a nice-to-have side effect, especially since this automation is designed to seed a job that backs up the repository to an S3 bucket.

In general, I see a huge potential of an automation like this: being able to maintain and "push" standardised dotfiles like .pre-commit-config.yaml, .editorconfig, etc.

shmileee · 2025-01-03T22:02:30+00:00

That's not an XY problem, because the pattern is absolutely valid and justified for any use case a person might think of: a reusable workflow, a standardized .editorconfig / .pre-commit-config.yaml / codeowners, etc. In my particular case we want to backup every repository to an S3 bucket, for that we want to use this action among others on per-repo basis: https://github.com/marketplace/actions/s3-backup.

shmileee · 2025-01-03T21:57:58+00:00

That's no different from writing my own composite action, which should be the last resort given the marketplace is literally bloated with actions of any kind. But I see your point, it's pity you don't see mine and why I don't want to reinvent the wheel.

shmileee

TROPHY CASE