Mirror ghcr.io images to a backup registry for outage resilience #50
Originally filed by @coilysiren on 2026-05-15T04:37:29Z - https://github.com/coilysiren/infrastructure/issues/166
Problem
GitHub being unreliable (API rate-limited or outage) cascades into k3s deploy failures because ghcr.io is the only image registry for the coilysiren-backend, eco-mcp-app, and galaxy-gen namespaces. The 2026-05-14 incident made this concrete: a PAT rotation broke ghcr.io pulls in three namespaces simultaneously, and the dashboard had no backup pull path.
Goal
Every image published to ghcr.io/coilysiren/* also lands in a second registry. Deploys can fail over to the backup without a code change.
Scope
In-tree workloads currently pulling from ghcr.io:
coilysiren-backend namespace (per backend/deploy/main.yml)
eco-mcp-app namespace (per eco-mcp-app/deploy/main.yml)
galaxy-gen namespace (per galaxy-gen/deploy/main.yml)
Pull credential is the shared /github/pat in SSM, synced to k8s via external-secrets.
Tasks
:latest and :<sha>).
imagePullSecrets for the backup registry alongside the existing ghcr.io one in each namespace.
image: from ghcr.io/... to <backup>/... without re-pushing manifests.
Acceptance
A simulated ghcr.io outage (e.g. kubectl edit secret docker-registry to break the auth) does not block a fresh deploy. The backup registry serves the image; a documented one-liner flips manifests over.
Out of scope
Refs
Iceboxed in the 2026-05-29 backlog burn-down: mirror ghcr images, superseded by in-cluster registry #168. Reopen anytime if it becomes real.
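For whoever reopens this: the manifest flip described under Tasks amounts to a registry-prefix rewrite, and the mirror step needs one source/destination pair per tag. The following is only a rough sketch of that mapping, not an agreed design. The backup host `registry.example.com/coilysiren`, the image name `backend`, and the helper names `to_backup_ref` and `mirror_pairs` are all hypothetical; the issue does not name a backup registry or list image names.

```python
# Sketch only: the backup registry host and the image names are placeholders.
GHCR_PREFIX = "ghcr.io/coilysiren/"


def to_backup_ref(image: str, backup_prefix: str) -> str:
    """Rewrite a ghcr.io/coilysiren/* reference to the backup registry.

    References that do not live under GHCR_PREFIX are returned unchanged.
    """
    if not image.startswith(GHCR_PREFIX):
        return image
    # Keep everything after the ghcr.io prefix (repository name and tag).
    return backup_prefix.rstrip("/") + "/" + image[len(GHCR_PREFIX):]


def mirror_pairs(name: str, sha: str, backup_prefix: str) -> list[tuple[str, str]]:
    """Return (source, destination) pairs for the :latest and :<sha> tags."""
    pairs = []
    for tag in ("latest", sha):
        src = f"{GHCR_PREFIX}{name}:{tag}"
        pairs.append((src, to_backup_ref(src, backup_prefix)))
    return pairs
```

Each pair from `mirror_pairs` could be fed to whatever copy tool the publish workflow ends up using, and `to_backup_ref` applied to every `image:` value in the three deploy/main.yml files would be the flip itself.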