Migrate deploy off GitHub Actions to Forgejo + in-cluster registry #25

Closed
opened 2026-05-28 12:35:35 +00:00 by coilysiren · 0 comments

Context

GitHub no longer joins the tailnet (all TS_* secrets removed from this repo + 4 others). The in-cluster registry bridge is live and verified:

  • Registry: 192.168.0.194:30500 (NodePort, namespace registry, pinned to kai-server, plain http).
  • kai-server containerd has the registries.yaml insecure entry; a probe image round-tripped over plain-http (coilysiren/infrastructure#168, #171).
  • forgejo-runner DinD carries --insecure-registry=192.168.0.194:30500.
  • This repo is mirrored to Forgejo (origin pushes to both GitHub and Forgejo). Forgejo Actions runs on the in-cluster runner (kai-desktop-tower-wsl node, DinD sidecar, DOCKER_HOST=tcp://localhost:2375).

Goal

Replace .github/workflows/build-and-publish.yml (dead - it joined the tailnet via OIDC, which is gone) with a Forgejo Actions workflow that builds, pushes to the in-cluster registry, and rolls the deployment. No tailnet join, no GHCR, no SSH.

Work (one PR)

  1. Access control - deployer SA + kubeconfig

    • In deploy/ add a deployer ServiceAccount in coilysiren-backend, a Role (verbs: get/patch on deployments, get on deployments/status, get/list on pods), a RoleBinding, and a long-lived kubernetes.io/service-account-token Secret for the SA.
    • Build a kubeconfig: server https://192.168.0.194:6443 (LAN IP is in the k3s API cert SANs), CA + token from the SA Secret. Store it base64-encoded as a Forgejo Actions secret DEPLOY_KUBECONFIG on this repo (Forgejo UI/API). Keep the token out of git; note where it lives.
  2. .forgejo/workflows/build-publish-deploy.yml

    • on: push: branches: [main].
    • job test: uv sync --frozen + uv run pytest.
    • job deploy (needs test):
      • build, tag 192.168.0.194:30500/coilysiren-backend:${{ github.sha }}, push (the runner's DinD is the docker host).
      • base64-decode $DEPLOY_KUBECONFIG into a kubeconfig file, then run kubectl set image deploy/coilysiren-backend-app coilysiren-backend=<ref> -n coilysiren-backend, followed by kubectl rollout status ... --timeout=5m.
      • preserve the existing "report status to datastore" step (if: always(), POST to http://api/document, namespace ci-status).
  3. deploy/main.yml - switch the app image to the registry ref scheme and set imagePullPolicy: Always. Update the stale "No GHCR pull-secret / sideloaded into containerd" comments to describe the registry pull.

  4. Makefile - point image-url at 192.168.0.194:30500/.... The .deploy target stays (structural manifest applies).

  5. Remove .github/workflows/build-and-publish.yml.

  6. Verify - push, watch the Forgejo run, confirm a new pod rolls with the 192.168.0.194:30500/... image.
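
For step 1, the kubeconfig assembly can be sketched roughly as below. This is a minimal sketch, not the repo's actual tooling: the function names, the cluster/context names (`k3s`, `deploy`), and the choice to emit JSON (which kubectl should accept, since JSON is a subset of YAML) are assumptions. It assumes the inputs are the `ca.crt` and `token` values exactly as they appear under `data` in the SA token Secret, i.e. still base64-encoded.

```python
import base64
import json


def build_kubeconfig(server: str, ca_crt_b64: str, token_b64: str,
                     namespace: str = "coilysiren-backend",
                     user: str = "deployer") -> dict:
    """Assemble a kubeconfig dict from the SA token Secret's data fields.

    ca_crt_b64 and token_b64 are the Secret's `ca.crt` and `token` values
    as stored under `data` (base64-encoded).
    """
    # The kubeconfig `token` field takes the raw token, so decode it.
    token = base64.b64decode(token_b64).decode("utf-8")
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{
            "name": "k3s",  # hypothetical cluster name
            "cluster": {
                "server": server,
                # certificate-authority-data expects base64, so pass it through.
                "certificate-authority-data": ca_crt_b64,
            },
        }],
        "users": [{"name": user, "user": {"token": token}}],
        "contexts": [{
            "name": "deploy",  # hypothetical context name
            "context": {"cluster": "k3s", "user": user, "namespace": namespace},
        }],
        "current-context": "deploy",
    }


def encode_for_secret(kubeconfig: dict) -> str:
    """Serialize and base64-encode the kubeconfig for DEPLOY_KUBECONFIG."""
    raw = json.dumps(kubeconfig).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")
```

The string returned by `encode_for_secret(build_kubeconfig("https://192.168.0.194:6443", ca, token))` is what would be pasted into the Forgejo secret; the deploy job then base64-decodes it back into a file before calling kubectl.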

Constraints / gotchas

  • App pod is pinned to kai-server via nodeSelector; registries.yaml is on kai-server only (#171), so pulls land there - fine. Do not unpin without replicating registries.yaml.
  • The deploy job runs in DinD on the WSL node; it reaches the registry and the API (192.168.0.194:6443) over the LAN.
  • Do NOT reintroduce any TS_* / OIDC tailnet join, and do NOT use GHCR. The Forgejo DEPLOY_KUBECONFIG secret is the only stored credential.

Links

  • coilysiren/infrastructure#168 (registry migration umbrella), #170 (k3s notify bug), #171 (per-node registries.yaml).
  • Registry foundation commit: infrastructure e97885a.
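
The "no TS_* / no GHCR" constraint and the image ref scheme could be checked mechanically before merging. The following is a hypothetical sketch only: the helper names and the regexes are assumptions, and nothing like this exists in the repo as far as this issue states.

```python
import re

# In-cluster registry address from this issue.
REGISTRY = "192.168.0.194:30500"

# Patterns that should not reappear in workflow or manifest text.
FORBIDDEN = {
    "tailnet secret (TS_*)": re.compile(r"\bTS_[A-Z0-9_]+"),
    "GHCR reference": re.compile(r"ghcr\.io", re.IGNORECASE),
}


def image_ref(sha: str, name: str = "coilysiren-backend") -> str:
    """Return the image ref for a commit, following the scheme in step 2."""
    return f"{REGISTRY}/{name}:{sha}"


def find_violations(text: str) -> list[str]:
    """Return the labels of all forbidden patterns found in the given text."""
    return [label for label, pattern in FORBIDDEN.items() if pattern.search(text)]
```

Running `find_violations` over the contents of the new workflow file and `deploy/main.yml` should return an empty list if the constraints hold.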
Reference
coilyco-flight-deck/backend#25