forgejo-runner PVC pinned to kai-server blocks WSL migration - volume node affinity conflict #153
Problem
Pinning the `forgejo-runner` StatefulSet to `kai-desktop-tower-wsl` (commit `0ae7bf3`, applied 2026-05-27T03:55:22Z) does not actually move the pod. PVC `data-forgejo-runner-1` is bound to a local-path PV pinned to kai-server, and the new pod's nodeSelector requires kai-desktop-tower-wsl. Scheduler deadlock.
Evidence
Captured via `coily ops kubectl -n forgejo describe pod forgejo-runner-1` (see `/tmp/runner-decision.log` for full output):
- … `data-forgejo-runner-1` is pinned to kai-server, so the pod can't schedule on WSL even though the nodeSelector permits it)
`kai-desktop-tower-wsl` itself is Ready and a 12d-old worker. The blocker is the PV, not the node.
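The conflict can be worked through on paper. Below is a minimal Python sketch, assuming the local-path PV carries a required node affinity on the `kubernetes.io/hostname` label and that the pin from `0ae7bf3` is a hostname nodeSelector. The function name and the label dictionaries are hypothetical illustrations, not output captured from the cluster.

```python
def schedulable_nodes(nodes, node_selector, pv_hostnames):
    """Return names of nodes that satisfy both the pod's nodeSelector
    and the bound PV's hostname affinity."""
    result = []
    for name, labels in nodes.items():
        # Every nodeSelector key/value must match the node's labels.
        selector_ok = all(labels.get(k) == v for k, v in node_selector.items())
        # The node must also be one the PV is allowed to attach to.
        volume_ok = labels.get("kubernetes.io/hostname") in pv_hostnames
        if selector_ok and volume_ok:
            result.append(name)
    return sorted(result)


# Hypothetical label sets for the two nodes named in this issue.
nodes = {
    "kai-server": {"kubernetes.io/hostname": "kai-server"},
    "kai-desktop-tower-wsl": {"kubernetes.io/hostname": "kai-desktop-tower-wsl"},
}
# Assumption: the StatefulSet pin is expressed as a hostname nodeSelector.
node_selector = {"kubernetes.io/hostname": "kai-desktop-tower-wsl"}
# Assumption: the PV's affinity allows only kai-server.
pv_hostnames = {"kai-server"}

print(schedulable_nodes(nodes, node_selector, pv_hostnames))  # prints []
```

The empty result is the deadlock: kai-server fails the nodeSelector, and kai-desktop-tower-wsl fails the volume affinity, so no node passes both filters until the PV constraint is removed or changed.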
What was done overnight
- Scaled `forgejo-runner` StatefulSet to 0 replicas (Kai-authorized safety move). Bridge churn from the DinD sidecar stops, kai-server host network expected to stabilize.
- … `coily exec host-watch host=kai-server`) still running, will log whether the outage cadence actually stops post-scale.
Options for the real fix
- … `data-forgejo-runner-*` is a docker layer cache from DinD. Empty cache means the first workflow run is slow but nothing else breaks. Cleanest. Single-line.
- … `.runner` reg file) is the only thing that needs to persist, and it's an init-container output that can be recreated. Smallest steady-state surface.
Recommend option 3 if the registration init-container is truly idempotent, otherwise option 1.
Out of scope
- … `/tmp/host-watch-kai-server/recovery-*.txt` before vs after the scale to confirm whether the runner was the only cause.
How to apply
Pick an option, write the migration as a Martin-Fowler-style tiny commit on top of `2e7b0b4`, scale runner back to >=1, verify pods land on `kai-desktop-tower-wsl`.
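The final verification step could be scripted. This is a rough Python sketch, assuming the pod list is obtained as JSON (for example from `kubectl -n forgejo get pods -o json`, piped in by whatever wrapper is in use). The helper name `misplaced_pods` and the sample data, including the pod `forgejo-runner-0`, are hypothetical.

```python
import json


def misplaced_pods(pod_list_json, expected_node):
    """Return (pod name, node name) pairs for pods not on expected_node.

    Pods that are still Pending have no spec.nodeName and are reported
    with None as the node.
    """
    pods = json.loads(pod_list_json)["items"]
    bad = []
    for pod in pods:
        node = pod["spec"].get("nodeName")
        if node != expected_node:
            bad.append((pod["metadata"]["name"], node))
    return bad


# Hypothetical sample shaped like `kubectl get pods -o json` output.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "forgejo-runner-0"},
         "spec": {"nodeName": "kai-desktop-tower-wsl"}},
        {"metadata": {"name": "forgejo-runner-1"}, "spec": {}},
    ]
})

print(misplaced_pods(sample, "kai-desktop-tower-wsl"))
# prints [('forgejo-runner-1', None)]
```

An empty list would mean every listed pod landed on the WSL node. A `None` entry points at a pod that is still unscheduled, which is how the original affinity conflict would show up again.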