Finish operator-to-static-keys migration - eco-mcp, eco-spec, galaxy-gen, backend #41
Originally filed by @coilysiren on 2026-05-20T08:22:21Z - https://github.com/coilysiren/infrastructure/issues/210
Multi-session tracker for finishing the operator-to-static-keys migration started this session.
Current state
Live on new pattern (operator-free):
- `repo-recall` - in-pod sidecar (canary). #201, #202 (nodeSelector), #203 (port 80). `http://repo-recall` returns 200.
- `vmsingle` - standalone ts-proxy Deployment (helm-managed workload pattern). #204. `http://vmsingle` and `http://vmsingle:8428/health` return 200.
- `forgejo` - standalone ts-proxy Deployment (rootless-workload pattern). #209. Joined as `forgejo-1` because the old `forgejo` device record still holds the name in the Tailscale admin console; manual delete + `ts-forgejo` pod restart will reclaim. Old `forgejo` device is offline.
Manifest changes pushed, blocked from landing:
- `galaxy-gen` (coilysiren/galaxy-gen#72). CI's `build-and-publish.yml` uses `kubectl set image` only, never `apply -f deploy/main.yml`, so the sidecar manifest never reaches the cluster. Workflow needs to add an envsubst+apply step, OR we ship the apply manually.
- `eco-mcp` (coilysiren/eco-mcp-app#60). CI uses the legacy `TS_OAUTH_CLIENT_ID/SECRET` pattern. That credential is broken (related to but distinct from #198's `operator-oauth`). Workflow also reads `config.yml`, which doesn't exist locally or on kai-server. CI has been failing for ~7 days; the workload pod is 7 days old.
- `eco-spec` (coilysiren/eco-jobs-tracker#42). Same as eco-mcp.
- `backend` (coilysiren/backend#64). Same as eco-mcp. Plus pre-existing `ImagePullBackOff` unrelated to either thread.
The terraform side is fully landed: all six SSM auth keys exist at `/coilysiren/{repo-recall,vmsingle,eco-mcp,eco-spec,galaxy-gen,backend,forgejo}/ts-authkey`. `terraform-tailscale-devices` is the source of truth.
What unblocks finishing
The four blocked CIs share a root cause with #198: a Tailscale OAuth credential that was minted long ago and silently aged out. galaxy-gen is the modern shape (federated identity via `tailscale-oidc/`, no long-lived bearer). The three eco-/backend CIs need the same migration. config.yml needs to be created or its yq lookup needs a default.
Two paths to finish:
1. Migrate `eco-mcp-app`, `eco-jobs-tracker`, `backend` to the OIDC pattern in `tailscale-oidc/`. Add the missing envsubst+apply step to `galaxy-gen`'s workflow. Then a no-op push reruns each CI and the sidecar lands.
Recommend path 1.
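For the galaxy-gen half of path 1, the missing step could look roughly like the sketch below. It assumes `deploy/main.yml` contains `${VAR}`-style placeholders that `envsubst` can fill from the workflow environment, and that `kubectl` is already authenticated against the cluster. The function name `render_and_apply` is hypothetical and does not come from the existing workflow.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: render a manifest template with envsubst and apply
# the result, so manifest changes (such as the sidecar) reach the cluster.
# Assumption: the manifest uses ${VAR}-style placeholders.
render_and_apply() {
  local manifest="$1"

  if [[ ! -f "$manifest" ]]; then
    echo "manifest not found: $manifest" >&2
    return 1
  fi

  # envsubst substitutes environment-variable references read from stdin;
  # kubectl apply -f - reads the rendered manifest from stdin.
  envsubst < "$manifest" | kubectl apply -f -
}
```

A workflow step could then call `render_and_apply deploy/main.yml`. Plain `envsubst` substitutes every variable reference it finds, so if the manifest contains literal `$` characters, passing an explicit variable list as the first argument to `envsubst` would be the safer choice.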
Once all six are migrated
1. `curl -s -o /dev/null -w '%{http_code}\n' http://<name>/healthz` for all six.
2. Confirm the operator's proxy resources in the `tailscale` namespace are gone (operator reconciles them away when each Service drops `tailscale.com/expose`).
3. `helm uninstall tailscale-operator -n tailscale`, delete `operator-oauth` Secret, revoke the OAuth client in the Tailscale admin console.
4. Delete the old `forgejo` device record in the Tailscale admin console; restart `ts-forgejo` Deployment so it claims the freed name.
5. Write `docs/tailscale-static-devices.md` (in-pod sidecar vs standalone proxy decision rule, RBAC shape, port-80 conventions, rotation runbook). Tracked as the doc task in the original session.
Related issues