fix: stop build-publish-deploy red on report-status (#81) #15

Closed
opened 2026-05-26 03:44:41 +00:00 by coilysiren · 1 comment
Owner

Scope

Implementation ticket for the bug reported in coilysiren/backend#81. Every push since 2026-05-21 leaves the run red on the final "report status to datastore" step, even though the deploy and rollout succeed. Latest red: run 26430605589 on 5c4cc59. The goal here is to stop the red without changing the success criteria for the deploy itself.

Pick one

  • Resolve via tailnet IP, not MagicDNS - replace http://api/document with the api node's tailnet IP, fetched from SSM (/coilysiren/api/tailnet-ip or equivalent, stash if not there yet per AGENTS.md "cache ids on first lookup"). Removes the MagicDNS dependency entirely.
  • Retry the resolve - wrap the curl in a short retry loop (for i in 1 2 3 4 5; do curl -fsS ... && break || sleep 5; done). Cheapest fix; survives MagicDNS slow-start without touching topology.
  • Make the step non-blocking - continue-on-error: true on the step. Surfaces the failure in the run summary but keeps the run green. Acceptable if the datastore record is best-effort, not the canonical deploy signal.
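A note on the retry option: as written, the one-liner ends on `sleep 5` when all five attempts fail, so the loop exits 0 and the step would go green without the document being posted. The sketch below is one way to keep the retry while still failing the step once the budget is exhausted. The `retry` helper and its variable names are hypothetical, and the curl arguments elided in the ticket stay elided.

```shell
# Run a command up to $1 times, sleeping $2 seconds between failed attempts.
# Returns 0 on the first success and 1 if every attempt fails, so the step
# still goes red when the resolve never lands within the retry budget.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$retry_n" -ge "$retry_max" ]; then
      echo "giving up after $retry_max attempts: $*" >&2
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Hypothetical usage in the step (same budget as the one-liner: 5 tries, 5s apart):
# retry 5 5 curl -fsS ...
```

The helper does not sleep after the last failed attempt, so the worst case is four sleeps rather than five.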

Option 1 is the most durable, option 2 is the smallest diff, and option 3 is the laziest. Recommendation: try the retry first, and fall back to the tailnet IP if the resolve never lands within the retry budget.
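If the tailnet-IP route ends up being needed, the lookup could be sketched roughly as follows. This assumes the AWS CLI is available on the runner, that the runner can read the parameter, and that the parameter is stored as a plain String (a SecureString would also need `--with-decryption`). The function names are hypothetical, and the parameter path is the one proposed above, which may not exist yet.

```shell
# Assumption: the parameter holds the api node's tailnet IP as plain text.
# Prints the parameter value for the SSM parameter named in $1.
fetch_tailnet_ip() {
  aws ssm get-parameter \
    --name "$1" \
    --query 'Parameter.Value' \
    --output text
}

# Build the report URL from an IP, keeping the /document path used today.
report_url() {
  printf 'http://%s/document\n' "$1"
}

# Hypothetical usage in the step:
# ip=$(fetch_tailnet_ip /coilysiren/api/tailnet-ip)
# curl -fsS ... "$(report_url "$ip")"
```

Keeping the URL construction separate from the lookup means the step can be checked without AWS access by substituting a known IP.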

Acceptance

  • Next push to main lands a fully green build-publish-deploy run.
  • ci-status document still gets posted for successful deploys (unless option 3 is chosen with explicit ack that the record is best-effort).

Out of scope

  • The tailscale-operator OAuth regression referenced in infrastructure#238. That is a sibling concern, not blocking this fix.
  • Retiring the report-status step entirely. If it stays, it should be reliable.

Resolves coilysiren/backend#81.

Author
Owner

Merged into #3 in the 2026-05-29 backlog burn-down. Impl ticket for the same report-status bug as #3. Reopen if it should stand alone.

Reference
coilyco-flight-deck/backend#15