Flaky tests in tests/mcp_smoke.rs (dashboard_returns_structured_payload, refresh_runs_and_bumps_scan_version) #56

Open
opened 2026-05-23 20:55:28 +00:00 by coilysiren · 0 comments
Owner

Originally filed by @coilysiren on 2026-05-08T17:16:23Z - https://github.com/coilysiren/repo-recall/issues/66

Observed

While running make ci during the LUCA tracer work (luca#27), two tests in tests/mcp_smoke.rs failed on the first run and passed on the second with no code change between them:

  • test dashboard_returns_structured_payload ... FAILED
  • test refresh_runs_and_bumps_scan_version ... FAILED
test result: FAILED. 4 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 21.74s

Re-running immediately:

test result: ok. 6 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 36.10s

So the suite is non-deterministic on at least these two tests.

Why this matters

mcp_smoke.rs spawns the binary as a child process and talks JSON-RPC over stdio. Each test gets its own cache + state + tantivy directory under $TMPDIR to avoid redb's exclusive file lock collision (per the existing comment in tests/mcp_smoke.rs). The flake suggests that this isolation is not sufficient on its own. Candidate causes:

  • The dashboard tool poll that waits for scan_version > 0 before exercising recall_refresh (called out in AGENTS.md) may have a timing window the test does not always satisfy.
  • Two parallel tests may still race on a shared resource (port, env var, file path) that is not yet partitioned.
  • The child-process spawn may inherit state from a sibling test in the same run.
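
For reference when hunting an unpartitioned resource, the per-test directory isolation described above ($TMPDIR + nanos + PID + atomic counter) could be sketched roughly as follows. This is an illustration, not the code in tests/mcp_smoke.rs: the helper name unique_test_dir, the DIR_COUNTER static, and the "repo-recall-test" prefix are all hypothetical.

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Process-wide counter (hypothetical name) so two tests in the same process
// never share a suffix, even if they read the clock in the same nanosecond.
static DIR_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Hypothetical helper: builds a unique per-test directory path under the
/// system temp dir from nanos + PID + an atomic counter.
/// It only builds the path; the caller is responsible for creating it.
fn unique_test_dir(label: &str) -> PathBuf {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let pid = std::process::id();
    let seq = DIR_COUNTER.fetch_add(1, Ordering::Relaxed);
    std::env::temp_dir().join(format!("repo-recall-test-{label}-{nanos}-{pid}-{seq}"))
}
```

Any other per-test resource that turns out to be shared (a port, an env var, another file path) would need an equivalent unique suffix, or would have to be passed to the child process explicitly rather than inherited.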

Done when

  • cargo test --test mcp_smoke runs cleanly 10 times in a row on the same machine without intervention.
  • If a real shared-resource race is found, the fix isolates it the way the existing $TMPDIR + nanos + PID + atomic counter pattern handles redb.
  • If the cause is a polling timing window, the test waits on the actual signal it cares about rather than a fixed sleep or a hopeful single check.
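
If the polling window turns out to be the cause, waiting on the actual signal could look something like the sketch below: a generic poll-until-deadline helper. The name wait_for and its parameters are hypothetical and not taken from the repository. In the smoke test, the probe closure would presumably call the dashboard tool over JSON-RPC and return Some(scan_version) once scan_version > 0.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: polls `probe` until it returns `Some(value)` or
/// `timeout` elapses. The probe always runs at least once, and the error
/// message records how long the caller waited.
fn wait_for<T>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Option<T>,
) -> Result<T, String> {
    let deadline = Instant::now() + timeout;
    loop {
        // Check the condition first so a ready signal returns immediately.
        if let Some(value) = probe() {
            return Ok(value);
        }
        if Instant::now() >= deadline {
            return Err(format!("condition not met within {timeout:?}"));
        }
        sleep(interval);
    }
}
```

Compared with a fixed sleep, this returns as soon as the condition holds and fails with an explicit timeout message rather than a confusing downstream assertion. The timeout would need to be generous enough to cover a cold first scan on a loaded CI machine.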

Notes

Filed per Kai's standing rule: every flaky-test sighting in coilysiren/* repos becomes a GitHub issue, no exceptions.

coilysiren added the P2 label and removed the P1 label on 2026-05-31 07:01:11 +00:00
Reference
coilyco-flight-deck/repo-recall#56