Rewrite GitHub ingest from gh subprocess to octocrab #34

Open
opened 2026-05-23 20:55:25 +00:00 by coilysiren · 0 comments
Owner

Originally filed by @coilysiren on 2026-05-15T07:19:45Z - https://github.com/coilysiren/repo-recall/issues/173

Problem

The GitHub ingest layer shells out to gh for every API call, which makes error categorization (#164, #165), rate-limit backoff (#164), and cache-failure handling (#166) painful and stringly-typed. Stderr-parsing the gh exit path also obscures HTTP status codes and rate-limit headers that octocrab exposes natively.

While scoping this work I also found that AGENTS.md's "never gh {issue,pr,repo,search} list" rule (because they route through GraphQL under the hood) is being violated in three live sites - same class as the closed #124 and #129. Folding the cleanup into this rewrite since the call sites all move to octocrab anyway.

Where

gh subprocess sites that move to octocrab REST:

gh subprocess sites that move to octocrab GraphQL:

gh subprocess sites that stay (not API calls):

Auth

Read gh auth token once at startup, pass to Octocrab::builder().personal_token(...).build(). Empty token → build anonymous client + emit a startup warning + surface the new RemoteFetchState::Unconfigured variant on every remote column. Matches today's "best-effort, don't break the dashboard" stance.

Closes

  • #164 (back off on 403 / Retry-After / secondary-limit) - octocrab gives typed status + x-ratelimit-* headers, so a small RateLimitGate keyed on the shared client makes this trivial.
  • #165 (categorize remote-fetch failures) - the existing RemoteFetchState::{Ok, Missing, Unauthorized, RateLimited, Error} enum at src/ingest/github/fetch_state.rs gets fed by typed octocrab::Error instead of stderr-string-parsing. Adds the missing pill renderer in src/display/routes/templates.rs.

Related

  • #166 (persist last-good remote-state) - cache-shape change, independent of transport. Out of scope here, mentioned for reviewer context.
  • #146 (per-source refresh cadence) - independent.
  • #170 (refresh.per_source dead config) - was closed without a stated reason; if it should be reopened, fold the gate logic into this PR's RateLimitGate scaffolding.

Plan

Six commits, one per discrete step, each closing its own follow-up issue under this umbrella:

  1. Add octocrab + shared Octocrab client in AppState (sources gh auth token, surfaces empty-token warning + Unconfigured variant).
  2. Port issues.rs, classifier upgrade closes #165.
  3. Port pulls.rs.
  4. Port ci_runs.rs.
  5. Port git/log.rs GitHub call sites (closes the AGENTS.md GraphQL-leak finding above; collapses fetch_pr_and_issue_counts into the dedicated modules).
  6. Port labeled.rs to octocrab.graphql() + add RateLimitGate, closes #164.

Test fixtures

Already collected. Live under tests/fixtures/github/ (rest captures, error synthetics, GraphQL synthetics) with re-runnable capture.sh. See tests/fixtures/README.md.

_Originally filed by @coilysiren on 2026-05-15T07:19:45Z - [https://github.com/coilysiren/repo-recall/issues/173](https://github.com/coilysiren/repo-recall/issues/173)_ **Problem** The GitHub ingest layer shells out to `gh` for every API call, which makes error categorization (#164, #165), rate-limit backoff (#164), and cache-failure handling (#166) painful and stringly-typed. Stderr-parsing the `gh` exit path also obscures HTTP status codes and rate-limit headers that octocrab exposes natively. While scoping this work I also found that `AGENTS.md`'s "never `gh {issue,pr,repo,search} list`" rule (because they route through GraphQL under the hood) is being violated in three live sites - same class as the closed #124 and #129. Folding the cleanup into this rewrite since the call sites all move to octocrab anyway. **Where** `gh` subprocess sites that move to octocrab REST: - [src/ingest/github/issues.rs](src/ingest/github/issues.rs) - `/repos/X/issues?state=open` - [src/ingest/github/pulls.rs](src/ingest/github/pulls.rs) - `/repos/X/pulls` - [src/ingest/github/ci_runs.rs](src/ingest/github/ci_runs.rs) - `/repos/X/actions/runs` - [src/ingest/git/log.rs:585](src/ingest/git/log.rs#L585) `ci_status` - currently `gh run list` (GraphQL) → `/repos/X/actions/runs?branch=Y&per_page=1` - [src/ingest/git/log.rs:680](src/ingest/git/log.rs#L680) `fetch_deploy_health` - currently `gh run list` (GraphQL) → `/repos/X/actions/workflows/W/runs` - [src/ingest/git/log.rs:782](src/ingest/git/log.rs#L782) `fetch_pr_and_issue_counts` - duplicates pulls.rs + issues.rs, collapses into them - [src/ingest/git/log.rs:881](src/ingest/git/log.rs#L881) `my_gh_login` - `/user` - [src/ingest/git/log.rs:933](src/ingest/git/log.rs#L933) `fetch_active_repos` - currently `gh repo list` (GraphQL) → `/user/repos` - [src/ingest/git/log.rs:564](src/ingest/git/log.rs#L564) `gh_health` - replaced by startup auth probe (see below) `gh` subprocess sites that move to octocrab GraphQL: - [src/ingest/github/labeled.rs](src/ingest/github/labeled.rs) - `octocrab.graphql(...)`, sanctioned per `AGENTS.md` `gh` subprocess sites that stay (not API calls): - [src/display/routes/actions.rs:96](src/display/routes/actions.rs#L96) - `gh repo clone` is git-over-HTTPS. Replacement is a separate decision. **Auth** Read `gh auth token` once at startup, pass to `Octocrab::builder().personal_token(...).build()`. Empty token → build anonymous client + emit a startup warning + surface the new `RemoteFetchState::Unconfigured` variant on every remote column. Matches today's "best-effort, don't break the dashboard" stance. **Closes** - #164 (back off on 403 / Retry-After / secondary-limit) - octocrab gives typed status + `x-ratelimit-*` headers, so a small `RateLimitGate` keyed on the shared client makes this trivial. - #165 (categorize remote-fetch failures) - the existing `RemoteFetchState::{Ok, Missing, Unauthorized, RateLimited, Error}` enum at [src/ingest/github/fetch_state.rs](src/ingest/github/fetch_state.rs) gets fed by typed `octocrab::Error` instead of stderr-string-parsing. Adds the missing pill renderer in [src/display/routes/templates.rs](src/display/routes/templates.rs). **Related** - #166 (persist last-good remote-state) - cache-shape change, independent of transport. Out of scope here, mentioned for reviewer context. - #146 (per-source refresh cadence) - independent. - #170 (refresh.per_source dead config) - was closed without a stated reason; if it should be reopened, fold the gate logic into this PR's RateLimitGate scaffolding. **Plan** Six commits, one per discrete step, each closing its own follow-up issue under this umbrella: 1. Add `octocrab` + shared `Octocrab` client in `AppState` (sources `gh auth token`, surfaces empty-token warning + `Unconfigured` variant). 2. Port `issues.rs`, classifier upgrade closes #165. 3. Port `pulls.rs`. 4. Port `ci_runs.rs`. 5. Port `git/log.rs` GitHub call sites (closes the AGENTS.md GraphQL-leak finding above; collapses `fetch_pr_and_issue_counts` into the dedicated modules). 6. Port `labeled.rs` to `octocrab.graphql()` + add `RateLimitGate`, closes #164. **Test fixtures** Already collected. Live under `tests/fixtures/github/` (rest captures, error synthetics, GraphQL synthetics) with re-runnable `capture.sh`. See `tests/fixtures/README.md`.
coilysiren added
P3
and removed
P2
labels 2026-05-31 07:01:14 +00:00
Sign in to join this conversation.
No labels
P0
P1
P2
P3
P4
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
coilyco-flight-deck/repo-recall#34
No description provided.