Per-function code metrics via rust-code-analysis, file-change-driven #30

Open
opened 2026-05-23 20:55:24 +00:00 by coilysiren · 0 comments

Originally filed by @coilysiren on 2026-05-17T20:50:34Z - https://github.com/coilysiren/repo-recall/issues/189

Problem

repo-recall's activity signal (churn) was useful in raw form but is mostly noise in practice. Lockfiles, CI yaml, and migration sequences dominate the ranking; the actually-load-bearing files don't surface. The session-touched join helps but is itself flat: every Claude-visited file gets equal weight.

The missing signal is structural complexity per file and per function. A 5000-line file with 30 nested branches is structurally different from a 30-line config that changes weekly. Without stored per-function metrics, answering "what's complex in this codebase" requires the agent to re-derive the answer on demand.

Proposal

Add rust-code-analysis (Mozilla) as a file-change-driven metrics pass in the repo-recall ingest tier. Compute per-file and per-function: cyclomatic complexity, cognitive complexity, Halstead variants, LOC variants. Store the results in the join store. Queries read these columns like any others.

Placement: in repo-recall, not session-lattice. The metrics are part of the join, not a derived view. They're computed at ingest time on file change, stored, never recomputed on read. session-lattice consumes them via the existing read path; no file-content interface is needed.

Trigger: file change event (notify-driven once the per-source refresh-rate refactor lands, mtime-poll until then). Reparse only the changed file, recompute its metrics, update the row.
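
A minimal sketch of the interim mtime-poll trigger, assuming the poller keeps a snapshot of last-seen modification times between passes. The names `MtimeSnapshot`, `changed_files`, and `removed_files` are hypothetical and not part of repo-recall. Only the returned paths would be reparsed.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::time::SystemTime;

/// Last-seen modification time per file path.
/// Hypothetical type; the real ingest tier may track this differently.
pub type MtimeSnapshot = HashMap<PathBuf, SystemTime>;

/// Paths whose metrics need recomputing: files that are new, or whose
/// mtime differs from the previous snapshot. Sorted for stable output.
pub fn changed_files(prev: &MtimeSnapshot, current: &MtimeSnapshot) -> Vec<PathBuf> {
    let mut changed: Vec<PathBuf> = current
        .iter()
        .filter(|(path, mtime)| prev.get(*path) != Some(*mtime))
        .map(|(path, _)| path.clone())
        .collect();
    changed.sort();
    changed
}

/// Paths that disappeared since the previous snapshot, so their
/// metric rows can be dropped.
pub fn removed_files(prev: &MtimeSnapshot, current: &MtimeSnapshot) -> Vec<PathBuf> {
    let mut removed: Vec<PathBuf> = prev
        .keys()
        .filter(|path| !current.contains_key(*path))
        .cloned()
        .collect();
    removed.sort();
    removed
}
```

Comparing with `!=` rather than `>` means a file whose mtime moves backwards (for example after a checkout) is also treated as changed. A notify-driven trigger would presumably replace the snapshot diff with the event's own path list.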

Why it earns its keep

The strongest queries this unlocks:

  • Function-level complexity ranking independent of churn. "What are the gnarliest functions in this codebase, period."
  • Did Claude touch the gnarly functions? Join with sessions-touched. Probably the most interesting agent-perspective question repo-recall can newly answer.
  • Long-tail complexity outliers per repo as a structural-debt signal that doesn't depend on activity.

This is not a complexity-weighted-churn play. Churn isn't load-bearing enough to weight against. The wins are activity-independent.
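
The first two queries can be sketched over in-memory rows. This is an illustration under assumptions, not the repo-recall query path: the `FunctionMetrics` shape, the choice of cognitive complexity as the primary sort key, and the file-level granularity of the session join are all hypothetical.

```rust
use std::collections::HashSet;

/// One stored per-function metrics row (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct FunctionMetrics {
    pub file_id: u64,
    pub function_name: String,
    pub function_start_line: u32,
    pub cyclomatic: u32,
    pub cognitive: u32,
}

/// Rank functions by cognitive complexity, highest first, using
/// cyclomatic complexity as the tie-breaker. Churn plays no part.
pub fn gnarliest(rows: &[FunctionMetrics], limit: usize) -> Vec<&FunctionMetrics> {
    let mut ranked: Vec<&FunctionMetrics> = rows.iter().collect();
    ranked.sort_by(|a, b| {
        b.cognitive
            .cmp(&a.cognitive)
            .then(b.cyclomatic.cmp(&a.cyclomatic))
    });
    ranked.truncate(limit);
    ranked
}

/// Keep only the ranked functions that live in files a session touched.
/// Assumes the sessions-touched join is file-level, so every function
/// in a touched file counts as touched.
pub fn touched_by_sessions<'a>(
    ranked: &[&'a FunctionMetrics],
    touched_file_ids: &HashSet<u64>,
) -> Vec<&'a FunctionMetrics> {
    ranked
        .iter()
        .copied()
        .filter(|m| touched_file_ids.contains(&m.file_id))
        .collect()
}
```

Because the ranking never reads an activity column, a function in a file that has not changed in years ranks the same as one edited yesterday.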

Coverage caveat

rust-code-analysis supports a curated language set: Python, Rust, JS/TS, Java, C/C++, Kotlin, and a few more. Check the current list against the real coilysiren/* mix before committing. Languages outside the list get null metrics; the structural-facts pass (sibling issue) still gives them language detection and LOC.

Adding a language to rust-code-analysis is upstream work in their crate, not local. If a frequently-used language is missing, that's a deferrable problem, not a blocker.
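
A minimal sketch of the null-metrics behaviour, assuming coverage is decided by file extension. The extension list below is a placeholder derived from the languages named above, not the crate's authoritative list, and `analyze` stands in for whatever call actually produces the metrics.

```rust
use std::path::Path;

/// Placeholder list only; check the crate's current language set.
const ASSUMED_COVERED_EXTENSIONS: &[&str] =
    &["py", "rs", "js", "ts", "java", "c", "cpp", "kt"];

#[derive(Debug, PartialEq)]
pub struct ComplexityMetrics {
    pub cyclomatic: u32,
    pub cognitive: u32,
}

/// Hypothetical file row: LOC is always present, complexity may be null.
#[derive(Debug, PartialEq)]
pub struct FileRow {
    pub path: String,
    pub loc: usize,
    pub complexity: Option<ComplexityMetrics>,
}

/// True when the file's extension is in the assumed coverage list.
pub fn is_covered(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ASSUMED_COVERED_EXTENSIONS.contains(&ext))
        .unwrap_or(false)
}

/// Build a row. Uncovered languages keep their line count but get
/// `None` for complexity instead of a misleading zero.
pub fn build_row(
    path: &Path,
    source: &str,
    analyze: impl Fn(&str) -> ComplexityMetrics,
) -> FileRow {
    FileRow {
        path: path.display().to_string(),
        loc: source.lines().count(),
        complexity: if is_covered(path) { Some(analyze(source)) } else { None },
    }
}
```

Storing `None` rather than `0` keeps uncovered files out of complexity rankings instead of sorting them to the bottom as if they were trivially simple.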

Out of scope

  • Symbol cross-reference (stack-graphs was considered and dropped; coverage gaps are too uneven).
  • Rust-specific deep parsing via syn (dropped).
  • Tree-sitter without rust-code-analysis on top (no clear use case).

Open sub-questions

  • Per-function metrics storage shape. Probably a separate table keyed by (file_id, function_name, function_start_line) rather than flattened into the file row.
  • Whether to expose per-function metrics in recall_repo directly or behind a dedicated recall_function_complexity MCP tool.
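
One way the separate-table option could look, sketched as an in-memory map keyed by `(file_id, function_name, function_start_line)`. The type and method names are hypothetical, and the real join store is presumably a database table rather than a `BTreeMap`.

```rust
use std::collections::BTreeMap;

/// Composite key proposed above. The start line disambiguates
/// functions that share a name within one file.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct FunctionKey {
    pub file_id: u64,
    pub function_name: String,
    pub function_start_line: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Metrics {
    pub cyclomatic: u32,
    pub cognitive: u32,
}

/// Hypothetical per-function table, separate from the file row.
#[derive(Default)]
pub struct FunctionMetricsTable {
    rows: BTreeMap<FunctionKey, Metrics>,
}

impl FunctionMetricsTable {
    /// On file change, drop every row for the file and insert the fresh
    /// set. Start lines shift when a file is edited, so updating rows
    /// key by key would leave stale entries behind.
    pub fn replace_file(&mut self, file_id: u64, fresh: Vec<(String, u32, Metrics)>) {
        self.rows.retain(|key, _| key.file_id != file_id);
        for (function_name, function_start_line, metrics) in fresh {
            let key = FunctionKey { file_id, function_name, function_start_line };
            self.rows.insert(key, metrics);
        }
    }

    /// All rows for one file, ordered by name and then start line.
    pub fn for_file(&self, file_id: u64) -> Vec<(&FunctionKey, &Metrics)> {
        self.rows
            .iter()
            .filter(|(key, _)| key.file_id == file_id)
            .collect()
    }
}
```

Replacing all of a file's rows in one step matches the "reparse only the changed file" trigger: the unit of invalidation is the file, while the unit of storage is the function.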

Origin

Conversation 2026-05-17. Sibling issues: structural-facts pass (cheap baseline), search-router (ripgrep + nucleo), per-source refresh rates.

coilysiren added P4 and removed P3 labels 2026-05-31 07:01:16 +00:00