Structural-facts ingest pass: tokei + hyperpolyglot + manifest detection #29
Originally filed by @coilysiren on 2026-05-17T20:50:34Z - https://github.com/coilysiren/repo-recall/issues/190
Problem
repo-recall today ingests git log, GitHub state, Claude session JSONL, coily audit, and doc file-health. It does not know what kind of code is in any of the repos it indexes. Every fresh Claude session re-derives "is this Python? where's the entry point? what's the directory shape?" from scratch via Glob/Grep/Read.
That gap matters when the agent is hopping across 30+ cloned repos and wants a snapshot of "what is this repo." The activity ranking ("hot files by churn") was useful when raw but is mostly noise in practice; structural baseline facts would be a strictly more useful first-pass signal.
Proposal
Add a structural-facts pass to repo-recall's ingest tier, file-change-driven. Cheapest possible scope:
- The ignore crate for the gitignore-respecting walk.
- tokei as a library for LOC and file counts per language, with broad bundled language coverage.
- hyperpolyglot for per-file language detection where tokei's heuristic isn't sharp enough (vendored detection, .h C-vs-C++ disambiguation).
- Manifest detection: pyproject.toml, Cargo.toml, package.json, go.mod, Gemfile, etc.
Output: per-file structural facts joined into the existing schema. Per-repo aggregates are derivable in the JOIN layer.
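A rough sketch of what the per-file facts and a per-repo aggregate could look like, to make the shape of the output concrete. This is std-only Rust: the extension match is a deliberately crude stand-in for tokei / hyperpolyglot, and the caller is assumed to have already produced the path list (for example from the gitignore-respecting walk). FileFacts, manifest_ecosystem, language_by_extension, file_facts, and files_per_language are hypothetical names, not repo-recall's actual schema or API.

```rust
use std::collections::BTreeMap;
use std::path::Path;

/// Hypothetical per-file record; the fields are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct FileFacts {
    pub path: String,
    /// Language guess for the file, if any.
    pub language: Option<&'static str>,
    /// Ecosystem implied by the file when it is a known manifest.
    pub manifest: Option<&'static str>,
}

/// Map a manifest file name to the ecosystem it implies.
/// Only the manifests named in the proposal are covered.
pub fn manifest_ecosystem(path: &str) -> Option<&'static str> {
    let name = Path::new(path).file_name()?.to_str()?;
    match name {
        "pyproject.toml" => Some("python"),
        "Cargo.toml" => Some("rust"),
        "package.json" => Some("node"),
        "go.mod" => Some("go"),
        "Gemfile" => Some("ruby"),
        _ => None,
    }
}

/// Extension-only language guess (assumption: a placeholder for the
/// real detection libraries, which handle far more cases).
pub fn language_by_extension(path: &str) -> Option<&'static str> {
    match Path::new(path).extension()?.to_str()? {
        "py" => Some("Python"),
        "rs" => Some("Rust"),
        "go" => Some("Go"),
        "rb" => Some("Ruby"),
        "js" => Some("JavaScript"),
        "ts" => Some("TypeScript"),
        _ => None,
    }
}

/// Build the per-file record for one repo-relative path.
pub fn file_facts(path: &str) -> FileFacts {
    FileFacts {
        path: path.to_string(),
        language: language_by_extension(path),
        manifest: manifest_ecosystem(path),
    }
}

/// Per-repo aggregate derived from per-file facts: file count per language.
pub fn files_per_language(facts: &[FileFacts]) -> BTreeMap<&'static str, usize> {
    let mut counts = BTreeMap::new();
    for f in facts {
        if let Some(lang) = f.language {
            *counts.entry(lang).or_insert(0) += 1;
        }
    }
    counts
}
```

Keeping the record per-file and computing files_per_language on demand mirrors the "aggregates derivable in the JOIN layer" idea: nothing repo-level needs to be stored separately.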
Why this is the cheap structural pass and not tree-sitter / rust-code-analysis
Tree-sitter and code-metrics are a separate, heavier add tracked in a sibling issue. The bet here is that file-extension + manifest + README-first-paragraph is enough structural signal to make "rank files by (sessions-touched * manifest-or-entrypoint-bonus)" a strictly better hot-paths answer than churn alone. If it isn't, the next layer can be added without throwing this work away.
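For illustration, the ranking described above could be sketched like this. The issue does not say how large the bonus is, so STRUCTURAL_BONUS = 2.0 is a made-up value, and FileSignal, hot_path_score, and rank_hot_paths are hypothetical names rather than existing repo-recall code.

```rust
/// Hypothetical inputs to the ranking, one per file.
#[derive(Debug, Clone, PartialEq)]
pub struct FileSignal {
    pub path: String,
    /// Number of sessions that touched this file.
    pub sessions_touched: u32,
    /// True when the file is a known manifest or an entry point.
    pub is_manifest_or_entrypoint: bool,
}

/// Assumed multiplier for manifests and entry points; the issue
/// does not specify a value.
pub const STRUCTURAL_BONUS: f64 = 2.0;

/// sessions-touched * manifest-or-entrypoint-bonus
pub fn hot_path_score(s: &FileSignal) -> f64 {
    let bonus = if s.is_manifest_or_entrypoint {
        STRUCTURAL_BONUS
    } else {
        1.0
    };
    f64::from(s.sessions_touched) * bonus
}

/// Sort by score, highest first; ties are broken by path so the
/// order is deterministic.
pub fn rank_hot_paths(mut signals: Vec<FileSignal>) -> Vec<FileSignal> {
    signals.sort_by(|a, b| {
        hot_path_score(b)
            .total_cmp(&hot_path_score(a))
            .then_with(|| a.path.cmp(&b.path))
    });
    signals
}
```

With these assumed numbers, an entry point touched in 3 sessions (score 6) outranks a helper touched in 5 (score 5), which is the kind of reordering relative to raw activity counts that the bet relies on.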
Out of scope
Open sub-questions
- A separate recall_repo_structure MCP tool, or fold the facts into the existing recall_repo projection? Probably the latter, since the structural data is per-file and recall_repo already returns per-repo file lists.
Origin
Conversation 2026-05-17. Sibling issues: search-router (ripgrep + nucleo), code-metrics (rust-code-analysis), per-source refresh rates.