Loop 2 layer 2: one-shot Python script for substring frequency from session JSONL #11

Open
opened 2026-05-23 20:55:37 +00:00 by coilysiren · 0 comments

Originally filed by @coilysiren on 2026-05-20T12:01:04Z - https://github.com/coilysiren/voice-flow-learning-loop/issues/6

Context

Layer 2 of the 8-layer build path for Loop 2 documented in README.md. Automates the manual eyeball pass from layer 1.

Goal

A one-shot Python script (stdlib only, no deps) that:

  • Walks Claude Code session JSONL files (~/.claude/projects/*/).
  • Extracts user-turn text only (skip assistant outputs and tool calls).
  • Counts substrings above a length floor (default 30 chars).
  • Sorts by raw count.
  • Outputs a flat candidate list (same shape as layer 1's output).
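
A minimal sketch of the first two bullets (walk the session files, keep user-turn text only), stdlib only. The record shape is an assumption, not something this issue specifies: one JSON object per line, user turns marked with "type": "user", and message.content holding either a plain string or a list of blocks where only {"type": "text"} blocks carry typed text. The function names are hypothetical. Check a real session file before relying on this.

```python
import json
from pathlib import Path
from typing import Iterator


def user_texts_from_record(record: dict) -> Iterator[str]:
    """Yield user-typed text from one parsed JSONL record.

    ASSUMPTION: user turns look like
    {"type": "user", "message": {"content": <str or list of blocks>}}.
    """
    if record.get("type") != "user":
        return  # skip assistant outputs and anything else
    message = record.get("message")
    if not isinstance(message, dict):
        return
    content = message.get("content")
    if isinstance(content, str):
        yield content
    elif isinstance(content, list):
        for block in content:
            # ASSUMPTION: tool calls/results are non-"text" blocks; skip them.
            if isinstance(block, dict) and block.get("type") == "text":
                text = block.get("text")
                if isinstance(text, str):
                    yield text


def iter_user_texts(projects_dir: Path) -> Iterator[str]:
    """Walk <projects_dir>/*/*.jsonl and yield every user-turn text."""
    for path in sorted(projects_dir.glob("*/*.jsonl")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # tolerate partial or corrupt lines
                if isinstance(record, dict):
                    yield from user_texts_from_record(record)
```

Usage would be along the lines of `iter_user_texts(Path.home() / ".claude" / "projects")`. Malformed lines are skipped rather than raised so that a one-shot run still finishes on a partially written session file.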

The substring extraction strategy can be naive at this layer: count overlapping n-grams above a length threshold, then drop candidates that are contained in longer ones. Better strategies land in layer 6.
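
One possible reading of the naive strategy, as a sketch rather than the intended implementation. The assumptions here are mine: n-grams are word-based, the length floor is measured in characters on the space-joined words, and a shorter candidate is dropped only when a longer kept candidate contains it and occurs at least as often. `max_words`, `min_count`, and both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_ngrams(texts: Iterable[str], min_len: int = 30, max_words: int = 12) -> Counter:
    """Count overlapping word n-grams whose joined length is >= min_len chars."""
    counts: Counter = Counter()
    for text in texts:
        words = text.split()  # also normalises runs of whitespace
        for start in range(len(words)):
            stop = min(start + max_words, len(words))
            for end in range(start + 1, stop + 1):
                gram = " ".join(words[start:end])
                if len(gram) >= min_len:
                    counts[gram] += 1
    return counts


def dedupe_contained(counts: Counter, min_count: int = 2) -> list[tuple[str, int]]:
    """Drop candidates covered by a longer, equally frequent one; sort by count."""
    ranked = sorted(
        ((gram, count) for gram, count in counts.items() if count >= min_count),
        key=lambda item: (-len(item[0]), item[0]),  # longest first
    )
    kept: list[tuple[str, int]] = []
    for gram, count in ranked:
        # Pad with spaces so containment is checked on whole words only.
        covered = any(
            f" {gram} " in f" {longer} " and count <= longer_count
            for longer, longer_count in kept
        )
        if not covered:
            kept.append((gram, count))
    # Final order: raw count descending, then longer candidates first.
    return sorted(kept, key=lambda item: (-item[1], -len(item[0]), item[0]))
```

A shorter phrase that also appears outside the longer one has a higher count and so survives the dedupe, which keeps reusable fragments visible in the candidate list. Cost grows with `max_words` times the total word count, which should be acceptable for a one-shot run.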

Done when

  • Script lives at scripts/find-snippet-candidates.py (or similar location in this repo).
  • python scripts/find-snippet-candidates.py produces a sensible candidate list against current session JSONL.
  • README updated to point at the script.
  • Manual layer 1 run reproducible via the script.

Depends on: layer 1 (value hypothesis confirmed). Unblocks: layer 3.

coilysiren added the P2 label and removed the P1 label on 2026-05-31 07:01:24 +00:00
Reference
coilyco-flight-deck/voice-flow-learning-loop#11