Add VAD silence-commit daemon for Wispr hands-free mode #103

Open
opened 2026-05-29 09:48:38 +00:00 by coilysiren · 0 comments
Owner

Hand-coded systems piece of the voice toolset. It watches the raw mic with silero-vad and, after ~2s of silence, fires the Wispr commit chord followed by Enter. This covers the one Wispr mode the clipboard-based siblings can't handle: the hands-free toggle (Ctrl+Win+Space), which has no key-release gesture to arm on.
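
A minimal sketch of the silence-commit decision. It assumes the VAD emits one speech/non-speech decision per 512-sample frame (32 ms at 16 kHz). The class name `SilenceCommitter` and its fields are hypothetical, not taken from vad-daemon.py. Arming only after speech has been heard is also an assumed design choice, so that the leading silence of a fresh hands-free session does not commit an empty dictation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SilenceCommitter:
    """Hypothetical helper: decide when a hands-free dictation should be committed.

    Arms only after speech has been heard, then fires exactly once when
    silence has lasted at least `silence_timeout` seconds.
    """

    silence_timeout: float = 2.0
    heard_speech: bool = False
    last_speech_at: Optional[float] = None
    fired: bool = False

    def reset(self) -> None:
        # Call at the start of each dictation session.
        self.heard_speech = False
        self.last_speech_at = None
        self.fired = False

    def update(self, is_speech: bool, now: float) -> bool:
        """Feed one VAD decision at time `now`; return True once when it is time to commit."""
        if self.fired:
            return False
        if is_speech:
            self.heard_speech = True
            self.last_speech_at = now
            return False
        # Silence: only counts once speech has been heard this session.
        if self.heard_speech and now - self.last_speech_at >= self.silence_timeout:
            self.fired = True
            return True
        return False
```

Timestamps are passed in explicitly (e.g. from `time.monotonic()`), so the logic can be tested without a clock or a microphone.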

Siblings (these detect the clipboard paste, so dictation has to be ended manually):

  • autohotkey/wispr-auto-enter.ahk (Windows PTT)
  • hammerspoon/init.lua (macOS PTT)

Lands as agentic-os/voice/vad-daemon.py (single file, MIT-licensed, standalone dependencies). It hardens the design skeleton in four ways:

  • model.reset_states() per session (silero-vad is recurrent; state drifts across stop/start without it)
  • inference moved off the PortAudio callback to a worker thread (torch in the audio thread drops frames)
  • exact-512-sample frame chunking (the device is not guaranteed to deliver blocks of exactly 512 samples)
  • UDP protocol extended to start / cancel / go, to support the cancel and override phrases from the design's next-level section
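
The audio-side hardening points can be sketched as a single consumer thread. `VadPipeline` is hypothetical and not the daemon's actual structure. The audio callback only enqueues blocks, UDP commands go through the same queue, and the worker re-slices audio into exact 512-sample frames before calling a caller-supplied `is_speech(model, frame)`. The model is stubbed as any object with `reset_states()`. The `cancel` / `go` semantics (drop the session versus commit immediately) are assumptions about the override phrases.

```python
import queue
import threading
from typing import Callable, List, Optional, Sequence, Tuple

FRAME = 512  # silero-vad window size at 16 kHz


class FrameChunker:
    """Re-slice arbitrarily sized device blocks into exact FRAME-sample frames."""

    def __init__(self, frame: int = FRAME) -> None:
        self.frame = frame
        self._buf: List[float] = []

    def push(self, block: Sequence[float]) -> List[List[float]]:
        self._buf.extend(block)
        frames = []
        while len(self._buf) >= self.frame:
            frames.append(self._buf[: self.frame])
            del self._buf[: self.frame]  # leftover samples wait for the next block
        return frames

    def clear(self) -> None:
        self._buf.clear()


class VadPipeline:
    """Hypothetical worker: inference runs here, never in the audio callback."""

    def __init__(self, model, is_speech: Callable[[object, List[float]], bool]) -> None:
        self.model = model          # assumed: anything exposing reset_states()
        self.is_speech = is_speech  # assumed: e.g. model(frame, 16000) > threshold
        self.inbox: "queue.Queue[Tuple[str, object]]" = queue.Queue()
        self.chunker = FrameChunker()
        self.active = False
        self.decisions: List[bool] = []
        self.commits = 0
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def join(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)

    # Producer side: cheap enough for the audio callback and the UDP loop.
    def on_audio(self, block: Sequence[float]) -> None:
        self.inbox.put_nowait(("audio", list(block)))  # copy: callback buffers are reused

    def command(self, name: str) -> None:
        self.inbox.put_nowait(("cmd", name.strip().lower()))

    def shutdown(self) -> None:
        self.inbox.put_nowait(("quit", None))

    def _run(self) -> None:
        while True:
            kind, payload = self.inbox.get()
            if kind == "quit":
                return
            if kind == "cmd":
                self._apply_command(payload)
            elif self.active:
                for frame in self.chunker.push(payload):
                    self.decisions.append(self.is_speech(self.model, frame))

    def _apply_command(self, name: str) -> None:
        if name == "start":
            self.model.reset_states()  # recurrent state must not leak across sessions
            self.chunker.clear()
            self.active = True
        elif name == "cancel":
            self.active = False        # assumed: drop the session without committing
        elif name == "go":
            self.active = False        # assumed: commit now, skipping the silence timer
            self.commits += 1
```

Because commands share the audio queue, a `start` cannot reset the recurrent state halfway through a frame. `on_audio` copies the block because python-sounddevice's callback buffer is only valid during the callback.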

Key presses are Windows-only, sent via keybd_event; other platforms get dry-run logging, so the VAD pipeline can be tested off Windows. Tuning knobs (--silence-timeout, --vad-threshold, --device, --commit-delay) are exposed as CLI flags, so iterating needs no file edits.
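
A sketch of the platform split and the CLI surface. The flag names come from this issue. Their defaults, the help text, and the meaning of `--commit-delay` (the pause between the chord and Enter) are hypothetical. The assumption that the commit chord is the Ctrl+Win+Space toggle is also hypothetical. On Windows the sketch calls `user32.keybd_event` through ctypes; elsewhere it only logs what it would press.

```python
import argparse
import ctypes
import logging
import sys
import time
from typing import List, Optional, Sequence

KEYEVENTF_KEYUP = 0x0002
VK_CONTROL, VK_LWIN, VK_SPACE, VK_RETURN = 0x11, 0x5B, 0x20, 0x0D

# Assumption: the Wispr commit chord is the same Ctrl+Win+Space hands-free toggle.
COMMIT_CHORD = (VK_CONTROL, VK_LWIN, VK_SPACE)

log = logging.getLogger("vad-daemon")


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    p = argparse.ArgumentParser(description="VAD silence-commit daemon (sketch)")
    p.add_argument("--silence-timeout", type=float, default=2.0,
                   help="seconds of silence before committing")
    p.add_argument("--vad-threshold", type=float, default=0.5,
                   help="speech-probability cutoff (assumed default)")
    p.add_argument("--device", default=None,
                   help="input device name or index (assumed pass-through)")
    p.add_argument("--commit-delay", type=float, default=0.3,
                   help="seconds between commit chord and Enter (assumed meaning)")
    return p.parse_args(argv)


def press_chord(vks: Sequence[int], dry_run: bool) -> List[str]:
    """Press keys down in order, release in reverse; return the action log."""
    actions = [f"down {vk:#04x}" for vk in vks] + [f"up {vk:#04x}" for vk in reversed(vks)]
    if dry_run:
        for a in actions:
            log.info("dry-run: %s", a)
        return actions
    user32 = ctypes.windll.user32  # only exists on Windows
    for vk in vks:
        user32.keybd_event(vk, 0, 0, 0)
    for vk in reversed(vks):
        user32.keybd_event(vk, 0, KEYEVENTF_KEYUP, 0)
    return actions


def commit(args: argparse.Namespace, dry_run: Optional[bool] = None) -> List[str]:
    """Fire the commit chord, wait, then press Enter."""
    if dry_run is None:
        dry_run = sys.platform != "win32"
    actions = press_chord(COMMIT_CHORD, dry_run)
    time.sleep(args.commit_delay)
    actions += press_chord((VK_RETURN,), dry_run)
    return actions
```

Returning the action list keeps the dry-run path assertable in tests on macOS or Linux.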

Blog-post candidate: "a small accessibility tool that makes voice-driven agentic coding feel natural."

coilysiren added P4 and removed P3 labels 2026-05-31 07:00:04 +00:00
Reference: coilyco-flight-deck/agentic-os#103