sshd: add ClientAliveInterval to reap zombie mobile sessions #102

Open
opened 2026-05-24 17:59:25 +00:00 by coilysiren · 0 comments
Owner

Problem

Mobile SSH from Kai's phone (Termux + Tailscale) leaves zombie sessions attached on kai-server when the phone backgrounds Tailscale (Android battery optimization). A count found 7 zombie pty sessions from the same source IP, the oldest from ~36h prior. All were idle, and sshd had no indication that the clients were gone.
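A minimal sketch of how the per-source count above could be reproduced. The helper name `count_by_source` is hypothetical, and the sketch assumes `who` prints the remote host in parentheses as the last field, which is the common Linux format but is not confirmed for kai-server.

```shell
#!/bin/sh
# Hypothetical helper: count login sessions per remote host.
# Reads `who`-style lines on stdin; assumes the remote host is the
# last field, wrapped in parentheses. Local logins without a host
# are skipped.
count_by_source() {
    awk '$NF ~ /^\(.*\)$/ { gsub(/[()]/, "", $NF); n[$NF]++ }
         END { for (h in n) print n[h], h }' | sort -rn
}

# Usage on the server:
#   who | count_by_source
```

Several sessions from one Tailscale address with old login times are the pattern described above. The count alone does not prove a session is dead, only that it is still attached.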

Why this matters

The zombies don't break anything immediately, but they (a) add noise to `w` / `last` output and session audits, (b) hold pty slots, and (c) make it harder to spot a genuinely active mobile session during the next mobile-SSH debug round. Tracking doc for that debug effort: coilysiren/mobile-ssh-debug.md on Kai's workstation (not yet in a repo).

Proposed fix

Add to /etc/ssh/sshd_config.d/ (new drop-in, e.g. 99-mobile-keepalive.conf):

ClientAliveInterval 60
ClientAliveCountMax 3

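This reaps dead sessions in roughly 3 minutes (3 unanswered probes sent 60s apart) without affecting healthy long-running sessions, because a live client answers the probes automatically even when the session is idle. Mosh sessions are unaffected because they don't traverse sshd after handoff.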
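A rough sketch of how the drop-in could be generated and checked before reloading sshd. The helper names `render_dropin` and `reap_seconds` are hypothetical, the file name is the example given above, and the service unit name (`ssh` on Debian/Ubuntu, `sshd` on many other distributions) is an assumption about the host. One caveat worth checking: sshd uses the first value it obtains for each keyword, so an earlier-sorting drop-in that already sets ClientAliveInterval would take precedence over a `99-` file.

```shell
#!/bin/sh
# Hypothetical helpers; the values mirror the proposed fix.
INTERVAL=60
COUNT_MAX=3

# Print the drop-in contents for the given interval and probe count.
render_dropin() {
    printf 'ClientAliveInterval %s\nClientAliveCountMax %s\n' "$1" "$2"
}

# Approximate seconds before an unresponsive client is disconnected.
reap_seconds() {
    echo $(( $1 * $2 ))
}

render_dropin "$INTERVAL" "$COUNT_MAX"
echo "approx reap time: $(reap_seconds "$INTERVAL" "$COUNT_MAX")s"

# To apply on the server (run as root; not executed by this sketch):
#   render_dropin 60 3 > /etc/ssh/sshd_config.d/99-mobile-keepalive.conf
#   sshd -t                              # syntax-check the full config
#   sshd -T | grep -i '^clientalive'     # show the effective values
#   systemctl reload ssh                 # unit may be named sshd
```

Running `sshd -T` after writing the file is the quickest way to confirm that the new values are the ones in effect rather than values from the main config or another drop-in.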

Out of scope (root cause is client-side)

  • Android battery optimization killing tailscaled in the background is the actual reconnect-hang cause. Fix is Settings → Apps → Tailscale → Battery → Unrestricted on the phone. This ticket is the server-side hygiene piece only.

Filed by Claude under coilysiren/agentic-os-kai AGENTS "Default TODO Destination" rule.

coilysiren added P4 and removed P3 labels 2026-05-31 07:00:42 +00:00
Reference
coilyco-flight-deck/infrastructure#102