scripts/host-watch.sh: generic tailnet-host SSH watchdog with diag capture on recovery #152
Problem
`scripts/host-diag.sh` exists ad hoc in this session's `/tmp/` but isn't installed in the repo, isn't reachable via `coily exec`, and is hard-coded to one operator's Mac. The watch loop that fires it on dead->alive recovery has the same problem. Promote both into the repo as a single generic verb.
Scope
- `scripts/host-diag.sh` - run on the remote host via `ssh -- bash -s <`. Captures listening sockets, sshd state, journals (ssh / k3s / tailscaled), dmesg, conntrack stats, iptables filter + nat, nft ruleset, ufw, fail2ban, interfaces, top RSS, auth.log.
- `scripts/host-watch.sh` - polls `coily ssh <alias> -- echo alive` every `POLL_INTERVAL` seconds (default 15). On a dead->alive transition, streams `host-diag.sh` into the remote and writes the output locally as `recovery-<ts>.txt`. State log + snapshots live under `OUT_DIR` (default `/tmp/host-watch-<alias>`).
- `Makefile` target `host-watch` with `host=<alias>` arg.
- `.coily/coily.yaml` verb `host-watch` delegating to the Make target.
Use
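The polling behavior described under Scope could be sketched roughly as below. This is a minimal illustration, not the actual `scripts/host-watch.sh`: the function names (`probe`, `is_recovery`, `watch_host`), the `DIAG_SCRIPT` override, the `state.log` filename, and the timestamp format are all assumptions, and it presumes that `coily ssh <alias> -- bash -s` forwards local stdin to the remote shell.

```shell
#!/usr/bin/env bash
# Sketch only. probe, is_recovery, watch_host, DIAG_SCRIPT and state.log are
# hypothetical names chosen for illustration, not taken from the real script.

POLL_INTERVAL="${POLL_INTERVAL:-15}"   # seconds between probes (default from the issue)

# Succeeds only when the remote echoes "alive" back through coily ssh.
probe() {
  local alias="$1"
  [ "$(coily ssh "$alias" -- echo alive 2>/dev/null </dev/null)" = "alive" ]
}

# True only on a dead->alive transition; first contact ("unknown") does not count.
is_recovery() {
  local prev="$1" cur="$2"
  [ "$prev" = "dead" ] && [ "$cur" = "alive" ]
}

# Poll forever, log each state, and capture diagnostics on recovery.
watch_host() {
  local alias="$1"
  local out_dir="${OUT_DIR:-/tmp/host-watch-$alias}"
  local diag="${DIAG_SCRIPT:-scripts/host-diag.sh}"   # assumed override knob
  local prev="unknown" cur ts
  mkdir -p "$out_dir"
  while true; do
    if probe "$alias"; then cur="alive"; else cur="dead"; fi
    ts="$(date -u +%Y%m%dT%H%M%SZ)"
    echo "$ts $cur" >> "$out_dir/state.log"
    if is_recovery "$prev" "$cur"; then
      # Stream the diag script to the remote shell; keep its output locally.
      coily ssh "$alias" -- bash -s < "$diag" > "$out_dir/recovery-$ts.txt" 2>&1
    fi
    prev="$cur"
    sleep "$POLL_INTERVAL"
  done
}
```

With the Make target from the scope list in place, an invocation would presumably look like `make host-watch host=<alias>`; in this sketch the equivalent is calling `watch_host <alias>` after sourcing the functions.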
Runs interactively until killed (Ctrl-C or `pkill -f host-watch.sh`). Originally written to catch the kai-server host-namespace outage during coilysiren/infrastructure#151. Generic enough to point at any tailnet host where `coily ssh` works.
Out of scope