Cap coily-update.service memory and disable brew build-from-source (3am cc1plus OOM livelock) #186
Problem
coily-update.service runs brew update && brew upgrade (all formulae) daily at 03:00 on kai-server. When linuxbrew lacks a prebuilt bottle for a formula it builds from source, spawning a parallel cc1plus compiler swarm. On 2026-05-30 that produced ~16 concurrent cc1plus processes at ~1-1.5GB RSS each (~16-20GB), stacked on the resident workload (EcoServer ~4GB + k3s/SigNoz/ClickHouse/gitea/grafana ~8-10GB). Total committed memory blew past 31GB usable RAM + 2GB swap, swap hit 0, and the host fell into a multi-hour reclaim-thrash livelock that only a power-cycle recovered. Same fingerprint on 2026-05-29.

The OOM killer never recovered the box because it only ever reached the small high-oom_score_adj k8s pods (java, repo-recall, coredns, signoz-otel-col), while the unprotected brew build (oom_score_adj 0) was never targeted.

Ask (stop-the-bleeding, host-level fix)
1. coily-update.service:
   - MemoryMax=6G (a from-source build then gets OOM-killed inside its own cgroup; the host survives untouched).
   - OOMScoreAdjust=1000 (make the build the preferred OOM victim, not the k8s pods).
2. HOMEBREW_NO_BUILD_FROM_SOURCE=1 (skip formulae lacking a bottle rather than compiling), or at minimum HOMEBREW_MAKE_JOBS=2 (cap compiler fan-out so a source build cannot saturate all cores/RAM).

Either of (1) or (2) alone prevents the host kill. Both is belt-and-suspenders.
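
The settings asked for above could be applied as a systemd drop-in rather than by editing the unit file directly. This is a minimal sketch, not a tested change: the drop-in file name is hypothetical, the values are the ones proposed in this issue, and the Homebrew variable names are taken from this issue as written, so they are worth checking against the brew version installed on kai-server. It also assumes coily-update.sh does not override these variables itself.

```ini
# /etc/systemd/system/coily-update.service.d/memory-cap.conf
# (hypothetical drop-in name; any *.conf in this directory is read)
[Service]
# Hard memory limit for the unit's cgroup. A runaway source build is
# OOM-killed inside this cgroup instead of exhausting host memory.
MemoryMax=6G
# Make this service the preferred OOM victim ahead of the k8s pods.
OOMScoreAdjust=1000
# Variable names as given in this issue (assumption: honored by the
# installed brew). The second is the fallback if source builds are kept.
Environment=HOMEBREW_NO_BUILD_FROM_SOURCE=1
Environment=HOMEBREW_MAKE_JOBS=2
```

After adding a drop-in, systemctl daemon-reload is needed before the next timer run picks it up; systemctl show coily-update.service -p MemoryMax -p OOMScoreAdjust can confirm the values took effect.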
Related
coilysiren-pull-all, coily-update, claude-remote-control-restart) piles up at 3am; consider staggering if source builds are kept.

Context
- /etc/systemd/system/coily-update.service -> infrastructure/scripts/coily-update.sh.
- coily-update.timer, OnCalendar=*-*-* 03:00:00 (daily despite the script comment saying "weekly").

Found during the 2026-05-30 crash investigation.
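
If source builds are kept, the staggering suggested under Related could be done with a timer drop-in along these lines. This is only a sketch: the drop-in file name and the 04:30 slot are illustrative assumptions, not values from the investigation.

```ini
# /etc/systemd/system/coily-update.timer.d/stagger.conf
# (hypothetical drop-in name)
[Timer]
# An empty assignment clears the OnCalendar= inherited from the main unit,
# so the 03:00 trigger does not remain alongside the new one.
OnCalendar=
# Arbitrary illustrative slot, away from the other 03:00 jobs.
OnCalendar=*-*-* 04:30:00
```

systemctl daemon-reload followed by systemctl list-timers coily-update.timer should show the new next-run time.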