Force package managers through local proxies with default-deny egress #64
Originally filed by @coilysiren on 2026-04-29T08:54:26Z - https://github.com/coilysiren/coily/issues/35
Problem
Package managers (brew, npm, pip, cargo, gem, etc.) currently reach the internet directly. That's a wide unsupervised egress surface during agent runs. Any `npm install`, `pip install`, `cargo install`, or `brew install` can pull from arbitrary registries, mirrors, or postinstall scripts with no per-host gating. Lockdown's argv validation does not see what those subprocesses fetch.
The non-pkgmgr passthroughs (`coily aws`, `coily gh`, `coily kubectl`, `coily docker`, `coily tailscale`) have a softer version of the same gap: their argv is audited, the hosts they hit are not. A platform-engineer egress story is "which command went where," not "which command ran."
Proposal
One in-process Go HTTP CONNECT proxy that coily starts on `127.0.0.1:0` for the duration of a wrapped invocation. The child subprocess inherits `HTTPS_PROXY` / `HTTP_PROXY` pointing at the proxy. The proxy logs every CONNECT, joins the rows back to the parent invocation's audit-row id, and (per-binary) either enforces a default-deny allowlist or observes silently.
Two enforcement modes:
- Enforce (package managers): a CONNECT to a host outside the allowlist is refused and logged with `decision=deny`, and the underlying tool sees a connection failure.
- Observe (`aws`, `gh`, `kubectl`, `docker`, `tailscale`): no allowlist. Every CONNECT is forwarded and logged. This is the #33 story.
Allowlists are pinned in code, not user-configurable for v0.1. Defaults follow the original #35 sketch:
- brew: `formulae.brew.sh`, `ghcr.io`, `objects.githubusercontent.com`, `github.com`, `raw.githubusercontent.com`
- npm: `registry.npmjs.org`
- pip: `pypi.org`, `files.pythonhosted.org`
- cargo: `crates.io`, `static.crates.io`, `index.crates.io`, `github.com`
- gem: `rubygems.org`, `index.rubygems.org`
CONNECT-only. No TLS interception, no CA install on either Mac or Windows hosts. Hostnames come from the SNI / `Host:` line in the CONNECT verb.
Architecture
Audit-row shape
Extend `audit.Record` with one optional field, `Egress []EgressRow`.
One row per host contacted per parent invocation. Aggregation (one record per `(parent_id, host)` pair, summing bytes / durations) happens in the proxy before flushing to the parent record, so a `coily npm install` run that opens 200 connections to `registry.npmjs.org` does not produce 200 audit rows.
Phases
Phase 1: tracer bullet (this issue, batch 1)
Stops with a pause so Kai can review before expanding to the other 11 pkgmgrs.
Scope:
- `pkg/egress/` new package:
  - `proxy.go` - HTTP CONNECT proxy. `New(allowlist []string, mode Mode)` returns a `*Proxy`. `Start(ctx)` listens on `127.0.0.1:0` and returns the proxy URL. `Stop()` returns the collected `[]EgressRow`. Hostname match: exact match against the allowlist plus suffix match for any allowlist entry beginning with `*.` (e.g. `*.amazonaws.com`). Tracer-bullet allowlist for brew is exact-match only; suffix matching can land in Phase 2.
  - `allowlist.go` - per-binary allowlist as a Go map. `var Allowlists = map[string][]string{ "brew": {...}, "npm": {...}, ... }`. Phase 1 lands the brew entry; Phase 2 fills the rest.
  - `proxy_test.go` - unit tests: allowed CONNECT forwards bytes, denied CONNECT returns 403 with audit row marked `decision=deny`, byte counters and duration populate.
- `pkg/audit/audit.go`:
  - Add `Egress []EgressRow` to `Record`. Add `EgressRow` struct.
  - `Append` / `Wrap` signatures: the caller populates `Egress` on the base record before `writer.Wrap` runs.
  - `audit_test.go`: assert that round-trip serialization includes egress rows when present and omits them when absent.
- `pkg/shell/shell.go`:
  - `Env []string` field on `Runner`. When non-nil, `Exec` / `Capture` set `cmd.Env = append(os.Environ(), r.Env...)`. When nil, default behavior (inherits `os.Environ` via `cmd.Env == nil`).
- `pkg/ops/passthrough/passthrough.go`:
  - `Mode` type with `ModeEnforce` / `ModeObserve` constants.
  - `WithEgress(allowlist []string, mode Mode)` option.
  - `Command` wraps the action: starts proxy, sets `HTTPS_PROXY` / `HTTP_PROXY` env on the `shell.Runner` for this invocation only (use a per-call shadow Runner so concurrent commands stay independent), runs the underlying action, stops proxy, attaches collected rows to the audit base record.
  - Attaching the rows: either (a) have `passthrough.Command` build its own `verb.Spec` that decorates the base record post-action, or (b) add an `OnComplete` hook to `verb.Spec` that runs after `Action` returns and gets a chance to mutate the record. Implement (b); it's the cleaner seam and unlocks future per-verb side-channel data without re-plumbing.
- `pkg/verb/verb.go`:
  - Add `OnComplete func(*audit.Record)` to `Spec`. Called inside `writer.Wrap` after `fn()` returns, with a pointer to the record being finalized. Update `verb_test.go` to cover the hook.
- `cmd/coily/ops_pkgmgrs.go`:
  - `passthrough.WithEgress(egress.Allowlists["brew"], passthrough.ModeEnforce)` for `brew` only in Phase 1.
- `docs/features/25-egress-proxy.md`: new feature doc.
End-to-end smoke test (manual, not committed as a test): on Mac, run `coily brew search wget` and `coily brew install jq` (or another tiny formula); verify the audit row contains egress rows for the expected hosts; run `coily brew install some-formula-pulling-from-an-unallowed-host` to confirm a deny path actually fails the install.
Pause for Kai's confirmation before Phase 2.
Phase 2: full sweep (after confirmation)
- Fill `pkg/egress/allowlist.go` with the remaining 11 pkgmgr entries.
- `passthrough.WithEgress(...)` on the remaining 11 pkgmgr commands in `cmd/coily/ops_pkgmgrs.go`.
- `passthrough.WithEgress(nil, passthrough.ModeObserve)` on `aws`, `gh`, `kubectl`, `docker`, `tailscale`. Allowlist is nil in observe mode; every host is forwarded and logged.
- Add suffix matching (`*.foo.com`) to `egress.Proxy` if any pkgmgr's real-world host pattern needs it. Likely yes for cargo (some crates fetch from `*.crates.io`).
- Update `cmd/coily/ops_audit.go` and the `coily audit show` / `coily audit tail` rendering paths to surface egress rows in human-readable output. (If `coily audit show` doesn't exist yet per the unresolved.md "What I would build next" #1, add it as part of this phase.)
- Update `docs/features/25-egress-proxy.md` with the full surface, the suffix-match semantics, the observe-mode story, and a sample audit row.
Acceptance
Phase 1:
- `pkg/egress/` lands with tests passing.
- `coily brew search wget` runs end-to-end and produces an audit row with `egress: [{host: formulae.brew.sh, decision: allow, ...}, ...]`.
- `coily brew install <tiny-formula>` runs end-to-end and audits the expected hosts.
- A deny-path run produces an `egress: [{host: ..., decision: deny}]` row and the underlying CONNECT returns 403.
- `coily brew --version` invocation.
Phase 2:
- `coily aws s3 ls` records one egress row per region endpoint contacted.
- `coily gh issue list` records one egress row for `api.github.com`.
- `coily audit show <id>` renders argv + egress together (or the rendering path is filed as its own follow-up issue if `audit show` doesn't yet exist).
- `docs/features/25-egress-proxy.md` covers the full surface.
Out of scope
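- TLS interception and the CA plumbing it would need (`AWS_CA_BUNDLE`, kubeconfig CAs, system trust for `gh`). Deliberate non-goal.
- User-configurable allowlists. These could move to `~/.coily/config.yaml` later if needed.
Notes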
The "lockdown wires the env vars" framing in the original #35 sketch is replaced by "the passthrough wrapper wires the env vars." Lockdown already denies the raw pkgmgr binaries (`Bash(brew:*)`, `Bash(npm:*)`, etc.), so the only path to invoke them is through `coily <pkgmgr>`, and the wrapper's job is to set `HTTPS_PROXY` before exec. No lockdown rule changes are needed for this issue.
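To make the wrapper's job concrete, the following sketch shows one way a per-call shadow Runner could carry the proxy variables without touching the shared Runner. `Runner` and its `Env` field follow the Phase 1 scope. The `command` and `withProxy` helpers and the example proxy address are hypothetical. The sketch builds the `exec.Cmd` but does not run it.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// Runner mirrors the shape described for pkg/shell: an optional Env that,
// when non-nil, is appended to the parent environment.
type Runner struct {
	Env []string
}

// command builds the child process. With a nil Env, cmd.Env stays nil and
// os/exec falls back to inheriting the parent environment.
func (r Runner) command(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	if r.Env != nil {
		cmd.Env = append(os.Environ(), r.Env...)
	}
	return cmd
}

// withProxy returns a shadow copy whose Env points the child at the proxy.
// The receiver's Env slice is copied, so concurrent invocations that share
// the base Runner do not see each other's proxy URL.
func (r Runner) withProxy(proxyURL string) Runner {
	shadow := r
	shadow.Env = append(append([]string(nil), r.Env...),
		"HTTPS_PROXY="+proxyURL,
		"HTTP_PROXY="+proxyURL,
	)
	return shadow
}

func main() {
	base := Runner{}
	shadow := base.withProxy("http://127.0.0.1:49152") // example address only
	cmd := shadow.command("brew", "--version")
	// The proxy variables are the last two entries of the child's environment.
	fmt.Println(cmd.Env[len(cmd.Env)-2:])
	fmt.Println("base runner untouched:", base.Env == nil)
}
```

Appending after `os.Environ()` matters because `os/exec` keeps only the last value for a duplicated key, so a proxy variable already present in the parent environment is overridden rather than inherited.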