coily: sandbox-safe privileged execution via filesystem RPC #49

Open
opened 2026-05-23 20:54:02 +00:00 by coilysiren · 0 comments
Owner

Originally filed by @coilysiren on 2026-05-10T23:15:37Z - https://github.com/coilysiren/coily/issues/111

Why

Today, an agent running inside a Cowork-mode Linux sandbox cannot perform privileged operations on Kai's behalf. Concrete example from this session: the agent wanted to file a GitHub issue on coilysiren/coilyco-ai (a private repo) and could not. The GitHub MCP server in the sandbox does not have private-repo scope, no gh CLI is installed, no SSH keys are mounted, and no PAT is available in the sandbox env. AWS SSM is not reachable either: the coily binary is built only for darwin/arm64, the AWS CLI is not installed, and AWS creds are not mounted.

The agent has read/write access to ~/projects/coilysiren but zero identity. Granting identity by pasting a PAT into the sandbox is the obvious bad answer. We want a design where:

  1. Tokens never enter the sandbox process memory.
  2. The Mac retains full control over what gets executed in Kai's name.
  3. Auth surface scales to all the tools coily wraps (gh, aws, git, linear, anything), not just GitHub.
  4. The same shape works from Mac Cowork and from mobile Dispatch sessions, which both route execution back to the Mac.
  5. There is an emergency lockout reachable from a phone.

This issue proposes the design and lists the components to build.

Current state of the sandbox (for grounding)

Confirmed in this session:

  • Linux user gifted-determined-planck, ephemeral session id.
  • HOME=/sessions/gifted-determined-planck, no .ssh, no .gitconfig, no .config/gh, no .aws, no .netrc.
  • Only mounts: ~/projects/coilysiren (RW, live bidirectional with Mac filesystem), outputs (RW scratch), uploads (RO).
  • Outbound network is forced through proxies on the Mac: https_proxy=http://localhost:3128, SOCKS at localhost:1080, with GIT_SSH_COMMAND wrapping ssh via socat.
  • The proxy has an allowlist. github.com returns 200 with remote_ip=127.0.0.1, api.github.com and example.com return 403, 100.100.100.100 (Tailscale MagicDNS) and *.ts.net return 403. So the proxy resolves and terminates TLS, and the allowlist blocks tailnet access by design.
  • Clear sandbox marker env var: SANDBOX_RUNTIME=1. Also CLAUDE_CODE_HOST_HTTP_PROXY_PORT and CLAUDE_CODE_HOST_SOCKS_PROXY_PORT.

These properties mean the Mac is in full control of what the sandbox can reach over the network, and the sandbox has a writable shared filesystem channel to the Mac.

Design overview

The Mac is the trust point. coily wraps every privileged tool end to end. When a sandbox-side coily invocation needs privilege, it issues an RPC over the shared filesystem to a Mac-side executor. The Mac-side executor reads creds from Keychain (or other native stores), runs the real CLI, and returns the result. Tokens never cross the sandbox boundary.

sandbox-side coily
  ↓ writes descriptor
~/projects/coilysiren/.coily/requests/<uuid>.yaml
  ↓ launchd WatchPaths fires
Mac-side coily executor
  ↓ validate descriptor (HMAC + allowlist)
  ↓ exec real CLI with native creds
  ↓ write response
~/projects/coilysiren/.coily/responses/<uuid>.yaml
  ↓ fsnotify wakes sandbox-side coily
sandbox-side coily returns to caller

This is RPC over filesystem with launchd as the trigger. No persistent daemon. No proxy. No socket plumbing.
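The first arrow in the diagram (sandbox writes a descriptor) can be sketched as below. This is an illustration in Python, not coily's code. The function name `write_request` and the temp-file naming are hypothetical, and the sketch assumes that a rename inside the shared mount is atomic as seen from the Mac, which would need verifying.

```python
import os
import uuid
from pathlib import Path


def write_request(shared: Path, body: str) -> str:
    """Write a descriptor into requests/ and return its id."""
    request_id = str(uuid.uuid4())
    requests = shared / "requests"
    requests.mkdir(parents=True, exist_ok=True)
    # Write under a temporary name first so a reader never sees a
    # half-written <uuid>.yaml; the final name appears in one rename.
    tmp = requests / f".{request_id}.tmp"
    tmp.write_text(body)
    os.replace(tmp, requests / f"{request_id}.yaml")
    return request_id
```

Writing to a temporary name and renaming means the Mac-side reader only has to look at `*.yaml` files and can treat each one as complete.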

Why filesystem RPC and not a credential-injection proxy

An earlier draft of this design proposed a Mac-side intercepting proxy that injected Authorization: Bearer <PAT-from-Keychain> into outbound API calls. That works, but it loses to filesystem RPC for almost every coily case because:

  • Many privileged operations are not HTTP. git push over ssh, mas install, osascript, pbcopy. Proxies are useless for those, RPC handles them uniformly.
  • gh, aws, doctl, etc. already solved auth correctly. Reusing their native paths is strictly safer than reimplementing injection.
  • The audit unit is "what command was requested," not "what HTTP calls were emitted." The first matches the human mental model.
  • Less code, less surface, fewer ways to break.

The proxy shape is still useful for the narrow case of a sandbox-side process bypassing coily and talking to an API directly with a library. Reserve it for that. coily owns the wrappers, so this case is rare.

Components

1. SANDBOX_RUNTIME detection

Sandbox-side coily detects it is running in a Cowork sandbox by checking, in order:

  • $SANDBOX_RUNTIME == "1". Cleanest signal.
  • Fallbacks: $LOGNAME matches ^[a-z]+-[a-z]+-[a-z]+$ (the dashed-adjective name shape), $HOME starts with /sessions/, presence of $CLAUDE_CODE_HOST_HTTP_PROXY_PORT.

When detection trips, coily switches to RPC mode for any subcommand that requires privilege. Read-only operations that work without creds (e.g. coily gh repo view on a public repo) skip RPC and run locally.

Mac-side coily is the same codebase, built for darwin/arm64. SANDBOX_RUNTIME is never set on the Mac, so detection stays false there and the binary executes commands directly. Same command graph, different default execution mode.
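A minimal sketch of the detection order described above, in Python for illustration. The function name `in_cowork_sandbox` is hypothetical, and treating any single fallback as sufficient is an assumption; the issue does not say whether fallbacks should be combined.

```python
import re
from typing import Mapping

# Session-name shape from the fallback list: three lowercase words joined by dashes.
_SESSION_NAME = re.compile(r"^[a-z]+-[a-z]+-[a-z]+$")


def in_cowork_sandbox(env: Mapping[str, str]) -> bool:
    """Return True when the environment looks like a Cowork sandbox."""
    # Primary signal: the explicit marker variable.
    if env.get("SANDBOX_RUNTIME") == "1":
        return True
    # Fallbacks, checked in the order the section lists them.
    # Assumption: any one of them alone is enough to trip detection.
    if _SESSION_NAME.match(env.get("LOGNAME", "")):
        return True
    if env.get("HOME", "").startswith("/sessions/"):
        return True
    return "CLAUDE_CODE_HOST_HTTP_PROXY_PORT" in env
```

Passing the environment in as a mapping (for example `os.environ`) keeps the check testable on the Mac, where it should return False.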

2. Descriptor schema

YAML at ~/projects/coilysiren/.coily/requests/<uuid>.yaml:

id: <uuid>
created_at: <rfc3339-utc>
nonce: <random-uint64>
principal: <hmac-key-id>
subcommand: gh
args:
  - issue
  - create
  - --title
  - "..."
  - --body
  - "..."
env:
  GH_REPO: coilysiren/coilyco-ai
requires_confirm: false
timeout_sec: 30
hmac: <base64-sha256-tag-over-canonical-body>

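Canonicalization for the HMAC uses JCS (RFC 8785). JCS is defined over JSON, so the parsed YAML descriptor is first mapped to its JSON equivalent with the hmac field omitted, and the tag is computed over those canonical bytes. Mac-side coily recomputes the tag and refuses the request on mismatch.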
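The signing step can be sketched as follows, in Python for illustration. The helper names are hypothetical. The serialization here (sorted keys, compact separators) only approximates JCS for the plain string, integer, boolean, and list values in the example descriptor; it is not a full RFC 8785 implementation.

```python
import base64
import hashlib
import hmac
import json


def canonical_body(descriptor: dict) -> bytes:
    """Serialize the descriptor deterministically, without its hmac field.

    Assumption: sorted keys plus compact separators stands in for JCS.
    That holds for the simple values used here, not for every JSON value.
    """
    body = {k: v for k, v in descriptor.items() if k != "hmac"}
    text = json.dumps(body, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def sign(descriptor: dict, key: bytes) -> str:
    """Return the base64 HMAC-SHA256 tag over the canonical body."""
    tag = hmac.new(key, canonical_body(descriptor), hashlib.sha256).digest()
    return base64.b64encode(tag).decode("ascii")


def verify(descriptor: dict, key: bytes) -> bool:
    """Recompute the tag and compare it with the descriptor's hmac field."""
    claimed = str(descriptor.get("hmac", "")).encode("utf-8")
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(sign(descriptor, key).encode("ascii"), claimed)
```

Both sides must run the same canonicalization, so whichever serializer is chosen should live in code shared by the sandbox-side and Mac-side builds.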

3. Mac-side trigger

A launchd LaunchAgent with WatchPaths on ~/projects/coilysiren/.coily/requests/. Fires Mac-side coily on every change. No always-on daemon. Plist sketch:

<key>WatchPaths</key>
<array>
  <string>/Users/kai/projects/coilysiren/.coily/requests</string>
</array>
<key>ProgramArguments</key>
<array>
  <string>/usr/local/bin/coily</string>
  <string>_executor_drain</string>
</array>

The _executor_drain subcommand iterates pending descriptors FIFO, holds an exclusive lock for the duration of each exec, and refuses to start a second exec while the lock is held. Anything queued during a run gets processed on the next WatchPaths fire.
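A rough sketch of the drain loop, in Python for illustration. It makes several assumptions the issue does not state: the lock is an advisory `flock` on a lock file, FIFO order is taken from file modification time, consumed descriptors are deleted, and the lock is held for the whole drain rather than per exec. The name `drain` and the `handle` callback are hypothetical.

```python
import fcntl
from pathlib import Path
from typing import Callable


def drain(requests_dir: Path, lock_path: Path,
          handle: Callable[[Path], None]) -> str:
    """Process pending descriptors oldest-first under one exclusive lock."""
    with open(lock_path, "w") as lock:
        try:
            # Non-blocking: a second drain gives up instead of queueing.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return "concurrency-busy"
        # FIFO by modification time, with the file name as a tie-breaker.
        pending = sorted(requests_dir.glob("*.yaml"),
                         key=lambda p: (p.stat().st_mtime_ns, p.name))
        for descriptor in pending:
            handle(descriptor)
            # Assumption: a handled descriptor is removed so it is not re-run.
            descriptor.unlink()
        return "drained"
```

The lock is released when the file is closed at the end of the `with` block, so a crash in `handle` does not leave the queue wedged.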

4. Mac-side execution

For each descriptor:

  1. Validate HMAC over canonical body. Refuse on mismatch.
  2. Check nonce against the sliding window in ~/.coily/state/seen-nonces.db. Refuse on replay.
  3. Check lockout and pause state (see section 7). Refuse with lockout-active or pause-active.
  4. Resolve subcommand to a policy entry. Each policy entry declares allowlisted args, allowlisted env vars, rate limit class, confirm requirement.
  5. Apply per-subcommand args allowlist. Refuse on policy violation.
  6. Token bucket check for the rate limit class. Refuse on throttle.
  7. If requires_confirm is true or the policy entry forces it, send a macOS notification with allow/deny actions. Block on Kai's response.
  8. Fetch any needed creds from Keychain.
  9. exec the real CLI with creds in env (never in args), output captured.
  10. Write ~/projects/coilysiren/.coily/responses/<uuid>.yaml with stdout, stderr, exit code, duration.
  11. Append a sanitized row to ~/.coily/audit/<yyyy-mm>.jsonl. Credentials and full response bodies never land in audit.
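Steps 4 and 5 above (resolve a policy entry, then apply its allowlists) might look like the sketch below, in Python for illustration. The issue does not define the policy file format, so the `Policy` shape here, with allowed leading argument sequences and allowed env var names, is entirely hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Policy:
    # Hypothetical shape: permitted leading argument sequences and env names.
    allowed_arg_prefixes: list = field(default_factory=list)
    allowed_env: set = field(default_factory=set)


def policy_refusal(args: list, env: dict,
                   policy: Optional[Policy]) -> Optional[str]:
    """Return "policy-deny" when the request falls outside policy, else None."""
    if policy is None:
        # No policy entry for this subcommand: nothing to allow against.
        return "policy-deny"
    # The request must start with one of the allowlisted argument sequences.
    if not any(args[:len(p)] == list(p) for p in policy.allowed_arg_prefixes):
        return "policy-deny"
    # Every env var in the descriptor must be named in the allowlist.
    if not set(env) <= policy.allowed_env:
        return "policy-deny"
    return None
```

For example, a gh policy that allows `issue create` and `repo view` with only `GH_REPO` in env would accept the example descriptor from section 2 and refuse `repo delete`.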

5. Sandbox-side wait and resume

Sandbox-side coily uses fsnotify on ~/projects/coilysiren/.coily/responses/ and waits for <uuid>.yaml. Backstop poll loop with timeout. On timeout it writes ~/projects/coilysiren/.coily/teardowns/<uuid>.yaml so a hung Mac-side executor cannot leave orphans.
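The backstop poll loop and the teardown-on-timeout behavior can be sketched as below, in Python for illustration. Event-based waking is left out, so this shows only the polling fallback. The function name, the poll interval, and the contents of the teardown file are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def wait_for_response(shared: Path, request_id: str, timeout_sec: float,
                      poll_sec: float = 0.2) -> Optional[str]:
    """Poll for responses/<id>.yaml; on timeout, write a teardown marker."""
    response = shared / "responses" / f"{request_id}.yaml"
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if response.exists():
            return response.read_text()
        time.sleep(poll_sec)
    # Timed out: leave a marker so the Mac side can clean up the request.
    teardown = shared / "teardowns" / f"{request_id}.yaml"
    teardown.parent.mkdir(parents=True, exist_ok=True)
    # Hypothetical teardown body; the issue does not define its fields.
    teardown.write_text(f"id: {request_id}\nreason: command-timeout\n")
    return None
```

This sketch assumes the Mac side makes the response file appear complete (for instance by writing to a temporary name and renaming); otherwise the reader could pick up a partial file.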

6. Concurrency and rate limiting

Defaults:

  • Concurrency: 1. One exec at a time across all subcommands. Expand only when measured pain forces splitting reads from writes into separate queues.
  • Per-command timeout: 30s default, overridable per descriptor up to a policy ceiling.
  • Token bucket per principal: 60 requests per minute default, per-subcommand overrides. Stored in ~/.coily/state/buckets.yaml.

Refusal taxonomy with distinct codes: hmac-fail, replay-detected, lockout-active, pause-active, rate-limit, concurrency-busy, policy-deny, command-timeout, confirm-rejected. Sandbox-side coily branches on the code. Notably, rate-limit triggers a sandbox-side wait-and-retry with backoff. Other codes fail fast.
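A small sketch of the token bucket and of the sandbox-side branching on refusal codes, in Python for illustration. The 60-per-minute default comes from the text; the class and function names, the exponential backoff, and its 30-second cap are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenBucket:
    capacity: float = 60.0        # burst size
    refill_per_sec: float = 1.0   # 60 requests per minute
    tokens: float = 60.0
    updated_at: float = 0.0       # caller-supplied clock, in seconds

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Only rate-limit is retried; every other refusal code fails fast.
RETRYABLE = {"rate-limit"}


def retry_delay(code: str, attempt: int, base_sec: float = 1.0,
                cap_sec: float = 30.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None to fail fast."""
    if code not in RETRYABLE:
        return None
    # Assumption: exponential backoff, capped.
    return min(cap_sec, base_sec * (2 ** attempt))
```

Taking the clock as a parameter keeps the bucket deterministic under test and makes it straightforward to persist `tokens` and `updated_at` between executor runs.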

7. Emergency stop

Two states, two verbs, two recovery costs.

coily lockout (nuclear, human only)

  • Reachable from anywhere: Mac CLI, Android Termux widget over Tailscale SSH, Mac hotkey via Hammerspoon, voice via Google Assistant routine plus Tasker. The most likely real-world entry point is the phone.
  • Effects: writes ~/.coily/state/lockout.yaml (outside the cowork mount, sandbox cannot see or write it), SIGTERMs any in-flight subprocess with SIGKILL after a 5s grace, rotates the HMAC key in Keychain. Push notification confirms.
  • Sandbox-side coily's HMAC key is now stale. Every subsequent request fails hmac-fail. There is no recovery path from inside the sandbox.
  • Recovery: coily unlock on the Mac, Touch ID gated, plus a manual re-bootstrap of the HMAC key into the sandbox via a separate command. Audit captures the lockout event, every refused request after it, and the unlock event.

coily pause (soft, sandbox self-imposed)

  • Sandbox-side coily pulls this when it detects a stuck loop, excessive retries, or a request that looks self-inflicted-bad. Could also be triggered manually.
  • Effects: writes ~/.coily/state/pause.yaml on the Mac side (same out-of-mount location, set indirectly via a descriptor-style RPC with no exec). Subsequent requests refuse with pause-active. HMAC is left untouched.
  • Recovery: coily resume on the Mac, no Touch ID, no re-bootstrap. Cheap intentionally so the sandbox is encouraged to pause itself when uncertain.

Default-safe failure

  • If Mac-side coily cannot read its lockout/pause state file for any reason, treat as locked. Unable-to-confirm-safe means refuse.
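  • Auto-lockout on macOS session lock, keyed off com.apple.screenIsLocked. That name is a distributed notification rather than a watched path, so this probably needs a small listener process (or Hammerspoon's caffeinate watcher) instead of a bare launchd trigger; verify before building. Stays locked until explicitly unlocked, separate from Mac session unlock.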
  • Inactivity timeout. After N minutes with no successful exec, drop to pause. Cheap insurance against "I walked away and forgot."
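The fail-closed rule can be sketched as below, in Python for illustration. It assumes that the mere presence of lockout.yaml or pause.yaml means the state is active, which the issue implies but does not spell out. The function name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def stop_refusal(state_dir: Path) -> Optional[str]:
    """Return a refusal code if execution must stop, else None."""
    try:
        # Listing the directory is the read that must succeed.
        names = {p.name for p in state_dir.iterdir()}
    except OSError:
        # Unable to confirm safe means refuse: treat as locked.
        return "lockout-active"
    # Assumption: file presence alone marks the state as active.
    if "lockout.yaml" in names:
        return "lockout-active"
    if "pause.yaml" in names:
        return "pause-active"
    return None
```

Lockout is checked before pause so that the stronger state wins when both files exist.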

Always-confirm, orthogonal to lockout

Some subcommands carry requires_confirm: true in policy regardless of lockout state. git push --force, aws iam delete-*, anything destructive in a configurable set. Mac-side coily sends a notification with allow/deny actions. Lockout answers "stop everything," confirms answer "approve this specific one." Both wanted.

8. HMAC details

  • 32-byte random key. Stored on Mac side in Keychain. Stored on sandbox side at ~/.coily/key chmod 600. Bootstrap is the only time the key crosses sides, after that it never moves.
  • Tag is HMAC-SHA256(key, canonical(body)) with hmac field omitted from input. Canonicalization is JCS or equivalent deterministic JSON serialization.
  • Nonces in descriptors plus a sliding window of seen nonces on the Mac side. Replay protection.
  • Optional: sign responses with the same key. Sandbox verifies. Mutual auth over the filesystem.
  • Future graduations: multiple HMAC keys for multiple sandbox principals (one per agent context), Mac-side resolves to per-principal policy. Skip until you actually run multiple sandbox principals.
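The sliding window of seen nonces can be sketched in memory as below, in Python for illustration. The class name and the 300-second default are hypothetical, and a real version would persist to seen-nonces.db rather than hold state in a dict.

```python
from collections import OrderedDict


class NonceWindow:
    """Remember recently seen nonces and refuse repeats inside the window."""

    def __init__(self, window_sec: float = 300.0):
        self.window_sec = window_sec   # hypothetical default
        self._seen = OrderedDict()     # nonce -> time first seen

    def check_and_record(self, nonce: int, now: float) -> bool:
        """Return True if the nonce is fresh, False on replay."""
        # Evict entries that have aged out of the window, oldest first.
        while self._seen:
            _, seen_at = next(iter(self._seen.items()))
            if now - seen_at <= self.window_sec:
                break
            self._seen.popitem(last=False)
        if nonce in self._seen:
            return False
        self._seen[nonce] = now
        return True
```

Once a nonce ages out of the window it would be accepted again, so the window only gives replay protection if descriptors whose created_at is older than the window are also refused.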

HMAC's value is provenance, not authorization. It catches stray writes from outside coily (Dropbox sync, editor plugins, second tools dropping files in the request path) and protects against tampering between write and read. It does not protect against a compromised coily process. Authorization at the request level (policy allowlists, rate limits, confirms) is what catches misbehavior.

Build order: ship without HMAC, add it the moment you notice a second writer in .coily/requests/. Trigger is real noise, not hypothetical threats.

Client coverage

  • Mac Cowork sessions. Full design works. Sandbox on Mac, mount on Mac, launchd on Mac, Keychain on Mac. Single trust point.
  • Mobile Dispatch. Full design works as long as Kai's Mac is open and the desktop app is running. Dispatch is a remote UI over the same Mac-side execution context. Sandbox, mount, launchd, and Keychain are still all on the Mac. The Mac-must-be-open constraint is the real limiter, not the design.
  • Mobile Code sessions. Off the table. The sandbox there is Anthropic-hosted, not Mac-mounted. No Keychain, no launchd, no ~/projects/coilysiren to write descriptors into. The shared filesystem primitive does not exist.

If Code sessions ever need to participate, the move is to relocate the trust point off the Mac onto an always-on tailnet host. Same shape, different transport, different host. Ongoing cost of maintaining that host is its own decision.

Phone-side entry points (Android)

Primary: Termux widget calls a script. Script is one ssh line.

tailscale ssh kai-mac 'coily lockout --reason "android widget"'

No public surface, mutual auth via tailnet identity. Voice trigger via Google Assistant routine plus Tasker is one extra hop. "Hey Google, coily lockout." Voice latency is the only cost.

Fallback channel for when Tailscale is wedged: a Gmail label watcher on the Mac. Send an email with a specific label or to a specific alias, label triggers a launchd-watched flag file. Latency is Gmail polling, seconds to minutes. Slow but reliable when nothing else works.

Mac-side hotkey

Hammerspoon (or Karabiner): hold control-option-P for 2 seconds, fires coily lockout. Useful when at the Mac and not in a terminal.

File layout summary

~/projects/coilysiren/.coily/       (mounted, shared with sandbox)
  requests/<uuid>.yaml              (sandbox writes, Mac reads)
  responses/<uuid>.yaml             (Mac writes, sandbox reads)
  teardowns/<uuid>.yaml             (sandbox writes when giving up)

~/.coily/                           (outside the mount; Mac HOME unless noted)
  key                               (HMAC key; sandbox HOME only, the Mac keeps its copy in Keychain)
  state/
    lockout.yaml
    pause.yaml
    seen-nonces.db
    buckets.yaml
  audit/
    YYYY-MM.jsonl
  policy/
    gh.yaml                         (per-subcommand args allowlist + confirm rules)
    aws.yaml
    git.yaml

Sandbox-side coily reads its HMAC key from ~/.coily/key inside the sandbox (not in the mount). Bootstrap copies it once at provisioning, never again. Lockout rotation invalidates the sandbox's copy.

Implementation order

  1. Linux build of coily. Cross-compile from cmd/. Drop into bin/coily-linux-arm64. Sandbox-side coily --help should work.
  2. SANDBOX_RUNTIME detector. Tiny utility. Test on Mac (false) and in sandbox (true).
  3. Descriptor schema and request/response shape. YAML structs, JCS canonicalization, no HMAC yet.
  4. launchd plist with WatchPaths. Install via make install-launchagent.
  5. Mac-side _executor_drain subcommand. FIFO queue, exclusive lock, no policy yet (allow-all). Wire end to end for coily gh issue create. Get the round trip working.
  6. Per-subcommand policy. Start with gh only. Allowlist of subcommand args, allowlisted env vars.
  7. Token bucket and concurrency. Add the two refusal codes (rate-limit and concurrency-busy). Sandbox-side branching.
  8. Lockout and pause. Both verbs, Mac-side state files outside the mount, refusal codes wired through.
  9. Confirms for dangerous subcommands. macOS UserNotifications with action buttons. Block on response.
  10. HMAC and nonces. Bootstrap command for keying both sides. Replay window.
  11. Audit log. JSONL rows per accepted and refused request.
  12. Android Termux widget. Document the setup, commit the widget script.
  13. Mac hotkey. Hammerspoon config in the repo.

Steps 1-5 are the meat. Steps 6-9 are policy. Steps 10-13 harden.

Open questions and tradeoffs

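  • Sandbox-side polling vs fsnotify. Linux file-change notification (inotify) generally reports only changes made through the sandbox's own kernel, so on a FUSE-mounted folder, writes that originate on the Mac side may never produce events. Verify before committing. Fallback is short polling on response paths.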
  • Bootstrapping the HMAC key. First run, no key on either side. Either Kai pastes a one-time base64 blob from the Mac into the sandbox via the Cowork chat, or the first-run sequence is run on the Mac with the sandbox folder mounted but no sandbox active, with the Mac dropping the key file directly into the sandbox HOME. The latter is cleaner because no secret crosses the agent's context.
  • Multi-sandbox principals. Future you may want one HMAC key per agent context. Skip until you have the concrete need. Schema already supports it via the principal field.
  • Cross-machine. This design assumes one Mac, one sandbox. If you ever run coily across multiple Macs sharing a project tree (rare), state files need to disambiguate.
  • macOS upgrades. launchd WatchPaths and Keychain APIs are stable, but Touch ID and UserNotifications APIs have shifted in recent macOS releases. Pin to current macOS behavior and re-verify on each major upgrade.
  • What happens during a hard reboot mid-exec. Pending request descriptor remains, but the response will never come. Sandbox-side coily should detect missing-response after timeout and roll forward to teardown. Mac-side coily on next start should sweep .coily/requests/ and refuse anything older than N minutes with stale-on-recovery, a refusal code that would need adding to the taxonomy in section 6.
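The staleness test for the recovery sweep in the last bullet could be as small as the sketch below, in Python for illustration. The function name is hypothetical, and N is left as a parameter because the issue does not fix it.

```python
from datetime import datetime, timedelta, timezone


def is_stale(created_at: str, now: datetime, max_age: timedelta) -> bool:
    """True when an RFC 3339 created_at is older than max_age."""
    # Older Python versions reject a trailing "Z", so normalize it first.
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    if created.tzinfo is None:
        # The schema says UTC; treat a missing offset as UTC.
        created = created.replace(tzinfo=timezone.utc)
    return now - created > max_age
```

On startup the executor would run this over each pending descriptor's created_at and refuse the stale ones before draining the rest.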

Why we ended up here

This design started as "how do I let Claude file a GitHub issue on a private repo" and grew, deliberately, into "how do coily-wrapped privileged operations work for any agent context that has filesystem access to the Mac." The original problem is solved by the first invocation: coily gh issue create from a Cowork sandbox produces a real issue on coilysiren/coilyco-ai without the sandbox ever seeing a token. Everything else here is the framework that makes adding the next privileged op (and the one after that, and the lockout button, and the mobile widget) cheap.

_Originally filed by @coilysiren on 2026-05-10T23:15:37Z - [https://github.com/coilysiren/coily/issues/111](https://github.com/coilysiren/coily/issues/111)_ # coily: sandbox-safe privileged execution via filesystem RPC ## Why Today, an agent running inside a Cowork-mode Linux sandbox cannot perform privileged operations on Kai's behalf. Concrete example from this session: the agent wanted to file a GitHub issue on `coilysiren/coilyco-ai` (a private repo) and could not, because the GitHub MCP server in the sandbox does not have private-repo scope, no `gh` CLI is installed, no SSH keys are mounted, no PAT is available in the sandbox env, and AWS SSM is not reachable (coily is darwin/arm64, AWS CLI is not installed, AWS creds are not mounted). The agent has read/write access to `~/projects/coilysiren` but zero identity. Granting identity by pasting a PAT into the sandbox is the obvious bad answer. We want a design where: 1. Tokens never enter the sandbox process memory. 2. The Mac retains full control over what gets executed in Kai's name. 3. Auth surface scales to all the tools coily wraps (`gh`, `aws`, `git`, `linear`, anything), not just GitHub. 4. The same shape works from Mac Cowork and from mobile Dispatch sessions, which both route execution back to the Mac. 5. There is an emergency lockout reachable from a phone. This issue proposes the design and lists the components to build. ## Current state of the sandbox (for grounding) Confirmed in this session: - Linux user `gifted-determined-planck`, ephemeral session id. - `HOME=/sessions/gifted-determined-planck`, no `.ssh`, no `.gitconfig`, no `.config/gh`, no `.aws`, no `.netrc`. - Only mounts: `~/projects/coilysiren` (RW, live bidirectional with Mac filesystem), outputs (RW scratch), uploads (RO). - Outbound network is forced through proxies on the Mac: `https_proxy=http://localhost:3128`, SOCKS at `localhost:1080`, with `GIT_SSH_COMMAND` wrapping ssh via socat. - The proxy has an allowlist. 
`github.com` returns 200 with `remote_ip=127.0.0.1`, `api.github.com` and `example.com` return 403, `100.100.100.100` (Tailscale MagicDNS) and `*.ts.net` return 403. So the proxy resolves and terminates TLS, and the allowlist blocks tailnet access by design. - Clear sandbox marker env var: `SANDBOX_RUNTIME=1`. Also `CLAUDE_CODE_HOST_HTTP_PROXY_PORT` and `CLAUDE_CODE_HOST_SOCKS_PROXY_PORT`. These properties mean the Mac is in full control of what the sandbox can reach over the network, and the sandbox has a writable shared filesystem channel to the Mac. ## Design overview The Mac is the trust point. coily wraps every privileged tool end to end. When a sandbox-side coily invocation needs privilege, it issues an RPC over the shared filesystem to a Mac-side executor. The Mac-side executor reads creds from Keychain (or other native stores), runs the real CLI, and returns the result. Tokens never cross the sandbox boundary. ``` sandbox-side coily ↓ writes descriptor ~/projects/coilysiren/.coily/requests/<uuid>.yaml ↓ launchd WatchPaths fires Mac-side coily executor ↓ validate descriptor (HMAC + allowlist) ↓ exec real CLI with native creds ↓ write response ~/projects/coilysiren/.coily/responses/<uuid>.yaml ↓ fsnotify wakes sandbox-side coily sandbox-side coily returns to caller ``` This is RPC over filesystem with launchd as the trigger. No persistent daemon. No proxy. No socket plumbing. ## Why filesystem RPC and not a credential-injection proxy An earlier draft of this design proposed a Mac-side intercepting proxy that injected `Authorization: Bearer <PAT-from-Keychain>` into outbound API calls. That works, but it loses to filesystem RPC for almost every coily case because: - Many privileged operations are not HTTP. `git push` over ssh, `mas install`, `osascript`, `pbcopy`. Proxies are useless for those, RPC handles them uniformly. - `gh`, `aws`, `doctl`, etc. already solved auth correctly. Reusing their native paths is strictly safer than reimplementing injection. 
- The audit unit is "what command was requested," not "what HTTP calls were emitted." The first matches the human mental model. - Less code, less surface, fewer ways to break. The proxy shape is still useful for the narrow case of a sandbox-side process bypassing coily and talking to an API directly with a library. Reserve it for that. coily owns the wrappers, so this case is rare. ## Components ### 1. SANDBOX_RUNTIME detection Sandbox-side coily detects it is running in a Cowork sandbox by checking, in order: - `$SANDBOX_RUNTIME == "1"`. Cleanest signal. - Fallbacks: `$LOGNAME` matches `^[a-z]+-[a-z]+-[a-z]+$` (the dashed-adjective name shape), `$HOME` starts with `/sessions/`, presence of `$CLAUDE_CODE_HOST_HTTP_PROXY_PORT`. When detection trips, coily switches to RPC mode for any subcommand that requires privilege. Read-only operations that work without creds (eg `coily gh repo view` on a public repo) skip RPC and run locally. Mac-side coily is the same binary, built for darwin/arm64. It does not key off `SANDBOX_RUNTIME` because it is never set on the Mac. Same command graph, different default execution mode. ### 2. Descriptor schema YAML at `~/projects/coilysiren/.coily/requests/<uuid>.yaml`: ```yaml id: <uuid> created_at: <rfc3339-utc> nonce: <random-uint64> principal: <hmac-key-id> subcommand: gh args: - issue - create - --title - "..." - --body - "..." env: GH_REPO: coilysiren/coilyco-ai requires_confirm: false timeout_sec: 30 hmac: <base64-sha256-tag-over-canonical-body> ``` Canonicalization for HMAC uses JCS (RFC 8785) over the body with the `hmac` field omitted. Mac-side coily verifies, then refuses on mismatch. ### 3. Mac-side trigger A launchd LaunchAgent with `WatchPaths` on `~/projects/coilysiren/.coily/requests/`. Fires Mac-side coily on every change. No always-on daemon. 
Plist sketch: ```xml <key>WatchPaths</key> <array> <string>/Users/kai/projects/coilysiren/.coily/requests</string> </array> <key>ProgramArguments</key> <array> <string>/usr/local/bin/coily</string> <string>_executor_drain</string> </array> ``` The `_executor_drain` subcommand iterates pending descriptors FIFO, holds an exclusive lock for the duration of each exec, refuses to start a second exec while the lock is held. Anything queued during a run gets processed on the next WatchPaths fire. ### 4. Mac-side execution For each descriptor: 1. Validate HMAC over canonical body. Refuse on mismatch. 2. Check nonce against the sliding window in `~/.coily/state/seen-nonces.db`. Refuse on replay. 3. Check lockout state (see section 7). Refuse on lockout. 4. Resolve subcommand to a policy entry. Each policy entry declares allowlisted args, allowlisted env vars, rate limit class, confirm requirement. 5. Apply per-subcommand args allowlist. Refuse on policy violation. 6. Token bucket check for the rate limit class. Refuse on throttle. 7. If `requires_confirm` is true or the policy entry forces it, send a macOS notification with allow/deny actions. Block on Kai's response. 8. Fetch any needed creds from Keychain. 9. `exec` the real CLI with creds in env (never in args), output captured. 10. Write `~/projects/coilysiren/.coily/responses/<uuid>.yaml` with stdout, stderr, exit code, duration. 11. Append a sanitized row to `~/.coily/audit/<yyyy-mm>.jsonl`. Credentials and full response bodies never land in audit. ### 5. Sandbox-side wait and resume Sandbox-side coily uses fsnotify on `~/projects/coilysiren/.coily/responses/` and waits for `<uuid>.yaml`. Backstop poll loop with timeout. On timeout it writes `~/projects/coilysiren/.coily/teardowns/<uuid>.yaml` so a hung Mac-side executor cannot leave orphans. ### 6. Concurrency and rate limiting Defaults: - Concurrency: 1. One exec at a time across all subcommands. 
Expand only when measured pain forces splitting reads from writes into separate queues. - Per-command timeout: 30s default, overridable per descriptor up to a policy ceiling. - Token bucket per principal: 60 requests per minute default, per-subcommand overrides. Stored in `~/.coily/state/buckets.yaml`. Refusal taxonomy with distinct codes: `hmac-fail`, `replay-detected`, `lockout-active`, `pause-active`, `rate-limit`, `concurrency-busy`, `policy-deny`, `command-timeout`, `confirm-rejected`. Sandbox-side coily branches on the code. Notably, rate-limit triggers a sandbox-side wait-and-retry with backoff. Other codes fail fast. ### 7. Emergency stop Two states, two verbs, two recovery costs. **`coily lockout` (nuclear, human only)** - Reachable from anywhere: Mac CLI, Android Termux widget over Tailscale SSH, Mac hotkey via Hammerspoon, voice via Google Assistant routine plus Tasker. The most likely real-world entry point is the phone. - Effects: writes `~/.coily/state/lockout.yaml` (outside the cowork mount, sandbox cannot see or write it), SIGTERMs any in-flight subprocess with SIGKILL after a 5s grace, rotates the HMAC key in Keychain. Push notification confirms. - Sandbox-side coily's HMAC key is now stale. Every subsequent request fails `hmac-fail`. There is no recovery path from inside the sandbox. - Recovery: `coily unlock` on the Mac, Touch ID gated, plus a manual re-bootstrap of the HMAC key into the sandbox via a separate command. Audit captures the lockout event, every refused request after it, and the unlock event. **`coily pause` (soft, sandbox self-imposed)** - Sandbox-side coily pulls this when it detects a stuck loop, excessive retries, or a request that looks self-inflicted-bad. Could also be triggered manually. - Effects: writes `~/.coily/state/pause.yaml` on the Mac side (same out-of-mount location, set indirectly via a descriptor-style RPC with no exec). Subsequent requests refuse with `pause-active`. HMAC is left untouched. 
- Recovery: `coily resume` on the Mac, no Touch ID, no re-bootstrap. Cheap intentionally so the sandbox is encouraged to pause itself when uncertain. **Default-safe failure** - If Mac-side coily cannot read its lockout/pause state file for any reason, treat as locked. Unable-to-confirm-safe means refuse. - Auto-lockout on macOS session lock via a launchd agent on `com.apple.screenIsLocked`. Stays locked until explicitly unlocked, separate from Mac session unlock. - Inactivity timeout. After N minutes with no successful exec, drop to pause. Cheap insurance against "I walked away and forgot." **Always-confirm, orthogonal to lockout** Some subcommands carry `requires_confirm: true` in policy regardless of lockout state. `git push --force`, `aws iam delete-*`, anything destructive in a configurable set. Mac-side coily sends a notification with allow/deny actions. Lockout answers "stop everything," confirms answer "approve this specific one." Both wanted. ### 8. HMAC details - 32-byte random key. Stored on Mac side in Keychain. Stored on sandbox side at `~/.coily/key` chmod 600. Bootstrap is the only time the key crosses sides, after that it never moves. - Tag is `HMAC-SHA256(key, canonical(body))` with `hmac` field omitted from input. Canonicalization is JCS or equivalent deterministic JSON serialization. - Nonces in descriptors plus a sliding window of seen nonces on the Mac side. Replay protection. - Optional: sign responses with the same key. Sandbox verifies. Mutual auth over the filesystem. - Future graduations: multiple HMAC keys for multiple sandbox principals (one per agent context), Mac-side resolves to per-principal policy. Skip until you actually run multiple sandbox principals. HMAC's value is provenance, not authorization. It catches stray writes from outside coily (Dropbox sync, editor plugins, second tools dropping files in the request path) and protects against tampering between write and read. It does not protect against a compromised coily process. 
Authorization at the request level (policy allowlists, rate limits, confirms) is what catches misbehavior. Build order: ship without HMAC, add it the moment you notice a second writer in `.coily/requests/`. Trigger is real noise, not hypothetical threats. ## Client coverage - **Mac Cowork sessions.** Full design works. Sandbox on Mac, mount on Mac, launchd on Mac, Keychain on Mac. Single trust point. - **Mobile Dispatch.** Full design works as long as Kai's Mac is open and the desktop app is running. Dispatch is a remote UI over the same Mac-side execution context. Sandbox, mount, launchd, and Keychain are still all on the Mac. The Mac-must-be-open constraint is the real limiter, not the design. - **Mobile Code sessions.** Off the table. The sandbox there is Anthropic-hosted, not Mac-mounted. No Keychain, no launchd, no `~/projects/coilysiren` to write descriptors into. The shared filesystem primitive does not exist. If Code sessions ever need to participate, the move is to relocate the trust point off the Mac onto an always-on tailnet host. Same shape, different transport, different host. Ongoing cost of maintaining that host is its own decision. ## Phone-side entry points (Android) Primary: Termux widget calls a script. Script is one ssh line. ```sh tailscale ssh kai-mac coily lockout --reason "android widget" ``` No public surface, mutual auth via tailnet identity. Voice trigger via Google Assistant routine plus Tasker is one extra hop. "Hey Google, coily lockout." Voice latency is the only cost. Fallback channel for when Tailscale is wedged: a Gmail label watcher on the Mac. Send an email with a specific label or to a specific alias, label triggers a launchd-watched flag file. Latency is Gmail polling, seconds to minutes. Slow but reliable when nothing else works. ## Mac-side hotkey Hammerspoon (or Karabiner): hold control-option-P for 2 seconds, fires `coily lockout`. Useful when at the Mac and not in a terminal. 
## File layout summary ``` ~/projects/coilysiren/.coily/ (mounted, shared with sandbox) requests/<uuid>.yaml (sandbox writes, Mac reads) responses/<uuid>.yaml (Mac writes, sandbox reads) teardowns/<uuid>.yaml (sandbox writes when giving up) ~/.coily/ (Mac-only, outside mount) key (HMAC key, gitignored by living outside repo) state/ lockout.yaml pause.yaml seen-nonces.db buckets.yaml audit/ YYYY-MM.jsonl policy/ gh.yaml (per-subcommand args allowlist + confirm rules) aws.yaml git.yaml ``` Sandbox-side coily reads its HMAC key from `~/.coily/key` inside the sandbox (not in the mount). Bootstrap copies it once at provisioning, never again. Lockout rotation invalidates the sandbox's copy. ## Implementation order 1. **Linux build of coily.** Cross-compile from cmd/. Drop into `bin/coily-linux-arm64`. Sandbox-side `coily --help` should work. 2. **SANDBOX_RUNTIME detector.** Tiny utility. Test on Mac (false) and in sandbox (true). 3. **Descriptor schema and request/response shape.** YAML structs, JCS canonicalization, no HMAC yet. 4. **launchd plist with WatchPaths.** Install via `make install-launchagent`. 5. **Mac-side `_executor_drain` subcommand.** FIFO queue, exclusive lock, no policy yet (allow-all). Wire end to end for `coily gh issue create`. Get the round trip working. 6. **Per-subcommand policy.** Start with `gh` only. Allowlist of subcommand args, allowlisted env vars. 7. **Token bucket and concurrency.** Add the two refusal codes. Sandbox-side branching. 8. **Lockout and pause.** Both verbs, Mac-side state files outside the mount, refusal codes wired through. 9. **Confirms for dangerous subcommands.** macOS UserNotifications with action buttons. Block on response. 10. **HMAC and nonces.** Bootstrap command for keying both sides. Replay window. 11. **Audit log.** JSONL rows per accepted and refused request. 12. **Android Termux widget.** Document the setup, commit the widget script. 13. **Mac hotkey.** Hammerspoon config in the repo. Steps 1-5 are the meat. 
Steps 6-9 are policy. Steps 10-13 harden.

## Open questions and tradeoffs

- **Sandbox-side polling vs fsnotify.** Linux fsnotify on a FUSE-mounted folder may or may not deliver events reliably. Verify before committing. Fallback is short polling on response paths.
- **Bootstrapping the HMAC key.** First run, no key on either side. Either Kai pastes a one-time base64 blob from the Mac into the sandbox via the Cowork chat, or the first-run sequence is run on the Mac with the sandbox folder mounted but no sandbox active, with the Mac dropping the key file directly into the sandbox HOME. The latter is cleaner because no secret crosses the agent's context.
- **Multi-sandbox principals.** Future you may want one HMAC key per agent context. Skip until you have the concrete need. Schema already supports it via the `principal` field.
- **Cross-machine.** This design assumes one Mac, one sandbox. If you ever run coily across multiple Macs sharing a project tree (rare), state files need to disambiguate.
- **macOS upgrades.** launchd WatchPaths and Keychain APIs are stable, but Touch ID and UserNotifications APIs have shifted in recent macOS releases. Pin to current macOS behavior and re-verify on each major upgrade.
- **What happens during a hard reboot mid-exec.** Pending request descriptor remains, but the response will never come. Sandbox-side coily should detect missing-response after timeout and roll forward to teardown. Mac-side coily on next start should sweep `.coily/requests/` and refuse anything older than N minutes with `stale-on-recovery`.

## Why we ended up here

This design started as "how do I let Claude file a GitHub issue on a private repo" and grew, deliberately, into "how do coily-wrapped privileged operations work for any agent context that has filesystem access to the Mac."
The original problem is solved by the first invocation: `coily gh issue create` from a Cowork sandbox produces a real issue on `coilysiren/coilyco-ai` without the sandbox ever seeing a token. Everything else here is the framework that makes adding the next privileged op (and the one after that, and the lockout button, and the mobile widget) cheap.
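For reference, the sandbox half of that first round trip could be sketched as below: write a request descriptor into the mount, short-poll for the matching response (the fallback named in the open questions), and roll forward to a teardown if nothing arrives before the timeout. This is a Python illustration under stated assumptions, not coily's implementation. The function name, the descriptor fields (`id`, `argv`, `created_at`), the teardown `reason`, and the default timeouts are all hypothetical. The files carry JSON text, which is also valid YAML, so the sketch needs no third-party parser.

```python
import json
import time
import uuid
from pathlib import Path
from typing import List, Optional


def submit_and_wait(mount: Path, argv: List[str],
                    timeout_s: float = 30.0, poll_s: float = 0.25) -> Optional[dict]:
    """Write a request, poll for its response, tear down on timeout."""
    request_id = str(uuid.uuid4())
    for sub in ("requests", "responses", "teardowns"):
        (mount / sub).mkdir(parents=True, exist_ok=True)

    request_path = mount / "requests" / f"{request_id}.yaml"
    response_path = mount / "responses" / f"{request_id}.yaml"

    # Hypothetical descriptor shape; the real schema is step 3 of the plan.
    descriptor = {"id": request_id, "argv": argv, "created_at": time.time()}

    # Write under a temporary name, then rename, so a watcher on
    # requests/ never reads a half-written descriptor.
    tmp_path = request_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(descriptor))
    tmp_path.rename(request_path)

    # Short polling on the response path. Assumes the Mac side also
    # writes responses atomically (temp file plus rename).
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if response_path.exists():
            return json.loads(response_path.read_text())
        time.sleep(poll_s)

    # No response in time: record that the sandbox has given up.
    teardown_path = mount / "teardowns" / f"{request_id}.yaml"
    teardown_path.write_text(json.dumps({"id": request_id, "reason": "timeout"}))
    return None
```

A call such as `submit_and_wait(Path.home() / "projects/coilysiren/.coily", ["gh", "issue", "create"])` would return the parsed response, or `None` after leaving a teardown file behind. No token appears anywhere on this side. The sandbox only ever handles the descriptor and the result.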
coilysiren added P3 and removed P2 labels 2026-05-31 06:59:49 +00:00