⚔️🛡️🎯 Gauntlet
Two-role adversarial MCP server that infers software correctness by observing how code behaves under sustained, targeted attack. Quality control for dark-factory environments where code is written by bots and verified by attack.
Run your service through the gauntlet. Point a host Claude Code agent at a running service, hand it the trial set, and the gauntlet is what the service survives. The host plays Attacker and Inspector; Gauntlet provides the deterministic tools (config loading, plan execution, risk-report assembly).
AI-written code can look correct while hiding behavioral failures. Traditional tests miss these failures when the same agent wrote both the code and the tests. Gauntlet's Attacker context assumes the code is broken, and each Trial's blockers never load into that context, which preserves a train/test split.
An Attacker uses a Trial aimed at a Target to generate Plans. Gauntlet's Drone executes those Plans as a User. An Inspector watches and surfaces Findings. Hidden Vitals are checked independently to produce a Clearance.
See docs/architecture.md for the model, docs/usage.md for the runbook, docs/development.md for dev setup.
Install
Gauntlet ships as a Claude Code plugin bundling the MCP server and the host skill:
claude plugin marketplace add coilysiren/gauntlet
claude plugin install gauntlet@coilysiren-gauntlet
Restart Claude Code so the skill, MCP server, and subagents register. Confirm with /mcp and "run gauntlet". No Anthropic credentials are needed because the host already has auth. For local dev: git clone ... && claude --plugin-dir path/to/gauntlet.
The plugin delivers the MCP server, the gauntlet skill (orchestrator loop), the gauntlet-author skill (spec to trial YAMLs), and the gauntlet-attacker, gauntlet-inspector, and gauntlet-holdout-evaluator subagents, whose MCP allowlists enforce the train/test split. The Attacker subagent cannot call get_trial at all. The full MCP tool surface is listed in docs/FEATURES.md.
Project config
your-project/
└── .gauntlet/
    └── trials/
        ├── task_ownership.yaml
        └── ...
Trials define reusable attack strategies. Each trial's blockers are externally observable truths about expected behavior, and they are never loaded into the Attacker context:
title: Users cannot modify each other's tasks
description: >
  The task API must enforce resource ownership.
blockers:
  - A PATCH request by a non-owner is rejected with 403
  - The task body is unchanged after an unauthorized PATCH attempt
  - A GET by the owner after an unauthorized PATCH returns the original data
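The train/test split can be pictured as a filter applied before a trial reaches the Attacker. The following is a minimal sketch, assuming the trial above has already been parsed into a plain dict; `attacker_view` and `HIDDEN_KEYS` are hypothetical names for illustration, not part of Gauntlet's API, and the real enforcement happens through the subagent allowlists.

```python
from copy import deepcopy

# Keys withheld from the Attacker context. Only "blockers" is named in this
# README; the set itself is an assumption made for this sketch.
HIDDEN_KEYS = frozenset({"blockers"})


def attacker_view(trial: dict) -> dict:
    """Return a copy of the trial with held-out keys removed."""
    return {k: deepcopy(v) for k, v in trial.items() if k not in HIDDEN_KEYS}


# The trial from the YAML above, written out as the dict a parser would give.
trial = {
    "title": "Users cannot modify each other's tasks",
    "description": "The task API must enforce resource ownership.\n",
    "blockers": [
        "A PATCH request by a non-owner is rejected with 403",
        "The task body is unchanged after an unauthorized PATCH attempt",
        "A GET by the owner after an unauthorized PATCH returns the original data",
    ],
}

view = attacker_view(trial)
print(sorted(view))
```

The original dict is left untouched, so an evaluator that is allowed to see the blockers can still read them from `trial`.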
If the SUT requires auth, the orchestrator passes user_headers to execute_plan (a dict[str, dict[str, str]] mapping user names to headers). Users without an entry fall back to X-User: <name>.
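The fallback rule can be sketched as a small lookup. This is an illustration under stated assumptions, not Gauntlet's implementation: `resolve_headers` is a hypothetical helper, and only the `user_headers` shape and the `X-User` fallback come from the text above.

```python
from __future__ import annotations


def resolve_headers(
    user: str,
    user_headers: dict[str, dict[str, str]] | None = None,
) -> dict[str, str]:
    """Return the headers to send when acting as `user`."""
    # Assumption: any entry for the user, even an empty one, is used as-is.
    if user_headers and user in user_headers:
        return dict(user_headers[user])
    # Users without an entry fall back to X-User: <name>.
    return {"X-User": user}


# Placeholder token for illustration only.
user_headers = {"alice": {"Authorization": "Bearer <token>"}}

alice = resolve_headers("alice", user_headers)
bob = resolve_headers("bob", user_headers)
```

Here `alice` carries the configured Authorization header, while `bob`, who has no entry, gets `{"X-User": "bob"}`.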
See also
- AGENTS.md, docs/FEATURES.md, .coily/coily.yaml.
- Prior art (RESTler, Schemathesis, ToolFuzz) and the full model live in
docs/architecture.md.
Cross-reference convention from coilysiren/agentic-os#59.