
⚔️🛡️🎯 Gauntlet is a two-role adversarial MCP server that infers software correctness by observing how code behaves under sustained, targeted attack. Built for dark-factory environments where code is written by bots and verified by attack.


This repository has been archived on 2026-06-17. You can view files and clone it, but you cannot make any changes to its state, such as pushing and creating new issues, pull requests or comments.


Kai Siren 2c7677c304 docs: route dev commands through ward (coily -> ward migration) Flip Surface A dev-verb routing to `ward exec` / .ward/ward.yaml. coily ops cloud passthroughs stay on coily (no ward ops surface yet). closes #30 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>		2026-06-16 23:41:31 -07:00
.agents/skills/repo-gauntlet	chore: remove broken coily-trailer prepare-commit-msg hook	2026-06-05 22:57:36 -07:00
.claude	lockdown: sync to coily v2.45.0 [skip ci]	2026-05-28 09:28:15 +00:00
.claude-plugin	chore: repoint coilysiren/* GitHub refs to new org after move	2026-05-30 23:58:14 -07:00
.coily	chore(catalog): drop providesApis, bump agentic-os hook to v0.6.0	2026-05-27 22:57:12 -07:00
.gauntlet	rename Weapon -> Trial throughout	2026-04-23 10:57:48 -07:00
.github/workflows	chore(ci): remove codex-review-gate + undraft-and-poke-codex workflows	2026-05-16 15:43:12 -07:00
.ward	chore: add .ward/ward.yaml bridge (.coily -> .ward migration)	2026-06-14 15:06:54 -07:00
agents	fix: subagent allowlists work under plugin namespace	2026-04-24 19:14:01 -07:00
docs	chore: repoint coilysiren/* GitHub refs to new org after move	2026-05-30 23:58:14 -07:00
gauntlet	chore: clean up code-comments violations after agentic-os v0.2.8	2026-05-25 20:14:12 -07:00
scripts	chore: adopt coilysiren/agentic-os v0.2.1 upstream-ref pre-commit suite	2026-05-15 23:30:36 -07:00
skills	skills/gauntlet: auto-fix high risk in auto mode	2026-04-24 19:36:55 -07:00
tests	chore: clean up code-comments violations after agentic-os v0.2.8	2026-05-25 20:14:12 -07:00
.dockerignore	Add .dockerignore to exclude agent worktrees and caches	2026-04-19 04:25:07 -07:00
.gitignore	chore: commit coily lockdown baseline, gitignore host-local Claude state	2026-04-24 15:42:10 -07:00
.pre-commit-config.yaml	chore: remove broken coily-trailer prepare-commit-msg hook	2026-06-05 22:57:36 -07:00
.python-version	uv init	2026-04-10 01:10:53 -07:00
AGENTS.md	docs: route dev commands through ward (coily -> ward migration)	2026-06-16 23:41:31 -07:00
CLAUDE.md	Convert CLAUDE.md symlink to @AGENTS.md import	2026-04-23 13:35:45 -07:00
docker-compose.yml	Drop CliAdapter, WebDriverAdapter, demo_api, InMemoryHttpApi, rule assertions	2026-04-19 03:02:47 -07:00
Dockerfile	fix: remove nonexistent main.py from mypy targets and Dockerfile CMD	2026-04-10 22:09:43 -07:00
pyproject.toml	chore(catalog): land agentic-os v0.11.1 hook block	2026-05-30 10:52:36 -07:00
README.md	chore: repoint coilysiren/* GitHub refs to new org after move	2026-05-30 23:58:14 -07:00
uv.lock	Bump python-multipart in the uv group across 1 directory (#11 )	2026-05-13 11:43:18 -07:00

README.md

⚔️🛡️🎯 Gauntlet

Two-role adversarial MCP server that infers software correctness by observing how code behaves under sustained, targeted attack. Quality control for dark-factory environments where code is written by bots and verified by attack.

Run your service through the gauntlet. Point a host Claude Code agent at a running service, hand it the trial set, and the gauntlet is what the service survives. The host plays Attacker and Inspector; Gauntlet provides the deterministic tools (config loading, plan execution, risk-report assembly).

AI-written code can look correct while hiding behavioral failures. Traditional tests miss this because the same agent wrote code and tests. Gauntlet's Attacker context assumes the code is broken, and each Trial's blockers never load into that context, preserving a train/test split.

An Attacker uses a Trial aimed at a Target to generate Plans. Gauntlet's Drone executes those Plans as a User. An Inspector watches and surfaces Findings. Hidden Vitals are checked independently to produce a Clearance.

See docs/architecture.md for the model, docs/usage.md for the runbook, docs/development.md for dev setup.

Install

Gauntlet ships as a Claude Code plugin bundling the MCP server and the host skill:

claude plugin marketplace add coilysiren/gauntlet
claude plugin install gauntlet@coilysiren-gauntlet

Restart Claude Code so the skill, MCP server, and subagents register. Confirm registration with /mcp, then ask the host to "run gauntlet". No separate Anthropic credentials are needed; the host is already authenticated. For local development: git clone ... && claude --plugin-dir path/to/gauntlet.

The plugin delivers the MCP server, the gauntlet skill (the orchestrator loop), the gauntlet-author skill (spec to trial YAMLs), and the gauntlet-attacker, gauntlet-inspector, and gauntlet-holdout-evaluator subagents. The subagents' MCP allowlists enforce the train/test split: the Attacker subagent cannot call get_trial at all. The full MCP tool surface is listed in docs/FEATURES.md.
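As a rough illustration of how an allowlist can enforce the split, here is a minimal sketch. It is not Gauntlet's implementation: the allowlist contents, the `hypothetical_attacker_tool` name, and the `is_allowed` helper are assumptions made for this example. Only `get_trial` and the subagent names come from this README.

```python
# Hypothetical allowlists. Only get_trial is a tool name taken from the README;
# which subagent may actually call it is an assumption for illustration.
ALLOWLISTS = {
    "gauntlet-attacker": frozenset({"hypothetical_attacker_tool"}),
    "gauntlet-holdout-evaluator": frozenset({"get_trial"}),
}


def is_allowed(subagent: str, tool: str) -> bool:
    """Return True only if the tool is on the subagent's allowlist."""
    # Unknown subagents get an empty allowlist, so they are denied by default.
    return tool in ALLOWLISTS.get(subagent, frozenset())


print(is_allowed("gauntlet-attacker", "get_trial"))
print(is_allowed("gauntlet-holdout-evaluator", "get_trial"))
```

Denying by default means a tool added to the server later stays invisible to the Attacker until someone allowlists it on purpose.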

Project config

your-project/
└── .gauntlet/
    └── trials/
        ├── task_ownership.yaml
        └── ...

Trials define reusable attack strategies. A trial's blockers are externally observable truths about expected behavior and are never loaded into the Attacker context:

title: Users cannot modify each other's tasks
description: >
  The task API must enforce resource ownership.
blockers:
  - A PATCH request by a non-owner is rejected with 403
  - The task body is unchanged after an unauthorized PATCH attempt
  - A GET by the owner after an unauthorized PATCH returns the original data
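To make the train/test split concrete, the sketch below shows one way a loaded trial could be reduced to what the Attacker sees. It assumes the trial has already been parsed into a plain dict; `HIDDEN_FIELDS` and `attacker_view` are hypothetical names for this example, not part of Gauntlet's API.

```python
from copy import deepcopy

# The trial above, as it might look after YAML parsing (assumed shape).
trial = {
    "title": "Users cannot modify each other's tasks",
    "description": "The task API must enforce resource ownership.",
    "blockers": [
        "A PATCH request by a non-owner is rejected with 403",
        "The task body is unchanged after an unauthorized PATCH attempt",
        "A GET by the owner after an unauthorized PATCH returns the original data",
    ],
}

# Fields withheld from the Attacker (hypothetical constant).
HIDDEN_FIELDS = ("blockers",)


def attacker_view(trial: dict) -> dict:
    """Return a copy of the trial with the held-out fields removed."""
    view = deepcopy(trial)  # never mutate the full trial
    for field in HIDDEN_FIELDS:
        view.pop(field, None)
    return view


print(sorted(attacker_view(trial)))
```

The full trial, blockers included, stays available to whichever role checks the outcome. The Attacker only works from the title and description.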

If the SUT requires auth, the orchestrator passes user_headers to execute_plan (a dict[str, dict[str, str]] mapping user names to headers). Users without an entry fall back to X-User: <name>.
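The fallback rule can be sketched as follows. The `resolve_headers` helper and the sample token are hypothetical and only illustrate the described behavior; how execute_plan applies the mapping internally is not shown in this README.

```python
from typing import Dict, Optional

UserHeaders = Dict[str, Dict[str, str]]


def resolve_headers(user: str, user_headers: Optional[UserHeaders] = None) -> Dict[str, str]:
    """Return the headers for a user, falling back to X-User: <name>."""
    if user_headers and user in user_headers:
        # Copy so callers cannot mutate the shared mapping.
        return dict(user_headers[user])
    return {"X-User": user}


# Example mapping: alice has explicit auth headers, bob has no entry.
user_headers: UserHeaders = {"alice": {"Authorization": "Bearer alice-token"}}

print(resolve_headers("alice", user_headers))
print(resolve_headers("bob", user_headers))
```

Under this reading, a service without real auth needs no user_headers at all, since every user is identified by the X-User header alone.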
