Blog post: Gauntlet, inferring correctness by attack #16

Closed

opened 2026-05-23 20:55:40 +00:00 by coilysiren · 1 comment

coilysiren commented

2026-05-23 20:55:40 +00:00

Owner

Originally filed by @coilysiren on 2026-05-03T19:53:28Z - https://github.com/coilysiren/website/issues/1344

🤖 Filed by Claude Code on Kai's behalf.

Working title: Gauntlet, inferring correctness by attack

Hook: Tests check what you expected. Attacks find what you didn't. Gauntlet is a two-role adversarial MCP server (Attacker, Inspector) that runs sustained, targeted attacks against an HTTP service and infers correctness from how it behaves under fire.

Beats:

- The premise: dark-factory environments where bots write code and humans need invariants verified, not behavior described.
- Why two roles: Attacker plans without seeing blocker text; Inspector reads results without seeing trial intent. The information asymmetry is the design.
- Holdout evaluator as the third leg: fresh context, derives acceptance plans from blocker text directly.
- Walkthrough on a real service (probably eco-spec-tracker or backend).
- What this catches that property tests / fuzzers don't.
- Open question: where this lives in CI, given how long a real run is.
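
A rough sketch of the information asymmetry from the beats above, for the explainer draft. Everything here is hypothetical: `Blocker`, `Trial`, `Result`, and the three `*_view` functions are illustrative names, not Gauntlet's actual types or MCP tools. It assumes each role is handed a filtered view of shared state, and that the Inspector's view contains only the request and the observed response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blocker:
    blocker_id: str
    text: str  # the invariant, written out in prose


@dataclass(frozen=True)
class Trial:
    trial_id: str
    blocker_id: str
    intent: str   # what the Attacker hoped to break
    request: str  # e.g. "POST /items"


@dataclass(frozen=True)
class Result:
    trial_id: str
    status_code: int
    body: str


def attacker_view(blockers: list[Blocker]) -> list[dict]:
    """Attacker plans against blocker IDs only; blocker text is withheld."""
    return [{"blocker_id": b.blocker_id} for b in blockers]


def inspector_view(trial: Trial, result: Result) -> dict:
    """Inspector reads what was sent and what came back; intent is withheld."""
    return {
        "trial_id": trial.trial_id,
        "request": trial.request,
        "status_code": result.status_code,
        "body": result.body,
    }


def holdout_view(blockers: list[Blocker]) -> list[dict]:
    """Holdout evaluator starts fresh: blocker text only, no trials or results."""
    return [{"blocker_id": b.blocker_id, "text": b.text} for b in blockers]
```

The point of the sketch is that the asymmetry is enforced by what each view omits, not by asking a role to ignore something it has already seen.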

Why now: Gauntlet is novel enough as a design to deserve a real explainer, not just a README.

Audience: people thinking about LLM-driven QA, agentic test loops, and adversarial verification.

Moved from coilysiren/coilyco-ai#12.

coilysiren commented

2026-05-30 05:43:03 +00:00

Author

Owner

Iceboxed in the 2026-05-29 backlog burn-down: Speculative blog post draft. Reopen anytime if it becomes real.
