Blog post: Gauntlet, inferring correctness by attack #16

Closed
opened 2026-05-23 20:55:40 +00:00 by coilysiren · 1 comment
Owner

Originally filed by @coilysiren on 2026-05-03T19:53:28Z - https://github.com/coilysiren/website/issues/1344

🤖 Filed by Claude Code on Kai's behalf.

Working title: Gauntlet, inferring correctness by attack

Hook: Tests check what you expected. Attacks find what you didn't. Gauntlet is a two-role adversarial MCP server (Attacker, Inspector) that runs sustained, targeted attacks against an HTTP service and infers correctness from how it behaves under fire.

Beats:

  • The premise: dark-factory environments where bots write code and humans need invariants verified, not behavior described.
  • Why two roles: Attacker plans without seeing blocker text; Inspector reads results without seeing trial intent. The information asymmetry is the design.
  • Holdout evaluator as the third leg: fresh context, derives acceptance plans from blocker text directly.
  • Walkthrough on a real service (probably eco-spec-tracker or backend).
  • What this catches that property tests / fuzzers don't.
  • Open question: where this lives in CI, given how long a real run takes.
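
The role split in the beats above can be sketched as field-level views over one shared record. This is only an illustration of the information asymmetry, not Gauntlet's actual API: `TrialRecord`, `ROLE_FIELDS`, `view_for`, and the exact fields each role may read are all assumptions made up for the example.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrialRecord:
    """Hypothetical record of one attack trial (field names are assumptions)."""
    endpoint: str         # the HTTP route under attack
    blocker_text: str     # the requirement the run is meant to verify
    trial_intent: str     # what the Attacker was trying to break
    observed_result: str  # what the service actually did


# Assumed visibility rules, following the beats:
# the Attacker never sees blocker text, the Inspector never sees trial intent,
# and the holdout evaluator works from blocker text alone.
ROLE_FIELDS = {
    "attacker": {"endpoint", "trial_intent"},
    "inspector": {"endpoint", "observed_result"},
    "holdout": {"endpoint", "blocker_text"},
}


def view_for(role: str, record: TrialRecord) -> dict:
    """Return only the fields the given role is allowed to read."""
    allowed = ROLE_FIELDS[role]
    return {key: value for key, value in asdict(record).items() if key in allowed}


record = TrialRecord(
    endpoint="/items",
    blocker_text="Deleted items must never be returned.",
    trial_intent="Race a delete against a list call.",
    observed_result="200 OK, deleted item present in response.",
)
inspector_view = view_for("inspector", record)
```

Here `inspector_view` carries the endpoint and the observed result but not the intent behind the trial, so the Inspector has to judge the behavior on its own terms.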

Why now: Gauntlet is novel enough as a design to deserve a real explainer, not just a README.

Audience: people thinking about LLM-driven QA, agentic test loops, and adversarial verification.

Moved from coilysiren/coilyco-ai#12.


Iceboxed in the 2026-05-29 backlog burn-down: Speculative blog post draft. Reopen anytime if it becomes real.

coilysiren 2026-05-30 05:43:03 +00:00
  • closed this issue
  • added the icebox label
Reference
coilysiren/website#16