Bundle mission-control into Triple-C instead of cloning from GitHub

The mission-control (Flight Control) project is being closed upstream.
This change embeds the project files directly in the repo under container/mission-control/,
bakes them into the Docker image at /opt/mission-control, and copies them into place
at container startup instead of cloning from GitHub at runtime.

Also adds missing osc52-clipboard, audio-shim, and triple-c-sso-refresh to the
programmatic Docker build context in image.rs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 09:09:15 -07:00
parent 57a7cee544
commit 2dffef0767
43 changed files with 7212 additions and 37 deletions


@@ -0,0 +1,126 @@
# Flight Debrief — Project Crew
Crew definitions for post-flight analysis. The Flight Director interviews the
human and project-side agents to capture both execution and design perspectives.
## Crew
### Developer
- **Context**: {project}/
- **Model**: Sonnet
- **Role**: Provides developer perspective on flight execution. Reviews what was
built, identifies technical debt introduced, evaluates implementation quality,
and surfaces issues that flight logs may not capture.
- **Actions**: debrief-interview
### Architect
- **Context**: {project}/
- **Model**: Sonnet
- **Role**: Closes the design feedback loop. Evaluates whether the design decisions
made during flight planning held up in practice. Reviews architectural impact of
what was built and whether the approach should be adjusted for future flights.
- **Actions**: debrief-design-review
## Interaction Protocol
### Developer Interview
1. Flight Director loads flight context (mission, flight, legs, log, actual code)
2. Flight Director spawns **Developer** to review implementation and provide feedback
3. Developer examines code changes, test coverage, patterns used, debt introduced
4. Developer provides structured debrief input
### Architect Interview
1. Flight Director spawns **Architect** to review whether design decisions held up
2. Architect compares flight-design spec against actual implementation
3. Architect evaluates architectural impact and provides feedback for future flights
### Human Interview
1. Flight Director interviews human with targeted questions based on flight log
2. Keep lightweight — 2-3 questions max
### Synthesis
1. Flight Director synthesizes Developer input + Architect input + human input + document analysis
2. Generates debrief artifact
## Template Variables
The Flight Director substitutes these variables in prompts at runtime:
| Variable | Description |
|----------|-------------|
| `{project-slug}` | Project identifier from projects.md |
| `{flight-number}` | Current flight number |
| `{flight-artifact-path}` | Path to the flight artifact file |
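For concreteness, substitution can be a literal replacement pass over the prompt template. A minimal sketch in Rust, assuming plain `{key}` placeholders with no escaping (the `substitute` helper and the example values are hypothetical, not Flight Control specifics):

```rust
use std::collections::HashMap;

/// Replace each `{key}` placeholder with its value. Hypothetical helper:
/// the real Flight Director may substitute differently; this is a sketch.
fn substitute(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = template.to_string();
    for (key, value) in vars {
        out = out.replace(&format!("{{{}}}", key), value);
    }
    out
}

fn main() {
    let vars = HashMap::from([
        ("project-slug", "triple-c"),
        ("flight-number", "7"),
        ("flight-artifact-path", "missions/007/flight.md"),
    ]);
    let prompt = substitute("project: {project-slug}\nflight: {flight-number}", &vars);
    assert_eq!(prompt, "project: triple-c\nflight: 7");
}
```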
## Prompts
### Developer: Debrief Interview
```
role: developer
phase: flight-debrief
project: {project-slug}
flight: {flight-number}
action: debrief-interview
Review the implementation produced during this flight. Examine the code changes,
test coverage, and architectural decisions made.
Provide structured input for the debrief:
**Implementation Quality**:
- Does the code follow project conventions?
- Are there patterns that should be documented?
- What technical debt was introduced?
**Leg Spec Accuracy**:
- Were leg specs clear and sufficient for implementation?
- What was missing or misleading?
- Were acceptance criteria verifiable?
**Testing Assessment**:
- Is test coverage adequate?
- Are there untested edge cases?
- Do tests meaningfully validate behavior?
**Recommendations**:
- What should future flights in this area account for?
- Are there refactoring opportunities?
- What documentation is missing?
```
### Architect: Debrief Design Review
```
role: architect
phase: flight-debrief
project: {project-slug}
flight: {flight-number}
action: debrief-design-review
Read the flight artifact at {flight-artifact-path}. Compare the design decisions,
technical approach, and leg breakdown that were planned against what was actually
implemented.
Provide structured input for the debrief:
**Design Decisions Assessment**:
- Which design decisions held up well in practice?
- Which decisions had to be revised during implementation? Why?
- Were there decisions that should have been made differently?
**Architectural Impact**:
- Did the implementation maintain or improve the system's architecture?
- Were there unplanned structural changes? Are they sound?
- Did the approach create any architectural debt?
**Flight Design Accuracy**:
- Was the technical approach feasible as specified?
- Were prerequisites correctly identified?
- Was the leg breakdown appropriate for the actual work?
**Forward-Looking**:
- What should future flight designs in this area account for?
- Are there architectural patterns that emerged worth standardizing?
- What design assumptions should be revisited?
```


@@ -0,0 +1,72 @@
# Flight Design — Project Crew
Crew definitions for flight specification. The Flight Director designs the
technical spec and uses project-side agents to validate against the real codebase.
## Crew
### Architect
- **Context**: {project}/
- **Model**: Sonnet
- **Role**: Reviews flight specs for technical soundness. Validates design
decisions, prerequisites, technical approach, and leg breakdown against
architecture best practices and actual codebase state. Ensures the flight
is buildable and well-structured.
- **Actions**: review-flight-design
## Interaction Protocol
### Design Review
1. Flight Director creates flight spec and interviews human
2. Flight Director spawns **Architect** to review against codebase
3. Architect evaluates design decisions, prerequisites, approach, leg breakdown
4. Flight Director incorporates feedback
5. Max 2 review cycles — escalate to human if unresolved
## Template Variables
The Flight Director substitutes these variables in prompts at runtime:
| Variable | Description |
|----------|-------------|
| `{project-slug}` | Project identifier from projects.md |
| `{flight-number}` | Current flight number |
| `{flight-artifact-path}` | Path to the flight artifact file |
## Prompts
### Architect: Review Flight Design
```
role: architect
phase: flight-design-review
project: {project-slug}
flight: {flight-number}
action: review-flight-design
Read the flight artifact at {flight-artifact-path}. Cross-reference its design
decisions, prerequisites, technical approach, and leg breakdown against the actual
codebase state and architecture best practices.
Evaluate:
1. Design decisions — are they sound given the real codebase and architecture?
2. Prerequisites — are they accurate? Is anything missing or already done?
3. Technical approach — is it feasible? Does it follow existing patterns?
4. Leg breakdown — are legs well-scoped, properly ordered, with correct dependencies?
5. Codebase state — does the spec account for current working tree, existing tooling,
and conventions that might affect implementation?
6. Architecture — does the approach maintain or improve system structure?
Provide structured output:
**Overall assessment**: approve | approve with changes | needs rework
**Issues** (ranked by severity):
- [high/medium/low] Description — recommended fix
**Suggestions** (non-blocking improvements):
- Description
**Questions** (for the designer to clarify):
- Question
```


@@ -0,0 +1,216 @@
# Leg Execution — Project Crew
Crew definitions and interaction protocol for implementing flight legs.
The Flight Director (Mission Control) orchestrates this phase using the
/agentic-workflow skill.
## Crew
### Developer
- **Context**: {working-directory}/
- **Model**: Sonnet
- **Role**: Implements code changes. Also performs design reviews against real
codebase to validate leg specs before implementation.
- **Actions**: implement, fix-review-issues, commit, review-leg-design
### Reviewer
- **Context**: {working-directory}/
- **Model**: Sonnet (NEVER Opus)
- **Role**: Reviews code changes for quality, correctness, and criteria compliance.
Has NO knowledge of Developer's reasoning — only sees resulting changes.
- **Actions**: review
### Accessibility Reviewer (optional)
- **Context**: {working-directory}/
- **Model**: Sonnet
- **Enabled**: false
- **Role**: Reviews UI changes for accessibility compliance. Evaluates against
WCAG 2.1 AA standards, screen reader compatibility, keyboard navigation,
color contrast, ARIA usage, and semantic HTML. Only spawn when the leg
involves user-facing interface changes.
- **Actions**: review-accessibility
## Separation Rules
- Developer and Reviewer load the target project's CLAUDE.md and conventions
- Reviewer has NO knowledge of Developer's reasoning — only resulting changes
- Each agent instance gets fresh context (no carryover between legs)
**Note:** Handoff signals (`[HANDOFF:review-needed]`, `[HANDOFF:confirmed]`, `[BLOCKED:reason]`, `[COMPLETE:leg]`) are defined by the Flight Control methodology in the agentic-workflow skill, not in this file. Do not modify signal names here — they must match what the Flight Director expects.
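For illustration, detecting these signals can be a scan of the agent's output for the bracketed tokens. A hedged sketch, assuming the tokens appear verbatim in the transcript (the `Signal` type and `parse_signal` helper are hypothetical, not Flight Control APIs):

```rust
/// Handoff signals defined by the Flight Control methodology.
#[derive(Debug, PartialEq)]
enum Signal {
    ReviewNeeded,
    Confirmed,
    Blocked(String),
    LegComplete,
}

/// Scan agent output for a signal token. Sketch only: a real orchestrator
/// might require the signal on its own line or use structured output.
fn parse_signal(output: &str) -> Option<Signal> {
    if let Some(start) = output.rfind("[BLOCKED:") {
        let rest = &output[start + "[BLOCKED:".len()..];
        return Some(Signal::Blocked(rest.split(']').next()?.to_string()));
    }
    if output.contains("[COMPLETE:leg]") {
        return Some(Signal::LegComplete);
    }
    if output.contains("[HANDOFF:confirmed]") {
        return Some(Signal::Confirmed);
    }
    if output.contains("[HANDOFF:review-needed]") {
        return Some(Signal::ReviewNeeded);
    }
    None
}

fn main() {
    assert_eq!(
        parse_signal("Done. [HANDOFF:review-needed]"),
        Some(Signal::ReviewNeeded)
    );
    assert_eq!(
        parse_signal("[BLOCKED:missing migration]"),
        Some(Signal::Blocked("missing migration".into()))
    );
}
```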
## Interaction Protocol
### Design Review
1. Flight Director spawns **Developer** for design review
2. Developer reviews leg against codebase, provides structured assessment
3. Flight Director incorporates feedback
4. Max 2 review cycles — escalate to human if unresolved
### Implementation
1. Flight Director spawns **Developer** to implement
2. Developer implements to acceptance criteria, updates flight log
3. Developer signals [HANDOFF:review-needed] — does NOT commit
### Code Review
1. Flight Director spawns **Reviewer** to evaluate all uncommitted changes
2. If **Accessibility Reviewer** is enabled and leg involves UI changes,
spawn in parallel with Reviewer
3. If issues: Flight Director spawns new **Developer** to fix
4. Loop until all reviewers signal [HANDOFF:confirmed] — max 3 cycles
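A sketch of this bounded review loop, with hypothetical stand-ins for the spawning steps (nothing below is a real Flight Control API):

```rust
// Hypothetical stubs standing in for however the Flight Director actually
// launches Reviewer and Developer instances. Ok(()) means [HANDOFF:confirmed];
// Err carries the reviewer's issue list.
fn spawn_reviewer() -> Result<(), Vec<String>> {
    Ok(())
}
fn spawn_developer_fix(_issues: &[String]) {}

const MAX_REVIEW_CYCLES: usize = 3;

/// Returns true if review confirmed, false if cycles were exhausted
/// and the Flight Director should escalate to the human.
fn review_loop() -> bool {
    for cycle in 1..=MAX_REVIEW_CYCLES {
        match spawn_reviewer() {
            Ok(()) => return true, // all reviewers satisfied; proceed to commit
            Err(issues) => {
                println!("cycle {cycle}: {} issue(s) to fix", issues.len());
                // Each fix pass gets a fresh Developer instance (no carryover).
                spawn_developer_fix(&issues);
            }
        }
    }
    false
}

fn main() {
    assert!(review_loop());
}
```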
### Commit
1. Flight Director spawns **Developer** to commit
2. Developer commits code + artifacts, signals [COMPLETE:leg]
## Template Variables
The Flight Director substitutes these variables in prompts at runtime:
| Variable | Description | Available In |
|----------|-------------|-------------|
| `{project-slug}` | Project identifier from projects.md | All prompts |
| `{flight-number}` | Current flight number | All prompts |
| `{leg-number}` | Current leg number | Leg-scoped prompts |
| `{leg-artifact-path}` | Path to the leg artifact file | review-leg-design |
| `{working-directory}` | Resolved working directory for the agent (project root for branch strategy, worktree path for worktree strategy) | All prompts |
| `{reviewer-issues}` | Full text of reviewer feedback (dynamic) | fix-review-issues |
## Prompts
### Developer: Review Leg Design
```
role: developer
phase: leg-design-review
project: {project-slug}
flight: {flight-number}
leg: {leg-number}
action: review-leg-design
Read the leg artifact at {leg-artifact-path}. Cross-reference its acceptance
criteria, implementation guidance, and file references against the actual codebase.
Evaluate:
1. Acceptance criteria — specific, verifiable, complete?
2. Implementation guidance — complete and correctly ordered?
3. Edge cases — missing scenarios?
4. Codebase state — account for working tree, existing tooling, uncommitted changes?
5. File/line references — accurate against current codebase?
6. Dependencies — prerequisite legs completed? Outputs available?
Provide structured output:
**Overall assessment**: approve | approve with changes | needs rework
**Issues** (ranked by severity):
- [high/medium/low] Description — recommended fix
**Suggestions** (non-blocking):
- Description
**Questions** (for the designer):
- Question
```
### Developer: Implement
```
role: developer
phase: leg-implementation
project: {project-slug}
flight: {flight-number}
leg: {leg-number}
action: implement
Read leg artifact. Update leg status to in-flight. Implement to acceptance criteria.
Run tests with a timeout flag appropriate to this project's test runner — fail fast,
do not wait indefinitely for hanging tests. If a test hangs, isolate and fix it.
Update flight log with outcomes. Propagate changes to artifacts (flight, mission, leg),
CLAUDE.md, README, and other project documentation as needed. Do NOT commit yet —
signal [HANDOFF:review-needed] when implementation is complete.
```
### Reviewer: Review
```
role: reviewer
phase: leg-review
project: {project-slug}
flight: {flight-number}
leg: {leg-number}
action: review
Review all changes since the last commit. Evaluate against:
1. Leg acceptance criteria — are all criteria met?
2. Code quality — style, clarity, maintainability
3. Correctness — edge cases, error handling, security
4. Tests — coverage, meaningful assertions, no regressions
5. Artifacts — flight log updated, leg status correct
Signal [HANDOFF:confirmed] if all changes are satisfactory.
If issues found, list them with severity (blocking/non-blocking) and specific
file:line references.
```
### Accessibility Reviewer: Review Accessibility
```
role: accessibility-reviewer
phase: leg-review
project: {project-slug}
flight: {flight-number}
leg: {leg-number}
action: review-accessibility
Review all UI changes since the last commit for accessibility compliance.
Evaluate against:
1. WCAG 2.1 AA — do changes meet Level AA success criteria?
2. Semantic HTML — proper heading hierarchy, landmark regions, form labels?
3. Keyboard navigation — all interactive elements reachable and operable?
4. Screen readers — ARIA attributes correct and meaningful? Live regions?
5. Color and contrast — minimum 4.5:1 for text, 3:1 for large text/UI?
6. Focus management — visible focus indicators, logical tab order?
Signal [HANDOFF:confirmed] if all changes are accessible.
If issues found, list them with severity (blocking/non-blocking), WCAG criterion
reference, and specific file:line references.
```
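The contrast thresholds in step 5 follow WCAG 2.1's relative-luminance definition, which is easy to compute directly. A self-contained sketch of the math:

```rust
/// Linearize an 8-bit sRGB channel per the WCAG 2.1 definition.
fn linearize(channel: u8) -> f64 {
    let c = channel as f64 / 255.0;
    if c <= 0.03928 {
        c / 12.92
    } else {
        ((c + 0.055) / 1.055).powf(2.4)
    }
}

/// WCAG relative luminance of an sRGB color.
fn luminance((r, g, b): (u8, u8, u8)) -> f64 {
    0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)
}

/// Contrast ratio: (L_lighter + 0.05) / (L_darker + 0.05), from 1.0 to 21.0.
fn contrast_ratio(a: (u8, u8, u8), b: (u8, u8, u8)) -> f64 {
    let (la, lb) = (luminance(a), luminance(b));
    let (hi, lo) = if la > lb { (la, lb) } else { (lb, la) };
    (hi + 0.05) / (lo + 0.05)
}

fn main() {
    // Black on white is exactly 21:1 and passes AA at any text size.
    let ratio = contrast_ratio((0, 0, 0), (255, 255, 255));
    assert!((ratio - 21.0).abs() < 1e-9);
    // AA thresholds: 4.5:1 for normal text, 3:1 for large text and UI parts.
    assert!(ratio >= 4.5);
}
```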
### Developer: Fix Review Issues
```
role: developer
phase: leg-implementation
project: {project-slug}
flight: {flight-number}
leg: {leg-number}
action: fix-review-issues
Address the following review feedback:
{reviewer-issues}
Fix all blocking issues. Non-blocking issues: fix if straightforward, otherwise
note as accepted. Signal [HANDOFF:review-needed] when fixes are complete.
```
### Developer: Commit
```
role: developer
phase: leg-implementation
project: {project-slug}
flight: {flight-number}
leg: {leg-number}
action: commit
Review has passed. Before committing, complete ALL post-completion checklist items
in the leg artifact:
1. Check off all acceptance criteria in the leg artifact
2. Update leg status to completed
3. Check off this leg in flight.md
4. If final leg: update flight.md status to landed, check off flight in mission.md
Then commit all changes (code + artifacts) with appropriate message.
Signal [COMPLETE:leg].
```


@@ -0,0 +1,74 @@
# Mission Debrief — Project Crew
Crew definitions for post-mission retrospective. The Flight Director interviews
both the human and a project-side Architect to capture strategic technical perspective.
## Crew
### Architect
- **Context**: {project}/
- **Model**: Sonnet
- **Role**: Provides architectural perspective on mission outcomes. Evaluates
whether the system evolved well across flights, identifies structural issues,
and assesses long-term maintainability of what was built.
- **Actions**: debrief-interview
## Interaction Protocol
### Architect Interview
1. Flight Director loads full mission context (all flights, logs, debriefs, code)
2. Flight Director spawns **Architect** to review overall system evolution
3. Architect examines architectural changes across all flights
4. Architect provides structured debrief input
### Human Interview
1. Flight Director interviews human with mission-level questions
2. Covers coordination experience, outcome satisfaction, process feedback
### Synthesis
1. Flight Director synthesizes Architect input + human input + document analysis
2. Generates mission debrief artifact
## Template Variables
The Flight Director substitutes these variables in prompts at runtime:
| Variable | Description |
|----------|-------------|
| `{project-slug}` | Project identifier from projects.md |
## Prompts
### Architect: Debrief Interview
```
role: architect
phase: mission-debrief
project: {project-slug}
action: debrief-interview
Review the system changes produced across all flights in this mission. Examine
the architectural evolution, pattern consistency, and structural health.
Provide structured input for the debrief:
**Architectural Assessment**:
- Did the system's architecture improve, maintain, or degrade?
- Are there structural issues that emerged across flights?
- Were design decisions consistent across the mission?
**Pattern Analysis**:
- What patterns were established? Are they good ones?
- Is there inconsistency that should be reconciled?
- Are there reusable patterns worth documenting?
**Technical Debt**:
- What debt was introduced across the mission?
- What's the priority for addressing it?
- Are there quick wins vs. long-term concerns?
**Forward-Looking**:
- What architectural considerations should the next mission account for?
- Are there scaling or performance concerns on the horizon?
- What documentation or conventions should be established?
```


@@ -0,0 +1,72 @@
# Mission Design — Project Crew
Crew definitions for mission planning. The Flight Director interviews the human
and uses project-side agents to validate technical viability.
## Crew
### Architect
- **Context**: {project}/
- **Model**: Sonnet
- **Role**: Validates technical viability of proposed outcomes. Ensures business
goals align with what's actually possible given the codebase, stack, and
constraints. Does NOT add implementation details — focuses on feasibility,
risks, and architectural implications.
- **Actions**: validate-mission
## Interaction Protocol
### Research & Interview
1. Flight Director researches codebase and external context
2. Flight Director interviews human about outcomes, stakeholders, constraints, criteria
3. Human must explicitly sign off before proceeding — iterate until approved
### Technical Viability Check
1. Flight Director spawns **Architect** to review draft mission against codebase
2. Architect evaluates: Are proposed outcomes achievable? Are there technical risks
the mission doesn't account for? Does the stack support what's being asked?
3. Architect provides assessment — feasible / feasible with caveats / not feasible
4. Flight Director incorporates feedback, re-interviews human if scope changes
5. Human gives final sign-off
## Template Variables
The Flight Director substitutes these variables in prompts at runtime:
| Variable | Description |
|----------|-------------|
| `{project-slug}` | Project identifier from projects.md |
## Prompts
### Architect: Validate Mission
```
role: architect
phase: mission-design
project: {project-slug}
action: validate-mission
Read the draft mission artifact. Cross-reference proposed outcomes and success
criteria against the actual codebase, stack, and project constraints.
Evaluate:
1. Technical feasibility — can the proposed outcomes be achieved with this stack?
2. Architectural implications — does this require significant structural changes?
3. Risk factors — what technical risks could block success?
4. Constraints accuracy — are stated constraints complete and correct?
5. Sizing — is the scope realistic for a mission (days-to-weeks)?
Provide structured output:
**Feasibility**: feasible | feasible with caveats | not feasible
**Risks** (ranked by impact):
- [high/medium/low] Description — mitigation
**Caveats** (if feasible with caveats):
- Description
**Questions** (for the Flight Director):
- Question
```


@@ -0,0 +1,592 @@
# Routine Maintenance — Project Crew
Crew definitions for codebase health inspection. The Flight Director
coordinates specialist reviewers for automated checks and an Architect
for severity assessment and roundtable moderation.
## Crew
### Inspector
- **Context**: {project}/
- **Model**: Sonnet
- **Role**: Performs broad read-only codebase inspection across all applicable
categories. Runs test suites, linters, type checkers, audit commands, and
manual code review. Returns structured findings without modifying any files.
- **Actions**: inspect-codebase
### Security Reviewer
- **Context**: {project}/
- **Model**: Sonnet
- **Role**: Performs focused manual security review of authentication flows,
injection surfaces, secrets handling, CORS/CSP configuration, and data
exposure risks. Goes deeper than the Inspector's Category 1 automated checks
with targeted code path analysis.
- **Actions**: review-security
### CI/CD Reviewer (optional)
- **Context**: {project}/
- **Model**: Sonnet
- **Enabled**: false (enable when project has CI/CD pipelines)
- **Role**: Reviews CI/CD pipeline configuration, build security, deployment
practices, and environment consistency. Evaluates pipeline definitions,
secret management in CI, and deployment safeguards.
- **Actions**: review-cicd
### Accessibility Reviewer (optional)
- **Context**: {project}/
- **Model**: Sonnet
- **Enabled**: false (enable when project has user-facing UI)
- **Role**: Reviews codebase for accessibility compliance against WCAG 2.1 AA
standards. Evaluates semantic HTML, keyboard navigation, screen reader
compatibility, color contrast, ARIA usage, and focus management.
- **Actions**: review-accessibility
### Architect
- **Context**: {project}/
- **Model**: Opus
- **Role**: Reviews all reviewer findings alongside debrief context. Assigns
severity per finding, challenges questionable assessments, moderates
roundtable discussion with specialist reviewers, and produces final codebase
assessment with maintenance scope recommendation.
- **Actions**: assess-findings, moderate-roundtable
## Separation Rules
- All reviewers are strictly **read-only** — they may run commands but must NEVER modify files
- Each reviewer operates independently during Phase 4 — no cross-reviewer communication
- The Architect sees all reviewer findings but not their internal reasoning
- Roundtable discussion is mediated by the Flight Director, not direct agent-to-agent
**Note:** Handoff signals are not used in this crew. The routine-maintenance workflow is
sequential (review → assess → roundtable → report) and does not use the leg-based
handoff protocol.
## Interaction Protocol
### Delegation Planning
1. Flight Director loads context, conducts scoping interview with human
2. Flight Director assesses project size and identifies module boundaries
3. Flight Director builds delegation plan (agent count, scope assignments, partitioning)
4. Human approves or adjusts the plan
### Specialist Review
1. Flight Director spawns agents per the delegation plan — Inspector(s) + Security Reviewer always, CI/CD and Accessibility if enabled
2. Each agent receives its scope assignment and output discipline rules
3. All reviewers perform read-only checks and return structured findings
4. For partitioned Inspectors: Flight Director merges and de-duplicates findings
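Step 4's merge can be as simple as keying each finding and keeping the first occurrence. A sketch, assuming a hypothetical `Finding` shape (the real format is whatever the Inspector's structured output yields):

```rust
use std::collections::HashSet;

// Hypothetical shape for a parsed finding; illustrative only.
#[derive(Clone)]
struct Finding {
    category: u8,
    file: String,
    title: String,
}

/// Merge findings from partitioned Inspectors, dropping duplicates that
/// surface in overlapping scopes (e.g., full-project automated tool runs).
fn merge(partitions: Vec<Vec<Finding>>) -> Vec<Finding> {
    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    for finding in partitions.into_iter().flatten() {
        let key = (finding.category, finding.file.clone(), finding.title.clone());
        if seen.insert(key) {
            merged.push(finding);
        }
    }
    merged
}

fn main() {
    let dup = Finding { category: 4, file: "src/lib.rs".into(), title: "dead code".into() };
    let merged = merge(vec![vec![dup.clone()], vec![dup]]);
    assert_eq!(merged.len(), 1);
}
```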
### Initial Assessment
1. Flight Director spawns **Architect** (Opus) with all reviewer findings + debrief context
2. Architect assigns initial severity per finding
3. Architect raises challenges or questions directed at specific reviewers
### Roundtable
1. Flight Director routes Architect's challenges to the relevant reviewers
2. Each challenged reviewer responds with evidence, rebuttals, or concurrence
3. Flight Director collects responses and spawns Architect for final resolution
4. Architect produces final assessment incorporating roundtable discussion
5. Max 2 roundtable cycles — unresolved disagreements go to the human
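The routing in step 1 amounts to grouping the Architect's challenges by target reviewer, so each reviewer receives a single rebuttal prompt. A sketch with hypothetical types:

```rust
use std::collections::HashMap;

// Hypothetical shape for a challenge parsed from the Architect's
// "Challenges for Roundtable" section.
struct Challenge {
    reviewer: String,
    question: String,
}

/// Group challenges by reviewer; each group becomes one rebuttal prompt.
fn route(challenges: Vec<Challenge>) -> HashMap<String, Vec<String>> {
    let mut routed: HashMap<String, Vec<String>> = HashMap::new();
    for c in challenges {
        routed.entry(c.reviewer).or_default().push(c.question);
    }
    routed
}

fn main() {
    let routed = route(vec![
        Challenge {
            reviewer: "Security Reviewer".into(),
            question: "Is the CORS finding actually exploitable?".into(),
        },
        Challenge {
            reviewer: "Security Reviewer".into(),
            question: "Severity seems high for a dev-only route.".into(),
        },
    ]);
    assert_eq!(routed["Security Reviewer"].len(), 2);
}
```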
### Human Review and Scoping
1. Flight Director presents findings to human, grouped by severity
2. Human confirms, overrides, or adjusts findings
3. If Maintenance Required: Flight Director recommends a shortlist (~5-7 items); human selects scope for maintenance mission
4. Deferred findings remain in the report for future cycles
### Synthesis
1. Flight Director generates maintenance report artifact
2. If confirmed: Flight Director creates maintenance mission scaffold
## Template Variables
The Flight Director substitutes these variables in prompts at runtime:
| Variable | Description |
|----------|-------------|
| `{project-slug}` | Project identifier from projects.md |
| `{applicable-categories}` | Numbered list of categories to inspect (1-7 always, 8-10 conditional) |
| `{project-stack}` | Language, framework, test runner, linter, formatter, type checker, audit tool |
| `{known-debt}` | Debt items from mission debrief and flight debriefs (if available, otherwise "None — ad-hoc inspection") |
| `{known-security-debt}` | Security-specific debt items extracted from debriefs (if available, otherwise "None") |
| `{known-cicd-debt}` | CI/CD-specific debt items extracted from debriefs (if available, otherwise "None") |
| `{areas-of-concern}` | User-specified areas of concern from scoping interview |
| `{scope-assignment}` | Scope restriction from the delegation plan (files, directories, or "full project") |
| `{all-reviewer-findings}` | Combined structured findings from all reviewers (used in Architect prompts) |
| `{architect-challenges}` | Architect's challenges directed at a specific reviewer (used in roundtable) |
| `{roundtable-responses}` | All reviewer rebuttals and responses from the roundtable (used in resolution) |
## Prompts
### Inspector: Inspect Codebase
```
role: inspector
phase: routine-maintenance
project: {project-slug}
action: inspect-codebase
Perform a read-only codebase inspection across the following categories:
{applicable-categories}
Project stack: {project-stack}
Known debt from prior debriefs, if available (do not re-flag as new discoveries):
{known-debt}
User areas of concern:
{areas-of-concern}
IMPORTANT: You are strictly READ-ONLY. You may run test suites, linters, type
checkers, audit commands, and read any file. You must NEVER modify source files,
configuration, dependencies, or any other project file.
**Scope assignment**: If a scope restriction is provided, inspect only the
specified files and directories. Run automated tools against the full project
(tools are fast and comprehensive), but limit manual code review to the assigned
scope. If no scope restriction is given, inspect the full project.
For each applicable category, perform the checks listed below and report findings.
**Category 1 — Security**:
- Review auth paths (focus on recently changed code if mission context is available)
- Check input sanitization on endpoints
- Verify CORS/CSP configuration
- Scan for hardcoded secrets (API keys, tokens, passwords)
- Review third-party data flow for exposure risks
**Category 2 — Test Systems**:
- Run the test suite and report results
- Check coverage delta (if tooling available)
- Find new code paths without test coverage
- Detect flaky tests (tests that pass/fail inconsistently)
- Check test performance (slow tests)
- Find hardcoded test data that should be fixtures
**Category 3 — Dependency Health**:
- Run the dependency audit command (npm audit, cargo audit, etc.)
- Check for outdated dependencies
- Find unused dependencies
- Verify lockfile is consistent
- Check license compliance
- Check for Dependabot/Renovate PRs and security alerts
- Assess auto-merge eligibility for patch updates
**Category 4 — Code Quality**:
- Run linter and formatter check (report violations, do NOT fix)
- Find dead code (unused exports, unreachable branches)
- Grep for TODOs/FIXMEs/HACKs (focus on recently introduced ones if mission context is available)
- Detect code duplication
- Check pattern consistency with existing codebase
**Category 5 — Type & API Safety**:
- Run the type checker and report errors
- Find `any` casts (TypeScript), `unsafe` blocks (Rust), or equivalent
- Check for unhandled errors or missing error types
- Detect API contract drift (mismatched types between client/server)
- Find deprecated API usage
**Category 6 — Documentation**:
- Check README accuracy against current state
- Verify new public interfaces have documentation
- Find stale comments referencing old behavior
- Check CHANGELOG for completeness
- Verify CLAUDE.md accuracy
**Category 7 — Git & Branch Hygiene**:
- List stale branches (merged but not deleted)
- Find large committed files (>1MB)
- Scan for secrets in recent git history
- Check commit message quality
- Check for GitHub/remote warnings (secret scanning, code scanning alerts)
- Find merge conflicts against main
- Check upstream divergence
**Category 8 — CI/CD Pipeline** (if applicable):
- Check CI status on main/default branch
- Detect build time regression
- Find skipped or disabled checks
- Check config drift between environments
**Category 9 — Infrastructure & Config** (if applicable):
- Check env var documentation (.env.example vs actual usage)
- Find pending database migrations
- Find temporary feature flags that should be removed
**Category 10 — Performance & Observability** (if applicable):
- Find new operations without logging/tracing
- Detect potential N+1 queries
- Check bundle size (if web project)
- Find resource cleanup issues (unclosed connections, missing cleanup)
**Output discipline**: Keep findings concise. Do not paste full command output,
full file contents, or long dependency lists. Summarize and reference.
**Output format**: Return findings as a structured list per category:
## Category {N}: {Name}
### Finding: {title}
- **Evidence**: {one-line summary with file paths and line numbers}
- **Impact**: {what could go wrong}
- **Recommendation**: {what to do about it}
Include code excerpts only for Critical or High severity findings.
If a category has no issues, report:
## Category {N}: {Name}
No issues found.
```
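As one example of what the hardcoded-secret check in Category 1 might look like, a crude keyword heuristic (illustrative only; a dedicated scanner such as gitleaks or trufflehog is more thorough, and the file path below is hypothetical):

```rust
use std::fs;

/// Flag lines that mention a secret-ish keyword and assign a string literal.
/// Heuristic sketch: expect false positives and misses.
fn suspicious(line: &str) -> bool {
    let lowered = line.to_lowercase();
    let keyword = ["api_key", "apikey", "secret", "password", "token"]
        .iter()
        .any(|k| lowered.contains(k));
    keyword && (line.contains("= \"") || line.contains(": \"") || line.contains("='"))
}

fn main() {
    // Hypothetical target; a real inspection would walk the source tree.
    let path = "src/config.rs";
    let source = fs::read_to_string(path).unwrap_or_default();
    for (n, line) in source.lines().enumerate() {
        if suspicious(line) {
            println!("{path}:{}: possible hardcoded secret", n + 1);
        }
    }
}
```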
### Security Reviewer: Review Security
```
role: security-reviewer
phase: routine-maintenance
project: {project-slug}
action: review-security
Perform a focused, manual security review of the codebase. You go deeper than
automated scanning — trace actual code paths and evaluate security posture.
Project stack: {project-stack}
Known security debt from prior debriefs (do not re-flag as new discoveries):
{known-security-debt}
User areas of concern:
{areas-of-concern}
IMPORTANT: You are strictly READ-ONLY. You may run commands and read any file.
You must NEVER modify source files, configuration, dependencies, or any other
project file.
**Scope assignment**: Review only the files and areas specified. If no scope
restriction is given, review the full project.
**Output discipline**: Keep findings concise. Include code excerpts only for
Critical or High severity findings. Do not paste full file contents or raw
command output.
**Review areas**:
1. **Authentication & Authorization**
- Trace auth flows end-to-end (login, token refresh, logout)
- Check for missing auth checks on protected routes/endpoints
- Verify role-based access control is enforced consistently
- Look for privilege escalation paths
2. **Injection Surfaces**
- SQL/NoSQL injection: check all database queries for parameterization
- Command injection: check shell executions, subprocess calls
- XSS: check output encoding in templates and API responses
- Path traversal: check file system operations with user input
3. **Secrets & Configuration**
- Scan for hardcoded credentials, API keys, tokens in source
- Check .env files are gitignored
- Verify secrets are not logged or included in error responses
- Check for overly permissive CORS configuration
4. **Data Handling**
- Review PII/sensitive data flows — where is it stored, logged, transmitted?
- Check encryption at rest and in transit
- Verify sensitive data is not cached inappropriately
- Check for data leakage in error messages or debug output
5. **Dependency Risk**
- Cross-reference critical dependencies against known CVE databases
- Check for dependencies with known supply-chain risks
- Verify integrity checks (lockfile hashes, checksums)
**Output format**: Return findings as a structured list:
### Finding: {title}
- **Severity estimate**: critical | high | medium | low
- **Attack vector**: {how this could be exploited}
- **Evidence**: {specific code paths, file:line references}
- **Recommendation**: {what to do about it}
If no security issues found, state: "No security issues identified."
```
### CI/CD Reviewer: Review CI/CD
```
role: cicd-reviewer
phase: routine-maintenance
project: {project-slug}
action: review-cicd
Perform a focused review of the project's CI/CD pipeline configuration,
build security, and deployment practices.
Project stack: {project-stack}
Known CI/CD debt from prior debriefs (do not re-flag as new discoveries):
{known-cicd-debt}
User areas of concern:
{areas-of-concern}
IMPORTANT: You are strictly READ-ONLY. You may run commands and read any file.
You must NEVER modify source files, configuration, dependencies, or any other
project file.
**Output discipline**: Keep findings concise. Include code excerpts only for
Critical or High severity findings. Do not paste full file contents or raw
command output.
**Review areas**:
1. **Pipeline Configuration**
- Review pipeline definitions (GitHub Actions, GitLab CI, Concourse, etc.)
- Check for outdated action/image versions
- Verify branch protection rules are consistent with pipeline triggers
- Detect redundant or overlapping pipeline steps
2. **Build Security**
- Check for secrets exposed in build logs or artifacts
- Verify pipeline secrets are scoped appropriately (not org-wide when repo-level suffices)
- Check for unpinned dependencies in build steps (e.g., `uses: action@main` vs `@v4.1.0`)
- Review build artifact permissions and retention policies
3. **Deployment Safeguards**
- Verify deployment gates exist (approval steps, environment protection rules)
- Check rollback capability — is there a documented or automated rollback path?
- Verify environment promotion flow (dev → staging → prod) is enforced
- Check for drift between environment configurations
4. **Pipeline Health**
- Check recent build success rates and durations
- Identify flaky pipeline steps
- Find disabled or skipped checks that should be active
- Check for resource waste (oversized runners, unnecessary matrix builds)
**Output format**: Return findings as a structured list:
### Finding: {title}
- **Severity estimate**: critical | high | medium | low
- **Evidence**: {specific config files, pipeline definitions, line references}
- **Impact**: {what could go wrong}
- **Recommendation**: {what to do about it}
If no CI/CD issues found, state: "No CI/CD issues identified."
```
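The unpinned-reference check under Build Security can be approximated with a small heuristic: a 40-character hex ref is a commit SHA pin, a `v`-prefixed numeric ref is a tag, and anything else (such as `main`) is floating. A sketch, with an inline workflow snippet standing in for a real file:

```rust
/// Classify a `uses:` ref. SHA pins are strictest, version tags acceptable,
/// branch names floating. Heuristic sketch only.
fn is_pinned(reference: &str) -> bool {
    let sha = reference.len() == 40 && reference.chars().all(|c| c.is_ascii_hexdigit());
    let tag = reference.len() > 1
        && reference.starts_with('v')
        && reference[1..].chars().all(|c| c.is_ascii_digit() || c == '.');
    sha || tag
}

fn main() {
    // Illustrative lines as they might appear in a workflow definition.
    let workflow = "- uses: actions/checkout@v4\n- uses: some/action@main\n";
    for line in workflow.lines() {
        if let Some(rest) = line.trim().strip_prefix("- uses: ") {
            if let Some((action, reference)) = rest.split_once('@') {
                if !is_pinned(reference) {
                    println!("floating ref: {action}@{reference}");
                }
            }
        }
    }
}
```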
### Accessibility Reviewer: Review Accessibility
```
role: accessibility-reviewer
phase: routine-maintenance
project: {project-slug}
action: review-accessibility
Perform a focused accessibility review of the project's user-facing UI.
Evaluate against WCAG 2.1 AA standards.
Project stack: {project-stack}
IMPORTANT: You are strictly READ-ONLY. You may run commands and read any file.
You must NEVER modify source files, configuration, dependencies, or any other
project file.
**Output discipline**: Keep findings concise. Include code excerpts only for
Critical or High severity findings. Do not paste full file contents or raw
command output.
**Review areas**:
1. **Semantic HTML & Structure**
- Check heading hierarchy (h1-h6 in logical order)
- Verify landmark regions (main, nav, aside, footer)
- Check form labels and fieldset/legend usage
- Verify list markup for list-like content
2. **Keyboard Navigation**
- Check all interactive elements are reachable via Tab
- Verify custom widgets have appropriate keyboard handlers
- Check for keyboard traps (modals, dropdowns)
- Verify skip-to-content links exist
3. **Screen Reader Compatibility**
- Check ARIA attributes for correctness and necessity
- Verify dynamic content updates use live regions
- Check image alt text (present, meaningful, not redundant)
- Verify form error messages are associated with inputs
4. **Visual & Color**
- Check text contrast ratios (4.5:1 normal, 3:1 large text)
- Verify UI component contrast (3:1 against background)
- Check that color is not the sole indicator of meaning
- Verify visible focus indicators on all interactive elements
5. **Motion & Timing**
- Check for prefers-reduced-motion support on animations
- Verify no auto-playing media without controls
- Check for appropriate timeouts with user notification
**Output format**: Return findings as a structured list:
### Finding: {title}
- **WCAG criterion**: {e.g., 1.1.1 Non-text Content, Level A}
- **Severity estimate**: critical | high | medium | low
- **Evidence**: {specific components, file:line references}
- **Recommendation**: {what to do about it}
If no accessibility issues found, state: "No accessibility issues identified."
```
### Architect: Assess Findings
```
role: architect
phase: routine-maintenance
project: {project-slug}
action: assess-findings
Review all specialist findings and assign severity ratings. You have access to:
- All reviewer findings (provided below)
- Known debt context from debriefs and prior maintenance reports (if available)
{all-reviewer-findings}
Known debt from debriefs, if available (already acknowledged — note as "previously identified" if re-found):
{known-debt}
For each finding, assign one of:
- **Pass** — No issue (reviewer flagged something that is actually fine)
- **Advisory** — Minor issue, acceptable to defer
- **Action Required** — Should be addressed before next major work cycle
- **Critical** — Blocks further work, immediate attention needed
**Assessment criteria**:
- Does this finding represent a real risk, or is it noise?
- Is the severity proportional to the actual impact?
- Would this compound if left for another cycle?
- Is this a new discovery or previously acknowledged debt?
- Do multiple reviewers corroborate the same issue?
- Are any reviewer assessments questionable — too alarmist or too dismissive?
**Challenge reviewers** where you disagree or need clarification. For each
challenge, name the reviewer and provide your specific question or objection.
This initiates the roundtable discussion.
**Output format**:
## Overall Assessment
{Flight Ready | Maintenance Required}
## Findings
| # | Source | Category | Finding | Initial Severity | New/Known | Notes |
|---|--------|----------|---------|-----------------|-----------|-------|
| 1 | {reviewer} | {cat} | {title} | {severity} | {new/known} | {brief note} |
## Challenges for Roundtable
### To {Reviewer Name}: {question or objection}
{Context for why you're challenging this finding — what seems off, what
additional evidence would change your assessment, or why you think the
severity should be different.}
## Severity Summary
- Critical: {N}
- Action Required: {N}
- Advisory: {N}
- Pass: {N}
## Recommended Maintenance Scope
(Only if Maintenance Required)
Group related Action Required and Critical findings into suggested flight scopes:
### Flight: {suggested title}
- Finding #{N}: {title}
- Finding #{N}: {title}
- Rationale: {why these group together}
```
### Reviewer: Roundtable Rebuttal
```
role: {reviewer-role}
phase: routine-maintenance
project: {project-slug}
action: roundtable-rebuttal
The Architect has challenged one or more of your findings during the
severity assessment roundtable. Respond to each challenge with evidence.
Architect's challenges:
{architect-challenges}
For each challenge:
1. **Provide additional evidence** — code paths, specific examples, tool output
that supports your finding
2. **Concede if appropriate** — if the Architect raises a valid point, adjust
your assessment rather than defending a weak position
3. **Clarify misunderstandings** — if the Architect misread your finding,
restate it with more precision
Be direct and evidence-based. The goal is consensus, not debate for its own sake.
**Output format**:
### Re: {Architect's challenge title}
- **Response**: {concur | rebut | clarify}
- **Evidence**: {additional code paths, line references, tool output}
- **Revised assessment** (if changed): {updated severity or recommendation}
```
### Architect: Roundtable Resolution
```
role: architect
phase: routine-maintenance
project: {project-slug}
action: roundtable-resolution
Review the roundtable responses from specialist reviewers and produce your
final assessment.
Reviewer responses:
{roundtable-responses}
For each challenged finding:
1. **Weigh the evidence** — did the reviewer provide convincing support?
2. **Assign final severity** — this is your call, but account for reviewer expertise
3. **Note reasoning** — briefly explain why you maintained or changed severity
If any disagreements remain unresolved, flag them for human review rather than
forcing consensus.
**Output format**:
## Roundtable Resolution
### Finding #{N}: {title}
- **Original severity**: {severity}
- **Reviewer response**: {concur | rebut | clarify} — {summary}
- **Final severity**: {severity}
- **Reasoning**: {why}
## Updated Overall Assessment
{Flight Ready | Maintenance Required}
## Updated Severity Summary
- Critical: {N}
- Action Required: {N}
- Advisory: {N}
- Pass: {N}
## Unresolved Disagreements (if any)
{Finding and both perspectives — for human to decide}
## Updated Recommended Maintenance Scope
(Only if Maintenance Required — incorporate roundtable outcomes)
### Flight: {suggested title}
- Finding #{N}: {title}
- Finding #{N}: {title}
- Rationale: {why these group together}
```