Peer Review

Usage

/review <artifact>

What it does

Simulates a tough-but-fair peer review of AI research artifacts, evaluating novelty, empirical rigor, baselines, ablations, and reproducibility.

The reviewer agent identifies:

  • Weak baselines
  • Missing ablations
  • Evaluation mismatches
  • Benchmark leakage
  • Under-specified implementation details

Severity levels

Feedback is graded by severity:

  • FATAL — Fundamental issues that invalidate the claims
  • MAJOR — Significant problems that need addressing
  • MINOR — Small improvements or clarifications
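The severity grades above form an ordered scale, with FATAL dominating any verdict. A minimal sketch of how such graded feedback might be represented (the `Severity` and `Finding` names and the sample findings are illustrative assumptions, not part of the command):

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """Severity grades from the list above, ordered worst to mildest."""
    FATAL = "fundamental issues that invalidate the claims"
    MAJOR = "significant problems that need addressing"
    MINOR = "small improvements or clarifications"


@dataclass
class Finding:
    """One piece of severity-graded reviewer feedback (hypothetical shape)."""
    severity: Severity
    summary: str


# Sample findings for illustration only.
findings = [
    Finding(Severity.MAJOR, "No comparison against a tuned baseline"),
    Finding(Severity.MINOR, "Clarify tokenizer version used for eval"),
]

# The worst finding (earliest in the enum order) should drive the verdict.
worst = min(findings, key=lambda f: list(Severity).index(f.severity))
print(worst.severity.name)  # MAJOR for the sample findings above
```

Ordering the enum members worst-first lets the verdict logic stay a one-line `min` over the enum's declaration order.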

Example

/review outputs/scaling-laws-brief.md

Output

A structured review containing:

  • Summary of the work
  • Strengths
  • Weaknesses (severity-graded)
  • Questions for the authors
  • Verdict (accept / revise / reject)
  • Revision plan
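The output sections above can be sketched as a single record. This is an assumed shape for illustration (the `Review` class, field names, and sample values are hypothetical; only the six sections and three verdict values come from the source):

```python
from dataclasses import dataclass
from typing import Literal

# The three verdicts named in the output description.
Verdict = Literal["accept", "revise", "reject"]


@dataclass
class Review:
    """Hypothetical container mirroring the six output sections."""
    summary: str
    strengths: list[str]
    weaknesses: list[str]   # severity-graded, e.g. "[MAJOR] ..."
    questions: list[str]
    verdict: Verdict
    revision_plan: list[str]


# Illustrative instance; the content here is invented for the example.
review = Review(
    summary="Brief fitting loss curves across model sizes",
    strengths=["Clear experimental setup"],
    weaknesses=["[MAJOR] Compute budget for baselines not matched"],
    questions=["Were learning rates re-tuned at each scale?"],
    verdict="revise",
    revision_plan=["Match baseline compute and rerun the fit"],
)
print(review.verdict)  # revise
```

Keeping weaknesses as severity-prefixed strings keeps the record flat; a stricter version could reuse a dedicated severity type instead.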