Autoresearch — Feynman Docs

The autoresearch workflow runs a bounded research experiment loop that iteratively proposes changes, runs a benchmark, records evidence, and decides whether to keep or revert each change. It is designed for model, retrieval, prompt, architecture, or dataset experiments where the feedback signal is explicit.

Usage

From the REPL:

/autoresearch Optimize prompt engineering strategies for math reasoning on GSM8K

From the CLI:

feynman autoresearch "Optimize prompt engineering strategies for math reasoning on GSM8K"

Autoresearch runs in the active Feynman session after you confirm the benchmark, metric, environment, files in scope, and iteration limit.

How it works

The workflow begins by analyzing the research goal and designing an initial experiment plan. It then enters an iterative loop:

Hypothesis — The agent proposes a hypothesis or modification based on current results
Experiment — It designs and executes an experiment to test the hypothesis
Analysis — Results are analyzed and compared against prior iterations
Decision — The agent decides whether to continue the current direction, try a variation, or pivot to a new approach

Each iteration builds on the previous ones. The agent maintains a running log of what has been tried, what worked, what failed, and what the current best result is. The log lets the agent avoid repeating failed approaches and compare every new result against the current best instead of only the previous iteration.
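The loop described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Feynman's implementation: the names run_loop, Trial, propose_step, and toy_benchmark are hypothetical, the "config" is a toy dictionary, and the sketch assumes a higher benchmark score is better.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One entry in the running log (hypothetical structure)."""
    iteration: int
    hypothesis: str
    score: float
    kept: bool


def run_loop(propose, benchmark, baseline_config, max_iterations):
    """Bounded propose / run / record / keep-or-revert loop."""
    best_config = dict(baseline_config)
    best_score = benchmark(best_config)
    log = []
    # Remember every configuration already evaluated so it is not rerun.
    tried = {tuple(sorted(best_config.items()))}

    for iteration in range(1, max_iterations + 1):
        hypothesis, candidate = propose(best_config, log)
        key = tuple(sorted(candidate.items()))
        if key in tried:
            continue  # already evaluated; the iteration budget is still spent
        tried.add(key)

        score = benchmark(candidate)
        kept = score > best_score  # assumption: higher is better
        if kept:
            best_config, best_score = candidate, score
        # If not kept, best_config is left unchanged, which is the "revert".
        log.append(Trial(iteration, hypothesis, score, kept))

    return best_config, best_score, log


# Toy stand-ins for the agent's proposal step and the benchmark.
def propose_step(best_config, log):
    candidate = {"x": best_config["x"] + 1}
    return f"increase x to {candidate['x']}", candidate


def toy_benchmark(config):
    return -(config["x"] - 3) ** 2  # peaks at x == 3


best_config, best_score, log = run_loop(
    propose_step, toy_benchmark, {"x": 0}, max_iterations=5
)
print(best_config, best_score)  # {'x': 3} 0
```

In this toy run the fourth proposal scores worse than the current best, so it is recorded as not kept and the best configuration stays at x = 3. The fifth proposal repeats an already-tried configuration and is skipped, which is the role the running log plays in the description above.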

Monitoring and control

The loop writes autoresearch.md, autoresearch.jsonl, and benchmark output in the active workspace. Use those files, plus CHANGELOG.md milestone entries, to inspect the current best result, failed hypotheses, and next step.
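As a rough illustration of inspecting the log programmatically, the sketch below reads JSON Lines records and reports the best result and the rejected hypotheses. The record schema is an assumption: this page does not document the fields of autoresearch.jsonl, so the keys "iteration", "hypothesis", "metric", and "kept" are hypothetical, as are the function name summarize and the made-up sample records. Check the actual file before relying on any of them.

```python
import json


def summarize(lines, metric_key="metric", higher_is_better=True):
    """Return (best_record, rejected_records) from JSON Lines text.

    Assumes one JSON object per line; the field names are hypothetical.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    scored = [r for r in records if isinstance(r.get(metric_key), (int, float))]
    if not scored:
        return None, []
    pick = max if higher_is_better else min
    best = pick(scored, key=lambda r: r[metric_key])
    rejected = [r for r in records if r.get("kept") is False]
    return best, rejected


# Made-up sample records, for illustration only.
sample = [
    '{"iteration": 1, "hypothesis": "baseline prompt", "metric": 0.61, "kept": true}',
    '{"iteration": 2, "hypothesis": "add worked examples", "metric": 0.68, "kept": true}',
    '{"iteration": 3, "hypothesis": "shorten instructions", "metric": 0.64, "kept": false}',
]

best, rejected = summarize(sample)
print(best["iteration"], best["metric"])
print([r["hypothesis"] for r in rejected])
```

Because summarize accepts any iterable of lines, an open file handle for autoresearch.jsonl can be passed in directly once the real field names are known.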

Output format

Autoresearch produces a running experiment log that includes:

Experiment History — What was tried in each iteration with parameters and results
Best Configuration — The best-performing setup found so far
Ablation Results — Which factors mattered most based on the experiments run
Recommendations — Suggested next steps based on observed trends

When to use it

Use /autoresearch for research tasks that benefit from iterative exploration: hyperparameter optimization, prompt-strategy evaluation, architecture search, retrieval tuning, or dataset/benchmark ablations where the search space is large and the feedback signal is clear. It is not the right tool for answering a specific question (use /deepresearch for that) and it is not a generic code-optimization loop.