Meta-Harness/meta_harness

Meta-Harness

A minimal outer loop that delegates both selection and mutation to a skill-steered proposer working over an append-only candidate history, and returns a (score × cost) Pareto frontier.

Test-time search · MIT

About

Meta-Harness (Stanford IRIS Lab) strips the evolutionary outer loop to its bare minimum: there is no parent selection, no archive policy, and no mutation operator. The entire search lives in a skill-steered coding-agent proposer that reads the whole candidate history — every prior program, its score, its report, and its execution trace — and writes a fixed number of brand-new full programs each round. The outer loop's only jobs are to validate each candidate's interface, evaluate the valid ones, append their outcomes to a running summary, and recompute a Pareto frontier. The run's product is that frontier, not a single best program.
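The outer-loop duties described above can be sketched in a few lines. This is an illustrative sketch, not the reference code: `Candidate`, `dominates`, `pareto_frontier`, and `run_round` are hypothetical names, the `validate` and `evaluate` callables stand in for the interface check and the task evaluator, cost is taken to be the program's character count, and skipping invalid candidates entirely is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    program: str
    score: float  # higher is better
    cost: float   # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.score >= b.score and a.cost <= b.cost
    strictly_better = a.score > b.score or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(history: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, in history order."""
    return [c for c in history if not any(dominates(o, c) for o in history)]


def run_round(
    history: List[Candidate],
    programs: List[str],
    validate: Callable[[str], bool],
    evaluate: Callable[[str], float],
) -> List[Candidate]:
    """Validate, evaluate, append, then recompute the frontier."""
    for program in programs:
        if not validate(program):
            continue  # assumption: invalid candidates are not evaluated or recorded here
        # Append-only: earlier entries are never edited or removed.
        history.append(Candidate(program, evaluate(program), float(len(program))))
    return pareto_frontier(history)
```

Note that the loop never picks a parent or prunes the history. The frontier is recomputed from the full record each round, which is what lets the proposer, rather than the loop, decide what to build on.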

The proposer is constrained by near-verbatim steering rules carried in an editable `SKILL.md` file rather than in code, because the paper's practical-tips appendix found that editing the skill text moved results more than any loop constant. Those rules forbid parameter-only variants ("identical except constants => rewrite"), forbid dataset-specific hardcoding, forbid early stopping, cap each candidate's report at 30 lines, and rotate the search across six exploitation axes so successive rounds explore different mechanism families instead of clustering on one.
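To make two of these rules concrete, here is a small sketch of what "identical except constants" and the 30-line report cap could look like if checked mechanically. This is purely illustrative: the scaffold carries these rules in the skill text rather than in code, and the helper names (`mask_constants`, `is_parameter_only_variant`, `cap_report`) and the regex-based notion of a "constant" are assumptions.

```python
import re

# Assumption: a "constant" is a plain numeric literal (int, float, or exponent form).
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?(?:[eE][+-]?\d+)?\b")


def mask_constants(source: str) -> str:
    """Replace every numeric literal with a placeholder token."""
    return _NUMBER.sub("<num>", source)


def is_parameter_only_variant(a: str, b: str) -> bool:
    """True if two programs differ only in their numeric literals."""
    return a != b and mask_constants(a) == mask_constants(b)


def cap_report(report: str, max_lines: int = 30) -> str:
    """Keep at most max_lines lines of a candidate's report."""
    return "\n".join(report.splitlines()[:max_lines])
```

A candidate flagged by `is_parameter_only_variant` is the kind the skill text asks the proposer to rewrite rather than submit.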

This is a faithful port of the reference implementation (the canonical `text_classification` example), with the code treated as ground truth wherever paper and code diverge. The original proposer is a Claude Code session with filesystem tools, steered through `--append-system-prompt`. A chat proposer cannot browse, so this port serializes the exact slice that the skill's reading list points the agent at into a single prompt: the evolution-summary table, the Pareto frontier, recent reports, errors-first trace excerpts, and the full source of the top frontier members. It then dispenses the parsed candidates in FIFO order, one per Galapagos iteration.
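The FIFO dispensing step can be pictured as a queue that is refilled by one proposer session whenever it runs dry. The sketch below is a minimal illustration under assumptions: `CandidateQueue` and its `propose` callable are hypothetical, and `propose` stands in for "build the serialized prompt, call the model, parse the candidates".

```python
from collections import deque
from typing import Callable, Deque, List


class CandidateQueue:
    """Hands out one candidate per call, refilling from the proposer when empty."""

    def __init__(self, propose: Callable[[], List[str]]) -> None:
        self._propose = propose          # hypothetical: returns k parsed programs
        self._pending: Deque[str] = deque()
        self.sessions = 0                # number of proposer sessions started so far

    def next_candidate(self) -> str:
        if not self._pending:
            batch = self._propose()
            if not batch:
                raise RuntimeError("proposer returned no candidates")
            self._pending.extend(batch)
            self.sessions += 1
        # FIFO: candidates leave in the order the proposer wrote them.
        return self._pending.popleft()
```

With a proposer that returns three programs per session, seven calls to `next_candidate()` trigger three sessions, and the last session still has two programs queued.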

Because Galapagos runs one child per iteration while the reference evaluates k candidates per proposer session, a reference run of N iterations maps to N*k Galapagos iterations. The bundled budget of 60 is 20 reference iterations times k=3 candidates. The cost axis of the frontier is the genome character count by default — the universal analogue of the reference's injected-context character count — though any task metric key may be named instead.
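The budget mapping and the default cost axis reduce to simple arithmetic, sketched here for concreteness. The function names and the `metrics` mapping are hypothetical and are not part of the Galapagos API.

```python
from typing import Mapping, Optional


def galapagos_iterations(reference_iterations: int, k: int) -> int:
    """One Galapagos iteration per candidate, so N reference iterations cost N * k."""
    return reference_iterations * k


def candidate_cost(
    genome: str,
    metrics: Mapping[str, float],
    cost_key: Optional[str] = None,
) -> float:
    """Cost axis: genome character count by default, or a named task metric."""
    if cost_key is None:
        return float(len(genome))
    return float(metrics[cost_key])
```

For the bundled configuration, `galapagos_iterations(20, 3)` gives the budget of 60 quoted above.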

Composition

6/6 blocks

The six components this scaffold snaps together. Each block names its concrete implementation.

Population: append_only_pareto
The set of candidate solutions in play: the gene pool the search evolves over.

Selection: proposer_delegated_frontier_anchor
Decides which genomes survive and reproduce: tournament, elitism, novelty, or your own policy.

Prompt: skill_steered_filesystem_view
Assembles the context handed to the model: parents, feedback, instructions, examples.

Proposer: k_candidate_queue
The LLM-driven variation operator, which proposes new candidates by mutation and crossover.

Evaluator: task
Scores each candidate against the task: the fitness signal that drives selection.

Memory: evolution_summary_reports
Persists discoveries across generations: archives, islands, and lineage for the search.

Source

Meta-Harness: End-to-End Optimization of Model Harnesses (Stanford IRIS Lab, arXiv:2603.28052); reference implementation in references/test_time_search_scaffolds/meta_harness (text_classification example)

Quick facts

Downloads: 0

License: MIT

Default model: openai/gpt-5.5

Controller: MetaHarnessScaffold

Use this scaffold

example.py

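from galapagos import GalapagosScaffold

# Load the meta_harness scaffold from its card.
scaffold = GalapagosScaffold.from_card(name="meta_harness")
# Replace <task_name> with the task to run the search on.
result = scaffold.run(task="<task_name>")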
