Meta-Harness
A minimal outer loop that delegates selection AND mutation to a skill-steered proposer over an append-only candidate history, returning a (score x cost) Pareto frontier.
# Meta-Harness: faithful port of the Meta-Harness reference defaults
# (stanford-iris-lab/meta-harness, reference_examples/text_classification, the canonical example).
# The code is the ground truth wherever it diverges from the paper, e.g. k=3 per the shipped
# SKILL.md. Sections mirror the six core components.
seed: 0
general:
  max_iterations: 60  # the reference runs --iterations 20 with k=3 candidates each; the galapagos
                      # loop is one-candidate-per-iteration, so 20 reference iterations
                      # = 20 x 3 = 60 here
population:  # MetaHarnessPopulation (Pareto frontier)
  cost_metric: genome_chars  # frontier cost axis (minimized). genome_chars = len(genome content),
                             # the universal analogue of the reference's memory_context_chars
                             # (characters, not tokens). May instead name any task metric key;
                             # falls back to genome_chars when that key is absent.
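# Illustrative override (a sketch only; the metric key below is hypothetical and is not taken
# from the reference): pointing the frontier cost axis at a task-reported metric instead of
# genome size.
#   population:
#     cost_metric: eval_seconds  # hypothetical key; genome_chars is used when the key is absent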
proposer:  # MetaHarnessProposer
  candidates_per_proposal: 3  # k, per SKILL.md: "implement 3 new memory systems every iteration"
                              # (enforced by the steering; the Policy and PromptBuilder reuse k)
prompt_builder:  # MetaHarnessPromptBuilder
  skill: skills/meta-harness/SKILL.md
  # The proposer steering, and the search's PRIMARY HYPERPARAMETER (per the paper's
  # practical-tips appendix, editing the skill text moved results more than any loop constant).
  # A real Agent-Skills SKILL.md, dir-per-skill like the reference's
  # .claude/skills/meta-harness/; it ships under skills/ (NOT .claude/) because the package tree
  # and the hub Files tab exclude dot-directories. Relative paths resolve against the scaffold
  # package dir first, then the cwd; absolute paths are taken as-is. The researcher workflow is
  # therefore: copy the bundled SKILL.md, edit it, and point this key at the copy.
  # Body tokens {candidates_per_proposal} / {exploitation_axes} are substituted at load time
  # (str.replace on that documented set only).
  top_k_sources: 3  # full sources of this many frontier members go in the prompt: the pool for
                    # the skill's Step 3 copy-then-edit (the agent reads them from agents/)
  reports_in_prompt: 6  # most recent candidate reports (<=30 lines each) replayed (reports/ analogue)
  trace_errors: 2  # execution-trace excerpts are sampled errors-first ...
  trace_successes: 1  # ... then successes ("deep-read failed AND successful trajectories")
  trace_max_chars: 1500  # clip per excerpt (the chat-port trace budget)
  summary_max_rows: 200  # evolution-summary rows rendered into the prompt: ALL rows up to
                         # this cap (most recent kept, never fewer than the most recent 50)
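# Illustration of the load-time token substitution described for the skill key in this section
# (a sketch; the body sentence is invented for the example and is not the bundled SKILL.md wording):
#   line in the SKILL.md body:  "Propose {candidates_per_proposal} new candidates this iteration."
#   text after loading:         "Propose 3 new candidates this iteration."  # candidates_per_proposal: 3
# Under this reading, brace tokens outside the documented set are left untouched.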
# NOTE (deviations from the upstream config surface, all documented in scaffold.py):
# - the reference's 30 s subprocess import-check is replaced by an in-process compile(source)
#   (safe for arbitrary task programs); its 2400 s proposer timeout and 7200 s benchmark timeout
#   are owned by the model host and the task evaluator in galapagos and are not re-exposed here.
# - the held-out test phase (Phase Final) is not ported: galapagos tasks own their split
#   discipline; the run returns the Pareto frontier in result.summary instead.