galapagos
© 2026 Galapagos. Licensed under Apache-2.0.

Meta-Harness/meta_harness

Meta-Harness

A minimal outer loop that delegates selection AND mutation to a skill-steered proposer over an append-only candidate history, returning a (score x cost) Pareto frontier.
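To illustrate what a (score x cost) Pareto frontier means here, the following is a minimal sketch, not the scaffold's actual implementation. It assumes each candidate carries a score to maximize and a cost to minimize; the `Candidate` class and the `dominates` and `pareto_frontier` functions are hypothetical names introduced for this example, and the sample numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one evaluated candidate in the history."""
    name: str
    score: float  # higher is better
    cost: float   # lower is better (for example, genome_chars)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    return (a.score >= b.score and a.cost <= b.cost) and (
        a.score > b.score or a.cost < b.cost
    )


def pareto_frontier(history: list[Candidate]) -> list[Candidate]:
    """Return the non-dominated candidates, sorted by ascending cost."""
    frontier = [
        c for c in history
        if not any(dominates(other, c) for other in history)
    ]
    return sorted(frontier, key=lambda c: c.cost)


# Illustrative data only.
history = [
    Candidate("a", score=0.70, cost=1200),
    Candidate("b", score=0.80, cost=3000),
    Candidate("c", score=0.65, cost=2500),  # dominated by "a"
    Candidate("d", score=0.80, cost=3500),  # dominated by "b"
]
frontier = pareto_frontier(history)
print([c.name for c in frontier])  # ['a', 'b']
```

The history itself is never pruned in this sketch; the frontier is recomputed from the full list, which matches the append-only framing in the description above.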

Test-time search · MIT
meta_harness/config.yaml
43 lines · 3.6 KB · yaml
# Meta-Harness: faithful port of the Meta-Harness reference defaults
# (stanford-iris-lab/meta-harness, reference_examples/text_classification, the canonical example;
# the code is the ground truth wherever it diverges from the paper, e.g. k=3 per the shipped
# SKILL.md). Sections mirror the six core components.
seed: 0
general:
  max_iterations: 60          # reference --iterations 20 x k=3 candidates: the galapagos loop is
                              # one-candidate-per-iteration, so 20 reference iters = 60 here
population:                    # MetaHarnessPopulation (Pareto frontier)
  cost_metric: genome_chars   # frontier cost axis (minimize). genome_chars = len(genome content),
                              # the universal analogue of the reference's memory_context_chars
                              # (characters, not tokens); may name any task metric key instead
                              # (falls back to genome_chars when the key is absent)
proposer:                     # MetaHarnessProposer
  candidates_per_proposal: 3  # k — SKILL.md "implement 3 new memory systems every iteration"
                              # (enforced by the steering; the Policy and PromptBuilder reuse k)
prompt_builder:               # MetaHarnessPromptBuilder
  skill: skills/meta-harness/SKILL.md
                              # the proposer steering — the search's PRIMARY HYPERPARAMETER (the
                              # paper's practical-tips appendix: editing the skill text moved
                              # results more than any loop constant). A real Agent-Skills SKILL.md,
                              # dir-per-skill like the reference's .claude/skills/meta-harness/;
                              # we ship it under skills/ (NOT .claude/) because the package tree
                              # and the hub Files tab exclude dot-directories. Relative paths
                              # resolve against the scaffold package dir first, then the cwd;
                              # absolute paths are taken as-is — so the researcher workflow is:
                              # copy the bundled SKILL.md, edit it, point this key at the copy.
                              # Body tokens {candidates_per_proposal} / {exploitation_axes} are
                              # substituted at load time (str.replace on that documented set only).
  top_k_sources: 3            # full sources of this many frontier members in the prompt — the
                              # skill Step 3 copy-then-edit pool (the agent reads them from agents/)
  reports_in_prompt: 6        # most recent <=30-line candidate reports replayed (reports/ analogue)
  trace_errors: 2             # execution-trace excerpts sampled errors-first ...
  trace_successes: 1          # ... then successes ("deep-read failed AND successful trajectories")
  trace_max_chars: 1500       # clip per excerpt (the chat-port trace budget)
  summary_max_rows: 200       # evolution-summary rows rendered into the prompt — ALL rows up to
                              # this cap (most recent kept, never fewer than the most recent 50)
# NOTE (deviations from the upstream config surface, all documented in scaffold.py):
# - the reference's 30 s subprocess import-check is replaced by an in-process compile(source)
#   (safe for arbitrary task programs); its 2400 s proposer timeout and 7200 s benchmark timeout
#   are owned by the model host and the task evaluator in galapagos and are not re-exposed here.
# - the held-out test phase (Phase Final) is not ported: galapagos tasks own their split
#   discipline; the run returns the Pareto frontier in result.summary instead.
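
The `skill` key's comments describe a path-resolution order and a load-time token substitution. The sketch below shows one way those two rules could look in Python; it is an assumption-laden illustration, not the code in scaffold.py. The function names `resolve_skill_path` and `render_skill`, the `FileNotFoundError` on a missing file, and the choice to skip tokens with no supplied value are all hypothetical.

```python
from pathlib import Path

# The two body tokens named in the config comments.
DOCUMENTED_TOKENS = ("candidates_per_proposal", "exploitation_axes")


def resolve_skill_path(skill: str, package_dir: Path, cwd: Path) -> Path:
    """Absolute paths are used as-is; relative ones try package_dir, then cwd."""
    path = Path(skill)
    if path.is_absolute():
        return path
    for base in (package_dir, cwd):
        candidate = base / path
        if candidate.is_file():
            return candidate
    # Assumption: a missing skill file is an error rather than a silent default.
    raise FileNotFoundError(f"skill file not found: {skill}")


def render_skill(body: str, values: dict[str, str]) -> str:
    """Substitute only the documented tokens via str.replace; other braces stay untouched."""
    for token in DOCUMENTED_TOKENS:
        if token in values:
            body = body.replace("{" + token + "}", values[token])
    return body


body = "Implement {candidates_per_proposal} candidates. Keep {other} as-is."
rendered = render_skill(body, {"candidates_per_proposal": "3"})
print(rendered)  # Implement 3 candidates. Keep {other} as-is.
```

Using plain `str.replace` on a fixed token set, rather than `str.format`, means literal braces elsewhere in an edited SKILL.md (JSON snippets, code samples) pass through unchanged.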