galapagos
© 2026 Galapagos. Licensed under Apache-2.0.


Meta-Harness/meta_harness

Meta-Harness

A minimal outer loop that delegates both selection and mutation to a skill-steered proposer over an append-only candidate history, returning a (score × cost) Pareto frontier.

Test-time search · MIT

About

Meta-Harness (Stanford IRIS Lab) strips the evolutionary outer loop to its bare minimum: there is no parent selection, no archive policy, and no mutation operator. The entire search lives in a skill-steered coding-agent proposer that reads the whole candidate history — every prior program, its score, its report, and its execution trace — and writes a fixed number of brand-new full programs each round. The outer loop's only jobs are to validate each candidate's interface, evaluate the valid ones, append their outcomes to a running summary, and recompute a Pareto frontier. The run's product is that frontier, not a single best program.
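The outer loop described above can be sketched in a few lines. This is a minimal illustration, not the reference code: the names `Outcome`, `dominates`, `pareto_frontier`, and `outer_step` are hypothetical, the cost is taken to be the program's character count, and silently skipping invalid candidates is an assumption (the real loop may record them in the summary).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Outcome:
    """One evaluated candidate (hypothetical record layout)."""
    program: str
    score: float  # higher is better
    cost: float   # lower is better


def dominates(a: Outcome, b: Outcome) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.score >= b.score and a.cost <= b.cost
    strictly_better = a.score > b.score or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(history: List[Outcome]) -> List[Outcome]:
    """Keep every outcome that no other outcome in the history dominates."""
    return [c for c in history if not any(dominates(o, c) for o in history)]


def outer_step(
    history: List[Outcome],
    candidates: List[str],
    is_valid: Callable[[str], bool],
    evaluate: Callable[[str], float],
) -> List[Outcome]:
    """Validate, evaluate, append, then recompute the frontier."""
    for program in candidates:
        if not is_valid(program):
            continue  # assumption: invalid candidates are simply skipped
        # Assumption: cost is the character count of the program text.
        history.append(Outcome(program, evaluate(program), float(len(program))))
    return pareto_frontier(history)
```

Note that the history is only ever appended to; the frontier is recomputed from the full history on each step rather than maintained as a separate archive.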

The proposer is constrained by near-verbatim steering rules carried in an editable `SKILL.md` file rather than in code, because the paper's practical-tips appendix found that editing the skill text moved results more than any loop constant. Those rules forbid parameter-only variants ("identical except constants => rewrite"), forbid dataset-specific hardcoding, forbid early stopping, cap each candidate's report at 30 lines, and rotate the search across six exploitation axes so successive rounds explore different mechanism families instead of clustering on one.
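To make the "identical except constants" rule concrete, here is one way such a parameter-only variant could be detected for Python candidates. This is purely illustrative: the scaffold enforces the rule through the skill text rather than in code, and the helper name `same_up_to_constants` is hypothetical. The sketch masks every literal in both syntax trees and compares what remains.

```python
import ast


class _MaskConstants(ast.NodeTransformer):
    """Replace every literal (numbers, strings, booleans) with a placeholder."""

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        return ast.copy_location(ast.Constant(value=None), node)


def same_up_to_constants(src_a: str, src_b: str) -> bool:
    """True if the two programs have the same structure once literals are masked."""
    tree_a = _MaskConstants().visit(ast.parse(src_a))
    tree_b = _MaskConstants().visit(ast.parse(src_b))
    # ast.dump omits line and column attributes by default, so only structure is compared.
    return ast.dump(tree_a) == ast.dump(tree_b)
```

A pair flagged by this check would fall under the "rewrite" instruction. The check is deliberately crude: a sign change such as `1` versus `-1` alters the tree shape and would not be flagged.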

This is a faithful port of the reference implementation (the canonical `text_classification` example), with the code treated as ground truth wherever paper and code diverge. The original proposer is a Claude Code session with filesystem tools, steered through `--append-system-prompt`. A chat proposer cannot browse, so this port serializes the exact slice that the skill's reading list points the agent at into a single prompt: the evolution-summary table, the Pareto frontier, recent reports, errors-first trace excerpts, and the full source of the top frontier members. It then dispenses the parsed candidates in FIFO order, one per Galapagos iteration.
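The one-candidate-per-iteration dispensing can be sketched with a plain FIFO queue. The class name `CandidateQueue`, its `propose` callback, and the `sessions` counter are assumptions made for illustration; the actual `k_candidate_queue` block may be structured differently.

```python
from collections import deque
from typing import Callable, Deque, List


class CandidateQueue:
    """Refill from one proposer session when empty, then hand out one candidate per call."""

    def __init__(self, propose: Callable[[int], List[str]], k: int = 3) -> None:
        self._propose = propose  # hypothetical callback: returns up to k full programs
        self._k = k
        self._pending: Deque[str] = deque()
        self.sessions = 0  # number of proposer sessions started so far

    def next_candidate(self) -> str:
        if not self._pending:
            batch = self._propose(self._k)
            if not batch:
                raise RuntimeError("proposer returned no candidates")
            self._pending.extend(batch)
            self.sessions += 1
        # popleft preserves the order in which the proposer wrote the candidates.
        return self._pending.popleft()
```

With `k = 3`, six calls to `next_candidate` trigger two proposer sessions, so the expensive prompt is built once per batch rather than once per iteration.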

Because Galapagos runs one child per iteration while the reference evaluates k candidates per proposer session, a reference run of N iterations maps to N × k Galapagos iterations. The bundled budget of 60 corresponds to 20 reference iterations with k = 3 candidates each. By default, the cost axis of the frontier is the genome character count, the universal analogue of the reference's injected-context character count, though any task metric key may be named instead.
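The budget arithmetic and the default cost axis can be written out as follows. Both function names and the `cost_key` parameter are hypothetical and only illustrate the mapping described above.

```python
from typing import Mapping, Optional


def galapagos_budget(reference_iterations: int, k: int) -> int:
    """Each reference iteration yields k candidates, evaluated one per Galapagos iteration."""
    return reference_iterations * k


def candidate_cost(
    genome: str,
    metrics: Optional[Mapping[str, float]] = None,
    cost_key: Optional[str] = None,
) -> float:
    """Default cost is the genome's character count; a named task metric overrides it."""
    if cost_key is not None:
        if metrics is None or cost_key not in metrics:
            raise KeyError(f"metric {cost_key!r} not reported by the task")
        return float(metrics[cost_key])
    return float(len(genome))


# The bundled budget: 20 reference iterations at k = 3.
assert galapagos_budget(20, 3) == 60
```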

Composition

6/6 blocks

The six components this scaffold snaps together. Each block names its concrete implementation.

  • Population: `append_only_pareto`

    The set of candidate solutions in play: the gene pool the search evolves over.

  • Selection: `proposer_delegated_frontier_anchor`

    Decides which genomes survive and reproduce: tournament, elitism, novelty, or your own policy.

  • Prompt: `skill_steered_filesystem_view`

    Assembles the context handed to the model: parents, feedback, instructions, examples.

  • Proposer: `k_candidate_queue`

    The LLM-driven variation operator, which proposes new candidates by mutation and crossover.

  • Evaluator: `task`

    Scores each candidate against the task, providing the fitness signal that drives selection.

  • Memory: `evolution_summary_reports`

    Persists discoveries across generations: archives, islands, and lineage for the search.

Tags

minimal-outer-loop, skill-steered, pareto-frontier, append-only, proposer-delegated, coding-agent-port

Source

Meta-Harness: End-to-End Optimization of Model Harnesses (Stanford IRIS Lab, arXiv:2603.28052); reference implementation in references/test_time_search_scaffolds/meta_harness (text_classification example)

Quick facts

Downloads: 0
License: MIT
Default model: openai/gpt-5.5
Controller: MetaHarnessScaffold

Use this scaffold

example.py
from galapagos import GalapagosScaffold

# Load the scaffold by its card name.
scaffold = GalapagosScaffold.from_card(name="meta_harness")
# Replace "<task_name>" with the name of the task to run.
result = scaffold.run(task="<task_name>")