© 2026 Galapagos. Licensed under Apache-2.0.


Meta-Harness

A minimal outer loop that delegates selection AND mutation to a skill-steered proposer over an append-only candidate history, returning a (score x cost) Pareto frontier.

Test-time search · MIT
meta_harness/memory.py
91 lines · 4.8 KB · python
"""Meta-Harness Memory component — the ``evolution_summary.jsonl`` + ``reports/`` analogue.

One file per component (see scaffold.py). The reference's cross-iteration knowledge lives in
three filesystem artifacts the proposer reads every session: ``evolution_summary.jsonl`` (one
JSON row per evaluated candidate, append-only), the ``reports/`` directory (<=30-line post-eval
reports — the proposer's compression layer written for its future self), and the execution
traces. This Memory holds the chat-port of all three (traces are stored on the rows):

* **rows** — append-only candidate records ``{name, iteration, score, cost, outcome, trace}``.
  ``outcome`` is ``"evaluated"`` or the literal ``"failed"`` (eval-gated children AND
  interface-validation failures), mirroring the reference's ``avg_val == 0 → "failed"`` rows.
  ``trace`` carries the evaluator's ``text_feedback`` (or the proposer's validation error) so the
  PromptBuilder can replay raw trace excerpts — the reference computes per-example traces and
  discards them in its shipped val-only config; persisting what galapagos evaluators emit is the
  sanctioned improvement aligned with the paper's ablation (raw traces beat summaries).
* **reports** — ``{name, iteration, report}``, the <=30-line per-candidate analyses parsed from
  the proposer's response.

Writers: the scaffold's ``after_step`` (evaluated children — note the galapagos convention that
observation is SKIPPED on NO_DIFF iterations, matching the reference writing nothing for an
abandoned proposal) and the proposer at proposal time (failed-validation rows). Reader: the
PromptBuilder. ``read()`` renders the summary table; ``read({"spec": "reports"})`` the reports.

``__bool__`` is True even when empty — the base scaffold falls back to ``NullMemory`` on falsy
memories.
"""
from __future__ import annotations

from ...components.memory import Memory


class MetaHarnessMemory(Memory):
    """Append-only evolution-summary rows + the per-candidate reports store."""

    def __init__(self) -> None:
        self._rows: list[dict] = []
        self._reports: list[dict] = []

    def __bool__(self) -> bool:
        return True

    # ---- Memory interface --------------------------------------------------------------------
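    def read(self, spec: dict | None = None) -> str:
        """Render the summary table, or the reports when ``spec == {"spec": "reports"}``.

        Returns ``""`` when the requested store is empty. ``trace`` is not part of the table;
        use ``rows()`` to reach it.
        """
        spec = spec or {}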
        if spec.get("spec") == "reports":
            return "\n\n".join(f"### iteration {r['iteration']} — {r['name']}\n{r['report']}"
                               for r in self._reports)
        if not self._rows:
            return ""
        lines = ["name | iteration | combined_score | cost | outcome"]
        for r in self._rows:
            lines.append(f"{r['name']} | {r['iteration']} | {r['score']:.4f} | "
                         f"{r['cost']:g} | {r['outcome']}")
        return "\n".join(lines)

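    def write(self, knowledge: str, **meta) -> None:
        """Dispatch on ``meta["kind"]``: ``"row"``, ``"failed"``, or ``"report"``.

        All fields arrive through ``meta``; ``knowledge`` is accepted for the Memory interface
        but not used here. Any other ``kind`` (or none) is silently ignored.
        """
        kind = meta.get("kind")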
        if kind == "row":
            self.record_row(str(meta.get("name", "?")), int(meta.get("iteration", 0)),
                            float(meta.get("score", 0.0)), float(meta.get("cost", 0.0)),
                            str(meta.get("outcome", "evaluated")), str(meta.get("trace", "")))
        elif kind == "failed":
            self.record_failed(str(meta.get("name", "?")), int(meta.get("iteration", 0)),
                               float(meta.get("cost", 0.0)), str(meta.get("trace", "")))
        elif kind == "report":
            self.record_report(str(meta.get("name", "?")), int(meta.get("iteration", 0)),
                               str(meta.get("report", "")))

    # ---- evolution_summary.jsonl analogue -------------------------------------------------------
    def record_row(self, name: str, iteration: int, score: float, cost: float,
                   outcome: str, trace: str = "") -> None:
        """One appended row per evaluated candidate (``update_evolution_summary``)."""
        self._rows.append({"name": name, "iteration": iteration, "score": score,
                           "cost": cost, "outcome": outcome, "trace": trace})

    def record_failed(self, name: str, iteration: int, cost: float = 0.0,
                      trace: str = "") -> None:
        """A failed-validation candidate: score 0, the literal ``"failed"`` outcome."""
        self.record_row(name, iteration, 0.0, cost, "failed", trace)

    def record_report(self, name: str, iteration: int, report: str) -> None:
        """One <=30-line per-candidate report (the ``reports/`` compression layer)."""
        if report:
            self._reports.append({"name": name, "iteration": iteration, "report": report})

    # ---- PromptBuilder views ----------------------------------------------------------------------
    def rows(self) -> list[dict]:
        """Shallow copy of the row list; the row dicts themselves are shared, not copied."""
        return list(self._rows)

    def reports(self) -> list[dict]:
        """Shallow copy of the report list; the report dicts themselves are shared, not copied."""
        return list(self._reports)
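
The scaffold description mentions a (score x cost) Pareto frontier, but the code that computes it is not part of this file. As a rough illustration only, the sketch below shows one way a frontier could be derived from records shaped like those returned by `rows()`. The function name `pareto_frontier`, the sample rows, and the choice to exclude `"failed"` rows are all assumptions made for this example, not the scaffold's actual implementation. It treats a higher score and a lower cost as better.

```python
from __future__ import annotations


def pareto_frontier(rows: list[dict]) -> list[dict]:
    """Return the non-dominated rows (hypothetical helper, not part of the scaffold)."""
    # Assumption: "failed" rows never belong on the frontier.
    candidates = [r for r in rows if r["outcome"] == "evaluated"]
    # Sort by ascending cost, breaking ties by descending score, then sweep once.
    candidates.sort(key=lambda r: (r["cost"], -r["score"]))
    frontier: list[dict] = []
    best_score = float("-inf")
    for r in candidates:
        # A row survives only if it beats every cheaper-or-equal row on score.
        if r["score"] > best_score:
            frontier.append(r)
            best_score = r["score"]
    return frontier


# Sample records in the documented row shape (values invented for illustration).
rows = [
    {"name": "a", "iteration": 0, "score": 0.50, "cost": 1.0, "outcome": "evaluated", "trace": ""},
    {"name": "b", "iteration": 1, "score": 0.40, "cost": 2.0, "outcome": "evaluated", "trace": ""},
    {"name": "c", "iteration": 1, "score": 0.70, "cost": 3.0, "outcome": "evaluated", "trace": ""},
    {"name": "d", "iteration": 2, "score": 0.00, "cost": 0.5, "outcome": "failed",
     "trace": "bad interface"},
]
frontier_names = [r["name"] for r in pareto_frontier(rows)]
print(frontier_names)
```

Here `b` is dropped because `a` is both cheaper and higher scoring, and `d` is dropped because it failed. Sorting by cost first lets a single pass decide dominance, so the sketch runs in O(n log n) rather than comparing every pair.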