Meta-Harness/meta_harness

Meta-Harness

A minimal outer loop that delegates selection AND mutation to a skill-steered proposer over an append-only candidate history, returning a (score x cost) Pareto frontier.

Test-time search · MIT
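The description above mentions a (score x cost) Pareto frontier. As a rough illustration of what that means, here is a minimal sketch of a non-dominated filter over candidate rows. It assumes that a higher score is better and a lower cost is better; that orientation, the `pareto_frontier` helper, and the sample candidates are all hypothetical and are not taken from the Meta-Harness source.

```python
from __future__ import annotations


def pareto_frontier(rows: list[dict]) -> list[dict]:
    """Return the rows that no other row dominates on (score, cost).

    Assumption (not from the source): maximize ``score``, minimize ``cost``.
    """

    def dominates(a: dict, b: dict) -> bool:
        # a dominates b when it is at least as good on both axes
        # and strictly better on at least one of them.
        no_worse = a["score"] >= b["score"] and a["cost"] <= b["cost"]
        strictly_better = a["score"] > b["score"] or a["cost"] < b["cost"]
        return no_worse and strictly_better

    return [r for r in rows if not any(dominates(other, r) for other in rows)]


# Made-up candidates, for illustration only.
candidates = [
    {"name": "a", "score": 0.60, "cost": 1.0},
    {"name": "b", "score": 0.80, "cost": 3.0},
    {"name": "c", "score": 0.55, "cost": 2.0},  # dominated by "a"
    {"name": "d", "score": 0.80, "cost": 4.0},  # dominated by "b"
]

frontier_names = [r["name"] for r in pareto_frontier(candidates)]
print(frontier_names)  # ['a', 'b']
```

This quadratic filter is fine for the small candidate histories an outer loop like this typically accumulates. How the real harness treats `"failed"` rows when building its frontier is not shown here.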

"""Meta-Harness Memory component — the ``evolution_summary.jsonl`` + ``reports/`` analogue. One file per component (see scaffold.py). The reference's cross-iteration knowledge lives in three filesystem artifacts the proposer reads every session: ``evolution_summary.jsonl`` (one JSON row per evaluated candidate, append-only), the ``reports/`` directory (<=30-line post-eval reports — the proposer's compression layer written for its future self), and the execution traces. This Memory holds the chat-port of all three: * **rows** — append-only candidate records ``{name, iteration, score, cost, outcome, trace}``. ``outcome`` is ``"evaluated"`` or the literal ``"failed"`` (eval-gated children AND interface-validation failures), mirroring the reference's ``avg_val == 0 → "failed"`` rows. ``trace`` carries the evaluator's ``text_feedback`` (or the proposer's validation error) so the PromptBuilder can replay raw trace excerpts — the reference computes per-example traces and discards them in its shipped val-only config; persisting what galapagos evaluators emit is the sanctioned improvement aligned with the paper's ablation (raw traces beat summaries). * **reports** — ``{name, iteration, report}``, the <=30-line per-candidate analyses parsed from the proposer's response. Writers: the scaffold's ``after_step`` (evaluated children — note the galapagos convention that observation is SKIPPED on NO_DIFF iterations, matching the reference writing nothing for an abandoned proposal) and the proposer at proposal time (failed-validation rows). Reader: the PromptBuilder. ``read()`` renders the summary table; ``read({"spec": "reports"})`` the reports. ``__bool__`` is True even when empty — the base scaffold falls back to ``NullMemory`` on falsy memories. 
""" from __future__ import annotations from ...components.memory import Memory class MetaHarnessMemory(Memory): """Append-only evolution-summary rows + the per-candidate reports store.""" def __init__(self) -> None: self._rows: list[dict] = [] self._reports: list[dict] = [] def __bool__(self) -> bool: return True # ---- Memory interface -------------------------------------------------------------------- def read(self, spec: dict | None = None) -> str: spec = spec or {} if spec.get("spec") == "reports": return "\n\n".join(f"### iteration {r['iteration']} — {r['name']}\n{r['report']}" for r in self._reports) if not self._rows: return "" lines = ["name | iteration | combined_score | cost | outcome"] for r in self._rows: lines.append(f"{r['name']} | {r['iteration']} | {r['score']:.4f} | " f"{r['cost']:g} | {r['outcome']}") return "\n".join(lines) def write(self, knowledge: str, **meta) -> None: kind = meta.get("kind") if kind == "row": self.record_row(str(meta.get("name", "?")), int(meta.get("iteration", 0)), float(meta.get("score", 0.0)), float(meta.get("cost", 0.0)), str(meta.get("outcome", "evaluated")), str(meta.get("trace", ""))) elif kind == "failed": self.record_failed(str(meta.get("name", "?")), int(meta.get("iteration", 0)), float(meta.get("cost", 0.0)), str(meta.get("trace", ""))) elif kind == "report": self.record_report(str(meta.get("name", "?")), int(meta.get("iteration", 0)), str(meta.get("report", ""))) # ---- evolution_summary.jsonl analogue ------------------------------------------------------- def record_row(self, name: str, iteration: int, score: float, cost: float, outcome: str, trace: str = "") -> None: """One appended row per evaluated candidate (``update_evolution_summary``).""" self._rows.append({"name": name, "iteration": iteration, "score": score, "cost": cost, "outcome": outcome, "trace": trace}) def record_failed(self, name: str, iteration: int, cost: float = 0.0, trace: str = "") -> None: """A failed-validation candidate: score 0, the 
literal ``"failed"`` outcome.""" self.record_row(name, iteration, 0.0, cost, "failed", trace) def record_report(self, name: str, iteration: int, report: str) -> None: """One <=30-line per-candidate report (the ``reports/`` compression layer).""" if report: self._reports.append({"name": name, "iteration": iteration, "report": report}) # ---- PromptBuilder views ---------------------------------------------------------------------- def rows(self) -> list[dict]: return list(self._rows) def reports(self) -> list[dict]: return list(self._reports)
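To make the summary-table view concrete, the sketch below reproduces the table branch of `read()` as a standalone function, so it runs without the package's `Memory` base class. `render_summary` is a hypothetical stand-in written for this example, and the candidate names, scores, and costs are made up; only the header and the format strings are copied from the module above.

```python
from __future__ import annotations


def render_summary(rows: list[dict]) -> str:
    """Hypothetical stand-in mirroring the table branch of ``MetaHarnessMemory.read()``."""
    if not rows:
        return ""
    lines = ["name | iteration | combined_score | cost | outcome"]
    for r in rows:
        # Same format strings as the module: 4 decimals for score, ``:g`` for cost.
        lines.append(f"{r['name']} | {r['iteration']} | {r['score']:.4f} | "
                     f"{r['cost']:g} | {r['outcome']}")
    return "\n".join(lines)


# Illustrative rows only; these values are invented for the example.
rows = [
    {"name": "cand_a", "iteration": 1, "score": 0.5, "cost": 2.0,
     "outcome": "evaluated", "trace": ""},
    # A failed-validation candidate is stored with score 0 and the literal "failed" outcome.
    {"name": "cand_b", "iteration": 2, "score": 0.0, "cost": 0.0,
     "outcome": "failed", "trace": "validation error"},
]

summary = render_summary(rows)
print(summary)
# name | iteration | combined_score | cost | outcome
# cand_a | 1 | 0.5000 | 2 | evaluated
# cand_b | 2 | 0.0000 | 0 | failed
```

Note that `trace` is stored on each row but does not appear in this table; in the module it is only reachable through `rows()`.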