Core components

Galapagos models every test-time-search scaffold — from a plain evolutionary loop to a multi-agent, self-modifying system — as a composition of six components plus one unit type. The same six interfaces express all of the reference methods (OpenEvolve, AlphaEvolve, ShinkaEvolve, SkyDiscover, GEPA, DGM, DeepEvolve, PAC-Evolve, AdaEvolve, EvoX, SeaEvo, Meta-Harness, CORAL, HyperAgents). Differences between methods are differences in which implementation fills each slot — not differences in architecture.

```text
                                        ┌──────────────── Memory ─────────────────┐
                                        │  free-form knowledge: notes, skills,    │
                                        │  scratchpad, landscape, tactics         │
                                        └──────┬─────────────────▲────────────────┘
                                          read │           write │
   ┌────────────┐  ┌─────────────────┐  ┌──────▼────────┐  ┌─────┴────┐  ┌───────────┐
   │ Population │─▶│ SelectionPolicy │─▶│ PromptBuilder │─▶│ Proposer │─▶│ Evaluator │
   └─────▲──────┘  └─────────────────┘  └───────────────┘  └──────────┘  └─────┬─────┘
         └──────────────────────── new scored Genome ◀─────────────────────────┘
```

The loop in one sentence

Each iteration: select parents from the Population → build a prompt from them and Memory → propose a new candidate → evaluate it → add the scored Genome back to the Population (and optionally write what was learned to Memory). Repeat until the budget is spent.

All of the reference methods use Population, PromptBuilder, Proposer, and Evaluator — these are the universal backbone. Methods differentiate almost entirely on two axes: the SelectionPolicy (the adaptive intelligence) and Memory (the accumulated knowledge layer).
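To make the loop concrete, here is a minimal sketch with one toy function per role. Everything in it is hypothetical and is not the `galapagos.components` API: the "LLM" is a random perturbation of a number, the task is maximising -(x - 3)^2, and `combined_score` is the only metric.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Genome:
    content: float                                  # the artifact; here just a number
    scores: dict[str, float] = field(default_factory=dict)


def select(population: list[Genome]) -> Genome:
    # SelectionPolicy (greedy): take the current best
    return max(population, key=lambda g: g.scores["combined_score"])


def build(parent: Genome, memory: list[str]) -> dict[str, str]:
    # PromptBuilder: format the selection plus recent Memory, nothing more
    return {"system": "Improve x.", "user": f"x={parent.content:.3f}; notes={memory[-3:]}"}


def propose(prompt: dict[str, str], parent: Genome, rng: random.Random) -> Genome:
    # Proposer: stands in for an LLM; ignores the prompt and perturbs x at random
    return Genome(content=parent.content + rng.uniform(-1.0, 1.0))


def evaluate(genome: Genome) -> dict[str, float]:
    # Evaluator: recompute the objective from the content, maximising -(x - 3)^2
    return {"combined_score": -(genome.content - 3.0) ** 2}


def run_loop(budget: int, seed: int = 0) -> tuple[list[Genome], list[str]]:
    rng = random.Random(seed)
    root = Genome(content=0.0)
    root.scores = evaluate(root)
    population: list[Genome] = [root]
    memory: list[str] = []
    for _ in range(budget):                          # repeat until the budget is spent
        parent = select(population)
        prompt = build(parent, memory)
        child = propose(prompt, parent, rng)
        child.scores = evaluate(child)
        population.append(child)                     # scored Genome goes back to the Population
        if child.scores["combined_score"] > parent.scores["combined_score"]:
            memory.append(f"moving x to {child.content:.2f} helped")  # optional Memory write
    return population, memory
```

Because each role is a separate function, replacing only `select` (for example with a bandit) changes the method while the other roles stay as they are.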

Genome — the unit of evolution

One candidate solution plus everything needed to select, evaluate, and trace it.

Field	Purpose
`content`	The artifact being evolved — code, a prompt-set, an agent codebase, a config, an idea.
`scores`	The metric dict from the Evaluator (e.g. `{"combined_score": 0.87, "latency": 12.0}`).
`parent_id` / `lineage`	Ancestry for crossover, backtracking, and migration.
`metadata`	Per-candidate data used for selection: feature coordinates (MAP-Elites), an instance-level success vector, an embedding, generation, island id.
`artifacts`	Evaluator side-output (stderr, profiling, traces) fed back into later prompts.
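A sketch of the Genome as a Python dataclass, following the field list above and the `fitness = scores["combined_score"]` convention noted in the implementation mapping. The `id` field and the `child()` helper are assumptions added for illustration, not part of the documented class.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Genome:
    content: Any                                             # code, prompt-set, config, idea...
    scores: dict[str, float] = field(default_factory=dict)   # metric dict from the Evaluator
    parent_id: Optional[str] = None                          # direct parent
    lineage: list[str] = field(default_factory=list)         # all ancestors, oldest first
    metadata: dict[str, Any] = field(default_factory=dict)   # cell, island, generation, ...
    artifacts: dict[str, str] = field(default_factory=dict)  # stderr, profiling, traces
    id: str = field(default_factory=lambda: uuid.uuid4().hex)  # assumption: an id field

    @property
    def fitness(self) -> float:
        return self.scores["combined_score"]

    def child(self, content: Any) -> "Genome":
        # hypothetical helper: a new, unscored Genome that records its ancestry
        return Genome(
            content=content,
            parent_id=self.id,
            lineage=[*self.lineage, self.id],
            metadata={"generation": self.metadata.get("generation", 0) + 1},
        )
```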

Genome vs. Memory

Per-candidate data that drives selection or lineage lives on the Genome; cross-candidate free-form knowledge that guides generation lives in Memory.

1. Population

The candidate store. A passive container that holds Genomes and answers `add()` / `query()`. It owns the structure of the search space but takes no initiative: all policy lives in SelectionPolicy.

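```python
from __future__ import annotations  # annotations are not evaluated, so Genome may live elsewhere

class Population:
    def add(self, genome: Genome) -> None: ...
    def query(self, spec) -> list[Genome]: ...   # by island, by cell, top-k, frontier
    def best(self) -> Genome: ...
```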

Structure	Used by
Islands (sub-populations + migration)	OpenEvolve, AlphaEvolve, ShinkaEvolve, DeepEvolve, AdaEvolve, PAC-Evolve
MAP-Elites grid (quality-diversity)	OpenEvolve, AlphaEvolve, DeepEvolve
Pareto archive	GEPA, Meta-Harness
Git-commit / lineage chain	DGM, CORAL, HyperAgents
Dual-space archive (code + strategy descriptions + embeddings)	SeaEvo, EvoX

This single abstraction must span backends as different as an in-memory SQLite archive and a shared git repository on disk.
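As an illustration of one Population structure, the sketch below is a MAP-Elites grid that keeps a single elite per feature cell and answers `query()` by cell or by top-k. The class name, the `{"cell": ...}` / `{"top_k": ...}` spec format, and the assumption that features are normalised to [0, 1] are all hypothetical. It never decides what to sample; that is left to a SelectionPolicy.

```python
from dataclasses import dataclass, field


@dataclass
class Genome:
    content: str
    scores: dict[str, float]
    metadata: dict = field(default_factory=dict)   # holds the feature coordinates

    @property
    def fitness(self) -> float:
        return self.scores["combined_score"]


class GridPopulation:
    """Hypothetical MAP-Elites archive: one elite per feature cell."""

    def __init__(self, bins: int = 4):
        self.bins = bins
        self.cells: dict[tuple[int, ...], Genome] = {}

    def _cell(self, genome: Genome) -> tuple[int, ...]:
        # discretise each feature (assumed in [0, 1]) into `bins` buckets
        return tuple(min(int(f * self.bins), self.bins - 1) for f in genome.metadata["features"])

    def add(self, genome: Genome) -> None:
        cell = self._cell(genome)
        incumbent = self.cells.get(cell)
        if incumbent is None or genome.fitness > incumbent.fitness:
            self.cells[cell] = genome              # keep only the best Genome per cell

    def query(self, spec: dict) -> list[Genome]:
        if "cell" in spec:
            elite = self.cells.get(tuple(spec["cell"]))
            return [elite] if elite is not None else []
        ranked = sorted(self.cells.values(), key=lambda g: g.fitness, reverse=True)
        return ranked[: spec.get("top_k", len(ranked))]

    def best(self) -> Genome:
        return max(self.cells.values(), key=lambda g: g.fitness)
```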

2. SelectionPolicy

The active, stateful policy. Decides which parents and inspirations to draw from the Population, and how to adapt the search over time. This is where most cross-method differentiation lives.

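```python
from __future__ import annotations  # annotations are not evaluated, so the types may live elsewhere

class SelectionPolicy:
    def select(self, population: Population) -> Selection: ...   # parents + inspirations
    def observe(self, genome: Genome) -> None: ...              # update internal state
```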

Mechanism	Used by
Explore/exploit split + fitness-weighting	OpenEvolve, AlphaEvolve, DeepEvolve
Bandit selection (UCB1 / power-law / beam)	ShinkaEvolve (UCB over LLM ensemble), AdaEvolve (UCB island routing)
Per-instance Pareto frontier sampling	GEPA
Score-proportional + novelty / child-count penalty	DGM, HyperAgents
Momentum-based backtracking (revert to an ancestor on stagnation)	PAC-Evolve
Strategy-as-code (the selection rule is itself an evolved program)	EvoX
Complementarity retrieval (k-means + behavioral coverage)	SeaEvo
Identity / agent-driven (the agent picks its own parent)	Meta-Harness, CORAL
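A simplified explore/exploit policy in the spirit of the first row of the table: with probability `explore_ratio` it picks a parent uniformly, otherwise it samples one with probability proportional to (shifted) fitness, and it returns the best remaining Genomes as inspirations. The class and its parameters are hypothetical, the Population is reduced to a plain list, and the reference methods' samplers differ in detail.

```python
import random
from dataclasses import dataclass


@dataclass
class Genome:
    content: str
    fitness: float


@dataclass
class Selection:
    parents: list[Genome]
    inspirations: list[Genome]


class ExploreExploitSketch:
    """Hypothetical policy: uniform with prob. explore_ratio, else fitness-weighted."""

    def __init__(self, explore_ratio: float = 0.2, n_inspirations: int = 2, seed: int = 0):
        self.explore_ratio = explore_ratio
        self.n_inspirations = n_inspirations
        self.rng = random.Random(seed)

    def select(self, population: list[Genome]) -> Selection:
        if self.rng.random() < self.explore_ratio:
            parent = self.rng.choice(population)                    # explore: uniform
        else:
            low = min(g.fitness for g in population)
            weights = [g.fitness - low + 1e-6 for g in population]  # shift so weights are positive
            parent = self.rng.choices(population, weights=weights, k=1)[0]  # exploit
        others = sorted((g for g in population if g is not parent),
                        key=lambda g: g.fitness, reverse=True)
        return Selection(parents=[parent], inspirations=others[: self.n_inspirations])

    def observe(self, genome: Genome) -> None:
        pass  # stateless: nothing to update
```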

No separate meta-controller

Agent-driven scaffolds (Meta-Harness, CORAL) have no explicit framework policy — the autonomous agent reads the leaderboard and chooses a parent itself. This is not a special mode; it is simply the identity policy (select returns the whole Population and defers the choice to the Proposer). Adaptive controllers (AdaEvolve's intensity, PAC's backtracking, EvoX's strategy evolution) are all just stateful SelectionPolicy implementations.
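The point that adaptive controllers are just stateful SelectionPolicy implementations can be illustrated with a UCB1 bandit that routes selection between islands, loosely in the spirit of AdaEvolve's UCB island routing. This is a hypothetical sketch: the reward (1 if a child beats its island's best so far, else 0), the exploration constant `c`, and the assumption that a child inherits its parent's island are illustrative choices. All adaptation lives in `observe()`.

```python
import math
from dataclasses import dataclass


@dataclass
class Genome:
    content: str
    fitness: float
    island: int


class UCBIslandPolicy:
    """Hypothetical stateful policy: UCB1 over islands, rewarded by improvement."""

    def __init__(self, n_islands: int, c: float = 1.4):
        self.c = c
        self.pulls = [0] * n_islands
        self.reward_sum = [0.0] * n_islands
        self.best = [float("-inf")] * n_islands

    def _ucb(self, i: int) -> float:
        if self.pulls[i] == 0:
            return float("inf")                     # try every island once
        total = sum(self.pulls)
        mean = self.reward_sum[i] / self.pulls[i]
        return mean + self.c * math.sqrt(math.log(total) / self.pulls[i])

    def select(self, population: list[Genome]) -> list[Genome]:
        island = max(range(len(self.pulls)), key=self._ucb)
        members = [g for g in population if g.island == island]
        return [max(members, key=lambda g: g.fitness)] if members else []

    def observe(self, genome: Genome) -> None:
        i = genome.island                           # child assumed to inherit the island
        reward = 1.0 if genome.fitness > self.best[i] else 0.0
        self.best[i] = max(self.best[i], genome.fitness)
        self.pulls[i] += 1
        self.reward_sum[i] += reward
```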

3. PromptBuilder

The renderer. Takes the selected Genomes and the current Memory and formats them into the LLM input, a dict with `"system"` and `"user"` keys. It formats; it does not select what goes in, because that already happened in SelectionPolicy.

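```python
from __future__ import annotations  # annotations are not evaluated, so the types may live elsewhere

class PromptBuilder:
    def build(self, selection: Selection, memory: Memory) -> dict[str, str]: ...
```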

Technique	Used by
Multi-section template (current code + metrics + recent attempts + inspirations + artifacts)	OpenEvolve, AlphaEvolve, ShinkaEvolve, SkyDiscover
Stochastic template variation	OpenEvolve, AlphaEvolve
Reflective dataset from execution traces	GEPA
Diagnosis prompt (analyze logs → problem statement)	DGM
Deep-research planning (query → web search → synthesized report)	DeepEvolve
Mode-aware prompting (explore vs. exploit instructions)	AdaEvolve
Structured articulation (diagnose → strategy → code) + landscape guidance	SeaEvo
Injected meta-knowledge (meta-prompt / meta-scratchpad recommendations)	AlphaEvolve, ShinkaEvolve
Skill file / workflow steering doc	Meta-Harness (`SKILL.md`), CORAL (`CORAL.md`)

SkyDiscover's ContextBuilder is the canonical minimal example: it receives an already-sampled context (program metrics, context programs, previous attempts, errors) and renders five user-message sections — pure formatting, zero selection.
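A minimal PromptBuilder in the same spirit, assuming a `Selection` with `parents` and `inspirations` and Memory already read into a list of note strings (both shapes are assumptions, as are the section titles). It renders sections in a fixed order and never filters or reorders the Genomes it receives.

```python
from dataclasses import dataclass, field


@dataclass
class Genome:
    content: str
    scores: dict[str, float]
    artifacts: dict[str, str] = field(default_factory=dict)


@dataclass
class Selection:
    parents: list[Genome]
    inspirations: list[Genome]


def build(selection: Selection, notes: list[str]) -> dict[str, str]:
    """Hypothetical PromptBuilder: formats what it is given, in order; selects nothing."""
    parent = selection.parents[0]
    sections = [
        ("Current program", parent.content),
        ("Metrics", "\n".join(f"{k}: {v:.4f}" for k, v in parent.scores.items())),
        ("Inspirations", "\n\n".join(g.content for g in selection.inspirations) or "(none)"),
        ("Errors", parent.artifacts.get("stderr", "") or "(none)"),
        ("Notes from Memory", "\n".join(f"- {n}" for n in notes) or "(none)"),
    ]
    user = "\n\n".join(f"## {title}\n{body}" for title, body in sections)
    return {"system": "You improve programs. Reply with a SEARCH/REPLACE diff.", "user": user}
```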

4. Proposer

The variation operator. Produces a new Genome from the built prompt. The interface is deliberately broad enough that an LLM diff, an autonomous CLI agent, and a nested Galapagos scaffold all satisfy it identically.

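```python
from __future__ import annotations  # annotations are not evaluated, so the types may live elsewhere

class Proposer:
    def propose(self, prompt, env: Env) -> Genome: ...
```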

`env` exposes the Evaluator and Memory as tools. This is the key to uniformity: an autonomous agent that evaluates and takes notes mid-run is just a Proposer that uses `env`, not a separate "agent mode."
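A sketch of the `env` idea, where the Evaluator and Memory are handed to the Proposer as callables. The `Env` class, `write_note()`, and the fixed list of drafts (standing in for what an LLM agent would write) are hypothetical. The returned Genome is deliberately unscored, assuming the loop's own Evaluator still scores it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Genome:
    content: str
    scores: dict[str, float] = field(default_factory=dict)


@dataclass
class Env:
    """Hypothetical env: Evaluator and Memory exposed to the Proposer as tools."""
    evaluate: Callable[[Genome], dict[str, float]]
    notes: list[str] = field(default_factory=list)

    def write_note(self, text: str) -> None:
        self.notes.append(text)


class TryThenSubmitProposer:
    """Agent-style Proposer: drafts candidates, scores them via env, submits the best."""

    def __init__(self, drafts: list[str]):
        self.drafts = drafts  # stands in for what an LLM agent would write

    def propose(self, prompt: dict[str, str], env: Env) -> Genome:
        best, best_score = None, float("-inf")
        for text in self.drafts:
            score = env.evaluate(Genome(text))["combined_score"]  # mid-run evaluation
            if score > best_score:
                best, best_score = text, score
        env.write_note(f"best of {len(self.drafts)} drafts scored {best_score:.2f}")
        return Genome(best)  # unscored: the loop's Evaluator still scores it
```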

Operator	Used by
Diff / SEARCH-REPLACE mutation	OpenEvolve, AlphaEvolve, ShinkaEvolve, SkyDiscover, DeepEvolve
Full rewrite	most methods (configurable)
Crossover (combine parent + inspiration)	ShinkaEvolve
Novelty rejection sampling (embed → reject near-duplicates)	ShinkaEvolve
Reflective mutation (rewrite from trace patterns) + merge	GEPA
Agent-as-candidate (the agent edits its own codebase)	DGM, HyperAgents
Autonomous CLI agent as the variation operator	Meta-Harness, CORAL
Two-LLM (solution generator + strategy-program generator)	EvoX
Coder + automatic debugger loop	DeepEvolve

Self-similarity: meta-scaffolds are just nesting

A Galapagos scaffold itself satisfies the Proposer interface. Meta-scaffolds (EvoX's strategy evolution, nested search) are therefore just nesting — a Proposer that happens to run an inner search — not a new concept.
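Self-similarity in code: in this hypothetical sketch a `Scaffold` exposes both `run()` and `propose()`, so an inner scaffold can sit in an outer scaffold's Proposer slot. To keep it short, prompt building is omitted (the parent is passed to `propose` directly) and `env` is reduced to a scoring function; neither simplification reflects the shipped interface.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Genome:
    content: float
    scores: dict[str, float] = field(default_factory=dict)


class MutationProposer:
    """Leaf Proposer: perturbs the parent at random (stands in for an LLM call)."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)

    def propose(self, parent: Genome, env) -> Genome:
        return Genome(parent.content + self.rng.uniform(-1.0, 1.0))


class Scaffold:
    """Hypothetical scaffold that also satisfies the Proposer interface."""

    def __init__(self, proposer, budget: int):
        self.proposer, self.budget = proposer, budget

    def run(self, start: Genome, env) -> Genome:
        best = start
        best.scores = env(best)
        for _ in range(self.budget):
            child = self.proposer.propose(best, env)
            child.scores = env(child)
            if child.scores["combined_score"] > best.scores["combined_score"]:
                best = child
        return best

    def propose(self, parent: Genome, env) -> Genome:
        # Proposer interface: a whole inner search counts as one variation step
        return self.run(Genome(parent.content), env)
```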

5. Evaluator

The deterministic verifiable scorer. Maps a candidate to a metric dict, optionally with artifacts. Owns sandboxing, staging, and multi-objective scoring. In Galapagos the Evaluator is supplied by the task, not the scaffold — so any scaffold can be pointed at any task. It must recompute the objective from the candidate's raw output (never trust a self-reported score), which is what makes a discovery verifiable.

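```python
from __future__ import annotations  # annotations are not evaluated, so the types may live elsewhere

class Evaluator:
    def evaluate(self, genome: Genome) -> Scores: ...   # metrics + artifacts
```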

Technique	Used by
Cascade / staged evaluation with thresholds (cheap → expensive)	OpenEvolve, AlphaEvolve, DGM, HyperAgents
Multi-objective scoring	AlphaEvolve, GEPA, Meta-Harness (accuracy vs. token cost)
Multi-run aggregation + text feedback	ShinkaEvolve
Sandboxed execution (Docker, git worktree)	DGM, CORAL, HyperAgents
LLM-as-judge feedback	OpenEvolve, AlphaEvolve (simplicity / readability)
Instance-level success vectors	SeaEvo, GEPA
Custom interface (`evaluate() -> {"combined_score", ...}`)	DeepEvolve, CORAL (`TaskGrader`)

Artifacts produced here (stderr, profiling, traces) flow back through the Genome into the next PromptBuilder pass.
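A sketch of cascade evaluation that also recomputes the objective instead of trusting a self-reported score. Assumptions: the candidate is an in-process Python function (a real Evaluator would sandbox it, as described above), the task is to reproduce `n * n`, and the two stages are a 5-case smoke test followed by a 1,000-case run gated on a 0.9 pass rate.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Genome:
    content: Callable[[int], int]       # the candidate; here an in-process function
    claimed_score: float = 1.0          # self-reported by the candidate; never read below
    scores: dict[str, float] = field(default_factory=dict)
    artifacts: dict[str, str] = field(default_factory=dict)


def reference(n: int) -> int:
    return n * n                        # ground truth the objective is recomputed against


def pass_rate(fn: Callable[[int], int], cases: range) -> tuple[float, list[int]]:
    failures = []
    for n in cases:
        try:
            if fn(n) != reference(n):
                failures.append(n)
        except Exception:               # a crash counts as a failure
            failures.append(n)
    return 1 - len(failures) / len(cases), failures


def cascade_evaluate(genome: Genome, threshold: float = 0.9) -> Genome:
    # Stage 1 (cheap): a 5-case smoke test; most bad candidates stop here.
    rate, failures = pass_rate(genome.content, range(5))
    if rate < threshold:
        genome.scores = {"combined_score": rate, "stage": 1.0}
        genome.artifacts = {"failures": str(failures)}
        return genome
    # Stage 2 (expensive): the full case set, only for stage-1 survivors.
    rate, failures = pass_rate(genome.content, range(1000))
    genome.scores = {"combined_score": rate, "stage": 2.0}
    genome.artifacts = {"failures": str(failures[:10])}   # truncated prompt feedback
    return genome
```

The `failures` artifact is the kind of side-output that would flow back through the Genome into the next prompt.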

6. Memory

The cross-cutting knowledge store. Holds free-form knowledge — not candidates. It is read by the PromptBuilder early in the loop and written by the Proposer (or a post-evaluation step) late in the loop, so it spans the pipeline rather than sitting at one position.

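```python
from __future__ import annotations  # annotations are not evaluated, so Knowledge may live elsewhere

class Memory:
    def read(self, spec) -> Knowledge: ...
    def write(self, knowledge: Knowledge) -> None: ...
```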

Content	Used by
Meta-scratchpad (periodically synthesized design insights)	ShinkaEvolve
Co-evolved meta-prompts	AlphaEvolve
Global failure log + evolving idea pool	PAC-Evolve
Tactic history + accumulated adaptation signal	AdaEvolve
Strategy history + population-state descriptor	EvoX
Landscape guidance + strategy descriptions	SeaEvo
Shared notes + skills (multi-agent, with heartbeat consolidation)	CORAL
Filesystem / code-embedded memory (traces, JSON memory files)	Meta-Harness, HyperAgents
Idea-evolution chain + research reports	DeepEvolve

Memory vs. Population

Population stores candidates (Genomes) and answers "what solutions exist?" Memory stores knowledge (notes, skills, strategies) and answers "what have we learned?" SkyDiscover folds both into one database; Galapagos splits them so adaptive and multi-agent methods have a clean home for cross-candidate knowledge. Memory is optional: a plain evolutionary loop such as OpenEvolve can run with it empty, since its use of Memory is at most implicit (△ in the coverage matrix).

Coverage matrix

✓ framework-level · △ agent-driven / implicit / partial · ✗ unused

Method	Population	SelectionPolicy	PromptBuilder	Proposer	Evaluator	Memory
OpenEvolve	✓	✓	✓	✓	✓	△
AlphaEvolve	✓	✓	✓	✓	✓	✓
ShinkaEvolve	✓	✓	✓	✓	✓	✓
SkyDiscover	✓	✓	✓	✓	✓	✗
GEPA	✓	✓	✓	✓	✓	△
DGM	✓	✓	✓	✓	✓	△
DeepEvolve	✓	✓	✓	✓	✓	✓
PAC-Evolve	✓	✓	✓	✓	✓	✓
AdaEvolve	✓	✓	✓	✓	✓	✓
EvoX	✓	✓	✓	✓	✓	✓
SeaEvo	✓	✓	✓	✓	✓	✓
Meta-Harness	✓	△	✓	✓	✓	✓
CORAL	✓	△	✓	✓	✓	✓
HyperAgents	✓	✓	✓	✓	✓	✓

Mapping to the implementation

The six components above are conceptual roles. The shipped library, described on the evolutionary loop page, implements each role as an abstract base class in `galapagos.components`, with one or more concrete implementations you plug into a slot. The loop is driven by the orchestrator, `GalapagosScaffold`.

Conceptual role (this page)	Base class	Shipped implementations
`Genome`	—	`Genome` — the dataclass unit of evolution (`content`, `scores`, `parent_id`, `lineage`, `metadata`, `artifacts`; `fitness = scores["combined_score"]`).
`Population`	`Population`	`InMemoryPopulation`; `IslandPopulation` (islands + MAP-Elites cells + ring migration).
`SelectionPolicy`	`SelectionPolicy`	`ExploreExploitPolicy`; `UCBBanditPolicy`; `IdentityPolicy` (delegated / agent-driven — returns the whole pool and defers to the Proposer).
`PromptBuilder`	`PromptBuilder`	`DefaultPromptBuilder` (pure formatting, no selection).
`Proposer`	`Proposer`	`DiffProposer` (SEARCH/REPLACE diff or full rewrite, no-op detection); `CrossoverProposer` (+ token-Jaccard novelty rejection).
`Evaluator`	`Evaluator`	`SubprocessEvaluator` — instantiated by the task as `task.evaluator`; runs the task's `evaluator.py` (the deterministic verifiable scorer) in a subprocess.
`Memory`	`Memory`	`NullMemory` (default — empty); `ScratchpadMemory` (a rolling meta-scratchpad).
orchestrator	—	`GalapagosScaffold` — runs the loop and enforces the budget.
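A sketch of token-Jaccard novelty rejection, the check listed for `CrossoverProposer` in the table above (unlike the embedding-based rejection listed for ShinkaEvolve). The tokeniser, the 0.9 threshold, and the function names are assumptions. Set-based Jaccard ignores token order and repetition, so it is a coarse near-duplicate filter rather than a semantic one.

```python
import re


def tokens(code: str) -> set[str]:
    # crude tokeniser: runs of word characters, plus each punctuation character
    return set(re.findall(r"\w+|[^\w\s]", code))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def is_novel(candidate: str, archive: list[str], threshold: float = 0.9) -> bool:
    """Accept only if the candidate is below `threshold` similarity to every archived program."""
    cand = tokens(candidate)
    return all(jaccard(cand, tokens(old)) < threshold for old in archive)
```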

A method is a choice of which implementation fills each slot — see the evolutionary loop for the OpenEvolve vs. AdaEvolve worked example.

See the Scaffold Card page for how a method declares these six slots in YAML, and the Write your own scaffold page for composing your own scaffold from component instances, `"module.Class"` paths, or `.py` files via `GalapagosScaffold.from_card(population=..., ...)`.