Core components
Galapagos models every test-time-search scaffold — from a plain evolutionary loop to a multi-agent, self-modifying system — as a composition of six components plus one unit type. The same six interfaces express all of the reference methods (OpenEvolve, AlphaEvolve, ShinkaEvolve, SkyDiscover, GEPA, DGM, DeepEvolve, PAC-Evolve, AdaEvolve, EvoX, SeaEvo, Meta-Harness, CORAL, HyperAgents). Differences between methods are differences in which implementation fills each slot — not differences in architecture.
                                ┌──────────────── Memory ─────────────────┐
                                │ free-form knowledge: notes, skills,     │
                                │ scratchpad, landscape, tactics          │
                                └────────────▲────────────────┬───────────┘
                                        read │          write │
┌────────────┐  ┌─────────────────┐  ┌───────┴───────┐  ┌─────┴────┐  ┌───────────┐
│ Population │─▶│ SelectionPolicy │─▶│ PromptBuilder │─▶│ Proposer │─▶│ Evaluator │
└─────▲──────┘  └─────────────────┘  └───────────────┘  └──────────┘  └─────┬─────┘
      └────────────────────────── new scored Genome ◀───────────────────────┘
The loop in one sentence
Each iteration: select parents from the Population → build a prompt from them and Memory → propose a new candidate → evaluate it → add the scored Genome back to the Population (and optionally write what was learned to Memory). Repeat until the budget is spent.
All of the reference methods use Population, PromptBuilder, Proposer, and Evaluator — these are the universal backbone. Methods differentiate almost entirely on two axes: the SelectionPolicy (the adaptive intelligence) and Memory (the accumulated knowledge layer).
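The loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the shipped GalapagosScaffold: the `run_loop`, `toy_evaluate`, and `toy_propose` names are hypothetical, the Population is a plain list, the SelectionPolicy is a two-way tournament, and the Proposer is a random character mutation standing in for an LLM call.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Genome:
    content: str
    scores: dict = field(default_factory=dict)


def fitness(genome: Genome) -> float:
    return genome.scores["combined_score"]


def run_loop(seed, evaluate, propose, budget, rng):
    """Toy select -> build prompt -> propose -> evaluate -> add loop."""
    seed.scores = evaluate(seed.content)
    population = [seed]   # Population: a plain list in this sketch
    memory = []           # Memory: free-form notes
    for _ in range(budget):
        # SelectionPolicy: best of two randomly drawn candidates.
        contenders = rng.sample(population, k=min(2, len(population)))
        parent = max(contenders, key=fitness)
        # PromptBuilder: format the parent and recent notes as LLM input.
        prompt = {
            "system": "Improve the candidate.",
            "user": f"{parent.content}\nNotes: {'; '.join(memory[-3:])}",
        }
        # Proposer: produce new content (an LLM call in a real scaffold).
        child = Genome(content=propose(prompt, rng))
        # Evaluator: score the raw candidate.
        child.scores = evaluate(child.content)
        population.append(child)
        if fitness(child) > fitness(parent):
            memory.append(f"improved to {fitness(child):.2f}")
    return max(population, key=fitness)


def toy_evaluate(content: str) -> dict:
    # Hypothetical objective: fraction of characters equal to "a".
    return {"combined_score": content.count("a") / len(content)}


def toy_propose(prompt: dict, rng: random.Random) -> str:
    # Stand-in for an LLM: mutate one character of the parent text.
    text = prompt["user"].splitlines()[0]
    i = rng.randrange(len(text))
    return text[:i] + rng.choice("ab") + text[i + 1:]


best = run_loop(Genome("bbbbbbbb"), toy_evaluate, toy_propose,
                budget=200, rng=random.Random(0))
print(best.content, best.scores)
```

Because the seed stays in the population, the returned candidate can never score below it; swapping the tournament for another rule changes the method without touching the rest of the loop.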
Genome: the unit of evolution
One candidate solution plus everything needed to select, evaluate, and trace it.
| Field | Purpose |
|---|---|
| content | The artifact being evolved: code, a prompt-set, an agent codebase, a config, an idea. |
| scores | The metric dict from the Evaluator (e.g. {"combined_score": 0.87, "latency": 12.0}). |
| parent_id / lineage | Ancestry for crossover, backtracking, and migration. |
| metadata | Per-candidate data used for selection: feature coordinates (MAP-Elites), an instance-level success vector, an embedding, generation, island id. |
| artifacts | Evaluator side-output (stderr, profiling, traces) fed back into later prompts. |
Genome vs. Memory
Per-candidate data that drives selection or lineage lives on the Genome; cross-candidate free-form knowledge that guides generation lives in Memory.
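As a rough illustration of the fields above, a Genome can be written as a dataclass. This is a sketch, not the shipped class: the `id` field, the `child()` helper, and the `-inf` fallback for a missing `combined_score` are assumptions added for the example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Genome:
    content: str                                              # the artifact being evolved
    scores: dict[str, float] = field(default_factory=dict)    # metric dict from the Evaluator
    parent_id: Optional[str] = None                           # direct ancestor, if any
    lineage: list[str] = field(default_factory=list)          # ancestor ids, oldest first
    metadata: dict[str, Any] = field(default_factory=dict)    # island id, features, embedding
    artifacts: dict[str, str] = field(default_factory=dict)   # stderr, traces, profiling
    id: str = field(default_factory=lambda: uuid.uuid4().hex)  # assumed identifier field

    @property
    def fitness(self) -> float:
        # Unscored candidates rank last (assumed fallback).
        return self.scores.get("combined_score", float("-inf"))

    def child(self, content: str) -> "Genome":
        # Hypothetical helper: derive an unscored child and extend the lineage.
        return Genome(content=content, parent_id=self.id,
                      lineage=self.lineage + [self.id])
```

Keeping ancestry on the Genome is what lets a policy backtrack to an ancestor or migrate a candidate without consulting Memory.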
1. Population
The candidate store. A passive container that holds Genomes and exposes add() and query(). It
owns the structure of the search space but takes no initiative; all policy lives in
SelectionPolicy.
class Population:
    def add(self, genome: Genome) -> None: ...
    def query(self, spec) -> list[Genome]: ...  # by island, by cell, top-k, frontier
    def best(self) -> Genome: ...
| Structure | Used by |
|---|---|
| Islands (sub-populations + migration) | OpenEvolve, AlphaEvolve, ShinkaEvolve, DeepEvolve, AdaEvolve, PAC-Evolve |
| MAP-Elites grid (quality-diversity) | OpenEvolve, AlphaEvolve, DeepEvolve |
| Pareto archive | GEPA, Meta-Harness |
| Git-commit / lineage chain | DGM, CORAL, HyperAgents |
| Dual-space archive (code + strategy descriptions + embeddings) | SeaEvo, EvoX |
This single abstraction must span backends as different as an in-memory SQLite archive and a shared git repository on disk.
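A minimal list-backed store shows how little the interface demands. This is a sketch under assumptions: the `ListPopulation` name and the `spec` keys (`"island"`, `"top_k"`) are hypothetical, chosen only to illustrate the "by island" and "top-k" query styles mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Genome:
    content: str
    scores: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)


class ListPopulation:
    """Passive store: it filters and ranks on request but never chooses parents."""

    def __init__(self) -> None:
        self._genomes: list[Genome] = []

    def add(self, genome: Genome) -> None:
        self._genomes.append(genome)

    def query(self, spec: dict) -> list[Genome]:
        pool = self._genomes
        if "island" in spec:  # hypothetical key: filter by metadata["island"]
            pool = [g for g in pool if g.metadata.get("island") == spec["island"]]
        ranked = sorted(pool, key=lambda g: g.scores["combined_score"], reverse=True)
        # hypothetical key: keep only the top_k best-scoring candidates
        return ranked[: spec["top_k"]] if "top_k" in spec else ranked

    def best(self) -> Genome:
        return max(self._genomes, key=lambda g: g.scores["combined_score"])
```

A git-backed or grid-backed store would answer the same three calls, which is why the SelectionPolicy does not need to know which backend it is reading.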
2. SelectionPolicy
The active, stateful policy. Decides which parents and inspirations to draw from the Population, and how to adapt the search over time. This is where most cross-method differentiation lives.
class SelectionPolicy:
    def select(self, population: Population) -> Selection: ...  # parents + inspirations
    def observe(self, genome: Genome) -> None: ...  # update internal state
| Mechanism | Used by |
|---|---|
| Explore/exploit split + fitness-weighting | OpenEvolve, AlphaEvolve, DeepEvolve |
| Bandit selection (UCB1 / power-law / beam) | ShinkaEvolve (UCB over LLM ensemble), AdaEvolve (UCB island routing) |
| Per-instance Pareto frontier sampling | GEPA |
| Score-proportional + novelty / child-count penalty | DGM, HyperAgents |
| Momentum-based backtracking (revert to an ancestor on stagnation) | PAC-Evolve |
| Strategy-as-code (the selection rule is itself an evolved program) | EvoX |
| Complementarity retrieval (k-means + behavioral coverage) | SeaEvo |
| Identity / agent-driven (the agent picks its own parent) | Meta-Harness, CORAL |
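To make the bandit row concrete, here is the textbook UCB1 rule applied to a set of arms, which could be islands or LLMs in an ensemble. It is a generic sketch, not the ShinkaEvolve or AdaEvolve implementation; the `UCB1` class and the arm names are hypothetical.

```python
import math


class UCB1:
    """Textbook UCB1: pick the arm with the highest mean reward plus exploration bonus."""

    def __init__(self, arms, c: float = math.sqrt(2)):
        self.counts = {arm: 0 for arm in arms}
        self.totals = {arm: 0.0 for arm in arms}
        self.c = c

    def select(self):
        # Play every arm once before trusting the statistics.
        for arm, n in self.counts.items():
            if n == 0:
                return arm
        t = sum(self.counts.values())

        def upper_bound(arm):
            n = self.counts[arm]
            return self.totals[arm] / n + self.c * math.sqrt(math.log(t) / n)

        return max(self.counts, key=upper_bound)

    def observe(self, arm, reward: float) -> None:
        self.counts[arm] += 1
        self.totals[arm] += reward


# Usage: island_1 always pays off, island_0 never does.
bandit = UCB1(["island_0", "island_1"])
for _ in range(50):
    arm = bandit.select()
    bandit.observe(arm, 1.0 if arm == "island_1" else 0.0)
```

The select/observe pair maps directly onto the SelectionPolicy interface: `select` routes the next proposal, and `observe` feeds the resulting score back as reward.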
No separate meta-controller
Agent-driven scaffolds (Meta-Harness, CORAL) have no explicit framework policy — the
autonomous agent reads the leaderboard and chooses a parent itself. This is not a special mode;
it is simply the identity policy (select returns the whole Population and defers the choice
to the Proposer). Adaptive controllers (AdaEvolve's intensity, PAC's backtracking, EvoX's
strategy evolution) are all just stateful SelectionPolicy implementations.
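The contrast between the identity policy and a stateful controller can be sketched as two classes behind the same two methods. Everything here is illustrative: the `Selection` container, the dict-shaped candidates, and `StagnationAwarePolicy` are assumptions made for the example, not the library's classes.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Selection:
    parents: list
    inspirations: list = field(default_factory=list)


class IdentityPolicy:
    """Stateless: hands over the whole pool and lets the Proposer choose."""

    def select(self, population: list) -> Selection:
        return Selection(parents=list(population))

    def observe(self, genome: dict) -> None:
        pass  # nothing to adapt


class StagnationAwarePolicy:
    """Stateful: exploits the best candidate, explores after repeated non-improvement."""

    def __init__(self, rng: random.Random, patience: int = 3):
        self.rng = rng
        self.patience = patience
        self.best = float("-inf")
        self.stale = 0

    def select(self, population: list) -> Selection:
        ranked = sorted(population, key=lambda g: g["score"], reverse=True)
        exploring = self.stale >= self.patience
        parent = self.rng.choice(ranked) if exploring else ranked[0]
        inspirations = [g for g in ranked[:3] if g is not parent][:2]
        return Selection(parents=[parent], inspirations=inspirations)

    def observe(self, genome: dict) -> None:
        if genome["score"] > self.best:
            self.best, self.stale = genome["score"], 0
        else:
            self.stale += 1
```

Both classes satisfy the same interface, so the orchestrator does not need a separate code path for agent-driven scaffolds.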
3. PromptBuilder
The renderer. Takes the selected Genomes and the current Memory and formats them into the LLM
input ({"system", "user"}). It formats; it does not select what goes in — that already
happened in SelectionPolicy.
| Technique | Used by |
|---|---|
| Multi-section template (current code + metrics + recent attempts + inspirations + artifacts) | OpenEvolve, AlphaEvolve, ShinkaEvolve, SkyDiscover |
| Stochastic template variation | OpenEvolve, AlphaEvolve |
| Reflective dataset from execution traces | GEPA |
| Diagnosis prompt (analyze logs → problem statement) | DGM |
| Deep-research planning (query → web search → synthesized report) | DeepEvolve |
| Mode-aware prompting (explore vs. exploit instructions) | AdaEvolve |
| Structured articulation (diagnose → strategy → code) + landscape guidance | SeaEvo |
| Injected meta-knowledge (meta-prompt / meta-scratchpad recommendations) | AlphaEvolve, ShinkaEvolve |
| Skill file / workflow steering doc | Meta-Harness (SKILL.md), CORAL (CORAL.md) |
SkyDiscover's ContextBuilder is the canonical minimal example: it receives an already-sampled
context (program metrics, context programs, previous attempts, errors) and renders five
user-message sections — pure formatting, zero selection.
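A formatting-only builder might look like the sketch below. The section titles, the `build_prompt` name, and the dict-shaped inputs are assumptions for illustration; this is not SkyDiscover's ContextBuilder or the shipped DefaultPromptBuilder.

```python
def build_prompt(parent: dict, inspirations: list[dict], notes: list[str]) -> dict[str, str]:
    """Render already-selected inputs into {"system", "user"}; no selection happens here."""
    sections = [
        "## Current candidate\n" + parent["content"],
        "## Metrics\n" + "\n".join(
            f"{name}: {value}" for name, value in sorted(parent["scores"].items())
        ),
    ]
    if inspirations:
        sections.append(
            "## Inspirations\n" + "\n---\n".join(g["content"] for g in inspirations)
        )
    if parent.get("artifacts"):
        sections.append(
            "## Artifacts\n" + "\n".join(
                f"{name}: {text}" for name, text in parent["artifacts"].items()
            )
        )
    if notes:
        sections.append("## Notes from Memory\n" + "\n".join(f"- {n}" for n in notes))
    return {
        "system": "You improve candidate solutions. Reply with the full revised candidate.",
        "user": "\n\n".join(sections),
    }
```

The function receives everything it renders as arguments, so changing which parents or notes appear in the prompt means changing the SelectionPolicy or the Memory read, not this code.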
4. Proposer
The variation operator. Produces a new Genome from the built prompt. The interface is
deliberately broad enough that an LLM diff, an autonomous CLI agent, and a nested Galapagos
scaffold all satisfy it identically.
env exposes the Evaluator and Memory as tools. This is the key to uniformity: an autonomous agent
that evaluates and takes notes mid-run is just a Proposer that uses env — not a separate "agent
mode."
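One way to picture this uniformity is a single `propose(prompt, env)` method implemented by both a one-shot generator and an agent-like loop. The signature, the `Env` container, and both classes are assumptions for the sketch (the library's actual interface may differ), and the Proposers here return raw content rather than a full Genome.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Env:
    # Hypothetical tool handles exposed to the Proposer.
    evaluate: Callable[[str], dict]
    write_note: Callable[[str], None]


class Proposer(Protocol):
    def propose(self, prompt: dict, env: Env) -> str: ...


class OneShotProposer:
    """Plain LLM-style proposer: one generation, never touches env."""

    def __init__(self, generate: Callable[[dict], str]):
        self.generate = generate

    def propose(self, prompt: dict, env: Env) -> str:
        return self.generate(prompt)


class TryAndKeepBestProposer:
    """Agent stand-in: drafts several candidates, scores them via env, leaves a note."""

    def __init__(self, generate: Callable[[dict], str], attempts: int = 3):
        self.generate = generate
        self.attempts = attempts

    def propose(self, prompt: dict, env: Env) -> str:
        best, best_score = "", float("-inf")
        for attempt in range(self.attempts):
            draft = self.generate({**prompt, "attempt": attempt})
            score = env.evaluate(draft)["combined_score"]
            if score > best_score:
                best, best_score = draft, score
        env.write_note(f"best of {self.attempts} drafts scored {best_score}")
        return best
```

The orchestrator calls both classes the same way; only the second one spends evaluations and writes to Memory while it works.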
| Operator | Used by |
|---|---|
| Diff / SEARCH-REPLACE mutation | OpenEvolve, AlphaEvolve, ShinkaEvolve, SkyDiscover, DeepEvolve |
| Full rewrite | most methods (configurable) |
| Crossover (combine parent + inspiration) | ShinkaEvolve |
| Novelty rejection sampling (embed → reject near-duplicates) | ShinkaEvolve |
| Reflective mutation (rewrite from trace patterns) + merge | GEPA |
| Agent-as-candidate (the agent edits its own codebase) | DGM, HyperAgents |
| Autonomous CLI agent as the variation operator | Meta-Harness, CORAL |
| Two-LLM (solution generator + strategy-program generator) | EvoX |
| Coder + automatic debugger loop | DeepEvolve |
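The diff operator in the first row can be illustrated with a small SEARCH/REPLACE applier. The marker syntax below is one common convention and is assumed here, as is the `apply_diff` name; the reference methods may use different delimiters.

```python
import re

# Assumed block format:
# <<<<<<< SEARCH
# old text
# =======
# new text
# >>>>>>> REPLACE
BLOCK = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE", re.DOTALL
)


def apply_diff(source: str, diff: str) -> tuple[str, bool]:
    """Apply each SEARCH/REPLACE block once; report whether anything changed."""
    changed = source
    for search, replace in BLOCK.findall(diff):
        if search in changed:
            changed = changed.replace(search, replace, 1)
    return changed, changed != source
```

The boolean flags no-op proposals, where no block matched or the replacement was identical, so the loop can skip an evaluation that would only rescore the parent.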
Self-similarity: meta-scaffolds are just nesting
A Galapagos scaffold itself satisfies the Proposer interface. Meta-scaffolds (EvoX's strategy
evolution, nested search) are therefore just nesting — a Proposer that happens to run an inner
search — not a new concept.
5. Evaluator
The deterministic verifiable scorer. Maps a candidate to a metric dict, optionally with artifacts. Owns sandboxing, staging, and multi-objective scoring. In Galapagos the Evaluator is supplied by the task, not the scaffold — so any scaffold can be pointed at any task. It must recompute the objective from the candidate's raw output (never trust a self-reported score), which is what makes a discovery verifiable.
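The "never trust a self-reported score" rule can be shown on a toy task. The knapsack instance, the JSON output format, and the `valid` metric are all hypothetical; the point is only that the objective is recomputed from the raw output and the claimed score is ignored.

```python
import json

ITEMS = [(3, 4.0), (4, 5.0), (2, 3.0)]  # hypothetical task data: (weight, value)
CAPACITY = 5

INVALID = {"combined_score": 0.0, "valid": 0.0}


def evaluate(raw_output: str) -> dict:
    """Recompute the objective from the candidate's raw output."""
    try:
        data = json.loads(raw_output)
        chosen = sorted({int(i) for i in data["chosen"]})
        if any(i < 0 for i in chosen):
            raise IndexError("negative item index")
        weight = sum(ITEMS[i][0] for i in chosen)
        value = sum(ITEMS[i][1] for i in chosen)
    except (ValueError, KeyError, IndexError, TypeError):
        return dict(INVALID)  # malformed output scores zero
    if weight > CAPACITY:
        return dict(INVALID)  # infeasible selection scores zero
    # Any "claimed_score" field in the output is deliberately never read.
    return {"combined_score": value, "valid": 1.0}
```

A candidate that reports an inflated score gains nothing, because only the recomputed value reaches the Population.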
| Technique | Used by |
|---|---|
| Cascade / staged evaluation with thresholds (cheap → expensive) | OpenEvolve, AlphaEvolve, DGM, HyperAgents |
| Multi-objective scoring | AlphaEvolve, GEPA, Meta-Harness (accuracy vs. token cost) |
| Multi-run aggregation + text feedback | ShinkaEvolve |
| Sandboxed execution (Docker, git worktree) | DGM, CORAL, HyperAgents |
| LLM-as-judge feedback | OpenEvolve, AlphaEvolve (simplicity / readability) |
| Instance-level success vectors | SeaEvo, GEPA |
| Custom interface (evaluate() -> {"combined_score", ...}) | DeepEvolve, CORAL (TaskGrader) |
Artifacts produced here (stderr, profiling, traces) flow back through the Genome into the next
PromptBuilder pass.
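Cascade evaluation, the first technique in the table above, can be sketched as a list of stages with pass thresholds. The `cascade_evaluate` name, the stage tuple layout, the zero score on early exit, and the `stopped_at` artifact are assumptions made for this example.

```python
from typing import Callable

# (stage name, scorer, minimum score required to continue)
Stage = tuple[str, Callable[[str], float], float]


def cascade_evaluate(candidate: str, stages: list[Stage]) -> tuple[dict, dict]:
    """Run cheap stages first; stop as soon as one falls below its threshold."""
    if not stages:
        raise ValueError("at least one stage is required")
    scores: dict[str, float] = {}
    artifacts: dict[str, str] = {}
    for name, scorer, threshold in stages:
        scores[name] = scorer(candidate)
        if scores[name] < threshold:
            artifacts["stopped_at"] = name   # recorded so a later prompt can mention it
            scores["combined_score"] = 0.0   # assumed convention for early exits
            return scores, artifacts
    # Assumed convention: the final stage's score is the overall score.
    scores["combined_score"] = scores[stages[-1][0]]
    return scores, artifacts
```

Candidates that fail a cheap check never reach the expensive stage, and the returned artifacts give the next PromptBuilder pass something concrete to report.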
6. Memory
The cross-cutting knowledge store. Holds free-form knowledge — not candidates. It is read by the PromptBuilder early in the loop and written by the Proposer (or a post-evaluation step) late in the loop, so it spans the pipeline rather than sitting at one position.
class Memory:
    def read(self, spec) -> Knowledge: ...
    def write(self, knowledge: Knowledge) -> None: ...
| Content | Used by |
|---|---|
| Meta-scratchpad (periodically synthesized design insights) | ShinkaEvolve |
| Co-evolved meta-prompts | AlphaEvolve |
| Global failure log + evolving idea pool | PAC-Evolve |
| Tactic history + accumulated adaptation signal | AdaEvolve |
| Strategy history + population-state descriptor | EvoX |
| Landscape guidance + strategy descriptions | SeaEvo |
| Shared notes + skills (multi-agent, with heartbeat consolidation) | CORAL |
| Filesystem / code-embedded memory (traces, JSON memory files) | Meta-Harness, HyperAgents |
| Idea-evolution chain + research reports | DeepEvolve |
Memory vs. Population
Population stores candidates (Genomes) and answers "what solutions exist?" Memory stores knowledge (notes, skills, strategies) and answers "what have we learned?" SkyDiscover folds both into one database; Galapagos splits them so adaptive and multi-agent methods have a clean home for cross-candidate knowledge. Memory is optional — a plain evolutionary loop (e.g. OpenEvolve) leaves it empty.
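The optional nature of Memory is easy to see with two small stores behind the same read/write pair. Both classes are sketches under assumptions: `RollingScratchpad` only approximates the idea of a rolling meta-scratchpad, and the `"last"` spec key is hypothetical.

```python
from collections import deque
from typing import Optional


class EmptyMemory:
    """Plain evolutionary loop: nothing is remembered between iterations."""

    def read(self, spec: Optional[dict] = None) -> list[str]:
        return []

    def write(self, knowledge: str) -> None:
        pass


class RollingScratchpad:
    """Keeps only the most recent notes; older ones fall off the end."""

    def __init__(self, max_notes: int = 5):
        self._notes: deque[str] = deque(maxlen=max_notes)

    def read(self, spec: Optional[dict] = None) -> list[str]:
        notes = list(self._notes)
        last = (spec or {}).get("last", len(notes))  # hypothetical spec key
        return notes[-last:] if last > 0 else []

    def write(self, knowledge: str) -> None:
        self._notes.append(knowledge)
```

Neither class stores a candidate: what goes in is a note about the search, which is the line this page draws between Memory and Population.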
Coverage matrix
✓ framework-level · △ agent-driven / implicit / partial · ✗ unused
| Method | Population | SelectionPolicy | PromptBuilder | Proposer | Evaluator | Memory |
|---|---|---|---|---|---|---|
| OpenEvolve | ✓ | ✓ | ✓ | ✓ | ✓ | △ |
| AlphaEvolve | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| ShinkaEvolve | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| SkyDiscover | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| GEPA | ✓ | ✓ | ✓ | ✓ | ✓ | △ |
| DGM | ✓ | ✓ | ✓ | ✓ | ✓ | △ |
| DeepEvolve | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| PAC-Evolve | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| AdaEvolve | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| EvoX | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| SeaEvo | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Meta-Harness | ✓ | △ | ✓ | ✓ | ✓ | ✓ |
| CORAL | ✓ | △ | ✓ | ✓ | ✓ | ✓ |
| HyperAgents | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Mapping to the implementation
The six components above are conceptual roles. The shipped library
(see the evolutionary loop) implements each role as an abstract base class in
galapagos.components, with one or more concrete implementations you plug into a slot. The
orchestrator, GalapagosScaffold, drives the loop.
| Conceptual role (this page) | Base class | Shipped implementations |
|---|---|---|
| Genome | — | Genome: the dataclass unit of evolution (content, scores, parent_id, lineage, metadata, artifacts; fitness = scores["combined_score"]). |
| Population | Population | InMemoryPopulation; IslandPopulation (islands + MAP-Elites cells + ring migration). |
| SelectionPolicy | SelectionPolicy | ExploreExploitPolicy; UCBBanditPolicy; IdentityPolicy (delegated / agent-driven: returns the whole pool and defers to the Proposer). |
| PromptBuilder | PromptBuilder | DefaultPromptBuilder (pure formatting, no selection). |
| Proposer | Proposer | DiffProposer (SEARCH/REPLACE diff or full rewrite, no-op detection); CrossoverProposer (+ token-Jaccard novelty rejection). |
| Evaluator | Evaluator | SubprocessEvaluator: instantiated by the task as task.evaluator; runs the task's evaluator.py (the deterministic verifiable scorer) in a subprocess. |
| Memory | Memory | NullMemory (default, empty); ScratchpadMemory (a rolling meta-scratchpad). |
| orchestrator | — | GalapagosScaffold — runs the loop and enforces the budget. |
A method is a choice of which implementation fills each slot — see the evolutionary loop for the OpenEvolve vs. AdaEvolve worked example.
See the Scaffold Card for how a method declares these six slots in YAML,
and Write your own scaffold to compose your own from component
instances, "module.Class" paths, or .py files via
GalapagosScaffold.from_card(population=..., ...).