Core components

Galapagos models every test-time-search scaffold — from a plain evolutionary loop to a multi-agent, self-modifying system — as a composition of six components plus one unit type. The same six interfaces express all of the reference methods (OpenEvolve, AlphaEvolve, ShinkaEvolve, SkyDiscover, GEPA, DGM, DeepEvolve, PAC-Evolve, AdaEvolve, EvoX, SeaEvo, Meta-Harness, CORAL, HyperAgents). Differences between methods are differences in which implementation fills each slot — not differences in architecture.

                    ┌──────────────── Memory ────────────────┐
                    │  free-form knowledge: notes, skills,    │
                    │  scratchpad, landscape, tactics         │
                    └────▲───────────────────────────┬────────┘
                    read │                      write │
   ┌────────────┐  ┌─────┴──────────┐  ┌─────────────┐  ┌──────────┐  ┌───────────┐
   │ Population │─▶│ SelectionPolicy │─▶│ PromptBuilder │─▶│ Proposer │─▶│ Evaluator │
   └─────▲──────┘  └────────────────┘  └─────────────┘  └──────────┘  └─────┬─────┘
         └──────────────────────── new scored Genome ◀──────────────────────┘

The loop in one sentence

Each iteration: select parents from the Population → build a prompt from them and Memory → propose a new candidate → evaluate it → add the scored Genome back to the Population (and optionally write what was learned to Memory). Repeat until the budget is spent.
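The sentence above can be sketched as a plain function. This is a minimal illustration, not the shipped orchestrator: `run_loop`, the toy integer task, and the random-step `propose` (standing in for an LLM call) are all assumptions made for the example, and a list stands in for both Population and Memory.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Genome:
    content: str
    scores: dict = field(default_factory=dict)


def run_loop(seed, select, build_prompt, propose, evaluate, budget):
    """Run select -> build -> propose -> evaluate -> add for `budget` iterations."""
    population = [seed]   # stands in for Population
    memory = []           # stands in for Memory (free-form notes)
    for _ in range(budget):
        parents = select(population)
        prompt = build_prompt(parents, memory)
        child = propose(prompt)
        child.scores = evaluate(child)
        population.append(child)
        memory.append(f"{child.content} scored {child.scores['combined_score']}")
    return max(population, key=lambda g: g.scores["combined_score"])


# Toy task (hypothetical): evolve an integer, written as text, toward 42.
def evaluate(genome):
    return {"combined_score": -abs(int(genome.content) - 42)}


def select(population):
    # Greedy selection: always continue from the current best candidate.
    return [max(population, key=lambda g: g.scores["combined_score"])]


def build_prompt(parents, memory):
    # Pure formatting; this toy builder ignores the notes in memory.
    return {"system": "Propose a nearby integer.", "user": parents[0].content}


rng = random.Random(0)


def propose(prompt):
    # A random step stands in for the LLM call.
    return Genome(str(int(prompt["user"]) + rng.choice([-3, -1, 1, 3])))


seed = Genome("0")
seed.scores = evaluate(seed)
best = run_loop(seed, select, build_prompt, propose, evaluate, budget=200)
```

Swapping any one of the four callables changes the method without touching `run_loop`, which is the point of the slot-based design.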

All of the reference methods use Population, PromptBuilder, Proposer, and Evaluator — these are the universal backbone. Methods differentiate almost entirely on two axes: the SelectionPolicy (the adaptive intelligence) and Memory (the accumulated knowledge layer).


Genome — the unit of evolution

One candidate solution plus everything needed to select, evaluate, and trace it.

| Field | Purpose |
| --- | --- |
| content | The artifact being evolved: code, a prompt-set, an agent codebase, a config, an idea. |
| scores | The metric dict from the Evaluator (e.g. {"combined_score": 0.87, "latency": 12.0}). |
| parent_id / lineage | Ancestry for crossover, backtracking, and migration. |
| metadata | Per-candidate data used for selection: feature coordinates (MAP-Elites), an instance-level success vector, an embedding, generation, island id. |
| artifacts | Evaluator side-output (stderr, profiling, traces) fed back into later prompts. |

Genome vs. Memory

Per-candidate data that drives selection or lineage lives on the Genome; cross-candidate free-form knowledge that guides generation lives in Memory.
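As a rough sketch, the fields listed above map onto a dataclass like the following. The field names and `fitness = scores["combined_score"]` come from this page; the type annotations, the defaults, and the `-inf` fallback for an unscored candidate are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Genome:
    content: Any                                              # the artifact being evolved
    scores: dict = field(default_factory=dict)                # metric dict from the Evaluator
    parent_id: Optional[str] = None                           # direct ancestor, if any
    lineage: list = field(default_factory=list)               # full ancestry chain
    metadata: dict = field(default_factory=dict)              # island id, generation, embedding, ...
    artifacts: dict = field(default_factory=dict)             # stderr, profiling, traces

    @property
    def fitness(self) -> float:
        # Assumption: an unscored Genome ranks below every scored one.
        return self.scores.get("combined_score", float("-inf"))


g = Genome(
    content="def f(x): return x",
    scores={"combined_score": 0.87, "latency": 12.0},
    metadata={"island": 2, "generation": 5},
)
```

Note that nothing on the Genome is cross-candidate: a note such as "recursion tends to time out" would belong in Memory, not in `metadata`.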


1. Population

The candidate store. A passive container that holds Genomes and answers add() / query(). It owns the structure of the search space but takes no initiative — all policy lives in SelectionPolicy.

class Population:
    def add(self, genome: Genome) -> None: ...
    def query(self, spec) -> list[Genome]: ...   # by island, by cell, top-k, frontier
    def best(self) -> Genome: ...

| Structure | Used by |
| --- | --- |
| Islands (sub-populations + migration) | OpenEvolve, AlphaEvolve, ShinkaEvolve, DeepEvolve, AdaEvolve, PAC-Evolve |
| MAP-Elites grid (quality-diversity) | OpenEvolve, AlphaEvolve, DeepEvolve |
| Pareto archive | GEPA, Meta-Harness |
| Git-commit / lineage chain | DGM, CORAL, HyperAgents |
| Dual-space archive (code + strategy descriptions + embeddings) | SeaEvo, EvoX |

This single abstraction must span backends as different as an in-memory SQLite archive and a shared git repository on disk.
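To make the "passive container" idea concrete, here is a minimal list-backed sketch of the interface. `ListPopulation` and the dict-shaped `spec` (with optional `island` and `top_k` keys) are hypothetical; the page does not define what a query spec looks like.

```python
from dataclasses import dataclass, field


@dataclass
class Genome:
    content: str
    scores: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)


class ListPopulation:
    """Passive store: holds Genomes and answers queries, with no policy of its own."""

    def __init__(self):
        self._genomes = []

    def add(self, genome):
        self._genomes.append(genome)

    def query(self, spec):
        # Hypothetical spec: {"island": int, "top_k": int}; both keys are optional.
        pool = self._genomes
        if "island" in spec:
            pool = [g for g in pool if g.metadata.get("island") == spec["island"]]
        ranked = sorted(pool, key=lambda g: g.scores["combined_score"], reverse=True)
        return ranked[: spec.get("top_k", len(ranked))]

    def best(self):
        return max(self._genomes, key=lambda g: g.scores["combined_score"])


pop = ListPopulation()
pop.add(Genome("a", {"combined_score": 0.2}, {"island": 0}))
pop.add(Genome("b", {"combined_score": 0.9}, {"island": 1}))
pop.add(Genome("c", {"combined_score": 0.5}, {"island": 0}))
top_of_island_0 = pop.query({"island": 0, "top_k": 1})
```

The store decides how candidates are indexed (here, by island), but which island to query is left entirely to the caller.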


2. SelectionPolicy

The active, stateful policy. Decides which parents and inspirations to draw from the Population, and how to adapt the search over time. This is where most cross-method differentiation lives.

class SelectionPolicy:
    def select(self, population: Population) -> Selection: ...   # parents + inspirations
    def observe(self, genome: Genome) -> None: ...              # update internal state

| Mechanism | Used by |
| --- | --- |
| Explore/exploit split + fitness-weighting | OpenEvolve, AlphaEvolve, DeepEvolve |
| Bandit selection (UCB1 / power-law / beam) | ShinkaEvolve (UCB over LLM ensemble), AdaEvolve (UCB island routing) |
| Per-instance Pareto frontier sampling | GEPA |
| Score-proportional + novelty / child-count penalty | DGM, HyperAgents |
| Momentum-based backtracking (revert to an ancestor on stagnation) | PAC-Evolve |
| Strategy-as-code (the selection rule is itself an evolved program) | EvoX |
| Complementarity retrieval (k-means + behavioral coverage) | SeaEvo |
| Identity / agent-driven (the agent picks its own parent) | Meta-Harness, CORAL |

No separate meta-controller

Agent-driven scaffolds (Meta-Harness, CORAL) have no explicit framework policy: the autonomous agent reads the leaderboard and chooses a parent itself. This is not a special mode; it is simply the identity policy (select returns the whole Population and defers the choice to the Proposer). Adaptive controllers (AdaEvolve's intensity, PAC-Evolve's backtracking, EvoX's strategy evolution) are all just stateful SelectionPolicy implementations.
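The contrast between a stateless identity policy and a stateful adaptive one can be sketched side by side. Both classes below are illustrative assumptions: the population is a plain list, `select` returns a list rather than a `Selection`, and the bandit is textbook UCB1 over islands (rewards assumed to lie in [0, 1]), not a reproduction of any listed method's routing rule.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Genome:
    content: str
    scores: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)


class IdentityPolicy:
    """Returns the whole pool; the Proposer (an agent) picks its own parent."""

    def select(self, population):
        return list(population)

    def observe(self, genome):
        pass  # stateless: nothing to update


class UCB1IslandPolicy:
    """Routes each iteration to the island with the highest UCB1 index."""

    def __init__(self, n_islands):
        self.pulls = [0] * n_islands
        self.reward_sum = [0.0] * n_islands

    def pick_island(self):
        for island, n in enumerate(self.pulls):
            if n == 0:
                return island  # try every island once before using the index
        total = sum(self.pulls)

        def index(i):
            mean = self.reward_sum[i] / self.pulls[i]
            bonus = math.sqrt(2 * math.log(total) / self.pulls[i])
            return mean + bonus

        return max(range(len(self.pulls)), key=index)

    def select(self, population):
        island = self.pick_island()
        return [g for g in population if g.metadata.get("island") == island]

    def observe(self, genome):
        # Credit the new candidate's score to the island it was bred on.
        island = genome.metadata["island"]
        self.pulls[island] += 1
        self.reward_sum[island] += genome.scores["combined_score"]


pool = [
    Genome("a", {"combined_score": 0.1}, {"island": 0}),
    Genome("b", {"combined_score": 0.9}, {"island": 1}),
]
policy = UCB1IslandPolicy(n_islands=2)
for g in pool:
    policy.observe(g)
chosen = policy.select(pool)
```

Both classes expose the same two methods, so the loop does not need to know whether adaptation is happening inside the policy or inside the agent.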


3. PromptBuilder

The renderer. Takes the selected Genomes and the current Memory and formats them into the LLM input ({"system", "user"}). It formats; it does not select what goes in — that already happened in SelectionPolicy.

class PromptBuilder:
    def build(self, selection: Selection, memory: Memory) -> dict[str, str]: ...

| Technique | Used by |
| --- | --- |
| Multi-section template (current code + metrics + recent attempts + inspirations + artifacts) | OpenEvolve, AlphaEvolve, ShinkaEvolve, SkyDiscover |
| Stochastic template variation | OpenEvolve, AlphaEvolve |
| Reflective dataset from execution traces | GEPA |
| Diagnosis prompt (analyze logs → problem statement) | DGM |
| Deep-research planning (query → web search → synthesized report) | DeepEvolve |
| Mode-aware prompting (explore vs. exploit instructions) | AdaEvolve |
| Structured articulation (diagnose → strategy → code) + landscape guidance | SeaEvo |
| Injected meta-knowledge (meta-prompt / meta-scratchpad recommendations) | AlphaEvolve, ShinkaEvolve |
| Skill file / workflow steering doc | Meta-Harness (SKILL.md), CORAL (CORAL.md) |

SkyDiscover's ContextBuilder is the canonical minimal example: it receives an already-sampled context (program metrics, context programs, previous attempts, errors) and renders five user-message sections — pure formatting, zero selection.
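A multi-section builder in the spirit described above might look like the sketch below. It is not SkyDiscover's ContextBuilder: the `Selection` shape (one parent plus inspirations), the section headings, and passing Memory as a list of note strings are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Genome:
    content: str
    scores: dict = field(default_factory=dict)
    artifacts: dict = field(default_factory=dict)


@dataclass
class Selection:  # hypothetical shape: one parent plus optional inspirations
    parent: Genome
    inspirations: list = field(default_factory=list)


class SectionPromptBuilder:
    SYSTEM = "You improve programs. Reply with a full rewrite."

    def build(self, selection, memory_notes):
        """Format what was already selected; never choose what goes in."""
        parent = selection.parent
        metrics = "\n".join(f"{k}: {v}" for k, v in sorted(parent.scores.items()))
        sections = [
            "## Current program\n" + parent.content,
            "## Metrics\n" + metrics,
        ]
        if selection.inspirations:
            bodies = "\n---\n".join(g.content for g in selection.inspirations)
            sections.append("## Inspirations\n" + bodies)
        if parent.artifacts:
            lines = "\n".join(f"[{k}] {v}" for k, v in parent.artifacts.items())
            sections.append("## Artifacts\n" + lines)
        if memory_notes:
            sections.append("## Notes\n" + "\n".join(f"- {n}" for n in memory_notes))
        return {"system": self.SYSTEM, "user": "\n\n".join(sections)}


parent = Genome("def f(x): return x", {"combined_score": 0.4}, {"stderr": "warning: slow"})
prompt = SectionPromptBuilder().build(Selection(parent), ["avoid recursion"])
```

Empty sections are simply omitted, so the same builder serves a plain loop with no Memory and no inspirations.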


4. Proposer

The variation operator. Produces a new Genome from the built prompt. The interface is deliberately broad enough that an LLM diff, an autonomous CLI agent, and a nested Galapagos scaffold all satisfy it identically.

class Proposer:
    def propose(self, prompt, env: Env) -> Genome: ...

env exposes the Evaluator and Memory as tools. This is the key to uniformity: an autonomous agent that evaluates and takes notes mid-run is just a Proposer that uses env — not a separate "agent mode."
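The uniformity claim can be illustrated with two proposers that share one signature. Everything here is a sketch under assumptions: the `Env` shape (an `evaluate` callable plus a `notes` list), the `generate(prompt, attempt)` stand-in for an LLM call, and the toy scoring rule are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Genome:
    content: str
    scores: dict = field(default_factory=dict)


@dataclass
class Env:  # hypothetical shape: Evaluator and Memory exposed as tools
    evaluate: Callable[[Genome], dict]
    notes: list = field(default_factory=list)


class OneShotProposer:
    """Single generation call; env is available but unused."""

    def __init__(self, generate):
        self.generate = generate  # stand-in for an LLM call: (prompt, attempt) -> str

    def propose(self, prompt, env):
        return Genome(self.generate(prompt, 0))


class TryAndKeepBestProposer:
    """Agent-style: drafts several candidates, scores them via env, logs a note."""

    def __init__(self, generate, attempts=3):
        self.generate = generate
        self.attempts = attempts

    def propose(self, prompt, env):
        drafts = [Genome(self.generate(prompt, i)) for i in range(self.attempts)]
        for draft in drafts:
            draft.scores = env.evaluate(draft)  # mid-run evaluation through env
        best = max(drafts, key=lambda g: g.scores["combined_score"])
        env.notes.append(f"best of {self.attempts} drafts: {best.content}")
        return best


# Toy stand-ins: the "model" emits an integer; the task wants 7.
generate = lambda prompt, attempt: str(len(prompt["user"]) + attempt)
env = Env(evaluate=lambda g: {"combined_score": -abs(int(g.content) - 7)})
prompt = {"system": "", "user": "abcde"}

single = OneShotProposer(generate).propose(prompt, env)
agentic = TryAndKeepBestProposer(generate).propose(prompt, env)
```

The loop calls `propose(prompt, env)` in both cases; whether the proposer touches `env` is its own business.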

| Operator | Used by |
| --- | --- |
| Diff / SEARCH-REPLACE mutation | OpenEvolve, AlphaEvolve, ShinkaEvolve, SkyDiscover, DeepEvolve |
| Full rewrite | Most methods (configurable) |
| Crossover (combine parent + inspiration) | ShinkaEvolve |
| Novelty rejection sampling (embed → reject near-duplicates) | ShinkaEvolve |
| Reflective mutation (rewrite from trace patterns) + merge | GEPA |
| Agent-as-candidate (the agent edits its own codebase) | DGM, HyperAgents |
| Autonomous CLI agent as the variation operator | Meta-Harness, CORAL |
| Two-LLM (solution generator + strategy-program generator) | EvoX |
| Coder + automatic debugger loop | DeepEvolve |

Self-similarity: meta-scaffolds are just nesting

A Galapagos scaffold itself satisfies the Proposer interface. Meta-scaffolds (EvoX's strategy evolution, nested search) are therefore just nesting — a Proposer that happens to run an inner search — not a new concept.
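Nesting can be sketched as a Proposer whose `propose` runs a small inner search using another Proposer. The classes below are hypothetical, `env` is reduced to a dict with an `"evaluate"` callable, and a simple hill-climb stands in for a full inner scaffold.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Genome:
    content: str
    scores: dict = field(default_factory=dict)


class StepProposer:
    """Leaf operator: nudges the integer in the prompt by a random step."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def propose(self, prompt, env):
        return Genome(str(int(prompt["user"]) + self.rng.choice([-2, -1, 1, 2])))


class NestedSearchProposer:
    """Same propose() signature, but each call runs an inner hill-climb."""

    def __init__(self, inner, inner_budget):
        self.inner = inner
        self.inner_budget = inner_budget

    def propose(self, prompt, env):
        best = Genome(prompt["user"])
        best.scores = env["evaluate"](best)
        for _ in range(self.inner_budget):
            child = self.inner.propose({"user": best.content}, env)
            child.scores = env["evaluate"](child)
            if child.scores["combined_score"] > best.scores["combined_score"]:
                best = child  # keep only strict improvements
        return best


env = {"evaluate": lambda g: {"combined_score": -abs(int(g.content) - 10)}}
one_level = NestedSearchProposer(StepProposer(), inner_budget=50)
two_level = NestedSearchProposer(NestedSearchProposer(StepProposer(), 5), 5)

result = one_level.propose({"user": "0"}, env)
deep_result = two_level.propose({"user": "0"}, env)
```

Because `NestedSearchProposer` accepts anything with a `propose(prompt, env)` method, it can wrap itself to any depth without the outer loop changing.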


5. Evaluator

The deterministic verifiable scorer. Maps a candidate to a metric dict, optionally with artifacts. Owns sandboxing, staging, and multi-objective scoring. In Galapagos the Evaluator is supplied by the task, not the scaffold — so any scaffold can be pointed at any task. It must recompute the objective from the candidate's raw output (never trust a self-reported score), which is what makes a discovery verifiable.

class Evaluator:
    def evaluate(self, genome: Genome) -> Scores: ...   # metrics + artifacts

| Technique | Used by |
| --- | --- |
| Cascade / staged evaluation with thresholds (cheap → expensive) | OpenEvolve, AlphaEvolve, DGM, HyperAgents |
| Multi-objective scoring | AlphaEvolve, GEPA, Meta-Harness (accuracy vs. token cost) |
| Multi-run aggregation + text feedback | ShinkaEvolve |
| Sandboxed execution (Docker, git worktree) | DGM, CORAL, HyperAgents |
| LLM-as-judge feedback | OpenEvolve, AlphaEvolve (simplicity / readability) |
| Instance-level success vectors | SeaEvo, GEPA |
| Custom interface (evaluate() -> {"combined_score", ...}) | DeepEvolve, CORAL (TaskGrader) |

Artifacts produced here (stderr, profiling, traces) flow back through the Genome into the next PromptBuilder pass.
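The "never trust a self-reported score" rule can be shown with a toy evaluator that runs the candidate in a subprocess and recomputes the objective from its raw stdout. The task (print integers summing to 100), the class name, and storing stderr on `genome.artifacts` are assumptions for illustration; a bare subprocess is also not a real sandbox.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class Genome:
    content: str  # Python source expected to print integers
    scores: dict = field(default_factory=dict)
    artifacts: dict = field(default_factory=dict)


class SumTargetEvaluator:
    """Toy task: print integers whose sum is as close to TARGET as possible."""

    TARGET = 100

    def evaluate(self, genome):
        try:
            run = subprocess.run(
                [sys.executable, "-c", genome.content],
                capture_output=True, text=True, timeout=5,
            )
        except subprocess.TimeoutExpired:
            genome.artifacts = {"stderr": "timeout"}
            return {"combined_score": 0.0}
        genome.artifacts = {"stderr": run.stderr}  # fed back into later prompts
        try:
            numbers = [int(token) for token in run.stdout.split()]
        except ValueError:
            return {"combined_score": 0.0}  # output is not the required raw integers
        if run.returncode != 0 or not numbers:
            return {"combined_score": 0.0}
        # Recompute the objective from the raw output, ignoring any claimed score.
        error = abs(sum(numbers) - self.TARGET)
        return {"combined_score": 1.0 / (1.0 + error)}


evaluator = SumTargetEvaluator()
honest = Genome("print(60)\nprint(40)")
cheat = Genome("print('combined_score=1.0')")
crash = Genome("import sys; sys.stderr.write('boom'); sys.exit(1)")
honest_score = evaluator.evaluate(honest)
cheat_score = evaluator.evaluate(cheat)
crash_score = evaluator.evaluate(crash)
```

The candidate that merely prints a flattering score earns nothing, while the crash still leaves its stderr in `artifacts` for the next prompt.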


6. Memory

The cross-cutting knowledge store. Holds free-form knowledge — not candidates. It is read by the PromptBuilder early in the loop and written by the Proposer (or a post-evaluation step) late in the loop, so it spans the pipeline rather than sitting at one position.

class Memory:
    def read(self, spec) -> Knowledge: ...
    def write(self, knowledge: Knowledge) -> None: ...

| Content | Used by |
| --- | --- |
| Meta-scratchpad (periodically synthesized design insights) | ShinkaEvolve |
| Co-evolved meta-prompts | AlphaEvolve |
| Global failure log + evolving idea pool | PAC-Evolve |
| Tactic history + accumulated adaptation signal | AdaEvolve |
| Strategy history + population-state descriptor | EvoX |
| Landscape guidance + strategy descriptions | SeaEvo |
| Shared notes + skills (multi-agent, with heartbeat consolidation) | CORAL |
| Filesystem / code-embedded memory (traces, JSON memory files) | Meta-Harness, HyperAgents |
| Idea-evolution chain + research reports | DeepEvolve |

Memory vs. Population

Population stores candidates (Genomes) and answers "what solutions exist?" Memory stores knowledge (notes, skills, strategies) and answers "what have we learned?" SkyDiscover folds both into one database; Galapagos splits them so adaptive and multi-agent methods have a clean home for cross-candidate knowledge. Memory is optional — a plain evolutionary loop (e.g. OpenEvolve) leaves it empty.
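A minimal sketch of the two extremes, an empty Memory and a rolling scratchpad, is shown below. The class names, the string-valued notes, and the `{"last": n}` read spec are assumptions; they are not the shipped NullMemory or ScratchpadMemory.

```python
from collections import deque


class EmptyMemory:
    """What a plain evolutionary loop uses: reads return nothing, writes are dropped."""

    def read(self, spec=None):
        return []

    def write(self, knowledge):
        pass


class RollingScratchpad:
    """Keeps only the most recent notes so prompts stay bounded in size."""

    def __init__(self, max_notes=5):
        self._notes = deque(maxlen=max_notes)  # oldest notes fall off automatically

    def write(self, knowledge):
        self._notes.append(knowledge)

    def read(self, spec=None):
        # Hypothetical spec: {"last": n} returns only the n most recent notes.
        notes = list(self._notes)
        if spec and "last" in spec:
            n = spec["last"]
            notes = notes[-n:] if n > 0 else []
        return notes


pad = RollingScratchpad(max_notes=5)
for i in range(7):
    pad.write(f"note {i}")
recent = pad.read({"last": 2})
```

Neither class ever stores a Genome, which keeps the "what solutions exist?" and "what have we learned?" questions in separate stores.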


Coverage matrix

[Coverage matrix. Rows: OpenEvolve, AlphaEvolve, ShinkaEvolve, SkyDiscover, GEPA, DGM, DeepEvolve, PAC-Evolve, AdaEvolve, EvoX, SeaEvo, Meta-Harness, CORAL, HyperAgents. Columns: Population, SelectionPolicy, PromptBuilder, Proposer, Evaluator, Memory. Each cell carries one of three marks: framework-level, agent-driven / implicit / partial, or unused.]

Mapping to the implementation

The six components above are the conceptual roles. The shipped library (see the evolutionary loop) implements each role as an abstract base class in galapagos.components, with one or more concrete implementations that plug into each slot. The loop is driven by the orchestrator, GalapagosScaffold.

| Conceptual role (this page) | Base class | Shipped implementations |
| --- | --- | --- |
| Genome | Genome | The dataclass unit of evolution (content, scores, parent_id, lineage, metadata, artifacts; fitness = scores["combined_score"]). |
| Population | Population | InMemoryPopulation; IslandPopulation (islands + MAP-Elites cells + ring migration). |
| SelectionPolicy | SelectionPolicy | ExploreExploitPolicy; UCBBanditPolicy; IdentityPolicy (delegated / agent-driven: returns the whole pool and defers to the Proposer). |
| PromptBuilder | PromptBuilder | DefaultPromptBuilder (pure formatting, no selection). |
| Proposer | Proposer | DiffProposer (SEARCH/REPLACE diff or full rewrite, no-op detection); CrossoverProposer (+ token-Jaccard novelty rejection). |
| Evaluator | Evaluator | SubprocessEvaluator: instantiated by the task as task.evaluator; runs the task's evaluator.py (the deterministic verifiable scorer) in a subprocess. |
| Memory | Memory | NullMemory (default, empty); ScratchpadMemory (a rolling meta-scratchpad). |
| orchestrator | GalapagosScaffold | Runs the loop and enforces the budget. |

A method is a choice of which implementation fills each slot — see the evolutionary loop for the OpenEvolve vs. AdaEvolve worked example.

See the Scaffold Card for how a method declares these six slots in YAML, and Write your own scaffold to compose your own from component instances, "module.Class" paths, or .py files via GalapagosScaffold.from_card(population=..., ...).