Python API

The package is galapagos (pip install name: galapagos; editable: pip install -e .). The public surface is small — import everything from the top level.

import galapagos as gx

# primary classes
gx.GalapagosModel, gx.GalapagosConfig, gx.GalapagosScaffold, gx.GalapagosTask
gx.AdaEvolveScaffold, gx.BeamSearchScaffold, gx.BestOfNScaffold, gx.EvoXScaffold, gx.MetaHarnessScaffold, gx.OpenEvolveScaffold, gx.TopKScaffold

# records
gx.Genome, gx.Selection, gx.EvalResult, gx.RunState, gx.RunResult, gx.Budget

# registry + functional loaders
gx.available_scaffolds, gx.available_tasks, gx.registered_scaffolds
gx.AutoScaffold, gx.register_scaffold
gx.load_scaffold, gx.load_model, gx.load_task, gx.load_config

Primary classes

GalapagosModel

GalapagosModel.from_card(name=None, host=None, *, path=None, **kw) -> GalapagosModel

Resolve a model from a name + host (or a model-card YAML at path). Hosts: openai, openrouter, togetherai (alias together), litellm, vllm, huggingface (alias hf), azure, bedrock, anthropic, google. The API key is read from the environment (OPENAI_API_KEY for the openai host). Subclasses implement generate(prompt: Prompt) -> Generation.

model = gx.GalapagosModel.from_card(name="openai/gpt-4o-mini", host="openrouter")
model = gx.load_model("openai/gpt-4o-mini", host="openrouter")   # functional alias
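
The host aliases listed above can be pictured as a small lookup applied before dispatch. This is a minimal sketch, not the galapagos implementation: normalize_host, HOST_ALIASES, and KNOWN_HOSTS are hypothetical names, and the case-insensitive matching is an assumption.

```python
# Hypothetical alias table built from the hosts listed in this section.
HOST_ALIASES = {"together": "togetherai", "hf": "huggingface"}
KNOWN_HOSTS = {
    "openai", "openrouter", "togetherai", "litellm", "vllm",
    "huggingface", "azure", "bedrock", "anthropic", "google",
}


def normalize_host(host: str) -> str:
    """Map an alias to its canonical host name and reject unknown hosts."""
    key = host.strip().lower()          # assumption: matching ignores case
    key = HOST_ALIASES.get(key, key)    # canonical names pass through unchanged
    if key not in KNOWN_HOSTS:
        raise ValueError(f"unknown host {host!r}; expected one of {sorted(KNOWN_HOSTS)}")
    return key


canonical = [normalize_host(h) for h in ("hf", "together", "openrouter")]
```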

GalapagosConfig

GalapagosConfig.from_config(scaffold_name=None, *, path=None, **overrides) -> GalapagosConfig
cfg.get(dotted, default=None) -> Any
cfg.set(dotted, value) -> GalapagosConfig
cfg.section(name) -> dict
cfg.as_dict() -> dict

A thin config object over a nested dict, accessed by dotted paths. from_config(scaffold_name="openevolve") loads the scaffold's bundled defaults; from_config(path="cfg.yaml") loads a file.

cfg = gx.GalapagosConfig.from_config(scaffold_name="openevolve")
cfg.set("database.num_islands", 8).set("budget.max_iterations", 200)
cfg.get("budget.max_iterations")     # -> 200
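
To make the dotted-path behaviour concrete, here is a minimal sketch of a config object over a nested dict. DottedConfig is a hypothetical stand-in that only mirrors the documented get / set / section / as_dict shape; it is not the GalapagosConfig source, and details such as copying on read are assumptions.

```python
from __future__ import annotations

import copy
from typing import Any


class DottedConfig:
    """Hypothetical stand-in: a nested dict addressed by dotted paths."""

    def __init__(self, data: dict | None = None) -> None:
        self._data: dict = copy.deepcopy(data) if data else {}

    def get(self, dotted: str, default: Any = None) -> Any:
        node: Any = self._data
        for key in dotted.split("."):
            if not isinstance(node, dict) or key not in node:
                return default          # missing path falls back to the default
            node = node[key]
        return node

    def set(self, dotted: str, value: Any) -> "DottedConfig":
        *parents, leaf = dotted.split(".")
        node = self._data
        for key in parents:
            node = node.setdefault(key, {})   # create intermediate sections
        node[leaf] = value
        return self                     # returning self allows chained set() calls

    def section(self, name: str) -> dict:
        return copy.deepcopy(self._data.get(name, {}))

    def as_dict(self) -> dict:
        return copy.deepcopy(self._data)


cfg = DottedConfig({"budget": {"max_iterations": 100}})
cfg.set("database.num_islands", 8).set("budget.max_iterations", 200)
```

Returning the config from set() is what lets the two overrides be chained on one line.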

GalapagosScaffold

The orchestrator that drives the six components around the loop.

GalapagosScaffold.from_card(name=None, *, path=None, config=None, model=None,
                            population=None, selection_policy=None, prompt_builder=None,
                            proposer=None, evaluator=None, memory=None, seed=None, **kw) -> GalapagosScaffold
scaffold.run(task=None, *, max_iterations=None) -> RunResult

Three load modes: by name (from_card("openevolve", model=...) — dispatches via the registry), subclass defaults (OpenEvolveScaffold.from_card() — loads its own card + config + default model), and build-your-own (pass component instances / module.Class paths / .py paths to the six role kwargs).

@classmethod
def build_components(cls, config, model) -> dict   # the five scaffold-side components (override in a subclass)

# adaptation hooks (no-ops by default):
def before_step(self) -> None: ...                 # before selection
def after_step(self, child: Genome, result) -> None: ...   # after eval (result is None on a no-op)
def periodic(self) -> None: ...                    # once per iteration, after the step

The seven runnable subclasses are AdaEvolveScaffold, BeamSearchScaffold, BestOfNScaffold, EvoXScaffold, MetaHarnessScaffold, OpenEvolveScaffold, and TopKScaffold. See Write your own scaffold.
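
The order in which the adaptation hooks fire can be sketched with a toy loop. ToyScaffold and its step() method are hypothetical; the real run() also handles budgets, cost tracking, and population admission, none of which are modelled here.

```python
from __future__ import annotations


class ToyScaffold:
    """Hypothetical loop that only records the order in which hooks fire."""

    def __init__(self) -> None:
        self.trace: list[str] = []

    def before_step(self) -> None:
        self.trace.append("before_step")

    def after_step(self, child, result) -> None:
        # result is None when the proposal was a no-op and nothing was evaluated
        self.trace.append("after_step(no-op)" if result is None else "after_step")

    def periodic(self) -> None:
        self.trace.append("periodic")

    def step(self, iteration: int):
        # Stand-in for select -> build prompt -> propose -> evaluate.
        # Odd iterations pretend the proposer produced no change.
        child = f"candidate-{iteration}"
        result = None if iteration % 2 else {"combined_score": float(iteration)}
        return child, result

    def run(self, max_iterations: int = 2) -> list[str]:
        for i in range(max_iterations):
            self.before_step()              # before selection
            child, result = self.step(i)
            self.after_step(child, result)  # after evaluation
            self.periodic()                 # once per iteration, after the step
        return self.trace


trace = ToyScaffold().run(max_iterations=2)
```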

GalapagosTask

GalapagosTask.from_card(name=None, *, path=None) -> GalapagosTask
task.context -> str               # the problem statement injected into prompts
task.status -> str                # 'stable' | 'experimental' | 'spec' | 'external'
task.runnable -> bool             # True iff it ships a seed + evaluator
task.initial_genome() -> Genome   # the seed Genome
task.evaluator -> Evaluator | None  # a SubprocessEvaluator over the task's evaluator.py

The task supplies the Evaluator (not the scaffold), so any scaffold runs against any task.


The six components

Import them with from galapagos.components import ... Every component is an abstract base class with shipped implementations.

Population

class Population:                       # the candidate store
    def add(self, genome: Genome) -> bool: ...      # returns whether admitted
    def query(self, spec: dict | None = None) -> list[Genome]: ...
    def all(self) -> list[Genome]: ...
    def best(self) -> Genome | None: ...

| Impl | Purpose |
| --- | --- |
| InMemoryPopulation(capacity=1000) | A bounded top-k / leaderboard list kept sorted by fitness. |
| IslandPopulation(num_islands=4, migration_interval=25, migration_rate=2, descriptor=None) | Islands of MAP-Elites cells with periodic ring migration. |
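
As an illustration of a bounded, fitness-sorted store, the sketch below keeps at most capacity candidates and reports whether each one was admitted. Candidate and BoundedLeaderboard are hypothetical stand-ins for Genome and InMemoryPopulation; the tie-breaking rule (an equal score does not displace the current worst) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for Genome: content plus a fitness value."""
    content: str
    fitness: float


class BoundedLeaderboard:
    """Keeps the best `capacity` candidates, sorted best first (capacity >= 1)."""

    def __init__(self, capacity: int = 1000) -> None:
        self.capacity = capacity
        self._items: list[Candidate] = []

    def add(self, candidate: Candidate) -> bool:
        # Reject when full and the newcomer does not beat the current worst.
        if len(self._items) >= self.capacity and candidate.fitness <= self._items[-1].fitness:
            return False
        self._items.append(candidate)
        self._items.sort(key=lambda c: c.fitness, reverse=True)
        del self._items[self.capacity:]     # evict anything past capacity
        return True

    def all(self) -> list[Candidate]:
        return list(self._items)

    def best(self) -> Candidate | None:
        return self._items[0] if self._items else None


board = BoundedLeaderboard(capacity=2)
admitted = [
    board.add(Candidate(name, score))
    for name, score in [("a", 0.2), ("b", 0.9), ("c", 0.1), ("d", 0.5)]
]
```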

SelectionPolicy

class SelectionPolicy:                  # the active, stateful policy
    def select(self, population, state: RunState | None = None) -> Selection: ...
    def observe(self, genome: Genome, state: RunState | None = None) -> None: ...

| Impl | Purpose |
| --- | --- |
| ExploreExploitPolicy(seed=0, explore_ratio=0.3, num_inspirations=3) | Explore/exploit split + fitness-weighted exploit; diverse inspirations. |
| UCBBanditPolicy(seed=0, num_islands=4, c=1.4, num_inspirations=2) | UCB1 routing over islands; mirrors posteriors into state.signals['ucb']. |
| IdentityPolicy() | Delegated/agent-driven: returns the whole population, defers the choice to the Proposer. |
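
For readers unfamiliar with UCB-style routing, the following sketch picks an island by mean reward plus an exploration bonus scaled by c. It is a generic UCB1-style rule under stated assumptions: ucb_pick is a hypothetical helper, and how UCBBanditPolicy actually defines reward or updates its counts is not specified here.

```python
from __future__ import annotations

import math


def ucb_pick(counts: list[int], reward_sums: list[float], c: float = 1.4) -> int:
    """Return the index of the arm (island) with the highest UCB score."""
    # Any arm that has never been tried is picked first.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [
        reward_sums[arm] / counts[arm] + c * math.sqrt(math.log(total) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


counts = [10, 10, 1, 0]
reward_sums = [5.0, 8.0, 0.5, 0.0]
first = ucb_pick(counts, reward_sums)       # the unvisited arm goes first
counts[3], reward_sums[3] = 1, 0.0          # record one zero-reward visit
second = ucb_pick(counts, reward_sums)      # rarely visited arms keep a large bonus
```

A larger c widens the exploration bonus, so rarely visited islands are chosen more often before the policy settles on the best mean.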

PromptBuilder

class PromptBuilder:                    # the renderer (pure formatting, no selection)
    def build(self, selection: Selection, memory=None, state: RunState | None = None) -> Prompt: ...

| Impl | Purpose |
| --- | --- |
| DefaultPromptBuilder(system_message=None, max_inspiration_chars=600, include_memory=True) | The canonical multi-section template (task → metrics → feedback → inspirations → memory → current program). |
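
The section order of the template can be illustrated with a plain string renderer. This is a sketch only: render_prompt, its section headings, and the rule that empty sections are skipped are assumptions, not the DefaultPromptBuilder template itself.

```python
from __future__ import annotations


def render_prompt(task: str, metrics: dict, feedback: str, inspirations: list[str],
                  memory: str, program: str, max_inspiration_chars: int = 600) -> str:
    """Join non-empty sections in a fixed order; truncate each inspiration."""
    clipped = [text[:max_inspiration_chars] for text in inspirations]
    sections = [
        ("Task", task),
        ("Metrics", "\n".join(f"{k}: {v}" for k, v in metrics.items())),
        ("Feedback", feedback),
        ("Inspirations", "\n---\n".join(clipped)),
        ("Memory", memory),
        ("Current program", program),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)


prompt = render_prompt(
    task="Minimize runtime of sort().",
    metrics={"combined_score": 0.42},
    feedback="",                        # empty sections are skipped
    inspirations=["x" * 1000],          # longer than the 600-character cap
    memory="prefer in-place swaps",
    program="def sort(a): return sorted(a)",
)
```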

Proposer

class Proposer:                         # the variation operator
    def propose(self, prompt, env: Env) -> Genome: ...

Env(model, selection, evaluator=None, memory=None, state=None) is the toolbox handed to a Proposer.

| Impl | Purpose |
| --- | --- |
| DiffProposer() | One LLM call → SEARCH/REPLACE diff (or full rewrite) applied to the parent; no-op detection. |
| CrossoverProposer(novelty_threshold=0.9, recent=12) | Crossover + token-Jaccard novelty rejection (one resample on near-duplicates). |

Helper: apply_edit(parent_code, response) -> (new_code, changed).
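
Token-Jaccard novelty rejection can be sketched as a set comparison against recent candidates. The tokenizer (word-like runs) and the use of >= against the threshold are assumptions; token_jaccard and is_near_duplicate are hypothetical helpers, not functions exported by galapagos.

```python
from __future__ import annotations

import re


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the sets of word-like tokens in a and b."""
    ta, tb = set(re.findall(r"\w+", a)), set(re.findall(r"\w+", b))
    if not ta and not tb:
        return 1.0                      # two empty programs count as identical
    return len(ta & tb) / len(ta | tb)


def is_near_duplicate(candidate: str, recent: list[str], threshold: float = 0.9) -> bool:
    """True when the candidate is too similar to any recently seen program."""
    return any(token_jaccard(candidate, other) >= threshold for other in recent)


recent = ["def f(x): return x + 1", "def g(y): return y * 2"]
dup = is_near_duplicate("def f(x):  return x+1", recent)         # same tokens, new spacing
novel = is_near_duplicate("def h(z): return z ** 3 - z", recent)  # mostly new tokens
```

A proposer following the description above would resample once when the check returns True.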

Evaluator

class Evaluator:                        # the pure scorer (supplied by the task)
    def evaluate(self, genome: Genome) -> EvalResult: ...

| Impl | Purpose |
| --- | --- |
| SubprocessEvaluator(evaluator_path, timeout=120, suffix=".py") | Runs the task's evaluator.py in an isolated subprocess. |

Task evaluator contract: evaluate(program_path) -> dict with at least combined_score (float), and optional validity / status / per_instance / artifacts.text_feedback.
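
A task-side evaluator.py that follows this contract might look like the sketch below. The toy task (the candidate must define solve(n) returning n * n) is purely hypothetical, and the file is exercised in-process here only as a smoke test; under the scaffold it would be run by SubprocessEvaluator.

```python
import importlib.util
import tempfile
from pathlib import Path


def evaluate(program_path: str) -> dict:
    """Score a candidate program; always return at least 'combined_score'."""
    try:
        spec = importlib.util.spec_from_file_location("candidate", program_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Hypothetical task: the candidate must define solve(n) == n * n.
        cases = [0, 1, 2, 5]
        per_instance = [1.0 if module.solve(n) == n * n else 0.0 for n in cases]
    except Exception as exc:            # broken candidates get a score, not a crash
        return {
            "combined_score": 0.0,
            "artifacts": {"text_feedback": f"{type(exc).__name__}: {exc}"},
        }
    return {
        "combined_score": sum(per_instance) / len(per_instance),
        "per_instance": per_instance,
    }


# Local smoke test against throwaway candidate files.
with tempfile.TemporaryDirectory() as tmp:
    good_path = Path(tmp) / "good.py"
    good_path.write_text("def solve(n):\n    return n * n\n")
    partial_path = Path(tmp) / "partial.py"
    partial_path.write_text("def solve(n):\n    return n + n\n")
    broken_path = Path(tmp) / "broken.py"
    broken_path.write_text("def solve(n:\n")
    good = evaluate(str(good_path))
    partial = evaluate(str(partial_path))
    broken = evaluate(str(broken_path))
```

Returning a low score with text feedback for broken candidates, rather than raising, keeps the failure visible to later prompts.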

Memory

class Memory:                           # free-form knowledge (optional)
    def read(self, spec: dict | None = None) -> str: ...
    def write(self, knowledge: str, **meta) -> None: ...

| Impl | Purpose |
| --- | --- |
| NullMemory() | The empty memory — the default. |
| ScratchpadMemory(max_notes=8) | A rolling meta-scratchpad of distilled design insights. |
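
A rolling scratchpad can be sketched with a bounded deque: once max_notes is reached, the oldest note is dropped. RollingNotes is a hypothetical class that follows the read / write interface above; the bullet formatting of read() is an assumption, and any distillation step in ScratchpadMemory is not modelled.

```python
from collections import deque


class RollingNotes:
    """Hypothetical rolling scratchpad: keeps only the newest max_notes entries."""

    def __init__(self, max_notes: int = 8) -> None:
        self._notes = deque(maxlen=max_notes)

    def write(self, knowledge: str, **meta) -> None:
        self._notes.append(knowledge.strip())   # oldest note falls off when full

    def read(self, spec=None) -> str:
        return "\n".join(f"- {note}" for note in self._notes)


notes = RollingNotes(max_notes=2)
for insight in ["cache the inner loop", "avoid recursion", "vectorize with numpy"]:
    notes.write(insight)
text = notes.read()
```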

Records

from galapagos import Genome, Selection, EvalResult, RunState, RunResult, Budget

Genome

@dataclass
class Genome:
    content: str                        # the artifact being evolved (code, prompts, config, ...)
    id: str                             # auto-assigned 'g000001'
    parent_id: str | None = None
    lineage: str = ""
    scores: dict[str, float] = {}       # filled by the Evaluator
    metadata: dict = {}                 # selection data: island, cell, generation, embeddings, ...
    artifacts: dict = {}                # evaluator side-output (text_feedback, traces)

    @property
    def fitness(self) -> float: ...     # scores['combined_score'], else mean of numeric scores, else -inf
    def child(self, content, **metadata) -> Genome: ...   # descendant with lineage wired up
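
The fallback chain behind the fitness property can be written out as a standalone function. This is a sketch of the rule as documented in the comment (combined_score, else the mean of numeric scores, else -inf); the function name and the choice to ignore booleans as numeric scores are assumptions.

```python
import math


def fitness_from_scores(scores: dict) -> float:
    """combined_score if present, else the mean of numeric scores, else -inf."""
    if "combined_score" in scores:
        return float(scores["combined_score"])
    numeric = [
        v for v in scores.values()
        if isinstance(v, (int, float)) and not isinstance(v, bool)  # assumption: skip bools
    ]
    return sum(numeric) / len(numeric) if numeric else -math.inf


with_combined = fitness_from_scores({"combined_score": 0.7, "speed": 0.1})
averaged = fitness_from_scores({"accuracy": 1.0, "speed": 0.5})
unscored = fitness_from_scores({})
```

Falling back to -inf means an unevaluated genome always ranks below any scored one.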

Selection

@dataclass
class Selection:
    parent: Genome | None               # the parent to mutate (None => delegated selection)
    inspirations: list[Genome] = []     # context-only inspirations
    pool: list[Genome] = []             # the full visible population (for delegated selection)

EvalResult

@dataclass
class EvalResult:
    metrics: dict[str, float] = {}      # must contain 'combined_score'
    artifacts: dict = {}
    valid: bool = True                  # gates admission
    per_instance: list[float] | None = None   # per-test-case success vector
    text_feedback: str | None = None    # surfaced into later prompts

    @property
    def combined_score(self) -> float: ...

RunState

@dataclass
class RunState:
    iteration: int = 0
    cost_usd: float = 0.0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    best: Genome | None = None
    run_dir: str | None = None
    task_context: str = ""
    signals: dict = {}                  # adaptive policies stash cross-cutting state here
    started_at: float

    def record_cost(self, cost_usd, prompt_tokens=0, completion_tokens=0) -> None: ...
    @property
    def elapsed_s(self) -> float: ...

Budget

@dataclass
class Budget:
    max_iterations: int = 100
    max_usd: float | None = None
    target_score: float | None = None
    patience: int | None = None         # stop after N iters with no best-score gain
    wallclock_s: float | None = None

The run stops as soon as any configured bound is hit. The Budget is built from the config's budget section.
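
The "first bound hit wins" rule can be sketched as a single check over the budget fields. ToyBudget mirrors the documented fields, while stop_reason, its argument names, and the use of >= for every comparison are assumptions rather than the scaffold's actual stopping code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyBudget:
    """Hypothetical mirror of the documented Budget fields."""
    max_iterations: int = 100
    max_usd: float | None = None
    target_score: float | None = None
    patience: int | None = None
    wallclock_s: float | None = None


def stop_reason(budget: ToyBudget, *, iteration: int, cost_usd: float, best_score: float,
                stale_iters: int, elapsed_s: float) -> str | None:
    """Return the name of the first bound that is hit, or None to keep going."""
    checks = [
        ("max_iterations", iteration >= budget.max_iterations),
        ("max_usd", budget.max_usd is not None and cost_usd >= budget.max_usd),
        ("target_score", budget.target_score is not None and best_score >= budget.target_score),
        ("patience", budget.patience is not None and stale_iters >= budget.patience),
        ("wallclock_s", budget.wallclock_s is not None and elapsed_s >= budget.wallclock_s),
    ]
    return next((name for name, hit in checks if hit), None)  # unset bounds never fire


budget = ToyBudget(max_iterations=10, target_score=0.9)
running = stop_reason(budget, iteration=3, cost_usd=0.0, best_score=0.5, stale_iters=0, elapsed_s=1.0)
reached = stop_reason(budget, iteration=3, cost_usd=0.0, best_score=0.95, stale_iters=0, elapsed_s=1.0)
exhausted = stop_reason(budget, iteration=10, cost_usd=0.0, best_score=0.5, stale_iters=0, elapsed_s=1.0)
```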

RunResult

@dataclass
class RunResult:
    best: Genome | None
    summary: dict = {}                  # {scaffold, task, iterations, evaluations, best_score, cost_usd, no_diff, population_size}
    run_dir: str | None = None
    history: list[Genome] = []          # the seed + every evaluated child, in order

    @property
    def best_score(self) -> float: ...  # best.fitness, or -inf

Loaders & registry

load_model(name=None, host=None, *, path=None, base_url=None, **kw) -> GalapagosModel
load_config(scaffold_name=None, *, path=None, **overrides) -> GalapagosConfig
load_scaffold(name=None, *, path=None, model=None, config=None, **kw) -> GalapagosScaffold
load_task(name=None, *, path=None) -> GalapagosTask

available_scaffolds() -> list[str]      # all bundled scaffold cards (all runnable)
available_tasks() -> list[str]          # all bundled task cards
registered_scaffolds() -> list[str]     # the runnable subset (== available_scaffolds() today)

@register_scaffold("name")              # decorator: wire a Scaffold subclass to its card name
AutoScaffold.from_card(name, ...)       # name -> runnable scaffold (used internally by load_scaffold)

The functional load_* functions are thin aliases for the corresponding *.from_card classmethods (GalapagosConfig.from_config in the case of load_config).

Cards

from galapagos.cards.registry import (
    load_scaffold_card, load_task_card, available_scaffolds, available_tasks,
)
from galapagos.cards.schema import ScaffoldCard, TaskCard, ModelCard, VerificationCard

See Scaffold & Task cards for the schemas.