Python API¶
The package is galapagos (pip install name: galapagos; editable: pip install -e .). The public
surface is small — import everything from the top level.
import galapagos as gx
# primary classes
gx.GalapagosModel, gx.GalapagosConfig, gx.GalapagosScaffold, gx.GalapagosTask
gx.AdaEvolveScaffold, gx.BeamSearchScaffold, gx.BestOfNScaffold, gx.EvoXScaffold, gx.OpenEvolveScaffold, gx.TopKScaffold
# records
gx.Genome, gx.Selection, gx.EvalResult, gx.RunState, gx.RunResult, gx.Budget
# registry + functional loaders
gx.available_scaffolds, gx.available_tasks, gx.registered_scaffolds
gx.AutoScaffold, gx.register_scaffold
gx.load_scaffold, gx.load_model, gx.load_task, gx.load_config
Primary classes¶
GalapagosModel¶
Resolve a model from a name + host (or a model-card YAML at path). Hosts: openai,
openrouter, togetherai (alias together), litellm, vllm, huggingface (alias hf),
azure, bedrock, anthropic, google. The API key is read from OPENAI_API_KEY. Subclasses
implement generate(prompt: Prompt) -> Generation.
model = gx.GalapagosModel.from_card(name="openai/gpt-4o-mini", host="openrouter")
model = gx.load_model("openai/gpt-4o-mini", host="openrouter") # functional alias
GalapagosConfig¶
GalapagosConfig.from_config(scaffold_name=None, *, path=None, **overrides) -> GalapagosConfig
cfg.get(dotted, default=None) -> Any
cfg.set(dotted, value) -> GalapagosConfig
cfg.section(name) -> dict
cfg.as_dict() -> dict
A thin config object over a nested dict, accessed by dotted paths.
from_config(scaffold_name="openevolve") loads the scaffold's bundled defaults;
from_config(path="cfg.yaml") loads a file.
cfg = gx.GalapagosConfig.from_config(scaffold_name="openevolve")
cfg.set("database.num_islands", 8).set("budget.max_iterations", 200)
cfg.get("budget.max_iterations") # -> 200
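The dotted-path access described above can be sketched with a small stand-in class. This is only an illustration of the idea, assuming a plain nested dict underneath; `DottedConfig` is a hypothetical name and is not the library's implementation.

```python
from __future__ import annotations

from typing import Any


class DottedConfig:
    """Hypothetical stand-in for a dotted-path config (not GalapagosConfig itself)."""

    def __init__(self, data: dict | None = None) -> None:
        self._data: dict = data if data is not None else {}

    def get(self, dotted: str, default: Any = None) -> Any:
        # Walk one key at a time; fall back to the default on any missing key.
        node: Any = self._data
        for key in dotted.split("."):
            if not isinstance(node, dict) or key not in node:
                return default
            node = node[key]
        return node

    def set(self, dotted: str, value: Any) -> "DottedConfig":
        *parents, leaf = dotted.split(".")
        node = self._data
        for key in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(key, {})
        node[leaf] = value
        return self  # returning self is what makes chained .set() calls work

    def section(self, name: str) -> dict:
        # Shallow copy of one top-level section.
        return dict(self._data.get(name, {}))


cfg = DottedConfig({"budget": {"max_iterations": 100}})
cfg.set("database.num_islands", 8).set("budget.max_iterations", 200)
```

Returning `self` from `set` is the design choice that allows the chained calls shown in the example above.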
GalapagosScaffold¶
The orchestrator that drives the six components around the loop.
GalapagosScaffold.from_card(name=None, *, path=None, config=None, model=None,
population=None, selection_policy=None, prompt_builder=None,
proposer=None, evaluator=None, memory=None, seed=None, **kw) -> GalapagosScaffold
scaffold.run(task=None, *, max_iterations=None) -> RunResult
Three load modes: by name (from_card("openevolve", model=...) — dispatches via the registry),
subclass defaults (OpenEvolveScaffold.from_card() — loads its own card + config + default
model), and build-your-own (pass component instances / module.Class paths / .py paths to the
six role kwargs).
@classmethod
def build_components(cls, config, model) -> dict # the five scaffold-side components (override in a subclass)
# adaptation hooks (no-ops by default):
def before_step(self) -> None: ... # before selection
def after_step(self, child: Genome, result) -> None: ... # after eval (result is None on a no-op)
def periodic(self) -> None: ... # once per iteration, after the step
The seven runnable subclasses are AdaEvolveScaffold, BeamSearchScaffold, BestOfNScaffold,
EvoXScaffold, MetaHarnessScaffold, OpenEvolveScaffold, and TopKScaffold. See
Write your own scaffold.
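To make the ordering of the adaptation hooks concrete, here is a toy loop that records when each hook fires. It is a sketch only: `ToyScaffold` and its `step` method are hypothetical, and the real loop does far more; only the hook order (before selection, after evaluation, once per iteration) is taken from the description above.

```python
class ToyScaffold:
    """Hypothetical loop skeleton that logs hook calls in order."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    # The three hooks; a real base class would leave these as no-ops.
    def before_step(self) -> None:
        self.calls.append("before_step")

    def after_step(self, child, result) -> None:
        self.calls.append("after_step")

    def periodic(self) -> None:
        self.calls.append("periodic")

    def step(self):
        # Placeholder for select -> build prompt -> propose -> evaluate.
        self.calls.append("step")
        return "child", {"combined_score": 1.0}

    def run(self, max_iterations: int) -> list[str]:
        for _ in range(max_iterations):
            self.before_step()              # before selection
            child, result = self.step()
            self.after_step(child, result)  # after evaluation
            self.periodic()                 # once per iteration, after the step
        return self.calls


calls = ToyScaffold().run(max_iterations=2)
```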
GalapagosTask¶
GalapagosTask.from_card(name=None, *, path=None) -> GalapagosTask
task.context -> str # the problem statement injected into prompts
task.status -> str # 'stable' | 'experimental' | 'spec' | 'external'
task.runnable -> bool # True iff it ships a seed + evaluator
task.initial_genome() -> Genome # the seed Genome
task.evaluator -> Evaluator | None # a SubprocessEvaluator over the task's evaluator.py
The task supplies the Evaluator (not the scaffold), so any scaffold runs against any task.
The six components¶
from galapagos.components import ...
Every component is an abstract base class with shipped implementations.
Population¶
class Population: # the candidate store
def add(self, genome: Genome) -> bool: ... # returns whether admitted
def query(self, spec: dict | None = None) -> list[Genome]: ...
def all(self) -> list[Genome]: ...
def best(self) -> Genome | None: ...
| Impl | Purpose |
|---|---|
| InMemoryPopulation(capacity=1000) | A bounded top-k / leaderboard list kept sorted by fitness. |
| IslandPopulation(num_islands=4, migration_interval=25, migration_rate=2, descriptor=None) | Islands of MAP-Elites cells with periodic ring migration. |
SelectionPolicy¶
class SelectionPolicy: # the active, stateful policy
def select(self, population, state: RunState | None = None) -> Selection: ...
def observe(self, genome: Genome, state: RunState | None = None) -> None: ...
| Impl | Purpose |
|---|---|
| ExploreExploitPolicy(seed=0, explore_ratio=0.3, num_inspirations=3) | Explore/exploit split + fitness-weighted exploit; diverse inspirations. |
| UCBBanditPolicy(seed=0, num_islands=4, c=1.4, num_inspirations=2) | UCB1 routing over islands; mirrors posteriors into state.signals['ucb']. |
| IdentityPolicy() | Delegated/agent-driven: returns the whole population, defers the choice to the Proposer. |
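For readers unfamiliar with UCB1, the routing idea behind UCBBanditPolicy can be sketched as a plain bandit over island indices. This is a generic UCB1-style rule with an exploration constant `c`; the class name `UCB1` and the reward definition are assumptions, and the library's actual scoring may differ.

```python
import math


class UCB1:
    """Generic UCB1-style bandit; each arm stands for one island."""

    def __init__(self, num_arms: int = 4, c: float = 1.4) -> None:
        self.c = c
        self.counts = [0] * num_arms
        self.totals = [0.0] * num_arms

    def select(self) -> int:
        # Play every arm once before using the confidence bound.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        t = sum(self.counts)

        def score(arm: int) -> float:
            mean = self.totals[arm] / self.counts[arm]
            bonus = self.c * math.sqrt(math.log(t) / self.counts[arm])
            return mean + bonus

        return max(range(len(self.counts)), key=score)

    def observe(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.totals[arm] += reward


bandit = UCB1(num_arms=4, c=1.4)
rewards = [0.1, 0.9, 0.2, 0.3]  # made-up rewards per island
first_picks = []
for _ in range(4):
    arm = bandit.select()
    first_picks.append(arm)
    bandit.observe(arm, rewards[arm])
next_pick = bandit.select()
```

After one pull per arm the exploration bonus is identical for all arms, so the next pick goes to the arm with the highest observed mean.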
PromptBuilder¶
class PromptBuilder: # the renderer (pure formatting, no selection)
def build(self, selection: Selection, memory=None, state: RunState | None = None) -> Prompt: ...
| Impl | Purpose |
|---|---|
| DefaultPromptBuilder(system_message=None, max_inspiration_chars=600, include_memory=True) | The canonical multi-section template (task → metrics → feedback → inspirations → memory → current program). |
Proposer¶
Env(model, selection, evaluator=None, memory=None, state=None) is the toolbox handed to a Proposer.
| Impl | Purpose |
|---|---|
| DiffProposer() | One LLM call → SEARCH/REPLACE diff (or full rewrite) applied to the parent; no-op detection. |
| CrossoverProposer(novelty_threshold=0.9, recent=12) | Crossover + token-Jaccard novelty rejection (one resample on near-duplicates). |
Helper: apply_edit(parent_code, response) -> (new_code, changed).
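A helper with the same return shape as apply_edit could be sketched as follows. The exact diff syntax the library parses is not specified here, so the `<<<<<<< SEARCH` / `=======` / `>>>>>>> REPLACE` markers below are an assumption, as is the name `apply_edit_sketch`; the full-rewrite fallback is not handled.

```python
import re

# Assumed block format (not confirmed by the source):
# <<<<<<< SEARCH
# old text
# =======
# new text
# >>>>>>> REPLACE
_BLOCK = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)


def apply_edit_sketch(parent_code: str, response: str) -> tuple[str, bool]:
    """Apply every SEARCH/REPLACE block found in the response.

    Returns (new_code, changed); changed is False when nothing matched,
    which is how a caller can detect a no-op proposal.
    """
    new_code = parent_code
    for search, replace in _BLOCK.findall(response):
        if search in new_code:
            new_code = new_code.replace(search, replace, 1)
    return new_code, new_code != parent_code


parent = "def f(x):\n    return x + 1\n"
response = (
    "Here is the change:\n"
    "<<<<<<< SEARCH\n"
    "    return x + 1\n"
    "=======\n"
    "    return x + 2\n"
    ">>>>>>> REPLACE\n"
)
new_code, changed = apply_edit_sketch(parent, response)
noop_code, noop_changed = apply_edit_sketch(parent, "No changes needed.")
```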
Evaluator¶
class Evaluator: # the pure scorer (supplied by the task)
def evaluate(self, genome: Genome) -> EvalResult: ...
| Impl | Purpose |
|---|---|
| SubprocessEvaluator(evaluator_path, timeout=120, suffix=".py") | Runs the task's evaluator.py in an isolated subprocess. |
Task evaluator contract: evaluate(program_path) -> dict with at least combined_score (float),
and optional validity / status / per_instance / artifacts.text_feedback.
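A task-side evaluator.py that satisfies this contract might look like the sketch below. The toy problem (a candidate must define `solve(x)` returning `x + 1`) and the test cases are invented for illustration; only the `evaluate(program_path) -> dict` shape and the key names come from the contract above.

```python
import importlib.util
import tempfile
from pathlib import Path


def evaluate(program_path: str) -> dict:
    """Score a candidate program file; always returns combined_score."""
    cases = [(1, 2), (5, 6), (-1, 0)]  # hypothetical (input, expected) pairs
    try:
        spec = importlib.util.spec_from_file_location("candidate", program_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        per_instance = [1.0 if module.solve(x) == y else 0.0 for x, y in cases]
    except Exception as exc:
        # A crashing candidate gets a zero score plus feedback for later prompts.
        return {
            "combined_score": 0.0,
            "artifacts": {"text_feedback": f"candidate failed: {exc!r}"},
        }
    return {
        "combined_score": sum(per_instance) / len(per_instance),
        "per_instance": per_instance,
    }


# Usage: write a candidate to a temporary file and score it.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.py"
    good.write_text("def solve(x):\n    return x + 1\n")
    good_result = evaluate(str(good))

    bad = Path(tmp) / "bad.py"
    bad.write_text("def solve(x):\n    raise RuntimeError('boom')\n")
    bad_result = evaluate(str(bad))
```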
Memory¶
class Memory: # free-form knowledge (optional)
def read(self, spec: dict | None = None) -> str: ...
def write(self, knowledge: str, **meta) -> None: ...
| Impl | Purpose |
|---|---|
| NullMemory() | The empty memory; the default. |
| ScratchpadMemory(max_notes=8) | A rolling meta-scratchpad of distilled design insights. |
Records¶
from galapagos import Genome, Selection, EvalResult, RunState, RunResult, Budget
Genome¶
@dataclass
class Genome:
content: str # the artifact being evolved (code, prompts, config, ...)
id: str # auto-assigned 'g000001'
parent_id: str | None = None
lineage: str = ""
scores: dict[str, float] = {} # filled by the Evaluator
metadata: dict = {} # selection data: island, cell, generation, embeddings, ...
artifacts: dict = {} # evaluator side-output (text_feedback, traces)
@property
def fitness(self) -> float # scores['combined_score'], else mean of numeric scores, else -inf
def child(self, content, **metadata) -> Genome # descendant with lineage wired up
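The fitness fallback chain in the comment above (combined_score, else the mean of numeric scores, else -inf) can be written out as a small sketch. `ToyGenome` is a hypothetical, trimmed-down record; excluding booleans from "numeric" is an assumption.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class ToyGenome:
    """Hypothetical minimal record carrying only what fitness needs."""

    content: str
    scores: dict[str, float] = field(default_factory=dict)

    @property
    def fitness(self) -> float:
        # 1. Prefer the canonical score when the evaluator supplied it.
        if "combined_score" in self.scores:
            return float(self.scores["combined_score"])
        # 2. Otherwise average whatever numeric scores exist.
        numeric = [
            v for v in self.scores.values()
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if numeric:
            return sum(numeric) / len(numeric)
        # 3. Unevaluated genomes sort below everything else.
        return -math.inf


with_combined = ToyGenome("a", scores={"combined_score": 0.7, "speed": 0.1})
averaged = ToyGenome("b", scores={"accuracy": 1.0, "speed": 3.0})
unevaluated = ToyGenome("c")
```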
Selection¶
@dataclass
class Selection:
parent: Genome | None # the parent to mutate (None => delegated selection)
inspirations: list[Genome] = [] # context-only inspirations
pool: list[Genome] = [] # the full visible population (for delegated selection)
EvalResult¶
@dataclass
class EvalResult:
metrics: dict[str, float] = {} # must contain 'combined_score'
artifacts: dict = {}
valid: bool = True # gates admission
per_instance: list[float] | None = None # per-test-case success vector
text_feedback: str | None = None # surfaced into later prompts
@property
def combined_score(self) -> float
RunState¶
@dataclass
class RunState:
iteration: int = 0
cost_usd: float = 0.0
prompt_tokens: int = 0
completion_tokens: int = 0
best: Genome | None = None
run_dir: str | None = None
task_context: str = ""
signals: dict = {} # adaptive policies stash cross-cutting state here
started_at: float
def record_cost(self, cost_usd, prompt_tokens=0, completion_tokens=0) -> None
@property
def elapsed_s(self) -> float
Budget¶
@dataclass
class Budget:
max_iterations: int = 100
max_usd: float | None = None
target_score: float | None = None
patience: int | None = None # stop after N iters with no best-score gain
wallclock_s: float | None = None
The run stops as soon as any configured bound is hit. Built from the config's budget section.
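The "any configured bound" rule can be sketched as a single check that returns the first bound hit. `ToyBudget` mirrors the fields above, but `should_stop`, its argument names, and the use of `>=` comparisons are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyBudget:
    """Hypothetical copy of the Budget fields listed above."""

    max_iterations: int = 100
    max_usd: float | None = None
    target_score: float | None = None
    patience: int | None = None
    wallclock_s: float | None = None


def should_stop(
    budget: ToyBudget,
    *,
    iteration: int,
    cost_usd: float,
    best_score: float,
    iters_since_gain: int,
    elapsed_s: float,
) -> str | None:
    """Return the name of the first bound that is hit, or None to continue."""
    if iteration >= budget.max_iterations:
        return "max_iterations"
    if budget.max_usd is not None and cost_usd >= budget.max_usd:
        return "max_usd"
    if budget.target_score is not None and best_score >= budget.target_score:
        return "target_score"
    if budget.patience is not None and iters_since_gain >= budget.patience:
        return "patience"
    if budget.wallclock_s is not None and elapsed_s >= budget.wallclock_s:
        return "wallclock_s"
    return None  # unset bounds (None) never trigger


budget = ToyBudget(max_iterations=10, max_usd=1.0, patience=3)
running = dict(iteration=2, cost_usd=0.1, best_score=0.5, iters_since_gain=0, elapsed_s=1.0)
reason_none = should_stop(budget, **running)
reason_cost = should_stop(budget, **{**running, "cost_usd": 1.5})
reason_patience = should_stop(budget, **{**running, "iters_since_gain": 3})
```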
RunResult¶
@dataclass
class RunResult:
best: Genome | None
summary: dict = {} # {scaffold, task, iterations, evaluations, best_score, cost_usd, no_diff, population_size}
run_dir: str | None = None
history: list[Genome] = [] # the seed + every evaluated child, in order
@property
def best_score(self) -> float # best.fitness, or -inf
Loaders & registry¶
load_model(name=None, host=None, *, path=None, base_url=None, **kw) -> GalapagosModel
load_config(scaffold_name=None, *, path=None, **overrides) -> GalapagosConfig
load_scaffold(name=None, *, path=None, model=None, config=None, **kw) -> GalapagosScaffold
load_task(name=None, *, path=None) -> GalapagosTask
available_scaffolds() -> list[str] # all bundled scaffold cards (all runnable)
available_tasks() -> list[str] # all bundled task cards
registered_scaffolds() -> list[str] # the runnable subset (== available_scaffolds() today)
@register_scaffold("name") # decorator: wire a Scaffold subclass to its card name
AutoScaffold.from_card(name, ...) # name -> runnable scaffold (used internally by load_scaffold)
The functional load_* functions are thin aliases for the corresponding classmethods: *.from_card for models, scaffolds, and tasks, and GalapagosConfig.from_config for configs.
Cards¶
from galapagos.cards.registry import (
load_scaffold_card, load_task_card, available_scaffolds, available_tasks,
)
from galapagos.cards.schema import ScaffoldCard, TaskCard, ModelCard, VerificationCard
See Scaffold & Task cards for the schemas.