Run a scaffold¶
Running a discovery method takes four objects (a model, a config, a scaffold, and a task) and one call. Everything else is optional tuning.
import galapagos as gx
model = gx.GalapagosModel.from_card(name="openai/gpt-5.5", host="openrouter")
config = gx.GalapagosConfig.from_config(scaffold_name="openevolve")
scaffold = gx.GalapagosScaffold.from_card(name="openevolve", config=config, model=model)
task = gx.GalapagosTask.from_card(name="circle_packing")
result = scaffold.run(task=task)
print(result.best_score) # best combined_score found
print(result.best.content) # the winning program (a string)
scaffold.run(task=...) drives the six-component loop: select
parents from the Population → build a prompt → propose a candidate → evaluate it → add
the scored Genome back → repeat until the budget is spent.
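The loop described above can be sketched in plain Python without the library. This is a minimal illustration and not Galapagos's implementation: the `Genome` class, `evaluate`, and `run_loop` below are hypothetical stand-ins, and a random numeric perturbation takes the place of the prompt-building and LLM proposal steps.

```python
import random
from dataclasses import dataclass


@dataclass
class Genome:
    """Hypothetical stand-in: a candidate together with its score."""
    content: float
    fitness: float


def evaluate(x: float) -> float:
    # Toy evaluator: higher is better, with the maximum 0.0 at x == 3.
    return -(x - 3.0) ** 2


def run_loop(seed_value: float, max_iterations: int, rng: random.Random) -> Genome:
    # The population starts with the evaluated seed.
    population = [Genome(seed_value, evaluate(seed_value))]
    for _ in range(max_iterations):  # the budget: a plain iteration cap
        # Select: a size-2 tournament over the population.
        parent = max(rng.choices(population, k=2), key=lambda g: g.fitness)
        # Build prompt + propose: a Gaussian perturbation stands in for the LLM.
        child_value = parent.content + rng.gauss(0.0, 0.5)
        # Evaluate the candidate, then add the scored genome back.
        population.append(Genome(child_value, evaluate(child_value)))
    return max(population, key=lambda g: g.fitness)


best = run_loop(seed_value=0.0, max_iterations=200, rng=random.Random(0))
print(best.content, best.fitness)
```

Because the seed stays in the population, the returned best can never score below the seed in this sketch.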
The four objects¶
Model¶
GalapagosModel.from_card(name=..., host=...) resolves a hosted, OpenAI-compatible endpoint. The
supported hosts are openai, openrouter, togetherai (alias together), litellm, vllm,
huggingface (alias hf), azure, bedrock, anthropic, and google. The API key is read from
the environment: OPENAI_API_KEY for all hosts (see the host/env-var tables in
Models).
Every run calls a live LLM, so set an OpenRouter key first: Galapagos reads it from OPENAI_API_KEY
(see Installation).
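A preflight check can catch a missing key before any budget-spending code starts. The helper below is a hypothetical convenience, not part of Galapagos; it assumes only that the key lives in OPENAI_API_KEY, as stated above.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None,
                    var: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in `var`, or raise if it is missing or blank.

    Hypothetical helper: `env` defaults to the process environment and can be
    replaced with a plain dict in tests.
    """
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before starting a run")
    return key


# Example with an explicit mapping instead of the real environment.
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
```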
Config¶
GalapagosConfig.from_config(scaffold_name=...) loads the scaffold's bundled default config; pass
path="cfg.yaml" to load your own. Read and override tunables with dotted paths:
config = gx.GalapagosConfig.from_config(scaffold_name="openevolve")
config.set("database.num_islands", 8)
config.set("budget.max_iterations", 200)
config.get("budget.max_iterations") # -> 200
The budget section maps onto the stopping conditions (see below).
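To make the dotted-path convention concrete, here is a minimal sketch of how such paths can address a nested mapping. `DottedConfig` is a hypothetical stand-in written for illustration; it is not GalapagosConfig and makes no claim about how the library stores its settings.

```python
from typing import Any, Optional


class DottedConfig:
    """Hypothetical config that reads and writes nested keys by dotted path."""

    def __init__(self, data: Optional[dict] = None) -> None:
        self._data: dict = data if data is not None else {}

    def set(self, path: str, value: Any) -> None:
        *parents, leaf = path.split(".")
        node = self._data
        for key in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(key, {})
        node[leaf] = value

    def get(self, path: str, default: Any = None) -> Any:
        node: Any = self._data
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                return default
            node = node[key]
        return node


cfg = DottedConfig({"budget": {"max_iterations": 100}})
cfg.set("budget.max_iterations", 200)   # override an existing value
cfg.set("database.num_islands", 8)      # create a new section
print(cfg.get("budget.max_iterations"), cfg.get("database.num_islands"))
```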
Scaffold¶
Three ways to construct one:
# 1. by name via the registry (the base class dispatches)
scaffold = gx.GalapagosScaffold.from_card(name="openevolve", config=config, model=model)
# 2. a concrete subclass loads its own card + defaults (config/model optional)
scaffold = gx.OpenEvolveScaffold.from_card(model=model)
# 3. build-your-own from components (see "Write your own scaffold")
scaffold = gx.GalapagosScaffold.from_card(population=..., selection_policy=..., proposer=...)
List the runnable scaffolds at any time:
gx.available_scaffolds() # every bundled card:
# ['adaevolve', 'beam_search', 'best_of_n', 'best_of_n_attempts', 'evox', 'meta_harness', 'openevolve', 'topk']
gx.registered_scaffolds() # the runnable subset (the same eight)
All bundled scaffolds are runnable
The catalog ships 8 cards (adaevolve, beam_search, best_of_n, best_of_n_attempts, evox,
meta_harness, openevolve, and topk), and every one has a runnable Python controller.
An unknown name, such as GalapagosScaffold.from_card(name="nope", ...), raises a KeyError that lists the runnable set.
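Dispatch by name with a descriptive KeyError is a common registry pattern. The sketch below shows one way it can work; `_REGISTRY`, `register`, and `from_name` are hypothetical names for illustration and are not the library's internals.

```python
from typing import Callable, Dict

# Hypothetical registry: scaffold name -> factory that builds a controller.
_REGISTRY: Dict[str, Callable[[], str]] = {}


def register(name: str) -> Callable[[Callable[[], str]], Callable[[], str]]:
    """Decorator that records a factory under `name`."""
    def decorator(factory: Callable[[], str]) -> Callable[[], str]:
        _REGISTRY[name] = factory
        return factory
    return decorator


@register("openevolve")
def _make_openevolve() -> str:
    return "OpenEvolve controller"


@register("topk")
def _make_topk() -> str:
    return "TopK controller"


def from_name(name: str) -> str:
    """Build the controller registered under `name`, or list the valid names."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        runnable = ", ".join(sorted(_REGISTRY))
        raise KeyError(f"unknown scaffold {name!r}; runnable: {runnable}") from None
    return factory()


print(from_name("topk"))
```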
Task¶
GalapagosTask.from_card(name=...) loads the problem statement, the seed program, and the
Evaluator. Because the Evaluator is supplied by the task, not the scaffold, any scaffold can run
against any task.
task = gx.GalapagosTask.from_card(name="circle_packing")
task.context # the problem text injected into prompts
task.runnable # True iff it ships a seed + evaluator.py
task.status # 'stable'
task.initial_genome() # the seed Genome
The catalog bundles 64 runnable tasks; circle_packing, function_minimization, and
playground_sphere are the canonical quickstart examples. See the
task catalog.
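The reason any scaffold can run against any task is that scoring lives on the task. A toy sketch of that separation follows; `ToyTask`, `toy_best_of_n`, and `toy_hill_climb` are hypothetical and only illustrate the design, with random perturbations standing in for model proposals.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyTask:
    """Hypothetical task: a seed value plus the evaluator that scores it."""
    seed: float
    evaluate: Callable[[float], float]


def toy_best_of_n(task: ToyTask, n: int, rng: random.Random) -> float:
    # Sample n perturbations of the seed and keep the best-scoring candidate.
    candidates = [task.seed] + [task.seed + rng.uniform(-1.0, 1.0) for _ in range(n)]
    return max(candidates, key=task.evaluate)


def toy_hill_climb(task: ToyTask, steps: int, rng: random.Random) -> float:
    # Accept a perturbation only when the task's evaluator scores it higher.
    best = task.seed
    for _ in range(steps):
        candidate = best + rng.gauss(0.0, 0.3)
        if task.evaluate(candidate) > task.evaluate(best):
            best = candidate
    return best


# Both search strategies reuse the same task-owned evaluator.
sphere = ToyTask(seed=2.0, evaluate=lambda x: -x * x)
rng = random.Random(0)
result_a = toy_best_of_n(sphere, n=16, rng=rng)
result_b = toy_hill_climb(sphere, steps=50, rng=rng)
print(sphere.evaluate(result_a), sphere.evaluate(result_b))
```

Neither strategy knows how scoring works; swapping in a different `evaluate` changes the problem without touching the search code.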
The budget¶
The run stops as soon as any configured bound is hit. Set them on the config's budget section,
or override the iteration count inline on run:
config.set("budget.max_iterations", 100) # cap on iterations
config.set("budget.target_score", 1.0) # stop early once reached
config.set("budget.max_usd", 5.0) # hard $ ceiling (live model calls)
config.set("budget.patience", 30) # stop after N iters with no best-score gain
config.set("budget.wallclock_s", 600) # stop after N seconds
result = scaffold.run(task=task, max_iterations=50) # inline override of max_iterations
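The "stop as soon as any bound is hit" rule can be sketched as a single check run once per iteration. The `Budget` dataclass and `stop_reason` function are hypothetical; the field names mirror the budget keys above, while treating each bound as inclusive and letting None disable a bound are assumptions of this sketch.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    """Hypothetical mirror of the budget section; None disables a bound."""
    max_iterations: Optional[int] = None
    target_score: Optional[float] = None
    max_usd: Optional[float] = None
    patience: Optional[int] = None
    wallclock_s: Optional[float] = None


def stop_reason(budget: Budget, *, iteration: int, best_score: float,
                cost_usd: float, iters_since_gain: int,
                elapsed_s: float) -> Optional[str]:
    """Return the name of the first bound that is hit, or None to keep going."""
    checks = [
        ("max_iterations", budget.max_iterations, iteration),
        ("target_score", budget.target_score, best_score),
        ("max_usd", budget.max_usd, cost_usd),
        ("patience", budget.patience, iters_since_gain),
        ("wallclock_s", budget.wallclock_s, elapsed_s),
    ]
    for name, limit, value in checks:
        # Assumption: a bound counts as hit once the value reaches it.
        if limit is not None and value >= limit:
            return name
    return None


budget = Budget(max_iterations=100, target_score=1.0, max_usd=5.0,
                patience=30, wallclock_s=600)
print(stop_reason(budget, iteration=10, best_score=0.5, cost_usd=0.1,
                  iters_since_gain=30, elapsed_s=12.0))
```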
Reading RunResult¶
run returns a RunResult:
result = scaffold.run(task=task)
result.best # the best Genome (or None)
result.best_score # result.best.fitness, or -inf when result.best is None
result.history # list[Genome] — the seed + every evaluated child, in order
result.run_dir # run directory (if the scaffold set one)
result.summary # a dict, e.g.:
{
"scaffold": "openevolve",
"task": "circle_packing",
"iterations": 100, # loop steps taken
"evaluations": 94, # genomes evaluated = 1 (seed) + iterations - no_diff
"best_score": 2.61, # best combined_score
"cost_usd": 0.42, # accumulated model spend
"no_diff": 7, # wasted steps where the Proposer returned a no-op
"population_size": 40, # genomes currently in the Population
}
The winning artifact is result.best.content (a string of source code). Its metric dict is
result.best.scores and the headline number is result.best.fitness (==
result.best.scores["combined_score"]).
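The counters in the summary are tied together by the formula in the evaluations comment, so a quick consistency check is possible. The sketch below works on a plain dict copied from the example summary; `check_summary` and `cost_per_evaluation` are hypothetical helpers, not library functions.

```python
import math

# The example summary shown above, as a plain dict.
summary = {
    "scaffold": "openevolve",
    "task": "circle_packing",
    "iterations": 100,
    "evaluations": 94,
    "best_score": 2.61,
    "cost_usd": 0.42,
    "no_diff": 7,
    "population_size": 40,
}


def check_summary(s: dict) -> bool:
    """evaluations should equal 1 (seed) + iterations - no_diff."""
    return s["evaluations"] == 1 + s["iterations"] - s["no_diff"]


def cost_per_evaluation(s: dict) -> float:
    """Average model spend per evaluated genome; NaN if nothing was evaluated."""
    if s["evaluations"] == 0:
        return math.nan
    return s["cost_usd"] / s["evaluations"]


print(check_summary(summary))          # 1 + 100 - 7 == 94
print(cost_per_evaluation(summary))
```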
A short, cheap run¶
Every run calls a live LLM and spends budget. Keep an exploratory run small by starting on the tiny
playground_sphere task and capping the iteration count:
import galapagos as gx
model = gx.GalapagosModel.from_card(name="openai/gpt-4o-mini", host="openrouter")
scaffold = gx.OpenEvolveScaffold.from_card(model=model)
task = gx.GalapagosTask.from_card(name="playground_sphere") # the fastest task
result = scaffold.run(task=task, max_iterations=20)
print(result.best_score) # should be at or above the seed score
The CLI¶
The galapagos console script wraps the same flow. --model is required, and the run reads your
OpenRouter key from OPENAI_API_KEY.
# a short run on the smallest task
galapagos run --scaffold openevolve --task playground_sphere \
--model openai/gpt-4o-mini --host openrouter --iters 20
# a longer run on circle_packing
galapagos run --scaffold openevolve --task circle_packing \
--model openai/gpt-5.5 --host openrouter --iters 100
# point at a custom config YAML, set the seed
galapagos run --scaffold adaevolve --task function_minimization \
--model openai/gpt-4o-mini --host openrouter --config my_config.yaml --seed 7
# inspect the catalogs
galapagos scaffold list
galapagos task list
galapagos run prints the final best_score and the summary JSON. Use
galapagos submit to validate a card.
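When launching runs from a script, the command line can be assembled programmatically. The helper below is a hypothetical sketch that uses only the flags shown in the commands above (--scaffold, --task, --model, --host, --iters, --config, --seed); it builds and prints the command rather than executing it, since a real run calls a live LLM.

```python
import shlex
from typing import List, Optional


def build_run_command(scaffold: str, task: str, model: str, host: str,
                      iters: Optional[int] = None,
                      config: Optional[str] = None,
                      seed: Optional[int] = None) -> List[str]:
    """Assemble an argv list for `galapagos run` (hypothetical helper)."""
    argv = ["galapagos", "run", "--scaffold", scaffold, "--task", task,
            "--model", model, "--host", host]
    if iters is not None:
        argv += ["--iters", str(iters)]
    if config is not None:
        argv += ["--config", config]
    if seed is not None:
        argv += ["--seed", str(seed)]
    return argv


argv = build_run_command("openevolve", "playground_sphere",
                         "openai/gpt-4o-mini", "openrouter", iters=20)
print(shlex.join(argv))
# To launch it: subprocess.run(argv, check=True)  (this spends budget)
```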