Scaffolds¶
The catalog ships 8 scaffold cards — all runnable. Each card lives next to the runnable code
at src/galapagos/scaffolds/<name>/card.yaml, alongside its config.yaml, README.md, and
component modules — one self-contained folder per scaffold. List them at runtime:
```python
import galapagos as gx

# Every bundled card:
# ['adaevolve', 'beam_search', 'best_of_n', 'best_of_n_attempts',
#  'evox', 'meta_harness', 'openevolve', 'topk']
gx.available_scaffolds()

# The runnable subset: the same eight.
gx.registered_scaffolds()
```
One self-contained folder
A bundled scaffold is a single folder, src/galapagos/scaffolds/<name>/, holding everything:
the card (card.yaml, mapping the method onto the six components),
the code (the controller class plus the component modules), its config.yaml defaults, and a
human-readable README.md. Loading an unknown name raises a clear KeyError listing the
runnable set.
Evolutionary methods (3)¶
| Name | Display | Status | Summary |
|---|---|---|---|
| openevolve | OpenEvolve | runnable | Island-model MAP-Elites evolutionary search with diff mutation (the open AlphaEvolve). |
| adaevolve | AdaEvolve | runnable | Hierarchical adaptive search: G-signal exploration intensity, UCB island allocation, and LLM meta-guidance on stagnation. |
| evox | EvoX | runnable | Co-evolves the search strategy with the solutions: the parent/context selection policy is itself LLM-written code, scored by windowed improvement and hot-swapped on stagnation. |
AdaEvolve ("AdaEvolve: Adaptive LLM-Driven Zeroth-Order Optimization", UC Berkeley) treats
evolutionary program search as a hierarchical adaptive optimization problem driven by one signal —
the fitness-improvement trajectory. Level 1: each island's exploration intensity adapts via an
accumulated improvement signal (AdaGrad-style). Level 2: a UCB bandit with decayed,
globally-normalized rewards allocates iterations across islands, with ring migration and dynamic
island spawning on stagnation. Level 3 (meta-guidance): when improvement stalls, a separate LLM call
generates breakthrough "tactics" that are injected into mutation prompts and rotated until
exhausted. Ported from the reference implementation in SkyDiscover. Components:
qd_island_archipelago population / adaptive_intensity_ucb selection / adaevolve_template
prompts / diff proposer / task evaluator / paradigm_tactics memory.
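The Level 2 allocation can be pictured as a UCB-style bandit over islands. The sketch below is a generic illustration, not the SkyDiscover or Galapagos code: the class name `DecayedUCB`, the decay factor, the exploration constant, and the max-mean normalization are all assumptions chosen to keep the example small.

```python
import math


class DecayedUCB:
    """Toy UCB bandit over islands with exponentially decayed rewards (illustrative only)."""

    def __init__(self, num_islands: int, decay: float = 0.9, c: float = 1.0):
        self.decay = decay                   # assumed decay applied to old statistics
        self.c = c                           # assumed exploration constant
        self.reward = [0.0] * num_islands    # decayed reward sum per island
        self.visits = [0.0] * num_islands    # decayed visit count per island

    def select(self) -> int:
        # Run every island once before applying the UCB rule.
        for i, n in enumerate(self.visits):
            if n == 0:
                return i
        total = sum(self.visits)
        means = [r / n for r, n in zip(self.reward, self.visits)]
        # Normalize by the best mean across all islands (a stand-in for
        # "globally-normalized rewards"); fall back to 1.0 if nothing improved yet.
        scale = max(means) or 1.0
        scores = [
            m / scale + self.c * math.sqrt(math.log(total) / n)
            for m, n in zip(means, self.visits)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, island: int, improvement: float) -> None:
        # Decay all statistics, then credit the island that just ran.
        self.reward = [r * self.decay for r in self.reward]
        self.visits = [n * self.decay for n in self.visits]
        self.reward[island] += max(improvement, 0.0)
        self.visits[island] += 1.0
```

Because old rewards fade, an island that stops improving gradually loses its share of iterations, while the exploration term keeps revisiting quiet islands from time to time.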
EvoX ("EvoX: Meta-Evolution for Automated Discovery", UC Berkeley) co-evolves the search
strategy with the solutions: the parent/inspiration selection policy is itself LLM-written code (an
EvolvedStrategy class) that is scored by windowed improvement J = Δ·(1+ln(1+s_start))/√W,
validated by a behavioral test-suite, and hot-swapped with full population migration (plus runtime
fallback) whenever the best score stagnates for a window (default 10% of the budget).
Problem-specific DIVERGE/REFINE variation operators are generated once per run and injected as
parent labels. Ported from the reference implementation in SkyDiscover. Components:
evolved_strategy_store population / evolved_strategy_sampler selection /
operator_labeled_default prompts / diff proposer / task evaluator / strategy_history memory.
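The windowed-improvement score can be evaluated directly from the formula above. This is a minimal sketch: the function name is hypothetical, and the readings of the symbols (Δ as the score gain over the window, s_start as the score at the start of the window, W as the window length) are assumptions, since the text does not define them.

```python
import math


def windowed_improvement(delta: float, s_start: float, window: int) -> float:
    """J = delta * (1 + ln(1 + s_start)) / sqrt(W), as written in the text.

    Assumed meanings: delta = score gain over the window, s_start = score at
    the window start (must be > -1), window = window length W.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return delta * (1.0 + math.log1p(s_start)) / math.sqrt(window)


# With s_start = 0 the log term vanishes, so J = 0.1 / sqrt(4).
print(windowed_improvement(delta=0.1, s_start=0.0, window=4))
```

Under this formula the same gain counts for more when it starts from a higher score, and for less when it takes a longer window to achieve.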
All three evolutionary scaffolds carry type: test_time_search, tier: search.
SkyDiscover search baselines (3)¶
Three search strategies are ported from SkyDiscover (UC Berkeley Sky Computing Lab) and grouped under
the SkyDiscover organization (repo_id: SkyDiscover/<Display>). They are simple, fixed-rule references that
every adaptive method is compared against, and all are runnable.
| Name | Display | Status | Summary |
|---|---|---|---|
| topk | Top-K | runnable | Always expand the single best program, with the next K as context. Pure greedy elitism. |
| best_of_n | Best-of-N | runnable | Give the LLM N valid attempts at the same parent before committing to the global best, then repeat. |
| best_of_n_attempts | Best-of-N (attempts) | runnable | Best-of-N that rotates the parent every N attempts; failed/invalid tries spend the budget too. |
| beam_search | Beam Search | runnable | Maintain a fixed-width beam of promising programs; expand one per step, prune by fitness+diversity. |
best_of_n_attempts is a Galapagos variant of best_of_n (attempt-counted budget — every try, valid
or not, spends one of the parent's N), not a SkyDiscover port; the "(3)" header counts the SkyDiscover
ports themselves.
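The difference between the two budget rules can be shown with a small counter. This is an illustrative sketch only: `tries_until_rotation` is a hypothetical helper, not a Galapagos function, and it models nothing beyond which tries count toward the parent's N.

```python
from typing import Iterable


def tries_until_rotation(valid_flags: Iterable[bool], n: int, count_invalid: bool) -> int:
    """Return how many tries one parent consumes before the search moves on.

    valid_flags: outcome of each successive try (True = child was valid).
    count_invalid=False models best_of_n (only valid children count toward N).
    count_invalid=True models best_of_n_attempts (every try counts).
    """
    counted = 0
    tries = 0
    for is_valid in valid_flags:
        tries += 1
        if is_valid or count_invalid:
            counted += 1
        if counted == n:
            break
    return tries


outcomes = [True, False, False, True, True, True]
print(tries_until_rotation(outcomes, n=3, count_invalid=False))  # 5
print(tries_until_rotation(outcomes, n=3, count_invalid=True))   # 3
```

With the same sequence of outcomes, the valid-counted rule keeps the parent for five tries, while the attempt-counted rule rotates after three.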
Fidelity to the originals¶
topk / best_of_n (vs SkyDiscover) and openevolve (vs the open OpenEvolve / AlphaEvolve)
are faithful ports — they reproduce the originals' search behavior, not just their shape. The
behavior-determining details that match:
- Selection. Top-K parent = rank 1 with ranks 2..K+1 (and the lone seed as its own context on step 1); Best-of-N reuses one parent until N valid children, then commits to the global best (parent chosen by safe_score, context drawn by the metric-mean get_score); OpenEvolve's round-robin islands + 3-tier explore/exploit/random sampling + island-uniform inspirations (the live parallel path).
- Admission. SkyDiscover drops an errored child (if result.error: continue), so topk/best_of_n reject eval-invalid candidates; OpenEvolve keeps every evaluated child (including score-0/errored) in its MAP-Elites grid, so openevolve admits them. Each store owns this policy.
- Mutation. Whole-line SEARCH/REPLACE diff application (exactly-7 markers, no full-rewrite fence fallback) matching apply_diff; a non-matching block is a no-op (retried/discarded) for the SkyDiscover scaffolds.
- Population. OpenEvolve's MAP-Elites feature binning (running min/max), global archive + worst-eviction, population cap, and lazy ring migration on island-generation counters.
- Evaluation & validity. Cascade evaluate_stageN with per-task thresholds; the validity gate mirrors SkyDiscover's discard rules; OpenEvolve's per-candidate evaluator retries.
- Prompt. A generic system message with the diff format in the user # Task (SkyDiscover), and OpenEvolve's # Program Evolution History (Previous Attempts / Top / Diverse / Inspiration) sections, Focus-areas trend, and full-source context programs.
- Defaults. num_islands=5, archive_size=100, population_size=1000, migration 50/0.1, feature_bins=10, num_inspirations, the explore/exploit ratios, max_iterations=100, inner_retry_times=3 (SkyDiscover) / max_code_length=10000 + seed=42 + weighted model-ensemble support (OpenEvolve).
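To make the Mutation item concrete, here is a minimal sketch of whole-line SEARCH/REPLACE application. It is not the apply_diff from either original: the function names are hypothetical, the marker layout (`<<<<<<< SEARCH`, `=======`, `>>>>>>> REPLACE`, each exactly seven marker characters on its own line) is assumed from the "exactly-7 markers" wording, and replacing only the first match is an assumption as well.

```python
def parse_blocks(diff_text: str) -> list[tuple[list[str], list[str]]]:
    """Split a diff into (search_lines, replace_lines) pairs."""
    blocks = []
    search, replace, state = [], [], None
    for line in diff_text.splitlines():
        if line == "<<<<<<< SEARCH":
            search, replace, state = [], [], "search"
        elif line == "=======" and state == "search":
            state = "replace"
        elif line == ">>>>>>> REPLACE" and state == "replace":
            blocks.append((search, replace))
            state = None
        elif state == "search":
            search.append(line)
        elif state == "replace":
            replace.append(line)
    return blocks


def apply_blocks(source: str, diff_text: str) -> str:
    """Apply each block by exact whole-line match; a non-matching block is a no-op."""
    lines = source.splitlines()
    for search, replace in parse_blocks(diff_text):
        if not search:
            continue  # an empty SEARCH section has nothing to anchor on
        width = len(search)
        for start in range(len(lines) - width + 1):
            if lines[start:start + width] == search:
                lines[start:start + width] = replace
                break  # assumed: only the first match is replaced
    return "\n".join(lines)


program = "def f(x):\n    return x + 1\nprint(f(1))"
diff = "<<<<<<< SEARCH\n    return x + 1\n=======\n    return x + 2\n>>>>>>> REPLACE"
print(apply_blocks(program, diff))
```

Matching on whole lines means a SEARCH section that differs from the source by even one character of indentation does not apply, which is why the caller needs a retry or discard path.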
Two differences are intrinsic to the re-architecture, not faithfulness gaps: an exact run-for-run random sequence cannot match (Galapagos uses an isolated, seeded RNG rather than the originals' global RNG), and Galapagos runs sequentially by default (a deterministic analog of OpenEvolve's process-pool snapshot staleness; a parallel mode is available).
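The RNG point can be illustrated with the standard library alone. The sketch below is not Galapagos code and the helper name is hypothetical; it only shows that a private `random.Random` instance is reproducible and neither reads nor disturbs the module-level generator.

```python
import random


def sample_with_isolated_rng(seed: int, n: int = 3) -> list[float]:
    """Draw n floats from a private generator seeded with `seed`."""
    rng = random.Random(seed)  # isolated state; the global RNG is untouched
    return [rng.random() for _ in range(n)]


first = sample_with_isolated_rng(42)
random.random()  # unrelated draw on the global RNG
second = sample_with_isolated_rng(42)
print(first == second)  # True: the isolated stream ignores global draws
```

The flip side is the one stated above: a run seeded this way cannot replay the exact random sequence of a program that draws from the global generator.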
Card fields¶
Each scaffold card (a ScaffoldCard) records: name, display_name,
type, tier, status, summary, description, source (paper/repo), tags, license,
controller (the dotted Scaffold-subclass path), components (which
implementation fills each of the six slots), model ({default, host, roles}), requirements,
defaults_config, and examples.
```python
from galapagos.cards.registry import load_scaffold_card

card = load_scaffold_card("openevolve")
card.controller                   # 'galapagos.scaffolds.openevolve.scaffold.OpenEvolveScaffold'
card.components.selection_policy  # {'kind': 'three_tier_explore_exploit'}
```
To run a runnable scaffold, see Run a scaffold. To author a new one, see Write your own scaffold.