Scaffolds

The catalog ships 8 scaffold cards, all runnable. Each card lives next to the runnable code at src/galapagos/scaffolds/<name>/card.yaml, alongside its config.yaml, README.md, and component modules, so each scaffold is one self-contained folder. List them at runtime:

import galapagos as gx
gx.available_scaffolds()   # every bundled card: ['adaevolve', 'beam_search', 'best_of_n', 'best_of_n_attempts', 'evox', 'meta_harness', 'openevolve', 'topk']
gx.registered_scaffolds()  # the runnable subset — the same eight

Or from the command line:

galapagos scaffold list

One self-contained folder

A bundled scaffold is a single folder, src/galapagos/scaffolds/<name>/, holding everything: the card (card.yaml, mapping the method onto the six components), the code (the controller class plus the component modules), its config.yaml defaults, and a human-readable README.md. Loading an unknown name raises a clear KeyError listing the runnable set.
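The lookup behavior described above can be sketched as follows. This is a hypothetical stand-in, not the Galapagos source: `RUNNABLE` and `resolve_scaffold` are names invented for illustration, and only the folder pattern and the eight scaffold names come from this page.

```python
# Hypothetical stand-in for the scaffold registry; not the Galapagos implementation.
RUNNABLE = (
    "adaevolve", "beam_search", "best_of_n", "best_of_n_attempts",
    "evox", "meta_harness", "openevolve", "topk",
)

def resolve_scaffold(name: str) -> str:
    """Return the folder of a runnable scaffold, or raise KeyError naming the valid set."""
    if name not in RUNNABLE:
        raise KeyError(
            f"unknown scaffold {name!r}; runnable scaffolds: {sorted(RUNNABLE)}"
        )
    return f"src/galapagos/scaffolds/{name}/"
```

Putting the runnable names in the error message means a typo in a scaffold name can be corrected without consulting the docs.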

Evolutionary methods (3)

| Name | Display | Status | Summary |
| --- | --- | --- | --- |
| openevolve | OpenEvolve | runnable | Island-model MAP-Elites evolutionary search with diff mutation (the open AlphaEvolve). |
| adaevolve | AdaEvolve | runnable | Hierarchical adaptive search: G-signal exploration intensity, UCB island allocation, and LLM meta-guidance on stagnation. |
| evox | EvoX | runnable | Co-evolves the search strategy with the solutions: the parent/context selection policy is itself LLM-written code, scored by windowed improvement and hot-swapped on stagnation. |

AdaEvolve ("AdaEvolve: Adaptive LLM-Driven Zeroth-Order Optimization", UC Berkeley) treats evolutionary program search as a hierarchical adaptive optimization problem driven by one signal — the fitness-improvement trajectory. Level 1: each island's exploration intensity adapts via an accumulated improvement signal (AdaGrad-style). Level 2: a UCB bandit with decayed, globally-normalized rewards allocates iterations across islands, with ring migration and dynamic island spawning on stagnation. Level 3 (meta-guidance): when improvement stalls, a separate LLM call generates breakthrough "tactics" that are injected into mutation prompts and rotated until exhausted. Ported from the reference implementation in SkyDiscover. Components: qd_island_archipelago population / adaptive_intensity_ucb selection / adaevolve_template prompts / diff proposer / task evaluator / paradigm_tactics memory.
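To make the Level 2 idea concrete, here is a minimal sketch of bandit-style island allocation. It uses the textbook UCB1 score and a plain exponential decay, which are assumptions: the exact bonus term, decay rule, and global normalization used by AdaEvolve are not given on this page, and `ucb_pick`, `decayed_update`, `c`, and `decay` are illustrative names.

```python
import math

def ucb_pick(mean_reward, pulls, c=1.0):
    """Return the index of the island with the highest UCB1 score."""
    # An island that has never been tried is picked first.
    for i, n in enumerate(pulls):
        if n == 0:
            return i
    total = sum(pulls)
    scores = [
        m + c * math.sqrt(2.0 * math.log(total) / n)
        for m, n in zip(mean_reward, pulls)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

def decayed_update(old, reward, decay=0.9):
    """Exponentially decayed reward estimate, so recent improvements weigh more."""
    return decay * old + (1.0 - decay) * reward
```

The exploration bonus shrinks as an island accumulates iterations, so a stagnating island gradually loses its share of the budget to islands that are still improving or are under-sampled.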

EvoX ("EvoX: Meta-Evolution for Automated Discovery", UC Berkeley) co-evolves the search strategy with the solutions: the parent/inspiration selection policy is itself LLM-written code (an EvolvedStrategy class) that is scored by windowed improvement J = Δ·(1+ln(1+s_start))/√W, validated by a behavioral test-suite, and hot-swapped with full population migration (plus runtime fallback) whenever the best score stagnates for a window (default 10% of the budget). Problem-specific DIVERGE/REFINE variation operators are generated once per run and injected as parent labels. Ported from the reference implementation in SkyDiscover. Components: evolved_strategy_store population / evolved_strategy_sampler selection / operator_labeled_default prompts / diff proposer / task evaluator / strategy_history memory.
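The windowed-improvement score can be computed directly from the formula above. In this sketch the interpretation of the symbols is an assumption: `delta` is taken to be the gain in best score over the window, `s_start` the best score at the start of the window, and `window` the window length W in iterations.

```python
import math

def windowed_improvement(delta: float, s_start: float, window: int) -> float:
    """J = delta * (1 + ln(1 + s_start)) / sqrt(W), as written in the text.

    Assumes s_start > -1 so that the logarithm is defined.
    """
    return delta * (1.0 + math.log(1.0 + s_start)) / math.sqrt(window)
```

Under this reading, the same gain counts for more when it starts from a higher score, and the square root of W penalizes strategies that need a longer window to achieve it.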

All three evolutionary methods carry type: test_time_search, tier: search.

SkyDiscover search baselines (3)

Three search strategies ported from SkyDiscover (UC Berkeley Sky Computing Lab) and grouped under the SkyDiscover organization (repo_id: SkyDiscover/<Display>). They are simple, fixed-rule references that every adaptive method is compared against, and all are runnable.

| Name | Display | Status | Summary |
| --- | --- | --- | --- |
| topk | Top-K | runnable | Always expand the single best program, with the next K as context. Pure greedy elitism. |
| best_of_n | Best-of-N | runnable | Give the LLM N valid attempts at the same parent before committing to the global best, then repeat. |
| best_of_n_attempts | Best-of-N (attempts) | runnable | Best-of-N that rotates the parent every N attempts; failed/invalid tries spend the budget too. |
| beam_search | Beam Search | runnable | Maintain a fixed-width beam of promising programs; expand one per step, prune by fitness+diversity. |

best_of_n_attempts is a Galapagos variant of best_of_n (attempt-counted budget — every try, valid or not, spends one of the parent's N), not a SkyDiscover port; the "(3)" header counts the SkyDiscover ports themselves.
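The difference between the two budgets can be sketched with a small counter. This illustrates the counting rule only and is not the Galapagos code; `tries_on_parent` and its parameters are hypothetical names.

```python
def tries_on_parent(outcomes, n, count_invalid):
    """Return how many tries are spent on one parent before its N is used up.

    outcomes: iterable of booleans, True for a valid child, False for a
    failed or invalid try.
    count_invalid=False models best_of_n (only valid children count toward N).
    count_invalid=True models best_of_n_attempts (every try counts toward N).
    """
    counted = 0
    for tries, valid in enumerate(outcomes, start=1):
        if valid or count_invalid:
            counted += 1
        if counted == n:
            return tries
    return None  # outcomes ran out before the parent's N was used up
```

With outcomes valid, invalid, invalid, valid, valid and N = 3, the valid-counted rule keeps the parent for five tries, while the attempt-counted rule moves on after three.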

Fidelity to the originals

topk / best_of_n (vs SkyDiscover) and openevolve (vs the open OpenEvolve / AlphaEvolve) are faithful ports — they reproduce the originals' search behavior, not just their shape. The behavior-determining details that match:

  • Selection. Top-K takes rank 1 as the parent with ranks 2..K+1 as context (on step 1 the lone seed serves as its own context); Best-of-N reuses one parent until it has N valid children, then commits to the global best (parent chosen by safe_score, context drawn by the metric-mean get_score); OpenEvolve uses round-robin islands, 3-tier explore/exploit/random sampling, and island-uniform inspirations (the live parallel path).
  • Admission. SkyDiscover drops an errored child (if result.error: continue), so topk / best_of_n reject eval-invalid candidates; OpenEvolve keeps every evaluated child (including score-0/errored) in its MAP-Elites grid, so openevolve admits them. Each store owns this policy.
  • Mutation. Whole-line SEARCH/REPLACE diff application (exactly-7 markers, no full-rewrite fence fallback) matching apply_diff; a non-matching block is a no-op (retried/discarded) for the SkyDiscover scaffolds.
  • Population. OpenEvolve's MAP-Elites feature binning (running min/max), global archive + worst-eviction, population cap, and lazy ring migration on island-generation counters.
  • Evaluation & validity. Cascade evaluate_stageN with per-task thresholds; the validity gate mirrors SkyDiscover's discard rules; OpenEvolve's per-candidate evaluator retries.
  • Prompt. A generic system message with the diff format in the user # Task (SkyDiscover), and OpenEvolve's # Program Evolution History (Previous Attempts / Top / Diverse / Inspiration) sections, Focus-areas trend, and full-source context programs.
  • Defaults. num_islands=5, archive_size=100, population_size=1000, migration 50/0.1, feature_bins=10, num_inspirations, the explore/exploit ratios, max_iterations=100, inner_retry_times=3 (SkyDiscover) / max_code_length=10000 + seed=42 + weighted model-ensemble support (OpenEvolve).
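The mutation rule in the list above (whole-line SEARCH/REPLACE with exactly-7 markers, and a non-matching block treated as a no-op) can be sketched as below. The marker spelling `<<<<<<< SEARCH` / `=======` / `>>>>>>> REPLACE` is an assumption based on the common diff-block convention, and `apply_search_replace` is an illustrative function, not the `apply_diff` referenced in the text. For brevity the sketch expects at least one line in each section of a block.

```python
import re

# Assumed marker format: exactly seven characters, each marker on its own line.
BLOCK = re.compile(
    r"^<{7} SEARCH\n(.*?)\n={7}\n(.*?)\n>{7} REPLACE$",
    re.DOTALL | re.MULTILINE,
)

def apply_search_replace(source: str, diff: str) -> str:
    """Apply each SEARCH/REPLACE block to whole lines of source."""
    lines = source.split("\n")
    for search, replace in BLOCK.findall(diff):
        needle = search.split("\n")
        for i in range(len(lines) - len(needle) + 1):
            # Compare complete lines, so a partial-line match never fires.
            if lines[i:i + len(needle)] == needle:
                lines[i:i + len(needle)] = replace.split("\n")
                break
        # If nothing matched, this block is a no-op and the source is unchanged.
    return "\n".join(lines)
```

A caller can detect the no-op case by comparing the result with the input and then retry or discard the candidate, as described for the SkyDiscover scaffolds.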

Two differences are intrinsic to the re-architecture, not faithfulness gaps: an exact run-for-run random sequence cannot match (Galapagos uses an isolated, seeded RNG rather than the originals' global RNG), and Galapagos runs sequentially by default (a deterministic analog of OpenEvolve's process-pool snapshot staleness; a parallel mode is available).
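The RNG point can be illustrated with the standard library alone. This is a generic sketch of an isolated, seeded generator, not Galapagos code; `sample_run` is a hypothetical helper.

```python
import random

def sample_run(seed: int, n: int = 5) -> list[float]:
    """Draw n numbers from a private generator seeded with `seed`."""
    rng = random.Random(seed)  # isolated instance; the global RNG state is untouched
    return [rng.random() for _ in range(n)]
```

Two calls with the same seed return the same sequence regardless of what other code does with the module-level `random` functions. That makes a run reproducible against itself, but not against an original that draws from the global generator.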

Card fields

Each scaffold card (a ScaffoldCard) records: name, display_name, type, tier, status, summary, description, source (paper/repo), tags, license, controller (the dotted Scaffold-subclass path), components (which implementation fills each of the six slots), model ({default, host, roles}), requirements, defaults_config, and examples.

from galapagos.cards.registry import load_scaffold_card
card = load_scaffold_card("openevolve")
card.controller    # 'galapagos.scaffolds.openevolve.scaffold.OpenEvolveScaffold'
card.components.selection_policy   # {'kind': 'three_tier_explore_exploit'}

To run a runnable scaffold, see Run a scaffold. To author a new one, see Write your own scaffold.