OpenEvolve/openevolve

OpenEvolve

Island-model MAP-Elites evolutionary search with diff mutation (the open AlphaEvolve).

Test-time searchApache-2.0

# OpenEvolve (`openevolve`) > Island-model MAP-Elites evolutionary search with diff mutation (the open AlphaEvolve). ## Overview OpenEvolve is the open-source re-implementation of Google DeepMind's AlphaEvolve, and in Galapagos it serves as the canonical evolutionary baseline that every other method is compared against. It evolves a population of programs by repeatedly picking a parent, showing an LLM the parent plus a handful of inspiring neighbours, and asking for a small SEARCH/REPLACE diff that improves the program's measured performance. The population is not a flat pool but a quality-diversity store. Programs are spread across several islands, and within each island they are placed into the cells of a MAP-Elites grid whose axes are behaviour descriptors — by default a complexity axis (code length) and a diversity axis (a fast token/length/char distance from a reference sample). Each cell keeps only its single fittest elite, so the grid simultaneously preserves high performers and structurally distinct solutions, while a separate global archive and the full program store back the exploit and random sampling tiers. Parent selection is a three-tier explore/exploit scheme: most of the time the method exploits a strong program drawn from the global archive, sometimes it explores a uniform-random program from the current island, and occasionally it takes a fitness-weighted draw. Islands evolve in round-robin rotation and stay loosely coupled through periodic ring migration, where the best programs of each island are copied to its two neighbours. This island-plus-migration structure is what keeps the search from collapsing onto a single lineage and gives OpenEvolve its characteristic breadth. This Galapagos scaffold is a faithful port of the upstream `ProgramDatabase` and parallel controller: the feature scaling uses global running min/max, migration is lazy on per-island generation counters, and the proposer applies the LLM's SEARCH/REPLACE diff by **whole-line matching** (the reference `apply_diff`), with a response carrying no valid diff block treated as a no-op — exactly as the reference discards it in `diff_based` mode (no full-rewrite fallback). ## Algorithm Each iteration rotates to the next island, samples a parent and its inspirations from that island's MAP-Elites grid, renders the multi-section diff prompt, applies the LLM's SEARCH/REPLACE diff, evaluates the child, and admits it into the grid. After every admitted child the island's generation counter is bumped, and when the lead island has advanced `migration_interval` generations past the last migration, the best fraction of each island is copied to its ring neighbours. 1. **Select** — advance `current_island` round-robin. Draw the parent by tier: with `exploration_ratio` (0.2) a uniform-random program from the current island; with `exploitation_ratio` (0.7) a uniform pick from the global archive (preferring archive members already on this island); otherwise a fitness-weighted draw from the island. Build up to `num_inspirations` inspirations: the island best, then the top `elite_selection_ratio` of the island, then diverse picks from nearby grid cells, then random fill. 2. **Prompt** — render the OpenEvolve diff template: current metrics, areas for improvement, execution feedback, evolution history (previous attempts + top programs), inspirations, then the current program kept in the last `python` fence so the diff can be located, then the SEARCH/REPLACE task instruction. 3. **Propose** — apply the LLM's SEARCH/REPLACE diff to the parent by whole-line matching; a response with no valid diff block (or a no-op edit) is reported as a no-diff step (matching the reference's diff-mode discard). 4. **Evaluate** — run the task-supplied evaluator on the child. If the task defines `evaluate_stageN` functions, evaluation cascades: each stage runs only if the previous one cleared its `cascade_thresholds` gate, and the stages' metrics/artifacts are merged (matching OpenEvolve's `_cascade_evaluate`). 5. **Admit** — compute the child's MAP-Elites cell; if the cell is empty or the child is fitter than its incumbent, the child takes the cell (and inherits archive membership if it displaced an archived elite). Update the global archive, enforce the population cap, and refresh global/island bests. 6. **After step** — increment the child's island generation; if `max(island_generations) - last_migration >= migration_interval`, run ring migration. ```text for iteration in range(max_iterations): island = population.next_island() # round-robin rotation r = rng.random() if r < exploration_ratio: parent = uniform(island_members) # explore elif r < exploration_ratio + exploit_ratio: parent = uniform(archive ∩ island) # exploit else: parent = fitness_weighted(island) # random parent.metadata["island"] = island inspirations = [island_best] + top(elite_selection_ratio·island) \ + diverse_nearby_cells(parent) + random_fill # capped at num_inspirations prompt = render(metrics, improvement, feedback, history, inspirations, parent_code, task) child = apply_search_replace_diff(parent, LLM(prompt)) # whole-line; no diff block → no-op if child is no_op: record no_diff; continue scores = task_evaluator(child) cell = feature_coords(child) # minmax-scaled descriptors if cell empty or fitness(child) > fitness(incumbent[cell]): place child in cell; update archive, island_best, global_best island_generations[island] += 1 if max(island_generations) - last_migration >= migration_interval: migrate_top(migration_rate) to ring neighbours (i±1) % num_islands last_migration = max(island_generations) ``` ## Components A Galapagos scaffold composes six components — Population, SelectionPolicy, PromptBuilder, Proposer, Evaluator, and Memory — and the controller wires them into the select → prompt → propose → evaluate → admit loop. OpenEvolve supplies a distinct, faithful subclass for every slot except the evaluator (task-supplied) and memory (none). | Slot | Implementation | Role | |---|---|---| | Population | `MapElitesIslandsPopulation` (`map_elites_islands`) | Per-island MAP-Elites grids over the feature dimensions + global elite archive + lazy ring migration; one elite per cell, all programs retained for exploit/random tiers. | | SelectionPolicy | `OpenEvolveSelectionPolicy` (`three_tier_explore_exploit`) | Round-robin island rotation, 3-tier explore/exploit/weighted parent sampling, and island-best/elite/diverse/random inspirations. | | PromptBuilder | `OpenEvolvePromptBuilder` (`openevolve_template`) | Renders the live OpenEvolve `diff_user.txt` template — `# Current Program Information` (Fitness/Feature coordinates/Focus areas), execution artifacts, `# Program Evolution History` (Previous Attempts + Top Performing + Diverse + Inspiration programs), then the current program last for the diff. | | Proposer | `OpenEvolveProposer` → `DiffProposer` (`diff`) | Applies an LLM SEARCH/REPLACE diff by whole-line matching (`apply_diff`); no valid diff block → no-op (no full-rewrite fallback). | | Evaluator | task-supplied (`SubprocessEvaluator`) | The task's scorer; OpenEvolve does not bundle one. Runs cascade evaluation (`evaluate_stage1/2/3` with early-abort on the task's `cascade_thresholds`) when the task defines stages, else single-shot. | | Memory | `NullMemory` (`none`) | No cross-candidate memory; all history lives in the population. | ## Configuration Keys this scaffold actually reads (from `config.yaml` + `build_components`): - `seed` (0) — RNG seed for the selection policy. - `general.max_iterations` (100) — number of evolve steps. - `population.num_islands` (5) — island count (also read by the selection policy). - `population.archive_size` (100) — global elite archive capacity; also floors `feature_bins` so the grid can hold the archive. - `population.population_size` (1000) — total retained-program cap before worst-first eviction. - `population.feature_dimensions` (`[complexity, diversity]`) — MAP-Elites behaviour axes (also read by selection and prompt builder). - `population.feature_bins` (10) — bins per feature axis. - `population.migration_interval` (50) — migrate every N island generations. - `population.migration_rate` (0.1) — top fraction of each island copied to ring neighbours. - `population.diversity_reference_size` (20) — size of the reference sample for the diversity descriptor. - `selection_policy.exploration_ratio` (0.2) — P(uniform-random parent from the current island). - `selection_policy.exploitation_ratio` (0.7) — P(parent from the global archive). - `selection_policy.elite_selection_ratio` (0.1) — top fraction of the island used as inspirations. - `selection_policy.num_inspirations` (3) — inspiration count, also reused as the prompt's `num_top_programs`. - `selection_policy.num_diverse` (2) — diverse-program slots in the prompt. - `selection_policy.num_diverse` (2) — number of `## Diverse Programs` shown in the prompt (`prompt.num_diverse_programs`). Optionally, the run-level `general.inner_retry_times` (1) controls how many proposer attempts a single step makes before recording a no-diff. Upstream's `diff_based`, `allow_full_rewrite`, and `max_code_length` are intentionally not exposed: the proposer is hard-wired to whole-line SEARCH/REPLACE diffs (no full-rewrite fallback). Cascade evaluation is task-owned: when the task evaluator defines `evaluate_stageN`, this scaffold runs them with OpenEvolve's `cascade_thresholds` defaults `[0.5, 0.75, 0.9]` (a task card can override via its `evaluation` block); a task without stages is evaluated single-shot. ## When to use Reach for OpenEvolve when you want the well-understood, breadth-first evolutionary baseline: the island + MAP-Elites structure resists premature convergence and maintains a diverse frontier of solutions, which pays off on long open-ended runs and on tasks where structurally different programs matter. It is the right default reference point when benchmarking a new method. The trade-off is overhead and slower early progress — the explore/diversity machinery spends budget maintaining the grid rather than greedily hill-climbing. When you have a tight budget or a single sharp objective, a leaner sibling like `topk` (greedy top-1 with context) or `best_of_n` (flat parent reuse) will usually reach a good solution faster, while methods like `adaevolve` layer adaptive parent-selection signals on top of a similar evolutionary core. ## Source OpenEvolve — the open-source implementation of AlphaEvolve (Google DeepMind); this scaffold is a faithful Galapagos port of its `ProgramDatabase` and parallel island controller. Apache-2.0.