# AdaEvolve (`adaevolve`)

> Hierarchical adaptive search: G-signal exploration intensity, UCB island allocation, and LLM meta-guidance on stagnation.

## Overview

AdaEvolve reframes LLM-driven program evolution as hierarchical adaptive optimization driven by one quantity — the accumulated fitness-improvement signal G, an Adam-style second moment of normalized improvements. Rather than fixing exploration knobs by hand, the method lets each island measure how productively it has been improving and adapts its own search behaviour accordingly, then coordinates those islands with a bandit and rescues the global search with meta-guidance when it stalls.

Level 1 maps each island's G to an exploration intensity that splits parent sampling into explore, exploit, and balanced modes over per-island quality-diversity archives: an island that keeps finding big improvements drives its intensity down toward exploitation, while a saturated island pushes intensity back up to explore. Level 2 allocates iterations across islands with a decayed-reward UCB bandit — rewards are normalized against the global best rather than each island's local best, which fixes the bias that would otherwise starve a strong island, and both rewards and visit counts decay so the bandit tracks recent productivity. Ring migration shares elite programs between islands, and new islands are spawned from heterogeneous quality/diversity/Pareto presets when global productivity collapses. Level 3 detects stagnation via a windowed global-improvement rate and asks a guide LLM for breakthrough "paradigm" ideas that are injected into prompts and applied to the global best until they are exhausted.

This Galapagos implementation is a faithful port of the SkyDiscover reference implementation of AdaEvolve, which it treats as ground truth wherever the paper and code diverge. Following the code, G is updated only on improvement events (it is not decayed during stagnation), island spawning triggers on global productivity (improvements per evaluation below a threshold) rather than on a G threshold, and parent selection uses the 3-mode probability split P(explore) = I, P(exploit) = (1 − I)·0.7, P(balanced) = (1 − I)·0.3.

Two ablation switches from upstream are deliberately not exposed: the quality-diversity archive is always on (the legacy capped-list mode is not ported), and the proposer is hard-wired to SEARCH/REPLACE diffs with a full-rewrite fallback (the upstream default), so the diff-vs-rewrite knobs are not surfaced.

## Algorithm

Each iteration evolves the island that the UCB bandit chose at the previous end-of-iteration. The policy computes that island's exploration intensity from its G signal, draws an explore/exploit/balanced mode, samples a parent from the island's QD archive accordingly, assembles inspirations (local most-distant-from-parent programs plus a global top-fitness fill), builds the AdaEvolve diff prompt, proposes a SEARCH/REPLACE child, evaluates it, and admits it through the archive's crowding gate. End-of-iteration bookkeeping then runs every iteration (even on no-diff steps): a dynamic-spawn check, the next UCB island pick, periodic ring migration, and the Level-3 paradigm trigger.

The signature mechanisms: an island's intensity is `I = I_min + (I_max − I_min) / (1 + sqrt(G + ε))`, so a high G (recent large improvements) yields a low intensity (more exploit), and G itself is an EMA of squared normalized deltas `G ← ρ·G + (1 − ρ)·δ²` updated only on a new local best. UCB scores islands by `R/V + C·sqrt(ln(N+1)/n_raw)` where R (globally normalized reward) and V (visit count) both decay at ρ while the exploration bonus uses raw visits. Stagnation is the windowed global-improvement rate, not a G threshold: when the binary window of "did this admitted child set a new global best" is full, no active paradigm remains, and the rate is below the threshold, one guide-LLM call requests breakthrough ideas; while a paradigm is active the parent is overridden with the global best before inspirations are computed.

```text
# one-time: seed every island with a copy of the seed program (G stays -inf until first eval)

seed:  island 0 <- record_evaluation(seed.fitness) + one improvement tick (window starts [1.0]);
       islands 1+ <- receive_external_improvement(seed.fitness)        # migration adds, no UCB credit

each iteration:
    isl       = current_island                       # chosen by UCB at previous end_iteration

    # iteration START: Level-3 paradigm trigger (so the breakthrough lands on THIS iteration's child)
    if memory.is_stagnating():                        # full window, no active paradigm, rate < threshold
        ideas = guide_llm(paradigm_prompt(global_best))[:num_to_generate]
        if ideas: memory.install(ideas)               # else nothing installed; re-attempt next iter (no cooldown)

    # up to 1 + max_error_retries (= 3) attempts, RESAMPLING parent/mode/inspirations each attempt:
    for attempt in range(inner_retry_times):
        I       = intensity_min + (intensity_max - intensity_min) / (1 + sqrt(G[isl] + eps))
        r       = rand()
        mode    = "explore"  if r < I
                  "exploit"  if r < I + (1 - I)*0.7   # top-quartile parent
                  "balanced" otherwise                # 50/50 top-programs vs novelty-roulette
        parent  = archive[isl].sample_parent(mode)
        if paradigm_active: parent = population.best() # global best overrides the sampled parent
        insp    = archive[isl].most_distant_from(parent, local_n) + global_top(n - local_n, exclude={parent})
        prompt  = AdaEvolve template + search_guidance(feedback, paradigm block, siblings, RETRY ctx)
        child   = diff_propose(prompt)                # whole-line SEARCH/REPLACE, full-rewrite fallback
        if eval_failure(child):                       # validity in {0,-1} / 0-score-with-error
            retry_ctx = child.error; continue         # next attempt re-samples and carries the error
        break                                         # first non-failure child wins
    admitted  = archive[isl].add(child)               # crowding eviction (may still reject); all-fail -> no add

    if admitted and not migrated:                     # only an accepted non-migrant child updates state
        G[isl], local_best <- record_evaluation(child.fitness)      # G EMA, improvement-only
        UCB[isl]           <- R = rho*R + global_norm_delta; V = rho*V + 1
        memory.improvement_tick(child set a new GLOBAL best?)

    # end_iteration (every iteration, even a fully-failed one):
    if dynamic_islands and global_productivity < spawn_threshold and cooldown_ok:
        spawn_island(least_used_preset)               # seeded with global top-5 (NOT adapter-seeded -> stays -inf)
    current_island = UCB.argmax(R/V + C*sqrt(ln(N+1)/n_raw))        # or round-robin (ablation)
    if iteration % migration_interval == 0:
        ring_migrate(top migration_count -> (k+1) mod K)            # receivers get external G update
```

## Components

A Galapagos scaffold composes six components — Population, SelectionPolicy, PromptBuilder, Proposer, Evaluator, and Memory — behind a thin scaffold that wires the loop and adds the method's extra control flow (here: seed-state init, the iteration-start paradigm trigger, the in-iteration retry with per-attempt resampling, the one-tick-per-admitted-child improvement window, and end-of-iteration UCB/migration/spawn scheduling).

| Slot | Implementation | Role |
|---|---|---|
| Population | `AdaEvolveArchipelago` (per-island `_IslandArchive`) | `num_islands` quality-diversity archives with crowding admission, ring migration, dynamic spawning, and a genealogy map for sibling context. |
| SelectionPolicy | `AdaEvolvePolicy` (`AdaptiveState` + `MultiDimensionalAdapter`) | Level-1 G-intensity 3-mode parent sampling on the Level-2 UCB-chosen island; owns the adaptive state and end-of-iteration scheduling. |
| PromptBuilder | `AdaEvolvePromptBuilder` | The AdaEvolve diff template with the computed `{search_guidance}` slot (feedback, paradigm block, siblings, retry context) and mode-aware EXPLORE/EXPLOIT parent labels. |
| Proposer | `AdaEvolveProposer` (extends `DiffProposer`) | SEARCH/REPLACE diff with full-rewrite fallback and no-op detection; stamps the child with the current sampling island. |
| Evaluator | task-supplied | Scores each child; its source code is read into the paradigm prompt, and its hard-failure shapes drive the archive's eval-failure gate. |
| Memory | `AdaEvolveParadigmMemory` (`ParadigmTracker`) | Level-3 paradigm tracker: binary global-improvement window, stagnation detection, paradigm rotation with bounded uses, and outcome-annotated tried-idea history. |

## Configuration

Keys this scaffold actually reads (from `config.yaml` and `build_components`):

- `population.num_islands` (2) — initial island count (also read by the selection policy).
- `population.population_size` (20) — programs per island archive (archive capacity).
- `population.k_neighbors` (5) — k for k-NN code-distance novelty.
- `population.archive_elite_ratio` (0.2) — fraction of each archive protected from eviction.
- `population.fitness_weight` (1.0) / `population.novelty_weight` (0.0) / `population.pareto_weight` (0.4) — elite-score weights (Pareto folded into fitness in scalar mode).
- `population.migration_count` (5) — top programs each island sends on migration.
- `selection_policy.decay` (0.9) — EMA ρ for G, UCB rewards, and decayed visits.
- `selection_policy.intensity_min` (0.15) / `selection_policy.intensity_max` (0.5) — exploration-intensity floor and ceiling.
- `selection_policy.use_adaptive_search` (true) / `selection_policy.fixed_intensity` (0.4) — G-based intensity vs a fixed intensity (ablation).
- `selection_policy.use_ucb_selection` (true) — UCB island selection vs round-robin (ablation).
- `selection_policy.use_migration` (true) / `selection_policy.migration_interval` (15) — ring migration on/off and its period.
- `selection_policy.use_dynamic_islands` (true) / `selection_policy.max_islands` (5) — island spawning on/off and the cap.
- `selection_policy.spawn_productivity_threshold` (0.015) / `selection_policy.spawn_cooldown_iterations` (30) — spawn when global improvements/evaluations drops below the threshold, gated by the cooldown.
- `selection_policy.local_context_program_ratio` (0.6) / `selection_policy.num_inspirations` (4) — local share of, and total count of, prompt inspirations.
- `memory.use_paradigm_breakthrough` (true) — Level-3 paradigm trigger on/off.
- `memory.paradigm_window_size` (10) — binary global-improvement window (also the failed-generation cooldown).
- `memory.paradigm_improvement_threshold` (0.12) — generate a paradigm when the full-window rate is below this and no paradigm is active.
- `memory.paradigm_max_uses` (2) — uses per paradigm before rotation.
- `memory.paradigm_num_to_generate` (3) — ideas requested per guide-LLM call (installs are capped here).
- `memory.paradigm_max_tried` (10) — bounded history of tried paradigms.
- `general.max_iterations` (100) — run length.

## When to use

AdaEvolve is the heaviest, most self-tuning scaffold in the library: reach for it on harder, longer optimization runs where a single LLM evolutionary loop plateaus and you want the search to reallocate effort, diversify across islands, and break out of stagnation on its own rather than through hand-tuned schedules. The trade-off is machinery and a per-stagnation guide-LLM call — on short budgets or simple tasks the bandit and paradigm overhead rarely pay for themselves. If you want plain island-based QD evolution without the adaptive intensity and meta-guidance, a simpler evolutionary sibling (e.g. `openevolve`) is a better fit, and for cheap greedy refinement `topk` or `best_of_n` will spend the budget more directly.

## Source

AdaEvolve: Adaptive LLM-Driven Zeroth-Order Optimization (UC Berkeley); this is a faithful port of the SkyDiscover reference implementation, which is treated as ground truth wherever it diverges from the paper.
