# EvoX (`evox`)
> Co-evolves the search strategy with the solutions: the parent/context selection policy is itself LLM-written code, scored by windowed improvement and hot-swapped on stagnation.
## Overview
EvoX (EvoX: Meta-Evolution for Automated Discovery, UC Berkeley) treats the search strategy of LLM-driven evolutionary search as an evolvable object rather than a fixed harness. The active strategy is an executable class whose `add()`/`sample()` methods decide three things: which parent to mutate, which variation-operator label to attach (free-form exploration, structural divergence, or local refinement), and which inspiration set to show the proposer. The operator texts behind those labels are problem-specific and generated once per run by a guide LLM. Solutions and the strategy that breeds them evolve on two interleaved timescales: solutions every iteration, the strategy only when progress stalls.
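To make the idea of an executable strategy concrete, here is a minimal sketch of what one might look like. The class name, the `Program` record, the label strings, and the selection rule are all illustrative assumptions; only the `add()`/`sample()` method names and the parent/inspirations/label triple come from the description above.

```python
import random
from dataclasses import dataclass


@dataclass
class Program:
    """Hypothetical solution record: an id, its source, and its score."""
    id: str
    code: str
    score: float


class GreedyWithExploration:
    """Illustrative strategy: usually refine the best program, sometimes explore."""

    def __init__(self, explore_prob: float = 0.3, seed: int = 0):
        self.programs = []
        self.explore_prob = explore_prob
        self.rng = random.Random(seed)

    def add(self, program: Program) -> None:
        # Admit a newly evaluated child into the pool.
        self.programs.append(program)

    def sample(self, num_context: int = 4):
        # Rank by score, best first; requires at least one admitted program.
        ranked = sorted(self.programs, key=lambda p: p.score, reverse=True)
        if self.rng.random() < self.explore_prob:
            parent, label = self.rng.choice(ranked), "exploration"
        else:
            parent, label = ranked[0], "exploitation"
        # Inspirations: the top-scoring programs other than the parent.
        inspirations = [p for p in ranked if p.id != parent.id][:num_context]
        return parent, inspirations, label
```

In EvoX the body of such a class is what the meta model rewrites; a real strategy would likely use lineage and reuse statistics rather than this fixed rule.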
Every deployed strategy is scored over the window it ran for. The score is a log-weighted, horizon-normalized improvement, `J = (s_end - s_start) * (1 + ln(1 + max(0, s_start))) / sqrt(W)`, and every finalized strategy is recorded in a strategy history H. Switching is demand-driven: when the best score stagnates (consecutive iterations with gain at or below an absolute threshold) for `W` iterations, a strong model rewrites the argmax-J strategy from H, conditioned on a population-state descriptor φ that captures the score distribution, top-k structure, recent execution trace, and parent/context reuse ratios.
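The score formula can be written as a small function. This is a sketch that transcribes the expression for J directly; the function and parameter names are assumptions chosen for illustration.

```python
import math


def strategy_score(s_start: float, s_end: float, window: int) -> float:
    """J = (s_end - s_start) * (1 + ln(1 + max(0, s_start))) / sqrt(W)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    gain = s_end - s_start
    # Log weight: the same gain counts for more when the starting score is high.
    weight = 1.0 + math.log1p(max(0.0, s_start))
    # Horizon normalization by the square root of the window length W.
    return gain * weight / math.sqrt(window)
```

With `s_start = 0`, `s_end = 0.5`, and `W = 4`, the weight is 1 and the result is `0.25`. The same gain starting from `1.0` scores higher because of the log weight, and a negative start is clamped so the weight never drops below 1.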
A rewrite is never trusted blindly. The candidate is validated by a behavioral test suite (`Valid(·)`), and on success the entire current population is migrated into the new strategy (never reset), with the previous strategy kept as a runtime fallback. If a deployed strategy throws at runtime, the population restores the fallback (or, in the worst case, the always-valid seed), and the failed evolution is counted but never scored. If every rewrite attempt in an event fails, the current strategy stays in place.
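The validate-then-migrate step might be sketched as follows. `hot_swap`, `build_candidate`, `is_valid`, and the `ListStrategy` stand-in are hypothetical names used only for illustration; `is_valid` plays the role of `Valid(·)`, whose actual checks are not shown here.

```python
from typing import Callable, Iterable, Optional, Tuple


class ListStrategy:
    """Stand-in strategy that simply stores whatever it is given."""

    def __init__(self):
        self.items = []

    def add(self, item) -> None:
        self.items.append(item)


def hot_swap(
    current,
    build_candidate: Callable[[], object],
    population: Iterable,
    is_valid: Callable[[object], bool],
) -> Tuple[object, Optional[object]]:
    """Return (active, fallback); `current` stays active on any failure."""
    try:
        candidate = build_candidate()      # may raise if the generated code is broken
        if not is_valid(candidate):        # stand-in for the behavioral test suite
            return current, None
        for item in population:            # migrate everything, never reset
            candidate.add(item)
    except Exception:
        return current, None
    return candidate, current              # previous strategy kept as the fallback
```

Returning the old strategy alongside the new one keeps a known-good object available if the new strategy later throws at runtime.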
This scaffold is a faithful port of the SkyDiscover reference implementation (`search/evox/`), which is treated as ground truth wherever it diverges from the paper. Notably, J uses the code's `(1 + ln(...))` weight, stagnation is a per-iteration consecutive counter rather than fixed windows, the meta-parent is the deterministic argmax-J strategy, and the horizon normalizer is fixed at the switch interval even when a strategy outlives it (an intentional bonus for long-lived improving strategies).
## Algorithm
Each iteration runs the ordinary solution loop, but the select step delegates to the LLM-written strategy and a separate meta loop fires on stagnation:
1. **Setup.** Resolve the switch interval `W` (explicit config, else `max(1, 10% of max_iterations)`), snapshot the start φ, reset the scoring window, and run the one-time variation-operator generation (a single guide call producing the EXPLORATION/EXPLOITATION operator labels; any failure falls back to free-form-only).
2. **Select.** The active evolved strategy's `sample()` returns one parent and its inspiration set, each tagged with a variation-operator label. The policy validates the runtime shape and publishes `{label, parent_id, context_ids}`.
3. **Prompt + propose.** The PromptBuilder renders the operator-labeled template; the diff proposer (full-rewrite fallback) emits a child and stamps `parent_info`/`context_ids`/`iteration` that the strategy and the φ trace read back.
4. **Evaluate + admit.** The task evaluator scores the child. Eval-failed children are gated out (never reach the strategy, leave `state.best` untouched); the failure is stashed as feedback for the next solution prompt.
5. **Window tick.** Every iteration records the current best into the scoring window (one tick per iteration, NO_DIFF steps included).
6. **Meta loop (on stagnation).** Count any runtime fallbacks, then run the consecutive-stagnation counter. On trigger: finalize the pending strategy into H (scoring the seed first on the very first event), build the meta prompt from the argmax-J parent + up to two random inspirations + φ, and make one meta call per attempt (up to `meta_max_retries`, feeding rejections back). On a validated rewrite, hot-swap with full migration and open a fresh window; on total failure, keep the current strategy.
```text
setup: W = switch_interval or max(1, 0.10*T)
       generate variation-operator labels (one guide call)
       reset scoring window; snapshot start phi
for iter in 1..T:
    parent, inspirations, label = active_strategy.sample()      # LLM-written select
    child = diff_propose(prompt(parent, inspirations, label))   # stamps lineage
    result = evaluate(child)
    if not eval_failed(result): active_strategy.add(child)      # else gated, retried
    scorer.record(best_combined_score)                          # window tick
    # stagnation counter (a gain above tau resets it; gain <= tau increments it)
    if best gained > tau: stagnant = 0 else: stagnant += 1
    if stagnant >= W and not should_stop():
        stagnant = 0
        if first event: score+insert seed S0 into H
        else: finalize pending strategy -> J over its window -> H
        reset window
        parent_strategy = H.argmax(J)                           # greedy meta-parent
        for attempt in 1..meta_max_retries:
            code = meta_model.rewrite(parent_strategy, phi, inspirations, failures)
            if Valid(code): swap_strategy(code); break          # full migration + fallback
        # all attempts fail -> keep current strategy
finalize: score any still-pending strategy into H
```
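The consecutive-stagnation trigger in the loop above can be isolated into a small runnable sketch. The class and method names are hypothetical, and measuring the gain against the previous iteration's best score (with no gain counted on the first tick) is an assumption; the reference implementation may track the baseline differently.

```python
class StagnationCounter:
    """Counts consecutive iterations whose best-score gain is at or below tau."""

    def __init__(self, threshold: float, window: int):
        self.threshold = threshold   # tau: absolute gain threshold
        self.window = window         # W: stagnant iterations before a switch
        self.stagnant = 0
        self.last_best = None

    def tick(self, best: float) -> bool:
        """Record this iteration's best score; return True when a switch is due."""
        if self.last_best is not None:
            gain = best - self.last_best
            if gain > self.threshold:
                self.stagnant = 0    # real progress resets the counter
            else:
                self.stagnant += 1   # gain <= tau counts as a stagnant step
        self.last_best = best
        if self.stagnant >= self.window:
            self.stagnant = 0        # counter restarts after each event
            return True
        return False
```

Because the counter is per-iteration rather than a fixed window, a single above-threshold gain anywhere in the run postpones the next strategy-evolution event by a full `W` iterations.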
## Components
A Galapagos scaffold composes six components. EvoX's distinctive twist is that the SelectionPolicy is not fixed code: it is an LLM-written `EvolvedStrategy` hosted by the Population, and a seventh element, the meta loop (the scaffold itself), co-evolves that strategy.
| Slot | Implementation | Role |
|---|---|---|
| Population | `EvoXPopulation` (`evolved_strategy_store`) | Hosts the active evolved strategy, gates eval-failures, builds the φ descriptor, and owns hot-swap migration + runtime fallback. |
| SelectionPolicy | `EvoXPolicy` (`evolved_strategy_sampler`) | Thin adapter over the active strategy's `sample()`; unpacks the parent/label/inspirations and retries on a broken strategy. |
| PromptBuilder | `EvoXPromptBuilder` (`operator_labeled_default`) | Renders the operator-labeled default solution template plus all meta/guide prompts. |
| Proposer | `EvoXProposer` (`diff`) | SEARCH/REPLACE diff with full-rewrite fallback; stamps `parent_info`/`context_ids`/`iteration` lineage. |
| Evaluator | task-supplied | Scores each child; its `combined_score`/`validity` drive admission and the window. |
| Memory | `EvoXStrategyMemory` (`strategy_history`) | The strategy history H; argmax-J meta-parent selection + uniform-random inspirations. |
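The Memory slot's two selection rules, argmax-J for the meta-parent and uniform-random inspirations, could be sketched like this. `StrategyRecord`, `StrategyHistory`, and their methods are hypothetical names, not the `EvoXStrategyMemory` API, and excluding the meta-parent from its own inspirations is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass
class StrategyRecord:
    """Hypothetical entry in the strategy history H."""
    name: str
    code: str
    j_score: float


class StrategyHistory:
    def __init__(self, seed: int = 0):
        self.records = []
        self.rng = random.Random(seed)

    def insert(self, record: StrategyRecord) -> None:
        # Finalized strategies are appended once their window has been scored.
        self.records.append(record)

    def meta_parent(self) -> StrategyRecord:
        # Deterministic greedy choice: the strategy with the highest J.
        # Raises ValueError if the history is empty.
        return max(self.records, key=lambda r: r.j_score)

    def inspirations(self, parent: StrategyRecord, k: int = 2):
        # Uniform sample without replacement from the other strategies.
        others = [r for r in self.records if r is not parent]
        return self.rng.sample(others, min(k, len(others)))
```

Early in a run H may hold only the seed, in which case `inspirations` returns an empty list and the meta prompt is built from the parent alone.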
## Configuration
Keys this scaffold actually reads:
- `population.improvement_threshold` (0.01) — absolute best-score gain τ at or below which a stagnation step is counted.
- `population.statistics_k` (20) — top-k scores included in the φ population-state descriptor.
- `selection_policy.num_context_programs` (4) — inspirations requested from the active strategy's `sample()` each iteration.
- `meta.switch_interval` (null → `max(1, 0.10 * max_iterations)`) — `W`: consecutive stagnant iterations before a strategy-evolution event, and the J horizon normalizer.
- `meta.meta_num_context_programs` (2) — strategy inspirations sampled from H per evolution event.
- `meta.meta_max_retries` (3) — meta generation attempts per event, each re-prompted with prior failures.
- `meta.auto_generate_variation_operators` (true) — generate operator labels via one guide call; `false` uses the static default templates (no LLM).
- `meta.use_llm_stats_insight` (true) — replace raw φ text with a guide-LLM insight in the meta prompt.
- `meta.use_problem_summary` (true) — guide-LLM problem-context summary, cached per problem.
- `meta.use_batch_summaries` (true) — batch `[PROGRAM N]` summaries of prior strategies in the meta prompt.
- `meta.max_strategy_chars` (60000) — generated-strategy length cap (rejected as a failed attempt if exceeded).
- `general.max_iterations` (100) — total budget `T`; also seeds the default `W`.
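The default for `W` follows from two of the keys above. A minimal sketch of the resolution rule, assuming the 10% is truncated to an integer (the rounding mode is not stated here) and using a hypothetical function name:

```python
from typing import Optional


def resolve_switch_interval(switch_interval: Optional[int], max_iterations: int) -> int:
    """Return W: the explicit value if set, else max(1, 10% of max_iterations)."""
    if switch_interval is not None:
        return switch_interval
    # Integer division truncates; this rounding choice is an assumption.
    return max(1, max_iterations // 10)
```

With the default `general.max_iterations` of 100 and `meta.switch_interval` left null, `W` resolves to 10; for budgets under 10 iterations the floor of 1 applies.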
## When to use
Reach for EvoX when a single fixed search policy is leaving gains on the table and the budget is large enough to amortize occasional meta calls — it adapts how it explores (which parents, which inspirations, which operators) as the landscape shifts, rather than committing to one selection rule up front. Because each strategy switch costs a strong meta-model call plus several guide calls, it is heavier than scaffolds with a static policy; on short runs or simple tasks the stagnation trigger may rarely fire and a fixed-policy sibling like `openevolve` or `topk` will be cheaper and just as good. Choose EvoX when the right search strategy is itself unknown and worth discovering.
## Source
Port of *EvoX: Meta-Evolution for Automated Discovery* (UC Berkeley), following its SkyDiscover reference implementation (`search/evox/`), which is authoritative wherever paper and code diverge. Maintained under the SkyDiscover organization in the Galapagos library.