
Test-time search · Apache-2.0

# EvoX (`evox`)

> Co-evolves the search strategy with the solutions: the parent/context selection policy is itself LLM-written code, scored by windowed improvement and hot-swapped on stagnation.

## Overview

EvoX (*EvoX: Meta-Evolution for Automated Discovery*, UC Berkeley) treats the search strategy of LLM-driven evolutionary search as an evolvable object rather than a fixed harness. The active strategy is an executable class whose `add()`/`sample()` methods decide which parent to mutate, which variation-operator label to attach (free-form exploration, structural divergence, or local refinement, with problem-specific operator texts generated once per run by a guide LLM), and which inspiration set to show the proposer. Solutions and the strategy that breeds them evolve on two interleaved timescales: solutions every iteration, the strategy only when progress stalls.

Every deployed strategy is scored over the window it ran for. The score is a log-weighted, horizon-normalized improvement, `J = (s_end - s_start) * (1 + ln(1 + max(0, s_start))) / sqrt(W)`, where `s_start` and `s_end` are the best scores at the start and end of the window, and every finalized strategy is recorded in a strategy history H. Switching is demand-driven: when the best score stagnates (its per-iteration gain stays at or below an absolute threshold τ) for `W` consecutive iterations, a strong model rewrites the argmax-J strategy from H, conditioned on a population-state descriptor φ that captures the score distribution, top-k structure, recent execution trace, and parent/context reuse ratios.

A rewrite is never trusted blindly. The candidate is validated by a behavioral test suite (`Valid(·)`), and on success the entire current population is migrated into the new strategy (never reset), with the previous strategy kept as a runtime fallback. If a deployed strategy throws at runtime, the population restores the fallback (or, in the worst case, the always-valid seed); the failed evolution is counted but never scored. A rewrite that fails validation on every attempt leaves the current strategy in place.
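The scoring and switching rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference code: `resolve_switch_interval`, `strategy_score`, and `StagnationCounter` are hypothetical names, the counter measures each iteration's gain against the previous iteration's best, the starting baseline `initial_best` is arbitrary, and truncating 10% of the budget to an integer is a guess. The `should_stop()` check from the reference loop is omitted.

```python
import math
from typing import Optional


def resolve_switch_interval(switch_interval: Optional[int], max_iterations: int) -> int:
    """W: the explicit config value, else max(1, 10% of max_iterations).

    Truncating 10% of the budget to an int is an assumption of this sketch.
    """
    if switch_interval is not None:
        return switch_interval
    return max(1, int(0.10 * max_iterations))


def strategy_score(s_start: float, s_end: float, window: int) -> float:
    """J = (s_end - s_start) * (1 + ln(1 + max(0, s_start))) / sqrt(W)."""
    # Log weight: the same absolute gain counts more when it starts from a higher score.
    weight = 1.0 + math.log1p(max(0.0, s_start))
    # Horizon normalizer is sqrt(W), not sqrt(actual window length).
    return (s_end - s_start) * weight / math.sqrt(window)


class StagnationCounter:
    """Fires after `window` consecutive iterations whose best-score gain is <= threshold."""

    def __init__(self, window: int, threshold: float = 0.01, initial_best: float = 0.0):
        self.window = window
        self.threshold = threshold
        self.last_best = initial_best  # hypothetical baseline for the first tick
        self.stagnant = 0

    def tick(self, best_score: float) -> bool:
        """Record this iteration's best score; return True when a strategy-evolution event fires."""
        gain = best_score - self.last_best  # assumption: gain vs. the previous iteration's best
        self.last_best = best_score
        if gain > self.threshold:
            self.stagnant = 0  # a real improvement resets the counter
        else:
            self.stagnant += 1  # gain <= threshold counts as a stagnant step
        if self.stagnant >= self.window:
            self.stagnant = 0  # reset when the event fires
            return True
        return False
```

With the defaults `max_iterations = 100` and τ = 0.01, `W` resolves to 10, so ten consecutive iterations with per-iteration gain at or below 0.01 trigger an evolution event. A strategy that lifts the best score from 0.0 to 0.5 over a window with `W = 4` scores `J = 0.25`.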
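The validate, migrate, and fall-back rules described in the Overview can be sketched as a small host object. This is a minimal illustration, not the SkyDiscover code: `StrategyHost`, `SeedStrategy`, `BrokenStrategy`, `try_swap`, and the `valid` callable (standing in for `Valid(·)`) are hypothetical names, strategies here are plain Python objects rather than LLM-generated source, and keeping the fallback fed with newly admitted programs is an assumption.

```python
from typing import Any, Callable, List, Optional, Tuple


class SeedStrategy:
    """Always-valid seed (hypothetical rule): newest program as parent, the two before it as inspirations."""

    def __init__(self) -> None:
        self.programs: List[str] = []

    def add(self, program: str) -> None:
        self.programs.append(program)

    def sample(self) -> Tuple[str, List[str]]:
        return self.programs[-1], self.programs[-3:-1]


class BrokenStrategy:
    """Stand-in for an LLM-written strategy that passes validation but throws at runtime."""

    def add(self, program: str) -> None:
        pass

    def sample(self) -> Tuple[str, List[str]]:
        raise RuntimeError("strategy crashed")


class StrategyHost:
    """Hosts the active strategy; the population lives here, so a swap never resets it."""

    def __init__(self, seed_factory: Callable[[], Any]) -> None:
        self.seed_factory = seed_factory
        self.population: List[str] = []
        self.active: Any = seed_factory()
        self.fallback: Optional[Any] = None
        self.runtime_failures = 0  # counted, never scored

    def add(self, program: str) -> None:
        self.population.append(program)
        self.active.add(program)
        if self.fallback is not None:  # assumption: keep the fallback current
            self.fallback.add(program)

    def _migrate(self, strategy: Any) -> Any:
        for program in self.population:  # full migration: replay every admitted program
            strategy.add(program)
        return strategy

    def try_swap(self, factory: Callable[[], Any], valid: Callable[[Any], bool]) -> bool:
        """Build, migrate, and validate a candidate; on success keep the old strategy as fallback."""
        try:
            candidate = self._migrate(factory())
            if not valid(candidate):
                return False
        except Exception:
            return False  # rejected: the current strategy stays in place
        self.fallback, self.active = self.active, candidate
        return True

    def sample(self) -> Tuple[str, List[str]]:
        try:
            return self.active.sample()
        except Exception:
            self.runtime_failures += 1
        if self.fallback is not None:  # first restore the previous strategy
            self.active, self.fallback = self.fallback, None
            try:
                return self.active.sample()
            except Exception:
                pass
        self.active = self._migrate(self.seed_factory())  # worst case: the always-valid seed
        return self.active.sample()
```

Keeping the population in the host rather than inside the strategy is what turns a swap into a migration instead of a reset: each new strategy is simply replayed every admitted program.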
This scaffold is a faithful port of the SkyDiscover reference implementation (`search/evox/`), which is treated as ground truth wherever it diverges from the paper. Notably, J uses the code's `(1 + ln(...))` weight, stagnation is a per-iteration consecutive counter rather than a check over fixed windows, the meta-parent is the deterministic argmax-J strategy, and the horizon normalizer is fixed at the switch interval even when a strategy outlives it (an intentional bonus for long-lived improving strategies).

## Algorithm

Each iteration runs the ordinary solution loop, but the select step delegates to the LLM-written strategy, and a separate meta loop fires on stagnation:

1. **Setup.** Resolve the switch interval `W` (explicit config, else `max(1, 10% of max_iterations)`), snapshot the start φ, reset the scoring window, and run the one-time variation-operator generation: a single guide call producing the EXPLORATION/EXPLOITATION operator labels. Any failure falls back to free-form-only.
2. **Select.** The active evolved strategy's `sample()` returns one parent and its inspiration set, each tagged with a variation-operator label. The policy validates the runtime shape and publishes `{label, parent_id, context_ids}`.
3. **Prompt + propose.** The PromptBuilder renders the operator-labeled template; the diff proposer (with a full-rewrite fallback) emits a child and stamps `parent_info`/`context_ids`/`iteration`, which the strategy and the φ trace read back.
4. **Evaluate + admit.** The task evaluator scores the child. Eval-failed children are gated out: they never reach the strategy and leave `state.best` untouched, and the failure is stashed as feedback for the next solution prompt.
5. **Window tick.** Every iteration records the current best into the scoring window (one tick per iteration, NO_DIFF steps included).
6. **Meta loop (periodic).** Count any runtime fallbacks, then run the consecutive-stagnation counter.
   On trigger: finalize the pending strategy into H (scoring the seed first on the very first event), build the meta prompt from the argmax-J parent, up to two random inspirations, and φ, then make one meta call per attempt (up to `meta_max_retries`, feeding rejections back). On a validated rewrite, hot-swap with full migration and open a fresh window; if every attempt fails, keep the current strategy.

```text
setup:
  W = switch_interval or max(1, 0.10*T)
  generate variation-operator labels (one guide call)
  reset scoring window; snapshot start phi

for iter in 1..T:
  parent, inspirations, label = active_strategy.sample()     # LLM-written select
  child = diff_propose(prompt(parent, inspirations, label))  # stamps lineage
  result = evaluate(child)
  if not eval_failed(result):
    active_strategy.add(child)         # else gated; failure fed back to the next prompt
  scorer.record(best_combined_score)   # window tick

  # stagnation counter (gain > tau resets it; gain <= tau counts a stagnant step)
  if best gained > tau: stagnant = 0
  else: stagnant += 1

  if stagnant >= W and not should_stop():
    stagnant = 0
    if first event: score + insert seed S0 into H
    else: finalize pending strategy -> J over its window -> H
    reset window
    parent_strategy = H.argmax(J)      # greedy meta-parent
    for attempt in 1..meta_max_retries:
      code = meta_model.rewrite(parent_strategy, phi, inspirations, failures)
      if Valid(code):
        swap_strategy(code); break     # full migration + fallback
    # all attempts fail -> keep current strategy

finalize: score any still-pending strategy into H
```

## Components

A Galapagos scaffold composes six components. EvoX's distinctive twist is that the SelectionPolicy is not fixed code: it is an LLM-written `EvolvedStrategy` hosted by the Population, and a meta loop run by the scaffold itself (in effect a seventh component) co-evolves that strategy.

| Slot | Implementation | Role |
|---|---|---|
| Population | `EvoXPopulation` (`evolved_strategy_store`) | Hosts the active evolved strategy, gates eval failures, builds the φ descriptor, and owns hot-swap migration + runtime fallback. |
| SelectionPolicy | `EvoXPolicy` (`evolved_strategy_sampler`) | Thin adapter over the active strategy's `sample()`; unpacks the parent/label/inspirations and retries on a broken strategy. |
| PromptBuilder | `EvoXPromptBuilder` (`operator_labeled_default`) | Renders the operator-labeled default solution template plus all meta/guide prompts. |
| Proposer | `EvoXProposer` (`diff`) | SEARCH/REPLACE diff with full-rewrite fallback; stamps `parent_info`/`context_ids`/`iteration` lineage. |
| Evaluator | task-supplied | Scores each child; its `combined_score`/`validity` drive admission and the window. |
| Memory | `EvoXStrategyMemory` (`strategy_history`) | The strategy history H; argmax-J meta-parent selection + uniform-random inspirations. |

## Configuration

Keys this scaffold actually reads:

- `population.improvement_threshold` (0.01): absolute best-score gain τ at or below which an iteration counts as a stagnation step.
- `population.statistics_k` (20): top-k scores included in the φ population-state descriptor.
- `selection_policy.num_context_programs` (4): inspirations requested from the active strategy's `sample()` each iteration.
- `meta.switch_interval` (null → `max(1, 0.10 * max_iterations)`): `W`, the number of consecutive stagnant iterations before a strategy-evolution event, and the J horizon normalizer.
- `meta.meta_num_context_programs` (2): strategy inspirations sampled from H per evolution event.
- `meta.meta_max_retries` (3): meta generation attempts per event, each re-prompted with prior failures.
- `meta.auto_generate_variation_operators` (true): generate operator labels via one guide call; `false` uses the static default templates (no LLM call).
- `meta.use_llm_stats_insight` (true): replace the raw φ text with a guide-LLM insight in the meta prompt.
- `meta.use_problem_summary` (true): guide-LLM problem-context summary, cached per problem.
- `meta.use_batch_summaries` (true): batch `[PROGRAM N]` summaries of prior strategies in the meta prompt.
- `meta.max_strategy_chars` (60000): generated-strategy length cap; a longer strategy is rejected as a failed attempt.
- `general.max_iterations` (100): total budget `T`; also seeds the default `W`.

## When to use

Reach for EvoX when a single fixed search policy is leaving gains on the table and the budget is large enough to amortize occasional meta calls. It adapts how it explores (which parents, which inspirations, which operators) as the landscape shifts, rather than committing to one selection rule up front. Because each strategy switch costs a strong meta-model call plus several guide calls, it is heavier than scaffolds with a static policy. On short runs or simple tasks the stagnation trigger may rarely fire, and a fixed-policy sibling such as `openevolve` or `topk` is likely to be cheaper and just as good. Choose EvoX when the right search strategy is itself unknown and worth discovering.

## Source

Port of *EvoX: Meta-Evolution for Automated Discovery* (UC Berkeley), following its SkyDiscover reference implementation (`search/evox/`), which is authoritative wherever paper and code diverge. Maintained under the SkyDiscover organization in the Galapagos library.