default/beam_search

Maintain a fixed-width beam of promising programs; expand one per step, prune by fitness+diversity.

Test-time searchApache-2.0

# Beam Search (`beam_search`) > Maintain a fixed-width beam of promising programs; expand one per step, prune by fitness+diversity. ## Overview Beam Search keeps a fixed-width beam of the most promising programs. Every admitted program joins the beam, and whenever the beam grows past its width it is pruned back to `beam_width` by fitness — optionally diversity-aware, so the beam does not collapse onto a single lineage. Each step one parent is drawn from the beam by one of four strategies (best, stochastic, round_robin, or diversity_weighted), and the program is expanded into a single child; inspirations passed as context are the global top programs. This scaffold is a faithful port of SkyDiscover's `BeamSearchDatabase` (UC Berkeley Sky Computing Lab). In Galapagos its two beam-specific axes map cleanly onto two components: the beam structure, diversity-aware admission, and search-tree depth tracking live in a `BeamPopulation`, while the parent choice lives in a `BeamSelectionPolicy`. Both share one fitness notion — `combined_score` with an optional exponential depth penalty — and one cheap, embedding-free diversity metric (character n-gram Jaccard distance between programs), keeping pruning and selection mutually consistent. The result is a search that is greedier than a flat population sampler but cheaper and more parallel-friendly than a full evolutionary loop. A keep-all store underneath the beam means every program is retained, so inspirations can always be sourced from the global best even after a candidate has been pruned out of the active beam. ## Algorithm Each iteration follows the standard select -> prompt -> propose -> evaluate -> admit cycle, specialized to a beam: 1. **Select** — draw one parent from the current beam by `strategy`. `best` takes the highest-scoring member; `round_robin` cycles the beam in score order; `stochastic` samples a softmax over score at `temperature`; `diversity_weighted` samples a softmax over `(1-w)·score + w·avg_distance_to_recently_expanded`. The policy remembers recently expanded parents (last 50, with a 10-deep lookback for the diversity term) as its adaptive state. 2. **Prompt** — build the canonical multi-section message (task -> metrics -> feedback -> inspirations -> current program), where inspirations are the global top `num_inspirations` programs, excluding the chosen parent. 3. **Propose** — one LLM call yields a SEARCH/REPLACE diff (or full-rewrite fallback) applied to the parent, producing one child, with no-op detection. 4. **Evaluate** — the task-supplied evaluator scores the child. 5. **Admit** — the child is retained in the keep-all store and appended to the beam. If the beam now exceeds `beam_width`, prune: by pure fitness when `diversity_weight == 0`, or by a greedy fitness+diversity objective otherwise. ```text beam = [] # ids of the active beam (subset of a keep-all store) expanded = [] # recently expanded parents (selection state) for step in range(max_iterations): members = sorted(beam, by score) # score = combined_score * exp(-depth_penalty * depth) parent = pick(members, strategy) # best | round_robin | stochastic | diversity_weighted expanded.append(parent) inspirations = global_top(num_inspirations, exclude=parent) prompt = build(task, parent, inspirations) child = apply_diff(llm(prompt), parent) # one child per expanded parent score(child) store.add(child) # keep-all; child also enters the beam depth[child] = depth[parent] + 1 if len(beam) > beam_width: if diversity_weight > 0: beam = greedy_diverse(beam, beam_width) # (1-w)*fitness + w*min_dist_to_selected else: beam = top_by_fitness(beam, beam_width) ``` The diversity-aware prune (`greedy_diverse`) always keeps the single best member, then greedily adds whichever remaining candidate maximizes `(1-w)·fitness + w·min_distance_to_already_selected`, so the beam stays both strong and spread out across distinct lineages. ## Components A Galapagos scaffold composes six components; Beam Search wires them as follows: | Slot | Implementation | Role | |---|---|---| | Population | `BeamPopulation` (kind: beam) | Keep-all store plus a fixed-width beam; diversity-aware prune and search-tree depth tracking. | | SelectionPolicy | `BeamSelectionPolicy` (kind: beam) | Picks the parent from the beam by best / stochastic / round_robin / diversity_weighted. | | PromptBuilder | `BeamSearchPromptBuilder` (kind: default) | Canonical multi-section template; adds nothing beyond the default context. | | Proposer | `BeamSearchProposer` (kind: diff) | One LLM call -> SEARCH/REPLACE diff (full-rewrite fallback) -> one child per parent. | | Evaluator | task-supplied | Scores each child with the task's evaluation function. | | Memory | `NullMemory` (kind: none) | None; Beam Search keeps no cross-candidate free-form knowledge. | ## Configuration Keys this scaffold actually reads: - `population.beam_width` (5) — number of candidates kept in the active beam. - `population.beam_diversity_weight` (0.3) — weight `w` of the diversity term in pruning; 0 = pure fitness. Also read by the selection policy. - `population.beam_depth_penalty` (0.0) — exponential fitness penalty per search-tree depth; 0 = off. - `selection_policy.beam_selection_strategy` (`diversity_weighted`) — parent draw: `best | stochastic | round_robin | diversity_weighted`. - `selection_policy.beam_temperature` (1.0) — softmax temperature for `stochastic` / `diversity_weighted`. - `selection_policy.num_inspirations` (4) — number of global top programs supplied as context. - `general.max_iterations` (100) — number of search steps. - `seed` (0) — seeds the selection policy's RNG. ## When to use Reach for Beam Search when you want bounded, directed exploration: it commits compute to a small set of strong lineages (the beam) rather than the whole history, which keeps it cheaper and more focused than a full evolutionary population while staying greedier than flat best-of-n sampling. Turn up `beam_diversity_weight` (or use the `diversity_weighted` strategy) when a fitness-only beam keeps collapsing onto one near-duplicate lineage. If you instead want pure greedy hill-climbing on the single best program, `topk` (greedy top-1 plus context) is simpler; if you want independent samples with no beam structure at all, `best_of_n` is the lighter baseline. ## Source Ported from SkyDiscover's (UC Berkeley Sky Computing Lab) `BeamSearchDatabase` search strategy; this is a faithful Galapagos port, not an original method.