default/best_of_n_attempts

Best-of-N (attempt-counted)

Best-of-N that rotates the parent every N attempts — failed/invalid tries spend the budget too.

Test-time searchApache-2.0

# Best-of-N (attempt-counted) (`best_of_n_attempts`) > Best-of-N that rotates the parent every N attempts — failed/invalid tries spend the budget too. ## Overview Best-of-N (attempt-counted) is the strictly-budgeted variant of Best-of-N. Like its sibling, it picks a parent program, reuses that same parent for several consecutive iterations, and then switches to the current global best before repeating the cycle. The only behavioral difference is how the per-parent budget is counted: this variant spends one unit of the N budget on every attempt, whether or not the attempt produces a valid, scorable child. In the faithful `best_of_n` scaffold (a port of SkyDiscover's BestOfNDatabase), the parent-reuse counter advances only when a valid child is added, so a parse or evaluation failure is a free retry on the same parent. Here, the counter advances at selection time instead, which means a failure still consumes one of the parent's N attempts. The practical effect is that the parent rotates on a fixed cadence of exactly N iterations, regardless of how many of those iterations actually yielded usable children. Everything else is shared with `best_of_n`: a flat keep-all in-memory population sorted by fitness, the standard multi-section default prompt, a SEARCH/REPLACE diff proposer with full-rewrite fallback, and no memory. Choose `best_of_n` when you want to match SkyDiscover's database semantics where failures are not charged; choose this variant when you want a strictly fixed per-parent compute budget and predictable parent turnover. ## Algorithm The loop maintains a single piece of adaptive state — the current parent id and a `uses` counter — inside the SelectionPolicy. On each iteration: 1. **Select** — If no parent is committed yet, or the previous parent is gone, or its budget is spent (`uses >= n`), commit to the current global best in the population and reset `uses` to 0; otherwise reuse the same parent. Sample up to `num_inspirations` context programs from the top pool, excluding the parent. Then increment `uses` immediately — this is the attempt-counted rule: the budget is spent at selection, before the outcome is known. 2. **Prompt** — Render the default multi-section message (task, metrics, feedback, inspirations, current program). 3. **Propose** — One LLM call produces a diff (or full rewrite) applied to the parent. 4. **Evaluate** — The task-supplied evaluator scores the child. 5. **Admit** — The child is added to the keep-all population. `observe` is a no-op here, because the budget was already charged in `select`. ```text state: current_parent_id = None, uses = 0 for each iteration: members = population.all() # _choose_parent current = member with id == current_parent_id (or None) if current is None or uses >= n: parent = argmax fitness over members # commit to global best current_parent_id = parent.id; uses = 0 else: parent = current # reuse same parent inspirations = random sample of <=num_inspirations from top pool, excluding parent uses += 1 # ATTEMPT-COUNTED: charge now, valid or not prompt = default_builder(parent, inspirations) child = proposer(prompt) # diff / full rewrite; may fail to parse score = evaluate(child) # may fail population.add(child) # failures still cost one of the N attempts # observe(child) does nothing — budget already spent ``` Contrast with `best_of_n`, where `uses += 1` lives in `observe` and runs only when the child is valid, so failures do not advance the counter. ## Components A Galapagos scaffold composes six components. This scaffold reuses five of them verbatim from `best_of_n` and swaps in only the attempt-counted SelectionPolicy. | Slot | Implementation | Role | |---|---|---| | Population | `BestOfNPopulation` (`InMemoryPopulation`, keep-all) | Fitness-sorted archive; finds the global best and supplies the context sample | | SelectionPolicy | `BestOfNAttemptsPolicy` | Reuse one parent for N consecutive attempts, charging the budget per `select` (valid or not) | | PromptBuilder | `BestOfNPromptBuilder` (`DefaultPromptBuilder`) | Standard multi-section template: task, metrics, feedback, inspirations, current program | | Proposer | `BestOfNProposer` (`DiffProposer`) | One LLM call producing a SEARCH/REPLACE diff or full rewrite, with no-op detection | | Evaluator | task-supplied | Scores each proposed child against the task | | Memory | `BestOfNMemory` (`NullMemory`) | None — no cross-candidate knowledge is kept | ## Configuration - `selection_policy.best_of_n` (5) — number of consecutive attempts to reuse one parent before switching to the current global best; here every attempt counts toward this budget. - `selection_policy.num_inspirations` (4) — number of context programs sampled from the top pool, excluding the parent. - `population.capacity` (null) — uncapped keep-all by default (matches SkyDiscover, no cap); set an int to bound the archive. - `general.max_iterations` (100) — total search iterations. - `seed` (0) — seeds the policy RNG for parent/inspiration sampling. ## When to use Use this scaffold when you want a strictly fixed compute budget per parent and a predictable parent-rotation cadence of exactly N iterations — useful when failures are expensive or you want to bound time spent on any single starting point. Prefer the faithful `best_of_n` when you want SkyDiscover-equivalent semantics, where parse/eval failures are free retries and a parent is held until N *valid* children appear. If you instead want to always exploit the single best program with fresh context rather than reusing a fixed parent for a window, `topk` (greedy top-1) is the better neighbor. ## Source Galapagos variant of SkyDiscover's `best_of_n` search strategy (the attempt-counted counterpart of the faithful `best_of_n` port of SkyDiscover's BestOfNDatabase).