galapagos
DocsHubLeaderboardPlaygroundNews
galapagos

six blocks · any task ·
better solutions emerge.

Platform

  • Hub
  • Leaderboard
  • Playground

Resources

  • Docs
  • API reference
  • Card spec

Community

  • GitHub
  • Contribute

Updates

  • News
  • Releases

© 2026 Galapagos. Licensed under Apache-2.0.

Build your own scaffold.

Hub/Scaffolds/default/Best-of-N (attempt-counted)

default/best_of_n_attempts

Best-of-N (attempt-counted)

Best-of-N that rotates the parent every N attempts — failed/invalid tries spend the budget too.

Test-time searchApache-2.0
Scaffold cardFiles and versions
best_of_n_attempts/README.md
72 lines · 5.9 KBmarkdownDownload
# Best-of-N (attempt-counted) (`best_of_n_attempts`)

> Best-of-N that rotates the parent every N attempts — failed/invalid tries spend the budget too.

## Overview

Best-of-N (attempt-counted) is the strictly-budgeted variant of Best-of-N. Like its sibling, it picks a parent program, reuses that same parent for several consecutive iterations, and then switches to the current global best before repeating the cycle. The only behavioral difference is how the per-parent budget is counted: this variant spends one unit of the N budget on every attempt, whether or not the attempt produces a valid, scorable child.

In the faithful `best_of_n` scaffold (a port of SkyDiscover's BestOfNDatabase), the parent-reuse counter advances only when a valid child is added, so a parse or evaluation failure is a free retry on the same parent. Here, the counter advances at selection time instead, which means a failure still consumes one of the parent's N attempts. The practical effect is that the parent rotates on a fixed cadence of exactly N iterations, regardless of how many of those iterations actually yielded usable children.

Everything else is shared with `best_of_n`: a flat keep-all in-memory population sorted by fitness, the standard multi-section default prompt, a SEARCH/REPLACE diff proposer with full-rewrite fallback, and no memory. Choose `best_of_n` when you want to match SkyDiscover's database semantics where failures are not charged; choose this variant when you want a strictly fixed per-parent compute budget and predictable parent turnover.

## Algorithm

The loop maintains a single piece of adaptive state — the current parent id and a `uses` counter — inside the SelectionPolicy. On each iteration:

1. **Select** — If no parent is committed yet, or the previous parent is gone, or its budget is spent (`uses >= n`), commit to the current global best in the population and reset `uses` to 0; otherwise reuse the same parent. Sample up to `num_inspirations` context programs from the top pool, excluding the parent. Then increment `uses` immediately — this is the attempt-counted rule: the budget is spent at selection, before the outcome is known.
2. **Prompt** — Render the default multi-section message (task, metrics, feedback, inspirations, current program).
3. **Propose** — One LLM call produces a diff (or full rewrite) applied to the parent.
4. **Evaluate** — The task-supplied evaluator scores the child.
5. **Admit** — The child is added to the keep-all population. `observe` is a no-op here, because the budget was already charged in `select`.

```text
state: current_parent_id = None, uses = 0
for each iteration:
    members = population.all()
    # _choose_parent
    current = member with id == current_parent_id (or None)
    if current is None or uses >= n:
        parent = argmax fitness over members      # commit to global best
        current_parent_id = parent.id; uses = 0
    else:
        parent = current                          # reuse same parent
    inspirations = random sample of <=num_inspirations from top pool, excluding parent
    uses += 1                                     # ATTEMPT-COUNTED: charge now, valid or not
    prompt = default_builder(parent, inspirations)
    child = proposer(prompt)                      # diff / full rewrite; may fail to parse
    score = evaluate(child)                       # may fail
    population.add(child)                          # failures still cost one of the N attempts
    # observe(child) does nothing — budget already spent
```

Contrast with `best_of_n`, where `uses += 1` lives in `observe` and runs only when the child is valid, so failures do not advance the counter.

## Components

A Galapagos scaffold composes six components. This scaffold reuses five of them verbatim from `best_of_n` and swaps in only the attempt-counted SelectionPolicy.

| Slot | Implementation | Role |
|---|---|---|
| Population | `BestOfNPopulation` (`InMemoryPopulation`, keep-all) | Fitness-sorted archive; finds the global best and supplies the context sample |
| SelectionPolicy | `BestOfNAttemptsPolicy` | Reuse one parent for N consecutive attempts, charging the budget per `select` (valid or not) |
| PromptBuilder | `BestOfNPromptBuilder` (`DefaultPromptBuilder`) | Standard multi-section template: task, metrics, feedback, inspirations, current program |
| Proposer | `BestOfNProposer` (`DiffProposer`) | One LLM call producing a SEARCH/REPLACE diff or full rewrite, with no-op detection |
| Evaluator | task-supplied | Scores each proposed child against the task |
| Memory | `BestOfNMemory` (`NullMemory`) | None — no cross-candidate knowledge is kept |

## Configuration

- `selection_policy.best_of_n` (5) — number of consecutive attempts to reuse one parent before switching to the current global best; here every attempt counts toward this budget.
- `selection_policy.num_inspirations` (4) — number of context programs sampled from the top pool, excluding the parent.
- `population.capacity` (null) — uncapped keep-all by default (matches SkyDiscover, no cap); set an int to bound the archive.
- `general.max_iterations` (100) — total search iterations.
- `seed` (0) — seeds the policy RNG for parent/inspiration sampling.

## When to use

Use this scaffold when you want a strictly fixed compute budget per parent and a predictable parent-rotation cadence of exactly N iterations — useful when failures are expensive or you want to bound time spent on any single starting point. Prefer the faithful `best_of_n` when you want SkyDiscover-equivalent semantics, where parse/eval failures are free retries and a parent is held until N *valid* children appear. If you instead want to always exploit the single best program with fresh context rather than reusing a fixed parent for a window, `topk` (greedy top-1) is the better neighbor.

## Source

Galapagos variant of SkyDiscover's `best_of_n` search strategy (the attempt-counted counterpart of the faithful `best_of_n` port of SkyDiscover's BestOfNDatabase).