galapagos

six blocks · any task ·
better solutions emerge.

© 2026 Galapagos. Licensed under Apache-2.0.

default/best_of_n

Best-of-N

Give the LLM N valid attempts at the same parent before committing to the global best, then repeat.

Test-time search · Apache-2.0

About

Best-of-N is a test-time search baseline that deliberately exploits one program state at a time. It picks a parent and reuses it until N valid children have been produced from it — N independent variations from a single starting point — and only then commits to the current global best and repeats the cycle. Inspirations (context programs shown alongside the parent) are re-sampled fresh from the top pool at every step, regardless of where the reuse cycle stands.

This scaffold is a faithful port of SkyDiscover's `BestOfNDatabase` from the UC Berkeley Sky Computing Lab. In Galapagos that single class is split along the standard component seam: a flat keep-all `InMemoryPopulation` stores every scored program and re-derives the global best on demand, while a stateful `BestOfNPolicy` owns the parent-reuse counter. Faithful to the original, the counter is advanced only by a validly-scored child — SkyDiscover increments it inside `add()`, which never runs for an error result — so a parse or evaluation failure is a free retry that does not spend the per-parent budget.
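
As a rough illustration of that split, the sketch below pairs a keep-all store with a stateful reuse policy. It is not the Galapagos or SkyDiscover source: `Program`, `KeepAllPopulation`, `ReusePolicy`, `sample_inspirations`, and the `pool_size` parameter are hypothetical names chosen for this example, and a failed parse or evaluation is modeled simply as `None`.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Program:
    code: str
    score: float


@dataclass
class KeepAllPopulation:
    """Hypothetical flat store: keeps every scored program, evicts nothing."""
    programs: List[Program] = field(default_factory=list)

    def add(self, program: Program) -> None:
        self.programs.append(program)

    def best(self) -> Program:
        # The global best is re-derived on demand rather than cached.
        return max(self.programs, key=lambda p: p.score)

    def top(self, k: int) -> List[Program]:
        return sorted(self.programs, key=lambda p: p.score, reverse=True)[:k]


@dataclass
class ReusePolicy:
    """Hypothetical policy that owns the parent-reuse counter."""
    n: int
    parent: Optional[Program] = None
    valid_children: int = 0

    def select_parent(self, population: KeepAllPopulation) -> Program:
        # Commit to the current global best on the first call and again
        # once n valid children have been produced from the current parent.
        if self.parent is None or self.valid_children >= self.n:
            self.parent = population.best()
            self.valid_children = 0
        return self.parent

    def record(self, population: KeepAllPopulation, child: Optional[Program]) -> None:
        if child is None:
            return  # failed attempt: free retry, the counter is untouched
        population.add(child)
        self.valid_children += 1


def sample_inspirations(population: KeepAllPopulation, parent: Program,
                        k: int, pool_size: int, rng: random.Random) -> List[Program]:
    # Drawn fresh from the top pool at every step, independent of the counter.
    pool = [p for p in population.top(pool_size) if p is not parent]
    return rng.sample(pool, min(k, len(pool)))
```

With `n=2`, a seed program, one failed attempt, and then two valid children, `select_parent` keeps returning the seed until the second valid child is recorded, and only then switches to whichever stored program scores highest.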

The single tuning knob is N. Larger N deepens exploitation of one program state, spending more of the budget refining variations before moving on; N=1 advances to a new best after every valid child and so approaches the behavior of Top-K. If you instead want a strictly fixed per-parent budget where every attempt counts whether or not it scored, the `best_of_n_attempts` sibling spends one budget unit per selection rather than per valid child.
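
To make the difference between the two budget rules concrete, the following sketch counts how often a parent is (re)committed for a scripted sequence of attempt outcomes. `parent_commits` and its `count_failures` flag are hypothetical, written for this example only: `count_failures=False` mimics the per-valid-child rule described above, and `count_failures=True` mimics the per-selection rule attributed to `best_of_n_attempts`.

```python
from typing import Sequence


def parent_commits(outcomes: Sequence[bool], n: int, count_failures: bool) -> int:
    """Count commits to the global best over a run of attempts.

    outcomes[i] is True when attempt i produced a validly scored child.
    """
    commits = 0
    used = n  # forces a commit before the very first attempt
    for valid in outcomes:
        if used >= n:
            commits += 1
            used = 0
        # Per-valid-child rule: only a scored child spends budget.
        # Per-selection rule: every attempt spends budget.
        if valid or count_failures:
            used += 1
    return commits


# Eight attempts, three of which fail to parse or evaluate.
outcomes = [True, False, False, True, True, False, True, True]

print(parent_commits(outcomes, n=2, count_failures=False))  # 3
print(parent_commits(outcomes, n=2, count_failures=True))   # 4
print(parent_commits(outcomes, n=1, count_failures=False))  # 5
```

In this toy run the failures cost nothing under the per-valid-child rule, so the same eight attempts cover fewer parent cycles than under the per-selection rule. A commit here means re-deriving the global best, which may return the same program if no child improved on it.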

Composition

5/6 blocks

The six components this scaffold snaps together. Each block names its concrete implementation.

  • Population: keep_all

    The set of candidate solutions in play: the gene pool the search evolves over.

  • Selection: best_of_n_reuse

    Decides which genomes survive and reproduce: tournament, elitism, novelty, or your own policy.

  • Prompt: default

    Assembles the context handed to the model: parents, feedback, instructions, examples.

  • Proposer: diff

    The LLM-driven variation operator: proposes new candidates by mutation and crossover.

  • Evaluator: task

    Scores each candidate against the task: the fitness signal that drives selection.

  • Memory: none

Tags

baseline · sampling · best-of-n · exploitation · skydiscover

Source

SkyDiscover (UC Berkeley Sky Computing Lab) — best_of_n search strategy

Quick facts

Downloads: 0
License: Apache-2.0
Default model: not set
Controller: BestOfNScaffold

Use this scaffold

example.py
from galapagos import GalapagosScaffold

scaffold = GalapagosScaffold.from_card(name="best_of_n")
result = scaffold.run(task="<task_name>")