What is Galapagos?
Galapagos is a user-friendly, open-source platform for LLM-driven evolutionary search on scientific-discovery and optimization tasks. Load an evolutionary-search scaffold and an evaluation task in a few lines, point them at any LLM, and let the loop evolve solutions that maximize a metric — circle packings, function minimizers, GPU kernels, algorithms, prompts, and more.
Methods, tasks, and models are exchanged as cards — versioned YAML, modeled on the model and dataset cards of the Hugging Face Hub. The card is the communication protocol of Galapagos: the single source of truth for the local library and the live Hub.
```python
import galapagos as gx

model = gx.GalapagosModel.from_card(name="openai/gpt-5.5", host="openrouter")
config = gx.GalapagosConfig.from_config(scaffold_name="openevolve")
scaffold = gx.GalapagosScaffold.from_card(name="openevolve", config=config, model=model)
task = gx.GalapagosTask.from_card(name="circle_packing")
result = scaffold.run(task=task)
print(result.best_score)
```
- Run your first evolutionary search in five lines.
- The motivation: an explosion of methods, inconsistent evaluation, and the case for a unified leaderboard.
- The loop, the six components, the Genome, scaffolds, tasks, models, and cards.
- Browse and publish scaffolds, tasks, and verified discoveries as cards.
What you can do with Galapagos
Galapagos is built around six user capabilities:
- Evaluate registered scaffolds on tasks. Run any runnable evolutionary-search scaffold against a registered task and get a comparable score. See the Quickstart.
- Submit your own scaffold or task via a card — a small YAML file that declares which class and which six components implement a method, or which seed and evaluator define a task.
- Submit a discovery — a trajectory plus the best solution — via a verification card for domain-expert review.
- Climb a unified, consistent, live leaderboard of verified discoveries, scored by the task's own evaluator so entries are directly comparable.
- Use the `galapagos` library locally: `import galapagos as gx`, then load a scaffold, model, and task with `.from_card(...)`, or drive it from the CLI.
- Load Hub scaffolds and tasks via cards, the way you pull models and datasets from the Hugging Face Hub.
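To make the idea of a scaffold card concrete, here is a minimal sketch of what such a card might contain once its YAML has been parsed into a Python dict, plus a check that all six component slots are declared. The field names (`name`, `version`, `class`, `components`), the slot keys, and the `my_package.*` class paths are assumptions made for illustration; the actual Galapagos card schema may differ.

```python
# Hypothetical sketch: field names and class paths are illustrative,
# not the real Galapagos card schema.
REQUIRED_SLOTS = (
    "population",
    "selection_policy",
    "prompt_builder",
    "proposer",
    "evaluator",
    "memory",
)

# A scaffold card as it might look after its YAML is loaded into a dict.
example_card = {
    "name": "my_scaffold",  # illustrative name, not a registered scaffold
    "version": "0.1.0",
    "class": "my_package.MyScaffold",
    "components": {
        "population": "my_package.ListPopulation",
        "selection_policy": "my_package.TopKSelection",
        "prompt_builder": "my_package.SimplePromptBuilder",
        "proposer": "my_package.LLMProposer",
        "evaluator": "my_package.TaskEvaluator",
        "memory": "my_package.NoteMemory",
    },
}


def missing_slots(card: dict) -> list[str]:
    """Return the component slots the card fails to declare, in canonical order."""
    declared = card.get("components", {})
    return [slot for slot in REQUIRED_SLOTS if slot not in declared]


print(missing_slots(example_card))
```

A check like this catches an incomplete card before any search budget is spent.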
The loop in one sentence
Each iteration: select parents from the Population → build a prompt from them and Memory → propose a new candidate → evaluate it with a deterministic, verifiable function → add the scored Genome back to the Population (and optionally write what was learned to Memory). Repeat until the budget is spent.
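The loop above can be sketched in plain Python. This is a toy illustration, not the Galapagos implementation: the `Genome` fields, the function names, and the Gaussian perturbation standing in for the LLM proposer are all assumptions, and the evaluator is a negated sphere function chosen only because it is deterministic and easy to check.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Genome:
    """One unit of evolution: a candidate solution and its score (fields are illustrative)."""
    solution: tuple
    score: float


def evaluate(solution):
    """Deterministic scorer: negated sphere function, so higher is better."""
    return -sum(x * x for x in solution)


def select_parents(population, k=2):
    """Selection policy: the k highest-scoring genomes, best first."""
    return sorted(population, key=lambda g: g.score, reverse=True)[:k]


def build_prompt(parents, memory):
    """Prompt builder: describe the parents and the three most recent memory notes."""
    lines = [f"parent score={p.score:.4f} solution={p.solution}" for p in parents]
    return "\n".join(lines + memory[-3:])


def propose(prompt, parents, rng):
    """Stand-in for the LLM call: ignores the prompt and perturbs the best parent."""
    return tuple(x + rng.gauss(0.0, 0.3) for x in parents[0].solution)


def run(budget=200, seed=0):
    rng = random.Random(seed)
    start = (2.0, -1.5)
    population = [Genome(start, evaluate(start))]
    memory = []
    for step in range(budget):
        parents = select_parents(population)
        prompt = build_prompt(parents, memory)
        candidate = propose(prompt, parents, rng)
        child = Genome(candidate, evaluate(candidate))
        if child.score > parents[0].score:
            # Optionally record what was learned.
            memory.append(f"step {step}: improved to {child.score:.4f}")
        population.append(child)  # the scored genome always rejoins the population
    return max(population, key=lambda g: g.score)


best = run()
print(best.score)
```

Swapping `propose` for a real model call, or `select_parents` for a different policy, changes the method without changing the shape of the loop.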
What Galapagos provides
- A simple, modular evolutionary loop. Every method is the same six-component loop over one unit of evolution, the Genome — only the implementation filling each slot changes: Population, SelectionPolicy, PromptBuilder, Proposer, Evaluator, Memory.
- CLI-agent integration. Autonomous CLI agents plug in as a Proposer — the variation operator — with no special "agent mode." This is a designed integration point on the platform roadmap.
- A registry of tasks and scaffolds. A single catalog, browsable from the package or the Hub, with scaffold cards and task cards as the unit of exchange.
- A Docker sandbox + Harbor-style task schema. Tasks declare their seed, evaluator, requirements, and evaluation mode (local by default, or container) so a deterministic scorer runs the same way everywhere.
- Domain-expert verification. A submitted discovery is reviewed and re-scored before it counts, so leaderboard numbers are checked, not self-reported.
- A live, unified leaderboard. Each task has its own board, ranking verified discoveries by the task's metric.
- Reusable skills. A SkillHub of portable, reusable skills (e.g. a Google Science Skill) that scaffolds and agents can pull in.
- A live Hub. Upload and download scaffolds and tasks as cards, like the Hugging Face Hub.
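As a rough illustration of how verification and the per-task leaderboard fit together, the sketch below ranks only verified discoveries for one task, highest score first. The `Discovery` type, its fields, and the sample entries (authors and scores) are made up for the example and do not reflect the Hub's actual data model or any real results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discovery:
    """Hypothetical leaderboard entry; field names are illustrative."""
    task: str
    author: str
    score: float
    verified: bool


def leaderboard(discoveries, task):
    """Keep verified entries for one task and rank them, highest score first."""
    entries = [d for d in discoveries if d.task == task and d.verified]
    return sorted(entries, key=lambda d: d.score, reverse=True)


# Made-up sample data, for illustration only.
submissions = [
    Discovery("circle_packing", "alice", 2.61, verified=True),
    Discovery("circle_packing", "bob", 2.70, verified=False),  # not yet reviewed
    Discovery("circle_packing", "carol", 2.63, verified=True),
    Discovery("function_minimization", "dave", 0.98, verified=True),
]

ranked = leaderboard(submissions, "circle_packing")
print([d.author for d in ranked])
```

The unverified entry is left off the board even though its self-reported score is the highest, which is the point of checking numbers before they count.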
What ships in v0.3, and what is roadmap
The architecture and vision above describe the full platform. To be precise about what runs
today: the catalog ships eight scaffolds, all runnable — adaevolve, beam_search,
best_of_n, best_of_n_attempts, evox, meta_harness, openevolve, and topk — and 64 runnable tasks (circle_packing,
function_minimization, and playground_sphere are the canonical quickstart examples). Larger-scope
figures and features, such as 300+ tasks and CLI-agent integrations, describe the
platform's roadmap, not the shipped wheel. Every code sample on this site runs as written.
How methods work
| Tier | What it does | Examples |
|---|---|---|
| search | A frozen LLM acts as the variation operator; the scaffold provides selection + prompting. | openevolve, adaevolve, evox, best_of_n, best_of_n_attempts, topk, beam_search |
Ready? Head to Installation and the Quickstart — or read Why Galapagos first.