Cards: the communication protocol
The Card is the central concept of Galapagos. A card is a small YAML document that describes one shareable artifact — a task, a scaffold, a model, or a discovery — and it is the only thing you ever exchange with the platform. Loading a task, submitting a scaffold, pinning a model, or publishing a result are all the same gesture: read a card, or write a card.
If you have used the Hugging Face Hub, the analogy is exact: there, model cards and dataset cards are
the units of sharing; in Galapagos the card is the unit of sharing for evolutionary-search
scaffolds, discovery tasks, models, and reviewed discoveries. The same YAML loads a
card locally (gx.*.from_card) and uploads it to the Hub (galapagos submit): one
schema, one protocol, both directions.
There are four card types:
| Card | Describes | Loaded with | Lives at |
|---|---|---|---|
| Task Card | an evaluation task (problem + metrics + evaluator) | gx.GalapagosTask.from_card | tasks/<name>/card.yaml |
| Scaffold Card | a discovery method (controller + six components) | gx.GalapagosScaffold.from_card | scaffolds/<name>/card.yaml |
| Model Card | a model (path + host) | gx.GalapagosModel.from_card | user-supplied |
| Verification Card | a submitted discovery for expert review | Hub POST /api/verifications | Hub submission |
All four are defined in galapagos.cards.schema with pydantic v2 and extra="allow": every card has
a small required core of fields, and you may attach any number of method- or task-specific extras
without a core change. The tables below mark the core fields; everything else is optional protocol
metadata that the platform reads when present.
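The "required core plus free-form extras" pattern described above can be sketched with pydantic v2 as follows. This is only an illustration: the class name `CardSketch` and the `difficulty` extra are hypothetical, and the real classes in galapagos.cards.schema may be organised differently.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class CardSketch(BaseModel):
    """Hypothetical stand-in for a card schema: one core field, extras allowed."""

    model_config = ConfigDict(extra="allow")

    name: str  # the required core field


# Unknown keys are kept on the model rather than rejected.
card = CardSketch.model_validate({"name": "circle_packing", "difficulty": "hard"})
extras = card.model_extra  # the fields that are not declared on the class

# A card that lacks the core field still fails validation.
try:
    CardSketch.model_validate({"difficulty": "hard"})
    missing_core_rejected = False
except ValidationError:
    missing_core_rejected = True
```

With `extra="allow"`, a method- or task-specific field can be added to a card without touching the schema, while the core fields stay enforced.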
One protocol, two directions
Load a card to use an artifact: gx.GalapagosTask.from_card(name="circle_packing").
Submit a card to share one: galapagos submit --card my_task/card.yaml --kind task. The
card you load and the card you submit are the same file, validated by the same schema — so
anything that loads in the galapagos library is publishable to the Hub, and anything on the
Hub loads in the library.
Task Card
A task is a discovery problem. The Task Card states what to optimize (the problem and its
metrics), how to score it (the deterministically-verifiable evaluator), and under what
constraints (software/hardware, evaluation mode). It is the single source of truth for a task; the
Evaluator it points at is supplied to every scaffold that runs the
task, so any scaffold can be pointed at any task.
| Field | Type | Meaning |
|---|---|---|
| name | str | (core) unique task id. |
| display_name | str | human label. |
| domain | str | top-level domain (math, gpu_kernel, systems, ml, nlp, bio, …). |
| macro / family | str | the sub-domain / sub-grouping (e.g. packing). |
| description | str | the full problem statement, injected into prompts as task.context. |
| summary | str | a one-line description. |
| metrics | list[dict] | the evaluation metrics: a list, since a task may score on several (see below). |
| metric | dict | the legacy single-metric form, {key, direction, type} (kept for back-compat). |
| components | obj | the task components: {initial_program, evaluator, config, requirement}. |
| constraint | obj | the task constraint: the software/hardware spec the run must satisfy. |
| evaluation | obj | {format, mode}; mode is local (default) or docker. |
| language | str | the seed/solution language (python, cuda, …). |
| modality | str | the I/O modality (text, image, …). |
| library | str/list | libraries the task depends on. |
| references | obj | best-known score + source. |
| metadata | obj | free-form. |
The evaluation metrics are a list of dicts
A task is rarely single-objective: a kernel must be both correct and fast; a packing must be both
valid and dense. So metrics is a list, one dict per objective (the legacy metric field
holds a single {key, direction, type} dict):
| Sub-field | Meaning |
|---|---|
| metric_name | the metric's key in the Evaluator's output dict (e.g. combined_score, latency_ms). |
| metric_direction | maximize or minimize. |
| metric_description | what the metric means, in prose. |
| metric_computation | how it is computed: the deterministic rule the evaluator implements. |
The first metric is, by convention, the headline combined_score the search drives.
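A minimal sketch of how a consumer might pick the headline metric from an already-parsed card, assuming the card is a plain dict. The helper names `headline_metric` and `is_improvement` are hypothetical, and the fallback assumes the legacy `direction` value uses the same maximize/minimize vocabulary, which the schema may or may not guarantee.

```python
from typing import Any


def headline_metric(card: dict[str, Any]) -> tuple[str, str]:
    """Return (metric key, direction) for the metric the search drives.

    Prefers the `metrics` list, whose first entry is the headline by
    convention, and falls back to the legacy single `metric` dict.
    """
    metrics = card.get("metrics") or []
    if metrics:
        first = metrics[0]
        return first["metric_name"], first["metric_direction"]
    legacy = card.get("metric")
    if legacy:
        return legacy["key"], legacy["direction"]
    raise ValueError("card declares neither `metrics` nor `metric`")


def is_improvement(new: float, old: float, direction: str) -> bool:
    """Compare two scores according to the metric's declared direction."""
    if direction == "maximize":
        return new > old
    if direction == "minimize":
        return new < old
    raise ValueError(f"unknown direction: {direction!r}")


card = {
    "metrics": [
        {"metric_name": "combined_score", "metric_direction": "maximize"},
        {"metric_name": "sum_radii", "metric_direction": "maximize"},
    ]
}
key, direction = headline_metric(card)
```

Reading the direction from the card, rather than hard-coding "higher is better", lets the same comparison serve latency-style metrics as well.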
Task components
components bundles the four files that make a task runnable:
| Component | What it is |
|---|---|
| initial_program | the seed the search starts from: a path (initial_program.py) or inline code, with # EVOLVE-BLOCK markers around the region the search may rewrite. |
| evaluator | the deterministically verifiable scorer: evaluate(program_path) -> dict, returning the metrics. This is the Evaluator component. |
| config | an optional config.yaml of task-specific knobs. |
| requirement | a requirements.txt path, or an inline list of pip requirements. |
A card that omits initial_program/evaluator is a metadata-only entry: it loads for the
catalog and the Hub, but cannot run.
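The runnable versus metadata-only distinction can be checked on a parsed card with a few lines. This is a sketch over a plain dict; the helper name `is_runnable` is hypothetical and is not part of the galapagos API.

```python
def is_runnable(card: dict) -> bool:
    """True when the card names both a seed program and an evaluator."""
    components = card.get("components") or {}
    return bool(components.get("initial_program")) and bool(components.get("evaluator"))


runnable_card = {
    "name": "circle_packing",
    "components": {"initial_program": "initial_program.py", "evaluator": "evaluator.py"},
}
metadata_only_card = {"name": "some_catalog_entry", "summary": "listed, but not runnable"}
```

Here `is_runnable(runnable_card)` is true and `is_runnable(metadata_only_card)` is false; a card with an evaluator but no seed program is also treated as metadata-only.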
```yaml
name: circle_packing
display_name: Circle Packing (n=26)
domain: math
family: packing                         # macro / sub-domain
summary: "Pack 26 circles in the unit square; maximize the sum of radii."
description: |
  Find centers and radii for 26 non-overlapping circles inside the unit square [0,1]^2 that
  maximize the sum of radii. Only the code inside the EVOLVE-BLOCK is modified by the search.
  Validity and the score are recomputed independently from the returned geometry (anti reward-hacking).
metrics:                                # a list: a task may report several metrics
  - metric_name: combined_score
    metric_direction: maximize
    metric_description: "Fraction of the AlphaEvolve best (sum_radii / 2.635)."
    metric_computation: "Re-validate the geometry, then sum_radii / 2.635; 0.0 if invalid."
  - metric_name: sum_radii
    metric_direction: maximize
    metric_description: "Total radius of the 26 packed circles."
    metric_computation: "Sum of the radii recomputed from the returned centers/radii."
components:
  initial_program: initial_program.py   # path (or inline code)
  evaluator: evaluator.py               # the deterministic verifiable scorer
  config: config.yaml
  requirement: [numpy]                  # inline list, or a requirements.txt path
constraint: {gpu: none, docker: optional, est_runtime_s: 3}
evaluation: {format: python, mode: local}   # mode: local | docker
language: python
modality: text
library: numpy
references: {best_known: 2.635, source: AlphaEvolve}
```
```python
import galapagos as gx

task = gx.GalapagosTask.from_card(name="circle_packing")  # registered task → loads from the catalog
task.context                   # the problem statement (from description)
seed = task.initial_genome()   # the seed Genome (generation 0)
task.evaluator                 # the deterministic verifiable scorer
```
Tasks must be registered to load by name
from_card(name=...) resolves against the registered task catalog. To run a task that is not
in the catalog, either submit its card to the Hub, or load it from a local task directory by passing path=.../card.yaml in place of the name argument.
Deterministic, verifiable scoring (anti reward-hacking)
The evaluator must recompute the objective from the candidate's raw output — never trust a
self-reported score. circle_packing's evaluator discards the program's own sum_radii and
re-validates every constraint (count, bounds, non-overlap), returning combined_score = 0.0 on
any violation. The catalog ships 64 runnable bundled tasks; circle_packing,
function_minimization, and playground_sphere are the canonical quickstart examples, and
the platform's roadmap scopes 300+ tasks.
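The recompute-from-raw-output rule can be illustrated with a small scorer for the packing problem. This is a sketch under stated assumptions, not the bundled circle_packing evaluator: the function name `score_packing`, the tolerance `TOL`, and the choice to accept zero radii are all hypothetical. Only the checks themselves (count, bounds, non-overlap, 0.0 on violation, sum_radii / 2.635) come from the description above.

```python
import math

ALPHAEVOLVE_BEST = 2.635  # reference sum of radii quoted in the task card
TOL = 1e-9                # numerical slack (an assumption of this sketch)


def score_packing(centers, radii, n=26):
    """Recompute the packing score from raw geometry, ignoring any self-reported score."""
    invalid = {"combined_score": 0.0, "sum_radii": 0.0}

    # Count: exactly n centers and n radii.
    if len(centers) != n or len(radii) != n:
        return invalid

    for (x, y), r in zip(centers, radii):
        # Reject NaN/inf and negative radii before any geometry check.
        if not all(math.isfinite(v) for v in (x, y, r)) or r < 0:
            return invalid
        # Bounds: every circle must lie inside the unit square.
        if x - r < -TOL or x + r > 1 + TOL or y - r < -TOL or y + r > 1 + TOL:
            return invalid

    # Non-overlap: centers must be at least r_i + r_j apart.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centers[i], centers[j]) + TOL < radii[i] + radii[j]:
                return invalid

    total = sum(radii)
    return {"combined_score": total / ALPHAEVOLVE_BEST, "sum_radii": total}


# A tiny four-circle case (n=4) keeps the example easy to check by hand.
quad_centers = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
valid = score_packing(quad_centers, [0.25] * 4, n=4)
overlapping = score_packing([(0.4, 0.5), (0.6, 0.5)], [0.2, 0.2], n=2)
```

Because the score is derived only from the returned centers and radii, a candidate that prints an inflated number gains nothing from doing so.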
Scaffold Card
A scaffold is an evolutionary-search method. In Galapagos a method is not a monolithic loop — it is a composition of the six components driven by a controller. The Scaffold Card declares that composition: the controller class and which implementation fills each of the six slots. Two methods differ only in their slot fillings, never in the architecture.
| Field | Type | Meaning |
|---|---|---|
| name | str | (core) unique scaffold id (the slug). |
| display_name | str | human label. |
| organization | str | HF-Hub-style group; repo_id = <organization>/<display_name>. |
| type | str | test_time_search. |
| description | str | full prose. |
| summary | str | one-liner. |
| source | str | the paper or repo the method comes from. |
| tags | list | free-form tags. |
| license | str | SPDX id or label. |
| controller | str | dotted path to the GalapagosScaffold subclass that orchestrates the loop; omit for a card-only spec method. |
| components | obj | the six slots: population, selection_policy, prompt_builder, proposer, evaluator, memory. Each is inline code, a module.Class path, or a .py path; an omitted component is not used. |
| model | obj | {default, host, roles}: the default model and the roles it plays. |
| requirements | obj | {gpu, docker, python}. |
The bundled adaevolve card (src/galapagos/scaffolds/adaevolve/card.yaml):
```yaml
name: adaevolve
display_name: AdaEvolve
organization: "SkyDiscover"
type: test_time_search
summary: "Hierarchical adaptive search: G-signal exploration intensity, UCB island allocation, and LLM meta-guidance on stagnation."
description: |
  AdaEvolve reframes LLM-driven program evolution as hierarchical adaptive optimization driven by
  one signal — the accumulated fitness-improvement signal G (an Adam-style second moment of
  normalized improvements). Level 1 maps each island's G to an exploration intensity that splits
  parent sampling into explore/exploit/balanced modes over per-island quality-diversity archives.
  Level 2 allocates iterations across islands with a decayed-reward UCB bandit (globally
  normalized rewards fix poor-island bias), ring migration, and dynamic island spawning from
  heterogeneous presets when global productivity collapses. Level 3 detects stagnation via a
  windowed improvement rate and asks a guide LLM for breakthrough "paradigm" ideas that are
  injected into prompts and applied to the global best until exhausted.
source: "AdaEvolve: Adaptive LLM-Driven Zeroth-Order Optimization (UC Berkeley); reference implementation in SkyDiscover"
tags: [adaptive, ucb, islands, quality-diversity, meta-guidance, diff-evolution]
license: Apache-2.0
controller: galapagos.scaffolds.adaevolve.scaffold.AdaEvolveScaffold
components:
  population: {kind: qd_island_archipelago}       # each slot: a kind, module.Class, or .py path
  selection_policy: {kind: adaptive_intensity_ucb}
  prompt_builder: {kind: adaevolve_template}
  proposer: {kind: diff}
  evaluator: {kind: task}                         # supplied by the task
  memory: {kind: paradigm_tactics}                # omit this line ⇒ Memory unused
model:
  default: "openai/gpt-5.5"
  host: openrouter
```
```python
import galapagos as gx

config = gx.GalapagosConfig.from_config(scaffold_name="adaevolve")
model = gx.GalapagosModel.from_card(name="openai/gpt-5.5", host="openrouter")
scaffold = gx.GalapagosScaffold.from_card(name="adaevolve", config=config, model=model)
```
The components mapping is present even for a card-only spec method — it documents which
implementation fills each slot before any code exists.
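To see at a glance which slots a parsed Scaffold Card fills, something like the following could be used. It is a sketch over a plain dict: the helper name `slot_report` and the `unknown` bucket are hypothetical, while the six slot names are the ones listed in the table above.

```python
SLOTS = (
    "population",
    "selection_policy",
    "prompt_builder",
    "proposer",
    "evaluator",
    "memory",
)


def slot_report(card: dict) -> dict:
    """Split the six slots into those the card fills and those it omits."""
    components = card.get("components") or {}
    filled = {slot: components[slot] for slot in SLOTS if slot in components}
    unused = [slot for slot in SLOTS if slot not in components]
    # Keys that are not one of the six slots are surfaced, not silently dropped.
    unknown = sorted(set(components) - set(SLOTS))
    return {"filled": filled, "unused": unused, "unknown": unknown}


card = {
    "name": "memory_free_example",  # hypothetical card
    "components": {
        "population": {"kind": "qd_island_archipelago"},
        "selection_policy": {"kind": "adaptive_intensity_ucb"},
        "prompt_builder": {"kind": "adaevolve_template"},
        "proposer": {"kind": "diff"},
        "evaluator": {"kind": "task"},
    },
}
report = slot_report(card)
```

For this card, `report["unused"]` contains only `memory`, which matches the rule that an omitted component is simply not used.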
Build your own — no card file needed
The same six slots can be passed directly to from_card, each as a component instance, a
"module.Class" path, or a .py file:
```python
import galapagos as gx

model = gx.GalapagosModel.from_card(name="openai/gpt-5.5", host="openrouter")

scaffold = gx.GalapagosScaffold.from_card(
    population="galapagos.components.IslandPopulation",
    selection_policy="galapagos.components.UCBBanditPolicy",
    prompt_builder="galapagos.components.DefaultPromptBuilder",
    proposer="./my_proposer.py",  # a .py file with one Proposer subclass
    memory="galapagos.components.ScratchpadMemory",
    model=model,
)
```
An omitted slot is not used (omitting memory= ⇒ a Memory-free loop). The
catalog ships eight bundled scaffolds (adaevolve, beam_search, best_of_n,
best_of_n_attempts, evox, meta_harness, openevolve, and topk), all runnable.
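For readers curious how a "module.Class" string can become a class object, here is a generic sketch using the standard library's importlib. It shows the common technique only; the helper name `resolve_dotted` is hypothetical, and how galapagos resolves slot paths internally is not documented here.

```python
import importlib


def resolve_dotted(path: str):
    """Import `package.module.Class` and return the named attribute."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.Class', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Demonstrated on a standard-library class so the snippet runs anywhere.
ordered_dict_cls = resolve_dotted("collections.OrderedDict")
instance = ordered_dict_cls(a=1)
```

A slot given as a string can then be resolved lazily, so a card can name a component without importing it until the scaffold is actually built.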
Model Card
A Model Card pins a model so a run is reproducible from disk: a display name, the real model path, and the host that serves it.
| Field | Type | Meaning |
|---|---|---|
| name | str | (core) the model's display name / id. |
| model_path | str | the real provider model name (e.g. openai/gpt-5.5). |
| host | str | where it is served; see the host list below. Default: openrouter. |
| temperature | float | sampling temperature. |
| max_tokens | int | generation cap. |
The host selects an OpenAI-compatible endpoint. The protocol's host vocabulary is:
huggingface · openrouter · vllm · togetherai (Together AI) · litellm · openai ·
anthropic · azure · bedrock · google.
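The host vocabulary and the openrouter default can be mirrored in a small stand-in class. This is a sketch, not the real gx.GalapagosModel: the name `ModelCardSketch` is hypothetical, and rejecting unknown hosts outright is an assumption (the real schema allows extras and may be more lenient).

```python
from dataclasses import dataclass
from typing import Optional

# The host vocabulary listed above.
HOSTS = frozenset({
    "huggingface", "openrouter", "vllm", "togetherai", "litellm",
    "openai", "anthropic", "azure", "bedrock", "google",
})


@dataclass
class ModelCardSketch:
    """Hypothetical stand-in holding the Model Card fields from the table."""

    name: str                            # core field
    model_path: Optional[str] = None
    host: str = "openrouter"             # documented default
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumption of this sketch: a host outside the vocabulary is an error.
        if self.host not in HOSTS:
            raise ValueError(f"unknown host {self.host!r}; expected one of {sorted(HOSTS)}")


default_card = ModelCardSketch(name="openai/gpt-5.5")
local_card = ModelCardSketch(name="my-local-model", host="vllm", temperature=0.7)
```

Validating the host at construction time surfaces a typo such as `openroutr` before any request is made.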
```python
import galapagos as gx

model = gx.GalapagosModel.from_card(name="openai/gpt-5.5", host="openrouter")
model = gx.GalapagosModel.from_card(path="my_model_card.yaml")  # or pin it from a card file
```
See Models for the host-to-base_url resolution table, the three mandated load
forms (HF / hosting platform / local vLLM), and which hosts are wired into the shipped loader.
Verification Card
A discovery is more than a number — it is a claim with provenance. When a search finds a strong solution, you submit it as a Verification Card: the task, the scaffold (or agent) that produced it, the best solution, and the full discovery trajectory, for review by a domain expert. This is what turns a result into a portable, reviewable artifact and feeds the live leaderboard.
| Field | Type | Meaning |
|---|---|---|
| task | str | (core) the task the discovery was made on. |
| scaffold | str | the scaffold that produced it (or…). |
| agent | str | …the agent that produced it. |
| submitter | str | who is submitting. |
| claimed_score | float | the claimed headline score. |
| best_solution | str | the discovered solution, inline or as a path. |
| trajectory | str | path / URI to the full discovery trajectory. |
| status | str | one of unverified, under_review, verified, rejected. |
| notes | str | reviewer / submitter notes. |
```yaml
task: circle_packing
scaffold: adaevolve
submitter: passing2961
claimed_score: 0.9997
best_solution: solutions/circle_packing_2.6342.py
trajectory: runs/2026-06-08_adaevolve_circle_packing/
status: unverified
notes: "26 circles, sum_radii = 2.6342 (99.97% of the AlphaEvolve best)."
```
A verification card is not submitted through the CLI. Instead, send it to a Hub instance as a JSON
object via POST /api/verifications (see the full flow in
Submit to the Hub):
```shell
curl -X POST https://open-galapagos.com/api/verifications \
  -H "authorization: Bearer $TOKEN" -H "content-type: application/json" \
  -d "$(python -c 'import json,yaml; print(json.dumps(yaml.safe_load(open("circle_packing_discovery.yaml"))))')"
```
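The same request can be assembled from Python with only the standard library. This sketch builds the request object without sending it; the helper name `build_verification_request` and the `YOUR_TOKEN` placeholder are hypothetical, and the card is written as an inline dict rather than loaded from YAML so that no third-party package is needed.

```python
import json
import urllib.request


def build_verification_request(
    card: dict,
    token: str,
    hub_url: str = "https://open-galapagos.com",
) -> urllib.request.Request:
    """Build (but do not send) the POST /api/verifications request."""
    body = json.dumps(card).encode("utf-8")
    return urllib.request.Request(
        url=f"{hub_url}/api/verifications",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


card = {
    "task": "circle_packing",
    "scaffold": "adaevolve",
    "submitter": "passing2961",
    "claimed_score": 0.9997,
    "best_solution": "solutions/circle_packing_2.6342.py",
    "trajectory": "runs/2026-06-08_adaevolve_circle_packing/",
    "status": "unverified",
}
request = build_verification_request(card, token="YOUR_TOKEN")
# urllib.request.urlopen(request) would send it; omitted so the sketch has no side effects.
```

Keeping construction separate from sending makes it easy to inspect the payload before it leaves the machine.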
The claimed score is re-verified, not trusted
On submission, the task's deterministic Evaluator is re-run against the submitted
best_solution (the same anti-reward-hacking recompute the task uses during search), so a
submission cannot inflate its own number. A domain expert then reviews the trajectory before the
discovery is marked verified.
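The re-verification step amounts to comparing the claimed score with the evaluator's recomputed one. A minimal sketch follows, assuming a small tolerance for rounding; the function name `claim_is_supported` and the tolerance value are hypothetical, and the Hub's actual acceptance rule is not specified here.

```python
def claim_is_supported(
    claimed: float,
    recomputed: float,
    direction: str = "maximize",
    tol: float = 1e-4,
) -> bool:
    """True when the claim is no better than the recomputed score, within tol."""
    if direction == "maximize":
        return claimed <= recomputed + tol
    if direction == "minimize":
        return claimed >= recomputed - tol
    raise ValueError(f"unknown direction: {direction!r}")


# Numbers from the example card: sum_radii = 2.6342 against the 2.635 reference.
recomputed = 2.6342 / 2.635
honest = claim_is_supported(0.9997, recomputed)
inflated = claim_is_supported(1.05, recomputed)
```

The rounded claim of 0.9997 passes because it sits within the tolerance of the recomputed value, while a claim of 1.05 does not.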
Loading and submitting
Every card type follows the same two-verb protocol:
```python
import galapagos as gx

task = gx.GalapagosTask.from_card(name="circle_packing")
scaffold = gx.GalapagosScaffold.from_card(name="openevolve")
model = gx.GalapagosModel.from_card(name="openai/gpt-5.5", host="openrouter")
```
The loader functions gx.load_task, gx.load_scaffold, gx.load_model, and gx.load_config are also available. Pass
path=.../card.yaml instead of name=... to load a local card file.
```shell
galapagos submit --card my_task/card.yaml --kind task   # validate a task card
galapagos submit --card my_scaffold/card.yaml           # kind auto-detected (scaffold)
```
A submitted card is validated against the same galapagos.cards.schema that validates loaded cards, then
published to the Hub via POST /api/scaffolds or POST /api/tasks, so the library
and the Hub never disagree about a card's shape. A discovery (verification card) is sent to
POST /api/verifications; see Submit to the Hub.
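How the CLI auto-detects a card's kind is not described here. Purely as an illustration, the heuristic below guesses the kind from the fields that distinguish each card type in the tables above. The function name `guess_kind` and the rule order are assumptions and should not be read as the CLI's actual logic.

```python
# Scaffold slots that do not also appear among the task components
# (`evaluator` is shared by both, so it cannot discriminate).
SCAFFOLD_ONLY_SLOTS = {"population", "selection_policy", "prompt_builder", "proposer", "memory"}


def guess_kind(card: dict) -> str:
    """Guess a card's kind from its distinguishing fields (illustrative only)."""
    components = card.get("components") or {}
    if "task" in card:
        return "verification"   # only verification cards name a task
    if "controller" in card or SCAFFOLD_ONLY_SLOTS & set(components):
        return "scaffold"
    if "initial_program" in components or "metrics" in card or "metric" in card:
        return "task"
    if "model_path" in card or "host" in card:
        return "model"
    raise ValueError("cannot infer the card kind; pass it explicitly")


examples = {
    "verification": {"task": "circle_packing", "scaffold": "adaevolve"},
    "scaffold": {"name": "adaevolve", "components": {"proposer": {"kind": "diff"}}},
    "task": {"name": "circle_packing", "components": {"evaluator": "evaluator.py",
                                                      "initial_program": "initial_program.py"}},
    "model": {"name": "openai/gpt-5.5", "host": "openrouter"},
}
guessed = {expected: guess_kind(card) for expected, card in examples.items()}
```

When a card is ambiguous, for example a metadata-only entry with few fields, passing `--kind` explicitly avoids relying on any inference.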
The library ⊆ Hub invariant
The card is the same artifact locally and on the Hub. The cards bundled in the galapagos wheel
are a subset of the Hub catalog — never a fork. One schema validates both directions, so a card
that loads in the library publishes to the Hub, and vice versa.
See also
- Core components — the six slots a Scaffold Card declares.
- Genome — the unit a Task Card's evaluator scores.
- Models — the hosts a Model Card's host selects.
- The Hub — where cards are published and the leaderboard lives.