Cards: the communication protocol

The Card is the central concept of Galapagos. A card is a small YAML document that describes one shareable artifact — a task, a scaffold, a model, or a discovery — and it is the only thing you ever exchange with the platform. Loading a task, submitting a scaffold, pinning a model, or publishing a result are all the same gesture: read a card, or write a card.

If you have used the Hugging Face Hub, the analogy is exact: there, model cards and dataset cards are the unit of sharing; in Galapagos the card is the unit of sharing for evolutionary-search scaffolds, discovery tasks, models, and reviewed discoveries. The same YAML loads a card locally (gx.*.from_card) and uploads it to the Hub (galapagos submit) — one schema, one protocol, both directions.

There are four card types:

Card	Describes	Loaded with	Lives at
Task Card	an evaluation task (problem + metrics + evaluator)	`gx.GalapagosTask.from_card`	`tasks/<name>/card.yaml`
Scaffold Card	a discovery method (controller + six components)	`gx.GalapagosScaffold.from_card`	`scaffolds/<name>/card.yaml`
Model Card	a model (path + host)	`gx.GalapagosModel.from_card`	user-supplied
Verification Card	a submitted discovery for expert review	Hub `POST /api/verifications`	Hub submission

All four are defined in galapagos.cards.schema with pydantic v2 and extra="allow": every card has a small required core of fields, and you may attach any number of method- or task-specific extras without a core change. The tables below mark the core fields; everything else is optional protocol metadata that the platform reads when present.
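The "small required core, open extras" rule can be sketched with plain dictionaries. This is an illustration only: it uses the standard library rather than pydantic so that it runs anywhere, the `CORE_FIELDS` mapping simply mirrors the (core) markers in the tables on this page, and the helper name `split_card` is hypothetical, not part of `galapagos.cards.schema`.

```python
# Illustrative only: mimics "required core + extra='allow'" with plain dicts.
CORE_FIELDS = {
    "task": ("name",),
    "scaffold": ("name",),
    "model": ("name",),
    "verification": ("task",),
}


def split_card(kind: str, card: dict) -> tuple[dict, dict]:
    """Return (core, extras); raise ValueError if a core field is missing."""
    required = CORE_FIELDS[kind]
    missing = [field for field in required if field not in card]
    if missing:
        raise ValueError(f"{kind} card is missing core field(s): {missing}")
    core = {field: card[field] for field in required}
    extras = {key: value for key, value in card.items() if key not in required}
    return core, extras


# `my_lab_tag` is a made-up extra field; it is kept rather than rejected.
core, extras = split_card("task", {"name": "circle_packing", "my_lab_tag": "v2"})
print(core)    # {'name': 'circle_packing'}
print(extras)  # {'my_lab_tag': 'v2'}
```

The point of the design is that an unknown key never causes a validation failure; only a missing core field does.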

One protocol, two directions

Load a card to use an artifact: gx.GalapagosTask.from_card(name="circle_packing"). Submit a card to share one: galapagos submit --card my_task/card.yaml --kind task. The card you load and the card you submit are the same file, validated by the same schema — so anything that loads in the galapagos library is publishable to the Hub, and anything on the Hub loads in the library.

Task Card

A task is a discovery problem. The Task Card states what to optimize (the problem and its metrics), how to score it (the deterministically-verifiable evaluator), and under what constraints (software/hardware, evaluation mode). It is the single source of truth for a task; the Evaluator it points at is supplied to every scaffold that runs the task, so any scaffold can be pointed at any task.

Field	Type	Meaning
`name`	str	(core) unique task id.
`display_name`	str	human label.
`domain`	str	top-level domain (`math`, `gpu_kernel`, `systems`, `ml`, `nlp`, `bio`, …).
`macro` / `family`	str	the sub-domain / sub-grouping (e.g. `packing`).
`description`	str	the full problem statement, injected into prompts as `task.context`.
`summary`	str	a one-line description.
`metrics`	list[dict]	the evaluation metrics — a list, since a task may score on several (see below).
`metric`	dict	the legacy single-metric form — `{key, direction, type}` (kept for back-compat).
`components`	obj	the task components: `{initial_program, evaluator, config, requirement}`.
`constraint`	obj	the task constraint — software/hardware spec the run must satisfy.
`evaluation`	obj	`{format, mode}`; `mode` is `local` (default) or `docker`.
`language`	str	the seed/solution language (`python`, `cuda`, …).
`modality`	str	the I/O modality (`text`, `image`, …).
`library`	str/list	libraries the task depends on.
`references`	obj	best-known score + source.
`metadata`	obj	free-form.

The evaluation `metrics` is a list of dicts

A task is rarely single-objective: a kernel must be both correct and fast; a packing must be both valid and dense. So metrics is a list, one dict per objective (the legacy metric field holds a single {key, direction, type} dict):

Sub-field	Meaning
`metric_name`	the metric's key in the Evaluator's output dict (e.g. `combined_score`, `latency_ms`).
`metric_direction`	`maximize` or `minimize`.
`metric_description`	what the metric means, in prose.
`metric_computation`	how it is computed — the deterministic rule the evaluator implements.

The first metric is, by convention, the headline combined_score the search drives.
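A small sketch of how a consumer might read these fields, assuming the card has already been parsed into a dict. The helper names `normalize_metrics` and `is_improvement` are hypothetical, and the mapping from the legacy `{key, direction, type}` form to the list form is an assumption made for illustration, not the library's documented conversion.

```python
def normalize_metrics(card: dict) -> list[dict]:
    """Return the metrics list, converting the legacy single `metric` dict if needed."""
    if card.get("metrics"):
        return list(card["metrics"])
    legacy = card.get("metric")
    if legacy:
        # Assumed mapping from the legacy {key, direction, type} form.
        return [{"metric_name": legacy["key"],
                 "metric_direction": legacy["direction"]}]
    return []


def is_improvement(metric: dict, new: float, best: float) -> bool:
    """Compare two scores according to the metric's direction."""
    if metric["metric_direction"] == "maximize":
        return new > best
    return new < best


card = {"metrics": [
    {"metric_name": "combined_score", "metric_direction": "maximize"},
    {"metric_name": "latency_ms", "metric_direction": "minimize"},
]}
metrics = normalize_metrics(card)
headline = metrics[0]                          # by convention, the metric the search drives
print(headline["metric_name"])                 # combined_score
print(is_improvement(headline, 0.91, 0.88))    # True
print(is_improvement(metrics[1], 12.0, 9.5))   # False: higher latency is worse
```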

Task components

components bundles the four entries that make up a runnable task:

Component	What it is
`initial_program`	the seed the search starts from: a path (`initial_program.py`) or inline code, with `# EVOLVE-BLOCK` markers around the region the search may rewrite.
`evaluator`	the deterministically-verifiable scorer — `evaluate(program_path) -> dict` returning the metrics. This is the `Evaluator` component.
`config`	an optional `config.yaml` of task-specific knobs.
`requirement`	a `requirements.txt` path, or an inline list of pip requirements.

A card that omits initial_program/evaluator is a metadata-only entry: it loads for the catalog and the Hub, but cannot run.
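The runnable versus metadata-only distinction amounts to a check on two keys. A minimal sketch, assuming the card is already a parsed dict; `is_runnable` is a hypothetical helper, not a function in the galapagos library, and `some_future_task` is a made-up entry.

```python
def is_runnable(card: dict) -> bool:
    """A card can run only if it names both a seed program and an evaluator."""
    components = card.get("components") or {}
    return bool(components.get("initial_program")) and bool(components.get("evaluator"))


full = {
    "name": "circle_packing",
    "components": {"initial_program": "initial_program.py", "evaluator": "evaluator.py"},
}
catalog_only = {"name": "some_future_task", "summary": "listed, not yet runnable"}

print(is_runnable(full))          # True
print(is_runnable(catalog_only))  # False: loads for the catalog, cannot run
```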

name: circle_packing
display_name: Circle Packing (n=26)
domain: math
family: packing                            # macro / sub-domain
summary: "Pack 26 circles in the unit square; maximize the sum of radii."
description: |
  Find centers and radii for 26 non-overlapping circles inside the unit square [0,1]^2 that
  maximize the sum of radii. Only the code inside the EVOLVE-BLOCK is modified by the search.
  Validity and the score are recomputed independently from the returned geometry (anti reward-hacking).

metrics:                                   # a list — a task may report several metrics
  - metric_name: combined_score
    metric_direction: maximize
    metric_description: "Fraction of the AlphaEvolve best (sum_radii / 2.635)."
    metric_computation: "Re-validate the geometry, then sum_radii / 2.635; 0.0 if invalid."
  - metric_name: sum_radii
    metric_direction: maximize
    metric_description: "Total radius of the 26 packed circles."
    metric_computation: "Sum of the radii recomputed from the returned centers/radii."

components:
  initial_program: initial_program.py     # path (or inline code)
  evaluator: evaluator.py                  # the deterministic verifiable scorer
  config: config.yaml
  requirement: [numpy]                     # inline list, or a requirements.txt path

constraint: {gpu: none, docker: optional, est_runtime_s: 3}
evaluation: {format: python, mode: local}      # mode: local | docker
language: python
modality: text
library: numpy
references: {best_known: 2.635, source: AlphaEvolve}

import galapagos as gx

task = gx.GalapagosTask.from_card(name="circle_packing")   # registered task → loads from the catalog
task.context                       # the problem statement (from description)
seed = task.initial_genome()       # the seed Genome (generation 0)
task.evaluator                     # the deterministic verifiable scorer

Tasks must be registered to load by name

from_card(name=...) resolves against the registered task catalog. To run a task that is not in the catalog, either submit its card to the Hub, or point from_card at a local card file with path=...:

task = gx.GalapagosTask.from_card(path="./my_task/card.yaml")    # a local task card

Deterministic, verifiable scoring (anti reward-hacking)

The evaluator must recompute the objective from the candidate's raw output — never trust a self-reported score. circle_packing's evaluator discards the program's own sum_radii and re-validates every constraint (count, bounds, non-overlap), returning combined_score = 0.0 on any violation. The catalog ships 64 runnable bundled tasks; circle_packing, function_minimization, and playground_sphere are the canonical quickstart examples, and the platform's roadmap scopes 300+ tasks.
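The recompute rule described above can be sketched as follows. This is not the bundled circle_packing evaluator: the real entry point is `evaluate(program_path) -> dict`, whereas the hypothetical `score_packing` below takes the returned geometry directly, and the tolerance `tol` and the `valid` key are assumptions added for illustration.

```python
import math

BEST_KNOWN = 2.635  # reference value quoted in the card's `references`


def score_packing(centers, radii, n=26, tol=1e-9):
    """Recompute validity and score from raw geometry; ignore any self-reported score."""
    invalid = {"combined_score": 0.0, "sum_radii": 0.0, "valid": False}
    if len(centers) != n or len(radii) != n:
        return invalid                                   # wrong count
    for (x, y), r in zip(centers, radii):
        if r <= 0 or x - r < -tol or x + r > 1 + tol or y - r < -tol or y + r > 1 + tol:
            return invalid                               # outside the unit square
    for i in range(n):
        for j in range(i + 1, n):
            gap = math.dist(centers[i], centers[j]) - (radii[i] + radii[j])
            if gap < -tol:
                return invalid                           # circles overlap
    total = sum(radii)
    return {"combined_score": total / BEST_KNOWN, "sum_radii": total, "valid": True}


# A loose 6x5 grid (first 26 cells) of equal circles: valid but far from optimal.
grid = [(0.1 + 0.16 * (k % 6), 0.1 + 0.16 * (k // 6)) for k in range(26)]
result = score_packing(grid, [0.07] * 26)
print(result["valid"], round(result["sum_radii"], 2))      # True 1.82
print(score_packing(grid, [0.09] * 26)["combined_score"])  # 0.0 (neighbours overlap)
```

Because the score is derived only from `centers` and `radii`, a candidate program has no field it can set to raise its own number.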

Scaffold Card

A scaffold is an evolutionary-search method. In Galapagos a method is not a monolithic loop — it is a composition of the six components driven by a controller. The Scaffold Card declares that composition: the controller class and which implementation fills each of the six slots. Two methods differ only in their slot fillings, never in the architecture.

Field	Type	Meaning
`name`	str	(core) unique scaffold id (the slug).
`display_name`	str	human label.
`organization`	str	HF-Hub-style group; `repo_id = <organization>/<display_name>`.
`type`	str	`test_time_search`.
`description`	str	full prose.
`summary`	str	one-liner.
`source`	str	the paper or repo the method comes from.
`tags`	list	free-form tags.
`license`	str	SPDX id or label.
`controller`	str	dotted path to the `GalapagosScaffold` subclass that orchestrates the loop; omit for a card-only `spec` method.
`components`	obj	the six slots: `population`, `selection_policy`, `prompt_builder`, `proposer`, `evaluator`, `memory`. Each is a `{kind: ...}` reference, inline code, a `module.Class` path, or a `.py` path; an omitted component is not used.
`model`	obj	`{default, host, roles}` — the default model + the roles it plays.
`requirements`	obj	`{gpu, docker, python}`.

The bundled adaevolve card (src/galapagos/scaffolds/adaevolve/card.yaml):

name: adaevolve
display_name: AdaEvolve
organization: "SkyDiscover"
type: test_time_search
summary: "Hierarchical adaptive search: G-signal exploration intensity, UCB island allocation, and LLM meta-guidance on stagnation."
description: |
  AdaEvolve reframes LLM-driven program evolution as hierarchical adaptive optimization driven by
  one signal — the accumulated fitness-improvement signal G (an Adam-style second moment of
  normalized improvements). Level 1 maps each island's G to an exploration intensity that splits
  parent sampling into explore/exploit/balanced modes over per-island quality-diversity archives.
  Level 2 allocates iterations across islands with a decayed-reward UCB bandit (globally
  normalized rewards fix poor-island bias), ring migration, and dynamic island spawning from
  heterogeneous presets when global productivity collapses. Level 3 detects stagnation via a
  windowed improvement rate and asks a guide LLM for breakthrough "paradigm" ideas that are
  injected into prompts and applied to the global best until exhausted.
source: "AdaEvolve: Adaptive LLM-Driven Zeroth-Order Optimization (UC Berkeley); reference implementation in SkyDiscover"
tags: [adaptive, ucb, islands, quality-diversity, meta-guidance, diff-evolution]
license: Apache-2.0
controller: galapagos.scaffolds.adaevolve.scaffold.AdaEvolveScaffold
components:
  population: {kind: qd_island_archipelago}   # each slot: a kind, module.Class, or .py path
  selection_policy: {kind: adaptive_intensity_ucb}
  prompt_builder: {kind: adaevolve_template}
  proposer: {kind: diff}
  evaluator: {kind: task}                     # supplied by the task
  memory: {kind: paradigm_tactics}            # omit this line ⇒ Memory unused
model:
  default: "openai/gpt-5.5"
  host: openrouter

import galapagos as gx

config   = gx.GalapagosConfig.from_config(scaffold_name="adaevolve")
model    = gx.GalapagosModel.from_card(name="openai/gpt-5.5", host="openrouter")
scaffold = gx.GalapagosScaffold.from_card(name="adaevolve", config=config, model=model)

The components mapping is present even for a card-only spec method — it documents which implementation fills each slot before any code exists.
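To see which slots a card fills and which it leaves unused, a reader could walk the mapping like this. A sketch under stated assumptions: the card is already parsed into a dict, `slot_report` is a hypothetical helper, and `memory_free_example` is a made-up scaffold that reuses the adaevolve slot fillings minus `memory`.

```python
SLOTS = ("population", "selection_policy", "prompt_builder",
         "proposer", "evaluator", "memory")


def slot_report(card: dict) -> dict:
    """Map each of the six slots to its declared filling, or None when omitted."""
    declared = card.get("components") or {}
    return {slot: declared.get(slot) for slot in SLOTS}


card = {
    "name": "memory_free_example",          # hypothetical scaffold, for illustration
    "components": {
        "population": {"kind": "qd_island_archipelago"},
        "selection_policy": {"kind": "adaptive_intensity_ucb"},
        "prompt_builder": {"kind": "adaevolve_template"},
        "proposer": {"kind": "diff"},
        "evaluator": {"kind": "task"},
        # no `memory` entry: the loop runs without a Memory component
    },
}
report = slot_report(card)
unused = [slot for slot, filling in report.items() if filling is None]
print(unused)  # ['memory']
```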

Build your own — no card file needed

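The same slots can be passed directly to from_card as keyword arguments, each as a component instance, a "module.Class" path, or a .py file. The example below fills five of them and leaves the evaluator out, since the task supplies it: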

scaffold = gx.GalapagosScaffold.from_card(
    population="galapagos.components.IslandPopulation",
    selection_policy="galapagos.components.UCBBanditPolicy",
    prompt_builder="galapagos.components.DefaultPromptBuilder",
    proposer="./my_proposer.py",                 # a .py file with one Proposer subclass
    memory="galapagos.components.ScratchpadMemory",
    model=model,
)

Omitting a slot leaves that component unused (e.g. no memory= ⇒ a Memory-free loop). The catalog ships eight bundled scaffolds — adaevolve, beam_search, best_of_n, best_of_n_attempts, evox, meta_harness, openevolve, and topk — all runnable.
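How a "module.Class" string could be turned into a class is worth spelling out. The sketch below uses only `importlib` from the standard library; `resolve_dotted` is a hypothetical name, and this is one plausible way to resolve such a path, not a description of the loader Galapagos ships.

```python
import importlib


def resolve_dotted(path: str):
    """Import `pkg.module.Class` and return the class object."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.Class', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Demonstrated on a standard-library class so the snippet runs without Galapagos installed.
cls = resolve_dotted("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```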

Model Card

A Model Card pins a model so a run is reproducible from disk: a display name, the real model path, and the host that serves it.

Field	Type	Meaning
`name`	str	(core) the model's display name / id.
`model_path`	str	the real provider model name (e.g. `openai/gpt-5.5`).
`host`	str	where it is served — see the host list below. Default `openrouter`.
`temperature`	float	sampling temperature.
`max_tokens`	int	generation cap.

The host selects an OpenAI-compatible endpoint. The protocol's host vocabulary is:

huggingface · openrouter · vllm · togetherai (Together AI) · litellm · openai · anthropic · azure · bedrock · google.

name: gpt-5.5
model_path: openai/gpt-5.5
host: openrouter
temperature: 0.7
max_tokens: 16384

model = gx.GalapagosModel.from_card(name="openai/gpt-5.5", host="openrouter")
model = gx.GalapagosModel.from_card(path="my_model_card.yaml")     # or pin it from a card file

See Models for the host-to-base_url resolution table, the three mandated load forms (HF / hosting platform / local vLLM), and which hosts are wired into the shipped loader.
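As a rough illustration of the Model Card fields, the sketch below fills in the documented default host and checks the host against the vocabulary listed above. It assumes the card is a parsed dict; `normalize_model_card` is a hypothetical helper, and whether the real schema rejects unknown hosts is an assumption, not something this page states.

```python
HOSTS = {"huggingface", "openrouter", "vllm", "togetherai", "litellm",
         "openai", "anthropic", "azure", "bedrock", "google"}


def normalize_model_card(card: dict) -> dict:
    """Fill the documented default host and reject hosts outside the vocabulary."""
    if "name" not in card:
        raise ValueError("model card needs the core field `name`")
    out = dict(card)
    out.setdefault("host", "openrouter")       # documented default
    if out["host"] not in HOSTS:
        # Assumption: strict checking; the shipped loader may be more lenient.
        raise ValueError(f"unknown host: {out['host']!r}")
    return out


card = normalize_model_card({"name": "gpt-5.5", "model_path": "openai/gpt-5.5"})
print(card["host"])  # openrouter
```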

Verification Card

A discovery is more than a number — it is a claim with provenance. When a search finds a strong solution, you submit it as a Verification Card: the task, the scaffold (or agent) that produced it, the best solution, and the full discovery trajectory, for review by a domain expert. This is what turns a result into a portable, reviewable artifact and feeds the live leaderboard.

Field	Type	Meaning
`task`	str	(core) the task the discovery was made on.
`scaffold`	str	the scaffold that produced it (or…).
`agent`	str	…the agent that produced it.
`submitter`	str	who is submitting.
`claimed_score`	float	the claimed headline score.
`best_solution`	str	the discovered solution — inline, or a path.
`trajectory`	str	path / URI to the full discovery trajectory.
`status`	str	`unverified`, `under_review`, `verified`, or `rejected`.
`notes`	str	reviewer / submitter notes.

task: circle_packing
scaffold: adaevolve
submitter: passing2961
claimed_score: 0.9997
best_solution: solutions/circle_packing_2.6342.py
trajectory: runs/2026-06-08_adaevolve_circle_packing/
status: unverified
notes: "26 circles, sum_radii = 2.6342 (99.97% of the AlphaEvolve best)."

A verification card is not submitted through the CLI. It is sent to a Hub instance as a JSON object via POST /api/verifications (see the full flow in Submit to the Hub):

curl -X POST https://open-galapagos.com/api/verifications \
     -H "authorization: Bearer $TOKEN" -H "content-type: application/json" \
     -d "$(python -c 'import json,yaml; print(json.dumps(yaml.safe_load(open("circle_packing_discovery.yaml"))))')"
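The same request can be assembled from Python with the standard library. This sketch builds the request without sending it; the card is written as a dict to avoid a YAML dependency, `build_verification_request` is a hypothetical helper, and `YOUR_TOKEN` is a placeholder.

```python
import json
import urllib.request

card = {
    "task": "circle_packing",
    "scaffold": "adaevolve",
    "submitter": "passing2961",
    "claimed_score": 0.9997,
    "best_solution": "solutions/circle_packing_2.6342.py",
    "trajectory": "runs/2026-06-08_adaevolve_circle_packing/",
    "status": "unverified",
}


def build_verification_request(card: dict, token: str, base_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST /api/verifications request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/verifications",
        data=json.dumps(card).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_verification_request(
    card, token="YOUR_TOKEN", base_url="https://open-galapagos.com"
)
print(request.get_method(), request.full_url)
# To actually submit: urllib.request.urlopen(request)
```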

The claimed score is re-verified, not trusted

On submission, the task's deterministic Evaluator is re-run against the submitted best_solution (the same anti-reward-hacking recompute the task uses during search), so a submission cannot inflate its own number. A domain expert then reviews the trajectory before the discovery is marked verified.
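The re-verification step can be pictured as a comparison between the claimed number and the recomputed one. The sketch assumes a `maximize` metric; the helper `recheck_claim` and the tolerance `tol` are hypothetical, and the Hub's actual acceptance rule is not specified on this page.

```python
BEST_KNOWN = 2.635


def recheck_claim(claimed: float, recomputed: float, tol: float = 1e-4) -> bool:
    """Accept a claim only if it does not exceed the recomputed score (maximize)."""
    return claimed <= recomputed + tol


recomputed = 2.6342 / BEST_KNOWN          # what the evaluator would return for the example card
print(recheck_claim(0.9997, recomputed))  # True: the claim matches the recompute
print(recheck_claim(1.02, recomputed))    # False: the claim is inflated
```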

Loading and submitting

Every card type follows the same two-verb protocol:

Load (use an artifact)

import galapagos as gx

task     = gx.GalapagosTask.from_card(name="circle_packing")
scaffold = gx.GalapagosScaffold.from_card(name="openevolve")
model    = gx.GalapagosModel.from_card(name="openai/gpt-5.5", host="openrouter")

Functional aliases: gx.load_task, gx.load_scaffold, gx.load_model, gx.load_config. Pass path=.../card.yaml instead of name=... to load a local card file.

Submit (share an artifact)

galapagos submit --card my_task/card.yaml --kind task     # validate a task card
galapagos submit --card my_scaffold/card.yaml             # kind auto-detected (scaffold)

The card is validated by the same galapagos.cards.schema that validates loaded cards, then published to the Hub via POST /api/scaffolds or POST /api/tasks, so the library and the Hub never disagree about a card's shape. A discovery (verification card) is sent separately via POST /api/verifications; see Submit to the Hub.
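The routing from card kind to endpoint can be written down as a lookup. The three endpoints below are the ones named on this page; the `guess_kind` heuristic is purely an assumption about how auto-detection might work, and the CLI's real rule may differ.

```python
ENDPOINTS = {
    "task": "/api/tasks",
    "scaffold": "/api/scaffolds",
    "verification": "/api/verifications",
}


def guess_kind(card: dict) -> str:
    """Heuristic stand-in for kind auto-detection; the CLI's real rule may differ."""
    components = card.get("components") or {}
    if "claimed_score" in card or "trajectory" in card:
        return "verification"
    if "controller" in card or "selection_policy" in components:
        return "scaffold"
    if "model_path" in card:
        return "model"
    return "task"


scaffold_card = {"name": "adaevolve",
                 "controller": "galapagos.scaffolds.adaevolve.scaffold.AdaEvolveScaffold"}
kind = guess_kind(scaffold_card)
print(kind, ENDPOINTS.get(kind))  # scaffold /api/scaffolds
```

No endpoint is listed for model cards because this page does not name one.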

The library ⊆ Hub invariant

The card is the same artifact locally and on the Hub. The cards bundled in the galapagos wheel are a subset of the Hub catalog — never a fork. One schema validates both directions, so a card that loads in the library publishes to the Hub, and vice versa.