
Cards — the communication protocol

The Card is the central concept of Galapagos. A card is a small YAML document that describes one shareable artifact — a task, a scaffold, a model, or a discovery — and it is the only thing you ever exchange with the platform. Loading a task, submitting a scaffold, pinning a model, or publishing a result are all the same gesture: read a card, or write a card.

If you have used the Hugging Face Hub, the analogy is exact: there, model cards and dataset cards are the unit of sharing; in Galapagos the card is the unit of sharing for evolutionary-search scaffolds, discovery tasks, models, and reviewed discoveries. The same YAML loads a card locally (gx.*.from_card) and uploads it to the Hub (galapagos submit) — one schema, one protocol, both directions.

There are four card types:

| Card | Describes | Loaded with | Lives at |
| --- | --- | --- | --- |
| Task Card | an evaluation task (problem + metrics + evaluator) | gx.GalapagosTask.from_card | tasks/<name>/card.yaml |
| Scaffold Card | a discovery method (controller + six components) | gx.GalapagosScaffold.from_card | scaffolds/<name>/card.yaml |
| Model Card | a model (path + host) | gx.GalapagosModel.from_card | user-supplied |
| Verification Card | a submitted discovery for expert review | Hub POST /api/verifications | Hub submission |

All four are defined in galapagos.cards.schema with pydantic v2 and extra="allow": every card has a small required core of fields, and you may attach any number of method- or task-specific extras without a core change. The tables below mark the core fields; everything else is optional protocol metadata that the platform reads when present.
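The "required core plus open extras" idea can be illustrated without the real schema. The sketch below is a plain-Python stand-in, not the classes in galapagos.cards.schema: `CardSketch` and `from_dict` are hypothetical names, and it only mimics what pydantic's extra="allow" is described as providing here, namely that unknown fields are kept rather than rejected.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class CardSketch:
    """Hypothetical stand-in for a card: one required core field plus open extras."""

    name: str                                              # the required core field
    extras: Dict[str, Any] = field(default_factory=dict)   # method- or task-specific fields

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "CardSketch":
        if "name" not in raw:
            raise ValueError("card is missing the required core field 'name'")
        # Everything outside the core is preserved instead of rejected.
        extras = {key: value for key, value in raw.items() if key != "name"}
        return cls(name=str(raw["name"]), extras=extras)


card = CardSketch.from_dict({"name": "circle_packing", "domain": "math", "my_custom_knob": 3})
print(card.name, sorted(card.extras))
```

A card with extra keys loads unchanged, while a card without `name` fails early, which is the behaviour the paragraph above attributes to the schema.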

One protocol, two directions

Load a card to use an artifact: gx.GalapagosTask.from_card(name="circle_packing"). Submit a card to share one: galapagos submit --card my_task/card.yaml --kind task. The card you load and the card you submit are the same file, validated by the same schema — so anything that loads in the galapagos library is publishable to the Hub, and anything on the Hub loads in the library.


Task Card

A task is a discovery problem. The Task Card states what to optimize (the problem and its metrics), how to score it (the deterministically-verifiable evaluator), and under what constraints (software/hardware, evaluation mode). It is the single source of truth for a task; the Evaluator it points at is supplied to every scaffold that runs the task, so any scaffold can be pointed at any task.

| Field | Type | Meaning |
| --- | --- | --- |
| name | str | (core) unique task id. |
| display_name | str | human label. |
| domain | str | top-level domain (math, gpu_kernel, systems, ml, nlp, bio, …). |
| macro / family | str | the sub-domain / sub-grouping (e.g. packing). |
| description | str | the full problem statement, injected into prompts as task.context. |
| summary | str | a one-line description. |
| metrics | list[dict] | the evaluation metrics: a list, since a task may score on several (see below). |
| metric | dict | the legacy single-metric form, {key, direction, type} (kept for back-compat). |
| components | obj | the task components: {initial_program, evaluator, config, requirement}. |
| constraint | obj | the task constraint: the software/hardware spec the run must satisfy. |
| evaluation | obj | {format, mode}; mode is local (default) or docker. |
| language | str | the seed/solution language (python, cuda, …). |
| modality | str | the I/O modality (text, image, …). |
| library | str/list | libraries the task depends on. |
| references | obj | best-known score + source. |
| metadata | obj | free-form. |

The evaluation metrics are a list of dicts

A task is rarely single-objective: a kernel must be both correct and fast; a packing must be both valid and dense. So metrics is a list, one dict per objective (the legacy metric field holds a single {key, direction, type} dict):

| Sub-field | Meaning |
| --- | --- |
| metric_name | the metric's key in the Evaluator's output dict (e.g. combined_score, latency_ms). |
| metric_direction | maximize or minimize. |
| metric_description | what the metric means, in prose. |
| metric_computation | how it is computed: the deterministic rule the evaluator implements. |

The first metric is, by convention, the headline combined_score the search drives.
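To make the relationship between the two metric forms concrete, here is a minimal sketch that reads either form and picks the headline metric. The helper names `normalize_metrics` and `headline_metric` are hypothetical, and the mapping of the legacy `key`/`direction` fields onto `metric_name`/`metric_direction` is an assumption; the library may convert legacy cards differently.

```python
from typing import Dict, List, Optional


def normalize_metrics(card: Dict) -> List[Dict]:
    """Return the metrics as a list, converting a legacy single `metric` dict if needed."""
    if card.get("metrics"):
        return list(card["metrics"])
    legacy = card.get("metric")
    if legacy:
        # Assumed mapping from the legacy {key, direction, type} form.
        return [{"metric_name": legacy["key"], "metric_direction": legacy["direction"]}]
    return []


def headline_metric(card: Dict) -> Optional[Dict]:
    """By convention the first metric is the headline one the search drives."""
    metrics = normalize_metrics(card)
    return metrics[0] if metrics else None


new_style = {"metrics": [
    {"metric_name": "combined_score", "metric_direction": "maximize"},
    {"metric_name": "sum_radii", "metric_direction": "maximize"},
]}
old_style = {"metric": {"key": "latency_ms", "direction": "minimize", "type": "float"}}

print(headline_metric(new_style)["metric_name"])
print(headline_metric(old_style)["metric_name"])
```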

Task components

components bundles the four files that make a task runnable:

| Component | What it is |
| --- | --- |
| initial_program | the seed the search starts from: a path (initial_program.py) or inline code, with # EVOLVE-BLOCK markers around the region the search may rewrite. |
| evaluator | the deterministically verifiable scorer: evaluate(program_path) -> dict, returning the metrics. This is the Evaluator component. |
| config | an optional config.yaml of task-specific knobs. |
| requirement | a requirements.txt path, or an inline list of pip requirements. |

A card that omits initial_program/evaluator is a metadata-only entry: it loads for the catalog and the Hub, but cannot run.
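The runnable versus metadata-only distinction amounts to a check on two keys. The sketch below works on a card already parsed into a dict; `missing_run_components` and `is_runnable` are hypothetical helpers for illustration, not library functions.

```python
from typing import Dict, List

# The two components a card needs before it can run, per the rule above.
REQUIRED_TO_RUN = ("initial_program", "evaluator")


def missing_run_components(card: Dict) -> List[str]:
    """List the required components that the card does not provide."""
    components = card.get("components") or {}
    return [key for key in REQUIRED_TO_RUN if not components.get(key)]


def is_runnable(card: Dict) -> bool:
    """A card with both a seed and an evaluator can run; otherwise it is metadata-only."""
    return not missing_run_components(card)


full_card = {"name": "circle_packing",
             "components": {"initial_program": "initial_program.py",
                            "evaluator": "evaluator.py",
                            "requirement": ["numpy"]}}
catalog_only = {"name": "some_future_task", "summary": "listed in the catalog only"}

print(is_runnable(full_card), is_runnable(catalog_only))
print(missing_run_components(catalog_only))
```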

name: circle_packing
display_name: Circle Packing (n=26)
domain: math
family: packing                            # macro / sub-domain
summary: "Pack 26 circles in the unit square; maximize the sum of radii."
description: |
  Find centers and radii for 26 non-overlapping circles inside the unit square [0,1]^2 that
  maximize the sum of radii. Only the code inside the EVOLVE-BLOCK is modified by the search.
  Validity and the score are recomputed independently from the returned geometry (anti reward-hacking).

metrics:                                   # a list — a task may report several metrics
  - metric_name: combined_score
    metric_direction: maximize
    metric_description: "Fraction of the AlphaEvolve best (sum_radii / 2.635)."
    metric_computation: "Re-validate the geometry, then sum_radii / 2.635; 0.0 if invalid."
  - metric_name: sum_radii
    metric_direction: maximize
    metric_description: "Total radius of the 26 packed circles."
    metric_computation: "Sum of the radii recomputed from the returned centers/radii."

components:
  initial_program: initial_program.py     # path (or inline code)
  evaluator: evaluator.py                  # the deterministic verifiable scorer
  config: config.yaml
  requirement: [numpy]                     # inline list, or a requirements.txt path

constraint: {gpu: none, docker: optional, est_runtime_s: 3}
evaluation: {format: python, mode: local}      # mode: local | docker
language: python
modality: text
library: numpy
references: {best_known: 2.635, source: AlphaEvolve}

import galapagos as gx

task = gx.GalapagosTask.from_card(name="circle_packing")   # registered task → loads from the catalog
task.context                       # the problem statement (from description)
seed = task.initial_genome()       # the seed Genome (generation 0)
task.evaluator                     # the deterministic verifiable scorer

Tasks must be registered to load by name

from_card(name=...) resolves against the registered task catalog. To run a task that is not in the catalog, either submit its card to the Hub, or point at a local task card file:

task = gx.GalapagosTask.from_card(path="./my_task/card.yaml")    # a local task card

Deterministic, verifiable scoring (anti reward-hacking)

The evaluator must recompute the objective from the candidate's raw output — never trust a self-reported score. circle_packing's evaluator discards the program's own sum_radii and re-validates every constraint (count, bounds, non-overlap), returning combined_score = 0.0 on any violation. The catalog ships 64 runnable bundled tasks; circle_packing, function_minimization, and playground_sphere are the canonical quickstart examples, and the platform's roadmap scopes 300+ tasks.
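The recompute-from-geometry rule can be sketched as follows. This is not the bundled circle_packing evaluator: `score_packing` is a hypothetical function, the tolerance `tol` and the choice to report `sum_radii = 0.0` for invalid packings are assumptions, and it takes the geometry directly rather than a program path. It does follow the checks named above (count, bounds, non-overlap) and never reads a self-reported score.

```python
import math
from typing import Dict, Sequence, Tuple


def score_packing(centers: Sequence[Tuple[float, float]], radii: Sequence[float],
                  n: int = 26, best_known: float = 2.635, tol: float = 1e-9) -> Dict[str, float]:
    """Recompute the score from raw geometry; any violation yields combined_score 0.0."""
    invalid = {"combined_score": 0.0, "sum_radii": 0.0}
    if len(centers) != n or len(radii) != n:
        return invalid                                   # wrong number of circles
    for (x, y), r in zip(centers, radii):
        inside = x - r >= -tol and x + r <= 1 + tol and y - r >= -tol and y + r <= 1 + tol
        if r <= 0 or not inside:
            return invalid                               # circle leaves the unit square
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centers[i], centers[j]) < radii[i] + radii[j] - tol:
                return invalid                           # two circles overlap
    total = sum(radii)                                   # recomputed, never self-reported
    return {"combined_score": total / best_known, "sum_radii": total}


# Two-circle toy instance (n=2) to keep the example short.
ok = score_packing([(0.25, 0.25), (0.75, 0.75)], [0.25, 0.25], n=2)
bad = score_packing([(0.25, 0.25), (0.30, 0.30)], [0.25, 0.25], n=2)
print(ok["sum_radii"], bad["combined_score"])
```

Because the score is derived only from the returned centers and radii, a candidate program gains nothing by reporting an inflated number alongside them.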


Scaffold Card

A scaffold is an evolutionary-search method. In Galapagos a method is not a monolithic loop — it is a composition of the six components driven by a controller. The Scaffold Card declares that composition: the controller class and which implementation fills each of the six slots. Two methods differ only in their slot fillings, never in the architecture.

| Field | Type | Meaning |
| --- | --- | --- |
| name | str | (core) unique scaffold id (the slug). |
| display_name | str | human label. |
| organization | str | HF-Hub-style group; repo_id = <organization>/<display_name>. |
| type | str | test_time_search. |
| description | str | full prose. |
| summary | str | one-liner. |
| source | str | the paper or repo the method comes from. |
| tags | list | free-form tags. |
| license | str | SPDX id or label. |
| controller | str | dotted path to the GalapagosScaffold subclass that orchestrates the loop; omit for a card-only spec method. |
| components | obj | the six slots: population, selection_policy, prompt_builder, proposer, evaluator, memory. Each is inline code, a module.Class path, or a .py path; an omitted component is not used. |
| model | obj | {default, host, roles}: the default model and the roles it plays. |
| requirements | obj | {gpu, docker, python}. |

The bundled adaevolve card (src/galapagos/scaffolds/adaevolve/card.yaml):

name: adaevolve
display_name: AdaEvolve
organization: "SkyDiscover"
type: test_time_search
summary: "Hierarchical adaptive search: G-signal exploration intensity, UCB island allocation, and LLM meta-guidance on stagnation."
description: |
  AdaEvolve reframes LLM-driven program evolution as hierarchical adaptive optimization driven by
  one signal — the accumulated fitness-improvement signal G (an Adam-style second moment of
  normalized improvements). Level 1 maps each island's G to an exploration intensity that splits
  parent sampling into explore/exploit/balanced modes over per-island quality-diversity archives.
  Level 2 allocates iterations across islands with a decayed-reward UCB bandit (globally
  normalized rewards fix poor-island bias), ring migration, and dynamic island spawning from
  heterogeneous presets when global productivity collapses. Level 3 detects stagnation via a
  windowed improvement rate and asks a guide LLM for breakthrough "paradigm" ideas that are
  injected into prompts and applied to the global best until exhausted.
source: "AdaEvolve: Adaptive LLM-Driven Zeroth-Order Optimization (UC Berkeley); reference implementation in SkyDiscover"
tags: [adaptive, ucb, islands, quality-diversity, meta-guidance, diff-evolution]
license: Apache-2.0
controller: galapagos.scaffolds.adaevolve.scaffold.AdaEvolveScaffold
components:
  population: {kind: qd_island_archipelago}   # each slot: a kind, module.Class, or .py path
  selection_policy: {kind: adaptive_intensity_ucb}
  prompt_builder: {kind: adaevolve_template}
  proposer: {kind: diff}
  evaluator: {kind: task}                     # supplied by the task
  memory: {kind: paradigm_tactics}            # omit this line ⇒ Memory unused
model:
  default: "openai/gpt-5.5"
  host: openrouter

import galapagos as gx

config   = gx.GalapagosConfig.from_config(scaffold_name="adaevolve")
model    = gx.GalapagosModel.from_card(name="openai/gpt-5.5", host="openrouter")
scaffold = gx.GalapagosScaffold.from_card(name="adaevolve", config=config, model=model)

The components mapping is present even for a card-only spec method — it documents which implementation fills each slot before any code exists.
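Since every method is a filling of the same six slots, a card's components mapping can be read as "which slots are filled, which are unused". A minimal sketch, assuming the mapping has already been parsed into a dict; `unused_slots` is a hypothetical helper, not part of the library.

```python
from typing import Dict, List

# The six component slots named in the Scaffold Card table.
SLOTS = ("population", "selection_policy", "prompt_builder",
         "proposer", "evaluator", "memory")


def unused_slots(components: Dict) -> List[str]:
    """Return the slots the card leaves out; an omitted component is not used."""
    return [slot for slot in SLOTS if slot not in components]


# The adaevolve fillings from the card above, with the memory line dropped.
components = {
    "population": {"kind": "qd_island_archipelago"},
    "selection_policy": {"kind": "adaptive_intensity_ucb"},
    "prompt_builder": {"kind": "adaevolve_template"},
    "proposer": {"kind": "diff"},
    "evaluator": {"kind": "task"},
}

print(unused_slots(components))
```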

Build your own — no card file needed

The same six slots can be passed directly to from_card, each as a component instance, a "module.Class" path, or a .py file:

scaffold = gx.GalapagosScaffold.from_card(
    population="galapagos.components.IslandPopulation",
    selection_policy="galapagos.components.UCBBanditPolicy",
    prompt_builder="galapagos.components.DefaultPromptBuilder",
    proposer="./my_proposer.py",                 # a .py file with one Proposer subclass
    memory="galapagos.components.ScratchpadMemory",
    model=model,
)

Omitting a slot leaves that component unused (e.g. no memory= ⇒ a Memory-free loop). The catalog ships eight bundled scaffolds — adaevolve, beam_search, best_of_n, best_of_n_attempts, evox, meta_harness, openevolve, and topk — all runnable.
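A "module.Class" string can be turned into a class with the standard library alone. The sketch below shows one common way to do that; it is an illustration of the general technique, not the loader Galapagos ships, and `spec_kind` and `resolve_dotted` are hypothetical names. A standard-library class stands in for a component class so the example runs without the galapagos package.

```python
import importlib


def spec_kind(spec) -> str:
    """Classify a slot value: a '.py' file path, a dotted 'module.Class' path, or an instance."""
    if isinstance(spec, str):
        return "py_file" if spec.endswith(".py") else "dotted_path"
    return "instance"


def resolve_dotted(path: str):
    """Import 'package.module.Class' and return the named attribute."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.Class', got {path!r}")
    return getattr(importlib.import_module(module_name), attr)


# fractions.Fraction stands in for a component class here.
cls = resolve_dotted("fractions.Fraction")
print(spec_kind("./my_proposer.py"), spec_kind("fractions.Fraction"), cls.__name__)
```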


Model Card

A Model Card pins a model so a run is reproducible from disk: a display name, the real model path, and the host that serves it.

| Field | Type | Meaning |
| --- | --- | --- |
| name | str | (core) the model's display name / id. |
| model_path | str | the real provider model name (e.g. openai/gpt-5.5). |
| host | str | where it is served (see the host list below). Default: openrouter. |
| temperature | float | sampling temperature. |
| max_tokens | int | generation cap. |

The host selects an OpenAI-compatible endpoint. The protocol's host vocabulary is:

huggingface · openrouter · vllm · togetherai (Together AI) · litellm · openai · anthropic · azure · bedrock · google.

name: gpt-5.5
model_path: openai/gpt-5.5
host: openrouter
temperature: 0.7
max_tokens: 16384

model = gx.GalapagosModel.from_card(name="openai/gpt-5.5", host="openrouter")
model = gx.GalapagosModel.from_card(path="my_model_card.yaml")     # or pin it from a card file
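The host vocabulary and its default can be captured in a few lines. This is a sketch only: `normalize_host` is a hypothetical helper, and treating host names as case-insensitive is an assumption rather than documented behaviour.

```python
from typing import Optional

# The host vocabulary listed above.
KNOWN_HOSTS = frozenset({
    "huggingface", "openrouter", "vllm", "togetherai", "litellm",
    "openai", "anthropic", "azure", "bedrock", "google",
})
DEFAULT_HOST = "openrouter"


def normalize_host(host: Optional[str] = None) -> str:
    """Fall back to the default host, and reject names outside the vocabulary."""
    if host is None:
        return DEFAULT_HOST
    key = host.strip().lower()          # assumption: host names are case-insensitive
    if key not in KNOWN_HOSTS:
        raise ValueError(f"unknown host {host!r}; expected one of {sorted(KNOWN_HOSTS)}")
    return key


print(normalize_host(), normalize_host("vLLM"))
```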

See Models for the host-to-base_url resolution table, the three mandated load forms (HF / hosting platform / local vLLM), and which hosts are wired into the shipped loader.


Verification Card

A discovery is more than a number — it is a claim with provenance. When a search finds a strong solution, you submit it as a Verification Card: the task, the scaffold (or agent) that produced it, the best solution, and the full discovery trajectory, for review by a domain expert. This is what turns a result into a portable, reviewable artifact and feeds the live leaderboard.

| Field | Type | Meaning |
| --- | --- | --- |
| task | str | (core) the task the discovery was made on. |
| scaffold | str | the scaffold that produced it (or…). |
| agent | str | …the agent that produced it. |
| submitter | str | who is submitting. |
| claimed_score | float | the claimed headline score. |
| best_solution | str | the discovered solution, inline or as a path. |
| trajectory | str | path / URI to the full discovery trajectory. |
| status | str | unverified, under_review, verified, or rejected. |
| notes | str | reviewer / submitter notes. |

task: circle_packing
scaffold: adaevolve
submitter: passing2961
claimed_score: 0.9997
best_solution: solutions/circle_packing_2.6342.py
trajectory: runs/2026-06-08_adaevolve_circle_packing/
status: unverified
notes: "26 circles, sum_radii = 2.6342 (99.97% of the AlphaEvolve best)."

A verification card is not submitted through the CLI; it is sent to a Hub instance as a JSON object via POST /api/verifications (see the full flow in Submit to the Hub):

curl -X POST https://open-galapagos.com/api/verifications \
     -H "authorization: Bearer $TOKEN" -H "content-type: application/json" \
     -d "$(python -c 'import json,yaml; print(json.dumps(yaml.safe_load(open("circle_packing_discovery.yaml"))))')"
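The same request can be built in Python with the standard library. The sketch below mirrors the curl call (same URL, bearer token, and JSON body) but stops short of sending it; `build_verification_request` is a hypothetical helper, the token is a placeholder, and the card is written inline as a dict instead of being loaded from YAML so that no third-party package is needed.

```python
import json
import urllib.request


def build_verification_request(card: dict, token: str,
                               hub: str = "https://open-galapagos.com") -> urllib.request.Request:
    """Build (but do not send) the POST /api/verifications request for a card."""
    return urllib.request.Request(
        url=f"{hub}/api/verifications",
        data=json.dumps(card).encode("utf-8"),        # the card travels as a JSON object
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )


card = {
    "task": "circle_packing",
    "scaffold": "adaevolve",
    "claimed_score": 0.9997,
    "status": "unverified",
}
request = build_verification_request(card, token="YOUR_TOKEN")   # placeholder token
print(request.get_method(), request.full_url)
# To actually submit: urllib.request.urlopen(request)
```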

The claimed score is re-verified, not trusted

On submission, the task's deterministic Evaluator is re-run against the submitted best_solution (the same anti-reward-hacking recompute the task uses during search), so a submission cannot inflate its own number. A domain expert then reviews the trajectory before the discovery is marked verified.
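In effect, the claimed number is compared against a freshly recomputed one. The sketch below illustrates that comparison with the example card's figures; `claim_holds` and its tolerance are assumptions for illustration, and the Hub's actual acceptance rule is not specified here.

```python
def claim_holds(claimed_score: float, recomputed_score: float, tol: float = 1e-4) -> bool:
    """Accept a claim only if it does not exceed the recomputed score beyond a tolerance."""
    return claimed_score <= recomputed_score + tol


# Figures from the example card: sum_radii = 2.6342 against the best-known 2.635.
recomputed = 2.6342 / 2.635

print(claim_holds(0.9997, recomputed))   # the claim matches the recomputed score
print(claim_holds(1.05, recomputed))     # an inflated claim is not supported
```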


Loading and submitting

Every card type follows the same two-verb protocol:

import galapagos as gx

task     = gx.GalapagosTask.from_card(name="circle_packing")
scaffold = gx.GalapagosScaffold.from_card(name="openevolve")
model    = gx.GalapagosModel.from_card(name="openai/gpt-5.5", host="openrouter")

Functional aliases: gx.load_task, gx.load_scaffold, gx.load_model, gx.load_config. Pass path=.../card.yaml instead of name=... to load a local card file.

galapagos submit --card my_task/card.yaml --kind task     # validate a task card
galapagos submit --card my_scaffold/card.yaml             # kind auto-detected (scaffold)

The card is validated by the same galapagos.cards.schema that validates loaded cards, then published to the Hub via POST /api/scaffolds or POST /api/tasks, so the library and the Hub never disagree about a card's shape. A discovery (verification card) is sent to POST /api/verifications; see Submit to the Hub.
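The routing from card kind to Hub endpoint can be written as a lookup table. In the sketch below the three endpoint paths come from the text above, but `guess_kind` is a purely hypothetical heuristic: how `galapagos submit` actually auto-detects the kind is not documented here, so treat the field checks as an illustration only.

```python
# Endpoint paths as named in this document.
ENDPOINTS = {
    "task": "/api/tasks",
    "scaffold": "/api/scaffolds",
    "verification": "/api/verifications",
}


def guess_kind(card: dict) -> str:
    """Hypothetical heuristic: infer the card kind from fields only that kind uses."""
    if "claimed_score" in card or "trajectory" in card:
        return "verification"
    if "controller" in card or "population" in (card.get("components") or {}):
        return "scaffold"
    return "task"


def endpoint_for(card: dict) -> str:
    """Look up the Hub endpoint for the (guessed) kind of a card."""
    return ENDPOINTS[guess_kind(card)]


scaffold_card = {"name": "adaevolve",
                 "controller": "galapagos.scaffolds.adaevolve.scaffold.AdaEvolveScaffold"}
task_card = {"name": "circle_packing", "components": {"evaluator": "evaluator.py"}}

print(endpoint_for(scaffold_card), endpoint_for(task_card))
```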

The library ⊆ Hub invariant

The card is the same artifact locally and on the Hub. The cards bundled in the galapagos wheel are a subset of the Hub catalog — never a fork. One schema validates both directions, so a card that loads in the library publishes to the Hub, and vice versa.


See also

  • Core components — the six slots a Scaffold Card declares.
  • Genome — the unit a Task Card's evaluator scores.
  • Models — the hosts a Model Card's host selects.
  • The Hub — where cards are published and the leaderboard lives.