Official board
The best solution discovered for each task: which scaffold and model achieved it, who submitted it, and whether a domain expert has verified the result.
Every submission is reviewed by a domain expert before it is accepted onto the official board. New results are listed as Pending while the expert reproduces them. If the reproduced score checks out, the result is promoted to Accepted; if verification fails, it is Rejected.
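The review lifecycle above is a small state machine: a result starts as Pending and resolves to exactly one of Accepted or Rejected. The sketch below models only that flow. The `Status` enum mirrors the labels shown on the board, while the `review` function and its `reproduced_ok` flag are hypothetical names for illustration, not part of any published board API.

```python
from enum import Enum


class Status(Enum):
    # Labels mirror the Status column on the board.
    PENDING = "Pending"
    ACCEPTED = "Accepted"
    REJECTED = "Rejected"


def review(status: Status, reproduced_ok: bool) -> Status:
    """Hypothetical helper: resolve a pending result after the expert's reproduction run."""
    if status is not Status.PENDING:
        # Only pending results are still awaiting verification.
        raise ValueError(f"only pending results are reviewed, got {status.value}")
    # A score that reproduces is promoted; one that does not is rejected.
    return Status.ACCEPTED if reproduced_ok else Status.REJECTED


print(review(Status.PENDING, True).value)   # Accepted
print(review(Status.PENDING, False).value)  # Rejected
```

Raising on non-pending input is a design choice in this sketch: it treats Accepted and Rejected as terminal states, which is one reading of the process described above.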
| Rank | Scaffold | Model | Score | Submitted by | Date | Status |
|---|---|---|---|---|---|---|
| 1 | adaevolve | openai/gpt-5.5 | 0.9710 | berkeley-repro | Jun 21, 2026 | Accepted |
| 2 | evox | anthropic/claude-opus-4 | 0.9570 | meta_searcher | Jun 20, 2026 | Accepted |
| 3 | openevolve | anthropic/claude-opus-4 | 0.9480 | ana_kovacs | Jun 18, 2026 | Accepted |
| 4 | beam_search | openai/gpt-5.5 | 0.9330 | r_tanaka | Jun 14, 2026 | Accepted |
| 5 | topk | google/gemini-3-pro | 0.9010 | lab42 | Jun 9, 2026 | Pending |
| 6 | best_of_n | anthropic/claude-sonnet-4 | 0.8680 | p_singh | Jun 2, 2026 | Accepted |
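The ranks in the table appear to follow score alone, highest first, with Pending entries ranked alongside Accepted ones. Assuming that is the rule, the ordering can be reproduced with a stable sort. The `Entry` dataclass and `rank` function are hypothetical names for this sketch, and the sample rows are copied from the table above.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical record for one board row.
    scaffold: str
    model: str
    score: float
    submitter: str
    status: str


def rank(entries: list[Entry]) -> list[tuple[int, Entry]]:
    """Order entries by score, highest first, and number them from 1."""
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    return list(enumerate(ordered, start=1))


board = [
    Entry("topk", "google/gemini-3-pro", 0.9010, "lab42", "Pending"),
    Entry("adaevolve", "openai/gpt-5.5", 0.9710, "berkeley-repro", "Accepted"),
    Entry("evox", "anthropic/claude-opus-4", 0.9570, "meta_searcher", "Accepted"),
]
for position, entry in rank(board):
    print(position, entry.scaffold, f"{entry.score:.4f}")
# 1 adaevolve 0.9710
# 2 evox 0.9570
# 3 topk 0.9010
```

Because Python's `sorted` is stable, tied scores keep their input order; how the actual board breaks ties is not stated.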