Research benchmark platform
WorldModel Gym turns environments, uploads, traces, and leaderboards into a public-facing benchmark product that feels intentional from the first click.

Benchmark narrative
Build a benchmark surface that communicates as well as the experiment itself.

Our benchmark surfaces
The site is designed to make benchmark evidence readable: task framing, run uploads, leaderboard slices, and trace inspection all move together as one public story.

Task library
Document environments the way a strong research deck would: clear defaults, precise constraints, and readable benchmark framing.
Browse tasks

Live leaderboards
Compare planning quality, return, and cost in one public surface instead of across notebooks, screenshots, and scattered artifacts.
Open leaderboards
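
To make the comparison concrete, here is a minimal Python sketch of the kind of slice a public leaderboard might compute. The run records, field names (`return`, `cost_usd`, `planning_quality`), and the ranking rule are illustrative assumptions, not WorldModel Gym's actual schema.

```python
# Minimal sketch: ranking uploaded runs the way a public leaderboard
# slice might. Field names and the ranking rule are illustrative
# assumptions, not the platform's actual schema.
runs = [
    {"agent": "mcts-baseline", "return": 0.62, "cost_usd": 1.40, "planning_quality": 0.71},
    {"agent": "llm-planner",   "return": 0.58, "cost_usd": 0.35, "planning_quality": 0.64},
    {"agent": "random",        "return": 0.11, "cost_usd": 0.02, "planning_quality": 0.05},
]

# One possible slice: highest return first, cheaper runs breaking ties.
leaderboard = sorted(runs, key=lambda r: (-r["return"], r["cost_usd"]))

for rank, run in enumerate(leaderboard, start=1):
    print(f"{rank}. {run['agent']}: return={run['return']:.2f}, cost=${run['cost_usd']:.2f}")
```

Return-first with cost as a tiebreaker is just one defensible slice; a public surface would typically expose several side by side.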

Upload studio
Create a run, attach metrics and traces, and publish it from the browser while keeping automation-friendly CLI and API options.
Publish a run
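
For the automation path, here is a hedged sketch of what publishing a run over HTTP could look like, using the common requests library. The endpoint URL, payload shape, and auth header are hypothetical placeholders; the platform's real API contract may differ entirely.

```python
# Hedged sketch of publishing a run via an HTTP API. The endpoint,
# payload shape, and auth header are hypothetical stand-ins, not
# the platform's documented contract.
import requests

run_payload = {
    "task": "delayed-reward-gridworld",          # hypothetical task id
    "agent": "llm-planner-v2",
    "metrics": {"return": 0.58, "cost_usd": 0.35},
    "trace_url": "https://example.com/traces/run-042.jsonl",  # placeholder
}

response = requests.post(
    "https://example.com/api/runs",              # placeholder endpoint
    headers={"Authorization": "Bearer <YOUR_TOKEN>"},
    json=run_payload,
    timeout=30,
)
response.raise_for_status()
print("published:", response.json())
```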
Product workflow

Create
Shape sparse-reward tasks, defaults, and success criteria before you ever touch a leaderboard. This is the fastest way to move from a research idea to a benchmark someone else can immediately understand.
Prompt
Frame a partially observable benchmark with delayed reward, reproducible seeds, and a planning budget that matches the story you want the leaderboard to tell. A hedged sketch of such a task spec follows the steps below.
Step 1
Choose an environment with explicit constraints
Step 2
Set defaults that make the benchmark reproducible
Step 3
Carry the task into evaluation and upload flows
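
As a minimal sketch of steps 1 and 2, the dataclass below bundles an environment choice, observation mode, reward design, seeds, and a planning budget into one reproducible spec that can travel through evaluation and upload flows (step 3). Every field name here is an assumption for illustration, not the platform's actual task schema.

```python
# Minimal sketch of a reproducible task spec covering the steps above.
# All field names are assumptions for illustration; the platform may
# expose a different schema entirely.
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    env_id: str                       # Step 1: environment with explicit constraints
    observation_mode: str             # e.g. "partial" for a POMDP-style benchmark
    reward_design: str                # e.g. "delayed" vs. "dense"
    seeds: list = field(default_factory=lambda: [0, 1, 2])  # Step 2: reproducibility
    planning_budget_steps: int = 512  # per-episode planning budget
    success_criterion: str = "return >= 0.5 averaged over all seeds"

spec = TaskSpec(
    env_id="delayed-reward-gridworld",  # hypothetical environment id
    observation_mode="partial",
    reward_design="delayed",
)
print(spec)  # Step 3: this spec travels with evaluation and upload flows
```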

Task framing · Task defaults · Observation mode · Reward design
Live benchmark product
Ship new runs, compare them publicly, and use the same benchmark surface in your README, interviews, project portfolio, or research demo.
