Experimental World Performance Dashboard

DART publishes a public performance dashboard that tracks experimental World Google Benchmark results over time:

The live dashboard is embedded below. If it does not load (for example before the first publication), open it directly with the link above.

It is scoped to experimental World performance only: kinematics updates, world-step throughput with sequential vs parallel compute executors, rigid-body step scaling, the contact-shaped and contact-island scalable-compute proxies, and the Phase 5 CPU baseline row. It intentionally excludes unrelated solver, SIMD, and robot-loader benchmark surfaces so the headline charts stay focused on the experimental World.

The dashboard is built entirely on GitHub infrastructure. A GitHub Actions workflow runs DART’s benchmark suites and hands the Google Benchmark JSON to benchmark-action/github-action-benchmark, which stores the history on the gh-pages branch and renders an interactive Chart.js page. There is no external account, API token, or third-party service to maintain.

The dashboard URL returns 404 until a run of the Performance Dashboard workflow (triggered by a push to main, by the schedule, or by manual dispatch) has published to gh-pages and GitHub Pages has finished building the branch.

How it works

The Performance Dashboard workflow (.github/workflows/performance_dashboard.yml) runs on every push to main that touches benchmark-relevant paths, twice weekly on a schedule, and on manual dispatch. Each run:

  1. builds and runs the bounded benchmark surfaces with scripts/run_performance_dashboard_benchmarks.py;

  2. merges the per-target JSON into one file with scripts/merge_benchmark_results.py;

  3. publishes the result with github-action-benchmark, which appends a point to the per-benchmark history and updates the hosted page.
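The merge in step 2 can be pictured with a minimal sketch. It is not the contents of scripts/merge_benchmark_results.py; it only assumes the standard Google Benchmark JSON layout (a top-level "context" object and a "benchmarks" array), and the function name and arguments are illustrative.

```python
import json
from pathlib import Path


def merge_benchmark_results(inputs, output):
    """Concatenate the "benchmarks" arrays of several Google Benchmark JSON files.

    Illustrative sketch only; the real merge script may behave differently.
    """
    context = None
    benchmarks = []
    for path in inputs:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        if context is None:
            # Keep the first file's context (host, date, CPU info) for the merged file.
            context = data.get("context", {})
        benchmarks.extend(data.get("benchmarks", []))
    merged = {"context": context or {}, "benchmarks": benchmarks}
    Path(output).write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return merged
```

Keeping a single "context" is a simplification: all targets in one run share the same runner, so the first file's context is taken as representative.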

When a benchmark regresses past the alert threshold, the action posts a comment on the offending commit. GitHub-hosted runners are noisy, so the threshold is set conservatively (200%) and a regression does not fail the build.
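As a worked example of what a 200% threshold means, the sketch below assumes a smaller-is-better metric (such as time per iteration) and assumes the alert fires when the ratio of the current value to the previous value exceeds the threshold. The function is hypothetical and is not part of github-action-benchmark.

```python
def exceeds_alert_threshold(previous, current, threshold_percent=200.0):
    """Return True when `current` is worse than `previous` by more than the threshold.

    Assumes smaller is better and a strictly-greater comparison; both are
    assumptions made for illustration.
    """
    if previous <= 0:
        # A ratio is undefined without a positive baseline, so never alert.
        return False
    ratio = current / previous
    return ratio > threshold_percent / 100.0


# 100 ns -> 250 ns is a 2.5x slowdown, which is past a 200% threshold.
slow = exceeds_alert_threshold(previous=100.0, current=250.0)
# 100 ns -> 180 ns is a 1.8x slowdown, which stays under it.
noisy = exceeds_alert_threshold(previous=100.0, current=180.0)
```

Under these assumptions, only a result more than twice as slow as the previous point would trigger a comment, which is why ordinary runner noise rarely does.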

Preview a run locally

You can render the same dashboard locally before anything is published:

pixi run bm-dashboard-surfaces      # build and run the benchmark surfaces
pixi run bm-dashboard-preview       # render build/performance-dashboard/index.html

Open build/performance-dashboard/index.html in a browser. The preview reads the same window.BENCHMARK_DATA shape that the hosted dashboard uses, so it matches what will be published. Pass --append to scripts/preview_performance_dashboard.py to accumulate multiple runs into a local history.
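To make the data shape and the effect of accumulating runs concrete, here is a minimal sketch of appending one run to a local history file. The field names (lastUpdate, repoUrl, entries, commit, date, tool, benches) are an assumption based on the data.js layout that github-action-benchmark is understood to write; the helper functions are hypothetical and do not reproduce scripts/preview_performance_dashboard.py.

```python
import json
import time
from pathlib import Path

PREFIX = "window.BENCHMARK_DATA = "


def load_history(path):
    """Read a data.js-style file, or return an empty history if it is missing."""
    p = Path(path)
    if not p.exists():
        return {"lastUpdate": 0, "repoUrl": "", "entries": {}}
    text = p.read_text(encoding="utf-8").strip()
    if not text.startswith(PREFIX):
        raise ValueError(f"{path} does not start with {PREFIX!r}")
    # Tolerate an optional trailing semicolon after the JSON payload.
    return json.loads(text[len(PREFIX):].rstrip(";"))


def append_run(history, suite, benches, commit_id, now_ms=None):
    """Append one run (a list of bench rows) to the named suite's history."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    history["entries"].setdefault(suite, []).append(
        {
            "commit": {"id": commit_id},
            "date": now_ms,
            "tool": "googlecpp",
            "benches": benches,
        }
    )
    history["lastUpdate"] = now_ms
    return history


def save_history(history, path):
    """Write the history back as a JavaScript assignment the page can load."""
    Path(path).write_text(PREFIX + json.dumps(history, indent=2), encoding="utf-8")
```

Each call to append_run adds one point per benchmark row, so repeated local runs build up the same kind of time series the hosted charts plot.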

Add a benchmark to the dashboard

The dashboard reuses DART’s existing Google Benchmark targets. To track a new surface, add a BenchmarkSpec entry to scripts/run_performance_dashboard_benchmarks.py pointing at the target and a bounded --benchmark_filter. No dashboard code changes are required; the action picks up whatever rows appear in the merged JSON.
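The idea of a spec that pairs a target with a bounded filter can be sketched as follows. ExampleSpec is a stand-in, not the script's actual BenchmarkSpec, whose fields are not shown here; the target name, filter, and directories are made up for illustration. The --benchmark_filter, --benchmark_out, and --benchmark_out_format flags are standard Google Benchmark options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleSpec:
    """Hypothetical stand-in for a BenchmarkSpec entry (real fields may differ)."""

    target: str            # name of the Google Benchmark executable
    benchmark_filter: str  # regex that bounds which benchmarks run


def command_for(spec, binary_dir, out_dir):
    """Build the command line that runs one spec and writes its JSON result."""
    return [
        f"{binary_dir}/{spec.target}",
        f"--benchmark_filter={spec.benchmark_filter}",
        f"--benchmark_out={out_dir}/{spec.target}.json",
        "--benchmark_out_format=json",
    ]


# Made-up example entry; substitute a real target and filter.
spec = ExampleSpec(target="bm_example_world", benchmark_filter="^BM_ExampleStep/")
cmd = command_for(spec, binary_dir="build/bin", out_dir="build/results")
```

Keeping the filter narrow is what keeps each surface bounded: only the rows it matches end up in the merged JSON and therefore on the dashboard.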

Setup (maintainers)

The dashboard requires GitHub Pages to serve from the gh-pages branch (Settings -> Pages -> Source: gh-pages / root). The workflow’s contents: write permission lets the action create and update that branch.

Future work

The core pipeline is intentionally minimal. Natural extensions, to be added once the baseline is stable, include per-PR regression comments and an optional secondary backend such as Bencher or CodSpeed for noise-controlled regression gating.