Experimental World Performance Dashboard
========================================
DART publishes a public performance dashboard that tracks Google Benchmark
results for the experimental World over time:
* Dashboard: https://dartsim.github.io/dart/performance/
The live dashboard is embedded below. If it does not load (for example before
the first publication), open it directly with the link above.
.. raw:: html
The dashboard is scoped to experimental World performance only: kinematics
updates, world-step throughput with sequential versus parallel compute executors,
rigid-body step scaling, the contact-shaped and contact-island scalable-compute
proxies, and the Phase 5 CPU baseline row. It intentionally excludes unrelated
solver, SIMD, and robot-loader benchmark surfaces so the headline charts stay
focused on the experimental World.
The dashboard is built entirely on GitHub infrastructure. A GitHub Actions
workflow runs DART's benchmark suites and hands the Google Benchmark JSON to
``benchmark-action/github-action-benchmark``, which stores
the history on the ``gh-pages`` branch and renders an interactive Chart.js page.
There is no external account, API token, or third-party service to maintain.
The dashboard URL returns ``404`` until a run of the **Performance Dashboard**
workflow (triggered by a push to ``main``, the schedule, or manual dispatch) has
published to ``gh-pages`` and GitHub Pages has finished building the branch.
How it works
------------
The ``Performance Dashboard`` workflow
(``.github/workflows/performance_dashboard.yml``) runs on every push to ``main``
that touches benchmark-relevant paths, twice weekly on a schedule, and on
manual dispatch. Each run:

#. builds and runs the bounded benchmark surfaces with
   ``scripts/run_performance_dashboard_benchmarks.py``;
#. merges the per-target JSON into one file with
   ``scripts/merge_benchmark_results.py``;
#. publishes the result with ``github-action-benchmark``, which appends a point
   to the per-benchmark history and updates the hosted page.
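
The merge step can be pictured with a minimal sketch. This is not the contents
of ``scripts/merge_benchmark_results.py``; it only assumes the standard Google
Benchmark JSON layout (a ``context`` object plus a ``benchmarks`` list), and the
function names ``merge_reports`` and ``merge_files`` are hypothetical.

```python
import json
from pathlib import Path


def merge_reports(reports):
    """Merge Google Benchmark report dicts into a single report dict.

    Assumption: each report has the usual top-level "context" and
    "benchmarks" keys. The first report's context is kept as-is.
    """
    merged = {"context": {}, "benchmarks": []}
    for index, report in enumerate(reports):
        if index == 0:
            # Keep host/build metadata from the first report only.
            merged["context"] = dict(report.get("context", {}))
        # Concatenate benchmark rows in input order.
        merged["benchmarks"].extend(report.get("benchmarks", []))
    return merged


def merge_files(input_paths, output_path):
    """Read per-target JSON files and write one merged JSON file."""
    reports = [
        json.loads(Path(path).read_text(encoding="utf-8"))
        for path in input_paths
    ]
    merged = merge_reports(reports)
    Path(output_path).write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return merged
```

Keeping a single ``context`` block is a simplification; a real merge script may
need to reconcile or validate the metadata from each target.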
When a benchmark regresses past the alert threshold, the action posts a comment
on the offending commit. GitHub-hosted runners are noisy, so the threshold is
set conservatively (``200%``), and a regression does not fail the build.
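
As a rough illustration of what a ``200%`` threshold means, the sketch below
flags a benchmark only when its new time is more than twice the previous time.
It assumes a smaller-is-better metric and a simple ratio comparison; the helper
names are hypothetical, and the action's own comparison logic may differ in
detail.

```python
def parse_threshold(text):
    """Convert a percentage string such as '200%' into a ratio (2.0)."""
    return float(text.strip().rstrip("%")) / 100.0


def find_regressions(previous, current, threshold="200%"):
    """Return (name, ratio) pairs for benchmarks that slowed past the threshold.

    previous and current map benchmark names to times (smaller is better).
    Benchmarks without a usable previous value are skipped.
    """
    limit = parse_threshold(threshold)
    regressions = []
    for name, new_value in current.items():
        old_value = previous.get(name)
        if old_value is None or old_value <= 0:
            continue  # no baseline to compare against
        ratio = new_value / old_value
        if ratio > limit:
            regressions.append((name, ratio))
    return regressions
```

With this reading, a step time that goes from 10 ms to 15 ms is treated as
runner noise, while 10 ms to 25 ms would trigger a comment.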
Preview a run locally
---------------------
You can render the same dashboard locally before anything is published::

    pixi run bm-dashboard-surfaces  # build and run the benchmark surfaces
    pixi run bm-dashboard-preview   # render build/performance-dashboard/index.html

Open ``build/performance-dashboard/index.html`` in a browser. The preview reads
the same ``window.BENCHMARK_DATA`` shape that the hosted dashboard uses, so it
matches what will be published. Pass ``--append`` to
``scripts/preview_performance_dashboard.py`` to accumulate multiple runs into a
local history.
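
To make the idea of accumulating runs concrete, here is a small sketch of
appending a run to a ``window.BENCHMARK_DATA``-style history. The field names
(``entries``, ``benches``, ``lastUpdate``, and so on) are an assumption based on
the data file that ``github-action-benchmark`` writes; the functions
``append_run`` and ``to_data_js`` are hypothetical and are not taken from
``scripts/preview_performance_dashboard.py``.

```python
import json
import time


def append_run(history, suite, commit_id, benches, now_ms=None):
    """Append one run to a BENCHMARK_DATA-like dict and return it.

    Assumed shape: {"lastUpdate": ms, "entries": {suite: [run, ...]}} where
    each run carries a commit, a timestamp, and a list of bench rows.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    runs = history.setdefault("entries", {}).setdefault(suite, [])
    runs.append(
        {
            "commit": {"id": commit_id},
            "date": now_ms,
            "tool": "googlecpp",  # tool name the action uses for Google Benchmark
            "benches": list(benches),
        }
    )
    history["lastUpdate"] = now_ms
    return history


def to_data_js(history):
    """Serialize the history as a script that assigns window.BENCHMARK_DATA."""
    return "window.BENCHMARK_DATA = " + json.dumps(history, indent=2)
```

Each call adds one point per benchmark to the charts, which is the effect
described for ``--append``.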
Add a benchmark to the dashboard
--------------------------------
The dashboard reuses DART's existing Google Benchmark targets. To track a new
surface, add a ``BenchmarkSpec`` entry to
``scripts/run_performance_dashboard_benchmarks.py`` pointing at the target and a
bounded ``--benchmark_filter``. No dashboard code changes are required; the
action picks up whatever rows appear in the merged JSON.
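
The sketch below shows the kind of information such an entry carries and how it
could translate into a Google Benchmark invocation. The real ``BenchmarkSpec``
fields are not documented here, so ``ExampleBenchmarkSpec``, its field names,
and ``build_command`` are hypothetical; only the ``--benchmark_filter``,
``--benchmark_out``, and ``--benchmark_out_format`` flags are standard Google
Benchmark options.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ExampleBenchmarkSpec:
    """Hypothetical stand-in for a dashboard spec entry (assumed fields)."""

    target: str            # name of the benchmark executable
    benchmark_filter: str  # bounded regex passed to --benchmark_filter


def build_command(spec, build_dir, out_dir):
    """Build the argv list that runs one spec and writes JSON output."""
    binary = Path(build_dir) / spec.target
    out_file = Path(out_dir) / f"{spec.target}.json"
    return [
        str(binary),
        f"--benchmark_filter={spec.benchmark_filter}",
        f"--benchmark_out={out_file}",
        "--benchmark_out_format=json",
    ]


# Example with made-up names: restrict the run to two problem sizes.
example = ExampleBenchmarkSpec("bm_example_world", "BM_ExampleStep/(8|64)$")
command = build_command(example, "build/bin", "build/bench-json")
```

Anchoring the filter to a few sizes keeps the run bounded, which matters for
a workflow that runs on every qualifying push.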
Setup (maintainers)
-------------------
The dashboard requires GitHub Pages to serve from the ``gh-pages`` branch
(Settings -> Pages -> Source: ``gh-pages`` / root). The workflow's
``contents: write`` permission lets the action create and update that branch.
Future work
-----------
The core pipeline is intentionally minimal. Natural extensions, added once the
baseline is stable, include per-PR regression comments and an optional secondary
backend such as Bencher or CodSpeed for noise-controlled regression gating.