PyTorch OSS benchmark infra

At a high level, the PyTorch OSS benchmark infrastructure consists of 5 key components:
- Benchmark hardware. Coming from various sources based on availability, it serves different use cases, such as:
  - CUDA benchmarks like `torch.compile` on `linux.aws.h100` or vLLM bench
  - ROCm benchmarks on `linux.rocm.gpu.mi300.2`
  - x86 CPU benchmarks on `linux.24xl.spr-metal`
  - aarch64 CPU benchmarks on `linux.arm64.m7g.metal`
  - MPS benchmarks on `macos-m2-15`
  - Android and iOS benchmarks on AWS Device Farm
- Integration. This is where the benchmark results are ingested. To support different use cases across the PyTorch org, we don't dictate what benchmarks to run or how to run them. Instead, we provide an integration point on GitHub for CI and an API to upload benchmark results when running in a local environment. This gives teams the flexibility to run benchmarks their own way as long as the results are saved in a standardized format. This format is documented here.
- Databases. The data is currently located on ClickHouse Cloud at https://console.clickhouse.cloud in the `benchmark.oss_ci_benchmark_v3` table. The table has an S3 backend where all the raw JSON files are kept.
- Exploration. There are currently 3 ways to explore the benchmark results: the dashboards on HUD, the query API, and direct access to the ClickHouse database, all covered below.
- Utilities, such as:
  - Bisection
  - Regression notification with Grafana
  - Generating a PyTorch profile
  - Gathering model insights, e.g. Hugging Face stats
Please refer to `partners_pytorch_ci_runners.md` for the technical steps to add your runners to PyTorch CI.
Your benchmark results should be formatted as a list of metrics as shown below. All fields are optional unless specified as required.
```
// The list of all benchmark metrics
[
  {
    // Information about the benchmark
    benchmark: Tuple(
      name,           // Required. The name of the benchmark
      mode,           // Training or inference
      dtype,          // The dtype used by the benchmark
      extra_info: {}, // Any additional information about the benchmark
    ),
    // Information about the model or the test
    model: Tuple(
      name,           // Required. The model or the test name
      type,           // Additional information, for example is this a HF model or a micro-benchmark custom layer
      backend,        // Any delegation backend used here, i.e. XNNPACK
      origins,        // Tell us where this is from, i.e. HF
      extra_info: {}, // Any additional information about the model or the test
    ),
    // Information about the benchmark result
    metric: Tuple(
      name,             // Required. The name of the metric. It's a good practice to include its unit here too, i.e. compilation_time(ms)
      benchmark_values, // Float. Required. The metric values. It's a list here because a benchmark is usually run multiple times
      target_value,     // Float. The optional target value used to indicate if there is a regression
      extra_info: {},   // Any additional information about the benchmark result
    ),
    // Optional information about any inputs used by the benchmark
    inputs: {
      name: Tuple(
        dtype,          // The dtype of the input
        extra_info: {}, // Any additional information about the input
      )
    },
  },
  ...
]
```

Note that using a JSON list is optional. Writing one JSON record per line (JSONEachRow) is also accepted.
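For example, here is a minimal sketch that emits one record in this format and writes it out as JSONEachRow. The benchmark, model, and metric values below are illustrative placeholders, not real results:

```python
import json

# One benchmark record following the schema above. All concrete values here
# are placeholders for illustration.
record = {
    "benchmark": {
        "name": "My PyTorch benchmark",  # Required
        "mode": "inference",
        "dtype": "bfloat16",
        "extra_info": {},
    },
    "model": {
        "name": "BERT_pytorch",  # Required
        "type": "HF model",
        "origins": ["HF"],
    },
    "metric": {
        "name": "compilation_time(ms)",  # Required; include the unit in the name
        "benchmark_values": [153.2, 149.8, 151.0],  # Required; one value per run
        "target_value": 160.0,
    },
}

# JSONEachRow: write one JSON record per line
with open("benchmark-results.json", "w") as f:
    f.write(json.dumps(record) + "\n")
```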
- If you are using PyTorch AWS self-hosted runners, they already have permission to upload benchmark results. No additional preparation is needed.
- If you are using non-AWS runners (such as ROCm runners), please contact the PyTorch Dev Infra team (POC: @huydhn) to create a GitHub environment with S3 write permissions. The vLLM bench workflow is an example.
For example, a workflow running on PyTorch AWS self-hosted runners looks like this:

```yaml
name: A sample benchmark job that runs on all main commits

on:
  push:
    branches:
      - main

jobs:
  benchmark:
    runs-on: linux.2xlarge
    steps:
      - uses: actions/checkout@v3

      - name: Run your own benchmark logic
        shell: bash
        run: |
          set -eux

          # Run your benchmark script and write the result to benchmark-results.json
          # whose format is defined in the previous section
          python run_my_benchmark_script.py > ${{ runner.temp }}/benchmark-results/benchmark-results.json

          # It's also ok to write the results into multiple JSON files, for example
          python run_my_benchmark_script.py --output-dir ${{ runner.temp }}/benchmark-results

      - name: Upload the benchmark results to OSS benchmark database for the dashboard
        uses: pytorch/test-infra/.github/actions/upload-benchmark-results@main
        with:
          benchmark-results-dir: ${{ runner.temp }}/benchmark-results
          dry-run: false
          github-token: ${{ secrets.GITHUB_TOKEN }}
```

For non-AWS runners, the workflow additionally needs the GitHub environment and an explicit AWS authentication step:

```yaml
name: A sample benchmark job that runs on all main commits

on:
  push:
    branches:
      - main

jobs:
  benchmark:
    runs-on: linux.rocm.gpu.2  # An example non-AWS runner
    environment: upload-benchmark-results  # The environment has S3 write access to upload the results
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v3

      - name: Authenticate with AWS
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_upload-benchmark-results
          # The max duration enforced by the server side
          role-duration-seconds: 18000
          aws-region: us-east-1

      - name: Run your own benchmark logic
        shell: bash
        run: |
          # Run your benchmark script and write the result to benchmark-results.json
          # whose format is defined in the previous section
          python run_my_benchmark_script.py > ${{ runner.temp }}/benchmark-results/benchmark-results.json

          # It's also ok to write the results into multiple JSON files, for example
          python run_my_benchmark_script.py --output-dir ${{ runner.temp }}/benchmark-results

      - name: Upload the benchmark results to OSS benchmark database for the dashboard
        uses: pytorch/test-infra/.github/actions/upload-benchmark-results@main
        with:
          benchmark-results-dir: ${{ runner.temp }}/benchmark-results
          dry-run: false
          github-token: ${{ secrets.GITHUB_TOKEN }}
```

The alternative way to upload benchmark results outside of GitHub CI is to use the `upload_benchmark_results.py` script. This script requires `UPLOADER_[USERNAME|PASSWORD]` credentials, so please contact PyTorch Dev Infra if you need access. Once written to the database, benchmark results should be considered immutable.
Here is an example usage:
```bash
export UPLOADER_USERNAME=<REDACT>
export UPLOADER_PASSWORD=<REDACT>

# Any strings work as long as they are consistent
export DEVICE_NAME=cuda
export DEVICE_TYPE=$(nvidia-smi -i 0 --query-gpu=name --format=csv,noheader | awk '{print $2}')

git clone https://github.com/pytorch/pytorch-integration-testing
cd pytorch-integration-testing/.github/scripts

# The script dependencies
pip install -r requirements.txt

# --repo is where the repo is checked out and built
# --benchmark-name is a unique string that is used to identify the benchmark
# --benchmark-results is where the JSON benchmark result files are kept
# --dry-run is to prepare everything except writing the results to S3
python upload_benchmark_results.py \
  --repo pytorch \
  --benchmark-name "My PyTorch benchmark" \
  --benchmark-results benchmark-results-dir \
  --device-name "${DEVICE_NAME}" \
  --device-type "${DEVICE_TYPE}" \
  --dry-run
```

You can also set the repo metadata manually, for example when using nightly or release binaries:
```bash
# Use PyTorch 2.7 release
python upload_benchmark_results.py \
  --repo-name "pytorch/pytorch" \
  --head-branch "release/2.7" \
  --head-sha "e2d141dbde55c2a4370fac5165b0561b6af4798b" \
  --benchmark-name "My PyTorch benchmark" \
  --benchmark-results benchmark-results-dir \
  --device-name "${DEVICE_NAME}" \
  --device-type "${DEVICE_TYPE}" \
  --dry-run

# Use PyTorch nightly
python upload_benchmark_results.py \
  --repo-name "pytorch/pytorch" \
  --head-branch "nightly" \
  --head-sha "be2ad70cfa1360da5c23a04ff6ca3480fa02f278" \
  --benchmark-name "My PyTorch benchmark" \
  --benchmark-results benchmark-results-dir \
  --device-name "${DEVICE_NAME}" \
  --device-type "${DEVICE_TYPE}" \
  --dry-run
```

Behind the scenes, we have an API deployed at https://kvvka55vt7t2dzl6qlxys72kra0xtirv.lambda-url.us-east-1.on.aws that accepts the benchmark result JSON and an S3 path where it will be stored:
```python
import json

import requests

# This path is an example - any path under the v3 directory is acceptable.
# If the path already exists, the API will not overwrite it
s3_path = f"v3/{repo_name}/{head_branch}/{head_sha}/{device_name}/{device_type}/benchmark_results.json"

payload = {
    "username": UPLOADER_USERNAME,
    "password": UPLOADER_PASSWORD,
    "s3_path": s3_path,
    "content": json.dumps(benchmark_results),
}
headers = {"content-type": "application/json"}

requests.post(
    # One current limitation of the API is that AWS limits the maximum size of the JSON to be less than 6MB
    "https://kvvka55vt7t2dzl6qlxys72kra0xtirv.lambda-url.us-east-1.on.aws",
    json=payload,
    headers=headers,
)
```

The resulting dashboards can be browsed at https://hud.pytorch.org/benchmark/benchmark_list.
To quickly explore the benchmark database, the recommended way is to use https://hud.pytorch.org/flambeau. You'll need to log in to GitHub and have write access to PyTorch to use the agent. The tool incorporates our clickhouse-mcp for database exploration.
For example, a prompt to list available benchmarks: "List all the benchmark names from different GitHub repositories from Jun 10th to Jun 16th, 2025. List each name only once."
The query API is available at https://queries.clickhouse.cloud/run/84649f4e-52c4-4cf9-bd6e-0a105ea145c8 for querying raw benchmark results from the database. Please contact PyTorch Dev Infra (@huydhn) if you need credentials to access it:
```python
import os
import json
import requests

username = os.environ.get("CLICKHOUSE_API_USERNAME")
password = os.environ.get("CLICKHOUSE_API_PASSWORD")

params = {
    "format": "JSONEachRow",
    "queryVariables": {
        # REQUIRED: The repo name in org/repo format
        "repo": "pytorch/pytorch",
        # REQUIRED: The name of the benchmark
        "benchmark": "TorchInductor",
        # REQUIRED: YYYY-MM-DDThh:mm:ss
        "startTime": "2025-06-06T00:00:00",
        # REQUIRED: YYYY-MM-DDThh:mm:ss
        "stopTime": "2025-06-13T00:00:00",
        # OPTIONAL: Only query benchmark results for these models. Leaving this as an empty array [] will fetch all of them
        "models": ["BERT_pytorch"],
        # OPTIONAL: Only fetch these metrics. Leaving this as an empty array [] will fetch all of them
        "metrics": ["speedup"],
        # OPTIONAL: Filter the benchmark results by device, i.e. cuda, and arch, i.e. H100. Leave them empty to get all devices
        "device": "",
        "arch": "",
        # OPTIONAL: Use this when you only care about the benchmark results from a specific branch and commit
        "branch": "",
        "commit": "",
    },
}

api_url = "https://queries.clickhouse.cloud/run/84649f4e-52c4-4cf9-bd6e-0a105ea145c8"
r = requests.post(api_url, json=params, auth=(username, password))

with open("benchmark_results.txt", "w") as f:
    print(r.text, file=f)
```

The list of available benchmarks at the moment is:

```
"pytorch-labs/tritonbench": "compile_time"
"pytorch-labs/tritonbench": "nightly"
"pytorch/ao": "TorchAO benchmark"
"pytorch/ao": "micro-benchmark api"
"pytorch/benchmark": "TorchInductor"
"pytorch/executorch": "ExecuTorch"
"pytorch/pytorch": "PyTorch gpt-fast benchmark"
"pytorch/pytorch": "PyTorch operator benchmark"
"pytorch/pytorch": "TorchCache Benchmark"
"pytorch/pytorch": "TorchInductor"
"pytorch/pytorch": "cache_benchmarks"
"pytorch/pytorch": "pr_time_benchmarks"
"vllm-project/vllm": "vLLM benchmark"
```

Here is the Bento notebook N7397718 to illustrate some use cases from the TorchInductor benchmark.
The benchmark database on ClickHouse Cloud is accessible to all Metamates. We also provide a ClickHouse MCP server that you can install to access the database through AI agents like Claude Code.
Follow these steps to access the database:
- Log in to https://console.clickhouse.cloud. Metamates can log in with their Meta email via SSO and request access. Read-only access will be granted by default.
- Select the `benchmark` database
- Run a sample query:
```sql
select
    head_branch,
    head_sha,
    benchmark,
    model.name as model,
    metric.name as name,
    arrayAvg(metric.benchmark_values) as value
from oss_ci_benchmark_v3
where
    tupleElement(benchmark, 'name') = 'TorchAO benchmark'
    and oss_ci_benchmark_v3.timestamp < 1733870813
    and oss_ci_benchmark_v3.timestamp > 1733784413
```

This section describes how to onboard a benchmark into the PyTorch Benchmark UI, from the minimum required setup to advanced customization of APIs and UI components.
To onboard a benchmark with the default dashboard components, you need to be able to open a PR in `pytorch/test-infra`.
This minimum setup allows your benchmark to appear in the benchmark list and render with the default UI. For advanced customization, see the Advanced section below.
1. Add a unique benchmark ID: define a unique benchmark ID in `benchmark_v3/configs/configurations.tsx`.
2. Add the benchmark to a category: add your benchmark to the appropriate benchmark list category in `benchmark_v3/configs/configurations.tsx`. This makes the benchmark visible in:
- The Benchmark List page
- The benchmark selection UI
Example: onboarding a benchmark called `Torchao Awesome`:
- `unique_benchmark_id`: `torchao_awesome_benchmark`
- `benchmarkName`: `torchao_awesome` (this must match the benchmark name stored in the ClickHouse database)
- `repo`: `pytorch/ao` (the repository name stored in the ClickHouse database)
For benchmarks that require custom data fields or specialized UI rendering, you can define custom API fetchers and UI configurations.
See the onboarding of `TorchAoMicroAPIBenchmark` as a complete example of customized API responses and UI configuration.
By default, the Benchmark UI uses standard queries to fetch benchmark data.
If your benchmark requires additional fields or custom query logic, you must implement a custom fetcher.
- `MetadataQueryFetcher` (`fetchers.ts`)
  - Used by the `list_metadata` API
  - Drives filter options in the benchmark dashboard
- `BackendQueryFetchers` (`fetchers.ts`)
  - Used by the `get_time_series` API
  - Fetches the main benchmark data rendered in the UI
- `BenchmarkListCommitFetcher` (`listCommitQueryBuilder.ts`)
  - Used by both the `list_commits` and `get_time_series` APIs
  - Returns commit lists for navigation and data filtering
Follow the example of `PytorchOperatorMicroBenchmarkDataFetcher`. Implement:
- `addExtraInfos` if you need to return additional fields
- `addInnerWhereStatements` if you need custom filtering logic

Register your fetcher in `backend/dataFetchers/fetchers.ts`. Once registered, it will automatically be used by the `get_time_series` API.
If your benchmark:
- Uses a customized API response, or
- Requires specialized UI behavior
you must configure predefined Benchmark UI configs.
All UI configurations live under the team folders in `torchci/components/benchmark_v3/configs`.
- Dashboard
- Single Benchmark Page
- Regression Page

Table
- Regression policy
- Column display names

Time Series Chart
- Regression policy
- Display name

Logging Searcher
- Search criteria configuration
The alerts are currently created and maintained manually on https://pytorchci.grafana.net/alerting. Each alert has 3 components:
- The SQL query to get the data from ClickHouse with its query interval
- An alert condition with a failure threshold
- The list of recipients who will receive the alert. They could be an email, a Slack message, or a lambda for any custom logic, such as creating a GitHub issue like https://github.com/pytorch/alerting-infra/issues/557. Our custom alert stack can be found at pytorch/alerting-infra.
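The alert definitions themselves live in Grafana, but as an illustration of the three components, here is a hedged Python sketch that performs an equivalent check locally using the query API described earlier; the benchmark, model, metric, failure threshold, and the `value` field name are all placeholder assumptions:

```python
import json
import os

import requests

# Component 1: the data query over a fixed interval (here via the public query
# API rather than Grafana's ClickHouse datasource; values are placeholders)
params = {
    "format": "JSONEachRow",
    "queryVariables": {
        "repo": "pytorch/pytorch",
        "benchmark": "TorchInductor",
        "startTime": "2025-06-12T00:00:00",
        "stopTime": "2025-06-13T00:00:00",
        "models": ["BERT_pytorch"],
        "metrics": ["speedup"],
        "device": "",
        "arch": "",
        "branch": "",
        "commit": "",
    },
}
api_url = "https://queries.clickhouse.cloud/run/84649f4e-52c4-4cf9-bd6e-0a105ea145c8"
r = requests.post(
    api_url,
    json=params,
    auth=(os.environ["CLICKHOUSE_API_USERNAME"], os.environ["CLICKHOUSE_API_PASSWORD"]),
)

# Component 2: the alert condition with a failure threshold (placeholder value;
# the "value" field name is an assumption about the response shape)
FAILURE_THRESHOLD = 1.0
rows = [json.loads(line) for line in r.text.splitlines() if line.strip()]
regressed = [
    row for row in rows
    if isinstance(row.get("value"), (int, float)) and row["value"] < FAILURE_THRESHOLD
]

# Component 3: notify the recipients - email, Slack, or a lambda that files a
# GitHub issue; a print stands in for that here
if regressed:
    print(f"speedup dropped below {FAILURE_THRESHOLD} in {len(regressed)} record(s)")
```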