Automated testing of the DRM subsystem

Introduction

Making sure that changes to the core or drivers don’t introduce regressions can be very time-consuming when lots of different hardware configurations need to be tested. Moreover, it isn’t practical for each person interested in this testing to have to acquire and maintain what can be a considerable amount of hardware.

Also, it is desirable for developers to check for regressions in their code by themselves, rather than relying on the maintainers to find them and report back.

The facilities on gitlab.freedesktop.org for automatically testing Mesa can also be used to test the DRM subsystem. This document explains how people interested in testing it can use this shared infrastructure to save considerable time and effort.

Relevant files

drivers/gpu/drm/ci/gitlab-ci.yml

This is the root configuration file for GitLab CI. Among other less interesting bits, it specifies the version of the scripts to be used. Some variables can be modified to change the behavior of the pipeline:

DRM_CI_PROJECT_PATH

Repository that contains the Mesa software infrastructure for CI

DRM_CI_COMMIT_SHA

A particular revision to use from that repository

UPSTREAM_REPO

URL to git repository containing the target branch

TARGET_BRANCH

Branch into which this branch is to be merged

IGT_VERSION

Revision of igt-gpu-tools being used, from https://gitlab.freedesktop.org/drm/igt-gpu-tools

drivers/gpu/drm/ci/testlist.txt

IGT tests to be run on all drivers (unless mentioned in a driver’s *-skips.txt file, see below).

drivers/gpu/drm/ci/${DRIVER_NAME}-${HW_REVISION}-fails.txt

Lists the known failures for a given driver on a specific hardware revision.

drivers/gpu/drm/ci/${DRIVER_NAME}-${HW_REVISION}-flakes.txt

Lists the tests that, for a given driver on a specific hardware revision, are known to behave unreliably. These tests are still run, but they won’t cause a job to fail regardless of the result.

Each new flake entry must be associated with a link to the email reporting the bug to the author of the affected driver or the relevant GitLab issue. The entry must also include the board name or Device Tree name, the first kernel version affected, the IGT version used for tests, and an approximation of the failure rate.

They should be provided under the following format:

    # Bug Report: $LORE_URL_OR_GITLAB_ISSUE
    # Board Name: broken-board.dtb
    # Linux Version: 6.6-rc1
    # IGT Version: 1.28-gd2af13d9f
    # Failure Rate: 100
    flaky-test
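The required header fields of a flake entry can be checked mechanically before submitting a change. The following Python sketch is only an illustration and is not part of the DRM CI scripts: the function name `check_flake_entry` and the way comment lines are parsed are assumptions based on the format shown above.

```python
# Header fields that every flake entry is required to carry.
REQUIRED_FIELDS = (
    "Bug Report",
    "Board Name",
    "Linux Version",
    "IGT Version",
    "Failure Rate",
)


def check_flake_entry(block):
    """Return the required header fields missing from one flake entry.

    `block` is the text of a single entry: comment lines of the form
    "# Key: value" followed by the test name on its own line.
    (Hypothetical helper, not part of the CI scripts.)
    """
    seen = set()
    for line in block.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue  # the test name itself, or a blank line
        key, sep, value = line.lstrip("#").partition(":")
        if sep and value.strip():
            seen.add(key.strip())
    return [field for field in REQUIRED_FIELDS if field not in seen]


entry = """\
# Bug Report: $LORE_URL_OR_GITLAB_ISSUE
# Board Name: broken-board.dtb
# Linux Version: 6.6-rc1
# IGT Version: 1.28-gd2af13d9f
# Failure Rate: 100
flaky-test
"""
print(check_flake_entry(entry))  # prints []
```

An empty list means that all five fields are present; any names returned are the fields that still need to be filled in.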

Use the appropriate link below to create a GitLab issue:

amdgpu driver: https://gitlab.freedesktop.org/drm/amd/-/issues
i915 driver: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues
msm driver: https://gitlab.freedesktop.org/drm/msm/-/issues
xe driver: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues

drivers/gpu/drm/ci/${DRIVER_NAME}-${HW_REVISION}-skips.txt

Lists the tests that won’t be run for a given driver on a specific hardware revision. These are usually tests that interfere with the running of the test list due to hanging the machine, causing OOM, taking too long, etc.
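To make the roles of the three expectation lists concrete, the following Python sketch models how a test might be treated depending on which list mentions it. It is a simplified illustration, not the logic used by the CI scripts. The function names, the exact string matching (the real lists may use patterns), the test names, and the assumption that an unexpected pass also fails the job are all hypothetical.

```python
def select_tests(testlist, skips):
    """Tests from the shared list that are not skipped on this hardware."""
    skipped = set(skips)
    return [test for test in testlist if test not in skipped]


def job_passes(results, fails, flakes):
    """Decide whether a job passes.

    results: mapping of test name -> "pass" or "fail" (observed result)
    fails:   tests expected to fail on this hardware
    flakes:  tests that are run but whose result is ignored
    """
    for test, outcome in results.items():
        if test in flakes:
            continue  # still run, but never decides the job result
        expected = "fail" if test in fails else "pass"
        if outcome != expected:
            return False  # regression, or (assumed) an unexpected pass
    return True


# Hypothetical test names, for illustration only.
testlist = ["kms_a", "kms_b", "kms_c", "kms_d"]
to_run = select_tests(testlist, skips=["kms_d"])
print(to_run)  # prints ['kms_a', 'kms_b', 'kms_c']

results = {"kms_a": "pass", "kms_b": "fail", "kms_c": "fail"}
print(job_passes(results, fails={"kms_b"}, flakes={"kms_c"}))  # prints True
```

In this model, `kms_b` failing is expected, `kms_c` failing is ignored, and `kms_d` is never started.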

How to enable automated testing on your tree

1. Create a Linux tree in https://gitlab.freedesktop.org/ if you don’t have one yet

2. In your kernel repo’s configuration (e.g. https://gitlab.freedesktop.org/janedoe/linux/-/settings/ci_cd), change the CI/CD configuration file from .gitlab-ci.yml to drivers/gpu/drm/ci/gitlab-ci.yml.

3. Request to be added to the drm/ci-ok group so that your user has the necessary privileges to run the CI on https://gitlab.freedesktop.org/drm/ci-ok

4. Next time you push to this repository, you will see a CI pipeline being created (e.g. https://gitlab.freedesktop.org/janedoe/linux/-/pipelines)

5. The various jobs will be run and when the pipeline is finished, all jobs should be green unless a regression has been found.

6. Warnings in the pipeline indicate that lockdep (see Runtime locking correctness validator) issues have been detected during the tests.

How to update test expectations

If your changes to the code fix any tests, you will have to remove one or more lines from one or more of the drivers/gpu/drm/ci/${DRIVER_NAME}-*-fails.txt files, for each of the test platforms affected by the change.
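Removing fixed tests from a fails list is a small text edit that can be scripted. The Python sketch below is a hypothetical helper, not part of the DRM CI tooling. It assumes one entry per line with the test name before the first comma (for example `name,Fail`), and the test names in the example are made up.

```python
def drop_fixed_tests(fails_text, fixed):
    """Return fails_text without the entries for the tests in `fixed`.

    Comment lines and blank lines are kept. The test name is taken to be
    everything before the first comma on a line (assumed format).
    """
    fixed = set(fixed)
    kept = []
    for line in fails_text.splitlines():
        stripped = line.strip()
        name = stripped.split(",", 1)[0]
        if stripped and not stripped.startswith("#") and name in fixed:
            continue  # this test is fixed now, so drop its entry
        kept.append(line)
    return "".join(line + "\n" for line in kept)


before = "kms_a@x,Fail\n# a comment\nkms_b@y,Fail\n"
after = drop_fixed_tests(before, ["kms_a@x"])
print(after, end="")
```

The same function would be applied to the fails file of each hardware revision affected by the change.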

How to expand coverage

If your code changes make it possible to run more tests (by solving reliability issues, for example), you can remove tests from the flakes and/or skips lists, and then update the expected results if there are any known failures.

If there is a need for updating the version of IGT being used (maybe you have added more tests to it), update the IGT_VERSION variable at the top of the gitlab-ci.yml file.

How to test your changes to the scripts

For testing changes to the scripts in the drm-ci repo, change the DRM_CI_PROJECT_PATH and DRM_CI_COMMIT_SHA variables in drivers/gpu/drm/ci/gitlab-ci.yml to match your fork of the project (e.g. janedoe/drm-ci). This fork needs to be in https://gitlab.freedesktop.org/.
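Changing these variables is normally done by hand in an editor. For completeness, here is a minimal Python sketch of the same edit applied to the file's text. The helper `set_ci_variable` is hypothetical, it assumes each variable sits on its own `NAME: value` line (optionally with a YAML anchor before the value), and the sample values other than janedoe/drm-ci are placeholders.

```python
import re


def set_ci_variable(yaml_text, name, value):
    """Replace the value on the `name: value` line of yaml_text.

    Indentation and an optional YAML anchor (`&anchor`) in front of the
    value are preserved. Raises ValueError unless exactly one line matches.
    """
    pattern = re.compile(
        r"^(?P<head>[ \t]*" + re.escape(name) + r":[ \t]*(?:&\S+[ \t]+)?)\S.*$",
        re.MULTILINE,
    )
    # A function is used as the replacement so that backslashes in
    # `value` are never interpreted as regex escapes.
    new_text, count = pattern.subn(lambda m: m.group("head") + value, yaml_text)
    if count != 1:
        raise ValueError(f"expected exactly one {name} entry, found {count}")
    return new_text


# Placeholder content; the real file contains much more than this.
config = (
    "variables:\n"
    "  DRM_CI_PROJECT_PATH: upstream/project\n"
    "  DRM_CI_COMMIT_SHA: 0123abcd\n"
)
config = set_ci_variable(config, "DRM_CI_PROJECT_PATH", "janedoe/drm-ci")
config = set_ci_variable(config, "DRM_CI_COMMIT_SHA", "89abcdef")
print(config, end="")
```

The same helper could be used for IGT_VERSION, since it only depends on the variable name.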

How to incorporate external fixes in your testing

Often, regressions in other trees will prevent testing changes local to the tree under test. Fixes for such regressions will be automatically merged in during the build jobs from a branch in the target tree that is named ${TARGET_BRANCH}-external-fixes.

If the pipeline is not in a merge request and a branch with the same name exists in the local tree, commits from that branch will be merged in as well.
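The two rules above can be summarized in a few lines of Python. This is a sketch of the described behavior only, not the build script itself. The function name, its parameters, and the example branch names are assumptions made for illustration.

```python
def external_fixes_sources(target_branch, in_merge_request,
                           target_tree_branches, local_tree_branches):
    """List the (tree, branch) pairs whose commits would be merged in."""
    name = f"{target_branch}-external-fixes"
    sources = []
    if name in target_tree_branches:
        sources.append(("target", name))
    # The local tree is only consulted outside of merge requests.
    if not in_merge_request and name in local_tree_branches:
        sources.append(("local", name))
    return sources


# Hypothetical branch names.
print(external_fixes_sources(
    "drm-next",
    in_merge_request=False,
    target_tree_branches={"drm-next", "drm-next-external-fixes"},
    local_tree_branches={"my-feature", "drm-next-external-fixes"},
))
# prints [('target', 'drm-next-external-fixes'), ('local', 'drm-next-external-fixes')]
```

With `in_merge_request=True`, only the branch from the target tree would be listed.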

How to deal with automated testing labs that may be down

If a hardware farm is down and thus causing pipelines to fail that would otherwise pass, one can disable all jobs that would be submitted to that farm by editing the file at https://gitlab.freedesktop.org/gfx-ci/lab-status/-/blob/main/lab-status.yml.