Xe Device Coredump

Xe uses the dev_coredump infrastructure to expose crash errors in a standardized way. Once a crash occurs, devcoredump exposes a temporary node under /sys/class/devcoredump/devcd<m>/. The same node is also accessible in /sys/class/drm/card<n>/device/devcoredump/. The failing_device symlink points to the device that crashed and created the coredump.

The following characteristics are observed by xe when creating a device coredump:

Snapshot at hang:

The ‘data’ file contains a snapshot of the HW and driver states at the time the hang happened. Due to the driver recovering from resets/crashes, it may not correspond to the state of the system when the file is read by userspace.

Coredump release:

After a coredump is generated, it stays in kernel memory until released by userspace by writing anything to it, or after an internal timer expires. The exact timeout may vary and should not be relied upon. Example to release a coredump:

$ > /sys/class/drm/card0/device/devcoredump/data
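Because the snapshot is gone once it is released, a tool would normally save the 'data' file before writing to it. The following is a minimal user-space sketch of that order of operations, assuming only ordinary POSIX file I/O. The function names save_coredump() and release_coredump() are hypothetical and are not part of the driver or of any library.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Copy the whole coredump 'data' file to 'out'.
 * Returns the number of bytes copied, or -errno on failure.
 * (Hypothetical helper, for illustration only.)
 */
long save_coredump(const char *data_path, FILE *out)
{
	char buf[4096];
	long total = 0;
	ssize_t n;
	int fd = open(data_path, O_RDONLY);

	if (fd < 0)
		return -errno;

	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		if (fwrite(buf, 1, (size_t)n, out) != (size_t)n) {
			close(fd);
			return -EIO;
		}
		total += n;
	}
	if (n < 0) {
		int err = errno;

		close(fd);
		return -err;
	}
	close(fd);
	return total;
}

/*
 * Release the coredump by writing a single byte to the 'data' file.
 * The value of the byte is arbitrary: any write releases the dump.
 * Returns 0 on success, or -errno on failure.
 * (Hypothetical helper, for illustration only.)
 */
int release_coredump(const char *data_path)
{
	int fd = open(data_path, O_WRONLY);
	ssize_t w;
	int err = 0;

	if (fd < 0)
		return -errno;

	w = write(fd, "1", 1);
	if (w < 0)
		err = -errno;
	else if (w != 1)
		err = -EIO;
	close(fd);
	return err;
}
```

A caller would pass the 'data' path shown above to save_coredump() first and call release_coredump() only after the copy succeeded, since the snapshot cannot be read again afterwards.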

First failure only:

In general, the first hang is the most critical one since the following hangs can be a consequence of the initial hang. For this reason a snapshot is taken only for the first failure. Until the devcoredump is released by userspace or the kernel, subsequent hangs neither override the snapshot nor create new ones. Devcoredump has a delayed work queue that will eventually delete the file node and free all the dump information.
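The first-failure-only rule can be modelled as a single slot that refuses new captures while it is occupied. The sketch below is a user-space illustration of that behaviour only. The struct and function names are hypothetical, and the real driver tracks considerably more state and handles locking, which is omitted here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's coredump bookkeeping. */
struct coredump_slot {
	bool captured;   /* true while a snapshot is being held */
	char reason[64]; /* description of the first failure */
};

/*
 * Take a snapshot only if none is currently held.
 * Returns true if this call captured, false if it was skipped.
 */
bool coredump_capture(struct coredump_slot *slot, const char *reason)
{
	if (slot->captured)
		return false; /* keep the first snapshot untouched */

	snprintf(slot->reason, sizeof(slot->reason), "%s", reason);
	slot->captured = true;
	return true;
}

/* Models a release, whether by a userspace write or by the timeout. */
void coredump_release(struct coredump_slot *slot)
{
	slot->captured = false;
	slot->reason[0] = '\0';
}
```

With this model, a second hang reported before the release leaves the stored reason unchanged, and only a hang reported after the release produces a new snapshot.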

Internal API

ssize_t xe_devcoredump_read(char *buffer, loff_t offset, size_t count, void *data, size_t datalen): Read data from the Xe device coredump snapshot

Parameters

char *buffer: Destination buffer to copy the coredump data into
loff_t offset: Offset in the coredump data to start reading from
size_t count: Number of bytes to read
void *data: Pointer to the xe_devcoredump structure
size_t datalen: Length of the data (unused)

Description

Reads a chunk of the coredump snapshot data into the provided buffer. If the devcoredump is smaller than 1.5 GB (XE_DEVCOREDUMP_CHUNK_MAX), it is read directly from a pre-written buffer. For larger devcoredumps, the pre-written buffer must be periodically repopulated from the snapshot state due to kmalloc size limitations.

Return

Number of bytes copied on success, or a negative error code on failure.
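The windowed read described above can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the driver's implementation: CHUNK_MAX is a tiny stand-in for XE_DEVCOREDUMP_CHUNK_MAX, the snapshot state is modelled as a plain string, and struct chunk_reader, chunk_fill() and chunk_read() are hypothetical names. The sketch also chooses to return a short read at a chunk boundary, which the driver may handle differently.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define CHUNK_MAX 8 /* tiny stand-in for XE_DEVCOREDUMP_CHUNK_MAX */

/* Hypothetical reader state; none of these names come from the driver. */
struct chunk_reader {
	const char *snapshot;  /* stands in for the captured snapshot state */
	size_t total_size;     /* full size of the rendered dump */
	size_t chunk_position; /* dump offset that buffer[0] corresponds to */
	int chunk_valid;       /* nonzero once buffer has been filled */
	int refills;           /* how many times buffer was repopulated */
	char buffer[CHUNK_MAX];
};

/* Re-render the window [pos, pos + CHUNK_MAX) of the dump into buffer. */
void chunk_fill(struct chunk_reader *r, size_t pos)
{
	size_t len = r->total_size - pos;

	if (len > CHUNK_MAX)
		len = CHUNK_MAX;
	memcpy(r->buffer, r->snapshot + pos, len);
	r->chunk_position = pos;
	r->chunk_valid = 1;
	r->refills++;
}

/*
 * Copy up to 'count' bytes starting at 'offset' into 'dst'.
 * Returns the number of bytes copied, 0 at end of data, or -EINVAL.
 * A read never crosses a chunk boundary; the caller simply reads again.
 */
ssize_t chunk_read(struct chunk_reader *r, char *dst, off_t offset, size_t count)
{
	size_t off, in_chunk, avail;

	if (offset < 0)
		return -EINVAL;
	off = (size_t)offset;
	if (off >= r->total_size)
		return 0;

	/* Repopulate the buffer only when the offset leaves the window. */
	if (!r->chunk_valid || off < r->chunk_position ||
	    off >= r->chunk_position + CHUNK_MAX)
		chunk_fill(r, off - off % CHUNK_MAX);

	in_chunk = off - r->chunk_position;
	avail = r->total_size - off;
	if (avail > CHUNK_MAX - in_chunk)
		avail = CHUNK_MAX - in_chunk;
	if (count > avail)
		count = avail;
	memcpy(dst, r->buffer + in_chunk, count);
	return (ssize_t)count;
}
```

Sequential reads within one window reuse the buffer, so the cost of regenerating the dump text is paid once per window rather than once per read call.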

void xe_devcoredump(struct xe_exec_queue *q, struct xe_sched_job *job, const char *fmt, ...): Take the required snapshots and initialize coredump device.

Parameters

struct xe_exec_queue *q: The faulty xe_exec_queue, where the issue was detected.
struct xe_sched_job *job: The faulty xe_sched_job, where the issue was detected.
const char *fmt: Printf format + args to describe the reason for the core dump
...: variable arguments

Description

This function should be called at the crash time within the serialized gt_reset. It is skipped if we still have the core dump device available with the information of the ‘first’ snapshot.

void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix, char suffix, const void *blob, size_t offset, size_t size): print a BLOB to some useful location in ASCII85

Parameters

struct drm_printer *p: the printer object to output to
const char *prefix: optional prefix to add to output string
char suffix: optional suffix to add at the end. 0 disables it and is not added to the output, which is useful when using multiple calls to dump data to p
const void *blob: the Binary Large OBject to dump out
size_t offset: offset in bytes to skip from the front of the BLOB, must be a multiple of sizeof(u32)
size_t size: the size in bytes of the BLOB, must be a multiple of sizeof(u32)

Description

The output is split into multiple calls to drm_puts() because some print targets, e.g. dmesg, cannot handle arbitrarily long lines. These targets may add newlines, as is the case with dmesg: each drm_puts() call creates a separate line.

There is also a scheduler yield call to prevent the ‘task has been stuck for 120s’ kernel hang check feature from firing when printing to a slow target such as dmesg over a serial port.
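To make the output format concrete, the following user-space sketch encodes 32-bit words as ASCII85 and breaks the output into bounded lines. It is an illustration under assumptions, not the driver's code: a85_encode_word() and a85_print_blob() are hypothetical names, WORDS_PER_LINE is an arbitrary value chosen for the sketch, output goes to a stdio stream instead of a drm_printer, and the prefix, suffix and scheduler yield are left out.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Encode one 32-bit word as ASCII85 into 'out' (at least 6 bytes).
 * A zero word is abbreviated to "z"; any other word becomes five
 * characters in the range '!'..'u', most significant digit first.
 * Returns the number of characters written, excluding the NUL.
 */
size_t a85_encode_word(uint32_t in, char *out)
{
	int i;

	if (in == 0) {
		out[0] = 'z';
		out[1] = '\0';
		return 1;
	}

	out[5] = '\0';
	for (i = 4; i >= 0; i--) {
		out[i] = (char)('!' + in % 85);
		in /= 85;
	}
	return 5;
}

#define WORDS_PER_LINE 4 /* arbitrary; only illustrates the line splitting */

/*
 * Print 'count' words to 'fp' as ASCII85, ending the line after every
 * WORDS_PER_LINE words so that no output line grows without bound.
 * Returns the number of lines written.
 */
size_t a85_print_blob(FILE *fp, const uint32_t *words, size_t count)
{
	size_t i, lines = 0;
	char enc[6];

	for (i = 0; i < count; i++) {
		a85_encode_word(words[i], enc);
		fputs(enc, fp);
		if (i % WORDS_PER_LINE == WORDS_PER_LINE - 1 || i == count - 1) {
			fputc('\n', fp);
			lines++;
		}
	}
	return lines;
}
```

Working on whole 32-bit words is why the offset and size parameters above must be multiples of sizeof(u32): each word maps to exactly one encoded group.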
