Hardware Latency Detector

Introduction

The tracer hwlat_detector is a special purpose tracer that is used to detect large system latencies induced by the behavior of certain underlying hardware or firmware, independent of Linux itself. The code was developed originally to detect SMIs (System Management Interrupts) on x86 systems; however, there is nothing x86 specific about this patchset. It was originally written for use by the “RT” patch since the Real Time kernel is highly latency sensitive.

SMIs are not serviced by the Linux kernel, which means that it does not even know that they are occurring. SMIs are instead set up by BIOS code and are serviced by BIOS code, usually for “critical” events such as management of thermal sensors and fans. Sometimes though, SMIs are used for other tasks and those tasks can spend an inordinate amount of time in the handler (sometimes measured in milliseconds). Obviously this is a problem if you are trying to keep event service latencies down in the microsecond range.

The hardware latency detector works by hogging one of the CPUs for configurable amounts of time (with interrupts disabled), polling the CPU Time Stamp Counter for some period, then looking for gaps in the TSC data. Any gap indicates a time when the polling was interrupted, and since interrupts are disabled, the only thing that could do that would be an SMI or other hardware hiccup (or an NMI, but those can be tracked).
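The gap search described above can be illustrated with a small sketch. It is not the kernel's implementation: the function name find_gaps, the threshold and the timestamps are all made up for illustration, and the real detector reads the hardware counter in a tight loop inside the kernel.

```shell
# Illustrative sketch only: scan a list of synthetic timestamps (usecs)
# and report every step between consecutive samples that exceeds a threshold.
# Usage: find_gaps THRESHOLD T0 T1 T2 ...
find_gaps() {
    thresh=$1; shift
    prev=$1; shift
    for t in "$@"; do
        delta=$(( t - prev ))
        if [ "$delta" -gt "$thresh" ]; then
            echo "gap of $delta us after t=$prev"
        fi
        prev=$t
    done
}

# Samples normally arrive 2 us apart; something stalled the poller after t=104.
find_gaps 10 100 102 104 150 152   # prints: gap of 46 us after t=104
```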

Note that the hwlat detector should NEVER be used in a production environment. It is intended to be run manually to determine if the hardware platform has a problem with long system firmware service routines.

Usage

Write the ASCII text “hwlat” into the current_tracer file of the tracing system (mounted at /sys/kernel/tracing). It is possible to redefine the threshold in microseconds (us) above which latency spikes will be taken into account.

Example:

# echo hwlat > /sys/kernel/tracing/current_tracer
# echo 100 > /sys/kernel/tracing/tracing_thresh

The /sys/kernel/tracing/hwlat_detector interface contains the following files:

  • width - time period to sample with CPUs held (usecs)

    must be less than the total window size (enforced)

  • window - total period of sampling, width being inside (usecs)

By default the width is set to 500,000 and window to 1,000,000, meaning that for every 1,000,000 usecs (1s) the hwlat detector will spin for 500,000 usecs (0.5s). If tracing_thresh contains zero when hwlat tracer is enabled, it will change to a default of 10 usecs. If any latency that exceeds the threshold is observed, then the data will be written to the tracing ring buffer.
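To make the width/window relationship concrete, the following sketch computes the share of each window that the detector spends spinning. The helper name hwlat_duty_cycle is hypothetical and only performs arithmetic; it does not touch the tracing files.

```shell
# Hypothetical helper: percentage of each window spent spinning.
# Usage: hwlat_duty_cycle WIDTH_US WINDOW_US
hwlat_duty_cycle() {
    width_us=$1
    window_us=$2
    # Mirror the documented constraint: width must be less than window.
    if [ "$width_us" -ge "$window_us" ]; then
        echo "width must be less than window" >&2
        return 1
    fi
    echo $(( width_us * 100 / window_us ))
}

hwlat_duty_cycle 500000 1000000   # default settings: prints 50
```

With the defaults, one CPU is therefore unavailable to other work for about half of every second.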

The minimum sleep time between periods is 1 millisecond. This applies even if width is less than 1 millisecond apart from window, so that the system is not totally starved.
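The rule above amounts to clamping the idle part of the window to at least 1 millisecond. A minimal sketch of that arithmetic follows; the helper name hwlat_sleep_us is hypothetical, and the kernel's own rounding may differ from this microsecond-based version.

```shell
# Hypothetical helper: time (usecs) the detector sleeps between sampling
# periods, assuming sleep = window - width with a floor of 1000 usecs.
# Usage: hwlat_sleep_us WIDTH_US WINDOW_US
hwlat_sleep_us() {
    width_us=$1
    window_us=$2
    sleep_us=$(( window_us - width_us ))
    if [ "$sleep_us" -lt 1000 ]; then
        sleep_us=1000
    fi
    echo "$sleep_us"
}

hwlat_sleep_us 500000 1000000   # prints 500000
hwlat_sleep_us 999500 1000000   # only 500 usecs left, so prints 1000
```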

If tracing_thresh was zero when hwlat detector was started, it will be set back to zero if another tracer is loaded. Note, the last value in tracing_thresh that hwlat detector had will be saved and this value will be restored in tracing_thresh if it is still zero when hwlat detector is started again.

The following tracing directory files are used by the hwlat_detector:

in /sys/kernel/tracing:

  • tracing_thresh - minimum latency value to be considered (usecs)

  • tracing_max_latency - maximum hardware latency actually observed (usecs)

  • tracing_cpumask - the CPUs to move the hwlat thread across

  • hwlat_detector/width - specified amount of time to spin within window (usecs)

  • hwlat_detector/window - amount of time between (width) runs (usecs)

  • hwlat_detector/mode - the thread mode
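As a convenience, the files listed above can be dumped in one go. The sketch below assumes the tracing directory is mounted at /sys/kernel/tracing (overridable through the TRACING variable, which is an assumption of this example) and that the script is run with sufficient privileges; hwlat_report is a hypothetical helper name.

```shell
# Assumed mount point of the tracing filesystem; override if it differs.
TRACING="${TRACING:-/sys/kernel/tracing}"

# Hypothetical helper: print the current value of each hwlat-related file.
hwlat_report() {
    for f in tracing_thresh tracing_max_latency \
             hwlat_detector/width hwlat_detector/window; do
        printf '%s: %s\n' "$f" "$(cat "$TRACING/$f")"
    done
}
```

Calling hwlat_report after a run shows, among other things, the largest hardware latency observed so far in tracing_max_latency.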

By default, one hwlat detector kernel thread will migrate across each CPU specified in cpumask at the beginning of a new window, in a round-robin fashion. This behavior can be changed by changing the thread mode. The available options are:

  • none: do not force migration

  • round-robin: migrate across each CPU specified in cpumask [default]

  • per-cpu: create one thread for each cpu in tracing_cpumask
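The following sketch shows one way to select a thread mode and restrict the detector to a set of CPUs. The helper names cpus_to_mask and hwlat_set_mode are hypothetical, as is the TRACING variable; the sketch assumes tracing_cpumask accepts a hexadecimal CPU mask and that the script runs as root.

```shell
# Assumed mount point of the tracing filesystem; override if it differs.
TRACING="${TRACING:-/sys/kernel/tracing}"

# Hypothetical helper: build a hex CPU mask from a list of CPU numbers.
# Shell arithmetic limits this sketch to CPU numbers below 63.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

# Hypothetical helper: validate the mode name, then write it to the mode file.
hwlat_set_mode() {
    case "$1" in
        none|round-robin|per-cpu) ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
    echo "$1" > "$TRACING/hwlat_detector/mode"
}

cpus_to_mask 0 2 3   # CPUs 0, 2 and 3: prints d
```

A typical sequence would be to write the output of cpus_to_mask into $TRACING/tracing_cpumask and then call hwlat_set_mode per-cpu, so that one thread runs on each selected CPU.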