DTL (Dispatch Trace Log)

Athira Rajeev, 19 April 2025

Basic overview

The pseries Shared Processor Logical Partition(SPLPAR) machines canretrieve a log of dispatch and preempt events from the hypervisorusing data from Disptach Trace Log(DTL) buffer. With this information,user can retrieve when and why each dispatch & preempt has occurred.The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL countersvia perf.

Infrastructure used

The VPA DTL PMU counters do not interrupt on overflow or generate anyPMI interrupts. Therefore, hrtimer is used to poll the DTL data. The timernterval can be provided by user via sample_period field in nano seconds.vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (DispatchTrace Log) contains information about dispatch/preempt, enqueue time etc.We directly copy the DTL buffer data as part of auxiliary buffer and itwill be processed later. This will avoid time taken to create samplesin the kernel space. The PMU driver collecting Dispatch Trace Log (DTL)entries makes use of AUX support in perf infrastructure. On the tools side,this data is made available as PERF_RECORD_AUXTRACE records.

To correlate each DTL entry with other events across CPU’s, an auxtrace_queueis created for each CPU. Each auxtrace queue has a array/list of auxtrace buffers.All auxtrace queues is maintained in auxtrace heap. The queues are sortedbased on timestamp. When the different PERF_RECORD_XX records are processed,compare the timestamp of perf record with timestamp of top element in theauxtrace heap so that DTL events can be co-related with other eventsProcess the auxtrace queue if the timestamp of element from heap islower than timestamp from entry in perf record. Sometimes it could happen thatone buffer is only partially processed. if the timestamp of occurrence ofanother event is more than currently processed element in the queue, it willmove on to next perf record. So keep track of position of buffer to continueprocessing next time. Update the timestamp of the auxtrace heap with the timestampof last processed entry from the auxtrace buffer.

This infrastructure ensures dispatch trace log entries can be correlatedand presented along with other events like sched.

vpa-dtl PMU example usage

# ls /sys/devices/vpa_dtl/eventsformatperf_event_mux_interval_mspowersubsystemtypeuevent

To capture the DTL data using perf record:.. code-block:: sh

# ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1

The result can be interpreted using perf record. Snippet of perf report -D

# ./perf report -D

There are different PERF_RECORD_XX records. In that records corresponding toauxtrace buffers includes:

  1. PERF_RECORD_AUXConveys that new data is available in AUX area

  2. PERF_RECORD_AUXTRACE_INFODescribes offset and size of auxtrace data in the buffers

  3. PERF_RECORD_AUXTRACEThis is the record that defines the auxtrace data which here in case ofvpa-dtl pmu is dispatch trace log data.

Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump

0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0.. ... VPA DTL PMU data: size 1680 bytes, entries is 35. 00000000: boot_tb: 21349649546353231, tb_freq: 512000000. 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773. 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437. 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709. 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243. 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648. 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446. 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126. 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665

Above is representation of dtl entry of below format:

struct dtl_entry {

u8 dispatch_reason;u8 preempt_reason;u16 processor_id;u32 enqueue_to_dispatch_time;u32 ready_to_enqueue_time;u32 waiting_to_ready_time;u64 timebase;u64 fault_addr;u64 srr0;u64 srr1;

};

First two fields represent the dispatch reason and preempt reason. The postprocessing of PERF_RECORD_AUXTRACE records will translate to meaningful datafor user to consume.

Visualize the dispatch trace log entries with perf report

# ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1[perfrecord:Wokenup1timestowritedata][perfrecord:Capturedandwrote0.300MBperf.data]# ./perf report# Samples: 321  of event 'vpa-dtl'# Event count (approx.): 321## Children      Self  Command  Shared Object      Symbol# ........  ........  .......  .................  ..............................#100.00%100.00%swapper[kernel.kallsyms][k]plpar_hcall_norets_notrace

Visualize the dispatch trace log entries with perf script

# ./perf scriptmigration/967[009]105373.359903:sched:sched_waking:comm=perfpid=13418prio=120target_cpu=009migration/967[009]105373.359904:sched:sched_migrate_task:comm=perfpid=13418prio=120orig_cpu=9dest_cpu=10migration/967[009]105373.359907:sched:sched_stat_runtime:comm=migration/9pid=67runtime=4050[ns]migration/967[009]105373.359908:sched:sched_switch:prev_comm=migration/9prev_pid=67prev_prio=0prev_state=S==>next_comm=swapper/9next_pid=0next_prio=120:256256[016]105373.359913:vpa-dtl:timebase:21403600706628832dispatch_reason:decrementerinterrupt,preempt_reason:H_CEDE,enqueue_to_dispatch_time:4854,ready_to_enqueue_time:139,waiting_to_ready_time:511842115c0000000000fcd28plpar_hcall_norets_notrace+0x18([kernel.kallsyms]):256256[017]105373.360012:vpa-dtl:timebase:21403600706679454dispatch_reason:privdoorbell,preempt_reason:H_CEDE,enqueue_to_dispatch_time:236,ready_to_enqueue_time:0,waiting_to_ready_time:133864583c0000000000fcd28plpar_hcall_norets_notrace+0x18([kernel.kallsyms])perf13418[010]105373.360048:sched:sched_stat_runtime:comm=perfpid=13418runtime=139748[ns]perf13418[010]105373.360052:sched:sched_waking:comm=migration/10pid=72prio=0target_cpu=010