Perf

Perf Event Attributes

Author:

Andrew Murray <andrew.murray@arm.com>

Date:

2019-03-06

exclude_user

This attribute excludes userspace.

Userspace always runs at EL0 and thus this attribute will exclude EL0.

exclude_kernel

This attribute excludes the kernel.

The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run at EL1.

For the host this attribute will exclude EL1 and additionally EL2 on a VHE system.

For the guest this attribute will exclude EL1. Please note that EL2 is never counted within a guest.

exclude_hv

This attribute excludes the hypervisor.

For a VHE host this attribute is ignored as we consider the host kernel to be the hypervisor.

For a non-VHE host this attribute will exclude EL2 as we consider the hypervisor to be any code that runs at EL2, which is predominantly used for guest/host transitions.

For the guest this attribute has no effect. Please note that EL2 is never counted within a guest.
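The rules for exclude_user, exclude_kernel and exclude_hv can be summarised as a small lookup. The following is only an illustrative Python model of the text above, not kernel code; the function name counted_exception_levels and its parameters are hypothetical.

```python
def counted_exception_levels(context, vhe=False, exclude_user=False,
                             exclude_kernel=False, exclude_hv=False):
    """Return the exception levels an event counts at (illustrative model).

    context is "host" or "guest"; vhe only matters for the host.
    """
    if context == "guest":
        levels = {"EL0", "EL1"}          # EL2 is never counted within a guest
    elif context == "host":
        levels = {"EL0", "EL1", "EL2"}
    else:
        raise ValueError("context must be 'host' or 'guest'")

    if exclude_user:
        levels.discard("EL0")            # userspace always runs at EL0
    if exclude_kernel:
        levels.discard("EL1")
        if context == "host" and vhe:
            levels.discard("EL2")        # the VHE host kernel runs at EL2
    if exclude_hv and context == "host" and not vhe:
        levels.discard("EL2")            # ignored on a VHE host and in a guest
    return levels
```

For instance, counted_exception_levels("host", vhe=True, exclude_kernel=True) leaves only EL0, while the same call with vhe=False leaves EL0 and EL2.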

exclude_host / exclude_guest

These attributes exclude the KVM host and guest, respectively.

The KVM host may run at EL0 (userspace), EL1 (non-VHE kernel) and EL2 (VHE kernel or non-VHE hypervisor).

The KVM guest may run at EL0 (userspace) and EL1 (kernel).

Due to the overlapping exception levels between host and guests we cannot rely exclusively on the PMU’s hardware exception filtering. We must therefore enable/disable counting on entry to and exit from the guest. This is performed differently on VHE and non-VHE systems.

For non-VHE systems we exclude EL2 for exclude_host. Upon entering and exiting the guest we disable/enable the event as appropriate based on the exclude_host and exclude_guest attributes.

For VHE systems we exclude EL1 for exclude_guest and exclude both EL0 and EL2 for exclude_host. Upon entering and exiting the guest we modify the event to include/exclude EL0 as appropriate based on the exclude_host and exclude_guest attributes.

The statements above also apply when these attributes are used within a non-VHE guest; however, please note that EL2 is never counted within a guest.
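The VHE case can be pictured as a static filter on EL1 and EL2 plus an EL0 setting that is switched on guest entry and exit. The sketch below is a simplified Python model of that description only; vhe_event_filter is a hypothetical name and the real driver works on PMU filter bits rather than sets.

```python
def vhe_event_filter(in_guest, exclude_host=False, exclude_guest=False):
    """Exception levels counted on a VHE system right now (illustrative model)."""
    levels = set()
    if not exclude_guest:
        levels.add("EL1")   # on VHE only the guest kernel runs at EL1
    if not exclude_host:
        levels.add("EL2")   # the VHE host kernel runs at EL2
    # EL0 is shared by host and guest userspace, so it is the part that is
    # modified on every guest entry and exit.
    count_el0 = (not exclude_guest) if in_guest else (not exclude_host)
    if count_el0:
        levels.add("EL0")
    return levels
```

With exclude_host set, the model returns EL0 and EL1 while the guest runs and only EL1 while the host runs. Since a VHE host never executes at EL1, nothing is counted for the host in that case.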

Accuracy

On non-VHE hosts we enable/disable counters on the entry/exit of host/guest transition at EL2. However, there is a period of time between enabling/disabling the counters and entering/exiting the guest. We are able to eliminate counters counting host events on the boundaries of guest entry/exit when counting guest events by filtering out EL2 for exclude_host. However, when using !exclude_hv there is a small blackout window at the guest entry/exit where host events are not captured.

On VHE systems there are no blackout windows.

Perf Userspace PMU Hardware Counter Access

Overview

The perf userspace tool relies on the PMU to monitor events. It offers an abstraction layer over the hardware counters since the underlying implementation is CPU-dependent. Arm64 allows userspace tools to have access to the registers storing the hardware counters’ values directly.

This specifically targets self-monitoring tasks, in order to reduce the overhead by directly accessing the registers without having to go through the kernel.

How-to

The focus is on the armv8 PMUv3, which makes sure that access to the PMU registers is enabled and that userspace has access to the relevant information in order to use them.

In order to have access to the hardware counters, the global sysctl kernel/perf_user_access must first be enabled:

echo 1 > /proc/sys/kernel/perf_user_access

It is necessary to open the event using the perf tool interface with the config1:1 attr bit set: the sys_perf_event_open syscall returns a fd which can subsequently be used with the mmap syscall in order to retrieve a page of memory containing information about the event. The PMU driver uses this page to expose to the user the hardware counter’s index and other necessary data. Using this index enables the user to access the PMU registers using the mrs instruction. Access to the PMU registers is only valid while the sequence lock is unchanged. In particular, the PMSELR_EL0 register is zeroed each time the sequence lock is changed.

The userspace access is supported in libperf using the perf_evsel__mmap() and perf_evsel__read() functions. See tools/lib/perf/tests/test-evsel.c for an example.
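The requirement that a register access is only valid while the sequence lock is unchanged leads to a retry loop around the read. The Python sketch below models that loop with a stand-in for the mmap'ed page. FakeUserPage, read_event and the read_hw_counter callback are hypothetical names, and the field meanings (index holding the hardware counter index plus one, with 0 meaning no direct access) are assumed from the generic perf user-page convention.

```python
from dataclasses import dataclass


@dataclass
class FakeUserPage:
    lock: int     # sequence counter, changed by the kernel on every update
    index: int    # assumed: hardware counter index + 1, 0 if unavailable
    offset: int   # value to add to the raw hardware count


def read_event(page, read_hw_counter, max_retries=1000):
    """Read a count, retrying while the sequence lock changes underneath us."""
    for _ in range(max_retries):
        seq = page.lock
        index = page.index
        offset = page.offset
        # read_hw_counter stands in for the mrs-based register read.
        raw = read_hw_counter(index - 1) if index else None
        if page.lock == seq:          # snapshot was consistent
            return None if raw is None else offset + raw
    raise RuntimeError("user page kept changing during the read")
```

A None result means the page did not advertise a usable counter, in which case a real tool would fall back to reading the event through the file descriptor.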

About heterogeneous systems

On heterogeneous systems such as big.LITTLE, userspace PMU counter access can only be enabled when the tasks are pinned to a homogeneous subset of cores and the corresponding PMU instance is opened by specifying the ‘type’ attribute. The use of generic event types is not supported in this case.

Have a look at tools/perf/arch/arm64/tests/user-events.c for an example. It can be run using the perf tool to check that the access to the registers works correctly from userspace:

perf test -v user

About chained events and counter sizes

The user can request either a 32-bit (config1:0 == 0) or 64-bit (config1:0 == 1) counter along with userspace access. The sys_perf_event_open syscall will fail if a 64-bit counter is requested and the hardware doesn’t support 64-bit counters. Chained events are not supported in conjunction with userspace counter access. If a 32-bit counter is requested on hardware with 64-bit counters, then userspace must treat the upper 32 bits read from the counter as UNKNOWN. The ‘pmc_width’ field in the user page will indicate the valid width of the counter and should be used to mask the upper bits as needed.
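Masking with pmc_width amounts to keeping only the low pmc_width bits of the value read from the register. A minimal Python sketch, where valid_count is a hypothetical helper name:

```python
def valid_count(raw, pmc_width):
    """Keep only the low pmc_width bits of a raw 64-bit counter read."""
    if not 0 < pmc_width <= 64:
        raise ValueError("pmc_width must be between 1 and 64")
    return raw & ((1 << pmc_width) - 1)
```

For a 32-bit counter read as 0xDEADBEEF00000010, valid_count(raw, 32) discards the UNKNOWN upper half and returns 0x10.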

Event Counting Threshold

Overview

FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on events whose count meets a specified threshold condition. For example, if threshold_compare is set to 2 (‘Greater than or equal’), and the threshold is set to 2, then the PMU counter will only increment when an event would previously have incremented the PMU counter by 2 or more on a single processor cycle.

To increment by 1 after passing the threshold condition instead of the number of events on that cycle, add the ‘threshold_count’ option to the command line.

How-to

These are the parameters for controlling the feature:

Parameter          Description
-----------------  ----------------------------------------------------------
threshold          Value to threshold the event by. A value of 0 means that
                   thresholding is disabled and the other parameters have no
                   effect.
threshold_compare  Comparison function to use, with the following values
                   supported:
                   0: Not-equal
                   1: Equals
                   2: Greater-than-or-equal
                   3: Less-than
threshold_count    If this is set, count by 1 after passing the threshold
                   condition instead of the value of the event on this cycle.

The threshold, threshold_compare and threshold_count values can be provided per event, for example:

perf stat -e stall_slot/threshold=2,threshold_compare=2/ \
          -e dtlb_walk/threshold=10,threshold_compare=3,threshold_count/

In this example the stall_slot event will count by 2 or more on every cycle where 2 or more stalls happen, and dtlb_walk will count by 1 on every cycle where the number of dtlb walks was less than 10.

The maximum supported threshold value can be read from the caps of each PMU, for example:

cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max
0x000000ff
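The per-cycle behaviour of the three parameters can be modelled in a few lines. This Python sketch only illustrates the semantics described here and is not how the hardware or the perf tool is implemented; counter_increment is a hypothetical name.

```python
def counter_increment(events, threshold, threshold_compare=0,
                      threshold_count=False):
    """Amount the PMU counter advances on one cycle (illustrative model).

    events is the number of times the event occurred on this cycle.
    """
    if threshold == 0:
        return events                      # thresholding disabled
    passed = {
        0: events != threshold,            # Not-equal
        1: events == threshold,            # Equals
        2: events >= threshold,            # Greater-than-or-equal
        3: events < threshold,             # Less-than
    }[threshold_compare]
    if not passed:
        return 0
    return 1 if threshold_count else events
```

With the stall_slot settings, a cycle with 3 stalls adds 3 and a cycle with 1 stall adds nothing. With the dtlb_walk settings, a cycle with 4 walks adds 1 and a cycle with 12 walks adds nothing.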

If a value higher than this is given, then opening the event will result in an error. The highest possible maximum is 4095, as the config field for threshold is limited to 12 bits, and the Perf tool will refuse to parse higher values.

If the PMU doesn’t support FEAT_PMUv3_TH, then threshold_max will read 0, and attempting to set a threshold value will also result in an error. threshold_max will also read as 0 on aarch32 guests, even if the host is running on hardware with the feature.
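A tool could check a requested threshold against the value read from caps before opening the event. The sketch below is a hypothetical pre-check in Python, not part of perf. The helper names are illustrative, and it assumes that a threshold of 0 (thresholding disabled) is always acceptable.

```python
CONFIG_FIELD_MAX = 4095  # the threshold config field is limited to 12 bits


def parse_threshold_max(text):
    """Parse the hexadecimal string read from caps/threshold_max."""
    return int(text.strip(), 16)


def threshold_is_accepted(threshold, threshold_max):
    """Return True if opening an event with this threshold should succeed."""
    if threshold == 0:
        return True   # assumption: 0 only disables thresholding
    # threshold_max reads 0 when FEAT_PMUv3_TH is unsupported, so every
    # non-zero threshold is rejected in that case.
    return 0 < threshold <= min(threshold_max, CONFIG_FIELD_MAX)
```

For the output shown earlier, parse_threshold_max("0x000000ff") gives 255, so a threshold of 255 passes the check and 256 does not.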