IMC (In-Memory Collection Counters)
Anju T Sudhakar, 10 May 2019
Basic overview
IMC (In-Memory Collection Counters) is a hardware monitoring facility that collects large numbers of hardware performance events at Nest level (these are on-chip but off-core), Core level and Thread level.
The Nest PMU counters are handled by a Nest IMC microcode which runs in the OCC (On-Chip Controller) complex. The microcode collects the counter data and moves the nest IMC counter data to memory.
The Core and Thread IMC PMU counters are handled in the core. Core level PMU counters give us the IMC counters’ data per core, and thread level PMU counters give us the IMC counters’ data per CPU thread.
OPAL obtains the IMC PMU and supported events information from the IMC Catalog and passes it on to the kernel via the device tree. Each event’s information contains:
- Event name
- Event Offset
- Event description
and possibly also:
- Event scale
- Event unit
Some PMUs may have a common scale and unit value for all their supported events. For those events, the scale and unit properties must be inherited from the PMU.
The event offset is the location in memory where the event's counter data gets accumulated.
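The following minimal C sketch shows the inheritance rule and the offset lookup described above. The structures and functions (`imc_pmu_desc`, `imc_event_desc`, `imc_event_scale`, `imc_event_unit`, `imc_event_counter_addr`) are hypothetical illustrations, not the kernel's definitions. `counter_base` is also an assumption: a base address for the PMU's counter memory, to which an event's offset is added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the per-PMU data parsed from the device tree. */
struct imc_pmu_desc {
	const char *name;
	const char *scale;      /* common scale for all events, or NULL */
	const char *unit;       /* common unit for all events, or NULL */
	uint64_t counter_base;  /* assumed base address of this PMU's counter memory */
};

/* Hypothetical per-event data: name, offset and description are always present. */
struct imc_event_desc {
	const char *name;
	uint64_t offset;        /* where this event's count is accumulated */
	const char *desc;
	const char *scale;      /* optional; NULL means "inherit from the PMU" */
	const char *unit;       /* optional; NULL means "inherit from the PMU" */
};

/* Return the event's own scale if it has one, otherwise the PMU-wide scale. */
static const char *imc_event_scale(const struct imc_event_desc *ev,
				   const struct imc_pmu_desc *pmu)
{
	return ev->scale != NULL ? ev->scale : pmu->scale;
}

/* Same inheritance rule for the unit. */
static const char *imc_event_unit(const struct imc_event_desc *ev,
				  const struct imc_pmu_desc *pmu)
{
	return ev->unit != NULL ? ev->unit : pmu->unit;
}

/* An event's counter lives at its offset within the PMU's counter memory. */
static uint64_t imc_event_counter_addr(const struct imc_event_desc *ev,
				       const struct imc_pmu_desc *pmu)
{
	assert(ev != NULL && pmu != NULL);
	return pmu->counter_base + ev->offset;
}
```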
The IMC catalog is available at: https://github.com/open-power/ima-catalog
The kernel discovers the IMC counters information in the device tree at the imc-counters device node, which has the compatible field ibm,opal-in-memory-counters. From the device tree, the kernel parses the PMUs and their events’ information and registers each PMU and its attributes in the kernel.
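To illustrate the matching step, this small C sketch checks whether a device-tree compatible property contains ibm,opal-in-memory-counters. It relies only on the standard device-tree encoding of compatible as back-to-back NUL-terminated strings. The function name `dt_compatible_has` is hypothetical; the kernel uses its own device-tree helpers for this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A device-tree "compatible" property is a sequence of NUL-terminated
 * strings stored back to back; `len` is the property length in bytes. */
static bool dt_compatible_has(const char *prop, size_t len, const char *want)
{
	size_t pos = 0;

	assert(prop != NULL && want != NULL);
	while (pos < len) {
		const char *entry = prop + pos;
		const char *nul = memchr(entry, '\0', len - pos);

		if (nul == NULL)        /* malformed: last entry not terminated */
			return false;
		if (strcmp(entry, want) == 0)
			return true;
		pos += (size_t)(nul - entry) + 1;  /* skip the entry and its NUL */
	}
	return false;
}
```

The exact-string comparison matters: a node whose compatible list only contains a prefix such as "ibm,opal-in-memory" does not match.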
IMC example usage
    # perf list
    [...]
    nest_mcs01/PM_MCS01_64B_RD_DISP_PORT01/            [Kernel PMU event]
    nest_mcs01/PM_MCS01_64B_RD_DISP_PORT23/            [Kernel PMU event]
    [...]
    core_imc/CPM_0THRD_NON_IDLE_PCYC/                  [Kernel PMU event]
    core_imc/CPM_1THRD_NON_IDLE_INST/                  [Kernel PMU event]
    [...]
    thread_imc/CPM_0THRD_NON_IDLE_PCYC/                [Kernel PMU event]
    thread_imc/CPM_1THRD_NON_IDLE_INST/                [Kernel PMU event]
To see per chip data for nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/:
    # ./perf stat -e "nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/" -a --per-socket
To see non-idle instructions for core 0:
    # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000
To see non-idle cycles for a “make”:
    # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make
IMC Trace-mode
POWER9 supports two modes for IMC: Accumulation mode and Trace mode. In Accumulation mode, event counts are accumulated in system memory. The hypervisor then reads the posted counts periodically or when requested. In IMC Trace mode, the 64-bit trace SCOM value is initialized with the event information. The CPMCxSEL and CPMC_LOAD fields in the trace SCOM specify the event to be monitored and the sampling duration. On each overflow of the CPMCxSEL counter, the hardware snapshots the program counter along with the event counts and writes them into the memory pointed to by LDBAR.
LDBAR is a 64-bit, per-thread special purpose register. It has bits that indicate whether the hardware is configured for accumulation or trace mode.
LDBAR Register Layout
    Bit(s)   Description
    0        Enable/Disable
    1        0: Accumulation Mode, 1: Trace Mode
    2:3      Reserved
    4:6      PB scope
    7        Reserved
    8:50     Counter Address
    51:63    Reserved
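The layout can be made concrete with a C sketch that packs these fields into a 64-bit value. Two points are assumptions. First, bits are numbered IBM style (bit 0 is the most significant bit), as is usual for POWER registers. Second, the Counter Address field holds an 8 KiB-aligned address in place, which is what a field ending at bit 50 suggests. The helper names (`ibm_field`, `ldbar_build`, `ldbar_is_trace_mode`) are hypothetical, and PB scope is simply passed through as a raw 3-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Place `value` into bits first..last of a 64-bit register using IBM bit
 * numbering, where bit 0 is the most significant bit. */
static uint64_t ibm_field(uint64_t value, unsigned first, unsigned last)
{
	assert(first <= last && last <= 63);
	unsigned width = last - first + 1;
	uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);

	assert((value & ~mask) == 0);   /* value must fit in the field */
	return value << (63 - last);
}

/* Build an LDBAR value from the fields in the layout table (sketch). */
static uint64_t ldbar_build(bool enable, bool trace_mode, uint64_t pb_scope,
			    uint64_t counter_addr)
{
	/* Bit 50, the last bit of the Counter Address field, has weight 2^13,
	 * so the address is assumed to be 8 KiB aligned and stored in place. */
	assert((counter_addr & 0x1FFFULL) == 0);
	return ibm_field(enable, 0, 0) |
	       ibm_field(trace_mode, 1, 1) |
	       ibm_field(pb_scope, 4, 6) |
	       ibm_field(counter_addr >> 13, 8, 50);
}

/* Bit 1 selects the mode: 0 = accumulation, 1 = trace. */
static bool ldbar_is_trace_mode(uint64_t ldbar)
{
	return (ldbar >> 62) & 1;
}
```

For example, enabling trace mode with a counter buffer at 0x12345678A000 sets the top two bits and leaves the address bits unchanged, giving 0xC00012345678A000.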
TRACE_IMC_SCOM bit representation
    Bit(s)   Field
    0:1      SAMPSEL
    2:33     CPMC_LOAD
    34:40    CPMC1SEL
    41:47    CPMC2SEL
    48:50    BUFFERSIZE
    51:63    RESERVED
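A similar C sketch packs the TRACE_IMC_SCOM fields, again assuming IBM bit numbering (bit 0 is the most significant bit). It only shows where each field lands. Which SAMPSEL and CPMCxSEL values select which event, and how BUFFERSIZE maps to a byte count, are not described here, so the arguments are raw field values. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Place `value` into bits first..last of a 64-bit register using IBM bit
 * numbering, where bit 0 is the most significant bit. */
static uint64_t ibm_field(uint64_t value, unsigned first, unsigned last)
{
	assert(first <= last && last <= 63);
	unsigned width = last - first + 1;
	uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);

	assert((value & ~mask) == 0);   /* value must fit in the field */
	return value << (63 - last);
}

/* Pack raw field values into a TRACE_IMC_SCOM value (sketch).
 * Bits 51:63 are reserved and left as zero. */
static uint64_t trace_imc_scom(uint64_t sampsel, uint64_t cpmc_load,
			       uint64_t cpmc1sel, uint64_t cpmc2sel,
			       uint64_t buffersize)
{
	return ibm_field(sampsel, 0, 1) |       /* 2 bits */
	       ibm_field(cpmc_load, 2, 33) |    /* 32-bit sampling duration */
	       ibm_field(cpmc1sel, 34, 40) |    /* 7 bits */
	       ibm_field(cpmc2sel, 41, 47) |    /* 7 bits */
	       ibm_field(buffersize, 48, 50);   /* 3 bits */
}
```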
CPMC_LOAD contains the sampling duration. SAMPSEL and CPMCxSEL determine the event to count. BUFFERSIZE indicates the memory range. On each overflow, the hardware snapshots the program counter along with the event counts, updates the memory, and reloads the CPMC_LOAD value for the next sampling duration. IMC hardware does not support exceptions, so it silently wraps around when the memory buffer reaches its end.
Currently, the event monitored in trace mode is fixed to cycles.
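The overflow, reload and wrap-around behaviour can be illustrated with a simplified software model in C. This is not how the hardware is implemented. The record layout is an assumption, as is the reading of CPMC_LOAD as a number of events between snapshots and a buffer capacity counted in records. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: the real trace record format is not described here. */
struct trace_record {
	uint64_t pc;      /* program counter snapshot */
	uint64_t count;   /* event count at the time of the snapshot */
};

/* Hypothetical buffer; its capacity would be derived from BUFFERSIZE. */
struct trace_buffer {
	struct trace_record *records;
	size_t nr_records;
	size_t next;      /* slot the next snapshot is written to */
};

/*
 * Simplified model: pcs[i] is the program counter when event i occurs.
 * The counter starts at `load` events, a snapshot is taken when it runs
 * out, and it is then reloaded with `load` for the next sampling period.
 * Returns the number of snapshots taken.
 */
static size_t simulate_trace(struct trace_buffer *buf, uint32_t load,
			     const uint64_t *pcs, size_t nr_events)
{
	uint32_t remaining = load;
	size_t samples = 0;

	assert(load > 0 && buf->nr_records > 0);
	for (size_t i = 0; i < nr_events; i++) {
		if (--remaining != 0)
			continue;
		buf->records[buf->next].pc = pcs[i];
		buf->records[buf->next].count = i + 1;
		/* No exception at the end of the buffer: wrap around silently. */
		buf->next = (buf->next + 1) % buf->nr_records;
		remaining = load;   /* reload for the next sampling period */
		samples++;
	}
	return samples;
}
```

With load = 3, ten events and a two-record buffer, snapshots are taken at events 3, 6 and 9. The third snapshot overwrites the first, mirroring the silent wrap-around described above.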
Trace IMC example usage
    # perf list
    [....]
    trace_imc/trace_cycles/                            [Kernel PMU event]
To record an application/process with the trace-imc event:
    # perf record -e trace_imc/trace_cycles/ yes > /dev/null
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.012 MB perf.data (21 samples) ]
The generated perf.data can be read using perf report.
Benefits of using IMC trace-mode
PMI (Performance Monitoring Interrupt) handling is avoided, since IMC trace mode snapshots the program counter and writes it to memory. This also provides a way for the operating system to do instruction sampling in real time without the PMI processing overhead.
Performance data using perf top with and without the trace-imc event.
PMI interrupt counts when the perf top command is executed without the trace-imc event:
    # grep PMI /proc/interrupts
    PMI:          0          0          0          0   Performance monitoring interrupts
    # ./perf top
    ...
    # grep PMI /proc/interrupts
    PMI:      39735       8710      17338      17801   Performance monitoring interrupts
    # ./perf top -e trace_imc/trace_cycles/
    ...
    # grep PMI /proc/interrupts
    PMI:      39735       8710      17338      17801   Performance monitoring interrupts
That is, the PMI interrupt counts do not increment when using the trace_imc event.
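The before/after comparison above can also be done programmatically by summing the per-CPU columns of the PMI row. The following minimal C sketch assumes the row format shown above: a "PMI:" label, one count per CPU, then a description. The function name `pmi_total` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU columns of the "PMI:" row of /proc/interrupts.
 * Returns -1 if `line` is not the PMI row.
 */
static long long pmi_total(const char *line)
{
	const char *p = line;
	long long total = 0;

	while (isspace((unsigned char)*p))
		p++;
	if (strncmp(p, "PMI:", 4) != 0)
		return -1;
	p += 4;
	for (;;) {
		char *end;
		long long v = strtoll(p, &end, 10);

		if (end == p)   /* no number parsed: reached the description text */
			break;
		total += v;
		p = end;
	}
	return total;
}
```

In practice one would read /proc/interrupts with fopen() and fgets() before and after a perf run and compare the two totals. An unchanged total across the trace_imc run corresponds to the observation above.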