
Hardware performance counter

From Wikipedia, the free encyclopedia
Registers that count hardware-related activities

In computers, hardware performance counters (HPCs),[1] or hardware counters, are a set of special-purpose registers built into modern microprocessors to store the counts of hardware-related activities. Advanced users often rely on these counters to conduct low-level performance analysis or tuning.

Implementations


The number of hardware counters available in a processor is limited, while each CPU model may offer many different events that a developer might want to measure. Each counter can be programmed with the index of an event type to be monitored, such as an L1 cache miss or a branch misprediction.
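As a concrete illustration of programming a counter with an event index, the sketch below assumes an Intel x86 processor and computes the value that would be written to one of its IA32_PERFEVTSELx event-select registers, using the architectural "Branch Misses Retired" event (event select 0xC5, unit mask 0x00). The field positions follow the layout in Intel's Software Developer's Manual; the helper name `perfevtsel` is hypothetical, and actually writing the register requires kernel privileges (on Linux the perf subsystem normally does this), so the code only builds the value.

```python
# Selected fields of an Intel IA32_PERFEVTSELx register
# (layout as described in Intel's SDM, Volume 3, performance monitoring chapter).
USR = 1 << 16  # count while running at privilege levels 1-3 (user mode)
OS = 1 << 17   # count while running at privilege level 0 (kernel mode)
EN = 1 << 22   # enable the counter


def perfevtsel(event: int, umask: int, user: bool = True, kernel: bool = True) -> int:
    """Hypothetical helper: encode an event-select value for one counter."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event select and unit mask are 8-bit fields")
    value = event | (umask << 8) | EN  # bits 0-7: event select, bits 8-15: unit mask
    if user:
        value |= USR
    if kernel:
        value |= OS
    return value


# Architectural event "Branch Misses Retired": event select 0xC5, unit mask 0x00.
print(hex(perfevtsel(0xC5, 0x00)))  # 0x4300c5
```

Passing `user=False` restricts the same event to kernel mode, giving 0x4200c5; changing only the event select to 0xC0 ("Instruction Retired") gives 0x4300c0.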

One of the first processors to implement a hardware counter and an associated instruction to access it (the RDPMC instruction) was the Intel Pentium, but the counters were not documented until Terje Mathisen wrote an article about reverse engineering them in the July 1994 issue of Byte.[2]

The following table shows some examples of CPUs and the number of available hardware counters:

Processor               Available HW counters
UltraSparc II           2
Pentium III             2
ARM11                   2
AMD Athlon              4
IA-64                   4
ARM Cortex-A5           2[3]
ARM Cortex-A8           4
ARM Cortex-A9 MPCore    6
POWER4                  8
Pentium 4               18

Versus software techniques


Compared to software profilers, hardware counters provide low-overhead access to a wealth of detailed performance information about a CPU's functional units, caches, main memory and so on. Another benefit is that, in general, no source code modifications are needed. However, the types and meanings of hardware counters vary from one architecture to another because of differences in hardware organization.
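The architecture dependence described above shows up even for the most common events. The sketch below maps a few generic event names to the raw codes used by two architectures: Intel's architectural performance events (an event select plus unit mask pair) and the common event numbers of the Armv8 PMU (PMUv3). The generic names, the table and the `raw_event` helper are assumptions made for this example; real profiling tools keep much larger, per-model tables.

```python
from typing import Optional, Tuple, Union

RawCode = Union[int, Tuple[int, int]]

# Generic name -> architecture -> raw event code (illustrative subset only).
# Intel entries are (event select, unit mask) pairs from the architectural
# performance events; Arm entries are PMUv3 common event numbers.
EVENT_TABLE = {
    "cycles": {"intel": (0x3C, 0x00), "armv8": 0x11},         # UnHalted Core Cycles / CPU_CYCLES
    "instructions": {"intel": (0xC0, 0x00), "armv8": 0x08},   # Instruction Retired / INST_RETIRED
    "branch-misses": {"intel": (0xC5, 0x00), "armv8": 0x10},  # Branch Misses Retired / BR_MIS_PRED
}


def raw_event(name: str, arch: str) -> Optional[RawCode]:
    """Return the raw code for a generic event, or None if this table lacks it."""
    return EVENT_TABLE.get(name, {}).get(arch)


# The same logical event needs different encodings on each architecture.
print(raw_event("branch-misses", "intel"), raw_event("branch-misses", "armv8"))  # (197, 0) 16
```

The printed values are 0xC5/0x00 and 0x10 shown in decimal. A lookup for an architecture or event missing from the table returns `None`, which is where a tool would fall back to model-specific documentation.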

Correlating low-level performance metrics back to source code can be difficult. The limited number of registers available to hold the counters often forces users to take multiple measurements to collect all of the desired performance metrics.
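When more events are wanted than there are counters, the simplest strategy is to split the events into groups no larger than the number of counters and run the workload once per group. The sketch below computes such a schedule using counter counts from the table in this article; the event names and the `plan_runs` and `scaled_count` helpers are hypothetical, and the model ignores the restrictions many processors place on which event may use which counter. The second helper shows the alternative used by Linux perf when events are time-multiplexed onto the counters within a single run: each raw count is scaled by the ratio of the time the event was enabled to the time it was actually being counted, an estimate that assumes the workload behaves uniformly over time.

```python
from typing import Dict, List

# General-purpose counter counts taken from the table in this article.
COUNTERS: Dict[str, int] = {"Pentium III": 2, "AMD Athlon": 4, "POWER4": 8, "Pentium 4": 18}


def plan_runs(events: List[str], n_counters: int) -> List[List[str]]:
    """Split the wanted events into groups that each fit in the available counters."""
    if n_counters < 1:
        raise ValueError("need at least one counter")
    return [events[i:i + n_counters] for i in range(0, len(events), n_counters)]


def scaled_count(raw: int, time_enabled: int, time_running: int) -> float:
    """Estimate a multiplexed event's full-run count as raw * enabled / running."""
    if time_running == 0:
        # The event never got a counter, so no estimate is possible; this sketch returns 0.0.
        return 0.0
    return raw * time_enabled / time_running


wanted = ["cycles", "instructions", "L1 misses", "branch misses", "TLB misses"]
for cpu, n in COUNTERS.items():
    print(f"{cpu}: {len(plan_runs(wanted, n))} run(s)")
```

For the five example events this gives three runs on a Pentium III, two on an AMD Athlon and one on a POWER4 or Pentium 4. Multiplexing avoids the extra runs at the cost of turning exact counts into estimates.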

Instruction based sampling

[Figure: Output of an IBS profile from CodeAnalyst.]

Modern superscalar processors schedule and execute multiple instructions out of order at the same time. These "in-flight" instructions can complete at different times, depending on memory accesses, cache hits, pipeline stalls and many other factors. This can cause performance counter events to be attributed to the wrong instructions, making precise performance analysis difficult or impossible.
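The attribution problem can be illustrated with a toy model of sampling on counter overflow. The sketch assumes that when a counter overflows, the instruction address recorded in the sample belongs to an instruction a fixed number of positions after the one that caused the event; real processors vary this distance, often called "skid", from sample to sample. The instruction trace, the skid value and the `sample_ips` helper are all hypothetical.

```python
from collections import Counter
from typing import List


def sample_ips(trace: List[int], event_at: List[bool], period: int, skid: int) -> Counter:
    """Toy overflow sampling: every `period`-th event raises an interrupt, and the
    recorded address is `skid` instructions past the instruction that caused it."""
    samples: Counter = Counter()
    events_seen = 0
    for i, has_event in enumerate(event_at):
        if not has_event:
            continue
        events_seen += 1
        if events_seen % period == 0:
            # The interrupt is taken late; clamp so the index stays inside the trace.
            reported = min(i + skid, len(trace) - 1)
            samples[trace[reported]] += 1
    return samples


# A four-instruction loop body (hypothetical addresses); only the load at 0x10 misses.
body = [0x10, 0x14, 0x18, 0x1C]
trace = body * 1000
event_at = [addr == 0x10 for addr in trace]

print(sample_ips(trace, event_at, period=10, skid=0))  # every sample lands on 0x10
print(sample_ips(trace, event_at, period=10, skid=2))  # every sample lands on 0x18
```

With a skid of two, all of the cache misses are charged to the instruction at 0x18, which never misses, while the load at 0x10 appears free.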

AMD introduced methods to mitigate some of these drawbacks. For example, in 2007 AMD's Opteron processors implemented a technique known as Instruction Based Sampling (IBS).[4] AMD's implementation of IBS provides hardware counters for both fetch sampling (the front of the superscalar pipeline) and op sampling (the back of the pipeline). This yields discrete performance data that associates retired ops with their "parent" AMD64 instruction.
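The benefit of op sampling can be sketched with a toy model. Instead of reading whatever instruction address is current when a counter overflows, the hardware periodically tags a single op, and the tag records both the address of the op's parent instruction and what happened to that op, so each sample is charged to the right instruction. The sketch below models only this attribution step; the `Op` fields, the sampling period and the `ibs_like_samples` helper are hypothetical and do not reflect AMD's actual IBS register layout.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Op:
    parent_ip: int    # address of the instruction this op was decoded from
    cache_miss: bool  # outcome recorded for the op while it was in flight


def ibs_like_samples(ops: Iterable[Op], period: int) -> Counter:
    """Tag every `period`-th op and charge its cache miss (if any) to its own
    parent instruction, regardless of when the sample is read out."""
    misses: Counter = Counter()
    for n, op in enumerate(ops, start=1):
        if n % period == 0 and op.cache_miss:
            misses[op.parent_ip] += 1
    return misses


# A four-op loop body (hypothetical addresses); only the load at 0x10 misses.
body = [Op(0x10, True), Op(0x14, False), Op(0x18, False), Op(0x1C, False)]
ops = body * 1000
print(ibs_like_samples(ops, period=5))  # only 0x10 appears
```

In this toy model the period (5) is coprime with the loop length (4), so every op position in the loop is eventually tagged; a period of 4 would tag the same position every time and never see the miss.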


References

  1. Malone, Corey; Zahran, Mohamed; Karri, Ramesh (2011). "Are hardware performance counters a cost effective way for integrity checking of programs" (PDF). Proceedings of the sixth ACM workshop on Scalable trusted computing. pp. 71–76. doi:10.1145/2046582.2046596. ISBN 9781450310017. S2CID 16409864. Retrieved 17 November 2022.
  2. "Pentium Secrets". Gamedev.net. Retrieved 14 February 2012.
  3. "Documentation – Arm Developer". developer.arm.com.
  4. "Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors" (PDF). AMD. Retrieved 16 October 2015.
Retrieved from "https://en.wikipedia.org/w/index.php?title=Hardware_performance_counter&oldid=1296959121"