Lightweight recording and sampling of performance counters for specific code segments directly from your C++ application.


jmuehlig/perf-cpp


LGPL-3.0 | Linux Kernel >= 4.0 | C++17

Quick Start | How to Build | Documentation | System Requirements

perf-cpp embeds Linux's hardware performance monitoring directly into your code, letting you profile exactly what matters and process the results in your application. Tools like Linux Perf, Intel® VTune™, and AMD uProf are powerful but monitor entire programs, whereas high-performance applications need surgical precision.

What can perf-cpp do?

Built around Linux's powerful perf subsystem, perf-cpp provides a clean interface for counting and sampling hardware events, without the complexity of low-level APIs.

  • Measure exactly what you want: use performance counters to count hardware events, similar to perf stat, but around specific code paths rather than an entire binary (documentation).
  • Calculate metrics such as cycles per instruction and cache miss to access ratio based on hardware events and timing (documentation).
  • Access performance counters with low latency, without starting or stopping them, for micro-benchmarks or adaptive tuning (documentation).
  • Record instruction and memory samples, just like perf [mem] record, but from inside your application (documentation).
  • Correlate samples with data structures and symbols to generate per-class access statistics and flame graphs.
  • Mix built-in events (e.g., cycles, instructions, cache misses, ...) with processor-specific counters (documentation).

See various practical examples and the documentation for more details.

Quick Start

Record Hardware Event Statistics

Recording hardware event statistics operates much like perf stat: it quantifies critical events (such as executed instructions, CPU cycles, and cache misses) throughout a code segment's execution.

    #include <iostream>
    #include <perfcpp/event_counter.h>

    /// Initialize the counter
    auto event_counter = perf::EventCounter{};

    /// Specify hardware events to count
    event_counter.add({"seconds", "instructions", "cycles", "cache-misses"});

    /// Run the workload
    event_counter.start();
    code_to_profile(); /// <-- Statistics are recorded during execution
    event_counter.stop();

    /// Print the result to the console
    const auto result = event_counter.result();
    for (const auto [event_name, value] : result)
    {
        std::cout << event_name << ": " << value << std::endl;
    }

Possible output:

    seconds:      0.0955897
    instructions: 5.92087e+07
    cycles:       4.70254e+08
    cache-misses: 1.35633e+07

Note

For additional insights, please refer to the guides on recording event statistics and event statistics on multiple CPUs/threads. Also, check out the hardware events documentation for details on both built-in and processor-specific events.
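To illustrate how raw counts turn into metrics, the following minimal sketch derives a few ratios from the numbers in the possible output above. It is plain C++ and does not use the perf-cpp metric interface; `EventStatistics`, `example_output`, and the function names are hypothetical helpers introduced only for this illustration. Because the example output contains no cache-access count, the sketch computes cache misses per 1,000 instructions instead of a miss-to-access ratio.

```cpp
#include <cassert>

/// Hypothetical container for the four values shown in the output above;
/// it is not part of perf-cpp.
struct EventStatistics
{
    double seconds;
    double instructions;
    double cycles;
    double cache_misses;
};

/// The values copied from the "Possible output" listing.
constexpr EventStatistics example_output{0.0955897, 5.92087e+07, 4.70254e+08, 1.35633e+07};

/// Divides two counter values; the denominator must be positive.
inline double ratio(const double numerator, const double denominator)
{
    assert(denominator > 0.0);
    return numerator / denominator;
}

/// Average number of CPU cycles spent per executed instruction.
inline double cycles_per_instruction(const EventStatistics& statistics)
{
    return ratio(statistics.cycles, statistics.instructions);
}

/// Cache misses per 1,000 executed instructions.
inline double cache_misses_per_kilo_instruction(const EventStatistics& statistics)
{
    return ratio(statistics.cache_misses, statistics.instructions) * 1000.0;
}

/// Cycles per second, expressed in GHz.
inline double cycles_per_second_in_ghz(const EventStatistics& statistics)
{
    return ratio(statistics.cycles, statistics.seconds) / 1e9;
}
```

For the example values, this yields roughly 7.94 cycles per instruction and about 229 cache misses per 1,000 instructions, which hints at a memory-bound workload.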

Record Samples

Recording samples functions much like perf [mem] record: it captures execution snapshots, e.g., the instruction pointer, executing CPU, and timestamp, at regular intervals (here, every 50,000th CPU cycle).

    #include <iostream>
    #include <perfcpp/sampler.h>

    /// Create the sampler
    auto sampler = perf::Sampler{};

    /// Specify when a sample is recorded: every 50,000th cycle
    sampler.trigger("cycles", perf::Period{50000U});

    /// Specify what data is included in a sample: time, CPU ID, instruction
    sampler.values()
        .timestamp(true)
        .cpu_id(true)
        .instruction_pointer(true);

    /// Run the workload
    sampler.start();
    code_to_profile(); /// <-- Samples are recorded during execution
    sampler.stop();

    /// Print the samples to the console
    const auto samples = sampler.result();
    for (const auto& record : samples)
    {
        const auto timestamp = record.metadata().timestamp().value();
        const auto cpu_id = record.metadata().cpu_id().value();
        const auto instruction = record.instruction_execution().logical_instruction_pointer().value();

        std::cout
            << "Time = " << timestamp << " | CPU = " << cpu_id
            << " | Instruction = 0x" << std::hex << instruction << std::dec
            << std::endl;
    }

Possible output:

    Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
    Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
    Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
    Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c

Note

For additional details, such as the types of data that can be included in samples, please consult the sampling guide. Additionally, consult the sampling on multiple CPUs/threads guide for instructions on parallel sampling.
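Once samples are available inside the application, they can be processed like any other data. The sketch below shows one possible post-processing step: counting how often each instruction pointer was sampled (the basic idea behind a hotspot profile) and computing the time between consecutive samples. It is plain C++ that works on a hypothetical `SampleRecord` struct filled with the values from the possible output above; it does not use the perf-cpp sample type.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

/// Hypothetical, simplified sample; it only mirrors the three fields printed above.
struct SampleRecord
{
    std::uint64_t timestamp;
    std::uint32_t cpu_id;
    std::uint64_t instruction_pointer;
};

/// The four samples from the "Possible output" listing.
inline std::vector<SampleRecord> example_samples()
{
    return {{365449130714033ULL, 8U, 0x5a6e84b2075cULL},
            {365449130913157ULL, 8U, 0x64af7417c75cULL},
            {365449131112591ULL, 8U, 0x5a6e84b2075cULL},
            {365449131312005ULL, 8U, 0x64af7417c75cULL}};
}

/// Counts how many samples hit each instruction pointer.
inline std::map<std::uint64_t, std::size_t> count_by_instruction(const std::vector<SampleRecord>& samples)
{
    auto histogram = std::map<std::uint64_t, std::size_t>{};
    for (const auto& sample : samples)
    {
        ++histogram[sample.instruction_pointer];
    }
    return histogram;
}

/// Computes the timestamp difference between each pair of consecutive samples.
inline std::vector<std::uint64_t> timestamp_deltas(const std::vector<SampleRecord>& samples)
{
    auto deltas = std::vector<std::uint64_t>{};
    for (std::size_t index = 1U; index < samples.size(); ++index)
    {
        deltas.push_back(samples[index].timestamp - samples[index - 1U].timestamp);
    }
    return deltas;
}
```

With the four example samples, each of the two instruction pointers is counted twice, and consecutive timestamps differ by roughly 199,000 units.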

More Examples

We include a collection of examples demonstrating the functionality and interface of perf-cpp in the examples/ directory, including

  • examples for counting hardware events (examples/statistics)
  • and for sampling (examples/sampling).

Building

perf-cpp is designed as a library (static or shared) that can be linked to your application.

    # Clone the repository
    git clone https://github.com/jmuehlig/perf-cpp.git

    # Switch to the repository folder
    cd perf-cpp

    # Optional: Switch to this development version
    git checkout v0.12.4

    # Build the library (in build/)
    # -DBUILD_EXAMPLES=1        compiles all examples (optional)
    # -DBUILD_LIB_SHARED=1      creates the library as a shared one (optional)
    # -DGEN_PROCESSOR_EVENTS=1  generates and compiles a .cpp file that adds events specific to the underlying CPU (optional)
    cmake . -B build -DBUILD_EXAMPLES=1
    cmake --build build

    # Optional: Build examples (in build/examples/bin) if -DBUILD_EXAMPLES=1
    cmake --build build --target examples

Note

Further information and detailed building instructions (e.g., how to integrate into CMake projects) are available in the building guide.

Full Documentation

Further Reading

  • Examples: Learn how to set up different features from code-examples.
  • Changelog: Stay updated with the latest changes and improvements.

System Requirements

  • Clang / GCC with support for C++17 features.
  • CMake version 3.10 or higher.
  • Linux Kernel 4.0 or newer (note that some features need a newer kernel).
  • perf_event_paranoid setting: Adjust as needed to allow access to performance counters (see the Paranoid Value documentation).
  • Python 3, if you make use of processor-specific hardware event generation.
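To check the perf_event_paranoid requirement from code, an application can read the current value from procfs before creating counters. The following is a minimal sketch under that assumption; the function names are hypothetical and not part of perf-cpp, and the level descriptions paraphrase the Linux kernel documentation.

```cpp
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

/// Parses the textual content of perf_event_paranoid (e.g., "2\n").
/// Returns std::nullopt if the text does not start with an integer.
inline std::optional<int> parse_paranoid_level(const std::string& text)
{
    auto stream = std::istringstream{text};
    int level = 0;
    if (stream >> level)
    {
        return level;
    }
    return std::nullopt;
}

/// Reads the current level from procfs; returns std::nullopt if the file
/// is missing or unreadable (e.g., on a non-Linux system).
inline std::optional<int> read_paranoid_level()
{
    auto file = std::ifstream{"/proc/sys/kernel/perf_event_paranoid"};
    auto line = std::string{};
    if (!file.is_open() || !std::getline(file, line))
    {
        return std::nullopt;
    }
    return parse_paranoid_level(line);
}

/// Describes what a level means for users without extra privileges.
inline std::string describe_paranoid_level(const int level)
{
    if (level >= 2) { return "kernel profiling is disallowed"; }
    if (level == 1) { return "CPU-wide event access is disallowed"; }
    if (level == 0) { return "raw tracepoint access is disallowed"; }
    return "almost all events are allowed";
}
```

A caller could, for example, log `describe_paranoid_level(...)` at startup and skip profiling when the level is too restrictive for the events it needs.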

Contribute and Contact

We welcome contributions and feedback. For feature requests, feedback, or bug reports, please reach out via our issue tracker or submit a pull request.

Alternatively, you can email me: jan.muehlig@tu-dortmund.de.


Further PMU-related Projects

Below is a non-exhaustive list of some other valuable profiling projects:

  • PAPI offers access not only to CPU performance counters but also to a variety of other hardware components including GPUs, I/O systems, and more.
  • Likwid is a collection of several command line tools for benchmarking, including an extensive wiki.
  • PerfEvent provides lightweight access to performance counters, facilitating streamlined performance monitoring.
  • Intel's Instrumentation and Tracing Technology allows applications to manage the collection of trace data effectively when used in conjunction with Intel VTune Profiler.
  • For those who prefer a more hands-on approach, the perf_event_open system call can be utilized directly without any wrappers.
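For comparison, the last option in the list above can be sketched as follows: a minimal, Linux-only example that counts user-space instructions of the calling thread through the raw perf_event_open system call. The function name `count_user_instructions` is a hypothetical helper for this illustration, and the sketch makes no claim about how perf-cpp uses the system call internally.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>
#include <optional>
#include <utility>

/// Counts user-space instructions executed by the calling thread while
/// `workload` runs. Returns std::nullopt if the counter cannot be opened
/// or read (e.g., due to perf_event_paranoid or a missing PMU).
template <typename Workload>
std::optional<std::uint64_t> count_user_instructions(Workload&& workload)
{
    perf_event_attr attribute;
    std::memset(&attribute, 0, sizeof(attribute));
    attribute.type = PERF_TYPE_HARDWARE;
    attribute.size = sizeof(attribute);
    attribute.config = PERF_COUNT_HW_INSTRUCTIONS;
    attribute.disabled = 1;       /// Start disabled; enabled explicitly below.
    attribute.exclude_kernel = 1; /// Count user space only.
    attribute.exclude_hv = 1;

    /// pid = 0 and cpu = -1: measure the calling thread on any CPU.
    const auto fd = static_cast<int>(::syscall(SYS_perf_event_open, &attribute, 0, -1, -1, 0UL));
    if (fd < 0)
    {
        return std::nullopt;
    }

    ::ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ::ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    std::forward<Workload>(workload)();
    ::ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    /// With the default read_format, read() yields a single 64-bit count.
    std::uint64_t count = 0U;
    const auto bytes_read = ::read(fd, &count, sizeof(count));
    ::close(fd);

    if (bytes_read != static_cast<ssize_t>(sizeof(count)))
    {
        return std::nullopt;
    }
    return count;
}
```

Even this single-counter case needs manual attribute setup, file descriptor handling, and error checks, which is the bookkeeping that wrapper libraries take over.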

Resources about (Perf-) Profiling

This is a non-exhaustive list of academic research papers and blog articles (feel free to add to it, e.g., via pull request – also your own work).

Academic Papers

Blog Posts

