Lightweight recording and sampling of performance counters for specific code segments directly from your C++ application.
jmuehlig/perf-cpp
Quick Start | How to Build | Documentation | System Requirements
perf-cpp embeds Linux's hardware performance monitoring directly into your code, letting you profile exactly what matters and process the results in your application. Tools like Linux Perf, Intel® VTune™, and AMD uProf are powerful but monitor entire programs, while high-performance applications need surgical precision.
Built around Linux's powerful perf subsystem, perf-cpp provides a clean interface for counting and sampling hardware events without the complexity of low-level APIs.
- Measure exactly what you want: utilize performance counters to count hardware events, similar to `perf stat`, but around specific code paths, not an entire binary (documentation).
- Calculate metrics such as cycles per instruction and cache miss to access ratio based on hardware events and timing (documentation).
- Access performance counters with low latency, without starting/stopping the counters, for micro-benchmarks or adaptive tuning (documentation).
- Record instruction and memory samples, just like `perf [mem] record`, but from inside your application (documentation).
- Correlate samples with data structures and symbols to generate per-class access statistics and flame graphs.
- Mix built-in events (e.g., cycles, instructions, cache misses, ...) with processor-specific counters (documentation).
See various practical examples and the documentation for more details.
Recording hardware event statistics operates much like `perf stat`: it quantifies critical events, such as executed instructions, CPU cycles, and cache misses, throughout a code segment's execution.
```cpp
#include <iostream>
#include <perfcpp/event_counter.h>

/// Initialize the counter
auto event_counter = perf::EventCounter{};

/// Specify hardware events to count
event_counter.add({"seconds", "instructions", "cycles", "cache-misses"});

/// Run the workload
event_counter.start();
code_to_profile(); /// <-- Statistics are recorded during execution
event_counter.stop();

/// Print the result to the console
const auto result = event_counter.result();
for (const auto [event_name, value] : result)
{
  std::cout << event_name << ": " << value << std::endl;
}
```
Possible output:
```
seconds: 0.0955897
instructions: 5.92087e+07
cycles: 4.70254e+08
cache-misses: 1.35633e+07
```
Note
For additional insights, please refer to the guides on recording event statistics and event statistics on multiple CPUs/threads. Also, check out the hardware events documentation for details on both built-in and processor-specific events.
Recording samples functions much like `perf [mem] record`: it captures execution snapshots, e.g., the instruction pointer, executing CPU, and timestamp, at regular intervals (here every 50,000th CPU cycle).
```cpp
#include <iostream>
#include <perfcpp/sampler.h>

/// Create the sampler
auto sampler = perf::Sampler{};

/// Specify when a sample is recorded: every 50,000th cycle
sampler.trigger("cycles", perf::Period{50000U});

/// Specify what data is included into a sample: time, CPU ID, instruction
sampler.values()
  .timestamp(true)
  .cpu_id(true)
  .instruction_pointer(true);

/// Run the workload
sampler.start();
code_to_profile(); /// <-- Samples are recorded during execution
sampler.stop();

/// Print the samples to the console
const auto samples = sampler.result();
for (const auto& record : samples)
{
  const auto timestamp = record.metadata().timestamp().value();
  const auto cpu_id = record.metadata().cpu_id().value();
  const auto instruction = record.instruction_execution().logical_instruction_pointer().value();

  std::cout << "Time = " << timestamp << " | CPU = " << cpu_id
            << " | Instruction = 0x" << std::hex << instruction << std::dec << std::endl;
}
```
Possible output:
```
Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c
```
Note
For additional details, such as the types of data that can be included in samples, please consult the sampling guide. Additionally, consult the sampling on multiple CPUs/threads guide for instructions on parallel sampling.
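Once samples are available inside the application, they can be aggregated directly. As a rough sketch, the code below counts how often each instruction pointer appears, which gives a tiny flat profile. `SampleRow`, `count_samples_per_instruction`, and `example_rows` are hypothetical names for illustration and not perf-cpp types; in practice the rows would be filled from the values read in the loop above.

```cpp
#include <cstdint>
#include <map>
#include <vector>

/// Hypothetical plain record holding the three values printed above (not a perf-cpp type).
struct SampleRow
{
  std::uint64_t timestamp;
  std::uint32_t cpu_id;
  std::uint64_t instruction_pointer;
};

/// Counts how many samples were taken at each instruction pointer.
std::map<std::uint64_t, std::uint64_t> count_samples_per_instruction(const std::vector<SampleRow>& rows)
{
  std::map<std::uint64_t, std::uint64_t> histogram;
  for (const auto& row : rows)
  {
    ++histogram[row.instruction_pointer];
  }
  return histogram;
}

/// The four samples from the possible output above, copied by hand.
std::vector<SampleRow> example_rows()
{
  return {
    {365449130714033ULL, 8U, 0x5a6e84b2075cULL},
    {365449130913157ULL, 8U, 0x64af7417c75cULL},
    {365449131112591ULL, 8U, 0x5a6e84b2075cULL},
    {365449131312005ULL, 8U, 0x64af7417c75cULL},
  };
}
```

For the four example samples, the histogram contains two distinct instruction pointers with two samples each. Instruction pointers that collect many samples point to code where the program spends most of its cycles.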
We include a collection of examples demonstrating the functionality and interface of perf-cpp in the `examples/` directory, including
- examples for counting hardware events (`examples/statistics`)
- and for sampling (`examples/sampling`).
perf-cpp is designed as a library (static or shared) that can be linked to your application.
```bash
# Clone the repository
git clone https://github.com/jmuehlig/perf-cpp.git

# Switch to the repository folder
cd perf-cpp

# Optional: Switch to this development version
git checkout v0.12.4

# Build the library (in build/)
# -DBUILD_EXAMPLES=1 compiles all examples (optional)
# -DBUILD_LIB_SHARED=1 creates the library as a shared one (optional)
# -DGEN_PROCESSOR_EVENTS=1 generates and compiles a .cpp file that adds events specific to the underlying CPU (optional)
cmake . -B build -DBUILD_EXAMPLES=1
cmake --build build

# Optional: Build examples (in build/examples/bin) if -DBUILD_EXAMPLES=1
cmake --build build --target examples
```
Note
Further information and detailed building instructions (e.g., how to integrate into CMake projects) are available in the building guide.
- Building: Integrate perf-cpp seamlessly into your C++ projects.
- Counting Performance Events
  - Basics: Master recording hardware event statistics directly within your application.
  - Parallel and Multithreaded: Learn how to monitor events across threads and CPU cores.
  - Metrics: Learn how to combine hardware events into meaningful metrics for clearer performance insights.
  - Live Access: See how events can be accessed without stopping the recording, ideal for profiling tight loops and small functions.
- Recording Samples
  - Basics: Understand sampling mechanisms, which data to record, and how to access the results.
  - Parallel and Multithreaded: Learn how to record samples in multithreaded workloads.
  - Use the Linux Perf Tool to Analyze Recorded Samples: See how samples recorded via perf-cpp can be analyzed with `perf [mem] report`.
  - Translating Instruction Pointers into Symbols and Samples into flame graphs: See how to translate instruction pointers into function names and prepare sampling results to transform them into flame graphs (e.g., using FlameGraph).
  - Analyzing Memory Access Patterns: See how to link memory sampling data to specific data objects to profile detailed memory access characteristics.
- Built-in and Hardware-specific Events: Discover built-in events and learn how to define new ones tailored to your hardware.
- Perf Paranoid: Learn how to configure perf permissions.
- Examples: Learn how to set up different features from code-examples.
- Changelog: Stay updated with the latest changes and improvements.
- Clang / GCC with support for C++17 features.
- CMake version 3.10 or higher.
- Linux Kernel 4.0 or newer (note that some features need a newer Kernel).
- `perf_event_paranoid` setting: Adjust as needed to allow access to performance counters (see the Paranoid Value documentation).
- Python3, if you make use of processor-specific hardware event generation.
We welcome contributions and feedback. For feature requests, feedback, or bug reports, please reach out via our issue tracker or submit a pull request.
Alternatively, you can email me: jan.muehlig@tu-dortmund.de.
Below is a non-exhaustive list of some other valuable profiling projects:
- PAPI offers access not only to CPU performance counters but also to a variety of other hardware components including GPUs, I/O systems, and more.
- Likwid is a collection of several command line tools for benchmarking, including an extensive wiki.
- PerfEvent provides lightweight access to performance counters, facilitating streamlined performance monitoring.
- Intel's Instrumentation and Tracing Technology allows applications to manage the collection of trace data effectively when used in conjunction with Intel VTune Profiler.
- For those who prefer a more hands-on approach, the perf_event_open system call can be utilized directly without any wrappers.
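For comparison, the following sketch shows what using the `perf_event_open` system call directly can look like when counting user-space instructions of the calling thread. It follows the pattern from the Linux man page; the function names `make_instruction_counter_attr`, `open_counter`, and `count_instructions` are hypothetical, and error handling is reduced to a boolean for brevity.

```cpp
#include <cstdint>
#include <cstring>
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/// Describes a counter for retired instructions in user space, initially disabled.
perf_event_attr make_instruction_counter_attr()
{
  perf_event_attr attr;
  std::memset(&attr, 0, sizeof(attr));
  attr.type = PERF_TYPE_HARDWARE;
  attr.size = sizeof(attr);
  attr.config = PERF_COUNT_HW_INSTRUCTIONS;
  attr.disabled = 1;
  attr.exclude_kernel = 1;
  attr.exclude_hv = 1;
  return attr;
}

/// glibc provides no wrapper, so the system call is issued via syscall().
/// pid = 0 and cpu = -1 measure the calling thread on any CPU.
int open_counter(perf_event_attr& attr)
{
  return static_cast<int>(syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL));
}

/// Runs the workload and stores the number of counted instructions in `count`.
/// Returns false if the counter could not be opened or read (e.g., missing permissions).
template<typename Workload>
bool count_instructions(Workload&& workload, std::uint64_t& count)
{
  perf_event_attr attr = make_instruction_counter_attr();
  const int file_descriptor = open_counter(attr);
  if (file_descriptor < 0)
  {
    return false;
  }

  ioctl(file_descriptor, PERF_EVENT_IOC_RESET, 0);
  ioctl(file_descriptor, PERF_EVENT_IOC_ENABLE, 0);
  workload();
  ioctl(file_descriptor, PERF_EVENT_IOC_DISABLE, 0);

  const ssize_t bytes_read = read(file_descriptor, &count, sizeof(count));
  close(file_descriptor);
  return bytes_read == static_cast<ssize_t>(sizeof(count));
}
```

Whether the call succeeds depends on the `perf_event_paranoid` setting and on hardware counter availability (for example, inside some virtual machines), so callers should treat a `false` result as an expected outcome.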
This is a non-exhaustive list of academic research papers and blog articles (feel free to add to it, including your own work, e.g., via pull request).
- Quantitative Evaluation of Intel PEBS Overhead for Online System-Noise Analysis (2017)
- Analyzing memory accesses with modern processors (2020)
- Precise Event Sampling on AMD Versus Intel: Quantitative and Qualitative Comparison (2023)
- Multi-level Memory-Centric Profiling on ARM Processors with ARM SPE (2024)
- Breaking the Cycle - A Short Overview of Memory-Access Sampling Differences on Modern x86 CPUs (2025)