jmuehlig/perf-cpp

Lightweight recording and sampling of performance counters for specific code segments directly from your C++ application.

License

LGPL-3.0 license

perf-cpp: Effortless Hardware Performance Monitoring for C++ Applications

Quick Start | How to Build | Documentation | System Requirements

perf-cpp embeds Linux's hardware performance monitoring directly into your code, letting you profile exactly what matters and process the results in your application. Tools like Linux Perf, Intel® VTune™, and AMD uProf are powerful but monitor entire programs – and high-performance applications need surgical precision.

What can perf-cpp do?

Built around Linux's powerful perf subsystem, perf-cpp provides a clean interface for counting and sampling hardware events – without the complexity of low-level APIs.

- Measure exactly what you want – use performance counters to count hardware events, similar to perf stat, but around specific code paths instead of an entire binary (documentation).
- Calculate metrics such as cycles per instruction and cache miss to access ratio based on hardware events and timing (documentation).
- Access performance counters with low latency, without starting or stopping them, for micro-benchmarks or adaptive tuning (documentation).
- Record instruction and memory samples, just like perf [mem] record – but from inside your application (documentation).
- Correlate samples with data structures and symbols to generate per-class access statistics and flame graphs.
- Mix built-in events (e.g., cycles, instructions, cache misses, ...) with processor-specific counters (documentation).
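To give a feeling for what correlating samples with data structures means, the following sketch attributes sampled memory addresses to named address ranges. It is plain C++ and does not use the perf-cpp API: `ObjectRange` and `count_accesses_per_object` are hypothetical names chosen for illustration, and in practice the addresses would come from recorded memory samples.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

/// Hypothetical description of a data object (not a perf-cpp type):
/// a name and the half-open address range [begin, end) it occupies.
struct ObjectRange
{
    std::string name;
    std::uintptr_t begin;
    std::uintptr_t end;
};

/// Counts, for every object, how many sampled addresses fall into its range.
/// Addresses that match no object are counted under "<unknown>".
std::map<std::string, std::uint64_t>
count_accesses_per_object(const std::vector<ObjectRange>& objects,
                          const std::vector<std::uintptr_t>& sampled_addresses)
{
    auto counts = std::map<std::string, std::uint64_t>{};
    for (const auto address : sampled_addresses)
    {
        auto name = std::string{"<unknown>"};
        for (const auto& object : objects)
        {
            if (address >= object.begin && address < object.end)
            {
                name = object.name;
                break;
            }
        }
        ++counts[name];
    }
    return counts;
}
```

A linear scan over the ranges keeps the sketch short; with many objects, a sorted container with binary search would likely be the better choice.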

See various practical examples and the documentation for more details.

Quick Start

Record Hardware Event Statistics

Recording hardware event statistics operates much like perf stat: it quantifies critical events, such as executed instructions, CPU cycles, and cache misses, throughout a code segment's execution.

```cpp
#include <perfcpp/event_counter.h>
#include <iostream>

/// Initialize the counter
auto event_counter = perf::EventCounter{};

/// Specify hardware events to count
event_counter.add({"seconds", "instructions", "cycles", "cache-misses"});

/// Run the workload
event_counter.start();
code_to_profile(); /// <-- Statistics are recorded during execution
event_counter.stop();

/// Print the result to the console
const auto result = event_counter.result();
for (const auto [event_name, value] : result)
{
    std::cout << event_name << ": " << value << std::endl;
}
```

Possible output:

```
seconds:      0.0955897
instructions: 5.92087e+07
cycles:       4.70254e+08
cache-misses: 1.35633e+07
```
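Counts like these are often easier to interpret as ratios. The sketch below shows the arithmetic behind a metric such as cycles per instruction; it is a hand-written helper (the name `ratio` is an assumption made for this example) and not the metric interface of perf-cpp, which is covered in the metrics documentation.

```cpp
#include <optional>

/// Divides one counter value by another, e.g., cycles by instructions (CPI)
/// or cache misses by cache accesses.
/// Returns std::nullopt if the divisor is zero, i.e., the ratio is undefined.
std::optional<double> ratio(const double dividend, const double divisor)
{
    if (divisor == 0.0)
    {
        return std::nullopt;
    }
    return dividend / divisor;
}
```

With the counts printed above, `ratio(4.70254e+08, 5.92087e+07)` gives roughly 7.94 cycles per instruction.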

Note

For additional insights, please refer to the guides on recording event statistics and event statistics on multiple CPUs/threads. Also, check out the hardware events documentation for details on both built-in and processor-specific events.
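If the measured region has several exit paths, pairing `start()` and `stop()` by hand can become error-prone. One possible pattern is a small RAII guard, sketched below. `ScopedMeasurement` is a hypothetical helper and not part of perf-cpp; it only assumes a type with `start()` and `stop()` member functions, as used in the Quick Start example, and it assumes that `stop()` does not throw.

```cpp
/// Hypothetical RAII helper (not part of perf-cpp): starts a counter on
/// construction and stops it when the enclosing scope ends.
/// Works with any type that offers start() and stop() member functions.
/// Assumption: stop() does not throw, since it is called from a destructor.
template<typename Counter>
class ScopedMeasurement
{
public:
    explicit ScopedMeasurement(Counter& counter)
        : _counter(counter)
    {
        _counter.start();
    }

    ~ScopedMeasurement() { _counter.stop(); }

    /// The guard owns one start/stop pair, so it must not be copied.
    ScopedMeasurement(const ScopedMeasurement&) = delete;
    ScopedMeasurement& operator=(const ScopedMeasurement&) = delete;

private:
    Counter& _counter;
};
```

A measured region could then look like `{ ScopedMeasurement measurement{event_counter}; code_to_profile(); }`, with the results read from the counter after the scope has ended.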

Record Samples

Recording samples functions much like perf [mem] record: it captures execution snapshots, e.g., the instruction pointer, executing CPU, and timestamp, at regular intervals (here every 50,000th CPU cycle).

```cpp
#include <perfcpp/sampler.h>
#include <iostream>

/// Create the sampler
auto sampler = perf::Sampler{};

/// Specify when a sample is recorded: every 50,000th cycle
sampler.trigger("cycles", perf::Period{50000U});

/// Specify what data is included into a sample: time, CPU ID, instruction
sampler.values()
    .timestamp(true)
    .cpu_id(true)
    .instruction_pointer(true);

/// Run the workload
sampler.start();
code_to_profile(); /// <-- Samples are recorded during execution
sampler.stop();

/// Print the samples to the console
const auto samples = sampler.result();
for (const auto& record : samples)
{
    const auto timestamp = record.metadata().timestamp().value();
    const auto cpu_id = record.metadata().cpu_id().value();
    const auto instruction = record.instruction_execution().logical_instruction_pointer().value();

    std::cout
        << "Time = " << timestamp << " | CPU = " << cpu_id
        << " | Instruction = 0x" << std::hex << instruction << std::dec
        << std::endl;
}
```

Possible output:

```
Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c
```
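Individual samples become more useful once they are aggregated, for instance by counting how often each instruction pointer was hit. The sketch below illustrates that step on a simplified record; `SimpleSample` and `count_by_instruction` are stand-ins invented for this example, not perf-cpp types, so the values would first have to be copied out of the recorded samples.

```cpp
#include <cstdint>
#include <map>
#include <vector>

/// Simplified stand-in for a recorded sample (not a perf-cpp type).
struct SimpleSample
{
    std::uint64_t timestamp;
    std::uint32_t cpu_id;
    std::uint64_t instruction_pointer;
};

/// Counts how often each instruction pointer occurs in the given samples.
/// Frequently hit addresses point to the hottest instructions.
std::map<std::uint64_t, std::uint64_t> count_by_instruction(const std::vector<SimpleSample>& samples)
{
    auto histogram = std::map<std::uint64_t, std::uint64_t>{};
    for (const auto& sample : samples)
    {
        ++histogram[sample.instruction_pointer];
    }
    return histogram;
}
```

For the four samples printed above, each of the two addresses is counted twice.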

Note

For additional details, such as the types of data that can be included in samples, please consult the sampling guide. Additionally, consult the sampling on multiple CPUs/threads guide for instructions on parallel sampling.

More Examples

We include a collection of examples demonstrating the functionality and interface of perf-cpp in the examples/ directory, including

- examples for counting hardware events (examples/statistics)
- and for sampling (examples/sampling).

Building

perf-cpp is designed as a library (static or shared) that can be linked to your application.

```bash
# Clone the repository
git clone https://github.com/jmuehlig/perf-cpp.git

# Switch to the repository folder
cd perf-cpp

# Optional: Check out a specific release (here: v0.12.4)
git checkout v0.12.4

# Build the library (in build/)
#  -DBUILD_EXAMPLES=1        compiles all examples (optional)
#  -DBUILD_LIB_SHARED=1      creates the library as a shared one (optional)
#  -DGEN_PROCESSOR_EVENTS=1  generates and compiles a .cpp file that adds events specific to the underlying CPU (optional)
cmake . -B build -DBUILD_EXAMPLES=1
cmake --build build

# Optional: Build examples (in build/examples/bin) if -DBUILD_EXAMPLES=1
cmake --build build --target examples
```

Note

Further information and detailed building instructions (e.g., how to integrate into CMake projects) are available in the building guide.

Full Documentation

- Building: Integrate perf-cpp seamlessly into your C++ projects.
- Counting Performance Events
  - Basics: Master recording hardware event statistics directly within your application.
  - Parallel and Multithreaded: Learn how to monitor events across threads and CPU cores.
  - Metrics: Learn how to combine hardware events into meaningful metrics for clearer performance insights.
  - Live Access: See how events can be accessed without stopping the recording, ideal for profiling tight loops and small functions.
- Recording Samples
  - Basics: Understand sampling mechanisms, which data to record, and how to access the results.
  - Parallel and Multithreaded: Learn how to record samples in multithreaded workloads.
  - Use the Linux Perf Tool to Analyze Recorded Samples: See how samples recorded via perf-cpp can be analyzed with perf [mem] report.
  - Translating Instruction Pointers into Symbols and Samples into Flame Graphs: See how to translate instruction pointers into function names and prepare sampling results for conversion into flame graphs (e.g., using FlameGraph).
  - Analyzing Memory Access Patterns: See how to link memory sampling data to specific data objects to profile detailed memory access characteristics.
- Built-in and Hardware-specific Events: Discover built-in events and learn how to define new ones tailored to your hardware.
- Perf Paranoid: Learn how to configure perf permissions.
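As background for the flame graph guide listed above: flame graph tools such as FlameGraph typically consume "folded" stacks, i.e., one line per unique call stack with the frames joined by semicolons and followed by a sample count. The sketch below shows that folding step in plain C++. The function names are hypothetical, and the input is assumed to be call stacks that were already translated into symbol names; how perf-cpp exposes symbols is described in the guide itself.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

/// Joins each call stack (outermost frame first) with ';' and counts
/// how often every identical stack occurs. Empty stacks are skipped.
std::map<std::string, std::uint64_t> fold_stacks(const std::vector<std::vector<std::string>>& stacks)
{
    auto folded = std::map<std::string, std::uint64_t>{};
    for (const auto& stack : stacks)
    {
        auto line = std::string{};
        for (const auto& frame : stack)
        {
            if (!line.empty())
            {
                line += ';';
            }
            line += frame;
        }
        if (!line.empty())
        {
            ++folded[line];
        }
    }
    return folded;
}

/// Renders the folded stacks as "frame;frame;frame count" lines.
std::string to_folded_text(const std::map<std::string, std::uint64_t>& folded)
{
    auto text = std::string{};
    for (const auto& [stack, count] : folded)
    {
        text += stack + " " + std::to_string(count) + "\n";
    }
    return text;
}
```

The resulting text can be written to a file and passed to a flame graph renderer.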

System Requirements

- Clang / GCC with support for C++17 features.
- CMake version 3.10 or higher.
- Linux Kernel 4.0 or newer (note that some features require a newer kernel).
- perf_event_paranoid setting: Adjust as needed to allow access to performance counters (see the Paranoid Value documentation).
- Python 3, if you make use of processor-specific hardware event generation.
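To check the perf_event_paranoid requirement from within a program, the current value can be read from `/proc/sys/kernel/perf_event_paranoid`, where Linux exposes the setting. The following sketch does just that; the function names are hypothetical and not part of perf-cpp. Lower values are more permissive, and which value your use case needs is explained in the Paranoid Value documentation.

```cpp
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

/// Parses the integer found in the perf_event_paranoid file.
/// Returns std::nullopt if the text does not start with a number.
std::optional<int> parse_paranoid_level(const std::string& text)
{
    auto stream = std::istringstream{text};
    int level = 0;
    if (stream >> level)
    {
        return level;
    }
    return std::nullopt;
}

/// Reads the current setting from the given file.
/// Returns std::nullopt if the file is missing, unreadable, or malformed.
std::optional<int> read_paranoid_level(const std::string& path = "/proc/sys/kernel/perf_event_paranoid")
{
    auto file = std::ifstream{path};
    auto line = std::string{};
    if (!std::getline(file, line))
    {
        return std::nullopt;
    }
    return parse_paranoid_level(line);
}
```

An application could call `read_paranoid_level()` at startup and print a hint when the value is too restrictive for the events it wants to record.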

Contribute and Contact

We welcome contributions and feedback. For feature requests, feedback, or bug reports, please reach out via our issue tracker or submit a pull request.

Alternatively, you can email me: jan.muehlig@tu-dortmund.de.

Further PMU-related Projects

Below is a non-exhaustive list of some other valuable profiling projects:

- PAPI offers access not only to CPU performance counters but also to a variety of other hardware components including GPUs, I/O systems, and more.
- Likwid is a collection of several command line tools for benchmarking, including an extensive wiki.
- PerfEvent provides lightweight access to performance counters, facilitating streamlined performance monitoring.
- Intel's Instrumentation and Tracing Technology allows applications to manage the collection of trace data effectively when used in conjunction with Intel VTune Profiler.
- For those who prefer a more hands-on approach, the perf_event_open system call can be utilized directly without any wrappers.
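For comparison, this is roughly what the hands-on approach looks like when counting user-space instructions of the calling thread with the perf_event_open system call. It is a minimal, Linux-only sketch with little error handling; the function names are made up for this example, and the call may fail (returning `std::nullopt` here) when the paranoid setting or the environment, e.g., a virtual machine without PMU access, does not allow it.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cstdint>
#include <optional>

/// Describes a counter for user-space instructions that is created disabled.
perf_event_attr make_instruction_counter_attr()
{
    perf_event_attr attr{};
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(perf_event_attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;       /// Do not count until explicitly enabled
    attr.exclude_kernel = 1; /// Count user space only
    attr.exclude_hv = 1;     /// Exclude the hypervisor
    return attr;
}

/// Counts the instructions executed by the workload on the calling thread.
/// Returns std::nullopt if the counter cannot be opened or read.
template<typename Workload>
std::optional<std::uint64_t> count_instructions(Workload&& workload)
{
    auto attr = make_instruction_counter_attr();

    /// pid = 0 and cpu = -1: monitor the calling thread on any CPU; no group, no flags.
    const auto fd = static_cast<int>(syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL));
    if (fd < 0)
    {
        return std::nullopt;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    workload();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    std::uint64_t count = 0;
    const auto bytes_read = read(fd, &count, sizeof(count));
    close(fd);

    if (bytes_read != static_cast<ssize_t>(sizeof(count)))
    {
        return std::nullopt;
    }
    return count;
}
```

Even this small case needs attribute setup, file descriptor handling, and ioctl calls, which is the kind of boilerplate the wrappers listed above take care of.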

Resources about (Perf-) Profiling

This is a non-exhaustive list of academic research papers and blog articles (feel free to add to it, including your own work, e.g., via pull request).

Academic Papers

Blog Posts

- C2C - False Sharing Detection in Linux Perf (2016)
- PMU counters and profiling basics. (2018)
- Detect false sharing with Data Address Profiling. (2019)
- Advanced profiling topics. PEBS and LBR. (2018)


Releases: 24 (latest: v0.12.4, Oct 28, 2025)
