UDC-GAC/papi_wrapper

Simple wrapper for the most commonly used PAPI functions: set up events, start, stop and print counters within a region of interest. Within that region you can also define subregions in order to count them separately. The library thus simplifies setting up low-level features of PAPI such as domain, granularity or overflow.

PAPI's low-level API allows programmers to program different hardware counters while executing a program. However, programming those counters may add a lot of complex code to the original program, and configuring PAPI properly may be tedious. For these reasons we have created a set of macros that simplify the problem while configuring PAPI at compilation time, using either flags or files.

This interface works for multithreaded programs using OpenMP; in fact, this was the main reason for developing it. See the Usage and Options sections for further details.

Usage

You only need to rewrite your code as follows (the <papi.h> header files are already included in papi_wrapper):

    #include <papi_wrapper.h>
    ...
    pw_init_instruments;  /* initialize counters */
    pw_start_instruments; /* starts automatically PAPI counters */
    /* region of interest (ROI) to measure */
    pw_stop_instruments;  /* stop counting */
    pw_print_instruments; /* print results */

This way, you only need to add the header #include <papi_wrapper.h> to the source code and compile with -I/source/to/papi-wrapper papi_wrapper.c -lpapi. Another way to use this wrapper library, with subregions, could be:

    #include <papi_wrapper.h>
    ...
    pw_init_start_instruments_sub(2); /* initialize counters */
    /* region of interest (ROI) to measure */
    pw_begin_subregion(0);
    /* subregion 0 */
    pw_end_subregion(0);
    pw_begin_subregion(1);
    /* subregion 1 */
    pw_end_subregion(1);
    pw_stop_instruments; /* stop counting */
    pw_print_subregions; /* print results */

TL;DR: macros available in papi_wrapper:

  • pw_set_thread_report : set the thread to measure when running with a single thread.
  • pw_init_instruments : init PAPI and flush caches.
  • pw_init_start_instruments : wrapper for pw_init_instruments and pw_start_instruments.
  • pw_init_start_instruments_sub(n) : init the library and set the number of regions to measure.
  • pw_start_instruments : start counting.
  • pw_stop_instruments : stop counting.
  • pw_start_instruments_loop(n) : to use within a parallel loop, e.g. #pragma omp parallel for.
  • pw_stop_instruments_loop(n) : to use within a parallel loop, e.g. #pragma omp parallel for.
  • pw_begin_subregion(n) : start measuring region n.
  • pw_end_subregion(n) : stop measuring region n.
  • pw_print_instruments : print counters.
  • pw_print_subregions : print counters for each subregion measured.

For more examples, refer to the tests subdirectory. They can be executed with CTest.

Dependencies

Options

To differentiate PAPI macros from PAPI_wrapper macros, all PAPI_wrapper options begin with the prefix PW_ as a mnemonic.

High-level configuration parameters:

  • -DPW_THREAD_MONITOR - default value 0. Indicates the master thread if PW_MULTITHREAD is also enabled.
  • -DPW_MULTITHREAD - disabled by default. If not defined, only PW_THREAD_MONITOR will count events (only one thread). This option is not compatible with uncore events: it makes the PAPI library crash. Must be compiled with -fopenmp.
  • -DPW_VERBOSE - disabled by default. More text in the output and errors.
  • -DPW_CSV - disabled by default. Prints in CSV format using a comma (-DPW_CSV_SEPARATOR=",") as separator. The first row contains the thread number and the names of the hardware counters used; each following row contains a thread and its counter values.
  • -DPW_FILE - prints output to the file specified by -DPW_FILENAME=<file> (defaults to /tmp/__tmp_papi_wrapper.output) instead of standard output.
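To make the interaction between these flags more concrete, here is a minimal sketch in plain C that does not use PAPI or papi_wrapper at all. The helpers sketch_thread_is_measured and sketch_print_csv are hypothetical names invented for this illustration, the header label "thread" is an assumption, and the counter values are made up. It only mirrors the behavior described in the list: without PW_MULTITHREAD a single thread (PW_THREAD_MONITOR) is reported, and the CSV output has one header row followed by one row per reported thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Defaults as described in the options list; override with -D at compile time. */
#ifndef PW_THREAD_MONITOR
#define PW_THREAD_MONITOR 0
#endif
#ifndef PW_CSV_SEPARATOR
#define PW_CSV_SEPARATOR ","
#endif

/* Hypothetical helper (not part of papi_wrapper): nonzero if thread_id
 * would count events under the current compile-time flags. */
int sketch_thread_is_measured(int thread_id)
{
#ifdef PW_MULTITHREAD
    (void)thread_id;
    return 1; /* every thread counts events */
#else
    return thread_id == PW_THREAD_MONITOR; /* only the monitored thread */
#endif
}

/* Hypothetical helper: writes a header row, then one row per measured thread.
 * values is a row-major matrix of n_threads x n_counters.
 * Returns the number of rows written, header included. */
int sketch_print_csv(FILE *out, const char *const *names, size_t n_counters,
                     const long long *values, int n_threads)
{
    assert(out != NULL && names != NULL && values != NULL);
    int rows = 0;

    fprintf(out, "thread"); /* assumed label for the thread column */
    for (size_t c = 0; c < n_counters; c++)
        fprintf(out, "%s%s", PW_CSV_SEPARATOR, names[c]);
    fputc('\n', out);
    rows++;

    for (int t = 0; t < n_threads; t++) {
        if (!sketch_thread_is_measured(t))
            continue; /* this thread did not count events */
        fprintf(out, "%d", t);
        for (size_t c = 0; c < n_counters; c++)
            fprintf(out, "%s%lld", PW_CSV_SEPARATOR,
                    values[(size_t)t * n_counters + c]);
        fputc('\n', out);
        rows++;
    }
    return rows;
}
```

Called with the counter names PAPI_TOT_INS and PAPI_TOT_CYC and two threads' worth of values, the sketch built without any -D flag writes the header plus a single row for thread 0. Built with -DPW_MULTITHREAD it writes one row per thread.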

Low-level configuration parameters (refer to PAPI for further information):

  • -DPW_GRN=<granularity> - default value PAPI_GRN_MIN.
  • -DPW_DOM=<domain> - default value PAPI_DOM_ALL.
  • -DPW_SAMPLING - disabled by default. Enables sampling for all the events specified in PW_FLIST with thresholds specified in PW_FSAMPLE.

Configuration files (see their format in circleci/circleci-docs/tree/teesloane-patch-5

Implementation details

PAPI wrapper may be precompiled and linked to your executable or compiled directly with your sources. PAPI wrapper basically initializes a PAPI event set for each counter specified in PW_FLIST. If multithreading is enabled, all threads count events individually and simultaneously, but one counter at a time. Multiplexing is an experimental feature in PAPI that should be avoided, since in PAPI's discussions there is some skepticism regarding its reliability.

Overhead

The overhead of a library can be considered in terms of execution time or memory. In either case, the sources of overhead are:

  • Costs of the PAPI library: initializing the library, starting counters and stopping them. For further details refer to the papi_cost utility.
  • Costs of PAPI wrapper: iterating over the list of events to measure. Each counter is measured individually, so there is no concurrency at all when it comes to measuring different events. There is only concurrency when measuring different threads: they are all measured at the same time.
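The cost of iterating over the event list can be pictured with a small stand-alone sketch. It assumes, in the style of PolyBench (on which this library is based), that the region of interest is executed once per event in the list. That assumption is not confirmed by the text above, and the names sketch_measure_one_at_a_time, sketch_roi_fn and sketch_count_call are hypothetical. No PAPI call is made; the comments mark where a real implementation would start and stop a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type standing in for the region of interest (ROI). */
typedef void (*sketch_roi_fn)(void *ctx);

/* Example ROI used for illustration: counts how many times it was executed. */
void sketch_count_call(void *ctx)
{
    int *calls = (int *)ctx;
    (*calls)++;
}

/* Hypothetical helper (not part of papi_wrapper): measures one event at a
 * time by running the ROI once per event. Returns the number of ROI runs. */
int sketch_measure_one_at_a_time(const char *const *events, size_t n_events,
                                 sketch_roi_fn roi, void *ctx)
{
    assert(events != NULL && roi != NULL);
    int runs = 0;

    for (size_t e = 0; e < n_events; e++) {
        if (events[e] == NULL)
            continue; /* skip empty slots in the list */
        /* placeholder: start the hardware counter for events[e] */
        roi(ctx);
        /* placeholder: stop the counter and store its value */
        runs++;
    }
    return runs;
}
```

Under this assumption, the total run time grows linearly with the number of events in the list: a list of three events runs the ROI three times, each run paying the PAPI start and stop costs once.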

Known issues

List of known issues when testing:

  • Do not mix uncore and non-uncore events in the list: undefined behavior.
  • Uncore events must explicitly have the :cpu=X flag on them.
  • With major version 1.0.0, PAPI wrapper introduces subregions, which permit measuring different regions of code simultaneously and individually. Nonetheless, if the cost of the measured region or subregion is an order of magnitude lower than the cost of the PAPI library itself (refer to Overhead), the result will have too much noise.

Versions and changelog

Refer to the Releases webpage. Versioning follows Semantic Versioning 2.0.0.

Contact

Maintainer:

  • Marcos Horro (marcos.horro (at) udc.gal)

Authors:

  • Marcos Horro
  • Dr. Gabriel Rodríguez

Credits

This version is based on PolyBench, under GPLv2 license.

License

MIT License.

