Tensor Contraction Code Generator


HPAC/tccg


The Tensor Contraction Code Generator (TCCG) generates high-performance (parallel and) vectorized C code for tensor contractions.

From a computational perspective, tensors can be interpreted as higher-dimensional matrices or simply as multidimensional arrays; likewise, tensor contractions are a generalization of matrix-matrix multiplication to higher dimensions. For instance, A[i,k], B[k,j] and C[i,j] denote two-dimensional tensors (i.e., matrices), and C[i,j] = A[i,k] * B[k,j] represents a tensor contraction where the sum over 'k' as well as the loops over 'i' and 'j' are implicit. Further examples of tensor contractions are: C[i0,j0,j1] = A[i0,k0] * B[j1,k0,j0]; C[i0,j0,j1,i1] = A[i0,k0,i1] * B[j1,k0,j0]; C[i0,j0,j1,i1] = A[k0,i0,k1,i1] * B[k1,j1,k0,j0].
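
The notation above maps directly onto numpy's einsum, which makes the implicit sums and loops explicit. This is a reference sketch for illustration only (the extents 3, 4, 5, 6 are arbitrary choices, not from the text):

```python
import numpy as np

# C[i,j] = A[i,k] * B[k,j]: the implicit sum over k is an ordinary
# matrix-matrix multiplication (GEMM).
i, j, k = 4, 5, 6
A = np.random.rand(i, k)
B = np.random.rand(k, j)
C = np.einsum("ik,kj->ij", A, B)
assert np.allclose(C, A @ B)

# A higher-dimensional example from the text:
# C[i0,j0,j1] = A[i0,k0] * B[j1,k0,j0], using a=i0, b=k0, c=j1, d=j0.
i0, j0, j1, k0 = 3, 4, 5, 6
A2 = np.random.rand(i0, k0)       # A[i0,k0]
B2 = np.random.rand(j1, k0, j0)   # B[j1,k0,j0]
C2 = np.einsum("ab,cbd->adc", A2, B2)
print(C2.shape)  # (3, 4, 5), i.e. (i0, j0, j1)
```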

Current version: v0.1.2

Key Features


  • TCCG generates high-performance vectorized C code
  • TCCG generates code based on three different approaches:
    • GEMM-like Tensor-Tensor Multiplication (GETT): This novel approach to tensor contractions is at the core of our latest publication (see below).
    • Transpose-Transpose-GEMM-Transpose (TTGT)
    • Loops-over-GEMM (LoG)
  • Shared-memory parallelism
    • TTGT, LoG, GETT
  • Support for single- and double-precision
  • Auto-Fine-Tuning:
    • Automatically explores a search space of promising implementation candidates
    • The fastest candidate will be selected and returned automatically
    • A performance model guides the search
    • The search space can be limited by the user (via the --maxImplementations=N command line argument)
  • Support for multiple instruction sets:
    • AVX2: GETT, TTGT, LoG
    • AVX512: GETT, TTGT, LoG (experimental)
    • CUDA: TTGT, LoG
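
The TTGT approach in particular is easy to illustrate: transpose (and reshape) the inputs so that the contraction becomes a single GEMM, then transpose the result back. The following is a minimal numpy sketch of that idea for one of the example contractions, not TCCG's actual generated code:

```python
import numpy as np

def ttgt(A, B):
    """Sketch of Transpose-Transpose-GEMM-Transpose for
    C[i0,j0,j1] = A[i0,k0] * B[j1,k0,j0] (illustration only)."""
    i0, k0 = A.shape
    j1, _, j0 = B.shape
    # Transpose B so the contracted index k0 comes first, then flatten
    # the free indices (j0, j1) into a single matrix dimension.
    Bmat = B.transpose(1, 2, 0).reshape(k0, j0 * j1)   # (k0, j0*j1)
    # A single GEMM now performs the contraction over k0.
    Cmat = A @ Bmat                                    # (i0, j0*j1)
    # Un-flatten back to the requested index order (i0, j0, j1).
    return Cmat.reshape(i0, j0, j1)

A = np.random.rand(3, 6)      # A[i0,k0]
B = np.random.rand(5, 6, 4)   # B[j1,k0,j0]
C = ttgt(A, B)
assert np.allclose(C, np.einsum("ab,cbd->adc", A, B))
```

The transpositions are exactly the overhead that GETT avoids, as discussed in the next section.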

Advantages of GETT


GETT's advantages are manifold:

  • GETT-based code is fully vectorized and exploits the cache hierarchy.
  • Sub-tensors are packed into the caches as needed. Thus, GETT avoids the explicit transposition overhead incurred by TTGT.
  • The stride-one index is preserved while packing the sub-tensors into a specified level of the cache hierarchy.
  • No additional workspace is required (except for small buffers which fit into the caches).
  • The arithmetic intensity is retained for any given tensor contraction.

While GETT exhibits excellent performance across a wide range of tensor contractions, its performance for bandwidth-bound tensor contractions is especially outstanding.

For further information, please see our paper.

Requirements


In order to use TCCG, a working C compiler, a BLAS library (e.g., Intel's MKL), and the High-Performance Tensor Transposition (HPTT) library are required:

  • Intel's ICC (>= v15.0, recommended) or g++ (>= v4.8, experimental)
  • Some BLAS library (e.g., BLIS, ATLAS)
  • High-Performance Tensor Transposition (HPTT) library
  • Python (tested with v2.7.5 and v2.7.9)
  • Tensor Contraction Library (TCL) (optional)

Install


  1. Clone the repository into a desired directory and change to that location:

    git clone https://github.com/HPAC/tccg.git
    cd tccg

  2. Install TCCG:

    python setup.py install --user

  3. Export the TCCG_ROOT environment variable (add to your .bashrc):

    export TCCG_ROOT=$(pwd)

  4. Set up your BLAS library within $TCCG_ROOT/config.cfg (default: mkl).

  5. You might have to add the installed location to your PATH environment variable:

    export PATH=$PATH:~/.local/bin

Getting Started


Please run tccg --help to get an overview of TCCG's parameters.

Here is an exemplary input file to TCCG:

C[a,b,i,j] = A[i,m,a] * B[m,j,b]
a = 24
b = 24
i = 24
j = 24
m = 24

TCCG command line arguments:

tccg --arch=avx2 --numThreads=1 --floatType=s example.tccg
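
The code that TCCG generates for this example can be sanity-checked against a straightforward reference implementation. A minimal one with numpy's einsum (a validation aid only, not part of TCCG):

```python
import numpy as np

# Reference for the example contraction above,
# C[a,b,i,j] = A[i,m,a] * B[m,j,b], with all extents set to 24.
n = 24
A = np.random.rand(n, n, n)   # A[i,m,a]
B = np.random.rand(n, n, n)   # B[m,j,b]
C = np.einsum("ima,mjb->abij", A, B)

# Spot-check one entry against the explicit sum over m.
expected = sum(A[0, m, 0] * B[m, 0, 0] for m in range(n))
assert np.isclose(C[0, 0, 0, 0], expected)
print(C.shape)  # (24, 24, 24, 24)
```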

Further examples (.tccg files) can be generated via:

python benchmark/benchmark.py

Benchmark


TCCG provides a benchmark for tensor contractions.

python benchmark.py

This will generate the input files (.tccg) for TCCG for each of the test cases within the benchmark. The tensor contractions within the benchmark are collected from four different publications to cover a broad range of use cases (see paper, Sec. 7.1); this being said, we don't claim that this benchmark is exhaustive in any sense. If you think that the benchmark is missing certain tensor contractions or sizes, please feel free to contribute to the benchmark.

Since this benchmark may evolve over time and to make comparisons easier, please refer to the current version of the benchmark.

Benchmark version: v0.1

Current Limitations of GETT


The product of the sizes corresponding to the free indices of each input tensor needs to be a multiple of 24. This limitation will be lifted in a future version of GETT.
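
This restriction can be checked up front. The helper below is hypothetical (not part of TCCG) and merely encodes the stated rule:

```python
from math import prod

# Hypothetical helper (not part of TCCG): the product of the free-index
# extents of each input tensor must be a multiple of 24 for GETT.
def gett_size_ok(free_extents_A, free_extents_B):
    return all(prod(e) % 24 == 0 for e in (free_extents_A, free_extents_B))

# For C[a,b,i,j] = A[i,m,a] * B[m,j,b] with all extents equal to 24,
# the free indices of A are (i, a) and those of B are (j, b):
print(gett_size_ok((24, 24), (24, 24)))  # True (576 is a multiple of 24)
```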

Citation


In case you want to refer to TCCG as part of a research paper, please cite the following article:

@article{tccg2016a,
  author        = {Paul Springer and Paolo Bientinesi},
  title         = {{Design of a high-performance GEMM-like Tensor-Tensor Multiplication}},
  archivePrefix = "arXiv",
  eprint        = {1607.00145},
  primaryClass  = "quant-ph",
  journal       = {CoRR},
  year          = {2016},
  issue_date    = {July 2016},
  url           = {http://arxiv.org/abs/1607.00145}
}

Changelog


V0.2.0:

Feedback & Contributions


We are happy about any feedback or feature requests. Please contact springer@aices.rwth-aachen.de.

We also welcome any contributions to the code base or the benchmark.
