flame/blis

BLAS-like Library Instantiation Software Framework

Recipient of the 2023 James H. Wilkinson Prize for Numerical Software

Recipient of the 2020 SIAM Activity Group on Supercomputing Best Paper Prize

The BLIS cat is sleeping.



Introduction

BLIS is an award-winning portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries. The framework was designed to isolate essential kernels of computation that, when optimized, immediately enable optimized implementations of most of its commonly used and computationally intensive operations. BLIS is written in ISO C99 and available under a new/modified/3-clause BSD license. While BLIS exports a new BLAS-like API, it also includes a BLAS compatibility layer which gives application developers access to BLIS implementations via traditional BLAS routine calls. An object-based API unique to BLIS is also available.
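For example, an application that already calls the BLAS needs no source changes at all. The sketch below is ours rather than taken from the BLIS documentation: it calls the conventional Fortran-style dgemm_ symbol, which the compatibility layer provides when the program is linked against BLIS. It assumes 32-bit BLAS integers and column-major storage; the BLAS integer size can differ depending on how BLIS was configured.

#include <stdio.h>

/* Prototype for the Fortran-style BLAS routine exported by the
   compatibility layer; 32-bit BLAS integers are assumed here. */
void dgemm_( const char* transa, const char* transb,
             const int* m, const int* n, const int* k,
             const double* alpha, const double* a, const int* lda,
             const double* b, const int* ldb,
             const double* beta, double* c, const int* ldc );

int main( void )
{
    int    m = 2, n = 2, k = 2;
    double alpha = 1.0, beta = 0.0;
    double a[4] = { 1, 2, 3, 4 };   /* 2x2, column-major */
    double b[4] = { 5, 6, 7, 8 };   /* 2x2, column-major */
    double c[4];

    /* c := alpha * a * b + beta * c, exactly as with any BLAS library. */
    dgemm_( "N", "N", &m, &n, &k, &alpha, a, &m, b, &m, &beta, c, &m );

    printf( "c = [ %g %g ; %g %g ]\n", c[0], c[2], c[1], c[3] );
    return 0;
}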

For a thorough presentation of our framework, please read our ACM Transactions on Mathematical Software (TOMS) journal article, "BLIS: A Framework for Rapidly Instantiating BLAS Functionality". For those who just want an executive summary, please see the Key Features section below.

In a follow-up article (also in ACM TOMS), "The BLIS Framework: Experiments in Portability", we investigate using BLIS to instantiate level-3 BLAS implementations on a variety of general-purpose, low-power, and multicore architectures.

An IPDPS'14 conference paper titled "Anatomy of High-Performance Many-Threaded Matrix Multiplication" systematically explores the opportunities for parallelism within the five loops that BLIS exposes in its matrix multiplication algorithm.

For other papers related to BLIS, please see the Citations section below.

It is our belief that BLIS offers substantial benefits in productivity when compared to conventional approaches to developing BLAS libraries, as well as a much-needed refinement of the BLAS interface, and thus constitutes a major advance in dense linear algebra computation. While BLIS remains a work-in-progress, we are excited to continue its development and further cultivate its use within the community.

The BLIS framework is primarily developed and maintained by individuals in the Science of High-Performance Computing (SHPC) group in the Oden Institute for Computational Engineering and Sciences at The University of Texas at Austin and in the Matthews Research Group at Southern Methodist University. Please visit the SHPC website for more information about our research group, such as a list of people and collaborators, funding sources, publications, and other educational projects (such as MOOCs).

Education and Learning

Want to understand what's under the hood? Many of the same concepts and principles employed when developing BLIS are introduced and taught in a basic pedagogical setting as part of LAFF-On Programming for High Performance (LAFF-On-PfHP), one of several massive open online courses (MOOCs) in the Linear Algebra: Foundations to Frontiers series, all of which are available for free via the edX platform.

What's New

  • Plugin feature now available! BLIS addons (see below) provide a way to quickly extend BLIS's operation support or define new custom BLIS APIs for your application. BLIS plugins extend this support to completely external code, needing only an installed BLIS package (no source required). BLIS plugins also allow users to define their own kernels and blocksizes, combined with the cross-architecture support provided by the BLIS framework. Finally, user plugins can utilize the new API for modifying the BLIS "control tree", which defines the mathematical operation to be computed as well as information controlling packing, partitioning, etc. Users can now modify the control tree to implement new linear algebra operations not already included in BLIS. See the documentation for an overview of these features and step-by-step guides for creating plugins and modifying the control tree to implement an example operation, "SYRKD".

  • BLIS selected for the 2023 James H. Wilkinson Prize for Numerical Software! We are thrilled to announce that Field Van Zee and Devin Matthews were chosen to receive the 2023 James H. Wilkinson Prize for Numerical Software. The selection committee sought to recognize the recipients "for the development of BLIS, a portable open-source software framework that facilitates rapid instantiation of high-performance BLAS and BLAS-like operations targeting modern CPUs." This prize is awarded once every four years to the authors of an outstanding piece of numerical software, or to individuals who have made an outstanding contribution to an existing piece of numerical software. It is awarded to an entry that best addresses all phases of the preparation of high-quality numerical software, and is intended to recognize innovative software in scientific computing and to encourage researchers in the earlier stages of their career. The prize will be awarded at the 2023 SIAM Conference on Computational Science and Engineering in Amsterdam.

  • Join us on Discord! In 2021, we soft-launched our Discord server by privately inviting current and former collaborators, attendees of our BLIS Retreat, as well as other participants within the BLIS ecosystem. We've been thrilled by the results thus far, and are happy to announce that our new community is now open to the broader public! If you'd like to hang out with other BLIS users and developers, ask a question, discuss future features, or just say hello, please feel free to join us! We've put together a step-by-step guide for creating an account and joining our cozy enclave. We even have a monthly "BLIS happy hour" event where people can casually come together for a video chat, Q&A, brainstorm session, or whatever it happens to unfold into!

  • Addons feature now available! Have you ever wanted to quickly extend BLIS's operation support or define new custom BLIS APIs for your application, but were unsure of how to add your source code to BLIS? Do you want to isolate your custom code so that it only gets enabled when the user requests it? Do you like sandboxes, but wish you didn't have to provide an implementation of gemm? If so, you should check out our new addons feature. Addons act like optional extensions that can be created, enabled, and combined to suit your application's needs, all without formally integrating your code into the core BLIS framework.

  • Multithreaded small/skinny matrix support for sgemm now available! Thanks to funding and hardware support from Oracle, we have now accelerated gemm for single-precision real matrix problems where one or two dimensions are exceedingly small. This work is similar to the gemm optimization announced last year. For now, we have only gathered performance results on an AMD Epyc Zen2 system, but we hope to publish additional graphs for other architectures in the future. You may find these Zen2 graphs via the PerformanceSmall document.

  • BLIS awarded SIAM Activity Group on Supercomputing Best Paper Prize for 2020! We are thrilled to announce that the paper that we internally refer to as the second BLIS paper,

    "The BLIS Framework: Experiments in Portability." Field G. Van Zee, Tyler Smith, Bryan Marker, Tze Meng Low, Robert A. van de Geijn, Francisco Igual, Mikhail Smelyanskiy, Xianyi Zhang, Michael Kistler, Vernon Austel, John A. Gunnels, Lee Killough. ACM Transactions on Mathematical Software (TOMS), 42(2):12:1--12:19, 2016.

    was selected for the SIAM Activity Group on Supercomputing Best Paper Prize for 2020. The prize is awarded once every two years to a paper judged to be the most outstanding paper in the field of parallel scientific and engineering computing, and has only been awarded once before (in 2016) since its inception in 2015 (the committee did not award the prize in 2018). The prize was awarded at the 2020 SIAM Conference on Parallel Processing for Scientific Computing in Seattle. Robert was present at the conference to give a talk on BLIS and accept the prize alongside other coauthors. The selection committee sought to recognize the paper, "which validates BLIS, a framework relying on the notion of microkernels that enables both productivity and high performance." Their statement continues, "The framework will continue having an important influence on the design and the instantiation of dense linear algebra libraries."

  • Multithreaded small/skinny matrix support for dgemm now available! Thanks to contributions made possible by our partnership with AMD, we have dramatically accelerated gemm for double-precision real matrix problems where one or two dimensions are exceedingly small. A natural byproduct of this optimization is that the traditional case of small m = n = k (i.e. square matrices) is also accelerated, even though it was not targeted specifically. And though only dgemm was optimized for now, support for other datatypes and/or other operations may be implemented in the future. We've also added new graphs to the PerformanceSmall document to showcase multithreaded performance when one or more matrix dimensions are small.

  • Performance comparisons now available! We recently measured the performance of various level-3 operations on a variety of hardware architectures, as implemented within BLIS and other BLAS libraries for all four of the standard floating-point datatypes. The results speak for themselves! Check out our extensive performance graphs and background info in our new Performance document.

  • BLIS is now in Debian Unstable! Thanks to Debian developer-maintainers M. Zhou and Nico Schlömer for sponsoring our package in Debian. Their participation, contributions, and advocacy were key to getting BLIS into the second-most popular Linux distribution (behind Ubuntu, which Debian packages feed into). The Debian tracker page may be found here.

  • BLIS now supports mixed-datatype gemm! The gemm operation may now be executed on operands of mixed domains and/or mixed precisions. Any combination of storage datatype for A, B, and C is now supported, along with a separate computation precision that can differ from the storage precision of A and B. And even the 1m method now supports mixed-precision computation. For more details, please see our ACM TOMS journal article submission (current draft).

  • BLIS now implements the 1m method. Let's face it: writing complex assembly gemm microkernels for a new architecture is never a priority--and now, it almost never needs to be. The 1m method leverages existing real domain gemm microkernels to implement all complex domain level-3 operations. For more details, please see our ACM TOMS journal article submission (current draft).

What People Are Saying About BLIS

"I noticed a substantial increase in multithreaded performance on my ownmachine, which was extremely satisfying." ..."[I was] happy it worked so well!" (Justin Shea)

"This is an awesome library." ..."I want to thank you and the blis team for your efforts." (@Lephar)

"Any time somebody outside Intel beats MKL by a nontrivial amount, I report it to the MKL team. It is fantastic for any open-source project to get within 10% of MKL... [T]his is why Intel funds BLIS development." (@jeffhammond)

"So BLIS is now a part of Elk." ..."We have found that zgemm applied to a 15000x15000 matrix with multi-threaded BLIS on a 32-core Ryzen 2990WX processor is about twice as fast as MKL" ..."I'm starting to like this a lot." (@jdk2016)

"I [found] BLIS because I was looking for BLAS operations on C-ordered arrays for NumPy. BLIS has that, but even better is the fact that it's developed in the open using a more modern language than Fortran." (@nschloe)

"The specific reason to have BLIS included [in Linux distributions] is the KNL and SKX [AVX-512] BLAS support, which OpenBLAS doesn't have." (@loveshack)

"All tests pass without errors on OpenBSD. Thanks!" (@ararslan)

"Thank you very much for your great help!... Looking forward to benchmarking." (@mrader1248)

"Thanks for the beautiful work." (@mmrmo)

"[M]y software currently uses BLIS for its BLAS interface..." (@ShadenSmith)

"[T]hanks so much for your work on this! Excited to test." ..."[On AMD Excavator], BLIS is competitive to / slightly faster than OpenBLAS for dgemms in my tests." (@iotamudelta)

"BLIS provided the only viable option on KNL, whose ecosystem is at present dominated by blackbox toolchains. Thanks again. Keep on this great work." (@heroxbd)

"I want to definitely try this out..." (@ViralBShah)

Key Features

BLIS offers several advantages over traditional BLAS libraries:

  • Portability that doesn't impede high performance. Portability was a top priority of ours when creating BLIS. With virtually no additional effort on the part of the developer, BLIS is configurable as a fully-functional reference implementation. But more importantly, the framework identifies and isolates a key set of computational kernels which, when optimized, immediately and automatically optimize performance across virtually all level-2 and level-3 BLIS operations. In this way, the framework acts as a productivity multiplier. And since the optimized (non-portable) code is compartmentalized within these few kernels, instantiating a high-performance BLIS library on a new architecture is a relatively straightforward endeavor.

  • Generalized matrix storage. The BLIS framework exports interfaces that allow one to specify both the row stride and column stride of a matrix. This allows one to compute with matrices stored in column-major order, row-major order, or by general stride. (This latter storage format is important for those seeking to implement tensor contractions on multidimensional arrays.) Furthermore, since BLIS tracks stride information for each matrix, operands of different storage formats can be used within the same operation invocation. By contrast, BLAS requires column-major storage. And while the CBLAS interface supports row-major storage, it does not allow mixing storage formats. (A small sketch of a strided gemm call appears after this list.)

  • Rich support for the complex domain. BLIS operations are developed and expressed in their most general form, which is typically in the complex domain. These formulations then simplify elegantly down to the real domain, with conjugations becoming no-ops. Unlike the BLAS, all input operands in BLIS that allow transposition and conjugate-transposition also support conjugation (without transposition), which obviates the need for thread-unsafe workarounds. Also, where applicable, both complex symmetric and complex Hermitian forms are supported. (BLAS omits some complex symmetric operations, such as symv, syr, and syr2.) Another great example of BLIS serving as a portability lever is its implementation of the 1m method for complex matrix multiplication, a novel mechanism of providing high-performance complex level-3 operations using only real domain microkernels. This new innovation guarantees automatic level-3 support in the complex domain even when the kernel developers entirely forgo writing complex kernels.

  • Advanced multithreading support. BLIS allows multiple levels of symmetric multithreading for nearly all level-3 operations. (Currently, users may choose to obtain parallelism via OpenMP, POSIX threads, or HPX.) This means that matrices may be partitioned in multiple dimensions simultaneously to attain scalable, high-performance parallelism on multicore and many-core architectures. The key to this innovation is a thread-specific control tree infrastructure which encodes information about the logical thread topology and allows threads to query and communicate data amongst one another. BLIS also employs so-called "quadratic partitioning" when computing dimension sub-ranges for each thread, so that arbitrary diagonal offsets of structured matrices with unreferenced regions are taken into account to achieve proper load balance. More recently, BLIS introduced a runtime abstraction to specify parallelism on a per-call basis, which is useful for applications that want to handle most of the parallelism themselves. (A sketch of a per-call threading request appears after this list.)

  • Ease of use. The BLIS framework, and the library of routines it generates, are easy to use for end users, experts, and vendors alike. An optional BLAS compatibility layer provides application developers with backwards compatibility to existing BLAS-dependent codes. Or, one may adjust or write their application to take advantage of new BLIS functionality (such as generalized storage formats or additional complex operations) by calling one of BLIS's native APIs directly. BLIS's typed API will feel familiar to many veterans of BLAS since these interfaces use BLAS-like calling sequences. And many will find BLIS's object-based APIs a delight to use when customizing or writing their own BLIS operations. (Objects are relatively lightweight structs and are passed by address, which helps tame function calling overhead.) A small object API sketch appears after this list.

  • Multilayered API and exposed kernels. The BLIS framework exposes its implementations in various layers, allowing expert developers to access exactly the functionality desired. This layered interface includes that of the lowest-level kernels, for those who wish to bypass the bulk of the framework. Optimizations can occur at various levels, in part thanks to exposed packing and unpacking facilities, which by default are highly parameterized and flexible.

  • Functionality that grows with the community's needs. As its name suggests, the BLIS framework is not a single library or static API, but rather a nearly-complete template for instantiating high-performance BLAS-like libraries. Furthermore, the framework is extensible, allowing developers to leverage existing components to support new operations as they are identified. If such operations require new kernels for optimal efficiency, the framework and its APIs will be adjusted and extended accordingly. Community developers who wish to experiment with creating new operations or APIs in BLIS can quickly and easily do so via the Addons feature.

  • Code re-use. Auto-generation approaches to achieving the aforementioned goals tend to quickly lead to code bloat due to the multiple dimensions of variation supported: operation (i.e. gemm, herk, trmm, etc.); parameter case (i.e. side, [conjugate-]transposition, upper/lower storage, unit/non-unit diagonal); datatype (i.e. single-/double-precision real/complex); matrix storage (i.e. row-major, column-major, generalized); and algorithm (i.e. partitioning path and kernel shape). These "brute force" approaches often consider and optimize each operation or case combination in isolation, which is less than ideal when the goal is to provide entire libraries. BLIS was designed to be a complete framework for implementing basic linear algebra operations, but supporting this vast amount of functionality in a manageable way required a holistic design that employed careful abstractions, layering, and recycling of generic (highly parameterized) codes, subject to the constraint that high performance remain attainable.

  • A foundation for mixed domain and/or mixed precision operations. BLIS was designed with the hope of one day allowing computation on real and complex operands within the same operation. Similarly, we wanted to allow mixing operands' numerical domains, floating-point precisions, or both domain and precision, and to optionally compute in a precision different than one or both operands' storage precisions. This feature has been implemented for the general matrix multiplication (gemm) operation, providing 128 different possible type combinations, which, when combined with existing transposition, conjugation, and storage parameters, enables 55,296 different gemm use cases. For more details, please see the documentation on mixed datatype support and/or our ACM TOMS journal paper on mixed-domain/mixed-precision gemm (linked below). (The object API sketch after this list also mixes storage datatypes.)
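To make the generalized storage support concrete, here is a minimal sketch using the typed API (our own example, not taken from the BLIS documentation): every matrix argument is passed with an explicit row stride and column stride, so a row-major A, a column-major B, and a row-major C can be combined in a single call.

#include "blis.h"

int main( void )
{
    dim_t  m = 3, n = 4, k = 2;
    double alpha = 1.0, beta = 0.0;

    double a[ 3 * 2 ];   /* m x k, row-major:    row stride k, column stride 1 */
    double b[ 2 * 4 ];   /* k x n, column-major: row stride 1, column stride k */
    double c[ 3 * 4 ];   /* m x n, row-major:    row stride n, column stride 1 */

    /* Fill a and b with random values using BLIS utility routines. */
    bli_drandv( m * k, a, 1 );
    bli_drandv( k * n, b, 1 );

    /* c := alpha * a * b + beta * c, with per-operand strides. */
    bli_dgemm( BLIS_NO_TRANSPOSE, BLIS_NO_TRANSPOSE,
               m, n, k,
               &alpha, a, k, 1,
                       b, 1, k,
               &beta,  c, n, 1 );

    bli_dprintm( "c:", m, n, c, n, 1, "%5.2f", "" );
    return 0;
}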
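The per-call parallelism mentioned above is expressed through a runtime object (rntm_t). The following sketch assumes the expert typed API and the rntm_t accessors described in the Multithreading document; it requests four threads for one gemm invocation without touching any global or environment-based threading settings.

#include "blis.h"

void gemm_with_four_threads( dim_t m, dim_t n, dim_t k,
                             double* alpha, double* a, inc_t rsa, inc_t csa,
                                            double* b, inc_t rsb, inc_t csb,
                             double* beta,  double* c, inc_t rsc, inc_t csc )
{
    /* Start from a default-initialized runtime object ... */
    rntm_t rntm = BLIS_RNTM_INITIALIZER;

    /* ... and request a total of four threads for this call only. */
    bli_rntm_set_num_threads( 4, &rntm );

    /* The expert interface takes an optional context (NULL selects the
       default) followed by the per-call runtime object. */
    bli_dgemm_ex( BLIS_NO_TRANSPOSE, BLIS_NO_TRANSPOSE,
                  m, n, k,
                  alpha, a, rsa, csa,
                         b, rsb, csb,
                  beta,  c, rsc, csc,
                  NULL, &rntm );
}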
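Finally, a minimal object API sketch, modeled loosely on the programs in examples/oapi and assuming mixed-datatype support is enabled in your build (it typically is by default): A is created in single precision while B and C are double precision, and bli_gemm() handles the mixing internally.

#include "blis.h"

int main( void )
{
    obj_t a, b, c;
    dim_t m = 4, n = 5, k = 3;

    /* Strides of 0/0 request the default (column-major) storage. */
    bli_obj_create( BLIS_FLOAT,  m, k, 0, 0, &a );
    bli_obj_create( BLIS_DOUBLE, k, n, 0, 0, &b );
    bli_obj_create( BLIS_DOUBLE, m, n, 0, 0, &c );

    bli_randm( &a );
    bli_randm( &b );
    bli_setm( &BLIS_ZERO, &c );

    /* c := 1.0 * a * b + 0.0 * c, using BLIS's global scalar constants. */
    bli_gemm( &BLIS_ONE, &a, &b, &BLIS_ZERO, &c );

    bli_printm( "c:", &c, "%5.2f", "" );

    bli_obj_free( &a );
    bli_obj_free( &b );
    bli_obj_free( &c );
    return 0;
}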

How to Download BLIS

There are a few ways to download BLIS. We list the four most common ways below. We highly recommend using either Option 1 or 2. Otherwise, we recommend Option 3 (over Option 4) so your compiler can perform optimizations specific to your hardware.

  1. Download a source repository with git clone. Generally speaking, we prefer using git clone to clone a git repository. Having a repository allows the user to periodically pull in the latest changes, try out release candidates when they become available, switch to older versions easily, and quickly rebuild BLIS whenever they wish. (Note that implicit in cloning a repository is that the repository defaults to using the master branch, which, as of 1.0, is considered akin to a development branch and likely contains improvements since the most recent release.)

    In order to clone a git repository of BLIS, please obtain a repository URL by clicking on the green button above the file/directory listing near the top of this page (as rendered by GitHub). Generally speaking, it will amount to executing the following command in your terminal shell:

    git clone https://github.com/flame/blis.git

    At this point, you will have the latest commit of the master branch checked out. If you wish to check out an official release version, say, 1.0, execute the following:

    git checkout 1.0

    git will then transform your working copy to match the state of the commit associated with version 1.0. You can view a list of official version tags at any time by executing:

    git tag --list

    Note that pre-release versions, such as release candidates, are actually branches rather than tags, and thus will not show up in the list of tagged versions.

  2. Download a source release via a tarball/zip file. If you would like to stick to the code that is included in official releases and don't need the convenience of pulling in the latest changes via git, you may download either a tarball or zip file of BLIS's latest release. (NOTE: Some older releases are only available as tagged commits. Also note that downloading release x.y.z is equivalent to downloading, or checking out, the git tag x.y.z.) We consider this option to be less than ideal for some people since you will not be able to update your code with a simple git pull command.

  3. Download a source repository via a zip file. If you are uncomfortable with using git but would still like the latest stable commits, we recommend that you download BLIS as a zip file.

    In order to download a zip file of the BLIS source distribution, please click on the green button above the file listing near the top of this page. This should reveal a link for downloading the zip file.

  4. Download a binary package specific to your OS. While we don't recommend this as the first choice for most users, we provide links to community members who generously maintain BLIS packages for various Linux distributions such as Debian Unstable and EPEL/Fedora. Please see the External Packages section below for more information.

Getting Started

NOTE: This section assumes you've either cloned a BLIS source code repository via git, downloaded the latest source code via a zip file, or downloaded the source code for a tagged version release---Options 1, 3, or 2, respectively, as discussed in the previous section.

If you just want to build a sequential (not parallelized) version of BLIS in a hurry and come back and explore other topics later, you can configure and build BLIS as follows:

$ ./configure auto
$ make [-j]

You can then verify your build by running BLAS- and BLIS-specific test drivers via make check:

$ make check [-j]

And if you would like to install BLIS to the directory specified to configure via the --prefix option, run the install target:

$ make install

Please read the output of ./configure --help for a full list of configure-time options. If/when you have time, we strongly encourage you to read the detailed walkthrough of the build system found in our Build System guide.
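As a quick sanity check that the library you just built or installed is usable from C, you can compile a tiny program like the sketch below (our own example, not part of the BLIS distribution); it simply queries BLIS's informational API for the version string. When linking against an installed copy, the header typically lives under <prefix>/include/blis and the library under <prefix>/lib, though the exact link line depends on your configuration (see the Build System guide).

#include <stdio.h>
#include "blis.h"

int main( void )
{
    /* bli_info_get_version_str() reports the version of the linked BLIS library. */
    printf( "linked against BLIS version %s\n", bli_info_get_version_str() );
    return 0;
}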

If you are still having trouble, you are welcome to join us on Discord for further information and/or assistance.

Example Code

The BLIS source distribution provides example code in the examples directory. Example code focuses on using BLIS APIs (not BLAS or CBLAS), and resides in two subdirectories: examples/oapi (which demonstrates the object API) and examples/tapi (which demonstrates the typed API).

Either directory contains several files, each containing various pieces of code that exercise core functionality of the BLIS API in question (object or typed). These example files should be thought of collectively like a tutorial, and therefore it is recommended to start from the beginning (the file whose name begins with 00).

You can build all of the examples by simply running make from either example subdirectory (examples/oapi or examples/tapi). (You can also run make clean.) The local Makefile assumes that you've already configured and built (but not necessarily installed) BLIS two directories up, in ../... If you have already installed BLIS to some permanent directory, you may refer to that installation by setting the environment variable BLIS_INSTALL_PATH prior to running make:

export BLIS_INSTALL_PATH=/usr/local; make

or by setting the same variable as part of the make command:

make BLIS_INSTALL_PATH=/usr/local

Once the executable files have been built, we recommend reading the code and the corresponding executable output side by side. This will help you see the effects of each section of code.

This tutorial is not exhaustive or complete; several object API functions were omitted (mostly for brevity's sake) and thus more examples could be written.

Documentation

We provide extensive documentation on the BLIS build system, APIs, test infrastructure, and other important topics. All documentation is formatted in markdown and included in the BLIS source distribution (usually in the docs directory). Slightly longer descriptions of each document may be found in the project's wiki section.

Documents for everyone:

  • Build System. This document covers the basics of configuring and building BLIS libraries, as well as related topics.

  • Testsuite. This document describes how to run BLIS's highly parameterized and configurable test suite, as well as the included BLAS test drivers.

  • BLIS Typed API Reference. Here we document the so-called "typed" (or BLAS-like) API. This is the API that many users who are already familiar with the BLAS will likely want to use.

  • BLIS Object API Reference. Here we document the object API. This API abstracts away properties of vectors and matrices within obj_t structs that can be queried with accessor functions. Many developers and experts prefer this API over the typed API.

  • Hardware Support. This document maintains a table of supported microarchitectures.

  • Multithreading. This document describes how to use the multithreading features of BLIS.

  • Mixed-Datatypes. This document provides an overview of BLIS's mixed-datatype functionality and provides a brief example of how to take advantage of this new code.

  • Extending BLIS functionality. This document provides an overview of BLIS's mechanisms for extending functionality through user-defined code. BLIS has a plugin infrastructure which allows users to define their own kernels, blocksizes, and kernel preferences which are compiled and managed by the BLIS framework. BLIS also provides an API for modifying the "control tree" which can be used to implement novel linear algebra operations.

  • Performance. This document reports empirically measured performance of a representative set of level-3 operations on a variety of hardware architectures, as implemented within BLIS and other BLAS libraries for all four of the standard floating-point datatypes.

  • PerformanceSmall. This document reports empirically measured performance of gemm on select hardware architectures within BLIS and other BLAS libraries when performing matrix problems where one or two dimensions are exceedingly small.

  • Discord. This document describes how to: create an account on Discord (if you don't already have one); obtain a private invite link; and use that invite link to join our BLIS server on Discord.

  • Release Notes. This document tracks a summary of changes included with each new version of BLIS, along with contributor credits for key features.

  • Frequently Asked Questions. If you have general questions about BLIS, please read this FAQ. If you can't find the answer to your question, please feel free to join the blis-devel mailing list and post a question. We also have a blis-discuss mailing list that anyone can post to (even without joining).

Documents for GitHub contributors:

  • Contributing bug reports, feature requests, PRs, etc. Interested in contributing to BLIS? Please read this document before getting started. It provides a general overview of how best to report bugs, propose new features, and offer code patches.

  • Coding Conventions. If you are interested in or planning on contributing code to BLIS, please read this document so that you can format your code in accordance with BLIS's standards.

Documents for BLIS developers:

  • Kernels Guide. If you would like to learn more about the types of kernels that BLIS exposes, their semantics, the operations that each kernel accelerates, and various implementation issues, please read this guide.

  • Configuration Guide. If you would like to learn how to add new sub-configurations or configuration families, or are simply interested in learning how BLIS organizes its configurations and kernel sets, please read this thorough walkthrough of the configuration system.

  • Addon Guide. If you are interested in learning about using BLIS addons--that is, enabling existing (or creating new) bundles of operation or API code that are built into a BLIS library--please read this document.

  • Sandbox Guide. If you are interested in learning about using sandboxes in BLIS--that is, providing alternative implementations of the gemm operation--please read this document.

Performance

We provide graphs that report performance of several implementations across a range of hardware types, multithreading configurations, problem sizes, operations, and datatypes. These pages also document most of the details needed to reproduce these experiments.

  • Performance. This document reports empirically measured performance of a representative set of level-3 operations on a variety of hardware architectures, as implemented within BLIS and other BLAS libraries for all four of the standard floating-point datatypes.

  • PerformanceSmall. This document reports empirically measured performance of gemm on select hardware architectures within BLIS and other BLAS libraries when performing matrix problems where one or two dimensions are exceedingly small.

External Packages

Generally speaking, we highly recommend building from source whenever possible using the latest git clone. (Tarballs of each tagged release are also available, but we consider them to be less ideal since they are not as easy to upgrade as git clones.)

That said, some users may prefer binary and/or source packages through their Linux distribution. Thanks to generous involvement/contributions from our community members, the following BLIS packages are now available:

  • Debian. M. Zhou has volunteered to sponsor and maintain BLIS packages within the Debian Linux distribution. The Debian package tracker can be found here. (Also, thanks to Nico Schlömer for previously volunteering his time to set up a standalone PPA.)

  • Gentoo. M. Zhou also maintains the BLIS package entry for Gentoo, a Linux distribution known for its source-based portage package manager and distribution system.

  • EPEL/Fedora. There are official BLIS packages in Fedora and EPEL (for RHEL7+ and compatible distributions) with versions for 64-bit integers, OpenMP, and pthreads, and shims which can be dynamically linked instead of reference BLAS. (NOTE: For architectures other than intel64, amd64, and maybe arm64, the performance of packaged BLIS will be low because it uses unoptimized generic kernels; for those architectures, OpenBLAS may be a better solution.) Dave Love provides additional packages for EPEL6 in a Fedora Copr, and possibly versions more recent than the official repo for other EPEL/Fedora releases. The source packages may build on other rpm-based distributions.

  • OpenSuSE. The Copr referred to above has rpms for some OpenSuSE releases; the source rpms may build for others.

  • GNU Guix. Guix has BLIS packages; it provides builds only for the generic target and some specific x86_64 microarchitectures.

  • Conda. The conda channel conda-forge has Linux, OSX, and Windows binary packages for x86_64.

Discussion

Most of the active discussions are now happening on our Discord server. Users and developers alike are welcome! Please see the BLIS Discord guide for a walkthrough of how to join us.

You can also still stay in touch by using either of the following mailing lists:

  • blis-devel: Please join and post to this mailing list if you are a BLIS developer, or if you are trying to use BLIS beyond simply linking to it as a BLAS library.

  • blis-discuss: Please join and post to this mailing list if you have general questions or feedback regarding BLIS. Application developers (end users) may wish to post here, unless they have bug reports, in which case they should open a new issue on github.

Contributing

For information on how to contribute to our project, including preferred coding conventions, please refer to the CONTRIBUTING file at the top-level of the BLIS source distribution.

Citations

For those of you looking for the appropriate article to cite regarding BLIS, we recommend citing our first ACM TOMS journal paper (unofficial backup link):

@article{BLIS1,
   author      = {Field G. {V}an~{Z}ee and Robert A. {v}an~{d}e~{G}eijn},
   title       = {{BLIS}: A Framework for Rapidly Instantiating {BLAS} Functionality},
   journal     = {ACM Transactions on Mathematical Software},
   volume      = {41},
   number      = {3},
   pages       = {14:1--14:33},
   month       = {June},
   year        = {2015},
   issue_date  = {June 2015},
   url         = {https://doi.acm.org/10.1145/2764454},
}

You may also cite the second ACM TOMS journal paper (unofficial backup link):

@article{BLIS2,
   author      = {Field G. {V}an~{Z}ee and Tyler Smith and Francisco D. Igual and
                  Mikhail Smelyanskiy and Xianyi Zhang and Michael Kistler and Vernon Austel and
                  John Gunnels and Tze Meng Low and Bryan Marker and Lee Killough and
                  Robert A. {v}an~{d}e~{G}eijn},
   title       = {The {BLIS} Framework: Experiments in Portability},
   journal     = {ACM Transactions on Mathematical Software},
   volume      = {42},
   number      = {2},
   pages       = {12:1--12:19},
   month       = {June},
   year        = {2016},
   issue_date  = {June 2016},
   url         = {https://doi.acm.org/10.1145/2755561},
}

We also have a third paper, submitted to IPDPS 2014, on achieving multithreaded parallelism in BLIS (unofficial backup link):

@inproceedings{BLIS3,
   author      = {Tyler M. Smith and Robert A. {v}an~{d}e~{G}eijn and Mikhail Smelyanskiy and
                  Jeff R. Hammond and Field G. {V}an~{Z}ee},
   title       = {Anatomy of High-Performance Many-Threaded Matrix Multiplication},
   booktitle   = {28th IEEE International Parallel \& Distributed Processing Symposium
                  (IPDPS 2014)},
   year        = {2014},
   url         = {https://doi.org/10.1109/IPDPS.2014.110},
}

A fourth paper, submitted to ACM TOMS, also exists, which proposes an analytical model for determining blocksize parameters in BLIS (unofficial backup link):

@article{BLIS4,
   author      = {Tze Meng Low and Francisco D. Igual and Tyler M. Smith and
                  Enrique S. Quintana-Ort\'{\i}},
   title       = {Analytical Modeling Is Enough for High-Performance {BLIS}},
   journal     = {ACM Transactions on Mathematical Software},
   volume      = {43},
   number      = {2},
   pages       = {12:1--12:18},
   month       = {August},
   year        = {2016},
   issue_date  = {August 2016},
   url         = {https://doi.acm.org/10.1145/2925987},
}

A fifth paper, submitted to ACM TOMS, begins the study of so-called induced methods for complex matrix multiplication (unofficial backup link):

@article{BLIS5,
   author      = {Field G. {V}an~{Z}ee and Tyler Smith},
   title       = {Implementing High-performance Complex Matrix Multiplication via the 3m and 4m Methods},
   journal     = {ACM Transactions on Mathematical Software},
   volume      = {44},
   number      = {1},
   pages       = {7:1--7:36},
   month       = {July},
   year        = {2017},
   issue_date  = {July 2017},
   url         = {https://doi.acm.org/10.1145/3086466},
}

A sixth paper, submitted to ACM TOMS, revisits the topic of the previous article and derives a superior induced method (unofficial backup link):

@article{BLIS6,
   author      = {Field G. {V}an~{Z}ee},
   title       = {Implementing High-Performance Complex Matrix Multiplication via the 1m Method},
   journal     = {SIAM Journal on Scientific Computing},
   volume      = {42},
   number      = {5},
   pages       = {C221--C244},
   month       = {September},
   year        = {2020},
   issue_date  = {September 2020},
   url         = {https://doi.org/10.1137/19M1282040},
}

A seventh paper, submitted to ACM TOMS, explores the implementation of gemm for mixed-domain and/or mixed-precision operands (unofficial backup link):

@article{BLIS7,
   author      = {Field G. {V}an~{Z}ee and Devangi N. Parikh and Robert A. {v}an~{d}e~{G}eijn},
   title       = {Supporting Mixed-domain Mixed-precision Matrix Multiplication within the BLIS Framework},
   journal     = {ACM Transactions on Mathematical Software},
   volume      = {47},
   number      = {2},
   pages       = {12:1--12:26},
   month       = {April},
   year        = {2021},
   issue_date  = {April 2021},
   url         = {https://doi.org/10.1145/3402225},
}

Awards

Funding

This project and its associated research were partially sponsored by grants from Microsoft, Intel, Texas Instruments, AMD, HPE, Oracle, Huawei, Facebook, and ARM, as well as grants from the National Science Foundation (Awards CCF-0917167, ACI-1148125/1340293, CCF-1320112, and ACI-1550493).

Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

