kimwalisch/libpopcntPublic

NotificationsYou must be signed in to change notification settings
Fork40
Star348

🚀 Fast C/C++ bit population count library

License

BSD-2-Clause license

348 stars 40 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 370 Commits
.github		.github
test		test
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
ChangeLog		ChangeLog
LICENSE		LICENSE
README.md		README.md
benchmark.cpp		benchmark.cpp
libpopcnt.h		libpopcnt.h

Repository files navigation

libpopcnt

libpopcnt.h is a header-only C/C++ library for counting thenumber of 1 bits (bit population count) in an array as quickly aspossible using specialized CPU instructions i.e.POPCNT,AVX2,AVX512,NEON,SVE.libpopcnt.h has been tested successfully using the GCC,Clang and MSVC compilers.

C/C++ API

#include"libpopcnt.h"/* * Count the number of 1 bits in the data array * @data: An array * @size: Size of data in bytes */uint64_tpopcnt(constvoid*data,uint64_tsize);

How to compile

libpopcnt.h does not require any special compiler flags like-mavx2!To get the best performance we only recommend to compile withoptimizations enabled e.g.-O3 or-O2.

cc  -O3 program.cc++ -O3 program.cpp

CPU architectures

libpopcnt.h has hardware accelerated popcount algorithms forthe following CPU architectures:

x86	`POPCNT`,`AVX2`,`AVX512`
x86-64	`POPCNT`,`AVX2`,`AVX512`
ARM	`NEON`,`SVE`
PPC64	`POPCNTD`

For other CPU architectures a fast integer popcount algorithm is used.

How it works

On x86 CPUs,libpopcnt.h first queries your CPU's supportedinstruction sets using theCPUID instruction (this is done only once).Thenlibpopcnt.h chooses the fastest bit population count algorithmsupported by your CPU:

If the CPU supportsAVX512 theAVX512 VPOPCNT algorithm is used.
Else if the CPU supportsAVX2 theAVX2 Harley Seal algorithm is used.
Else if the CPU supportsPOPCNT thePOPCNT algorithm is used.
For CPUs withoutPOPCNT instruction a portable integer algorithm is used.

Note thatlibpopcnt.h works on all CPUs (x86, ARM, PPC, WebAssembly, ...).It is portable by default and hardware acceleration is only enabled if the CPUsupports it.libpopcnt.h it is also thread-safe.

We take performance seriously, if you compile using e.g.-march=nativeon an x86 CPU with AVX512 support then all runtimeCPUID checks are removed!

ARM SVE (Scalable Vector Extension)

ARM SVE is a new vector instruction set for ARM CPUs that was first released in2020. ARM SVE supports a variable vector length from 128 to 2048 bits. HenceARM SVE algorithms can be much faster than ARM NEON algorithms which are limitedto 128 bits vector length.

libpopcnt's new ARM SVE popcount algorithm is up to 3x faster than its ARM NEONpopcount algorithm (on AWS Graviton3 CPUs). Unfortunately runtime dispatching toARM SVE is not yet well supported by the GCC and Clang compilers and libc's.Therefore, by default only the (portable) ARM NEON popcount algorithm is enabledwhen using libpopcnt on ARM CPUs.

To enable libpopcnt's ARM SVE popcount algorithm you need to compile your programusing your compiler's ARM SVE option e.g.:

gcc -O3 -march=armv8-a+sve program.cg++ -O3 -march=armv8-a+sve program.cpp

Development

cmake.make -jmaketest

The above commands also build thebenchmark program which isuseful for benchmarkinglibpopcnt.h. Below is ausage example run on an AMD EPYC 9R14 CPU from 2023:

# Usage: ./benchmark [array bytes] [iters]./benchmarkIters: 10000000Array size: 16.00 KBAlgorithm: AVX512Status: 100%Seconds: 1.23133.5 GB/s

Acknowledgments

Some of the algorithms used inlibpopcnt.h are described in the paperFaster Population Counts using AVX2 Instructionsby Daniel Lemire, Nathan Kurz and Wojciech Mula (23 Nov 2016). The AVX2 Harley Sealpopcount algorithm used inlibpopcnt.h has been copied from Wojciech Muła'ssse-popcount GitHub repo.

About

🚀 Fast C/C++ bit population count library

Releases19

libpopcnt-3.1 Latest

Jun 29, 2024

+ 18 releases

Sponsor this project

Learn more about GitHub Sponsors

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

License

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

libpopcnt

C/C++ API

How to compile

CPU architectures

How it works

ARM SVE (Scalable Vector Extension)

Development

Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases19

Sponsor this project

Uh oh!

Uh oh!

Contributors5

Uh oh!

Languages

Movatterモバイル変換

Uh oh!

License

kimwalisch/libpopcnt

Folders and files

Latest commit

History

Repository files navigation

libpopcnt

C/C++ API

How to compile

CPU architectures

How it works

ARM SVE (Scalable Vector Extension)

Development

Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases19

Sponsor this project

Uh oh!

Uh oh!

Contributors5

Uh oh!

Languages