A simple integer compression library in Java


fast-pack/JavaFastPFOR


What does this do?

It is a library to compress and uncompress arrays of integers very fast. The assumption is that most (but not all) values in your array use much less than 32 bits, or that the gaps between the integers use much less than 32 bits. These sorts of arrays often come up when using differential coding in databases and information retrieval (e.g., in inverted indexes or column stores).
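To make the "gaps use much less than 32 bits" assumption concrete, here is a minimal sketch in plain Java. It does not call this library; the class name DeltaSketch and its helper methods are hypothetical and exist only for illustration. It replaces sorted values by their gaps, restores them with a running sum, and reports how many bits the largest value needs before and after.

```java
import java.util.Arrays;

final class DeltaSketch {
    // Replace each value by its gap from the previous value (the first gap is taken from 0).
    static int[] delta(int[] sorted) {
        int[] gaps = new int[sorted.length];
        int previous = 0;
        for (int i = 0; i < sorted.length; i++) {
            gaps[i] = sorted[i] - previous;
            previous = sorted[i];
        }
        return gaps;
    }

    // Invert delta(): a running sum restores the original values.
    static int[] prefixSum(int[] gaps) {
        int[] values = new int[gaps.length];
        int running = 0;
        for (int i = 0; i < gaps.length; i++) {
            running += gaps[i];
            values[i] = running;
        }
        return values;
    }

    // Bits needed to store the largest value (values are assumed non-negative).
    static int maxBits(int[] values) {
        int accumulator = 0;
        for (int v : values) {
            accumulator |= v;
        }
        return 32 - Integer.numberOfLeadingZeros(accumulator);
    }

    public static void main(String[] args) {
        int[] docIds = {1_000_000, 1_000_003, 1_000_010, 1_000_011, 1_000_050};
        int[] gaps = delta(docIds);
        // The first gap is the first value itself, so look at the remaining gaps.
        int[] tail = Arrays.copyOfRange(gaps, 1, gaps.length);
        System.out.println("bits per raw value: " + maxBits(docIds));
        System.out.println("bits per gap (after the first): " + maxBits(tail));
        System.out.println("round trip ok: " + Arrays.equals(docIds, prefixSum(gaps)));
    }
}
```

The codecs in the library go further than this sketch (they pack the small values into fewer bits and handle the occasional large value), but the gain comes from the same observation: small gaps need few bits.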

Please note that random integers are not compressible, by this library or by any other means. If you ever had the means of systematically compressing random integers, you could compress any data source to nothing, by recursive application of your technique.

This library can decompress integers at a rate of over 1.2 billion integers per second (4.5 GB/s). It is significantly faster than generic codecs (such as Snappy, LZ4 and so on) when compressing arrays of integers.

The library is used in LinkedIn Pinot, a realtime distributed OLAP datastore. Part of this library has been integrated in Parquet (http://parquet.io/). A modified version of the library is included in the search engine Terrier (http://terrier.org/). This library is used by ClueWeb Tools (https://github.com/lintool/clueweb). It is also used by Apache NiFi.

This library inspired a compression scheme used by Apache Lucene and Apache Lucene.NET (e.g., see http://lucene.apache.org/core/4_6_1/core/org/apache/lucene/util/PForDeltaDocIdSet.html).

It is a Java port of the fastpfor C++ library (https://github.com/lemire/FastPFor). There is also a Go port (https://github.com/reducedb/encoding). The C++ library is used by the zsearch engine (http://victorparmar.github.com/zsearch/) as well as in GMAP and GSNAP (http://research-pub.gene.com/gmap/).

Usage

    package org.example;

    import me.lemire.integercompression.FastPFOR128;
    import me.lemire.integercompression.IntWrapper;

    import java.util.Arrays;

    public class Main {
        public static void main(String[] args) {
            FastPFOR128 fastpfor = new FastPFOR128();

            // sparse test data: mostly zeros, with a nonzero value every 150 entries
            int N = 9984;
            int[] data = new int[N];
            for (var i = 0; i < N; i += 150) {
                data[i] = i;
            }

            // compress: the output buffer is given some slack beyond N
            int[] compressedoutput1 = new int[N + 1024];
            IntWrapper inputoffset1 = new IntWrapper(0);
            IntWrapper outputoffset1 = new IntWrapper(0);
            fastpfor.compress(data, inputoffset1, N, compressedoutput1, outputoffset1);
            int compressedsize1 = outputoffset1.get();

            // uncompress: both offsets are reset to 0 before reading back
            int[] recovered1 = new int[N];
            inputoffset1 = new IntWrapper(0);
            outputoffset1 = new IntWrapper(0);
            fastpfor.uncompress(compressedoutput1, outputoffset1, compressedsize1, recovered1, inputoffset1);

            // quick verification: count mismatches
            int mismatches = 0;
            for (int i = 0; i < N; i++) {
                if (data[i] != recovered1[i]) mismatches++;
            }

            System.out.println("N=" + N + " compressedSizeWords=" + compressedsize1 + " mismatches=" + mismatches);
            System.out.println("first 20 original: " + Arrays.toString(Arrays.copyOf(data, 20)));
            System.out.println("first 20 recovered: " + Arrays.toString(Arrays.copyOf(recovered1, 20)));
        }
    }

For more examples, see example.java or the examples folder.

JavaFastPFOR supports compressing and uncompressing data in chunks (e.g., see advancedExample in https://github.com/lemire/JavaFastPFOR/blob/master/example.java).

Some CODECs ("integrated codecs") assume that the integers are in sorted order and use differential coding (they compress deltas). They can be found in the package me.lemire.integercompression.differential. Most others do not.

The Java Team at Intel (R) introduced a vector implementation of FastPFOR based on the Java Vector API, which showed significant gains over the non-vectorized implementation. For an example of its usage, see examples/vector/Example.java. The feature requires JDK 19+ and is currently for advanced users.

JavaFastPFOR as a dependency (JitPack)

We have a demo project using JavaFastPFOR as a dependency (both Maven and Gradle). See...

https://github.com/fast-pack/JavaFastPFORDemo

  1. Maven

Using this code in your own project is easy with Maven: just add the following code to your pom.xml file:

    <dependency>
        <groupId>com.github.fast-pack</groupId>
        <artifactId>JavaFastPFor</artifactId>
        <version>JavaFastPFOR-0.3.2</version>
    </dependency>

as well as jitpack as a repository...

    <repositories>
        <repository>
            <id>jitpack.io</id>
            <url>https://jitpack.io</url>
        </repository>
    </repositories>

Naturally, you should replace "version" by the version you desire.

  2. Gradle (Groovy)

Then all you need is to edit your build.gradle file like so:

    plugins {
        id 'java'
    }

    repositories {
        mavenCentral()
        maven {
            url 'https://jitpack.io'
        }
    }

    dependencies {
        implementation 'com.github.fast-pack:JavaFastPFor:JavaFastPFOR-0.3.2'
    }

Naturally, you should replace "version" by the version you desire.

Thread safety

Some codecs are thread-safe while others are not. For this reason, it is best to use one codec per thread. The memory usage of a codec instance is small in any case.

Nevertheless, if you want to reuse codec instances, note that by convention, unless the documentation of a codec specifies that it is not thread-safe, it can be assumed to be thread-safe.
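One way to follow the "one codec per thread" advice is a ThreadLocal. The sketch below is self-contained and does not use this library: ScratchCodec is a hypothetical stand-in for any codec that keeps mutable state between calls, and the names CODEC and encodeOnCurrentThread are assumptions made for illustration. With the real library you would presumably construct the codec of your choice in the ThreadLocal initializer instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerThreadCodecSketch {
    // Hypothetical stand-in for a codec that reuses a scratch buffer between
    // calls and is therefore not safe to share across threads.
    static final class ScratchCodec {
        private int[] scratch = new int[0];

        // Toy "encoding": writes the gaps into the reused buffer and returns their sum.
        int encodeAndChecksum(int[] sorted) {
            if (scratch.length < sorted.length) {
                scratch = new int[sorted.length];
            }
            int previous = 0;
            for (int i = 0; i < sorted.length; i++) {
                scratch[i] = sorted[i] - previous;
                previous = sorted[i];
            }
            int sum = 0;
            for (int i = 0; i < sorted.length; i++) {
                sum += scratch[i];
            }
            return sum; // equals the last input value (0 for an empty input)
        }
    }

    // One codec instance per thread, created lazily on first use.
    static final ThreadLocal<ScratchCodec> CODEC = ThreadLocal.withInitial(ScratchCodec::new);

    static int encodeOnCurrentThread(int[] sorted) {
        return CODEC.get().encodeAndChecksum(sorted);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 1; t <= 8; t++) {
                final int step = t;
                results.add(pool.submit(() -> {
                    int[] data = new int[1000];
                    for (int i = 0; i < data.length; i++) {
                        data[i] = i * step;
                    }
                    return encodeOnCurrentThread(data);
                }));
            }
            for (int t = 1; t <= 8; t++) {
                System.out.println("step " + t + " -> " + results.get(t - 1).get());
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Since codec instances are small, the per-thread copies cost little, and the pattern avoids having to check the thread-safety of each codec individually.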

How does it compare to the Kamikaze PForDelta library?

In our tests, Kamikaze PForDelta is slower than our implementations. See the benchmarkresults directory for some results.

https://github.com/lemire/JavaFastPFOR/blob/master/benchmarkresults/benchmarkresults_icore7_10may2013.txt

Reference: http://sna-projects.com/kamikaze/

Requirements

Releases up to 0.1.12 require Java 7 or better.

The current development versions assume JDK 21 or better.

How fast is it?

Compile the code and execute me.lemire.integercompression.benchmarktools.Benchmark.

Speed is always reported in millions of integers per second.
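For readers who want to relate that unit to their own timings, here is a small conversion sketch. The class and method names are hypothetical and are not part of the benchmark tool. It converts a count and an elapsed time into millions of integers per second, and then into a raw data rate at 4 bytes per 32-bit integer.

```java
final class SpeedSketch {
    // count integers processed in elapsedNanos nanoseconds,
    // expressed in millions of integers per second:
    // count / (elapsedNanos * 1e-9 s) / 1e6 == count * 1000 / elapsedNanos
    static double millionsOfIntegersPerSecond(long count, long elapsedNanos) {
        return count * 1000.0 / elapsedNanos;
    }

    // Uncompressed data rate in bytes per second, at 4 bytes per integer.
    static double bytesPerSecond(double millionsPerSecond) {
        return millionsPerSecond * 1_000_000.0 * Integer.BYTES;
    }

    public static void main(String[] args) {
        // 1.2 billion integers in one second, the decompression rate quoted earlier
        double mis = millionsOfIntegersPerSecond(1_200_000_000L, 1_000_000_000L);
        double bytes = bytesPerSecond(mis);
        System.out.printf("%.0f million integers per second%n", mis);
        System.out.printf("%.2f GB/s with GB = 10^9 bytes%n", bytes / 1e9);
        System.out.printf("%.2f GB/s with GB = 2^30 bytes%n", bytes / (1L << 30));
    }
}
```

At 1200 million integers per second the raw rate is 4.8 × 10^9 bytes per second, which is about 4.47 when divided by 2^30. The 4.5 GB/s figure quoted earlier is consistent with that reading, although the README does not state which definition of GB it uses.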

For Maven users

    mvn compile
    mvn exec:java

You may run our examples as follows:

    mvn package
    javac -cp target/classes/:. example.java
    java -cp target/classes/:. example

For ant users (legacy, currently untested)

If you use Apache ant, please try this:

$ ant Benchmark

or:

$ ant Benchmark -Dbenchmark.target=BenchmarkBitPacking

API Documentation

http://www.javadoc.io/doc/me.lemire.integercompression/JavaFastPFOR/

Want to read more?

This library was a key ingredient in the best paper at ECIR 2014:

Matteo Catena, Craig Macdonald, Iadh Ounis, On Inverted Index Compression for Search Engine Efficiency, Lecture Notes in Computer Science 8416 (ECIR 2014), 2014. http://dx.doi.org/10.1007/978-3-319-06028-6_30

We wrote several research papers documenting many of the CODECs implemented here:

Ikhtear Sharif wrote his M.Sc. thesis on this library:

Ikhtear Sharif, Performance Evaluation of Fast Integer Compression Techniques Over Tables, M.Sc. thesis, UNB 2013. https://unbscholar.lib.unb.ca/islandora/object/unbscholar%3A9399/datastream/PDF/view

He also posted his slides online: http://www.slideshare.net/ikhtearSharif/ikhtear-defense

Other recommended libraries

Funding

This work was supported by NSERC grant number 26143.
