zstd
Zstandard is a real-time compression algorithm, providing high compression ratios. It offers a very wide range of compression / speed trade-off, while being backed by a very fast decoder. A high performance compression algorithm is implemented. For now focused on speed.
This package provides compression to and decompression of Zstandard content.
This package is pure Go. Use noasm and nounsafe to disable relevant features.
The zstd package is provided as open source software using a Go standard license.
Currently the package is heavily optimized for 64 bit processors and will be significantly slower on 32 bit processors.
For seekable zstd streams, see this excellent package.
Installation
Install using go get -u github.com/klauspost/compress. The package is located in github.com/klauspost/compress/zstd.
Compressor
Status:
STABLE - there may always be subtle bugs, a wide variety of content has been tested and the library is actively used by several projects. This library is being fuzz-tested for all updates.
There may still be specific combinations of data types/size/settings that could lead to edge cases, so as always, testing is recommended.
For now, a high speed (fastest) and medium-fast (default) compressor has been implemented.
- The "Fastest" compression ratio is roughly equivalent to zstd level 1.
- The "Default" compression ratio is roughly equivalent to zstd level 3 (default).
- The "Better" compression ratio is roughly equivalent to zstd level 7.
- The "Best" compression ratio is roughly equivalent to zstd level 11.
In terms of speed, it is typically 2x as fast as the stdlib deflate/gzip in its fastest mode. The compression ratio compared to stdlib is around level 3, but usually 3x as fast.
Usage
An Encoder can be used for either compressing a stream via the io.WriteCloser interface supported by the Encoder or as multiple independent tasks via the EncodeAll function. Smaller encodes are encouraged to use the EncodeAll function. Use NewWriter to create a new instance that can be used for both.
To create a writer with default options, do like this:

    // Compress input to output.
    func Compress(in io.Reader, out io.Writer) error {
        enc, err := zstd.NewWriter(out)
        if err != nil {
            return err
        }
        _, err = io.Copy(enc, in)
        if err != nil {
            enc.Close()
            return err
        }
        return enc.Close()
    }

Now you can encode by writing data to enc. The output will be finished writing when Close() is called. Even if your encode fails, you should still call Close() to release any resources that may be held up.
The above is fine for big encodes. However, whenever possible try to reuse the writer.
To reuse the encoder, you can use the Reset(io.Writer) function to change to another output. This will allow the encoder to reuse all resources and avoid wasteful allocations.
Currently stream encoding has 'light' concurrency, meaning up to 2 goroutines can be working on part of a stream. This is independent of the WithEncoderConcurrency(n), but that is likely to change in the future. So if you want to limit concurrency for future updates, specify the concurrency you would like.
If you would like stream encoding to be done without spawning async goroutines, use WithEncoderConcurrency(1) which will compress input as each block is completed, blocking on writes until each has completed.
You can specify your desired compression level using the WithEncoderLevel() option. Currently only pre-defined compression settings can be specified.
Future Compatibility Guarantees
This will be an evolving project. When using this package it is important to note that both the compression efficiency and speed may change.
The goal will be to keep the default efficiency at the default zstd (level 3). However the encoding should never be assumed to remain the same, and you should not use hashes of compressed output for similarity checks.
The Encoder can be assumed to produce the same output from the exact same code version. However, there may be modes in the future that break this, although they will not be enabled without an explicit option.
This encoder is not designed to (and will probably never) output the exact same bitstream as the reference encoder.
Also note, that the cgo decompressor currently does not report all errors on invalid input, omits error checks, ignores checksums and seems to ignore concatenated streams, even though it is part of the spec.
Blocks
For compressing small blocks, the returned encoder has a function called EncodeAll(src, dst []byte) []byte.
EncodeAll will encode all input in src and append it to dst. This function can be called concurrently. Each call will only run on the same goroutine as the caller.
Encoded blocks can be concatenated and the result will be the combined input stream. Data compressed with EncodeAll can be decoded with the Decoder, using either a stream or DecodeAll.
Especially when encoding blocks you should take special care to reuse the encoder. This will effectively make it run without allocations after a warmup period. To make it run completely without allocations, supply a destination buffer with space for all content.

    import "github.com/klauspost/compress/zstd"

    // Create a writer that caches compressors.
    // For this operation type we supply a nil Writer.
    var encoder, _ = zstd.NewWriter(nil)

    // Compress a buffer.
    // If you have a destination buffer, the allocation in the call can also be eliminated.
    func Compress(src []byte) []byte {
        return encoder.EncodeAll(src, make([]byte, 0, len(src)))
    }

You can control the maximum number of concurrent encodes using the WithEncoderConcurrency(n) option when creating the writer.
Using the Encoder for both a stream and individual blocks concurrently is safe.
Performance
I have collected some speed examples to compare speed and compression against other compressors.
- file is the input file.
- out is the compressor used. zskp is this package. zstd is the Datadog cgo library. gzstd/gzkp is gzip standard and this library.
- level is the compression level used. For zskp level 1 is "fastest", level 2 is "default"; 3 is "better", 4 is "best".
- insize/outsize is the input/output size.
- millis is the number of milliseconds used for compression.
- mb/s is megabytes (2^20 bytes) per second.

Silesia Corpus:
http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip

This package:

    file         out   level  insize      outsize    millis  mb/s
    silesia.tar  zskp  1      211947520   73821326   634     318.47
    silesia.tar  zskp  2      211947520   67655404   1508    133.96
    silesia.tar  zskp  3      211947520   64746933   3000    67.37
    silesia.tar  zskp  4      211947520   60073508   16926   11.94

cgo zstd:

    silesia.tar  zstd  1      211947520   73605392   543     371.56
    silesia.tar  zstd  3      211947520   66793289   864     233.68
    silesia.tar  zstd  6      211947520   62916450   1913    105.66
    silesia.tar  zstd  9      211947520   60212393   5063    39.92

gzip, stdlib/this package:

    silesia.tar  gzstd 1      211947520   80007735   1498    134.87
    silesia.tar  gzkp  1      211947520   80088272   1009    200.31

GOB stream of binary data. Highly compressible.
https://files.klauspost.com/compress/gob-stream.7z

    file        out    level  insize      outsize    millis  mb/s
    gob-stream  zskp   1      1911399616  233948096  3230    564.34
    gob-stream  zskp   2      1911399616  203997694  4997    364.73
    gob-stream  zskp   3      1911399616  173526523  13435   135.68
    gob-stream  zskp   4      1911399616  162195235  47559   38.33
    gob-stream  zstd   1      1911399616  249810424  2637    691.26
    gob-stream  zstd   3      1911399616  208192146  3490    522.31
    gob-stream  zstd   6      1911399616  193632038  6687    272.56
    gob-stream  zstd   9      1911399616  177620386  16175   112.70
    gob-stream  gzstd  1      1911399616  357382013  9046    201.49
    gob-stream  gzkp   1      1911399616  359136669  4885    373.08

The test data for the Large Text Compression Benchmark is the first 10^9 bytes of the English Wikipedia dump on Mar. 3, 2006.
http://mattmahoney.net/dc/textdata.html

    file    out    level  insize      outsize    millis  mb/s
    enwik9  zskp   1      1000000000  343833605  3687    258.64
    enwik9  zskp   2      1000000000  317001237  7672    124.29
    enwik9  zskp   3      1000000000  291915823  15923   59.89
    enwik9  zskp   4      1000000000  261710291  77697   12.27
    enwik9  zstd   1      1000000000  358072021  3110    306.65
    enwik9  zstd   3      1000000000  313734672  4784    199.35
    enwik9  zstd   6      1000000000  295138875  10290   92.68
    enwik9  zstd   9      1000000000  278348700  28549   33.40
    enwik9  gzstd  1      1000000000  382578136  8608    110.78
    enwik9  gzkp   1      1000000000  382781160  5628    169.45

Highly compressible JSON file.
https://files.klauspost.com/compress/github-june-2days-2019.json.zst

    file                         out    level  insize      outsize     millis  mb/s
    github-june-2days-2019.json  zskp   1      6273951764  697439532   9789    611.17
    github-june-2days-2019.json  zskp   2      6273951764  610876538   18553   322.49
    github-june-2days-2019.json  zskp   3      6273951764  517662858   44186   135.41
    github-june-2days-2019.json  zskp   4      6273951764  464617114   165373  36.18
    github-june-2days-2019.json  zstd   1      6273951764  766284037   8450    708.00
    github-june-2days-2019.json  zstd   3      6273951764  661889476   10927   547.57
    github-june-2days-2019.json  zstd   6      6273951764  642756859   22996   260.18
    github-june-2days-2019.json  zstd   9      6273951764  601974523   52413   114.16
    github-june-2days-2019.json  gzstd  1      6273951764  1164397768  26793   223.32
    github-june-2days-2019.json  gzkp   1      6273951764  1120631856  17693   338.16

VM Image, Linux mint with a few installed applications:
https://files.klauspost.com/compress/rawstudio-mint14.7z

    file                  out    level  insize      outsize     millis  mb/s
    rawstudio-mint14.tar  zskp   1      8558382592  3718400221  18206   448.29
    rawstudio-mint14.tar  zskp   2      8558382592  3326118337  37074   220.15
    rawstudio-mint14.tar  zskp   3      8558382592  3163842361  87306   93.49
    rawstudio-mint14.tar  zskp   4      8558382592  2970480650  783862  10.41
    rawstudio-mint14.tar  zstd   1      8558382592  3609250104  17136   476.27
    rawstudio-mint14.tar  zstd   3      8558382592  3341679997  29262   278.92
    rawstudio-mint14.tar  zstd   6      8558382592  3235846406  77904   104.77
    rawstudio-mint14.tar  zstd   9      8558382592  3160778861  140946  57.91
    rawstudio-mint14.tar  gzstd  1      8558382592  3926234992  51345   158.96
    rawstudio-mint14.tar  gzkp   1      8558382592  3960117298  36722   222.26

CSV data:
https://files.klauspost.com/compress/nyc-taxi-data-10M.csv.zst

    file                   out    level  insize      outsize    millis  mb/s
    nyc-taxi-data-10M.csv  zskp   1      3325605752  641319332  9462    335.17
    nyc-taxi-data-10M.csv  zskp   2      3325605752  588976126  17570   180.50
    nyc-taxi-data-10M.csv  zskp   3      3325605752  529329260  32432   97.79
    nyc-taxi-data-10M.csv  zskp   4      3325605752  474949772  138025  22.98
    nyc-taxi-data-10M.csv  zstd   1      3325605752  687399637  8233    385.18
    nyc-taxi-data-10M.csv  zstd   3      3325605752  598514411  10065   315.07
    nyc-taxi-data-10M.csv  zstd   6      3325605752  570522953  20038   158.27
    nyc-taxi-data-10M.csv  zstd   9      3325605752  517554797  64565   49.12
    nyc-taxi-data-10M.csv  gzstd  1      3325605752  928654908  21270   149.11
    nyc-taxi-data-10M.csv  gzkp   1      3325605752  922273214  13929   227.68

Decompressor
Status: STABLE - there may still be subtle bugs, but a wide variety of content has been tested.
This library is being continuously fuzz-tested, kindly supplied by fuzzit.dev. The main purpose of the fuzz testing is to ensure that it is not possible to crash the decoder, or run it past its limits with ANY input provided.
Usage
The package has been designed for two main usages, big streams of data and smaller in-memory buffers. There are two main usages of the package for these. Both of them are accessed by creating a Decoder.
For streaming use a simple setup could look like this:

    import "github.com/klauspost/compress/zstd"

    func Decompress(in io.Reader, out io.Writer) error {
        d, err := zstd.NewReader(in)
        if err != nil {
            return err
        }
        defer d.Close()

        // Copy content...
        _, err = io.Copy(out, d)
        return err
    }

It is important to use the "Close" function when you no longer need the Reader to stop running goroutines, when running with default settings. Goroutines will exit once an error has been returned, including io.EOF at the end of a stream.
Streams are decoded concurrently in 4 asynchronous stages to give the best possible throughput. However, if you prefer synchronous decompression, use WithDecoderConcurrency(1) which will decompress data as it is being requested only.
For decoding buffers, it could look something like this:
    import "github.com/klauspost/compress/zstd"

    // Create a reader that caches decompressors.
    // For this operation type we supply a nil Reader.
    var decoder, _ = zstd.NewReader(nil, zstd.WithDecoderConcurrency(0))

    // Decompress a buffer. We don't supply a destination buffer,
    // so it will be allocated by the decoder.
    func Decompress(src []byte) ([]byte, error) {
        return decoder.DecodeAll(src, nil)
    }

Both of these cases should provide the functionality needed. The decoder can be used for concurrent decompression of multiple buffers. By default 4 decompressors will be created.
It will only allow a certain number of concurrent operations to run. To tweak that yourself use the WithDecoderConcurrency(n) option when creating the decoder. It is possible to use WithDecoderConcurrency(0) to create GOMAXPROCS decoders.
Dictionaries
Data compressed with dictionaries can be decompressed.
Dictionaries are added individually to Decoders. Dictionaries are generated by the zstd --train command and contain an initial state for the decoder. To add a dictionary use the WithDecoderDicts(dicts ...[]byte) option with the dictionary data. Several dictionaries can be added at once.
Dictionaries will be used automatically for the data that specifies them. A re-used Decoder will still contain the dictionaries registered.
When registering multiple dictionaries with the same ID, the last one will be used.
It is possible to use dictionaries when compressing data.
To enable a dictionary use WithEncoderDict(dict []byte). Here only one dictionary will be used and it will likely be used even if it doesn't improve compression.
The same dictionary must be used to decompress the content.
For any real gains, the dictionary should be built with similar data. If an unsuitable dictionary is used the output may be slightly larger than using no dictionary. Use the zstd commandline tool to build a dictionary from sample data. For information see zstd dictionary information.
For now there is a fixed startup performance penalty for compressing content with dictionaries. This will likely be improved over time. Be sure to test performance when implementing.
Allocation-less operation
The decoder has been designed to operate without allocations after a warmup.
This means that you should store the decoder for best performance. To re-use a stream decoder, use the Reset(r io.Reader) error to switch to another stream. A decoder can safely be re-used even if the previous stream failed.
To release the resources, you must call the Close() function on a decoder. After this it can no longer be reused, but all running goroutines will be stopped. So you must use this if you will no longer need the Reader.
For decompressing smaller buffers a single decoder can be used. When decoding buffers, you can supply a destination slice with length 0 and your expected capacity. In this case no unneeded allocations should be made.
Concurrency
The buffer decoder does everything on the same goroutine and does nothing concurrently. It can however decode several buffers concurrently. Use WithDecoderConcurrency(n) to limit that.
The stream decoder will create goroutines that:
- Read input and split it into blocks.
- Decompress literals.
- Decompress sequences.
- Reconstruct the output stream.
So effectively this also means the decoder will "read ahead" and prepare data to always be available for output.
The concurrency level will, for streams, determine how many blocks ahead the decompression will start.
Since "blocks" are quite dependent on the output of the previous block, stream decoding will only have limited concurrency.
In practice this means that concurrency is often limited to utilizing about 3 cores effectively.
Benchmarks
The first two are streaming decodes and the last are smaller inputs.
Running on AMD Ryzen 9 3950X 16-Core Processor. AMD64 assembly used.
    BenchmarkDecoderSilesia-32    5   206878840 ns/op   1024.50 MB/s   49808 B/op   43 allocs/op
    BenchmarkDecoderEnwik9-32     1  1271809000 ns/op    786.28 MB/s   72048 B/op   52 allocs/op

Concurrent blocks, performance:

    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-32         67356   17857 ns/op  10321.96 MB/s  22.48 pct   102 B/op  0 allocs/op
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-32    266656    4421 ns/op  26823.21 MB/s  11.89 pct    19 B/op  0 allocs/op
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-32      20992   56842 ns/op   8477.17 MB/s  39.90 pct   754 B/op  0 allocs/op
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-32        27456   43932 ns/op   9714.01 MB/s  33.27 pct   524 B/op  0 allocs/op
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-32      78432   15047 ns/op   8319.15 MB/s  40.34 pct    66 B/op  0 allocs/op
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-32       65800   18436 ns/op   8249.63 MB/s  37.75 pct    88 B/op  0 allocs/op
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-32         102993   11523 ns/op  35546.09 MB/s  3.637 pct   143 B/op  0 allocs/op
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-32  1000000    1070 ns/op  95720.98 MB/s  80.53 pct     3 B/op  0 allocs/op
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-32   749802    1752 ns/op  70272.35 MB/s  100.0 pct     5 B/op  0 allocs/op
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-32          22640   52934 ns/op  13263.37 MB/s  26.25 pct  1014 B/op  0 allocs/op
    BenchmarkDecoder_DecodeAllParallel/html.zst-32             226412    5232 ns/op  19572.27 MB/s  14.49 pct    20 B/op  0 allocs/op
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-32    923041    1276 ns/op   3194.71 MB/s  31.26 pct     0 B/op  0 allocs/op

This reflects the performance around May 2022, but this may be out of date.
Zstd inside ZIP files
It is possible to use zstandard to compress individual files inside zip archives. While this isn't widely supported it can be useful for internal files.
To support the compression and decompression of these files you must register a compressor and decompressor.
It is highly recommended to register the (de)compressors on individual zip Reader/Writer and NOT use the global registration functions. The main reason for this is that 2 registrations from different packages will result in a panic.
It is a good idea to only have a single compressor and decompressor, since they can be used for multiple zip files concurrently, and using a single instance will allow reusing some resources.
See this example for how to compress and decompress files inside zip archives.
Contributions
Contributions are always welcome. For new features/fixes, remember to add tests and for performance enhancements include benchmarks.
For general feedback and experience reports, feel free to open an issue or write me on Twitter.
This package includes the excellent github.com/cespare/xxhash package Copyright (c) 2016 Caleb Spare.
Documentation
Overview
Package zstd provides decompression of zstandard files.
For advanced usage and examples, go to the README: https://github.com/klauspost/compress/tree/master/zstd#zstd
Index
- Constants
- Variables
- func BuildDict(o BuildDictOptions) ([]byte, error)
- func DecodeTo(dst []byte, src []byte) ([]byte, error)
- func EncodeTo(dst []byte, src []byte) []byte
- func InspectDictionary(b []byte) (interface{ ... }, error)
- func ZipCompressor(opts ...EOption) func(w io.Writer) (io.WriteCloser, error)
- func ZipDecompressor(opts ...DOption) func(r io.Reader) io.ReadCloser
- type BuildDictOptions
- type DOption
- func IgnoreChecksum(b bool) DOption
- func WithDecodeAllCapLimit(b bool) DOption
- func WithDecodeBuffersBelow(size int) DOption
- func WithDecoderConcurrency(n int) DOption
- func WithDecoderDictRaw(id uint32, content []byte) DOption
- func WithDecoderDicts(dicts ...[]byte) DOption
- func WithDecoderLowmem(b bool) DOption
- func WithDecoderMaxMemory(n uint64) DOption
- func WithDecoderMaxWindow(size uint64) DOption
- type Decoder
- type EOption
- func WithAllLitEntropyCompression(b bool) EOption
- func WithEncoderCRC(b bool) EOption
- func WithEncoderConcurrency(n int) EOption
- func WithEncoderDict(dict []byte) EOption
- func WithEncoderDictRaw(id uint32, content []byte) EOption
- func WithEncoderLevel(l EncoderLevel) EOption
- func WithEncoderPadding(n int) EOption
- func WithLowerEncoderMem(b bool) EOption
- func WithNoEntropyCompression(b bool) EOption
- func WithSingleSegment(b bool) EOption
- func WithWindowSize(n int) EOption
- func WithZeroFrames(b bool) EOption
- type Encoder
- func (e *Encoder) Close() error
- func (e *Encoder) EncodeAll(src, dst []byte) []byte
- func (e *Encoder) Flush() error
- func (e *Encoder) MaxEncodedSize(size int) int
- func (e *Encoder) ReadFrom(r io.Reader) (n int64, err error)
- func (e *Encoder) Reset(w io.Writer)
- func (e *Encoder) ResetContentSize(w io.Writer, size int64)
- func (e *Encoder) Write(p []byte) (n int, err error)
- type EncoderLevel
- type Header
- type SnappyConverter
Constants

    const (
        // MinWindowSize is the minimum Window Size, which is 1 KB.
        MinWindowSize = 1 << 10

        // MaxWindowSize is the maximum encoder window size
        // and the default decoder maximum window size.
        MaxWindowSize = 1 << 29
    )

    const HeaderMaxSize = 14 + 3

HeaderMaxSize is the maximum size of a Frame and Block Header. If less is sent to Header.Decode it *may* still contain enough information.
    const ZipMethodPKWare = 20

ZipMethodPKWare is the original method number used by PKWARE to indicate Zstandard compression.
Deprecated: This has been deprecated by PKWARE, use ZipMethodWinZip instead for compression.
See https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.3.9.TXT

    const ZipMethodWinZip = 93

ZipMethodWinZip is the method for Zstandard compressed data inside Zip files for WinZip.
See https://www.winzip.com/win/en/comp_info.html
Variables

    var (
        // ErrSnappyCorrupt reports that the input is invalid.
        ErrSnappyCorrupt = errors.New("snappy: corrupt input")
        // ErrSnappyTooLarge reports that the uncompressed length is too large.
        ErrSnappyTooLarge = errors.New("snappy: decoded block is too large")
        // ErrSnappyUnsupported reports that the input isn't supported.
        ErrSnappyUnsupported = errors.New("snappy: unsupported input")
    )

    var (
        // ErrReservedBlockType is returned when a reserved block type is found.
        // Typically this indicates wrong or corrupted input.
        ErrReservedBlockType = errors.New("invalid input: reserved block type encountered")

        // ErrCompressedSizeTooBig is returned when a block is bigger than allowed.
        // Typically this indicates wrong or corrupted input.
        ErrCompressedSizeTooBig = errors.New("invalid input: compressed size too big")

        // ErrBlockTooSmall is returned when a block is too small to be decoded.
        // Typically returned on invalid input.
        ErrBlockTooSmall = errors.New("block too small")

        // ErrUnexpectedBlockSize is returned when a block has unexpected size.
        // Typically returned on invalid input.
        ErrUnexpectedBlockSize = errors.New("unexpected block size")

        // ErrMagicMismatch is returned when a "magic" number isn't what is expected.
        // Typically this indicates wrong or corrupted input.
        ErrMagicMismatch = errors.New("invalid input: magic number mismatch")

        // ErrWindowSizeExceeded is returned when a reference exceeds the valid window size.
        // Typically this indicates wrong or corrupted input.
        ErrWindowSizeExceeded = errors.New("window size exceeded")

        // ErrWindowSizeTooSmall is returned when no window size is specified.
        // Typically this indicates wrong or corrupted input.
        ErrWindowSizeTooSmall = errors.New("invalid input: window size was too small")

        // ErrDecoderSizeExceeded is returned if decompressed size exceeds the configured limit.
        ErrDecoderSizeExceeded = errors.New("decompressed size exceeds configured limit")

        // ErrUnknownDictionary is returned if the dictionary ID is unknown.
        ErrUnknownDictionary = errors.New("unknown dictionary")

        // ErrFrameSizeExceeded is returned if the stated frame size is exceeded.
        // This is only returned if SingleSegment is specified on the frame.
        ErrFrameSizeExceeded = errors.New("frame size exceeded")

        // ErrFrameSizeMismatch is returned if the stated frame size does not match the expected size.
        // This is only returned if SingleSegment is specified on the frame.
        ErrFrameSizeMismatch = errors.New("frame size does not match size on stream")

        // ErrCRCMismatch is returned if CRC mismatches.
        ErrCRCMismatch = errors.New("CRC check failed")

        // ErrDecoderClosed will be returned if the Decoder was used after
        // Close has been called.
        ErrDecoderClosed = errors.New("decoder used after Close")

        // ErrEncoderClosed will be returned if the Encoder was used after
        // Close has been called.
        ErrEncoderClosed = errors.New("encoder used after Close")

        // ErrDecoderNilInput is returned when a nil Reader was provided
        // and an operation other than Reset/DecodeAll/Close was attempted.
        ErrDecoderNilInput = errors.New("nil input provided as reader")
    )
Functions
func BuildDict (added in v1.17.0)

    func BuildDict(o BuildDictOptions) ([]byte, error)

func DecodeTo (added in v1.18.1)
DecodeTo appends the decoded data from src to dst. The maximum decoded size is 1GiB, not including what may already be in dst.
func InspectDictionary (added in v1.16.0)

    func InspectDictionary(b []byte) (interface {
        ID() uint32
        ContentSize() int
        Content() []byte
        Offsets() [3]int
        LitEncoder() *huff0.Scratch
    }, error)

InspectDictionary loads a zstd dictionary and provides functions to inspect the content.
func ZipCompressor (added in v1.12.2)
ZipCompressor returns a compressor that can be registered with zip libraries. The provided encoder options will be used on all encodes.
Example

    package main

    import (
        "archive/zip"
        "bytes"
        "fmt"
        "io"

        "github.com/klauspost/compress/zstd"
    )

    func main() {
        // Get zstandard de/compressors for zip.
        // These can be used by multiple readers and writers.
        compr := zstd.ZipCompressor(zstd.WithWindowSize(1<<20), zstd.WithEncoderCRC(false))
        decomp := zstd.ZipDecompressor()

        // Try it out...
        var buf bytes.Buffer
        zw := zip.NewWriter(&buf)
        zw.RegisterCompressor(zstd.ZipMethodWinZip, compr)
        zw.RegisterCompressor(zstd.ZipMethodPKWare, compr)

        // Create 1MB data
        tmp := make([]byte, 1<<20)
        for i := range tmp {
            tmp[i] = byte(i)
        }
        w, err := zw.CreateHeader(&zip.FileHeader{
            Name:   "file1.txt",
            Method: zstd.ZipMethodWinZip,
        })
        if err != nil {
            panic(err)
        }
        w.Write(tmp)

        // Another...
        w, err = zw.CreateHeader(&zip.FileHeader{
            Name:   "file2.txt",
            Method: zstd.ZipMethodPKWare,
        })
        w.Write(tmp)
        zw.Close()

        zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
        if err != nil {
            panic(err)
        }
        zr.RegisterDecompressor(zstd.ZipMethodWinZip, decomp)
        zr.RegisterDecompressor(zstd.ZipMethodPKWare, decomp)
        for _, file := range zr.File {
            rc, err := file.Open()
            if err != nil {
                panic(err)
            }
            b, err := io.ReadAll(rc)
            rc.Close()
            if bytes.Equal(b, tmp) {
                fmt.Println(file.Name, "ok")
            } else {
                fmt.Println(file.Name, "mismatch")
            }
        }
    }

Output:

    file1.txt ok
    file2.txt ok

func ZipDecompressor (added in v1.12.2)

    func ZipDecompressor(opts ...DOption) func(r io.Reader) io.ReadCloser

ZipDecompressor returns a decompressor that can be registered with zip libraries. See ZipCompressor for example. Options can be specified. WithDecoderConcurrency(1) is forced, and by default a 128MB maximum decompression window is specified. The window size can be overridden if required.
Types
type BuildDictOptions (added in v1.17.0)

    type BuildDictOptions struct {
        // Dictionary ID.
        ID uint32

        // Content to use to create dictionary tables.
        Contents [][]byte

        // History to use for all blocks.
        History []byte

        // Offsets to use.
        Offsets [3]int

        // CompatV155 will make the dictionary compatible with Zstd v1.5.5 and earlier.
        // See https://github.com/facebook/zstd/issues/3724
        CompatV155 bool

        // Use the specified encoder level.
        // The dictionary will be built using the specified encoder level,
        // which will reflect speed and make the dictionary tailored for that level.
        // If not set SpeedBestCompression will be used.
        Level EncoderLevel

        // DebugOut will write stats and other details here if set.
        DebugOut io.Writer
    }

type DOption

    type DOption func(*decoderOptions) error

DOption is an option for creating a decoder.
func IgnoreChecksum (added in v1.15.3)
IgnoreChecksum allows to forcibly ignore checksum checking.
func WithDecodeAllCapLimit (added in v1.15.10)
WithDecodeAllCapLimit will limit DecodeAll to decoding cap(dst)-len(dst) bytes, or any size set in WithDecoderMaxMemory. This can be used to limit decoding to a specific maximum output size. Disabled by default.
func WithDecodeBuffersBelow (added in v1.15.11)
WithDecodeBuffersBelow will fully decode readers that have a `Bytes() []byte` and `Len() int` interface similar to bytes.Buffer. This typically uses fewer allocations but will have the full decompressed object in memory. Note that DecodeAllCapLimit will disable this, as well as giving a size of 0 or less. Default is 128KiB.
func WithDecoderConcurrency
WithDecoderConcurrency sets the number of created decoders. When decoding blocks with DecodeAll, this will limit the number of possible concurrently running decodes. When decoding streams, this will limit the number of inflight blocks. When decoding streams and setting maximum to 1, no async decoding will be done. When a value of 0 is provided GOMAXPROCS will be used. By default this will be set to 4 or GOMAXPROCS, whichever is lower.
func WithDecoderDictRaw (added in v1.15.15)
WithDecoderDictRaw registers a dictionary that may be used by the decoder. The slice content can be arbitrary data.
func WithDecoderDicts (added in v1.10.9)
WithDecoderDicts allows to register one or more dictionaries for the decoder.
Each slice in dict must be in the dictionary format produced by "zstd --train" from the Zstandard reference implementation.
If several dictionaries with the same ID are provided, the last one will be used.
func WithDecoderLowmem
WithDecoderLowmem will set whether to use a lower amount of memory, but possibly have to allocate more while running.
func WithDecoderMaxMemory
WithDecoderMaxMemory allows to set a maximum decoded size for in-memory non-streaming operations or maximum window size for streaming operations. This can be used to control memory usage of potentially hostile content. Maximum is 1 << 63 bytes. Default is 64GiB.
func WithDecoderMaxWindow (added in v1.13.1)
WithDecoderMaxWindow allows to set a maximum window size for decodes. This allows rejecting packets that will cause big memory usage. The Decoder will likely allocate more memory based on the WithDecoderLowmem setting. If WithDecoderMaxMemory is set to a lower value, that will be used. Default is 512MB, Maximum is ~3.75 TB as per zstandard spec.
typeDecoder¶
type Decoder struct {// contains filtered or unexported fields}Decoder provides decoding of zstandard streams.The decoder has been designed to operate without allocations after a warmup.This means that you should store the decoder for best performance.To re-use a stream decoder, use the Reset(r io.Reader) error to switch to another stream.A decoder can safely be re-used even if the previous stream failed.To release the resources, you must call the Close() function on a decoder.
funcNewReader¶
NewReader creates a new decoder.A nil Reader can be provided in which case Reset can be used to start a decode.
A Decoder can be used in two modes:
1) As a stream, or2) For stateless decoding using DecodeAll.
Only a single stream can be decoded concurrently, but the same decodercan run multiple concurrent stateless decodes. It is even possible touse stateless decodes while a stream is being decoded.
The Reset function can be used to initiate a new stream, which will considerablyreduce the allocations normally caused by NewReader.
func (*Decoder)Close¶
func (d *Decoder) Close()
Close will release all resources.It is NOT possible to reuse the decoder after this.
func (*Decoder)DecodeAll¶
DecodeAll allows stateless decoding of a blob of bytes.Output will be appended to dst, so if the destination size is knownyou can pre-allocate the destination slice to avoid allocations.DecodeAll can be used concurrently.The Decoder concurrency limits will be respected.
func (*Decoder)IOReadCloser¶added inv1.9.5
func (d *Decoder) IOReadCloser()io.ReadCloser
IOReadCloser returns the decoder as an io.ReadCloser for convenience.Any changes to the decoder will be reflected, so the returned ReadClosercan be reused along with the decoder.io.WriterTo is also supported by the returned ReadCloser.
func (*Decoder) Read
Read bytes from the decompressed stream into p. Returns the number of bytes read and any error that occurred. When the stream is done, io.EOF will be returned.
func (*Decoder) Reset
Reset will reset the decoder to the supplied stream after the current one has finished processing. Note that this functionality cannot be used after Close has been called. Reset can be called with a nil reader to release references to the previous reader. After being called with a nil reader, no operations other than Reset, DecodeAll or Close should be used.
type EOption (added in v1.6.0)
type EOption func(*encoderOptions) error
EOption is an option for creating an encoder.
func WithAllLitEntropyCompression (added in v1.10.10)
WithAllLitEntropyCompression will apply entropy compression if no matches are found. Disabling this will skip incompressible data faster, but compression is lost for input that has no matches yet has a skewed character distribution. Default value depends on the compression level selected.
func WithEncoderCRC (added in v1.6.0)
WithEncoderCRC will add CRC value to output. Output will be 4 bytes larger.
func WithEncoderConcurrency (added in v1.6.0)
WithEncoderConcurrency will set the concurrency, meaning the maximum number of encoders to run concurrently. The value supplied must be at least 1. For streams, setting a value of 1 will disable async compression. By default this will be set to GOMAXPROCS.
func WithEncoderDict (added in v1.11.0)
WithEncoderDict registers a dictionary that will be used for the encode.
The slice dict must be in the dictionary format produced by "zstd --train" from the Zstandard reference implementation.
The encoder *may* choose to use no dictionary instead for certain payloads.
func WithEncoderDictRaw (added in v1.15.15)
WithEncoderDictRaw registers a dictionary that may be used by the encoder.
The slice content may contain arbitrary data. It will be used as an initial history.
Example
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/compress/zstd"
)

func main() {
	// "Raw" dictionaries can be used for compressed delta encoding.

	source := []byte(`
		This is the source file. Compression of the target file with
		the source file as the dictionary will produce a compressed
		delta encoding of the target file.`)
	target := []byte(`
		This is the target file. Decompression of the delta encoding with
		the source file as the dictionary will produce this file.`)

	// The dictionary id is arbitrary. We use zero for compatibility
	// with zstd --patch-from, but applications can use any id
	// not in the range [32768, 1<<31).
	const id = 0

	bestLevel := zstd.WithEncoderLevel(zstd.SpeedBestCompression)

	w, _ := zstd.NewWriter(nil, bestLevel,
		zstd.WithEncoderDictRaw(id, source))
	delta := w.EncodeAll(target, nil)

	r, _ := zstd.NewReader(nil, zstd.WithDecoderDictRaw(id, source))
	out, err := r.DecodeAll(delta, nil)
	if err != nil || !bytes.Equal(out, target) {
		panic("decoding error")
	}

	// Ordinary compression, for reference.
	w, _ = zstd.NewWriter(nil, bestLevel)
	compressed := w.EncodeAll(target, nil)

	// Check that the delta is at most half as big as the compressed file.
	fmt.Println(len(delta) < len(compressed)/2)
}
Output:
true
func WithEncoderLevel (added in v1.7.0)
func WithEncoderLevel(l EncoderLevel) EOption
WithEncoderLevel specifies a predefined compression level.
func WithEncoderPadding (added in v1.6.1)
WithEncoderPadding will add padding to all output so the size will be a multiple of n. This can be used to obfuscate the exact output size or make blocks of a certain size. The padding is written as a skippable frame, so it will be invisible to the decoder. n must be > 0 and <= 1GB (1<<30 bytes). The padded area will be filled with data from crypto/rand.Reader. If `EncodeAll` is used with data already in the destination, the total size will be a multiple of n.
func WithLowerEncoderMem (added in v1.11.13)
WithLowerEncoderMem will, in some cases, trade lower memory usage for slower encoding speed. This will not change the window size, which is the primary means of reducing memory usage. See WithWindowSize.
func WithNoEntropyCompression (added in v1.9.4)
WithNoEntropyCompression will always skip entropy compression of literals. This can be useful if content has matches, but is unlikely to benefit from entropy compression. Usually the slight speed improvement is not worth enabling this.
func WithSingleSegment (added in v1.6.0)
WithSingleSegment will set the "single segment" flag when EncodeAll is used. If this flag is set, data must be regenerated within a single continuous memory segment. In this case, the Window_Descriptor byte is skipped, but Frame_Content_Size is necessarily present. As a consequence, the decoder must allocate a memory segment of size equal to or larger than the size of your content. To protect the decoder from unreasonable memory requirements, a decoder is allowed to reject a compressed frame which requests a memory size beyond the decoder's authorized range. For broader compatibility, decoders are recommended to support memory sizes of at least 8 MB. This is only a recommendation; each decoder is free to support higher or lower limits, depending on local limitations. If this is not specified, block encodes will automatically choose this based on the input size and the window size. This setting has no effect on streamed encodes.
func WithWindowSize (added in v1.8.3)
WithWindowSize will set the maximum allowed back-reference distance. The value must be a power of two between MinWindowSize and MaxWindowSize. A larger value will enable better compression but allocate more memory and, for above-default values, take considerably longer. The default value is determined by the compression level and is at most 8MB.
func WithZeroFrames (added in v1.8.2)
WithZeroFrames will encode 0 length input as full frames. This can be needed for compatibility with zstandard usage, but is not needed for this package.
type Encoder (added in v1.6.0)
type Encoder struct {
	// contains filtered or unexported fields
}
Encoder provides encoding to Zstandard. An Encoder can be used for either compressing a stream via the io.WriteCloser interface supported by the Encoder or as multiple independent tasks via the EncodeAll function. Smaller encodes are encouraged to use the EncodeAll function. Use NewWriter to create a new instance.
func NewWriter (added in v1.6.0)
NewWriter will create a new Zstandard encoder. If the encoder will only be used for encoding blocks, a nil writer can be used.
func (*Encoder) Close (added in v1.6.0)
Close will flush the final output and close the stream. The function will block until everything has been written. The Encoder can still be re-used after calling this.
func (*Encoder) EncodeAll (added in v1.6.0)
EncodeAll will encode all input in src and append it to dst. This function can be called concurrently, but each call will only run on a single goroutine. If empty input is given, nothing is returned, unless WithZeroFrames is specified. Encoded blocks can be concatenated and the result will be the combined input stream. Data compressed with EncodeAll can be decoded with the Decoder, using either a stream or DecodeAll.
func (*Encoder) Flush (added in v1.6.0)
Flush will send the currently written data to output and block until everything has been written. This should only be used on rare occasions where pushing the currently queued data is critical.
func (*Encoder) MaxEncodedSize (added in v1.15.13)
MaxEncodedSize returns the expected maximum size of an encoded block or stream.
func (*Encoder) ReadFrom (added in v1.6.0)
ReadFrom reads data from r until EOF or error. The return value n is the number of bytes read. Any error except io.EOF encountered during the read is also returned.
The io.Copy function uses io.ReaderFrom if available.
func (*Encoder) Reset (added in v1.6.0)
Reset will re-initialize the writer and new writes will encode to the supplied writer as a new, independent stream.
func (*Encoder) ResetContentSize (added in v1.13.4)
ResetContentSize will reset and set a content size for the next stream. If the number of bytes written does not match the size given, an error will be returned when calling Close(). The content size is removed when Reset is called. Sizes <= 0 result in no content size being set.
type EncoderLevel (added in v1.7.0)
type EncoderLevel int
EncoderLevel predefines encoder compression levels. Only use the constants made available, since the actual mapping of these values is very likely to change and your compression could change unpredictably when upgrading the library.
const (
	// SpeedFastest will choose the fastest reasonable compression.
	// This is roughly equivalent to the fastest Zstandard mode.
	SpeedFastest EncoderLevel

	// SpeedDefault is the default "pretty fast" compression option.
	// This is roughly equivalent to the default Zstandard mode (level 3).
	SpeedDefault

	// SpeedBetterCompression will yield better compression than the default.
	// Currently it is about zstd level 7-8 with ~ 2x-3x the default CPU usage.
	// By using this, notice that CPU usage may go up in the future.
	SpeedBetterCompression

	// SpeedBestCompression will choose the best available compression option.
	// This will offer the best compression no matter the CPU cost.
	SpeedBestCompression
)
func EncoderLevelFromString (added in v1.7.0)
func EncoderLevelFromString(s string) (bool, EncoderLevel)
EncoderLevelFromString will convert a string representation of an encoding level back to a compression level. The comparison is not case sensitive. If the string wasn't recognized, (false, SpeedDefault) will be returned.
func EncoderLevelFromZstd (added in v1.7.0)
func EncoderLevelFromZstd(level int) EncoderLevel
EncoderLevelFromZstd will return the encoder level that most closely matches the compression ratio of a specific zstd compression level. Many input values will provide the same compression level.
func (EncoderLevel) String (added in v1.7.0)
func (e EncoderLevel) String() string
String provides a string representation of the compression level.
type Header (added in v1.11.4)
type Header struct {
	// SingleSegment specifies whether the data is to be decompressed into a
	// single contiguous memory segment.
	// It implies that WindowSize is invalid and that FrameContentSize is valid.
	SingleSegment bool

	// WindowSize is the window of data to keep while decoding.
	// Will only be set if SingleSegment is false.
	WindowSize uint64

	// Dictionary ID.
	// If 0, no dictionary.
	DictionaryID uint32

	// HasFCS specifies whether FrameContentSize has a valid value.
	HasFCS bool

	// FrameContentSize is the expected uncompressed size of the entire frame.
	FrameContentSize uint64

	// Skippable will be true if the frame is meant to be skipped.
	// This implies that FirstBlock.OK is false.
	Skippable bool

	// SkippableID is the user-specific ID for the skippable frame.
	// Valid values are between 0 to 15, inclusive.
	SkippableID int

	// SkippableSize is the length of the user data to skip following
	// the header.
	SkippableSize uint32

	// HeaderSize is the raw size of the frame header.
	//
	// For normal frames, it includes the size of the magic number and
	// the size of the header (per section 3.1.1.1).
	// It does not include the size for any data blocks (section 3.1.1.2) nor
	// the size for the trailing content checksum.
	//
	// For skippable frames, this counts the size of the magic number
	// along with the size of the size field of the payload.
	// It does not include the size of the skippable payload itself.
	// The total frame size is the HeaderSize plus the SkippableSize.
	HeaderSize int

	// First block information.
	FirstBlock struct {
		// OK will be set if first block could be decoded.
		OK bool

		// Is this the last block of a frame?
		Last bool

		// Is the data compressed?
		// If true CompressedSize will be populated.
		// Unfortunately DecompressedSize cannot be determined
		// without decoding the blocks.
		Compressed bool

		// DecompressedSize is the expected decompressed size of the block.
		// Will be 0 if it cannot be determined.
		DecompressedSize int

		// CompressedSize of the data in the block.
		// Does not include the block header.
		// Will be equal to DecompressedSize if not Compressed.
		CompressedSize int
	}

	// If set there is a checksum present for the block content.
	// The checksum field at the end is always 4 bytes long.
	HasCheckSum bool
}
Header contains information about the first frame and block within that.
func (*Header) AppendTo (added in v1.17.5)
AppendTo will append the encoded header to the dst slice. There is no error checking performed on the header values.
func (*Header) Decode (added in v1.11.4)
Decode the header from the beginning of the stream. This will decode the frame header and the first block header if enough bytes are provided. It is recommended to provide at least HeaderMaxSize bytes. If the frame header cannot be read an error will be returned. If there isn't enough input, io.ErrUnexpectedEOF is returned. The FirstBlock.OK will indicate if enough information was available to decode the first block header.
func (*Header) DecodeAndStrip (added in v1.17.5)
DecodeAndStrip will decode the header from the beginning of the stream and on success return the remaining bytes. This will decode the frame header and the first block header if enough bytes are provided. It is recommended to provide at least HeaderMaxSize bytes. If the frame header cannot be read an error will be returned. If there isn't enough input, io.ErrUnexpectedEOF is returned. The FirstBlock.OK will indicate if enough information was available to decode the first block header.
type SnappyConverter (added in v1.6.0)
type SnappyConverter struct {
	// contains filtered or unexported fields
}
SnappyConverter can read Snappy-compressed streams and convert them to zstd. Conversion is done by converting the stream directly from Snappy without intermediate full decoding. Therefore the compression ratio is much lower than what a full decompression and recompression would achieve, and a faulty Snappy stream may lead to a faulty Zstandard stream without any errors being generated. No CRC value is generated and not all CRC values of the Snappy stream are checked. However, it provides really fast recompression of Snappy streams. The converter can be reused to avoid allocations, even after errors.
Source Files
- bitreader.go
- bitwriter.go
- blockdec.go
- blockenc.go
- blocktype_string.go
- bytebuf.go
- bytereader.go
- decodeheader.go
- decoder.go
- decoder_options.go
- dict.go
- enc_base.go
- enc_best.go
- enc_better.go
- enc_dfast.go
- enc_fast.go
- encoder.go
- encoder_options.go
- framedec.go
- frameenc.go
- fse_decoder.go
- fse_decoder_amd64.go
- fse_encoder.go
- fse_predefined.go
- hash.go
- history.go
- matchlen_amd64.go
- seqdec.go
- seqdec_amd64.go
- seqenc.go
- simple_go124.go
- snappy.go
- zip.go
- zstd.go