Snzip, a compression/decompression tool based on snappy
kubo/snzip
Snzip is a command-line tool that uses snappy. It supports several file formats: framing-format, old framing-format, hadoop-snappy format, raw format, and three obsolete formats used by snzip, snappy-java and snappy-in-java before the official framing-format was defined. The default format is framing-format.
The default format was changed to framing-format in 1.0.0. To keep using the obsolete snzip format as the default, as in earlier versions, pass --with-default-format=snzip as a configure option.
Download snzip-1.0.5.tar.gz from https://github.com/kubo/snzip/releases, uncompress and untar it, and run configure.
tar xvfz snzip-1.0.5.tar.gz
cd snzip-1.0.5
./configure
make
make install
If snappy is not installed under /usr or /usr/local, you need to specify its location with --with-snappy as follows.
# install snzip
tar xvfz snzip-1.0.5.tar.gz
cd snzip-1.0.5
./configure --with-snappy=/xxx/yyy/
make
make install
When both dynamic and static snappy libraries are available, the dynamic library is used by default, and the compiled snzip depends on libsnappy.so. When --with-static-snappy is passed as a configure option, the static library is used instead, and the snappy library is built into the compiled snzip.
Note: --with-static-snappy isn't available on some platforms.
You can use --with-default-format to change the default compression format.
./configure --with-default-format=snzip
We don't provide rpm packages. Download snzip-1.0.5.tar.gz from https://github.com/kubo/snzip/releases, create an rpm package as follows, and install it.
# The rpm package will be created under $HOME/rpmbuild/RPMS.
rpmbuild -tb snzip-1.0.5.tar.gz
To build from the source code in the GitHub repository:
git clone https://github.com/kubo/snzip.git
cd snzip
./autogen.sh
./configure
make
make install
Download snzip-1.0.5-win32.zip or snzip-1.0.5-win64.zip from https://github.com/kubo/snzip/releases and copy snzip.exe and snunzip.exe to a directory in the PATH environment variable.
snzip file.tar
The compressed file is named file.tar.sz and the original file is deleted. File attributes such as timestamp, mode and permissions are preserved as far as possible.
The compressed file is in framing-format. To use another format, add the -t option, for example -t snappy-java or -t snappy-in-java.
snzip -t snappy-java file.tar
or
snzip -t snappy-in-java file.tar
snzip -c file.tar > file.tar.sz
or
cat file.tar | snzip > file.tar.sz
To use a format other than framing-format, add the option -t [format-name].
tar cf - files-to-be-archived | snzip > archive.tar.sz
snzip -d file.tar.sz
or
snunzip file.tar.sz
The decompressed file is named file.tar and the original file is deleted. File attributes such as timestamp, mode and permissions are preserved as far as possible.
If the program name includes un, such as snunzip, it acts as if -d were set.
The file format is automatically determined from the file header. However, this doesn't work for some formats, such as raw and Apple iWork .iwa.
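Header-based detection amounts to comparing the leading bytes of the file with each format's magic prefix. The sketch below is only an illustration, not snzip's actual code, and `guess_format` / `guess_file_format` are hypothetical helpers. It checks three headers: the framing-format stream identifier chunk (`0xff`, a 3-byte little-endian length of 6, then `sNaPpY`, as in snappy's framing format description), the obsolete snzip header described at the end of this README (`SNZ` followed by version `0x01`), and what we believe is snappy-java's stream header (`0x82`, `SNAPPY`, `0x00`). It does not try to tell framing-format from old framing-format, and it leaves out snappy-in-java and hadoop-snappy, whose detection rules are not given here.

```python
from typing import Optional

# Magic prefixes checked by this sketch (assumptions, see the lead-in).
FRAMING_FORMAT_MAGIC = b"\xff\x06\x00\x00sNaPpY"  # stream identifier chunk
SNAPPY_JAVA_MAGIC = b"\x82SNAPPY\x00"             # believed snappy-java header
SNZIP_MAGIC = b"SNZ\x01"                          # obsolete snzip format, version 1


def guess_format(header: bytes) -> Optional[str]:
    """Return a format name for the leading bytes of a file, or None."""
    if header.startswith(FRAMING_FORMAT_MAGIC):
        return "framing-format"
    if header.startswith(SNAPPY_JAVA_MAGIC):
        return "snappy-java"
    if header.startswith(SNZIP_MAGIC):
        return "snzip"
    # Raw and .iwa data carry no distinctive header, so they end up here.
    return None


def guess_file_format(path: str) -> Optional[str]:
    """Read just enough bytes to compare against the longest magic prefix."""
    with open(path, "rb") as f:
        return guess_format(f.read(len(FRAMING_FORMAT_MAGIC)))
```

A `None` result corresponds to the cases where you have to name the format yourself with -t.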
snzip -dc file.tar.sz > file.tar
snunzip -c file.tar.sz > file.tar
snzcat file.tar.sz > file.tar
cat file.tar.sz | snzcat > file.tar
If the program name includes cat, such as snzcat, it acts as if -dc were set.
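The two program-name rules can be expressed as substring checks on the base name of `argv[0]`. This is a hypothetical sketch (`implied_flags` is not a snzip function), and it assumes the `cat` check takes priority, so a name containing both substrings would behave like snzcat.

```python
import os


def implied_flags(argv0):
    """Return (decompress, to_stdout) implied by the program name.

    Only the base name is inspected, so a directory such as
    /home/bruno/ (which contains "un") does not affect the result.
    """
    name = os.path.basename(argv0)
    if "cat" in name:   # e.g. snzcat: as if -dc were given
        return True, True
    if "un" in name:    # e.g. snunzip: as if -d were given
        return True, False
    return False, False  # e.g. snzip: compress
```

Checking the base name rather than the full path is the design choice that matters here: an installation path that happens to contain `un` or `cat` should not change the tool's behavior.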
snzip -dc archive.tar.sz | tar xf -
Raw format is the native format of snappy. Unlike the other formats, it has a few limitations:
(1) The total length of the data before compression must be known at compression time.
(2) Automatic file format detection doesn't work on decompression.
(3) Raw format support is enabled only when snzip is compiled against snappy 1.1.3 or later.
snzip -t raw file.tar
or
snzip -t raw < file.tar > file.tar.raw
In these examples, snzip reads from a file descriptor that refers directly to file.tar, so it can get the length of the data to be compressed. However, the following command doesn't work:
cat file.tar | snzip -t raw > file.tar.raw
It reads from a pipe, so snzip cannot get the total length before compression. In this case, the total length must be specified with the -s option.
cat file.tar | snzip -t raw -s "size of file.tar" > file.tar.raw
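Limitation (1) and the -s option follow from the layout of raw snappy data: according to snappy's format description, a raw stream begins with the uncompressed length encoded as a little-endian base-128 varint, so the length has to be written before any compressed bytes. A minimal sketch of that preamble (`encode_varint32` and `decode_varint32` are hypothetical helper names):

```python
def encode_varint32(n: int) -> bytes:
    """Encode n as a little-endian base-128 varint (7 bits per byte)."""
    if not 0 <= n < 2**32:
        raise ValueError("length must fit in 32 bits")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)                      # last byte has the high bit clear
    return bytes(out)


def decode_varint32(buf: bytes, pos: int = 0):
    """Return (value, position after the varint); at most 5 bytes are read."""
    result = 0
    for shift in range(0, 35, 7):
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
    raise ValueError("varint longer than 5 bytes")
```

For example, `decode_varint32(open("file.tar.raw", "rb").read(5))[0]` would give back the length that snzip had to know (or be told with -s) when it wrote file.tar.raw.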
snzip -t raw -d file.tar.sz
or
snunzip -t raw file.tar.sz
You need to set the -t raw option to tell snzip the format of the file to be decompressed.
Hadoop-snappy format is one of the compression formats used in Hadoop. It uses its own framing format, as follows:
- A compressed file consists of one or more blocks.
- A block consists of uncompressed length (big endian 4 byte integer) and one or more subblocks.
- A subblock consists of compressed length (big endian 4 byte integer) and raw compressed data.
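The block and subblock layout above can be walked with big-endian integer reads alone. This is a minimal sketch, not snzip's implementation: `split_hadoop_snappy` is a hypothetical helper, and it assumes each subblock holds a standard raw snappy stream whose varint preamble gives that subblock's uncompressed size. That preamble is how the sketch knows where a block ends without decompressing anything; the actual decompression step is left out.

```python
import struct


def _read_varint32(buf: bytes, pos: int = 0):
    """Little-endian base-128 varint, as used in a raw snappy preamble."""
    result = 0
    for shift in range(0, 35, 7):
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
    raise ValueError("varint longer than 5 bytes")


def split_hadoop_snappy(data: bytes):
    """Return [(uncompressed_length, [subblock, ...]), ...] for each block."""
    blocks = []
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated block header")
        (block_len,) = struct.unpack_from(">I", data, pos)  # big endian
        pos += 4
        subblocks = []
        remaining = block_len
        while remaining > 0:
            if pos + 4 > len(data):
                raise ValueError("truncated subblock header")
            (comp_len,) = struct.unpack_from(">I", data, pos)
            pos += 4
            chunk = data[pos:pos + comp_len]
            if len(chunk) != comp_len:
                raise ValueError("truncated subblock")
            pos += comp_len
            subblocks.append(chunk)
            # Assumption: the chunk starts with its own uncompressed length.
            remaining -= _read_varint32(chunk)[0]
        if remaining < 0:
            raise ValueError("subblocks exceed the block's uncompressed length")
        blocks.append((block_len, subblocks))
    return blocks
```

Each returned subblock would then be handed to a raw snappy decompressor.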
snzip -t hadoop-snappy file_name
The default block size used by snzip for hadoop-snappy format is 256k, the same as the default value of the io.compression.codec.snappy.buffersize parameter. If the block size used by snzip is larger than this parameter, Hadoop fails with InternalError "Could not decompress data. Buffer length is too small" while reading a file compressed by snzip. If you get this error, change the block size with the -b option as follows.
# if io.compression.codec.snappy.buffersize is 32768
snzip -t hadoop-snappy -b 32768 file_name_to_be_compressed
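The effect of -b is easiest to see from the writer side: the input is cut into blocks of at most `block_size` bytes, so no block's uncompressed length exceeds Hadoop's buffer. This is a hypothetical sketch (`write_hadoop_snappy` is not part of snzip) that emits one subblock per block, which the format allows. The `compress` argument is assumed to return raw snappy data, for example `snappy.compress` from the third-party python-snappy package.

```python
import struct
from typing import Callable


def write_hadoop_snappy(data: bytes, block_size: int,
                        compress: Callable[[bytes], bytes]) -> bytes:
    """Frame data as hadoop-snappy blocks of at most block_size bytes.

    compress is assumed to produce raw snappy output; this sketch only
    handles the framing around it.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    out = bytearray()
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        compressed = compress(block)
        out += struct.pack(">I", len(block))       # uncompressed length
        out += struct.pack(">I", len(compressed))  # compressed length
        out += compressed                          # single subblock
    return bytes(out)
```

With python-snappy installed, `write_hadoop_snappy(data, 32768, snappy.compress)` would roughly mirror the command above. Passing an identity function instead of a real compressor is only useful for checking the framing.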
snzip -d compressed_file.snappy
The file format is guessed from the first 8 bytes of the file.
Apple iWork .iwa format is a file format used by Apple iWork. The format was demystified here. Basically, the .iwa format consists of a Protobuf stream compressed by Snappy.
Snzip decompresses .iwa files to Protobuf streams and compresses Protobuf streams to .iwa files. You need to set -t iwa on both compression and decompression to specify the file format.
Note: This is an obsolete format. The default format was changed to framing-format.
The first three bytes are magic characters 'SNZ'.
The fourth byte is the file format version. It is 0x01.
The fifth byte is the order of the block size. The input data is divided into fixed-length blocks, and each block is compressed by snappy. When it is 16 (the default value), the block size is 2 to the 16th power, that is, 64 kilobytes.
The rest consists of pairs of a compressed data length and a compressed data block. The compressed data length is encoded in the same way as snappy::Varint::Encode32(). A length of zero marks the end of data.
Any data following the end marker is currently ignored, but in the future it may be read as the next compressed file, as gzip does.
Note that the uncompressed length of each compressed data block must be less than or equal to the block size specified by the fifth byte.
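A minimal reader for the layout above, assuming snappy::Varint::Encode32() produces the usual little-endian base-128 varint (7 bits per byte, high bit set on every byte except the last). `parse_snzip` is a hypothetical helper: it validates the header and splits out the compressed blocks, while passing each block to snappy for decompression is left out.

```python
def _read_varint32(buf: bytes, pos: int):
    """Decode a little-endian base-128 varint; return (value, next_pos)."""
    result = 0
    for shift in range(0, 35, 7):
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
    raise ValueError("varint longer than 5 bytes")


def parse_snzip(data: bytes):
    """Return (block_size, [compressed blocks]) for the obsolete snzip format."""
    if len(data) < 5 or data[:3] != b"SNZ":
        raise ValueError("missing 'SNZ' magic")
    if data[3] != 0x01:
        raise ValueError("unsupported format version %d" % data[3])
    block_size = 1 << data[4]  # fifth byte: order of the block size
    pos = 5
    blocks = []
    while True:
        length, pos = _read_varint32(data, pos)
        if length == 0:
            # Zero length marks the end of data; trailing bytes are ignored.
            return block_size, blocks
        block = data[pos:pos + length]
        if len(block) != length:
            raise ValueError("truncated compressed block")
        blocks.append(block)
        pos += length
```

With the default fifth byte of 16, `block_size` comes out as 65536, the 64 kilobytes mentioned above.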
2-clause BSD-style license.