hyparam/hyparquet-compressors


Decompressors for hyparquet

License

MIT license


hyparquet decompressors

This package provides decompressors for various compression codecs. It is designed to be used with hyparquet in order to provide full support for all parquet compression formats.

Introduction

Apache Parquet is a popular columnar storage format that is widely used in data engineering, data science, and machine learning applications for efficiently storing and processing large datasets. It supports a number of different compression formats, but most parquet files use snappy compression.

Hyparquet is a fast and lightweight parquet reader that is designed to work in both node.js and the browser.

By default, hyparquet only supports uncompressed and snappy compressed files (the most common parquet compression codecs). The hyparquet-compressors package extends support for all legal parquet compression formats.

hyparquet-compressors works in both node.js and the browser. It uses only js and wasm packages, with no system dependencies.

Hyparquet

To use hyparquet-compressors with hyparquet, simply pass the compressors object to the parquetReadObjects function.

import { parquetReadObjects } from 'hyparquet'
import { compressors } from 'hyparquet-compressors'

const data = await parquetReadObjects({ file, compressors })

See the hyparquet repo for more info.
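To make it easier to see what is being passed, the sketch below wraps a compressors object so that every decompression call is reported. It assumes, rather than confirms from this README, that `compressors` is a plain object keyed by codec name whose values are functions of the form `(input, outputLength) => Uint8Array`. The names `withLogging` and `fakeCompressors` are hypothetical and exist only for illustration.

```javascript
// Assumption: each entry maps a codec name to (input, outputLength) => Uint8Array.
function withLogging(compressors, onCall) {
  const wrapped = {}
  for (const [codec, decompress] of Object.entries(compressors)) {
    wrapped[codec] = (input, outputLength) => {
      const output = decompress(input, outputLength)
      // Report the codec and the byte counts on each side of the call.
      onCall({ codec, inputBytes: input.length, outputBytes: output.length })
      return output
    }
  }
  return wrapped
}

// Hypothetical stand-in so the sketch runs without any packages installed.
const fakeCompressors = {
  IDENTITY: (input, outputLength) => input.slice(0, outputLength),
}

const calls = []
const logged = withLogging(fakeCompressors, call => calls.push(call))
logged.IDENTITY(new Uint8Array([1, 2, 3, 4]), 3)
console.log(calls)
```

If the assumption about the object shape holds, the wrapped object could be passed in place of `compressors`, for example `parquetReadObjects({ file, compressors: withLogging(compressors, console.log) })`.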

Compression formats

Parquet compression types supported with hyparquet-compressors:

Uncompressed
Snappy
Gzip
LZO
Brotli
LZ4
ZSTD
LZ4_RAW

Snappy

Snappy compression uses hysnappy for fast snappy decompression using a minimal WASM module.

We load the wasm module synchronously from base64 in the js file. This avoids a network request, and greatly simplifies bundling and serving wasm.
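The general technique can be sketched as follows. This is not hysnappy's actual code: the base64 string is a tiny hand-assembled module that exports a single `add` function, used as a hypothetical stand-in for a real decompressor binary, and `instantiateSync` is an illustrative name.

```javascript
// Stand-in wasm binary (41 bytes) exporting add(i32, i32) -> i32, inlined as base64.
const wasmBase64 = 'AGFzbQEAAAABBwFgAn9/AX8DAgEABwcBA2FkZAAACgkBBwAgACABags='

function instantiateSync(base64) {
  // Decode base64 into raw bytes without touching the network.
  const binary = atob(base64)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i)
  // The Module and Instance constructors compile and instantiate synchronously.
  const module = new WebAssembly.Module(bytes)
  return new WebAssembly.Instance(module, {})
}

const { add } = instantiateSync(wasmBase64).exports
console.log(add(2, 3))
```

Because the bytes live inside the js file, a bundler only has to handle one ordinary module, and callers do not need to await anything before the first decompression.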

Gzip

The gzip implementation is new, adapted from fflate. It includes modifications to handle repeated back-to-back gzip streams, which sometimes occur in parquet files but are not supported by fflate.

For gzip, the output buffer argument is optional:

If output is defined, the decompressor will write to output until it is full.
If output is undefined, the decompressor will allocate a new buffer, and expand it as needed to fit the uncompressed gzip data. Importantly, the caller should use the returned buffer.
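The two modes can be illustrated with a small sketch. To stay short it uses a toy run-length decoder in place of gzip, so `expandRuns` is a hypothetical stand-in and not the package's function; only the handling of the optional `output` argument is the point.

```javascript
// Toy decoder: input is pairs of [count, value]. Not gzip, only an illustration
// of the fixed-buffer versus growable-buffer behaviour described above.
function expandRuns(input, output) {
  const growable = output === undefined
  let out = growable ? new Uint8Array(4) : output
  let pos = 0
  for (let i = 0; i + 1 < input.length; i += 2) {
    const count = input[i]
    const value = input[i + 1]
    for (let j = 0; j < count; j++) {
      if (pos === out.length) {
        if (!growable) return out // caller's buffer is full: stop writing
        // Growing means allocating a new buffer, so the original reference goes stale.
        const bigger = new Uint8Array(out.length * 2)
        bigger.set(out)
        out = bigger
      }
      out[pos++] = value
    }
  }
  // In growable mode, trim to the bytes actually written.
  return growable ? out.subarray(0, pos) : out
}

const runs = new Uint8Array([3, 7, 4, 9])
const fixed = expandRuns(runs, new Uint8Array(5)) // fills the 5 bytes and stops
const grown = expandRuns(runs) // allocates, grows, and returns 7 bytes
console.log(fixed, grown)
```

The growable path is why the return value matters: once the buffer has been reallocated, only the returned array refers to the complete data.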

Brotli

Includes a minimal port of brotli.js. Our implementation uses gzip to pre-compress the brotli dictionary, in order to minimize the bundle size.

LZ4

The new LZ4 implementation includes support for the legacy hadoop LZ4 frame format used in some old parquet files.
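For readers unfamiliar with the hadoop framing, here is a minimal sketch of how such data can be decoded. It is not the package's implementation. It assumes the commonly described layout in which each frame is a 4-byte big-endian decompressed size, a 4-byte big-endian compressed size, and then one raw LZ4 block, repeated until the input ends. The names `decodeLz4Block` and `decodeHadoopLz4` are illustrative.

```javascript
// Decode one raw LZ4 block from input[start, end) into output at outPos.
function decodeLz4Block(input, start, end, output, outPos) {
  let i = start
  while (i < end) {
    const token = input[i++]
    // High nibble: literal length, extended by extra bytes when it is 15.
    let literalLength = token >> 4
    if (literalLength === 15) {
      let b
      do { b = input[i++]; literalLength += b } while (b === 255)
    }
    output.set(input.subarray(i, i + literalLength), outPos)
    i += literalLength
    outPos += literalLength
    if (i >= end) break // the final sequence carries literals only
    // Two-byte little-endian offset back into the already written output.
    const offset = input[i] | (input[i + 1] << 8)
    i += 2
    if (offset === 0 || offset > outPos) throw new Error('invalid lz4 offset')
    // Low nibble: match length minus 4, extended the same way.
    let matchLength = (token & 0xf) + 4
    if ((token & 0xf) === 15) {
      let b
      do { b = input[i++]; matchLength += b } while (b === 255)
    }
    // Copy byte by byte so overlapping matches repeat correctly.
    for (let k = 0; k < matchLength; k++, outPos++) {
      output[outPos] = output[outPos - offset]
    }
  }
  return outPos
}

// Assumed framing: [u32 BE decompressed size][u32 BE compressed size][block]...
function decodeHadoopLz4(input, outputLength) {
  const view = new DataView(input.buffer, input.byteOffset, input.byteLength)
  const output = new Uint8Array(outputLength)
  let inPos = 0
  let outPos = 0
  while (inPos < input.length) {
    if (inPos + 8 > input.length) throw new Error('truncated hadoop lz4 header')
    const expectedSize = view.getUint32(inPos) // DataView reads big-endian by default
    const compressedSize = view.getUint32(inPos + 4)
    inPos += 8
    if (inPos + compressedSize > input.length) throw new Error('truncated hadoop lz4 block')
    const next = decodeLz4Block(input, inPos, inPos + compressedSize, output, outPos)
    if (next - outPos !== expectedSize) throw new Error('hadoop lz4 size mismatch')
    inPos += compressedSize
    outPos = next
  }
  return output
}

// Hand-built block: literals "abc", a 6-byte match at offset 3, then literals "hello".
const block = [0x32, 0x61, 0x62, 0x63, 0x03, 0x00, 0x50, 0x68, 0x65, 0x6c, 0x6c, 0x6f]
const framed = new Uint8Array([0, 0, 0, 14, 0, 0, 0, block.length, ...block])
console.log(new TextDecoder().decode(decodeHadoopLz4(framed, 14))) // abcabcabchello
```

A reader that cannot tell in advance which variant a file uses might check whether the two header sizes are plausible and fall back to plain LZ4 decoding when they are not.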

Zstd

Uses fzstd for Zstandard decompression.

Bundle size

File	Size
hyparquet-compressors.min.js	116.4kb
hyparquet-compressors.min.js.gz	75.2kb
