Closed
Description
Currently RAthena and noctua support gzip compression when uploading data to S3 and Athena. Is there a better compression algorithm for flat files?

From Top 10 Performance Tuning Tips for Amazon Athena:
| Algorithm | Splittable? | Compression ratio | Compress + Decompress speed |
|---|---|---|---|
| Gzip (DEFLATE) | No | High | Medium |
| bzip2 | Yes | Very high | Slow |
| LZO | No | Low | Fast |
| Snappy | No | Low | Very fast |
For Athena, we recommend using either Apache Parquet or Apache ORC, which compress data by default and are splittable. When they are not an option, then try BZip2 or Gzip with an optimal file size.
From this, it looks like bzip2 and gzip are currently the recommended options for flat files. We might need to benchmark the read speed of bzip2 and gzip files when querying from Athena.
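As a starting point for that benchmark, here is a minimal local sketch in Python that compares gzip and bzip2 on a synthetic CSV payload. It only measures compression ratio and round-trip time on one machine. It does not measure Athena query time, which also depends on splittability and file size. The helper names (`make_csv`, `benchmark`, `run_benchmarks`) and the sample data are hypothetical and not part of RAthena or noctua.

```python
import bz2
import gzip
import time


def make_csv(n_rows: int = 50_000) -> bytes:
    """Build a synthetic CSV payload (made-up data, for illustration only)."""
    lines = ["id,category,value"]
    for i in range(n_rows):
        lines.append(f"{i},cat_{i % 20},{(i * 37) % 1000 / 10}")
    return "\n".join(lines).encode("utf-8")


def benchmark(name, compress, decompress, payload: bytes) -> dict:
    """Time one compress/decompress round trip and report the size ratio."""
    start = time.perf_counter()
    packed = compress(payload)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = decompress(packed)
    decompress_s = time.perf_counter() - start

    # Sanity check: the round trip must be lossless.
    assert restored == payload

    return {
        "name": name,
        "raw_bytes": len(payload),
        "compressed_bytes": len(packed),
        "ratio": len(payload) / len(packed),
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }


def run_benchmarks(payload: bytes) -> list:
    """Run the same payload through gzip and bzip2 with default settings."""
    return [
        benchmark("gzip", gzip.compress, gzip.decompress, payload),
        benchmark("bzip2", bz2.compress, bz2.decompress, payload),
    ]


if __name__ == "__main__":
    for row in run_benchmarks(make_csv()):
        print(
            f"{row['name']:>5}: ratio={row['ratio']:.1f} "
            f"compress={row['compress_s']:.3f}s "
            f"decompress={row['decompress_s']:.3f}s"
        )
```

Results will vary with the data, so a real comparison should use representative files and then time the same query in Athena against each compressed copy.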