Firefox's Optimized Zip Format: Reading Zip Files Really Quickly

November 22, 2021 · 3 min · Taras Glek

This post is about minimizing the amount of disk IO and CPU overhead when reading Zip files.

I recently saw an article about a new format that was faster than zip.

This is quite surprising because, to my mind, zip is one of the most flexible and low-overhead formats I’ve encountered.

Some googling showed me that over the past 11 years people have noticed that Firefox uses optimized zip files. This inspired me to document the thinking behind the optimized zip format I implemented in Firefox in pre-pandemic 2010. I had a lot of fun writing this code and was surprised that I had failed to blog about it.

Zip format

The following diagram is borrowed from a codeproject article. Wikipedia and the ZIP specification are also helpful.

[Figure: zip structure]

Zip files seem to be designed for cheap appends. For every file inside a zip, there is a Local Header + optionally compressed file contents.

This is followed by a Central Directory which acts as an index for zip contents.

To extract a file from a zip file one must:

  1. Scan backwards through the zip file for the End of Central Directory marker.
  2. Read the offset to the beginning of the Central Directory.
  3. Find the relevant Local Header offset in the Central Directory index.
  4. Read + optionally decompress the stored file.
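
The four steps above can be sketched in a few lines of Python. This is a minimal illustration and not Firefox's code: the function name `read_entry` is made up for this example, the whole archive is assumed to be in memory, and ZIP64, encryption and archive comments that happen to contain the marker bytes are ignored.

```python
import struct
import zlib

EOCD_SIG = b"PK\x05\x06"   # End of Central Directory marker
CDIR_SIG = b"PK\x01\x02"   # Central Directory record signature


def read_entry(data, wanted):
    """Return the uncompressed bytes of the entry named `wanted` (hypothetical helper)."""
    # 1. Scan backwards for the End of Central Directory marker.
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("End of Central Directory marker not found")

    # 2. Read the entry count, size and offset of the Central Directory.
    total, cd_size, cd_offset = struct.unpack_from("<HII", data, eocd + 10)

    # 3. Walk the Central Directory looking for the wanted name.
    pos = cd_offset
    for _ in range(total):
        if data[pos:pos + 4] != CDIR_SIG:
            raise ValueError("corrupt Central Directory")
        method, = struct.unpack_from("<H", data, pos + 10)
        comp_size, = struct.unpack_from("<I", data, pos + 20)
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
        local_offset, = struct.unpack_from("<I", data, pos + 42)
        name = data[pos + 46:pos + 46 + name_len].decode("utf-8")

        if name == wanted:
            # 4. Skip the Local Header, then read + optionally decompress.
            lname_len, lextra_len = struct.unpack_from("<HH", data, local_offset + 26)
            start = local_offset + 30 + lname_len + lextra_len
            raw = data[start:start + comp_size]
            if method == 0:                      # stored
                return raw
            if method == 8:                      # raw deflate stream
                return zlib.decompress(raw, -15)
            raise ValueError("unsupported compression method %d" % method)

        pos += 46 + name_len + extra_len + comment_len

    raise KeyError(wanted)
```

Note that step 1 starts at the end of the file and step 4 jumps back to wherever the entry lives, so even this small reader touches the file in at least two distant places.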

Writing Optimized Zip Files

In order to optimize file IO on Firefox startup I wanted to make use of OS readahead[1].

Unfortunately, reading a file starting from its end precludes readahead. It is also suboptimal to read files from a zip in random order.

The following creative interpretation of the Zip spec results in optimized zip files:

  1. Since we already do PGO[2] for Firefox builds, I added a ZipArchiveLogger for logging zip entries being accessed to the Firefox profiling stage.

  2. Then during the build phase, I added optimizejars.py[3] to move the Central Directory to the start of the file.

  3. Additionally, optimizejars.py would lay out zip entries in the order specified by the ZipArchiveLogger log.

  4. Wrote down the length of the Central Directory + the entries from step 3 in the first 4 bytes of the file.

The zip file now looks like: | 4 bytes | Central Directory | Hot Files | Cold Files | End Of Central Directory Marker |.

Thus we have a sequential-read-friendly zip file that can still be read by zip tools that follow the spec.
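
To make the layout concrete, here is a rough Python sketch of the rewrite. It is not optimizejars.py: the function name `optimize_zip` is hypothetical, the length prefix is assumed to be a little-endian 32-bit count of bytes from the start of the file through the last hot entry, and the sketch assumes no ZIP64, no data descriptors and unique entry names.

```python
import struct


def optimize_zip(data, hot_names):
    """Rewrite zip bytes as: 4-byte length | Central Directory | hot | cold | EOCD."""
    eocd = data.rfind(b"PK\x05\x06")
    total, cd_size, cd_offset = struct.unpack_from("<HII", data, eocd + 10)

    # Collect each Central Directory record together with its Local Header + contents.
    records = {}
    pos = cd_offset
    for _ in range(total):
        comp_size, = struct.unpack_from("<I", data, pos + 20)
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
        local_offset, = struct.unpack_from("<I", data, pos + 42)
        rec_len = 46 + name_len + extra_len + comment_len
        name = data[pos + 46:pos + 46 + name_len].decode("utf-8")
        lname_len, lextra_len = struct.unpack_from("<HH", data, local_offset + 26)
        entry_len = 30 + lname_len + lextra_len + comp_size
        records[name] = (data[pos:pos + rec_len],
                         data[local_offset:local_offset + entry_len])
        pos += rec_len

    # Hot entries come first, in logged order; everything else keeps its old order.
    hot = [n for n in dict.fromkeys(hot_names) if n in records]
    cold = [n for n in records if n not in hot]

    cd_len = sum(len(rec) for rec, _ in records.values())
    offset = 4 + cd_len            # first entry starts right after the directory
    readahead = offset             # assumed meaning of the 4-byte prefix
    new_cd, body = b"", b""
    for name in hot + cold:
        rec, entry = records[name]
        # Patch the Local Header offset (bytes 42..45 of the record).
        new_cd += rec[:42] + struct.pack("<I", offset) + rec[46:]
        body += entry
        offset += len(entry)
        if name in hot:
            readahead = offset

    # EOCD stays at the end but now points at a Central Directory at offset 4.
    new_eocd = data[eocd:eocd + 12] + struct.pack("<II", cd_len, 4) + data[eocd + 20:]
    return struct.pack("<I", readahead) + new_cd + body + new_eocd
```

Here `hot_names` plays the role of the ZipArchiveLogger log: a list of entry names in the order they were accessed during profiling.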

Reading Optimized Zip Files

  1. Speculatively check if we can find the Central Directory signature 4 bytes in.
  2. If step 1 succeeded, assume the first 4 bytes are our read-ahead length. Issue a platform-specific read-ahead call.

So for all cases where the zip file access pattern matches the one recorded during the profiling phase, Firefox can read the relevant resources in a single IO!
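
The two reading steps could look roughly like this in Python. The helper names are invented for this sketch, the length prefix is assumed to be little-endian, and `os.posix_fadvise` stands in for the "platform-specific read-ahead call" (it is only available on some Unix systems, so the sketch skips the hint elsewhere).

```python
import os
import struct

CDIR_SIG = b"PK\x01\x02"   # Central Directory record signature


def readahead_length(header):
    """Return the stored read-ahead length, or None for an ordinary zip."""
    # Step 1: is there a Central Directory signature 4 bytes in?
    if len(header) < 8 or header[4:8] != CDIR_SIG:
        return None
    # Step 2: treat the first 4 bytes as the read-ahead length (assumed little-endian).
    return struct.unpack_from("<I", header, 0)[0]


def open_with_readahead(path):
    """Open `path` and, if it looks optimized, hint the OS to prefetch the hot region."""
    f = open(path, "rb")
    length = readahead_length(f.read(8))
    if length is not None and hasattr(os, "posix_fadvise"):
        try:
            os.posix_fadvise(f.fileno(), 0, length, os.POSIX_FADV_WILLNEED)
        except OSError:
            pass               # the hint is advisory; carry on without it
    f.seek(0)
    return f, length
```

An ordinary zip starts with a Local Header signature, so the check in step 1 fails cheaply and the reader can fall back to the usual scan from the end of the file.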

Further Zip Trivia





  1. https://www.kernel.org/doc/ols/2007/ols2007v2-pages-273-284.pdf

  2. https://en.wikipedia.org/wiki/Profile-guided_optimization

  3. https://github.com/humphd/mozilla-central-old/blob/9d4d9f265e24e6358c067ae1e300c1ce3227a91d/config/optimizejars.py. This code later got subsumed into mozpack.
