Firefox's Optimized Zip Format: Reading Zip Files Really Quickly

November 22, 2021 · 3 min · Taras Glek

This post is about minimizing the amount of disk IO and CPU overhead when reading Zip files.

I recently saw an article about a new format that was faster than zip.

This was quite surprising: to my mind, zip is one of the most flexible and low-overhead formats I’ve encountered.

Some googling showed me that over the past 11 years people have noticed that Firefox uses optimized zip files. This inspired me to document the thinking behind the optimized zip format I implemented in Firefox in pre-pandemic 2010. I had a lot of fun writing this code and was surprised that I had failed to blog about it.

Zip format

The following diagram is borrowed from a codeproject article. Wikipedia and the ZIP specification are also helpful.

[Figure: zip structure]

Zip files seem to be designed for cheap appends. For every file inside a zip, there is a Local Header followed by the optionally compressed file contents.

These are followed by a Central Directory, which acts as an index for the zip contents.

To extract a file from a zip file one must:

1. Scan backwards through the zip file for the End of Central Directory marker.
2. Read the offset to the beginning of the Central Directory.
3. Find the relevant Local Header offset in the Central Directory index.
4. Read and optionally decompress the stored file.
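
The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than Firefox's reader: the function name `extract` is hypothetical, the whole archive is assumed to be in memory as `bytes`, and Zip64, encryption and archive comments that happen to contain the marker bytes are ignored. The field offsets follow the ZIP specification.

```python
import struct
import zlib

EOCD_SIG = b"PK\x05\x06"     # End of Central Directory
CENTRAL_SIG = b"PK\x01\x02"  # Central Directory record
LOCAL_SIG = b"PK\x03\x04"    # Local Header


def extract(blob: bytes, wanted: str) -> bytes:
    """Return the contents of entry `wanted` from the zip held in `blob`."""
    # 1. Scan backwards for the End of Central Directory marker.
    eocd = blob.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no End of Central Directory record")

    # 2. Read the entry count plus the size and offset of the Central Directory.
    total, _cd_size, cd_offset = struct.unpack_from("<HII", blob, eocd + 10)

    # 3. Walk the Central Directory until the wanted name turns up.
    target = wanted.encode("utf-8")
    pos = cd_offset
    for _ in range(total):
        if blob[pos:pos + 4] != CENTRAL_SIG:
            raise ValueError("bad Central Directory record")
        (method,) = struct.unpack_from("<H", blob, pos + 10)
        (csize,) = struct.unpack_from("<I", blob, pos + 20)
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", blob, pos + 28)
        (local_offset,) = struct.unpack_from("<I", blob, pos + 42)
        if blob[pos + 46:pos + 46 + name_len] == target:
            break
        pos += 46 + name_len + extra_len + comment_len
    else:
        raise KeyError(wanted)

    # 4. Skip the Local Header, then read and optionally decompress the data.
    if blob[local_offset:local_offset + 4] != LOCAL_SIG:
        raise ValueError("bad Local Header")
    lname_len, lextra_len = struct.unpack_from("<HH", blob, local_offset + 26)
    start = local_offset + 30 + lname_len + lextra_len
    raw = blob[start:start + csize]
    if method == 0:   # stored
        return raw
    if method == 8:   # deflate, stored as a raw stream without a zlib header
        return zlib.decompress(raw, -15)
    raise NotImplementedError(f"compression method {method}")
```

Note that the very first access lands at the end of the file and the next ones jump back towards the start, which is the access pattern the next section tries to avoid.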

Writing Optimized Zip Files

In order to optimize file IO on Firefox startup I wanted to make use of OS readahead¹.

Unfortunately, reading a file starting from its end precludes readahead. It is also suboptimal to read files from a zip in random order.

The following creative interpretation of the Zip spec results in optimized zip files:

1. Since we already do PGO² for Firefox builds, I added a ZipArchiveLogger to the Firefox profiling stage to log the zip entries being accessed.
2. Then, during the build phase, I added optimizejars.py³ to move the Central Directory to the start of the file.
3. Additionally, optimizejars.py would lay out zip entries in the order specified by the ZipArchiveLogger log.
4. Finally, it wrote down the combined length of the Central Directory and the entries from step 3 in the first 4 bytes of the file.

Thus we have a sequential-read-friendly zip file that can still be read by zip tools that follow the spec.
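
To make the layout concrete, here is a rough Python sketch of such a writer. It is not optimizejars.py: the function `write_optimized_zip` and its `entries` and `access_log` parameters are hypothetical, entries are stored uncompressed with ASCII names to keep it short, and two details are assumptions of mine: the length prefix is little-endian, and it is counted from byte 0 up to the end of the last profiled entry.

```python
import struct
import zlib

LOCAL_SIG = b"PK\x03\x04"
CENTRAL_SIG = b"PK\x01\x02"
EOCD_SIG = b"PK\x05\x06"
DOS_DATE = 0x21  # 1980-01-01, the earliest date the format can express


def write_optimized_zip(entries, access_log):
    """Lay out `entries` (name -> bytes) with the Central Directory first.

    `access_log` lists entry names in the order the profiling run read them.
    """
    # Profiled entries come first, in log order; the rest keep their order.
    logged = [n for n in dict.fromkeys(access_log) if n in entries]
    seen = set(logged)
    order = logged + [n for n in entries if n not in seen]

    encoded = {n: n.encode("utf-8") for n in order}
    # A Central Directory record is 46 bytes plus the name, so the size of
    # the whole directory is known before any entry offset is.
    cd_size = sum(46 + len(encoded[n]) for n in order)
    offset = 4 + cd_size  # 4-byte length prefix, then the Central Directory

    central, local = [], []
    readahead = offset
    for n in order:
        name, data = encoded[n], entries[n]
        crc, size = zlib.crc32(data), len(data)
        central.append(struct.pack(
            "<4sHHHHHHIIIHHHHHII", CENTRAL_SIG, 20, 20, 0, 0, 0, DOS_DATE,
            crc, size, size, len(name), 0, 0, 0, 0, 0, offset) + name)
        local.append(struct.pack(
            "<4sHHHHHIIIHH", LOCAL_SIG, 20, 0, 0, 0, DOS_DATE,
            crc, size, size, len(name), 0) + name + data)
        offset += 30 + len(name) + size
        if n in seen:
            readahead = offset  # end of the last profiled entry

    # The End of Central Directory record stays last and points at offset 4.
    eocd = struct.pack("<4sHHHHIIH", EOCD_SIG, 0, 0,
                       len(order), len(order), cd_size, 4, 0)
    return (struct.pack("<I", readahead) + b"".join(central)
            + b"".join(local) + eocd)
```

Because the End of Central Directory record still sits at the end and carries the real directory offset, a reader that follows that offset finds the index as usual, while a reader that knows the trick never needs to look at the end of the file.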

Reading Optimized Zip Files

1. Speculatively check if we can find the Central Directory signature 4 bytes in.
2. If step 1 succeeded, assume the first 4 bytes are our read-ahead length. Issue a platform-specific read-ahead call.

So for all cases where the zip file access pattern matches the one recorded during the profiling phase, Firefox can read the relevant resources in a single IO!
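
The speculative check is small enough to sketch in Python. The helper names `readahead_length` and `open_with_readahead` are hypothetical, the little-endian length is an assumption, and `os.posix_fadvise` stands in for the platform-specific call (it exists only on some Unix systems, hence the guard); Firefox's own code is C++ and differs per platform.

```python
import os
import struct

CENTRAL_SIG = b"PK\x01\x02"  # Central Directory record signature


def readahead_length(header: bytes):
    """Return the read-ahead length if `header` starts an optimized zip.

    `header` is the first 8 bytes of the file. An ordinary zip starts with a
    Local Header, so the signature check fails and None is returned.
    """
    if len(header) >= 8 and header[4:8] == CENTRAL_SIG:
        (length,) = struct.unpack("<I", header[:4])
        return length
    return None


def open_with_readahead(path):
    """Open `path` and hint the OS to prefetch the profiled region."""
    fd = os.open(path, os.O_RDONLY)
    try:
        length = readahead_length(os.read(fd, 8))
        if length is not None and hasattr(os, "posix_fadvise"):
            # Ask the kernel to start loading bytes [0, length) right away.
            os.posix_fadvise(fd, 0, length, os.POSIX_FADV_WILLNEED)
    except BaseException:
        os.close(fd)
        raise
    return fd, length
```

If the check fails, nothing is lost: the file is simply treated as a regular zip and read the slow way, from the End of Central Directory backwards.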

Further Zip Trivia

https://bugzilla.mozilla.org/show_bug.cgi?id=701875 renamed .jar files to .ja so that Microsoft System Restore wouldn’t corrupt Firefox.
At the time, the optimized jar change broke antivirus scanners, which further sped up Firefox startup :)
The zip reader in Firefox uses mmap, so zip files can be nested with 0 overhead (as long as the inner zip is stored without compression). This also broke OS/2 support, as OS/2 did not have a concept of memory mapping.
This work would not have worked out without massive amounts of help and follow-on work from Mike Hommey, Michael Wu and Robert Kaiser.
