The **D**eduplicating **W**arp-speed **A**dvanced **R**ead-only **F**ile **S**ystem.

A fast high compression read-only file system for Linux, Windows and macOS.
- Overview
- History
- Building and Installing
- Usage
- Using the Libraries
- Windows Support
- macOS Support
- Use Cases
- Dealing with Bit Rot
- Extended Attributes
- Comparison
- Performance Monitoring
- Other Obscure Features
- Stargazers over Time
DwarFS is a read-only file system with a focus on achieving *very* high compression ratios, in particular for very redundant data.

This probably doesn't sound very exciting, because if it's redundant, it *should* compress well. However, I found that other read-only, compressed file systems don't do a very good job at making use of this redundancy. See the Comparison section below for a comparison with other compressed file systems.

DwarFS also doesn't compromise on speed and for my use cases I've found it to be on par with or perform better than SquashFS. For my primary use case, DwarFS compression is an order of magnitude better than SquashFS compression, it's 6 times faster to build the file system, it's typically faster to access files on DwarFS and it uses less CPU resources.

To give you an idea of what DwarFS is capable of, here's a quick comparison of DwarFS and SquashFS on a set of video files with a total size of 39 GiB. The twist is that each unique video file has two sibling files with a different set of audio streams (this is an actual use case). So there's redundancy in both the video and audio data, but as the streams are interleaved and identical blocks are typically very far apart, it's challenging to make use of that redundancy for compression. SquashFS essentially fails to compress the source data at all, whereas DwarFS is able to reduce the size by almost a factor of 3, which is close to the theoretical maximum:
```
$ du -hs dwarfs-video-test
39G     dwarfs-video-test
$ ls -lh dwarfs-video-test.*fs
-rw-r--r-- 1 mhx users 14G Jul  2 13:01 dwarfs-video-test.dwarfs
-rw-r--r-- 1 mhx users 39G Jul 12 09:41 dwarfs-video-test.squashfs
```
Furthermore, when mounting the SquashFS image and performing a random-read throughput test using fio-3.34, both `squashfuse` and `squashfuse_ll` top out at around 230 MiB/s:

```
$ fio --readonly --rw=randread --name=randread --bs=64k --direct=1 \
      --opendir=mnt --numjobs=4 --ioengine=libaio --iodepth=32 \
      --group_reporting --runtime=60 --time_based
[...]
   READ: bw=230MiB/s (241MB/s), 230MiB/s-230MiB/s (241MB/s-241MB/s), io=13.5GiB (14.5GB), run=60004-60004msec
```
In comparison, DwarFS manages to sustain *random read* rates of 20 GiB/s:

```
   READ: bw=20.2GiB/s (21.7GB/s), 20.2GiB/s-20.2GiB/s (21.7GB/s-21.7GB/s), io=1212GiB (1301GB), run=60001-60001msec
```
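If you want to reproduce this kind of test yourself, you can mount the image and point `fio` at the mount point. This is just a sketch that combines the mount options and the exact `fio` command used elsewhere in this README:

```
$ dwarfs dwarfs-video-test.dwarfs mnt -o cachesize=1g -o workers=4
$ fio --readonly --rw=randread --name=randread --bs=64k --direct=1 \
      --opendir=mnt --numjobs=4 --ioengine=libaio --iodepth=32 \
      --group_reporting --runtime=60 --time_based
```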
Distinct features of DwarFS are:
- Clustering of files by similarity using a similarity hash function. This makes it easier to exploit the redundancy across file boundaries.

- Segmentation analysis across file system blocks in order to reduce the size of the uncompressed file system. This saves memory when using the compressed file system and thus potentially allows for higher cache hit rates as more data can be kept in the cache.

- Categorization framework to categorize files or even fragments of files and then process individual categories differently. For example, this allows you to not waste time trying to compress incompressible files or to compress PCM audio data using FLAC compression (see the example right after this list).

- Highly multi-threaded implementation. Both the file system creation tool and the FUSE driver are able to make good use of the many cores of your system.
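Enabling the categorizer is a single flag; the same flag is used in the astrophotography example later in this README (the input and output names here are just placeholders):

```
$ mkdwarfs -i input-dir -o output.dwarfs --categorize
```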
I started working on DwarFS in 2013 and my main use case and major motivation was that I had several hundred different versions of Perl that were taking up something around 30 gigabytes of disk space, and I was unwilling to spend more than 10% of my hard drive keeping them around for when I happened to need them.

Up until then, I had been using Cromfs for squeezing them into a manageable size. However, I was getting more and more annoyed by the time it took to build the filesystem image and, to make things worse, more often than not it was crashing after about an hour or so.

I had obviously also looked into SquashFS, but never got anywhere close to the compression rates of Cromfs.

This alone wouldn't have been enough to get me into writing DwarFS, but at around the same time, I was pretty obsessed with the recent developments and features of newer C++ standards and really wanted a C++ hobby project to work on. Also, I've wanted to do something with FUSE for quite some time. Last but not least, I had been thinking about the problem of compressed file systems for a bit and had some ideas that I definitely wanted to try.

The majority of the code was written in 2013, then I did a couple of cleanups, bugfixes and refactors every once in a while, but I never really got it to a state where I would feel happy releasing it. It was too awkward to build with its dependency on Facebook's (quite awesome) folly library and it didn't have any documentation.

Digging out the project again this year, things didn't look as grim as they used to. Folly now builds with CMake and so I just pulled it in as a submodule. Most other dependencies can be satisfied from packages that should be widely available. And I've written some rudimentary docs as well.
DwarFS should usually build fine with minimal changes out of the box. If it doesn't, please file an issue. I've set up CI jobs using Docker images for Ubuntu (22.04 and 24.04), Fedora Rawhide and Arch that can help with determining an up-to-date set of dependencies. Note that building from the release tarball requires fewer dependencies than building from the git repository; notably, the `ronn` tool as well as Python and the `mistletoe` Python module are not required when building from the release tarball.
There are some things to be aware of:
- There's a tendency to try and unbundle the folly and fbthrift libraries that are included as submodules and are built along with DwarFS. While I agree with the sentiment, it's unfortunately a bad idea. Besides the fact that folly does not make any claims about ABI stability (i.e. you can't just dynamically link a binary built against one version of folly against another version), it's not even possible to safely link against a folly library built with different compile options. Even subtle differences, such as the C++ standard version, can cause run-time errors. See this issue for details. Currently, it is not even possible to use external versions of folly/fbthrift as DwarFS is building minimal subsets of both libraries; these are bundled in the `dwarfs_common` library and they are strictly used internally, i.e. none of the folly or fbthrift headers are required to build against DwarFS' libraries.

- Similar issues can arise when using a system-installed version of GoogleTest. GoogleTest itself recommends that it be downloaded as part of the build. However, you can use the system-installed version by passing `-DPREFER_SYSTEM_GTEST=ON` to the `cmake` call. Use at your own risk.

- For other bundled libraries (namely `fmt`, `parallel-hashmap`, `range-v3`), the system-installed version is used as long as it meets the minimum required version. Otherwise, the preferred version is fetched during the build.
Each release has pre-built, statically linked binaries for `Linux-x86_64`, `Linux-aarch64` and `Windows-AMD64` available for download. These should run without any dependencies and can be useful especially on older distributions where you can't easily build the tools from source.

In addition to the binary tarballs, there's a *universal binary* available for each architecture. These universal binaries contain all tools (`mkdwarfs`, `dwarfsck`, `dwarfsextract` and the `dwarfs` FUSE driver) in a single executable. These executables are compressed using upx, so they are much smaller than the individual tools combined. However, it also means the binaries need to be decompressed each time they are run, which can have a significant overhead. If that is an issue, you can either stick to the "classic" individual binaries or you can decompress the universal binary, e.g.:

```
upx -d dwarfs-universal-0.7.0-Linux-aarch64
```
The universal binaries can be run through symbolic links named after the proper tool, e.g.:

```
$ ln -s dwarfs-universal-0.7.0-Linux-aarch64 mkdwarfs
$ ./mkdwarfs --help
```
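If you want all four tools available this way, a small loop is enough (just a sketch; substitute the universal binary you actually downloaded):

```
$ for tool in mkdwarfs dwarfsck dwarfsextract dwarfs; do ln -s dwarfs-universal-0.7.0-Linux-aarch64 "$tool"; done
```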
This also works on Windows if the file system supports symbolic links:
```
> mklink mkdwarfs.exe dwarfs-universal-0.7.0-Windows-AMD64.exe
> .\mkdwarfs.exe --help
```
Alternatively, you can select the tool by passing `--tool=<name>` as the first argument on the command line:

```
> .\dwarfs-universal-0.7.0-Windows-AMD64.exe --tool=mkdwarfs --help
```
Note that just like the `dwarfs.exe` Windows binary, the universal Windows binary depends on the `winfsp-x64.dll` from the WinFsp project. However, for the universal binary, the DLL is loaded lazily, so you can still use all other tools without the DLL. See the Windows Support section for more details.
DwarFS uses CMake as a build tool.

It uses both Boost and Folly, though the latter is included as a submodule since very few distributions actually offer packages for it. Folly itself has a number of dependencies, so please check here for an up-to-date list.

It also uses Facebook Thrift, in particular the `frozen` library, for storing metadata in a highly space-efficient, memory-mappable and well-defined format. It's also included as a submodule, and we only build the compiler and a very reduced library that contains just enough for DwarFS to work.

Other than that, DwarFS really only depends on FUSE3 and on a set of compression libraries that Folly already depends on (namely lz4, zstd and liblzma).

The dependency on googletest will be automatically resolved if you build with tests.
A good starting point for apt-based systems is probably:
```
$ apt install \
    gcc \
    g++ \
    clang \
    git \
    ccache \
    ninja-build \
    cmake \
    make \
    bison \
    flex \
    fuse3 \
    pkg-config \
    binutils-dev \
    libacl1-dev \
    libarchive-dev \
    libbenchmark-dev \
    libboost-chrono-dev \
    libboost-context-dev \
    libboost-filesystem-dev \
    libboost-iostreams-dev \
    libboost-program-options-dev \
    libboost-regex-dev \
    libboost-system-dev \
    libboost-thread-dev \
    libbrotli-dev \
    libevent-dev \
    libhowardhinnant-date-dev \
    libjemalloc-dev \
    libdouble-conversion-dev \
    libiberty-dev \
    liblz4-dev \
    liblzma-dev \
    libzstd-dev \
    libxxhash-dev \
    libmagic-dev \
    libparallel-hashmap-dev \
    librange-v3-dev \
    libssl-dev \
    libunwind-dev \
    libdwarf-dev \
    libelf-dev \
    libfmt-dev \
    libfuse3-dev \
    libgoogle-glog-dev \
    libutfcpp-dev \
    libflac++-dev \
    nlohmann-json3-dev
```
Note that when building with `gcc`, the optimization level will be set to `-O2` instead of the CMake default of `-O3` for release builds. At least with versions up to `gcc-10`, the `-O3` build is up to 70% slower than a build with `-O2`.
First, unpack the release archive:
```
$ tar xvf dwarfs-x.y.z.tar.xz
$ cd dwarfs-x.y.z
```
Alternatively, you can also clone the git repository, but be aware that this has more dependencies and the build will likely take longer because the release archive ships with most of the auto-generated files that will have to be generated when building from the repository:

```
$ git clone --recurse-submodules https://github.com/mhx/dwarfs
$ cd dwarfs
```
Once all dependencies have been installed, you can build DwarFS using:

```
$ mkdir build
$ cd build
$ cmake .. -GNinja -DWITH_TESTS=ON
$ ninja
```
You can then run tests with:
```
$ ctest -j
```
All binaries use jemalloc as a memory allocator by default, as it typically uses much less system memory compared to the `glibc` or `tcmalloc` allocators. To disable the use of `jemalloc`, pass `-DUSE_JEMALLOC=0` on the `cmake` command line.
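For example, to configure a build without jemalloc, using the same flags as the build shown above plus the documented switch:

```
$ cmake .. -GNinja -DWITH_TESTS=ON -DUSE_JEMALLOC=0
```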
It is also possible to build/install the DwarFS libraries, tools, and FUSE driver independently. This is mostly interesting when packaging DwarFS. Note that the tools and FUSE driver require the libraries to be either built or already installed. To build just the libraries, use:

```
$ cmake .. -GNinja -DWITH_TESTS=ON -DWITH_LIBDWARFS=ON -DWITH_TOOLS=OFF -DWITH_FUSE_DRIVER=OFF
```
Once the libraries are tested and installed, you can build the tools (i.e. `mkdwarfs`, `dwarfsck`, `dwarfsextract`) using:

```
$ cmake .. -GNinja -DWITH_TESTS=ON -DWITH_LIBDWARFS=OFF -DWITH_TOOLS=ON -DWITH_FUSE_DRIVER=OFF
```
To build the FUSE driver, use:
```
$ cmake .. -GNinja -DWITH_TESTS=ON -DWITH_LIBDWARFS=OFF -DWITH_TOOLS=OFF -DWITH_FUSE_DRIVER=ON
```
Installing is as easy as:
```
$ sudo ninja install
```
Though you don't have to install the tools to play with them.
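For example, the FUSE driver can be run straight from the build directory (the image path and mount point here are placeholders):

```
$ ./dwarfs /path/to/image.dwarfs /path/to/mountpoint -f
```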
Attempting to build statically linked binaries is highly discouraged and not officially supported. That being said, here's how to set up an environment where you *might* be able to build static binaries.

This has been tested with `ubuntu-22.04-live-server-amd64.iso`. First, install all the packages listed as dependencies above. Also install:

```
$ apt install ccache ninja libacl1-dev
```
`ccache` and `ninja` are optional, but help with a speedy compile.

Depending on your distribution, you'll need to build and install static versions of some libraries, e.g. `libarchive` and `libmagic` for Ubuntu:

```
$ wget https://github.com/libarchive/libarchive/releases/download/v3.6.2/libarchive-3.6.2.tar.xz
$ tar xf libarchive-3.6.2.tar.xz && cd libarchive-3.6.2
$ ./configure --prefix=/opt/static-libs --without-iconv --without-xml2 --without-expat
$ make && sudo make install
```
```
$ wget ftp://ftp.astron.com/pub/file/file-5.44.tar.gz
$ tar xf file-5.44.tar.gz && cd file-5.44
$ ./configure --prefix=/opt/static-libs --enable-static=yes --enable-shared=no
$ make && make install
```
That's it! Now you can try building static binaries for DwarFS:
```
$ git clone --recurse-submodules https://github.com/mhx/dwarfs
$ cd dwarfs && mkdir build && cd build
$ cmake .. -GNinja -DWITH_TESTS=ON -DSTATIC_BUILD_DO_NOT_USE=ON \
    -DSTATIC_BUILD_EXTRA_PREFIX=/opt/static-libs
$ ninja
$ ninja test
```
Please check out the manual pages for mkdwarfs, dwarfs, dwarfsck and dwarfsextract. You can also access the manual pages using the `--man` option to each binary, e.g.:

```
$ mkdwarfs --man
```
The dwarfs manual page also shows an example for setting up DwarFS with overlayfs in order to create a writable file system mount on top of a read-only DwarFS image.
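For reference, such an overlay setup looks roughly like this; a minimal sketch with placeholder paths, the authoritative example is in the dwarfs manual page:

```
$ mkdir -p /tmp/dwarfs-ro /tmp/overlay/upper /tmp/overlay/work /tmp/merged
$ dwarfs image.dwarfs /tmp/dwarfs-ro
$ sudo mount -t overlay overlay \
    -o lowerdir=/tmp/dwarfs-ro,upperdir=/tmp/overlay/upper,workdir=/tmp/overlay/work \
    /tmp/merged
```

All writes then go to the `upper` directory, while reads fall through to the read-only DwarFS image.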
A description of the DwarFS filesystem format can be found in dwarfs-format.

A high-level overview of the internal operation of `mkdwarfs` is shown in this sequence diagram.
Using the DwarFS libraries should be pretty straightforward if you're using CMake to build your project. For a quick start, have a look at the example code that uses the libraries to print information about a DwarFS image (like `dwarfsck`) or extract it (like `dwarfsextract`).
There are five individual libraries:
- `dwarfs_common` contains the common code required by all the other libraries. The interfaces are defined in `dwarfs/`.

- `dwarfs_reader` contains all code required to read data from a DwarFS image. The interfaces are defined in `dwarfs/reader/`.

- `dwarfs_extractor` contains the code required to extract a DwarFS image using `libarchive`. The interfaces are defined in `dwarfs/utility/filesystem_extractor.h`.

- `dwarfs_writer` contains the code required to create DwarFS images. The interfaces are defined in `dwarfs/writer/`.

- `dwarfs_rewrite` contains the code to re-write DwarFS images. The interfaces are defined in `dwarfs/utility/rewrite_filesystem.h`.
The headers in `internal` subfolders are only accessible at build time and won't be installed. The same goes for the `tool` subfolder.
The reader and extractor APIs should be fairly stable. The writerAPIs are likely going to change. Note, however, that there are noguarantees on API stability before this project reaches version 1.0.0.
Support for the Windows operating system is currently experimental. Having worked pretty much exclusively in a Unix world for the past two decades, my experience with Windows development is rather limited and I'd expect there to definitely be bugs and rough edges in the Windows code.

The Windows version of the DwarFS filesystem driver relies on the awesome WinFsp project and its `winfsp-x64.dll` must be discoverable by the `dwarfs.exe` driver.

The different tools should behave pretty much the same whether you're using them on Linux or Windows. The file system images can be copied between Linux and Windows and images created on one OS should work fine on the other.
There are a few things worth pointing out, though:
- DwarFS supports both hardlinks and symlinks on Windows, just as it does on Linux. However, creating hardlinks and symlinks seems to require admin privileges on Windows, so if you want to e.g. extract a DwarFS image that contains links of some sort, you might run into errors if you don't have the right privileges.

- Due to a problem in WinFsp, symlinks cannot currently point outside of the mounted file system. Furthermore, due to another problem in WinFsp, symlinks with a drive letter will appear with a mangled target path.

- The DwarFS driver on Windows correctly reports hardlink counts via its API, but currently these counts are not correctly propagated to the Windows file system layer. This is presumably due to a problem in WinFsp.

- When mounting a DwarFS image on Windows, the mount point must not exist. This is different from Linux, where the mount point must actually exist. Also, it's possible to mount a DwarFS image as a drive letter, e.g.

  ```
  dwarfs.exe image.dwarfs Z:
  ```

- Filter rules for `mkdwarfs` always require Unix path separators, regardless of whether it's running on Windows or Linux.
Building on Windows is not too complicated thanks to vcpkg. You'll need to install:

`WinFsp` is expected to be installed in `C:\Program Files (x86)\WinFsp`; if it's not, you'll need to set `WINFSP_PATH` when running CMake via `cmake/win.bat`.
Now you need to clone `vcpkg` and `dwarfs`:

```
> cd %HOMEPATH%
> mkdir git
> cd git
> git clone https://github.com/Microsoft/vcpkg.git
> git clone https://github.com/mhx/dwarfs
```
Then, bootstrap `vcpkg`:

```
> .\vcpkg\bootstrap-vcpkg.bat
```
And build DwarFS:
```
> cd dwarfs
> mkdir build
> cd build
> ..\cmake\win.bat
> ninja
```
Once that's done, you should be able to run the tests. Set `CTEST_PARALLEL_LEVEL` according to the number of CPU cores in your machine.

```
> set CTEST_PARALLEL_LEVEL=10
> ninja test
```
The DwarFS libraries and tools (`mkdwarfs`, `dwarfsck`, `dwarfsextract`) are now available from Homebrew:

```
$ brew install dwarfs
$ brew test dwarfs
```
The macOS version of the DwarFS filesystem driver relies on the awesome macFUSE project and is available from gromgit's homebrew-fuse tap:

```
$ brew tap gromgit/homebrew-fuse
$ brew install dwarfs-fuse-mac
```
Astrophotography can generate huge amounts of raw image data. During a single night, it's not unlikely to end up with a few dozen gigabytes of data. With most dedicated astrophotography cameras, this data ends up in the form of FITS images. These are usually uncompressed, don't compress very well with standard compression algorithms, and while there are certain compressed FITS formats, these aren't widely supported.

One of the compression formats (simply called "Rice") compresses reasonably well and is really fast. However, its implementation for compressed FITS has a few drawbacks. The most severe ones are that compression isn't quite as good as it could be for color sensors and for sensors with less than 16 bits of resolution.
DwarFS supports the `ricepp` (Rice++) compression, which builds on the basic idea of Rice compression, but makes a few enhancements: it compresses color and low bit depth images significantly better and always searches for the optimum solution during compression instead of relying on a heuristic.

Let's look at an example using 129 images (darks, flats and lights) taken with an ASI1600MM camera. Each image is 32 MiB, so a total of 4 GiB of data. Compressing these with the standard `fpack` tool takes about 16.6 seconds and yields a total output size of 2.2 GiB:

```
$ time fpack */*.fit */*/*.fit

user	14.992
system	1.592
total	16.616

$ find . -name '*.fz' -print0 | xargs -0 cat | wc -c
2369943360
```
However, this leaves you with `*.fz` files that not every application can actually read.
Using DwarFS, here's what we get:
```
$ mkdwarfs -i ASI1600 -o asi1600-20.dwarfs -S 20 --categorize
I 08:47:47.459077 scanning "ASI1600"
I 08:47:47.491492 assigning directory and link inodes...
I 08:47:47.491560 waiting for background scanners...
I 08:47:47.675241 scanning CPU time: 1.051s
I 08:47:47.675271 finalizing file inodes...
I 08:47:47.675330 saved 0 B / 3.941 GiB in 0/258 duplicate files
I 08:47:47.675360 assigning device inodes...
I 08:47:47.675371 assigning pipe/socket inodes...
I 08:47:47.675381 building metadata...
I 08:47:47.675393 building blocks...
I 08:47:47.675398 saving names and symlinks...
I 08:47:47.675514 updating name and link indices...
I 08:47:47.675796 waiting for segmenting/blockifying to finish...
I 08:47:50.274285 total ordering CPU time: 616.3us
I 08:47:50.274329 total segmenting CPU time: 1.132s
I 08:47:50.279476 saving chunks...
I 08:47:50.279622 saving directories...
I 08:47:50.279674 saving shared files table...
I 08:47:50.280745 saving names table... [1.047ms]
I 08:47:50.280768 saving symlinks table... [743ns]
I 08:47:50.282031 waiting for compression to finish...
I 08:47:50.823924 compressed 3.941 GiB to 1.201 GiB (ratio=0.304825)
I 08:47:50.824280 compression CPU time: 17.92s
I 08:47:50.824316 filesystem created without errors [3.366s]
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
waiting for block compression to finish
5 dirs, 0/0 soft/hard links, 258/258 files, 0 other
original size: 3.941 GiB, hashed: 315.4 KiB (18 files, 0 B/s)
scanned: 3.941 GiB (258 files, 117.1 GiB/s), categorizing: 0 B/s
saved by deduplication: 0 B (0 files), saved by segmenting: 0 B
filesystem: 3.941 GiB in 4037 blocks (4550 chunks, 516/516 fragments, 258 inodes)
compressed filesystem: 4037 blocks/1.201 GiB written
```
In less than 3.4 seconds, it compresses the data down to 1.2 GiB, almost half the size of the `fpack` output.

In addition to saving a lot of disk space, this can also be useful when your data is stored on a NAS. Here's a comparison of the same set of data accessed over a 1 Gb/s network connection, first using the uncompressed raw data:

```
find /mnt/ASI1600 -name '*.fit' -print0 | xargs -0 -P4 -n1 cat | dd of=/dev/null status=progress
4229012160 bytes (4.2 GB, 3.9 GiB) copied, 36.0455 s, 117 MB/s
```
And next using a DwarFS image on the same share:
```
$ dwarfs /mnt/asi1600-20.dwarfs mnt
$ find mnt -name '*.fit' -print0 | xargs -0 -P4 -n1 cat | dd of=/dev/null status=progress
4229012160 bytes (4.2 GB, 3.9 GiB) copied, 14.3681 s, 294 MB/s
```

That's roughly 2.5 times faster. You can very likely see similar results with slow external hard drives.
Currently, DwarFS has no built-in ability to add recovery information to a file system image. However, for archival purposes, it's a good idea to have such recovery information in order to be able to repair a damaged image.

This is fortunately relatively straightforward using something like par2cmdline:

```
$ par2create -n1 asi1600-20.dwarfs
```
This will create two additional files that you can place alongside the image (or on different storage), as you'll only need them if DwarFS has detected an issue with the file system image. If there's an issue, you can run

```
$ par2repair asi1600-20.dwarfs
```

which will very likely be able to recover the image if less than 5% (that's the default used by `par2create`) of the image is damaged.
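To check for damage without attempting a repair, `par2verify` from the same package can be used, and `dwarfsck` can check the integrity of the DwarFS image itself. The invocation below is just a sketch that mirrors the style of the commands above:

```
$ par2verify asi1600-20.dwarfs
```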
Extended attributes are not currently supported. Any extended attributes stored in the source file system will not currently be preserved when building a DwarFS image using `mkdwarfs`.

That being said, the root inode of a mounted DwarFS image currently exposes one or two extended attributes on Linux:

```
$ attr -l mnt
Attribute "dwarfs.driver.pid" has a 4 byte value for mnt
Attribute "dwarfs.driver.perfmon" has a 4849 byte value for mnt
```
The `dwarfs.driver.pid` attribute simply contains the PID of the DwarFS FUSE driver. The `dwarfs.driver.perfmon` attribute contains the current results of the performance monitor.

Furthermore, each regular file exposes an attribute `dwarfs.inodeinfo` with information about the underlying inode:

```
$ attr -l "05 Disappear.caf"
Attribute "dwarfs.inodeinfo" has a 448 byte value for 05 Disappear.caf
```
The attribute contains a JSON object with information about the underlying inode:

```
$ attr -qg dwarfs.inodeinfo "05 Disappear.caf"
{
  "chunks": [
    {
      "block": 2,
      "category": "pcmaudio/metadata",
      "offset": 270976,
      "size": 4096
    },
    {
      "block": 414,
      "category": "pcmaudio/waveform",
      "offset": 37594368,
      "size": 29514492
    },
    {
      "block": 419,
      "category": "pcmaudio/waveform",
      "offset": 0,
      "size": 29385468
    }
  ],
  "gid": 100,
  "mode": 33188,
  "modestring": "----rw-r--r--",
  "uid": 1000
}
```

This is useful, for example, to check how a particular file is spread across multiple blocks or which categories have been assigned to the file.
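Since the value is plain JSON, it's easy to post-process. For example, to list just the categories used by the file from the example above (assuming `jq` is installed):

```
$ attr -qg dwarfs.inodeinfo "05 Disappear.caf" | jq -r '.chunks[].category' | sort -u
pcmaudio/metadata
pcmaudio/waveform
```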
The SquashFS, `xz`, `lrzip`, `zpaq` and `wimlib` tests were all done on an 8 core Intel(R) Xeon(R) E-2286M CPU @ 2.40GHz with 64 GiB of RAM.

The Cromfs tests were done with an older version of DwarFS on a 6 core Intel(R) Xeon(R) CPU D-1528 @ 1.90GHz with 64 GiB of RAM.

The EROFS tests were done using DwarFS v0.9.8 and EROFS v1.7.1 on an Intel(R) Core(TM) i9-13900K with 64 GiB of RAM.

The systems were mostly idle during all of the tests.

The source directory contained 1139 different Perl installations from 284 distinct releases, a total of 47.65 GiB of data in 1,927,501 files and 330,733 directories. The source directory was freshly unpacked from a tar archive to an XFS partition on a 970 EVO Plus 2TB NVME drive, so most of its contents were likely cached.

I'm using the same compression type and compression level for SquashFS that is the default setting for DwarFS:
```
$ time mksquashfs install perl-install.squashfs -comp zstd -Xcompression-level 22
Parallel mksquashfs: Using 16 processors
Creating 4.0 filesystem on perl-install-zstd.squashfs, block size 131072.
[=========================================================/] 2107401/2107401 100%
Exportable Squashfs 4.0 filesystem, zstd compressed, data block size 131072
        compressed data, compressed metadata, compressed fragments,
        compressed xattrs, compressed ids
        duplicates are removed
Filesystem size 4637597.63 Kbytes (4528.90 Mbytes)
        9.29% of uncompressed filesystem size (49922299.04 Kbytes)
Inode table size 19100802 bytes (18653.13 Kbytes)
        26.06% of uncompressed inode table size (73307702 bytes)
Directory table size 19128340 bytes (18680.02 Kbytes)
        46.28% of uncompressed directory table size (41335540 bytes)
Number of duplicate files found 1780387
Number of inodes 2255794
Number of files 1925061
Number of fragments 28713
Number of symbolic links 0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 330733
Number of ids (unique uids + gids) 2
Number of uids 1
        mhx (1000)
Number of gids 1
        users (100)

real    32m54.713s
user    501m46.382s
sys     0m58.528s
```
For DwarFS, I'm sticking to the defaults:
```
$ time mkdwarfs -i install -o perl-install.dwarfs
I 11:33:33.310931 scanning install
I 11:33:39.026712 waiting for background scanners...
I 11:33:50.681305 assigning directory and link inodes...
I 11:33:50.888441 finding duplicate files...
I 11:34:01.120800 saved 28.2 GiB / 47.65 GiB in 1782826/1927501 duplicate files
I 11:34:01.122608 waiting for inode scanners...
I 11:34:12.839065 assigning device inodes...
I 11:34:12.875520 assigning pipe/socket inodes...
I 11:34:12.910431 building metadata...
I 11:34:12.910524 building blocks...
I 11:34:12.910594 saving names and links...
I 11:34:12.910691 bloom filter size: 32 KiB
I 11:34:12.910760 ordering 144675 inodes using nilsimsa similarity...
I 11:34:12.915555 nilsimsa: depth=20000 (1000), limit=255
I 11:34:13.052525 updating name and link indices...
I 11:34:13.276233 pre-sorted index (660176 name, 366179 path lookups) [360.6ms]
I 11:35:44.039375 144675 inodes ordered [91.13s]
I 11:35:44.041427 waiting for segmenting/blockifying to finish...
I 11:37:38.823902 bloom filter reject rate: 96.017% (TPR=0.244%, lookups=4740563665)
I 11:37:38.823963 segmentation matches: good=454708, bad=6819, total=464247
I 11:37:38.824005 segmentation collisions: L1=0.008%, L2=0.000% [2233254 hashes]
I 11:37:38.824038 saving chunks...
I 11:37:38.860939 saving directories...
I 11:37:41.318747 waiting for compression to finish...
I 11:38:56.046809 compressed 47.65 GiB to 430.9 MiB (ratio=0.00883101)
I 11:38:56.304922 filesystem created without errors [323s]
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
waiting for block compression to finish
330733 dirs, 0/2440 soft/hard links, 1927501/1927501 files, 0 other
original size: 47.65 GiB, dedupe: 28.2 GiB (1782826 files), segment: 15.19 GiB
filesystem: 4.261 GiB in 273 blocks (319178 chunks, 144675/144675 inodes)
compressed filesystem: 273 blocks/430.9 MiB written [depth: 20000]
█████████████████████████████████████████████████████████████████████████████▏100% |

real    5m23.030s
user    78m7.554s
sys     1m47.968s
```
So in this comparison, `mkdwarfs` is more than 6 times faster than `mksquashfs`, both in terms of CPU time and wall clock time.

```
$ ll perl-install.*fs
-rw-r--r-- 1 mhx users  447230618 Mar  3 20:28 perl-install.dwarfs
-rw-r--r-- 1 mhx users 4748902400 Mar  3 20:10 perl-install.squashfs
```

In terms of compression ratio, the DwarFS file system is more than 10 times smaller than the SquashFS file system. With DwarFS, the content has been compressed down to less than 0.9% (!) of its original size. This compression ratio only considers the data stored in the individual files, not the actual disk space used. On the original XFS file system, according to `du`, the source folder uses 52 GiB, so the DwarFS image actually only uses 0.8% of the original space.
Here's another comparison using `lzma` compression instead of `zstd`:

```
$ time mksquashfs install perl-install-lzma.squashfs -comp lzma

real    13m42.825s
user    205m40.851s
sys     3m29.088s
```

```
$ time mkdwarfs -i install -o perl-install-lzma.dwarfs -l9

real    3m43.937s
user    49m45.295s
sys     1m44.550s
```

```
$ ll perl-install-lzma.*fs
-rw-r--r-- 1 mhx users  315482627 Mar  3 21:23 perl-install-lzma.dwarfs
-rw-r--r-- 1 mhx users 3838406656 Mar  3 20:50 perl-install-lzma.squashfs
```
It's immediately obvious that the runs are significantly faster and the resulting images are significantly smaller. Still, `mkdwarfs` is about 4 times faster and produces an image that's 12 times smaller than the SquashFS image. The DwarFS image is only 0.6% of the original file size.
So, why not use `lzma` instead of `zstd` by default? The reason is that `lzma` is about an order of magnitude slower to decompress than `zstd`. If you're only accessing data on your compressed filesystem occasionally, this might not be a big deal, but if you use it extensively, `zstd` will result in better performance.

The comparisons above are not completely fair. `mksquashfs` by default uses a block size of 128KiB, whereas `mkdwarfs` uses 16MiB blocks by default, or even 64MiB blocks with `-l9`. When using identical block sizes for both file systems, the difference, quite expectedly, becomes a lot less dramatic:

```
$ time mksquashfs install perl-install-lzma-1M.squashfs -comp lzma -b 1M

real    15m43.319s
user    139m24.533s
sys     0m45.132s
```

```
$ time mkdwarfs -i install -o perl-install-lzma-1M.dwarfs -l9 -S20 -B3

real    4m25.973s
user    52m15.100s
sys     7m41.889s
```

```
$ ll perl-install*.*fs
-rw-r--r-- 1 mhx users  935953866 Mar 13 12:12 perl-install-lzma-1M.dwarfs
-rw-r--r-- 1 mhx users 3407474688 Mar  3 21:54 perl-install-lzma-1M.squashfs
```
Even this is *still* not entirely fair, as it uses a feature (`-B3`) that allows DwarFS to reference file chunks from up to two previous filesystem blocks.

But the point is that this is really where SquashFS tops out, as it doesn't support larger block sizes or back-referencing. And as you'll see below, the larger blocks that DwarFS is using by default don't necessarily negatively impact performance.
DwarFS also features an option to recompress an existing file system with a different compression algorithm. This can be useful as it allows relatively fast experimentation with different algorithms and options without requiring a full rebuild of the file system. For example, recompressing the above file system with the best possible compression (`-l9`):

```
$ time mkdwarfs --recompress -i perl-install.dwarfs -o perl-lzma-re.dwarfs -l9
I 20:28:03.246534 filesystem rewritten without errors [148.3s]
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
filesystem: 4.261 GiB in 273 blocks (0 chunks, 0 inodes)
compressed filesystem: 273/273 blocks/372.7 MiB written
████████████████████████████████████████████████████████████████████▏100% \

real    2m28.279s
user    37m8.825s
sys     0m43.256s
```

```
$ ll perl-*.dwarfs
-rw-r--r-- 1 mhx users 447230618 Mar  3 20:28 perl-install.dwarfs
-rw-r--r-- 1 mhx users 390845518 Mar  4 20:28 perl-lzma-re.dwarfs
-rw-r--r-- 1 mhx users 315482627 Mar  3 21:23 perl-install-lzma.dwarfs
```
Note that while the recompressed filesystem is smaller than the original image, it is still a lot bigger than the filesystem we previously built with `-l9`. The reason is that the recompressed image still uses the same block size, and the block size cannot be changed by recompressing.
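Recompression can also switch to a different compression algorithm entirely, using the same `-C` syntax that appears later in this README. Whether this particular combination is useful for your data is something to experiment with; the invocation below is an illustrative sketch rather than a benchmark from this comparison:

```
$ mkdwarfs --recompress -i perl-install.dwarfs -o perl-brotli-re.dwarfs -C brotli:quality=11:lgwin=26
```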
In terms of how fast the file system is when using it, a quick test I've done is to freshly mount the filesystem created above and run each of the 1139 `perl` executables to print their version.

```
$ hyperfine -c "umount mnt" -p "umount mnt; dwarfs perl-install.dwarfs mnt -o cachesize=1g -o workers=4; sleep 1" -P procs 5 20 -D 5 "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'"
Benchmark #1: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P5 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):      1.810 s ±  0.013 s    [User: 1.847 s, System: 0.623 s]
  Range (min … max):    1.788 s …  1.825 s    10 runs

Benchmark #2: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):      1.333 s ±  0.009 s    [User: 1.993 s, System: 0.656 s]
  Range (min … max):    1.321 s …  1.354 s    10 runs

Benchmark #3: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P15 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):      1.181 s ±  0.018 s    [User: 2.086 s, System: 0.712 s]
  Range (min … max):    1.165 s …  1.214 s    10 runs

Benchmark #4: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):      1.149 s ±  0.015 s    [User: 2.128 s, System: 0.781 s]
  Range (min … max):    1.136 s …  1.186 s    10 runs
```
These timings are for *initial* runs on a freshly mounted file system, running 5, 10, 15 and 20 processes in parallel. 1.1 seconds means that it takes only about 1 millisecond per Perl binary.

Following are timings for *subsequent* runs, both on DwarFS (at `mnt`) and the original XFS (at `install`). DwarFS is around 15% slower here:

```
$ hyperfine -P procs 10 20 -D 10 -w1 "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'" "ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'"
Benchmark #1: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):     347.0 ms ±   7.2 ms    [User: 1.755 s, System: 0.452 s]
  Range (min … max):   341.3 ms … 365.2 ms    10 runs

Benchmark #2: ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):     302.5 ms ±   3.3 ms    [User: 1.656 s, System: 0.377 s]
  Range (min … max):   297.1 ms … 308.7 ms    10 runs

Benchmark #3: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):     342.2 ms ±   4.1 ms    [User: 1.766 s, System: 0.451 s]
  Range (min … max):   336.0 ms … 349.7 ms    10 runs

Benchmark #4: ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):     302.0 ms ±   3.0 ms    [User: 1.659 s, System: 0.374 s]
  Range (min … max):   297.0 ms … 305.4 ms    10 runs

Summary
  'ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'' ran
    1.00 ± 0.01 times faster than 'ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null''
    1.13 ± 0.02 times faster than 'ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null''
    1.15 ± 0.03 times faster than 'ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null''
```
Using the lzma-compressed file system, the metrics for *initial* runs look considerably worse (about an order of magnitude):

```
$ hyperfine -c "umount mnt" -p "umount mnt; dwarfs perl-install-lzma.dwarfs mnt -o cachesize=1g -o workers=4; sleep 1" -P procs 5 20 -D 5 "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'"
Benchmark #1: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P5 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):     10.660 s ±  0.057 s    [User: 1.952 s, System: 0.729 s]
  Range (min … max):   10.615 s … 10.811 s    10 runs

Benchmark #2: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):      9.092 s ±  0.021 s    [User: 1.979 s, System: 0.680 s]
  Range (min … max):    9.059 s …  9.126 s    10 runs

Benchmark #3: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P15 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):      9.012 s ±  0.188 s    [User: 2.077 s, System: 0.702 s]
  Range (min … max):    8.839 s …  9.277 s    10 runs

Benchmark #4: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):      9.004 s ±  0.298 s    [User: 2.134 s, System: 0.736 s]
  Range (min … max):    8.611 s …  9.555 s    10 runs
```
So you might want to consider using `zstd` instead of `lzma` if you'd like to optimize for file system performance. It's also the default compression used by `mkdwarfs`.
Now here's a comparison with the SquashFS filesystem:
```
$ hyperfine -c 'sudo umount mnt' -p 'umount mnt; dwarfs perl-install.dwarfs mnt -o cachesize=1g -o workers=4; sleep 1' -n dwarfs-zstd "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '\$0 -v >/dev/null'" -p 'sudo umount mnt; sudo mount -t squashfs perl-install.squashfs mnt; sleep 1' -n squashfs-zstd "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '\$0 -v >/dev/null'"
Benchmark #1: dwarfs-zstd
  Time (mean ± σ):      1.151 s ±  0.015 s    [User: 2.147 s, System: 0.769 s]
  Range (min … max):    1.118 s …  1.174 s    10 runs

Benchmark #2: squashfs-zstd
  Time (mean ± σ):      6.733 s ±  0.007 s    [User: 3.188 s, System: 17.015 s]
  Range (min … max):    6.721 s …  6.743 s    10 runs

Summary
  'dwarfs-zstd' ran
    5.85 ± 0.08 times faster than 'squashfs-zstd'
```
So, DwarFS is almost six times faster than SquashFS. But what's more, SquashFS also uses significantly more CPU power. However, the numbers shown above for DwarFS obviously don't include the time spent in the `dwarfs` process, so I repeated the test outside of hyperfine:

```
$ time dwarfs perl-install.dwarfs mnt -o cachesize=1g -o workers=4 -f

real    0m4.569s
user    0m2.154s
sys     0m1.846s
```

So, in total, DwarFS was using 5.7 seconds of CPU time, whereas SquashFS was using 20.2 seconds, almost four times as much. Ignore the 'real' time; this is only how long it took me to unmount the file system again after mounting it.
Another real-life test was to build and test a Perl module with 624 different Perl versions in the compressed file system. The module I've used, Tie::Hash::Indexed, has an XS component that requires a C compiler to build. So this really accesses a lot of different stuff in the file system:

- The `perl` executables and their shared libraries

- The Perl modules used for writing the Makefile

- Perl's C header files used for building the module

- More Perl modules used for running the tests
I wrote a little script to be able to run multiple builds in parallel:
```
#!/bin/bash
set -eu
perl=$1
dir=$(echo "$perl" | cut -d/ --output-delimiter=- -f5,6)
rsync -a Tie-Hash-Indexed/ $dir/
cd $dir
$1 Makefile.PL >/dev/null 2>&1
make test >/dev/null 2>&1
cd ..
rm -rf $dir
echo $perl
```
The following command will run up to 16 builds in parallel on the 8 core Xeon CPU, including debug, optimized and threaded versions of all Perl releases between 5.10.0 and 5.33.3, a total of 624 `perl` installations:

```
$ time ls -1 /tmp/perl/install/*/perl-5.??.?/bin/perl5* | sort -t / -k 8 | xargs -d $'\n' -P 16 -n 1 ./build.sh
```

Tests were done with a cleanly mounted file system to make sure the caches were empty. `ccache` was primed to make sure all compiler runs could be satisfied from the cache. With SquashFS, the timing was:

```
real    0m52.385s
user    8m10.333s
sys     4m10.056s
```
And with DwarFS:
```
real    0m50.469s
user    9m22.597s
sys     1m18.469s
```
So, frankly, not much of a difference, with DwarFS being just a bit faster. The `dwarfs` process itself used:

```
real    0m56.686s
user    0m18.857s
sys     0m21.058s
```

So again, DwarFS used less raw CPU power overall, but in terms of wall clock time, the difference is really marginal.
This test uses slightly less pathological input data: the root filesystem of a recent Raspberry Pi OS release. This file system also contains device inodes, so in order to preserve those, we pass `--with-devices` to `mkdwarfs`:

```
$ time sudo mkdwarfs -i raspbian -o raspbian.dwarfs --with-devices
I 21:30:29.812562 scanning raspbian
I 21:30:29.908984 waiting for background scanners...
I 21:30:30.217446 assigning directory and link inodes...
I 21:30:30.221941 finding duplicate files...
I 21:30:30.288099 saved 31.05 MiB / 1007 MiB in 1617/34582 duplicate files
I 21:30:30.288143 waiting for inode scanners...
I 21:30:31.393710 assigning device inodes...
I 21:30:31.394481 assigning pipe/socket inodes...
I 21:30:31.395196 building metadata...
I 21:30:31.395230 building blocks...
I 21:30:31.395291 saving names and links...
I 21:30:31.395374 ordering 32965 inodes using nilsimsa similarity...
I 21:30:31.396254 nilsimsa: depth=20000 (1000), limit=255
I 21:30:31.407967 pre-sorted index (46431 name, 2206 path lookups) [11.66ms]
I 21:30:31.410089 updating name and link indices...
I 21:30:38.178505 32965 inodes ordered [6.783s]
I 21:30:38.179417 waiting for segmenting/blockifying to finish...
I 21:31:06.248304 saving chunks...
I 21:31:06.251998 saving directories...
I 21:31:06.402559 waiting for compression to finish...
I 21:31:16.425563 compressed 1007 MiB to 287 MiB (ratio=0.285036)
I 21:31:16.464772 filesystem created without errors [46.65s]
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
waiting for block compression to finish
4435 dirs, 5908/0 soft/hard links, 34582/34582 files, 7 other
original size: 1007 MiB, dedupe: 31.05 MiB (1617 files), segment: 47.23 MiB
filesystem: 928.4 MiB in 59 blocks (38890 chunks, 32965/32965 inodes)
compressed filesystem: 59 blocks/287 MiB written [depth: 20000]
████████████████████████████████████████████████████████████████████▏100% |

real    0m46.711s
user    10m39.038s
sys     0m8.123s
```
Again, SquashFS uses the same compression options:
```
$ time sudo mksquashfs raspbian raspbian.squashfs -comp zstd -Xcompression-level 22
Parallel mksquashfs: Using 16 processors
Creating 4.0 filesystem on raspbian.squashfs, block size 131072.
[===============================================================\] 39232/39232 100%
Exportable Squashfs 4.0 filesystem, zstd compressed, data block size 131072
        compressed data, compressed metadata, compressed fragments,
        compressed xattrs, compressed ids
        duplicates are removed
Filesystem size 371934.50 Kbytes (363.22 Mbytes)
        35.98% of uncompressed filesystem size (1033650.60 Kbytes)
Inode table size 399913 bytes (390.54 Kbytes)
        26.53% of uncompressed inode table size (1507581 bytes)
Directory table size 408749 bytes (399.17 Kbytes)
        42.31% of uncompressed directory table size (966174 bytes)
Number of duplicate files found 1618
Number of inodes 44932
Number of files 34582
Number of fragments 3290
Number of symbolic links 5908
Number of device nodes 7
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 4435
Number of ids (unique uids + gids) 18
Number of uids 5
        root (0)
        mhx (1000)
        unknown (103)
        shutdown (6)
        unknown (106)
Number of gids 15
        root (0)
        unknown (109)
        unknown (42)
        unknown (1000)
        users (100)
        unknown (43)
        tty (5)
        unknown (108)
        unknown (111)
        unknown (110)
        unknown (50)
        mail (12)
        nobody (65534)
        adm (4)
        mem (8)

real    0m50.124s
user    9m41.708s
sys     0m1.727s
```
The difference in speed is almost negligible. SquashFS is just a bit slower here. In terms of compression, the difference also isn't huge:

```
$ ls -lh raspbian.* *.xz
-rw-r--r-- 1 mhx  users 297M Mar  4 21:32 2020-08-20-raspios-buster-armhf-lite.img.xz
-rw-r--r-- 1 root root  287M Mar  4 21:31 raspbian.dwarfs
-rw-r--r-- 1 root root  364M Mar  4 21:33 raspbian.squashfs
```
Interestingly, `xz` actually can't compress the whole original image better than DwarFS.
We can even again try to increase the DwarFS compression level:
```
$ time sudo mkdwarfs -i raspbian -o raspbian-9.dwarfs --with-devices -l9

real    0m54.161s
user    8m40.109s
sys     0m7.101s
```

Now that actually gets the DwarFS image size well below that of the `xz` archive:

```
$ ls -lh raspbian-9.dwarfs *.xz
-rw-r--r-- 1 root root  244M Mar  4 21:36 raspbian-9.dwarfs
-rw-r--r-- 1 mhx  users 297M Mar  4 21:32 2020-08-20-raspios-buster-armhf-lite.img.xz
```

Even if you actually build a tarball and compress that (instead of compressing the EXT4 file system itself), `xz` isn't quite able to match the DwarFS image size:

```
$ time sudo tar cf - raspbian | xz -9 -vT 0 >raspbian.tar.xz
  100 %      246.9 MiB / 1,037.2 MiB = 0.238    13 MiB/s       1:18

real    1m18.226s
user    6m35.381s
sys     0m2.205s
```

```
$ ls -lh raspbian.tar.xz
-rw-r--r-- 1 mhx users 247M Mar  4 21:40 raspbian.tar.xz
```
DwarFS also comes with the dwarfsextract tool that allows extraction of a filesystem image without the FUSE driver. So here's a comparison of the extraction speed:

```
$ time sudo tar xf raspbian.tar.xz -C out1

real    0m12.846s
user    0m12.313s
sys     0m1.616s
```

```
$ time sudo dwarfsextract -i raspbian-9.dwarfs -o out2

real    0m3.825s
user    0m13.234s
sys     0m1.382s
```
So, `dwarfsextract` is almost 4 times faster thanks to using multiple worker threads for decompression. It's writing about 300 MiB/s in this example.

Another nice feature of `dwarfsextract` is that it allows you to directly output data in an archive format, so you could create a tarball from your image without extracting the files to disk:

```
$ dwarfsextract -i raspbian-9.dwarfs -f ustar | xz -9 -T0 >raspbian2.tar.xz
```
This has the interesting side-effect that the resulting tarball will likely be smaller than the one built straight from the directory:

```
$ ls -lh raspbian*.tar.xz
-rw-r--r-- 1 mhx users 247M Mar  4 21:40 raspbian.tar.xz
-rw-r--r-- 1 mhx users 240M Mar  4 23:52 raspbian2.tar.xz
```
That's because `dwarfsextract` writes files in inode-order, and by default inodes are ordered by similarity for the best possible compression.
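Nothing ties this workflow to `xz`, of course; the archive stream can be piped into any compressor. An illustrative sketch using `zstd` instead:

```
$ dwarfsextract -i raspbian-9.dwarfs -f ustar | zstd -19 -T0 >raspbian2.tar.zst
```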
lrzip is a compression utility targeted especially at compressing large files. From its description, it looks like it does something very similar to DwarFS, i.e. it looks for duplicate segments before passing the de-duplicated data on to an `lzma` compressor.

When I first read about `lrzip`, I was pretty certain it would easily beat DwarFS. So let's take a look. `lrzip` operates on a single file, so it's necessary to first build a tarball:
```
$ time tar cf perl-install.tar install

real    2m9.568s
user    0m3.757s
sys     0m26.623s
```

Now we can run `lrzip`:

```
$ time lrzip -vL9 -o perl-install.tar.lrzip perl-install.tar
The following options are in effect for this COMPRESSION.
Threading is ENABLED. Number of CPUs detected: 16
Detected 67106172928 bytes ram
Compression level 9
Nice Value: 19
Show Progress
Verbose
Output Filename Specified: perl-install.tar.lrzip
Temporary Directory set as: ./
Compression mode is: LZMA. LZO Compressibility testing enabled
Heuristically Computed Compression Window: 426 = 42600MB
File size: 52615639040
Will take 2 passes
Beginning rzip pre-processing phase
Beginning rzip pre-processing phase
perl-install.tar - Compression Ratio: 100.378. Average Compression Speed: 14.536MB/s.
Total time: 00:57:32.47

real    57m32.472s
user    81m44.104s
sys     4m50.221s
```
That definitely took a while. This is about an order of magnitude slower than `mkdwarfs` and it barely makes use of the 8 cores.

```
$ ll -h perl-install.tar.lrzip
-rw-r--r-- 1 mhx users 500M Mar  6 21:16 perl-install.tar.lrzip
```
This is a surprisingly disappointing result. The archive is 65% larger than a DwarFS image at `-l9` that takes less than 4 minutes to build. Also, you can't just access the files in the `.lrzip` archive without fully unpacking it first.
That being said, it *is* better than just using `xz` on the tarball:

```
$ time xz -T0 -v9 -c perl-install.tar >perl-install.tar.xz
perl-install.tar (1/1)
  100 %      4,317.0 MiB / 49.0 GiB = 0.086    24 MiB/s      34:55

real    34m55.450s
user    543m50.810s
sys     0m26.533s
```

```
$ ll perl-install.tar.xz -h
-rw-r--r-- 1 mhx users 4.3G Mar  6 22:59 perl-install.tar.xz
```
zpaq is a journaling backup utility and archiver. Again, it appears to share some of the ideas in DwarFS, like segmentation analysis, but it also adds some features on top that make it useful for incremental backups. However, it's also not usable as a file system, so data needs to be extracted before it can be used.
Anyway, how does it fare in terms of speed and compression performance?
```
$ time zpaq a perl-install.zpaq install -m5
```
After a few million lines of output that (I think) cannot be turned off:
```
2258234 +added, 0 -removed.

0.000000 + (51161.953159 -> 8932.000297 -> 490.227707) = 490.227707 MB
2828.082 seconds (all OK)

real    47m8.104s
user    714m44.286s
sys     3m6.751s
```
So, it's an order of magnitude slower than `mkdwarfs` and uses 14 times as much CPU resources as `mkdwarfs -l9`. The resulting archive is pretty close in size to the default configuration DwarFS image, but it's more than 50% bigger than the image produced by `mkdwarfs -l9`.

```
$ ll perl-install*.*
-rw-r--r-- 1 mhx users 490227707 Mar  7 01:38 perl-install.zpaq
-rw-r--r-- 1 mhx users 315482627 Mar  3 21:23 perl-install-l9.dwarfs
-rw-r--r-- 1 mhx users 447230618 Mar  3 20:28 perl-install.dwarfs
```
What's *really* surprising is how slow it is to extract the `zpaq` archive again:

```
$ time zpaq x perl-install.zpaq
2798.097 seconds (all OK)

real    46m38.117s
user    711m18.734s
sys     3m47.876s
```
That's 700 times slower than extracting the DwarFS image.
zpaqfranz is a derivative of zpaq. Much to my delight, it doesn't generate millions of lines of output. It claims to be multi-threaded and de-duplicating, so definitely worth taking a look. Like zpaq, it supports incremental backups.

We'll use a different input to compare zpaqfranz and DwarFS: the source code of 670 different releases of the "wine" emulator. That's 73 gigabytes of data in total, spread across slightly more than 3 million files. It's obviously highly redundant and should thus be a good data set to compare the tools. For reference, a `.tar.xz` of the directory is still 7 GiB in size and a SquashFS image of the data gets down to around 1.6 GiB. An "optimized" `.tar.xz`, where the input files were ordered by similarity, compresses down to 399 MiB, almost 20 times better than without ordering.

Now it's time to try zpaqfranz. The input data is stored on a fast SSD and a large fraction of it is already in the file system cache from previous runs, so disk I/O is not a bottleneck.

```
$ time ./zpaqfranz a winesrc.zpaq winesrc
zpaqfranz v58.8k-JIT-L(2023-08-05)
Creating winesrc.zpaq at offset 0 + 0
Add 2024-01-11 07:25:22 3.117.413 69.632.090.852 (  64.85 GB) 16T (362.904 dirs)
3.480.317 +added, 0 -removed.

0 + (69.632.090.852 -> 8.347.553.798 -> 617.600.892) = 617.600.892 @ 58.38 MB/s

1137.441 seconds (000:18:57) (all OK)

real    18m58.632s
user    11m51.052s
sys     1m3.389s
```
That is considerably faster than the original zpaq, and uses about 60 times less CPU resources. The output file is 589 MiB, so slightly larger than both the "optimized" `.tar.xz` and the zpaq output.
How does `mkdwarfs` do?

```
$ time mkdwarfs -i winesrc -o winesrc.dwarfs -l9
[...]
I 07:55:20.546636 compressed 64.85 GiB to 93.2 MiB (ratio=0.00140344)
I 07:55:20.826699 compression CPU time: 6.726m
I 07:55:20.827338 filesystem created without errors [2.283m]
[...]

real    2m17.100s
user    9m53.633s
sys     2m29.236s
```
It uses pretty much the same amount of CPU resources, but finishes more than 8 times faster. The DwarFS output file is more than 6 times smaller.

You can actually squeeze a bit more redundancy out of the original data by tweaking the similarity ordering and switching from lzma to brotli compression, albeit at a somewhat slower compression speed:

```
mkdwarfs -i winesrc -o winesrc.dwarfs -l9 -C brotli:quality=11:lgwin=26 --order=nilsimsa:max-cluster-size=200k
[...]
I 08:21:01.138075 compressed 64.85 GiB to 73.52 MiB (ratio=0.00110716)
I 08:21:01.485737 compression CPU time: 36.58m
I 08:21:01.486313 filesystem created without errors [5.501m]
[...]

real    5m30.178s
user    40m59.193s
sys     2m36.234s
```
That's almost a 1000x reduction in size.
Let's also look at decompression speed:
```
$ time zpaqfranz x winesrc.zpaq
zpaqfranz v58.8k-JIT-L(2023-08-05)
/home/mhx/winesrc.zpaq:
1 versions, 3.480.317 files, 617.600.892 bytes (588.99 MB)
Extract 69.632.090.852 bytes (64.85 GB) in 3.117.413 files (362.904 folders) / 16 T
        99.18% 00:00:00  (  64.32 GB)=>(  64.85 GB)  548.83 MB/sec

125.636 seconds (000:02:05) (all OK)

real    2m6.968s
user    1m36.177s
sys     1m10.980s
```

```
$ time dwarfsextract -i winesrc.dwarfs

real    1m49.182s
user    0m34.667s
sys     1m28.733s
```

Decompression time is pretty much in the same ballpark, with just slightly shorter times for the DwarFS image.

wimlib is a really interesting project that is a lot more mature than DwarFS. While DwarFS at its core has a library component that could potentially be ported to other operating systems, wimlib already is available on many platforms. It also seems to have quite a rich set of features, so it's definitely worth taking a look at.
I first tried `wimcapture` on the perl dataset:

```
$ time wimcapture --unix-data --solid --solid-chunk-size=16M install perl-install.wim
Scanning "install"
47 GiB scanned (1927501 files, 330733 directories)
Using LZMS compression with 16 threads
Archiving file data: 19 GiB of 19 GiB (100%) done

real    15m23.310s
user    174m29.274s
sys     0m42.921s
```

```
$ ll perl-install.*
-rw-r--r-- 1 mhx users  447230618 Mar  3 20:28 perl-install.dwarfs
-rw-r--r-- 1 mhx users  315482627 Mar  3 21:23 perl-install-l9.dwarfs
-rw-r--r-- 1 mhx users 4748902400 Mar  3 20:10 perl-install.squashfs
-rw-r--r-- 1 mhx users 1016981520 Mar  6 21:12 perl-install.wim
```
So, wimlib is definitely much better than squashfs, in terms of both compression ratio and speed. DwarFS is however about 3 times faster to create the file system and the DwarFS file system is less than half the size. When switching to LZMA compression, the DwarFS file system is more than 3 times smaller (wimlib uses LZMS compression by default).

What's a bit surprising is that mounting a *wim* file takes quite a bit of time:

```
$ time wimmount perl-install.wim mnt
[WARNING] Mounting a WIM file containing solid-compressed data; file access may be slow.

real    0m2.038s
user    0m1.764s
sys     0m0.242s
```
Mounting the DwarFS image takes almost no time in comparison:
```
$ time git/github/dwarfs/build-clang-11/dwarfs perl-install-default.dwarfs mnt
I 00:23:39.238182 dwarfs (v0.4.0, fuse version 35)

real    0m0.003s
user    0m0.003s
sys     0m0.000s
```
That's just because it immediately forks into the background by default and initializes the file system in the background. However, even when running it in the foreground, initializing the file system takes only about 60 milliseconds:

```
$ dwarfs perl-install.dwarfs mnt -f
I 00:25:03.186005 dwarfs (v0.4.0, fuse version 35)
I 00:25:03.248061 file system initialized [60.95ms]
```

If you actually build the DwarFS file system with uncompressed metadata, mounting is basically instantaneous:

```
$ dwarfs perl-install-meta.dwarfs mnt -f
I 00:27:52.667026 dwarfs (v0.4.0, fuse version 35)
I 00:27:52.671066 file system initialized [2.879ms]
```
I've tried running the benchmark where all 1139 `perl` executables print their version with the wimlib image, but after about 10 minutes, it still hadn't finished the first run (with the DwarFS image, one run took slightly more than 2 seconds). I then tried the following instead:

```
$ ls -1 /tmp/perl/install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P1 sh -c 'time $0 -v >/dev/null' 2>&1 | grep ^real
real    0m0.802s
real    0m0.652s
real    0m1.677s
real    0m1.973s
real    0m1.435s
real    0m1.879s
real    0m2.003s
real    0m1.695s
real    0m2.343s
real    0m1.899s
real    0m1.809s
real    0m1.790s
real    0m2.115s
```
Judging from that, it would have probably taken about half an hour for a single run, which makes at least the `--solid` wim image pretty much unusable for actually working with the file system.

The `--solid` option was suggested to me because it resembles the way that DwarFS actually organizes data internally. However, judging by the warning when mounting a solid image, it's probably not ideal when using the image as a mounted file system. So I tried again without `--solid`:

```
$ time wimcapture --unix-data install perl-install-nonsolid.wim
Scanning "install"
47 GiB scanned (1927501 files, 330733 directories)
Using LZX compression with 16 threads
Archiving file data: 19 GiB of 19 GiB (100%) done

real    8m39.034s
user    64m58.575s
sys     0m32.003s
```
This is still more than 3 minutes slower than `mkdwarfs`. However, it yields an image that's almost 10 times the size of the DwarFS image and comparable in size to the SquashFS image:

```
$ ll perl-install-nonsolid.wim -h
-rw-r--r-- 1 mhx users 4.6G Mar  6 23:24 perl-install-nonsolid.wim
```
This *still* takes surprisingly long to mount:

```
$ time wimmount perl-install-nonsolid.wim mnt

real    0m1.603s
user    0m1.327s
sys     0m0.275s
```

However, it's really usable as a file system, even though it's about 4-5 times slower than the DwarFS image:

```
$ hyperfine -c 'umount mnt' -p 'umount mnt; dwarfs perl-install.dwarfs mnt -o cachesize=1g -o workers=4; sleep 1' -n dwarfs "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '\$0 -v >/dev/null'" -p 'umount mnt; wimmount perl-install-nonsolid.wim mnt; sleep 1' -n wimlib "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '\$0 -v >/dev/null'"
Benchmark #1: dwarfs
  Time (mean ± σ):      1.149 s ±  0.019 s    [User: 2.147 s, System: 0.739 s]
  Range (min … max):    1.122 s …  1.187 s    10 runs

Benchmark #2: wimlib
  Time (mean ± σ):      7.542 s ±  0.069 s    [User: 2.787 s, System: 0.694 s]
  Range (min … max):    7.490 s …  7.732 s    10 runs

Summary
  'dwarfs' ran
    6.56 ± 0.12 times faster than 'wimlib'
```
I used Cromfs in the past for compressed file systems and remember that it did a pretty good job in terms of compression ratio. But it was never fast. However, I didn't quite remember just *how* slow it was until I tried to set up a test.
Here's a run on the Perl dataset, with the block size set to 16 MiB to match the default of DwarFS, and with additional options suggested to speed up compression:
$ time mkcromfs -f 16777216 -qq -e -r100000 install perl-install.cromfsWriting perl-install.cromfs...mkcromfs: Automatically enabling --24bitblocknums because it seems possible for this filesystem.Root pseudo file is 108 bytesInotab spans 0x7f3a18259000..0x7f3a1bfffb9cRoot inode spans 0x7f3a205d2948..0x7f3a205d294cBeginning task for Files and directories: Finding identical blocks2163608 reuse opportunities found. 561362 unique blocks. Block table will be 79.4% smaller than without the index search.Beginning task for Files and directories: BlockifyingBlockifying: 0.04% (140017/2724970) idx(siz=80423,del=0) rawin(20.97 MB)rawout(20.97 MB)diff(1956 bytes)Termination signalled, cleaning up temporariesreal 29m9.634suser 201m37.816ssys 2m15.005s
So, it processed 21 MiB out of 48 GiB in half an hour, using almost twice as many CPU resources as DwarFS needed for the *whole* file system. At that point I decided it was likely not worth waiting (presumably) another month (!) for `mkcromfs` to finish. I double-checked that I hadn't accidentally built a debugging version; `mkcromfs` was definitely built with `-O3`.
I then tried once more with a smaller version of the Perl dataset. This one only has 20 versions of Perl (instead of 1139), and obviously a lot less redundancy:
$ time mkcromfs -f 16777216 -qq -e -r100000 install-small perl-install.cromfsWriting perl-install.cromfs...mkcromfs: Automatically enabling --16bitblocknums because it seems possible for this filesystem.Root pseudo file is 108 bytesInotab spans 0x7f00e0774000..0x7f00e08410a8Root inode spans 0x7f00b40048f8..0x7f00b40048fcBeginning task for Files and directories: Finding identical blocks25362 reuse opportunities found. 9815 unique blocks. Block table will be 72.1% smaller than without the index search.Beginning task for Files and directories: BlockifyingCompressing raw rootdir inode (28 bytes)z=982370,del=2) rawin(641.56 MB)rawout(252.72 MB)diff(388.84 MB) compressed into 35 bytesINOTAB pseudo file is 839.85 kBInotab inode spans 0x7f00bc036ed8..0x7f00bc036ef4Beginning task for INOTAB: Finding identical blocks0 reuse opportunities found. 13 unique blocks. Block table will be 0.0% smaller than without the index search.Beginning task for INOTAB: Blockifyingmkcromfs: Automatically enabling --packedblocks because it is possible for this filesystem.Compressing raw inotab inode (52 bytes) compressed into 58 bytesCompressing 9828 block records (4 bytes each, total 39312 bytes) compressed into 15890 bytesCompressing and writing 16 fblocks...16 fblocks were written: 35.31 MB = 13.90 % of 254.01 MBFilesystem size: 35.33 MB = 5.50 % of original 642.22 MBEndreal 27m38.833suser 277m36.208ssys 11m36.945s
And repeating the same task with `mkdwarfs`:
$ time mkdwarfs -i install-small -o perl-install-small.dwarfs21:13:38.131724 scanning install-small21:13:38.320139 waiting for background scanners...21:13:38.727024 assigning directory and link inodes...21:13:38.731807 finding duplicate files...21:13:38.832524 saved 267.8 MiB / 611.8 MiB in 22842/26401 duplicate files21:13:38.832598 waiting for inode scanners...21:13:39.619963 assigning device inodes...21:13:39.620855 assigning pipe/socket inodes...21:13:39.621356 building metadata...21:13:39.621453 building blocks...21:13:39.621472 saving names and links...21:13:39.621655 ordering 3559 inodes using nilsimsa similarity...21:13:39.622031 nilsimsa: depth=20000, limit=25521:13:39.629206 updating name and link indices...21:13:39.630142 pre-sorted index (3360 name, 2127 path lookups) [8.014ms]21:13:39.752051 3559 inodes ordered [130.3ms]21:13:39.752101 waiting for segmenting/blockifying to finish...21:13:53.250951 saving chunks...21:13:53.251581 saving directories...21:13:53.303862 waiting for compression to finish...21:14:11.073273 compressed 611.8 MiB to 24.01 MiB (ratio=0.0392411)21:14:11.091099 filesystem created without errors [32.96s]⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯waiting for block compression to finish3334 dirs, 0/0 soft/hard links, 26401/26401 files, 0 otheroriginal size: 611.8 MiB, dedupe: 267.8 MiB (22842 files), segment: 121.5 MiBfilesystem: 222.5 MiB in 14 blocks (7177 chunks, 3559/3559 inodes)compressed filesystem: 14 blocks/24.01 MiB written██████████████████████████████████████████████████████████████████████▏100% \real 0m33.007suser 3m43.324ssys 0m4.015s
So, `mkdwarfs` is about 50 times faster than `mkcromfs` and uses 75 times less CPU resources. At the same time, the DwarFS file system is 30% smaller:
$ ls -l perl-install-small.*fs-rw-r--r-- 1 mhx users 35328512 Dec 8 14:25 perl-install-small.cromfs-rw-r--r-- 1 mhx users 25175016 Dec 10 21:14 perl-install-small.dwarfs
I noticed that the *blockifying* step that took ages for the full dataset with `mkcromfs` ran substantially faster (in terms of MiB/second) on the smaller dataset, which makes me wonder whether some quadratic-complexity behaviour is slowing `mkcromfs` down.
In order to be completely fair, I also ran `mkdwarfs` with `-l 9` to enable LZMA compression (which is what `mkcromfs` uses by default):
$ time mkdwarfs -i install-small -o perl-install-small-l9.dwarfs -l 921:16:21.874975 scanning install-small21:16:22.092201 waiting for background scanners...21:16:22.489470 assigning directory and link inodes...21:16:22.495216 finding duplicate files...21:16:22.611221 saved 267.8 MiB / 611.8 MiB in 22842/26401 duplicate files21:16:22.611314 waiting for inode scanners...21:16:23.394332 assigning device inodes...21:16:23.395184 assigning pipe/socket inodes...21:16:23.395616 building metadata...21:16:23.395676 building blocks...21:16:23.395685 saving names and links...21:16:23.395830 ordering 3559 inodes using nilsimsa similarity...21:16:23.396097 nilsimsa: depth=50000, limit=25521:16:23.401042 updating name and link indices...21:16:23.403127 pre-sorted index (3360 name, 2127 path lookups) [6.936ms]21:16:23.524914 3559 inodes ordered [129ms]21:16:23.525006 waiting for segmenting/blockifying to finish...21:16:33.865023 saving chunks...21:16:33.865883 saving directories...21:16:33.900140 waiting for compression to finish...21:17:10.505779 compressed 611.8 MiB to 17.44 MiB (ratio=0.0284969)21:17:10.526171 filesystem created without errors [48.65s]⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯waiting for block compression to finish3334 dirs, 0/0 soft/hard links, 26401/26401 files, 0 otheroriginal size: 611.8 MiB, dedupe: 267.8 MiB (22842 files), segment: 122.2 MiBfilesystem: 221.8 MiB in 4 blocks (7304 chunks, 3559/3559 inodes)compressed filesystem: 4 blocks/17.44 MiB written██████████████████████████████████████████████████████████████████████▏100% /real 0m48.683suser 2m24.905ssys 0m3.292s
$ ls -l perl-install-small*.*fs-rw-r--r-- 1 mhx users 18282075 Dec 10 21:17 perl-install-small-l9.dwarfs-rw-r--r-- 1 mhx users 35328512 Dec 8 14:25 perl-install-small.cromfs-rw-r--r-- 1 mhx users 25175016 Dec 10 21:14 perl-install-small.dwarfs
It takes about 15 seconds longer to build the DwarFS file system with LZMA compression (still 35 times faster than Cromfs), but it reduces the size even further, to almost half the size of the Cromfs file system.
I would have added some benchmarks with the Cromfs FUSE driver, but sadly it crashed as soon as I tried to list the directory after mounting.
EROFS is a read-only compressed file system that was added to the Linux kernel fairly recently. Its goals are different from those of DwarFS, though: it is designed to be lightweight (which DwarFS is definitely not) and to run on constrained hardware like embedded devices or smartphones, rather than to provide maximum compression. It currently supports LZ4 and LZMA compression.
Running it on the full Perl dataset using the options given in the README for "well-compressed images":
$ time mkfs.erofs -C1048576 -Eztailpacking,fragments,all-fragments,dedupe -zlzma,9 perl-install-lzma9.erofs perl-installmkfs.erofs 1.7.1-gd93a18c9<W> erofs: It may take a longer time since MicroLZMA is still single-threaded for now.Build completed.------Filesystem UUID: 538ce164-5f9d-4a6a-9808-5915f17ced30Filesystem total blocks: 599854 (of 4096-byte blocks)Filesystem total inodes: 2255795Filesystem total metadata blocks: 74253Filesystem total deduplicated bytes (of source files): 29625028195user2:35:08.03system1:12.65total2:39:25.35$ ll -h perl-install-lzma9.erofs-rw-r--r-- 1 mhx mhx 2.3G Apr 15 16:23 perl-install-lzma9.erofs
That's definitely slower to build than SquashFS, but the resulting image is also significantly smaller.
For a fair comparison, let's use the same 1 MiB block size with DwarFS, but also tweak the options for best compression:
$ time mkdwarfs -i perl-install -o perl-install-1M.dwarfs -l9 -S20 -B64 --order=nilsimsa:max-cluster-size=150000[...]330733 dirs, 0/2440 soft/hard links, 1927501/1927501 files, 0 otheroriginal size: 47.49 GiB, hashed: 43.47 GiB (1920025 files, 1.451 GiB/s)scanned: 19.45 GiB (144675 files, 159.3 MiB/s), categorizing: 0 B/ssaved by deduplication: 28.03 GiB (1780386 files), saved by segmenting: 15.4 GiBfilesystem: 4.053 GiB in 4151 blocks (937069 chunks, 144674/144674 fragments, 144675 inodes)compressed filesystem: 4151 blocks/806.2 MiB written[...]user24:27.47system4:20.74total3:26.79
That's significantly smaller and, almost more importantly, 46 times faster than `mkfs.erofs`.
Actually using the file system images, here's how DwarFS performs:
$ dwarfs perl-install-1M.dwarfs mnt -oworkers=8$ find mnt -type f -print0 | xargs -0 -P16 -n64 cat | dd of=/dev/null bs=1M status=progress50392172594 bytes (50 GB, 47 GiB) copied, 19 s, 2.7 GB/s0+1662649 records in0+1662649 records out51161953159 bytes (51 GB, 48 GiB) copied, 19.4813 s, 2.6 GB/s
Reading every single file from 16 parallel processes took less than 20 seconds. The FUSE driver consumed 143 seconds of CPU time.
Here's the same for EROFS:
$ erofsfuse perl-install-lzma9.erofs mnt$ find mnt -type f -print0 | xargs -0 -P16 -n64 cat | dd of=/dev/null bs=1M status=progress2594306810 bytes (2.6 GB, 2.4 GiB) copied, 300 s, 8.6 MB/s^C0+133296 records in0+133296 records out2595212832 bytes (2.6 GB, 2.4 GiB) copied, 300.336 s, 8.6 MB/s
Note that I stopped this after 5 minutes. The DwarFS FUSE driver delivered about 300 times the throughput of EROFS. The EROFS FUSE driver consumed 50 minutes (!) of CPU time for only about 5% of the data, i.e. more than 400 times the CPU time consumed by the DwarFS FUSE driver.
I've tried two more EROFS configurations on the same set of data. The first one uses more or less just the defaults:
$ time mkfs.erofs -zlz4hc,12 perl-install-lz4hc.erofs perl-installmkfs.erofs 1.7.1-gd93a18c9Build completed.------Filesystem UUID: b75142ed-6cf3-46a4-84f3-12693f7759a0Filesystem total blocks: 5847130 (of 4096-byte blocks)Filesystem total inodes: 2255794Filesystem total metadata blocks: 419699Filesystem total deduplicated bytes (of source files): 0user3:38:23.36system1:10.84total3:41:37.33
The second one additionally enables the `-Ededupe` option:
$ time mkfs.erofs -zlz4hc,12 -Ededupe perl-install-lz4hc-dedupe.erofs perl-installmkfs.erofs 1.7.1-gd93a18c9Build completed.------Filesystem UUID: 0ccf581e-ad3b-4d08-8b10-5b7e15f8e3cdFilesystem total blocks: 1510091 (of 4096-byte blocks)Filesystem total inodes: 2255794Filesystem total metadata blocks: 435599Filesystem total deduplicated bytes (of source files): 19220717568user4:19:57.61system1:21.62total4:23:55.85
I don't know why these are even slower than the first, seemingly more complex, set of options. As was to be expected, the resulting images were significantly bigger:
$ ll -h perl-install*.erofs-rw-r--r-- 1 mhx mhx 5.8G Apr 16 02:46 perl-install-lz4hc-dedupe.erofs-rw-r--r-- 1 mhx mhx 23G Apr 15 22:34 perl-install-lz4hc.erofs-rw-r--r-- 1 mhx mhx 2.3G Apr 15 16:23 perl-install-lzma9.erofs
The good news is that these perform *much* better and even outperform DwarFS, albeit by a small margin:
$ erofsfuse perl-install-lz4hc.erofs mnt$ find mnt -type f -print0 | xargs -0 -P16 -n64 cat | dd of=/dev/null bs=1M status=progress49920168315 bytes (50 GB, 46 GiB) copied, 16 s, 3.1 GB/s0+1493031 records in0+1493031 records out51161953159 bytes (51 GB, 48 GiB) copied, 16.4329 s, 3.1 GB/s
The deduplicated version is even a tiny bit faster:
$ erofsfuse perl-install-lz4hc-dedupe.erofs mntfind mnt -type f -print0 | xargs -0 -P16 -n64 cat | dd of=/dev/null bs=1M status=progress50808037121 bytes (51 GB, 47 GiB) copied, 16 s, 3.2 GB/s0+1499949 records in0+1499949 records out51161953159 bytes (51 GB, 48 GiB) copied, 16.1184 s, 3.2 GB/s
The EROFS kernel driver wasn't any faster than the FUSE driver.
The FUSE driver used about 27 seconds of CPU time in both cases, substantially less than before and about 5 times less than DwarFS.
DwarFS can get close to the throughput of EROFS by using `zstd` instead of `lzma` compression:
$ dwarfs perl-install-1M-zstd.dwarfs mnt -oworkers=8find mnt -type f -print0 | xargs -0 -P16 -n64 cat | dd of=/dev/null bs=1M status=progress49224202357 bytes (49 GB, 46 GiB) copied, 16 s, 3.1 GB/s0+1529018 records in0+1529018 records out51161953159 bytes (51 GB, 48 GiB) copied, 16.6716 s, 3.1 GB/s
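For reference, the zstd image mounted above isn't shown being built anywhere in this comparison. Assuming `mkdwarfs` accepts an explicit compression override via `-C` (the exact spelling and the zstd level below are my assumptions, not taken from the runs above), it could be created along these lines, reusing the block size and lookback settings of the LZMA build:

$ mkdwarfs -i perl-install -o perl-install-1M-zstd.dwarfs -S20 -B64 -C zstd:level=19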
I came across fuse-archive while looking for FUSE drivers to mount archives, and it seems to be the most versatile of the alternatives (and the one that actually compiles out of the box).
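For the tests below, each archive is mounted through fuse-archive before reading from it. A minimal invocation looks roughly like this; this is a sketch based on fuse-archive's README, so treat the exact form as an assumption:

$ mkdir mnt
$ fuse-archive zerotest.zip mnt   # mount the archive on mnt
$ fusermount -u mnt               # unmount when done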
An interesting test case straight from fuse-archive's README is in the Performance section: an archive with a single huge file full of zeroes. Let's make the example a bit more extreme and use a 1 GiB file instead of just 256 MiB:
$ mkdir zerotest$ truncate --size=1G zerotest/zeroes
Now, we build several different archives and a DwarFS image:
$ time mkdwarfs -i zerotest -o zerotest.dwarfs -W16 --log-level=warn --progress=nonereal 0m7.604suser 0m7.521ssys 0m0.083s$ time zip -9 zerotest.zip zerotest/zeroes adding: zerotest/zeroes (deflated 100%)real 0m4.923suser 0m4.840ssys 0m0.080s$ time 7z a -bb0 -bd zerotest.7z zerotest/zeroes7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,16 CPUs Intel(R) Xeon(R) E-2286M CPU @ 2.40GHz (906ED),ASM,AES-NI)Scanning the drive:1 file, 1073741824 bytes (1024 MiB)Creating archive: zerotest.7zItems to compress: 1Files read from disk: 1Archive size: 157819 bytes (155 KiB)Everything is Okreal 0m5.535suser 0m48.281ssys 0m1.116s$ time tar --zstd -cf zerotest.tar.zstd zerotest/zeroesreal 0m0.449suser 0m0.510ssys 0m0.610s
It turns out that `tar --zstd` easily wins the compression speed test. Looking at the file sizes actually did blow my mind just a bit:
$ ll zerotest.* --sort=size-rw-r--r-- 1 mhx users 1042231 Jul 1 15:24 zerotest.zip-rw-r--r-- 1 mhx users 157819 Jul 1 15:26 zerotest.7z-rw-r--r-- 1 mhx users 33762 Jul 1 15:28 zerotest.tar.zstd-rw-r--r-- 1 mhx users 848 Jul 1 15:23 zerotest.dwarfs
I definitely didn't expect the DwarFS image to be *that* small. Dropping the section index would actually save another 100 bytes. So, if you want to archive lots of zeroes, DwarFS is your friend.
Anyway, let's look at how fast and efficiently the zeroes can be read from the different archives. First, the `zip` archive:
$ time dd if=mnt/zerotest/zeroes of=/dev/null status=progress1020117504 bytes (1.0 GB, 973 MiB) copied, 2 s, 510 MB/s2097152+0 records in2097152+0 records out1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.10309 s, 511 MB/sreal 0m2.104suser 0m0.264ssys 0m0.486s
CPU time used by the FUSE driver was 1.8 seconds and the mount time was in the milliseconds.
Now, the `7z` archive:
$ time dd if=mnt/zerotest/zeroes of=/dev/null status=progress594759168 bytes (595 MB, 567 MiB) copied, 1 s, 595 MB/s2097152+0 records in2097152+0 records out1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.76904 s, 607 MB/sreal 0m1.772suser 0m0.229ssys 0m0.572s
CPU time used by the FUSE driver was 2.9 seconds and the mount time was just over 1.0 seconds.
Now, the `.tar.zstd` archive:
$ time dd if=mnt/zerotest/zeroes of=/dev/null status=progress2097152+0 records in2097152+0 records out1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.799409 s, 1.3 GB/sreal 0m0.801suser 0m0.262ssys 0m0.537s
CPU time used by the FUSE driver was 0.53 seconds and the mount time was 0.13 seconds.
Last but not least, let's look at DwarFS:
$ time dd if=mnt/zeroes of=/dev/null status=progress2097152+0 records in2097152+0 records out1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.753 s, 1.4 GB/sreal 0m0.757suser 0m0.220ssys 0m0.534s
CPU time used by the FUSE driver was 0.17 seconds and the mount time was less than a millisecond.
If we increase the block size for the `dd` command, we can get even higher throughput. For fuse-archive with the `.tar.zstd` archive:
$ time dd if=mnt/zerotest/zeroes of=/dev/null status=progress bs=1638465536+0 records in65536+0 records out1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.318682 s, 3.4 GB/sreal 0m0.323suser 0m0.005ssys 0m0.154s
And for DwarFS:
$ time dd if=mnt/zeroes of=/dev/null status=progress bs=1638465536+0 records in65536+0 records out1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.172226 s, 6.2 GB/sreal 0m0.176suser 0m0.020ssys 0m0.141s
This is all nice, but what about a more real-life use case? Let's take the Boost 1.82.0 release archives:
$ ll --sort=size boost_1_82_0.*-rw-r--r-- 1 mhx users 208188085 Apr 10 14:25 boost_1_82_0.zip-rw-r--r-- 1 mhx users 142580547 Apr 10 14:23 boost_1_82_0.tar.gz-rw-r--r-- 1 mhx users 121325129 Apr 10 14:23 boost_1_82_0.tar.bz2-rw-r--r-- 1 mhx users 105901369 Jun 28 12:47 boost_1_82_0.dwarfs-rw-r--r-- 1 mhx users 103710551 Apr 10 14:25 boost_1_82_0.7z
Here are the timings for mounting each archive and then using `tar` to build another archive from the mountpoint, just counting the number of bytes in that archive, e.g.:
$ time tar cf - mnt | wc -c803614720real 0m4.602suser 0m0.156ssys 0m1.123s
Here are the results in terms of wallclock time and FUSE driver CPU time:
Archive | Mount Time | tar Wallclock Time | FUSE Driver CPU Time |
---|---|---|---|
.zip | 0.458s | 5.073s | 4.418s |
.tar.gz | 1.391s | 3.483s | 3.943s |
.tar.bz2 | 15.663s | 17.942s | 32.040s |
.7z | 0.321s | 32.554s | 31.625s |
.dwarfs | 0.013s | 2.974s | 1.984s |
DwarFS easily wins all categories while still compressing the data almost as well as `7z`.
What about accessing files more randomly?
$ find mnt -type f -print0 | xargs -0 -P32 -n32 cat | dd of=/dev/null status=progress
It turns out that fuse-archive grinds to a halt in this case, so I had to run the test on a subset (the `boost` subdirectory) of the data. The `.tar.bz2` and `.7z` archives were so slow to read that I stopped them after a few minutes.
Archive | Throughput | Wallclock Time | FUSE Driver CPU Time |
---|---|---|---|
.zip | 1.8 MB/s | 83.245s | 83.669s |
.tar.gz | 1.2 MB/s | 121.377s | 122.711s |
.tar.bz2 | 0.2 MB/s | - | - |
.7z | 0.3 MB/s | - | - |
.dwarfs | 598.0 MB/s | 0.249s | 1.099s |
Both the FUSE driver and `dwarfsextract` have built-in support for simple performance monitoring. You can build binaries without this feature (`-DENABLE_PERFMON=OFF`), but the impact should be negligible even if performance monitoring is enabled at run-time.
To enable the performance monitor, you pass a list of components for which you want to collect latency metrics, e.g.:
$ dwarfs test.dwarfs mnt -f -operfmon=fuse
When the driver exits, you will see output like this:
[fuse.op_read] samples: 45145 overall: 3.214s avg latency: 71.2us p50 latency: 131.1us p90 latency: 131.1us p99 latency: 262.1us[fuse.op_readdir] samples: 2 overall: 51.31ms avg latency: 25.65ms p50 latency: 32.77us p90 latency: 67.11ms p99 latency: 67.11ms[fuse.op_lookup] samples: 16 overall: 19.98ms avg latency: 1.249ms p50 latency: 2.097ms p90 latency: 4.194ms p99 latency: 4.194ms[fuse.op_init] samples: 1 overall: 199.4us avg latency: 199.4us p50 latency: 262.1us p90 latency: 262.1us p99 latency: 262.1us[fuse.op_open] samples: 16 overall: 122.2us avg latency: 7.641us p50 latency: 4.096us p90 latency: 32.77us p99 latency: 32.77us[fuse.op_getattr] samples: 1 overall: 5.786us avg latency: 5.786us p50 latency: 8.192us p90 latency: 8.192us p99 latency: 8.192us
The metrics should be self-explanatory. However, note that the percentile metrics are logarithmically quantized in order to use as few resources as possible. As a result, you will only see values that look an awful lot like powers of two.
Currently, the supported components are `fuse` for the FUSE operations, `filesystem_v2` for the DwarFS file system component, and `inode_reader_v2` for the component that handles all `read()` system calls.
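Since the option takes a list of components, you can monitor several of them at once. Assuming the list is comma-separated (an assumption on my part), enabling all three would look like this:

$ dwarfs test.dwarfs mnt -f -operfmon=fuse,filesystem_v2,inode_reader_v2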
The FUSE driver also exposes the performance monitor metrics via an extended attribute.
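If you want to poll the metrics while the driver is running, you can read that attribute with the standard xattr tools. The attribute name below is an assumption on my part; the authoritative name is documented in the Extended Attributes section:

$ getfattr --only-values -n user.dwarfs.driver.perfmon mnt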
This only works on Linux and usually only makes sense if you have CPUs with different types of cores (e.g. "performance" vs. "efficiency" cores) and are *really* trying to squeeze the last ounce of speed out of DwarFS.
By setting the environment variable `DWARFS_WORKER_GROUP_AFFINITY`, you can set the CPU affinity of different worker thread groups, e.g.:
export DWARFS_WORKER_GROUP_AFFINITY=blockify=3:compress=6,7
This will set the affinity of the `blockify` worker group to CPU 3 and the affinity of the `compress` worker group to CPUs 6 and 7.
You can use this feature with all tools that use one or more worker thread groups. For example, the FUSE driver `dwarfs` and `dwarfsextract` use a worker group `blkcache` that the block cache (i.e. block decompression and lookup) runs on. `mkdwarfs` uses a whole array of different worker groups, namely `compress` for compression, `scanner` for scanning, `ordering` for input ordering, and `blockify` for segmenting. `blockify` is what you would typically want to run on your "performance" cores.