Fossil achieves efficient storage and low-bandwidth synchronizationthrough the use of delta-compression. Instead of storingor transmitting the complete content of an artifact, Fossil stores ortransmits only the changes relative to a related artifact.
This document describes the delta-encoding format used by Fossil.The intended audience is developers working on eitherFossil itself, or on tools compatible withFossil. Understanding of this document isnot required forordinary users of Fossil. This document is an implementation detail.
This document only describes the delta file format. Aseparate document describes one possiblealgorithm for generating deltas in this format.
The core routines used to generate and apply deltas in this formatare contained in thedelta.c source file. Interfacelogic, including "test-*" commands to directly manipulate deltas arecontained in thedeltacmd.c source file. SQL functionsto create, apply, and analyze deltas are implemented by code in thedeltafunc.c source file.
The following command-line tools are available to create and applydeltas and to test the delta logic:
When running thefossil sql command to get aninteractive SQL session connected to the repository, the followingadditional SQL functions are provided:
leftmargin = 0.1 box height 50% "Header" box same "Segments" box same "Trailer"→ /pikchrshow
A delta consists of three parts, a "header", a "trailer", and a"segment-list" between them.
Both header and trailer provide information about the targethelping the decoder, and the segment-list describes how the target canbe constructed from the original.
leftmargin = 0.1 box height 50% "Size" box same "\"\\n\""→ /pikchrshow
The header consists of a single number followed by a newlinecharacter (ASCII 0x0a). The number is the length of the target inbytes.
This means that, given a delta, the decoder can compute the size ofthe target (and allocate any necessary memory based on that) by simplyreading the first line of the delta and decoding the number foundthere. In other words, before it has to decode everything else.
leftmargin = 0.1 box height 50% "Checksum" box same "\";\""→ /pikchrshow
The trailer consists of a single number followed by a semicolon (ASCII0x3b). This number is a checksum of the target and can be used by adecoder to verify that the delta applied correctly, reconstructing thetarget from the original.
The checksum is computed by treating the target as a series of32-bit integer numbers (MSB first), and summing these up, modulo2^32-1. A target whose length is not a multiple of 4 is padded with0-bytes (ASCII 0x00) at the end.
By putting this information at the end of the delta a decoder hasit available immediately after the target has been reconstructedfully.
leftmargin = 0.1 PART1: [ B1: box height 50% width 15% "" B2: box same "" B3: box same "" "***" box height 50% width 15% "" I1: line down 50% from B2 .s arrow right until even with B3.e box "Insert Literal" height 50% line down 75% from I1 .s arrow right until even with B3.e box "Copy Range" height 50% ] down PART2: [ "" box "Length" height 50% right box "\":\"" same box "Bytes" same ] with .nw at previous.sw→ /pikchrshow
The segment-list of a delta describes how to create the target fromthe original by a combination of inserting literal byte-sequences andcopying ranges of bytes from the original. This is where thecompression takes place, by encoding the large common parts oforiginal and target in small copy instructions.
The target is constructed from beginning to end, with the datagenerated by each instruction appended after the data of all previousinstructions, with no gaps.
A literal is specified by two elements, the size of the literal inbytes, and the bytes of the literal itself.
leftmargin = 0.1 box "Length" height 50% box "\":\"" same box "Bytes" same→ /pikchrshow
The length is written first, followed by a colon character (ASCII0x3a), followed by the bytes of the literal.
A range to copy is specified by two numbers, the offset of thefirst byte in the original to copy, and the size of the range, inbytes. The size zero is special, its usage indicates that the rangeextends to the end of the original.
leftmargin = 0.1 box "Length" height 50% box "\"@\"" same box "Offset" same box "\",\"" same→ /pikchrshow
The length is written first, followed by an "at" character (ASCII0x40), then the offset, followed by a comma (ASCII 0x2c).
The format currently handles only 32 bit integer numbers. They arewritten base-64 encoded, MSB first, and without leading"0"-characters, except if they are significant (i.e. 0 => "0").
The base-64 encoding uses one character for each 6 bits ofthe integer to be encoded. The encoding characters are:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~
The least significant 6 bits of the integer are encoded bythe first character, followed bythe next 6 bits, and so on until all non-zero bits of the integerare encoded. The minimum number of encoding characters is used.Note that for integers less than 10, the base-64 coding is aASCII decimal rendering of the number itself.
Value | Encoding |
---|---|
0 | 0 |
6246 | 1Xb |
-1101438770 | 2zMM3E |
An example of a delta using the specified encoding is:
1Xb4E@0,2:thFN@4C,6:scenda1B@Jd,6:scenda5x@Kt,6:pieces79@Qt,F: Example: eskil~E@Y0,2zMM3E;
This can be taken apart into the following parts:
What | Encoding | Meaning | Details |
---|---|---|---|
Header | 1Xb | Size | 6246 |
S-List | 4E@0, | Copy | 270 @ 0 |
2:th | Literal | 2 'th' | |
FN@4C, | Copy | 983 @ 268 | |
6:scenda | Literal | 6 'scenda' | |
1B@Jd, | Copy | 75 @ 1256 | |
6:scenda | Literal | 6 'scenda' | |
5x@Kt, | Copy | 380 @ 1336 | |
6:pieces | Literal | 6 'pieces' | |
79@Qt, | Copy | 457 @ 1720 | |
F: Example: eskil | Literal | 15 ' Example: eskil' | |
~E@Y0, | Copy | 4046 @ 2176 | |
Trailer | 2zMM3E | Checksum | -1101438770 |
The unified diff behind the above delta is
bluepeak:(761) ~/Projects/Tcl/Fossil/Devel/devel > diff -u ../DELTA/old ../DELTA/new--- ../DELTA/old 2007-08-23 21:14:40.000000000 -0700+++ ../DELTA/new 2007-08-23 21:14:33.000000000 -0700@@ -5,7 +5,7 @@ * If the server does not have write permission on the database file, or on the directory containing the database file (and- it is thus unable to update database because it cannot create+ it is thus unable to update the database because it cannot create a rollback journal) then it currently fails silently on a push. It needs to return a helpful error.@@ -27,8 +27,8 @@ * Additional information displayed for the "vinfo" page: + All leaves of this version that are not included in the- descendant list. With date, user, comment, and hyperlink.- Leaves in the descendant table should be marked as such.+ descendant list. With date, user, comment, and hyperlink.+ Leaves in the descendant table should be marked as such. See the compute_leaves() function to see how to find all leaves. + Add file diff links to the file change list.@@ -37,7 +37,7 @@ * The /xfer handler (for push, pull, and clone) does not do delta compression. This results in excess bandwidth usage.- There are some code in xfer.c that are sketches of ideas on+ There are some pieces in xfer.c that are sketches of ideas on how to do delta compression, but nothing has been implemented. * Enhancements to the diff and tkdiff commands in the cli.@@ -45,7 +45,7 @@ single file. Allow diffs against any two arbitrary versions, not just diffs against the current check-out. Allow configuration options to replace tkdiff with some other- visual differ of the users choice.+ visual differ of the users choice. Example: eskil. * Ticketing interface (expand this bullet)