BSD-licensed implementation of rsync
kristapsdz/openrsync
This system has been merged into OpenBSD base. If you'd like to contribute to openrsync, please mail your patches to tech@openbsd.org. This repository is simply the OpenBSD version plus some glue for portability.
This is an implementation of rsync with a BSD (ISC) license. It's compatible with a modern rsync (3.1.3 is used for testing, but any supporting protocol 27 will do), but accepts only a subset of rsync's command-line arguments.
Its officially-supported operating system is OpenBSD, but it will compile and run on other UNIX systems. See Portability for details.
The canonical documentation for openrsync is its manual pages. See rsync(5) and rsyncd(5) for protocol details or utility documentation in openrsync(1). If you'd like to write your own rsync implementation, the protocol manpages should have all the information required.
The Architecture and Algorithm sections on this page serve to introduce developers to the source code. They are non-canonical.
openrsync is written as part of the rpki-client(1) project, an RPKI validator for OpenBSD. openrsync was funded by NetNod, IIS.SE, SUNET and 6connect.
On an up-to-date UNIX system, simply download and run:
% ./configure
% make
# make install
This will install the openrsync utility and manual pages. It's ok to have an installation of rsync at the same time: the two will not collide in any way.
If you upgrade your sources and want to re-install, just run the same. If you'd like to uninstall the sources:
# make uninstall
If you'd like to interact with openrsync as a server, you can run the following:
% rsync --rsync-path=openrsync src/* dst
% openrsync --rsync-path=openrsync src/* dst
If you'd like openrsync and rsync to interact, it's important to use command-line flags available on both. See openrsync(1) for a listing.
For a robust description of the rsync algorithm, see "The rsync algorithm", by Andrew Tridgell and Paul Mackerras. Andrew Tridgell's PhD thesis, "Efficient Algorithms for Sorting and Synchronization", covers the topics in more detail. This section gives a description suitable for delving into the source code.
The rsync algorithm has two components: the sender and the receiver. The sender manages source files; the receiver manages the destination. In the following invocations, first the sender is the host remote and the receiver is the localhost, then the opposite.
% openrsync -lrtp remote:foo/bar ~/baz/xyzzy
% openrsync -lrtp ~/foo/bar remote:baz/xyzzy
The algorithm hinges upon a file list of names and metadata (e.g., mode, mtime, etc.) shared between components. The file list describes all source files of the update and is generated by the sender. The sharing is implemented in flist.c.
After sharing this list, both the receiver and sender independently sort the entries by the filenames' lexicographical order. This allows the file list to be sent and received out of order. The ordering preserves a directory-first order, so directories are processed before their contained files. Moreover, once sorted, both sender and receiver may refer to file entries by their position in the sorted array.
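As a minimal sketch of the sorting step, the following uses qsort(3) with strcmp(3) on a pared-down entry. The struct and function names are hypothetical and are not taken from flist.c; the real entries carry more metadata.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, pared-down file list entry (not the struct in flist.c). */
struct flist_entry {
	const char	*path;	/* path relative to the transfer root */
	unsigned int	 mode;
	long long	 mtime;
};

/* qsort(3) comparator: byte-wise lexicographical order of the paths. */
static int
flist_cmp(const void *a, const void *b)
{
	const struct flist_entry *ea = a, *eb = b;

	return strcmp(ea->path, eb->path);
}

/* Sort in place; afterwards an entry can be named by its array index. */
void
flist_sort(struct flist_entry *fl, size_t n)
{
	qsort(fl, n, sizeof(*fl), flist_cmp);
}
```

Because a directory's path is a strict prefix of the paths beneath it, a byte-wise comparison always places the directory before its contents.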
After the receiver reads the list, it iterates through each file in the list, passing information to the sender so that the sender may send back instructions to update the file. This is called the "block exchange" and is the mainstay of the rsync algorithm. During the block exchange, the sender waits to receive a request for update or end of sequence message; once a request is received, it scans for new blocks to send to the receiver.
Once the block exchange is complete, the files are all up to date.
The receiver is implemented in receiver.c; the sender, in sender.c. A great deal of the block exchange happens in blocks.c.
The block exchange sequence differs depending on whether the file is a directory, symbolic link, or regular file.
For symbolic links, the information required by the receiver is already encoded in the file list metadata. The symbolic link is updated to point to the correct target. No update is requested from the sender.
For directories, the directory is created if it does not already exist. No update is requested from the sender.
Regular files are handled as follows. First, the file is checked to see if it's up to date. It is considered up to date if the file size and last modification time are the same. If so, no update is requested from the sender.
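The up-to-date test can be sketched with stat(2). The function name and signature below are hypothetical; the size and mtime arguments stand for the values the sender advertised in the file list.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Return 1 if the file at "path" has the advertised size and
 * modification time, 0 otherwise (including when the file is missing
 * or is not a regular file).
 * Hypothetical helper for illustration only.
 */
int
file_uptodate(const char *path, off_t size, time_t mtime)
{
	struct stat st;

	if (stat(path, &st) == -1)
		return 0;
	if (!S_ISREG(st.st_mode))
		return 0;
	return st.st_size == size && st.st_mtime == mtime;
}
```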
Otherwise, the receiver examines each file in blocks of a fixed size. See Block sizes for details. (The terminal block may be smaller if the file size is not divisible by the block size.) If the file is empty or does not exist, it will have zero blocks. Each block is hashed twice: first, with a fast Adler-32 type 4-byte hash; second, with a slower MD4 16-byte hash. These hashes are implemented in hash.c. The receiver sends the file's block hashes to the sender.
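The block layout described above comes down to a little arithmetic. This sketch (with hypothetical function names) counts the blocks of a file and gives the length of each one, including the shorter terminal block.

```c
#include <stddef.h>

/* Number of blocks covering "size" bytes; zero for an empty file. */
size_t
blk_count(size_t size, size_t blen)
{
	if (blen == 0)
		return 0;
	return size / blen + (size % blen != 0);
}

/*
 * Length of block number "idx" (counting from zero): "blen" for all
 * but a possibly shorter terminal block, zero if out of range.
 */
size_t
blk_length(size_t size, size_t blen, size_t idx)
{
	if (blen == 0 || idx >= blk_count(size, blen))
		return 0;
	return size - idx * blen < blen ? size - idx * blen : blen;
}
```

For example, a 1500-byte file with 700-byte blocks has three blocks of 700, 700 and 100 bytes.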
Once accepted, the sender examines the corresponding file with the given blocks. For each byte in the source file, the sender computes a fast hash given the block size. It then looks for matching fast hashes in the sent block information. If it finds a match, it then computes and checks the slow hash. If no match is found, it continues to the next byte. The matching (and indeed all block operations) is implemented in blocks.c.
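Computing a hash at every byte offset is affordable because an Adler-32 type hash can be "rolled": the hash of the next window follows from the current one in constant time. The sketch below assumes the common two-sum form (a running byte sum and a running sum of sums, each kept to 16 bits); all names are hypothetical and the exact arithmetic in hash.c may differ. It searches for a single fast hash and leaves the slow-hash confirmation out.

```c
#include <stddef.h>
#include <stdint.h>

/* Rolling state: "a" sums the bytes, "b" sums the successive "a" values. */
struct roll {
	uint32_t a, b;
};

static void
roll_init(struct roll *r, const unsigned char *buf, size_t len)
{
	size_t i;

	r->a = r->b = 0;
	for (i = 0; i < len; i++) {
		r->a += buf[i];
		r->b += r->a;
	}
}

/* Slide the window one byte: drop "out" at the front, append "in". */
static void
roll_step(struct roll *r, unsigned char out, unsigned char in, size_t len)
{
	r->a += in;
	r->a -= out;
	r->b -= (uint32_t)len * out;
	r->b += r->a;
}

/* Pack the two 16-bit halves into the 4-byte hash. */
static uint32_t
roll_value(const struct roll *r)
{
	return (r->a & 0xffff) | (r->b << 16);
}

/* Fast hash of one whole block, as the receiver would compute it. */
uint32_t
fast_hash(const unsigned char *buf, size_t len)
{
	struct roll r;

	roll_init(&r, buf, len);
	return roll_value(&r);
}

/*
 * Return the first offset in "src" whose "len"-byte window has the
 * fast hash "want", or -1 if there is none. A real sender would look
 * the hash up among all received blocks and then confirm any
 * candidate with the slow hash.
 */
long
find_fast_match(const unsigned char *src, size_t srclen, size_t len,
    uint32_t want)
{
	struct roll r;
	size_t off;

	if (len == 0 || srclen < len)
		return -1;
	roll_init(&r, src, len);
	for (off = 0; ; off++) {
		if (roll_value(&r) == want)
			return (long)off;
		if (off + len >= srclen)
			return -1;
		roll_step(&r, src[off], src[off + len], len);
	}
}
```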
When a match is found, the data prior to the match is first sent as a stream of bytes to the receiver. This is followed by an identifier for the found block, or zero if no more data is forthcoming.
The receiver writes the stream of bytes first, then copies the data in the identified block if one has been specified. This continues until the end of file, at which point the file has been fully reconstituted.
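The receiver's side of this exchange can be sketched in memory. The token struct below is a hypothetical stand-in for the wire format (see rsync(5) for the real encoding): each token carries literal bytes followed by a 1-based block identifier, with zero marking the end of the file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory token; not the wire format. */
struct token {
	const unsigned char	*lit;	 /* literal bytes preceding the block */
	size_t			 litlen;
	size_t			 blk;	 /* 1-based block of the old file; 0 ends */
};

/*
 * Rebuild a file into "out" (capacity "outsz") from a token stream and
 * the old file's contents. Returns the number of bytes written, or
 * (size_t)-1 on a bad block identifier or if "out" is too small.
 */
size_t
reconstruct(const struct token *tok, const unsigned char *old,
    size_t oldsz, size_t blen, unsigned char *out, size_t outsz)
{
	size_t pos = 0, off, len;

	for (;; tok++) {
		/* First the literal stream... */
		if (tok->litlen > outsz - pos)
			return (size_t)-1;
		if (tok->litlen > 0)
			memcpy(out + pos, tok->lit, tok->litlen);
		pos += tok->litlen;
		if (tok->blk == 0)
			return pos;
		/* ...then the identified block copied from the old file. */
		if (blen == 0 || tok->blk - 1 > oldsz / blen)
			return (size_t)-1;
		off = (tok->blk - 1) * blen;
		if (off >= oldsz)
			return (size_t)-1;
		len = oldsz - off < blen ? oldsz - off : blen;
		if (len > outsz - pos)
			return (size_t)-1;
		memcpy(out + pos, old + off, len);
		pos += len;
	}
}
```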
If the file does not exist on the receiver side (the basis case), the entire file is sent as a stream of bytes.
Following this, the whole file is hashed using an MD4 hash. These hashes are then compared; on success, the algorithm continues to the next file.
The block size algorithm plays a crucial role in the protocol efficiency. In general, the block size is the rounded square root of the total file size. The minimum block size, however, is 700 B. Otherwise, the square root computation is simply sqrt(3) followed by ceil(3).
For reasons unknown, the square root result is rounded up to the nearest multiple of eight.
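A sketch of the block size computation follows. Two things here are assumptions rather than facts from the source: the minimum is applied to any file of at most 700 * 700 bytes, and an integer binary search stands in for sqrt(3) followed by ceil(3) so that the example builds without libm. The names are hypothetical.

```c
#include <stdint.h>

#define BLOCK_SIZE_MIN	700	/* minimum block size in bytes */

/* Smallest r with r * r >= n; stands in for sqrt(3) then ceil(3). */
static uint64_t
ceil_sqrt(uint64_t n)
{
	uint64_t lo = 0, hi = UINT32_MAX, mid;

	if (n > hi * hi)
		return hi + 1;
	while (lo < hi) {
		mid = lo + (hi - lo) / 2;
		if (mid * mid >= n)
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}

/* Block size for a file of "filesz" bytes. */
uint64_t
block_size(uint64_t filesz)
{
	uint64_t len;

	/* Assumption: small files simply get the minimum. */
	if (filesz <= (uint64_t)BLOCK_SIZE_MIN * BLOCK_SIZE_MIN)
		return BLOCK_SIZE_MIN;
	len = ceil_sqrt(filesz);
	/* Round up to the nearest multiple of eight. */
	return (len + 7) / 8 * 8;
}
```

Under these assumptions a 1 MiB file (1048576 bytes) gets 1024-byte blocks, and a file of 490001 bytes gets 704-byte blocks (the square root rounds up to 701, then to the next multiple of eight).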
Each openrsync session is divided into a running server and client process. The client openrsync process is executed by the user.
% openrsync -rlpt host:path/to/source dest
The server openrsync is executed on a remote host either on-demand over ssh(1) or as a persistent network daemon. If executed over ssh(1), the server openrsync is distinguished from a client (user-started) openrsync by the --server flag.
Once the client or server openrsync process starts, it examines the command-line arguments to determine whether it's in receiver or sender mode. (The daemon is sent the command-line arguments in a protocol-specific way described in rsyncd(5), but otherwise does the same thing.) The receiver is the destination for files; the sender is the origin. There is always one receiver and one sender.
The server process is explicitly instructed that it is a sender with the --sender command-line flag; otherwise, it is a receiver. The client process implicitly determines its status by looking at the files passed on the command line for whether they are local or remote.
openrsync path/to/source host:destination
openrsync host:source path/to/destination
In the first example, the client is the sender: it sends data from itself to the server. In the second, the opposite is true in that it receives data.
The client's command-line files may have any of the following host specifications that determine locality.
- local: ../path/to/source ../another
- remote server: host:path/to/source :path/to/another
- remote daemon: rsync://host/module/path ::another
Host specifications must be consistent: sources must all be local or all be remote on the same host. Source and destination may not both be remote. (Aside: it's technically possible to do this. I'm not sure why the GPL rsync is limited to one or the other.)
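The three forms listed above can be told apart with a few string checks. This is a sketch with hypothetical names, not the logic in main.c. It assumes that a colon appearing after the first slash belongs to a local path, which the text above does not state.

```c
#include <string.h>

enum locality {
	LOC_LOCAL,
	LOC_REMOTE_SERVER,
	LOC_REMOTE_DAEMON
};

/* Classify one command-line file argument by its host specification. */
enum locality
classify(const char *arg)
{
	const char *colon, *slash;

	if (strncmp(arg, "rsync://", 8) == 0)
		return LOC_REMOTE_DAEMON;
	if ((colon = strchr(arg, ':')) == NULL)
		return LOC_LOCAL;
	/* Assumption: a colon after the first slash is part of a path. */
	slash = strchr(arg, '/');
	if (slash != NULL && slash < colon)
		return LOC_LOCAL;
	/* "::" marks a daemon; a single ":" marks a remote server. */
	return colon[1] == ':' ? LOC_REMOTE_DAEMON : LOC_REMOTE_SERVER;
}
```

A client could run this over every source argument and reject the command line if the results are not all the same kind.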
If the source or destination is on a remote server, the client then fork(2)s and starts the server openrsync on the remote host over ssh(1). The client and the server subsequently communicate over socketpair(2) pipes. If on a remote daemon, the client does not fork, but instead connects to the standalone server with a network socket(2).
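The fork(2) and socketpair(2) arrangement can be sketched as follows. To stay self-contained, the child here writes a fixed greeting where the real client would execute ssh(1); the function name is hypothetical and none of this is taken from child.c.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <string.h>
#include <unistd.h>

/*
 * Fork a child connected to the parent by a socketpair(2), then read
 * what the child sends into "buf". Returns the number of bytes read,
 * or -1 on failure.
 */
ssize_t
spawn_and_read(char *buf, size_t bufsz)
{
	const char *msg = "hello from child";
	int fds[2], status;
	pid_t pid;
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
		return -1;
	if ((pid = fork()) == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/*
		 * Child: keep one end. A real client would dup2(2) it
		 * onto stdin and stdout and then exec ssh(1).
		 */
		close(fds[0]);
		if (write(fds[1], msg, strlen(msg)) == -1)
			_exit(1);
		close(fds[1]);
		_exit(0);
	}
	/* Parent: keep the other end and talk to the child over it. */
	close(fds[1]);
	n = read(fds[0], buf, bufsz);
	close(fds[0]);
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return n;
}
```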
The server's command-line, whether passed to an openrsync spawned on-demand over an ssh(1) session or passed to the daemon, differs from the client's.
openrsync --server [--sender] . files...
The files given are either the single destination directory when in receiver mode, or the list of sources when in sender mode. The standalone full-stop is a mystery to me.
Locality detection and routing to client and server run-times are handled in main.c. The client for a server is implemented in client.c and the server in server.c. The client for a network daemon is in socket.c. Invocation of the remote server openrsync is managed in child.c.
Once the client and server begin, they start to negotiate the transfer of files over the connected socket. The protocol used is specified in rsync(5). For daemon connections, the rsyncd(5) protocol is also used for handshaking.
The receiver side is managed in receiver.c and the sender in sender.c.
The receiver side technically has two functions: not only must it upload block metadata to the sender, it must also handle data writes as they are sent by the sender. The rsync protocol is designed so that the sender receives block requests and continuously sends data to the receiver.
To accomplish this, the receiver multitasks as the uploader and downloader. These roles are implemented in uploader.c and downloader.c, respectively. The multitasking takes place by a finite state machine driven by data coming from the sender and files on disc as they are ready to be checksummed and uploaded.
The uploader scans through the list of files and asynchronously opens files to process blocks. While it waits for the files to open, it relinquishes control to the event loop. When files are available, it hashes and checksums blocks and uploads to the sender.
The downloader waits on data from the sender. When data is ready (and prefixed by the file it will update), the downloader asynchronously opens the existing file to perform any block copying. When the file is available for reading, it then continues to read data from the sender and copy from the existing file.
The design of rsync involves another mode running alongside the receiver: the generator. This is implemented as another process fork(2)ed from the receiver, and communicating with the receiver and sender.
In openrsync, the generator and receiver are one process, and an event loop is used for speedy responses to read and write requests.
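To illustrate how one process can play both roles, here is a much-reduced event loop built on poll(2). It is a sketch only and not the state machine in receiver.c: the "uploader" pushes a fixed buffer whenever the outgoing descriptor is writable, and the "downloader" collects whatever arrives on the incoming one. All names are hypothetical.

```c
#include <sys/types.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write "up" to out_fd while reading from in_fd into "down", never
 * blocking on one direction while the other is ready. Returns 0 once
 * the upload is done and in_fd has reached end of file, -1 on error
 * or if "down" fills up. "*got" is set to the number of bytes read.
 */
int
receiver_loop(int in_fd, int out_fd, const char *up, size_t upsz,
    char *down, size_t downsz, size_t *got)
{
	struct pollfd pfd[2];
	size_t sent = 0;
	ssize_t n;
	int in_open = 1;

	*got = 0;
	while (in_open || sent < upsz) {
		/* poll(2) ignores entries with a negative descriptor. */
		pfd[0].fd = in_open ? in_fd : -1;
		pfd[0].events = POLLIN;
		pfd[0].revents = 0;
		pfd[1].fd = sent < upsz ? out_fd : -1;
		pfd[1].events = POLLOUT;
		pfd[1].revents = 0;
		if (poll(pfd, 2, -1) == -1)
			return -1;

		/* Downloader: data (or end of file) from the sender. */
		if (pfd[0].revents & (POLLERR | POLLNVAL))
			return -1;
		if (pfd[0].revents & (POLLIN | POLLHUP)) {
			if (*got == downsz)
				return -1;
			n = read(in_fd, down + *got, downsz - *got);
			if (n == -1)
				return -1;
			if (n == 0)
				in_open = 0;
			else
				*got += (size_t)n;
		}

		/* Uploader: the sender can accept more block metadata. */
		if (pfd[1].revents & (POLLERR | POLLHUP | POLLNVAL))
			return -1;
		if (pfd[1].revents & POLLOUT) {
			n = write(out_fd, up + sent, upsz - sent);
			if (n == -1)
				return -1;
			sent += (size_t)n;
		}
	}
	return 0;
}
```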
Besides the usual defensive programming, openrsync makes significant use of native security features.
The system operations available to executing code are foremost limited by OpenBSD's pledge(2). The pledges given depend upon the operating mode. For example, the receiver needs write access to the disc, but only when not in dry-run mode (-n). The daemon client needs DNS and network access, but only to a point. pledge(2) allows available resources to be limited over the course of operation.
The second tool is OpenBSD's unveil(2), which limits access to the file-system. This protects against rogue attempts to "break out" of the destination. It's an attractive alternative to chroot(2) because it doesn't require root permissions to execute.
On the receiver side, the file-system is unveil(2)ed at and beneath the destination directory. After the creation of the destination directory, only targets within that directory may be accessed or modified.
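A sketch of how a receiver might combine the two calls is below. The function name, the promise strings and the unveil permissions are illustrative assumptions, not the sets openrsync actually uses; consult the source for those. On systems other than OpenBSD the function is a no-op so that the example still compiles.

```c
#include <stddef.h>
#ifdef __OpenBSD__
#include <unistd.h>
#endif

/*
 * Confine the process to the destination tree and to a reduced set of
 * system operations. Returns 0 on success, -1 on failure.
 */
int
restrict_receiver(const char *dest, int dry_run)
{
#ifdef __OpenBSD__
	/* Expose only the destination: read-only in dry-run mode. */
	if (unveil(dest, dry_run ? "r" : "rwc") == -1)
		return -1;
	/* Disallow any further unveil(2) calls. */
	if (unveil(NULL, NULL) == -1)
		return -1;
	/* Assumed promises: no write or create paths in dry-run mode. */
	if (pledge(dry_run ? "stdio rpath" : "stdio rpath wpath cpath",
	    NULL) == -1)
		return -1;
	return 0;
#else
	/* No equivalent here: porting means filling this gap. */
	(void)dest;
	(void)dry_run;
	return 0;
#endif
}
```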
Lastly, the MD4 hashes are seeded with arc4random(3) instead of with time(3). This is only applicable when running openrsync in server mode, as the server generates the seed.
Many have asked about portability.
The only officially-supported operating system is OpenBSD, as this has considerable security features. openrsync does, however, use oconfigure for compilation on non-OpenBSD systems. This is to encourage porting.
It currently is portable across Linux (glibc and musl), FreeBSD, NetBSD, Mac OS X, and OmniOS. This is enforced by the GitHub CI mechanism, which tests on these systems. Architectures tested for include x86_64, aarch64, and s390x.
The actual work of porting is matching the security features provided by OpenBSD's pledge(2) and unveil(2). These are critical elements to the functionality of the system. Without them, your system accepts arbitrary data from the public network.
This is possible (I think?) with FreeBSD's Capsicum, but Linux's security facilities are a mess, and will take an expert hand to properly secure.
rsync has specific running modes for the super-user. It also pumps arbitrary data from the network onto your file-system. openrsync is about 10 000 lines of C code: do you trust me not to make mistakes?