This repository was archived by the owner on Feb 25, 2025. It is now read-only.

Tools for synching and streaming files from Windows to Linux


google/cdc-file-transfer

Born from the ashes of Stadia, this repository contains tools for syncing and streaming files from Windows to Windows or Linux. The tools are based on Content Defined Chunking (CDC), in particular FastCDC, to split up files into chunks.

History

At Stadia, game developers had access to Linux cloud instances to run games. Most developers wrote their games on Windows, though. Therefore, they needed a way to make them available on the remote Linux instance.

As developers had SSH access to those instances, they could use scp to copy the game content. However, this was impractical, especially with the shift to working from home during the pandemic with sub-par internet connections. scp always copies full files: there is no "delta mode" to copy only the things that changed, it is slow for many small files, and there is no fast compression.

To help this situation, we developed two tools, cdc_rsync and cdc_stream, which enable developers to quickly iterate on their games without repeatedly incurring the cost of transmitting dozens of GBs.

CDC RSync

cdc_rsync is a tool to sync files from a Windows machine to a Linux device, similar to the standard Linux rsync. It is basically a copy tool, but optimized for the case where there is already an old version of the files available in the target directory.

  • It quickly skips files if timestamp and file size match.
  • It uses fast compression for all data transfer.
  • If a file changed, it determines which parts changed and only transfers the differences.
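The first bullet, skipping files whose timestamp and size match, can be illustrated with a minimal Python sketch. This is not the tool's actual code: the function name `can_skip` is hypothetical, both paths are assumed to be local, and comparing timestamps at whole-second resolution is an assumption made here because file systems differ in timestamp precision.

```python
import os

def can_skip(src_path: str, dst_path: str) -> bool:
    """Return True if dst exists and matches src in size and modification time."""
    try:
        src, dst = os.stat(src_path), os.stat(dst_path)
    except FileNotFoundError:
        # A missing file on either side can never be skipped.
        return False
    # Assumption: compare whole seconds only, since timestamp resolution varies.
    return src.st_size == dst.st_size and int(src.st_mtime) == int(dst.st_mtime)
```

A check like this only needs file metadata, so unchanged files cost no data transfer and no hashing.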

cdc_rsync demo

The remote diffing algorithm is based on CDC. In our tests, it is up to 30x faster than the one used in rsync (1500 MB/s vs 50 MB/s).

The following chart shows a comparison of cdc_rsync and Linux rsync running under Cygwin on Windows. The test data consists of 58 development builds of a game provided to us for evaluation purposes. Each build is 40-45 GB in size. For this experiment, we uploaded the first build, then synced the second build with each of the two tools and measured the time. For example, syncing from build 1 to build 2 took 210 seconds with the Cygwin rsync, but only 75 seconds with cdc_rsync. The three outliers are probably feature drops from another development branch, where the delta was much higher. Overall, cdc_rsync syncs files about 3 times faster than Cygwin rsync.

Comparison of cdc_rsync and Linux rsync running in Cygwin

We also ran the experiment with the native Linux rsync, i.e. syncing Linux to Linux, to rule out issues with Cygwin. Linux rsync performed on average 35% worse than Cygwin rsync, which can be attributed to CPU differences. We did not include it in the figure because of this, but you can find it here.

How does it work and why is it faster?

The standard Linux rsync splits a file into fixed-size chunks of typically several KB.

Linux rsync uses fixed size chunks

If the file is modified in the middle, e.g. by inserting xxxx after 567, this usually means that the modified chunks as well as all subsequent chunks change.

Fixed size chunks after inserting data
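The effect of an insertion on fixed-size chunks can be reproduced with a small Python sketch. The chunk size of 6 bytes and the helper names `fixed_chunks` and `reusable_chunks` are chosen purely for illustration; real chunk sizes are several KB.

```python
def fixed_chunks(data: bytes, size: int) -> list[bytes]:
    """Split data into consecutive chunks of `size` bytes (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def reusable_chunks(old: bytes, new: bytes, size: int) -> int:
    """Count chunks of `new` that also occur as chunks of `old`."""
    known = set(fixed_chunks(old, size))
    return sum(1 for chunk in fixed_chunks(new, size) if chunk in known)

old = b"0123456789abcdefghijklmnopqrstuv"
new = old[:8] + b"xxxx" + old[8:]          # insert "xxxx" after "567"

# Only the chunk before the insertion still lines up with an old chunk;
# every chunk from the insertion onwards is shifted and no longer matches.
print(reusable_chunks(old, new, 6), "of", len(fixed_chunks(new, 6)), "chunks reusable")
```

Comparing chunks at fixed offsets alone therefore finds almost nothing to reuse, which is why the standard algorithm needs the rolling-hash search described next.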

The standard rsync algorithm hashes the chunks of the remote "old" file and sends the hashes to the local device. The local device then figures out which parts of the "new" file match known chunks.

Syncing a file with the standard Linux rsync
Standard rsync algorithm

This is a simplification. The actual algorithm is more complicated and uses two hashes, a weak rolling hash and a strong hash; see here for a great overview. What makes rsync relatively slow is the "no match" situation where the rolling hash does not match any remote hash, and the algorithm has to roll the hash forward and perform a hash map lookup for each byte. rsync goes to great lengths optimizing lookups.
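To make the per-byte cost concrete, here is a rough Python sketch of the two-hash search, modeled on the weak rolling checksum from the rsync technical report. It is an illustration under assumptions, not rsync's code: SHA-256 stands in for the strong hash, the trailing partial block of the old file is ignored, and the names `weak`, `roll` and `match_blocks` are made up for this example.

```python
import hashlib

M = 1 << 16

def weak(block: bytes) -> tuple[int, int]:
    """Weak checksum of a block: plain byte sum and position-weighted byte sum."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window forward by one byte in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b

def match_blocks(old: bytes, new: bytes, n: int) -> list[tuple[int, int]]:
    """Return (offset_in_new, block_index_in_old) for every block found in new."""
    table: dict[tuple[int, int], dict[bytes, int]] = {}
    for idx in range(0, len(old) - n + 1, n):
        block = old[idx:idx + n]
        table.setdefault(weak(block), {})[hashlib.sha256(block).digest()] = idx // n
    matches, pos = [], 0
    if len(new) < n:
        return matches
    a, b = weak(new[:n])
    while pos + n <= len(new):
        block_idx = None
        if (a, b) in table:                      # hash map lookup at this offset
            strong = hashlib.sha256(new[pos:pos + n]).digest()
            block_idx = table[(a, b)].get(strong)
        if block_idx is not None:                # match: jump a whole block ahead
            matches.append((pos, block_idx))
            pos += n
            if pos + n <= len(new):
                a, b = weak(new[pos:pos + n])
        else:                                    # no match: advance one byte only
            if pos + n < len(new):
                a, b = roll(a, b, new[pos], new[pos + n], n)
            pos += 1
    return matches
```

In the "no match" branch the loop advances a single byte per iteration and performs one lookup each time, which is the cost the paragraph above refers to.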

cdc_rsync does not use fixed-size chunks, but instead variable-size, content-defined chunks. That means chunk boundaries are determined by the local content of the file, in practice a 64 byte sliding window. For more details, see the FastCDC paper or take a look at our implementation.

cdc_rsync uses variable, content-defined size chunks

If the file is modified in the middle, only the modified chunks change, but not the subsequent chunks (unless they are less than 64 bytes away from the modifications).

Content-defined chunks after inserting data

Computing the chunk boundaries is cheap and involves only a left-shift, a memory lookup, an add and an and operation for each input byte. This is cheaper than the hash map lookup for the standard rsync algorithm.

Because of this, the cdc_rsync algorithm is faster than the standard rsync. It is also simpler. Since chunk boundaries move along with insertions or deletions, the task to match local and remote hashes is a trivial set difference operation. It does not involve a per-byte hash map lookup.
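The per-byte boundary test can be sketched in Python as a plain gear-hash chunker. This is a simplified illustration, not the repository's implementation: the lookup table `GEAR` is an arbitrary seeded random table, the cut mask and the default `avg_bits` are assumptions, and FastCDC additionally enforces minimum and maximum chunk sizes and uses normalized chunking, all of which are omitted here.

```python
import random

MASK64 = (1 << 64) - 1                      # emulates 64-bit wraparound in Python
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]   # hypothetical lookup table

def cdc_chunks(data: bytes, avg_bits: int = 6) -> list[bytes]:
    """Split data at content-defined boundaries (about 2**avg_bits bytes apart)."""
    # Test the top bits of the hash: they depend on the last 64 input bytes.
    cut_mask = ((1 << avg_bits) - 1) << (64 - avg_bits)
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Left-shift, table lookup, add; a byte's influence is shifted out
        # of the 64-bit hash after 64 steps, giving a 64 byte sliding window.
        h = ((h << 1) + GEAR[byte]) & MASK64
        if (h & cut_mask) == 0:             # the "and" that decides a boundary
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Because each boundary decision depends only on the preceding 64 bytes, an insertion can affect boundaries only within that distance; chunks further away come out byte-identical.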

Syncing a file with cdc_rsync
cdc_rsync algorithm
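The set difference step can be sketched as follows, again in Python and only as an illustration. SHA-256 as chunk identifier and the names `chunk_id` and `chunks_to_send` are assumptions made for this example; the source does not state which hash the tool uses.

```python
import hashlib

def chunk_id(chunk: bytes) -> str:
    """Identify a chunk by a strong hash of its content (SHA-256 is assumed here)."""
    return hashlib.sha256(chunk).hexdigest()

def chunks_to_send(local_chunks: list[bytes], remote_ids: set[str]) -> dict[str, bytes]:
    """Return the local chunks whose ids the remote side does not have yet."""
    local = {chunk_id(c): c for c in local_chunks}
    missing = set(local) - remote_ids        # plain set difference, no per-byte search
    return {cid: local[cid] for cid in missing}
```

For example, if the remote side already holds the ids of `b"aa"` and `b"cc"`, calling `chunks_to_send([b"aa", b"XX", b"cc"], remote_ids)` returns only the entry for `b"XX"`.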

CDC Stream

cdc_stream is a tool to stream files and directories from a Windows machine to a Linux device. Conceptually, it is similar to sshfs, but it is optimized for read speed.

  • It caches streamed data on the Linux device.
  • If a file is re-read on Linux after it changed on Windows, only the differences are streamed again. The rest is read from the cache.
  • Stat operations are very fast since the directory metadata (filenames, permissions etc.) is provided in a streaming-friendly way.

To efficiently determine which parts of a file changed, the tool uses the same CDC-based diffing algorithm as cdc_rsync. Changes to Windows files are almost immediately reflected on Linux, with a delay of roughly (0.5s + 0.7s x total size of changed files in GB).
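As a quick worked example of the delay estimate, the rule of thumb can be written as a small Python function. The function name is hypothetical, and treating 1 GB as 10^9 bytes is an assumption, since the text does not say which definition is meant.

```python
def estimated_delay_seconds(changed_bytes: int) -> float:
    """Rough delay until changes show up on Linux: 0.5 s + 0.7 s per changed GB."""
    changed_gb = changed_bytes / 1e9     # assumption: decimal gigabytes
    return 0.5 + 0.7 * changed_gb
```

Under these assumptions, changing files totalling 2 GB gives an estimated delay of about 1.9 seconds.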

cdc_stream demo

The tool does not support writing files back from Linux to Windows; the Linux directory is read-only.

The following chart compares times from starting a game to reaching the menu. In one case, the game is streamed via sshfs, in the other case we use cdc_stream. Overall, we see a 2x to 5x speedup.

Comparison of cdc_stream and sshfs

Supported Platforms

cdc_rsync | From | To
Windows x86_64 ¹
Ubuntu 22.04 x86_64 ²
Ubuntu 22.04 aarch64
macOS 13 x86_64 ³
macOS 13 aarch64 ³

cdc_stream | From | To
Windows x86_64
Ubuntu 22.04 x86_64
Ubuntu 22.04 aarch64
macOS 13 x86_64 ³
macOS 13 aarch64 ³

¹ Only local syncs, e.g. cdc_rsync C:\src\* C:\dst. Support for remote syncs is being added, see #61.
² See #56.
³ See #62.

Getting Started

Download the precompiled binaries from the latest release to a Windows device and unzip them. The Linux binaries are automatically deployed to ~/.cache/cdc-file-transfer by the Windows tools. There is no need to manually deploy them. We currently provide Linux binaries compiled on GitHub's latest Ubuntu version. If the binaries work for you, you can skip the following two sections.

Alternatively, the project can be built from source. Some binaries have to be built on Windows, some on Linux.

Prerequisites for Building

To build the tools from source, the following steps have to be executed on both Windows and Linux.

  • Download and install Bazel from here. See workflow logs for the currently used version.
  • Clone the repository.
    git clone https://github.com/google/cdc-file-transfer
  • Initialize submodules.
    cd cdc-file-transfer
    git submodule update --init --recursive

Finally, install an SSH client on the Windows machine if not present. The file transfer tools require ssh.exe and sftp.exe.

Building

The two tools CDC RSync and CDC Stream can be built and used independently.

CDC RSync

  • On a Linux device, build the Linux components
    bazel build --config linux --compilation_mode=opt --linkopt=-Wl,--strip-all --copt=-fdata-sections --copt=-ffunction-sections --linkopt=-Wl,--gc-sections //cdc_rsync_server
  • On a Windows device, build the Windows components
    bazel build --config windows --compilation_mode=opt --copt=/GL //cdc_rsync
  • Copy the Linux build output file cdc_rsync_server from bazel-bin/cdc_rsync_server to bazel-bin\cdc_rsync on the Windows machine.

CDC Stream

  • On a Linux device, build the Linux components
    bazel build --config linux --compilation_mode=opt --linkopt=-Wl,--strip-all --copt=-fdata-sections --copt=-ffunction-sections --linkopt=-Wl,--gc-sections //cdc_fuse_fs
  • On a Windows device, build the Windows components
    bazel build --config windows --compilation_mode=opt --copt=/GL //cdc_stream
  • Copy the Linux build output files cdc_fuse_fs and libfuse.so from bazel-bin/cdc_fuse_fs to bazel-bin\cdc_stream on the Windows machine.

Usage

The tools require a setup where you can use SSH and SFTP from the Windows machine to the Linux device without entering a password, e.g. by using key-based authentication.

Configuring SSH and SFTP

By default, the tools search for ssh.exe and sftp.exe in the PATH environment variable. If you can run the following commands in a Windows cmd without entering your password, you are all set:

ssh user@linux.device.com
sftp user@linux.device.com

Here, user is the Linux user and linux.device.com is the Linux host to SSH into or copy the file to.
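The passwordless check can also be scripted. The Python sketch below relies on the standard OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password. The helper names and the timeout value are hypothetical and not part of the tools.

```python
import subprocess

def batch_ssh_command(target: str, ssh_exe: str = "ssh") -> list[str]:
    """Build an ssh invocation that fails instead of prompting for a password."""
    # BatchMode=yes is a standard OpenSSH option that disables interactive prompts.
    return [ssh_exe, "-o", "BatchMode=yes", target, "exit"]

def passwordless_ssh_works(target: str, ssh_exe: str = "ssh",
                           timeout: float = 15.0) -> bool:
    """Return True if `ssh target exit` succeeds without any interaction."""
    try:
        result = subprocess.run(batch_ssh_command(target, ssh_exe),
                                capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # ssh executable not found, not runnable, or the host did not answer in time.
        return False
    return result.returncode == 0
```

For example, `passwordless_ssh_works("user@linux.device.com")` returning False suggests that key-based authentication is not set up yet for that host.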

If additional arguments are required, it is recommended to provide an SSH config file. By default, both ssh.exe and sftp.exe use the file at %USERPROFILE%\.ssh\config on Windows, if it exists. A possible config file that sets a username, a port, an identity file and a known host file could look as follows:

Host linux_device
  HostName linux.device.com
  User user
  Port 12345
  IdentityFile C:\path\to\id_rsa
  UserKnownHostsFile C:\path\to\known_hosts

If ssh.exe or sftp.exe cannot be found, you can specify the full paths via the command line arguments --ssh-command and --sftp-command for cdc_rsync and cdc_stream start (see below), or set the environment variables CDC_SSH_COMMAND and CDC_SFTP_COMMAND, e.g.

set CDC_SSH_COMMAND="C:\path with space\to\ssh.exe"
set CDC_SFTP_COMMAND="C:\path with space\to\sftp.exe"

Note that you can also specify SSH configuration via the environment variables instead of using a config file:

set CDC_SSH_COMMAND=C:\path\to\ssh.exe -p 12345 -i C:\path\to\id_rsa -oUserKnownHostsFile=C:\path\to\known_hosts
set CDC_SFTP_COMMAND=C:\path\to\sftp.exe -P 12345 -i C:\path\to\id_rsa -oUserKnownHostsFile=C:\path\to\known_hosts

Note the small -p for ssh.exe and the capital -P for sftp.exe.

Google Specific

For Google internal usage, set the following environment variables to enable SSH authentication using a Google security key:

set CDC_SSH_COMMAND=C:\gnubby\bin\ssh.exe
set CDC_SFTP_COMMAND=C:\gnubby\bin\sftp.exe

Note that you will have to touch the security key multiple times during the first run. Subsequent runs only require a single touch.

CDC RSync

cdc_rsync is used similarly to scp or the Linux rsync command. To sync a single Windows file C:\path\to\file.txt to the home directory ~ on the Linux device linux.device.com, run

cdc_rsync C:\path\to\file.txt user@linux.device.com:~

cdc_rsync understands the usual Windows wildcards * and ?.

cdc_rsync C:\path\to\*.txt user@linux.device.com:~

To sync the contents of the Windows directory C:\path\to\assets recursively to ~/assets on the Linux device, run

cdc_rsync C:\path\to\assets\* user@linux.device.com:~/assets -r

To get per file progress, add -v:

cdc_rsync C:\path\to\assets\* user@linux.device.com:~/assets -vr

The tool also supports local syncs:

cdc_rsync C:\path\to\assets\* C:\path\to\destination -vr

CDC Stream

To stream the Windows directory C:\path\to\assets to ~/assets on the Linux device, run

cdc_stream start C:\path\to\assets user@linux.device.com:~/assets

This makes all files and directories in C:\path\to\assets available on ~/assets immediately, as if it were a local copy. However, data is streamed from Windows to Linux as files are accessed.

To stop the streaming session, enter

cdc_stream stop user@linux.device.com:~/assets

The command also accepts wildcards. For instance,

cdc_stream stop user@*:*

stops all existing streaming sessions for the given user.

Troubleshooting

On first run, cdc_stream starts a background service, which does all the work. The cdc_stream start and cdc_stream stop commands are just RPC clients that talk to the service.

The service logs to %APPDATA%\cdc-file-transfer\logs by default. The logs are useful to investigate issues with asset streaming. To pass custom arguments, or to debug the service, create a JSON config file at %APPDATA%\cdc-file-transfer\cdc_stream.json with command line flags. For instance,

{ "verbosity":3 }

instructs the service to log debug messages. Try cdc_stream start-service -h for a list of available flags. Alternatively, run the service manually with

cdc_stream start-service

and pass the flags as command line arguments. When you run the service manually, the flag --log-to-stdout is particularly useful as it logs to the console instead of to the file.
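The JSON config file described above can also be created from a script. The following Python sketch writes the example content to the documented location. The helper name `write_stream_config` and its `appdata` parameter are hypothetical; the parameter exists only so the function can be tried out against a scratch directory.

```python
import json
import os
from pathlib import Path
from typing import Optional

def write_stream_config(flags: dict, appdata: Optional[str] = None) -> Path:
    """Write service flags to <APPDATA>/cdc-file-transfer/cdc_stream.json."""
    # Falls back to the APPDATA environment variable, which is set on Windows.
    base = Path(appdata if appdata is not None else os.environ["APPDATA"])
    path = base / "cdc-file-transfer" / "cdc_stream.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(flags, indent=2), encoding="utf-8")
    return path
```

Calling `write_stream_config({"verbosity": 3})` on Windows would produce the config from the example above. Whether the running service picks up the change without a restart is not stated in this document.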

cdc_rsync always logs to the console. To increase log verbosity, pass -vvv for debug logs or -vvvv for verbose logs.

For both sync and stream, the debug logs contain all SSH and SFTP commands that the tools attempt to run, which is very useful for troubleshooting. If a command fails unexpectedly, copy it and run it in isolation. Pass -vv or -vvv to it for additional debug output.
