🧵 An HTTP(S) mirroring tool written in Rust, proposed for ustcmirror


taoky/tsumugu


An HTTP(S) syncing tool with lower overhead, for OSS mirrors.

Instead of HEADing every single file, tsumugu parses directory listing HTML and downloads only files that do not seem to be up-to-date.
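A directory listing already carries each entry's name, mtime and size, which is what makes skipping per-file HEAD requests possible. The sketch below extracts those fields from one line of an nginx-style autoindex page, assuming nginx's default layout (`<a href="NAME">NAME</a>   DD-Mon-YYYY HH:MM   SIZE`, with `-` as the size of directories). It is only an illustration: `ListingEntry` and `parse_nginx_line` are hypothetical names, not tsumugu's actual nginx parser.

```rust
/// One directory-listing entry (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct ListingEntry {
    href: String,
    mtime: String,     // kept as text, e.g. "01-Jan-2024 12:34"
    size: Option<u64>, // None for directories, which nginx lists with "-"
}

/// Parse one line of nginx autoindex HTML, assuming the layout
/// `<a href="NAME">NAME</a>   DD-Mon-YYYY HH:MM   SIZE`.
fn parse_nginx_line(line: &str) -> Option<ListingEntry> {
    const OPEN: &str = "<a href=\"";
    let start = line.find(OPEN)? + OPEN.len();
    let end = start + line[start..].find('"')?;
    let href = line[start..end].to_string();

    // Date, time and size follow the closing </a> tag.
    let close = line.find("</a>")? + "</a>".len();
    let cols: Vec<&str> = line[close..].split_whitespace().collect();
    if cols.len() != 3 {
        return None;
    }
    Some(ListingEntry {
        href,
        mtime: format!("{} {}", cols[0], cols[1]),
        size: cols[2].parse().ok(), // "-" does not parse, giving None
    })
}
```

The href is used rather than the link text because nginx may truncate long names in the visible text; note that the href is still percent-encoded and would need decoding before being used as a local path.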

Design goals

To successfully sync from these domains, where lftp/rclone fail or run into difficulties:

TODOs

  • Add "--include": Sync even if the file is excluded by --exclude regex.
  • Add support for currently supported Debian, Ubuntu, Fedora and RHEL versions in --include regexes.
    • Something like --include debian/${DEBIAN_VERSIONS}?
  • Check for APT/YUM repo integrity (avoid keeping old invalid metadata files)
    • (This is experimental and may not work well)

Usage

> ./tsumugu --help
A HTTP(S) syncing tool with lower overhead, for OSS mirrors

Usage: tsumugu <COMMAND>

Commands:
  sync  Sync files from upstream to local
  list  List files from upstream
  help  Print this message or the help of the given subcommand(s)

Options:
  -h, --help     Print help
  -V, --version  Print version

> ./tsumugu sync --help
Sync files from upstream to local

Usage: tsumugu sync [OPTIONS] <UPSTREAM> <LOCAL>

Arguments:
  <UPSTREAM>  The upstream URL
  <LOCAL>     The local directory

Options:
      --user-agent <USER_AGENT>
          Customize tsumugu's user agent [default: tsumugu]
      --dry-run
          Do not download files and cleanup
      --threads <THREADS>
          Threads at work [default: 2]
      --no-delete
          Do not clean up after sync
      --max-delete <MAX_DELETE>
          Set max delete count [default: 100]
      --timezone-file <TIMEZONE_FILE>
          You can set a valid URL for guessing. Set it to "no" to disable this behavior. By default it would recursively find the first file to HEAD for guessing
      --timezone <TIMEZONE>
          Manually set timezone (+- hrs). This overrides timezone_file
      --retry <RETRY>
          Retry count for each request [default: 3]
      --head-before-get
          Do an HEAD before actual GET. Otherwise when head-before-get and allow-time-from-parser are not set, when GETting tsumugu would try checking if we still need to download it
      --parser <PARSER>
          Choose a main parser [default: nginx] [possible values: nginx, apache-f2, docker, directory-lister, lighttpd, caddy, fancy-index, gradle, fallback]
      --parser-match <PARSER_MATCH>
          Choose supplementary parsers. Format: "parsername:matchpattern". matchpattern is a relative path regex. Supports multiple
      --exclude <EXCLUDE>
          Excluded relative path regex. Supports multiple
      --include <INCLUDE>
          Included relative path regex (even if excluded). Supports multiple
      --skip-if-exists <SKIP_IF_EXISTS>
          Skip relative path regex if they exist. Supports multiple
      --compare-size-only <COMPARE_SIZE_ONLY>
          Relative path regex for those compare size only **after** HEAD (head_before_get on) or GET (head_before_get off)
      --trust-mtime-from-parser
          Allow mtime from parser if not available from HTTP headers [aliases: allow-mtime-from-parser]
      --apt-packages
          (Experimental) APT Packages file parser to find out missing packages
      --yum-packages
          (Experimental) YUM Packages file parser to find out missing packages
      --ignore-nonexist
          Ignore 404 NOT FOUND as error when downloading files
      --auto-fallback
          Allow automatically choose fallback parser when ParseError occurred
      --header <HEADER>
          Custom header for HTTP(S) requests in format "Headerkey: headervalue". Supports multiple
      --exclusion-v2
          The exclusion v2 mode. To keep compatibility, this is off by default
  -h, --help
          Print help
  -V, --version
          Print version

> ./tsumugu list --help
List files from upstream

Usage: tsumugu list [OPTIONS] <UPSTREAM>

Arguments:
  <UPSTREAM>  The upstream URL

Options:
      --user-agent <USER_AGENT>        Customize tsumugu's user agent [default: tsumugu]
      --parser <PARSER>                Choose a main parser [default: nginx] [possible values: nginx, apache-f2, docker, directory-lister, lighttpd, caddy, fancy-index, gradle, fallback]
      --exclude <EXCLUDE>              Excluded relative path regex. Supports multiple
      --include <INCLUDE>              Included relative path regex (even if excluded). Supports multiple
      --upstream-base <UPSTREAM_BASE>  The upstream base starting with "/" [default: /]
      --header <HEADER>                Custom header for HTTP(S) requests in format "Headerkey: headervalue". Supports multiple
      --exclusion-v2                   The exclusion v2 mode. To keep compatibility, this is off by default
  -h, --help                           Print help
  -V, --version                        Print version
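The --parser-match value has the documented form "parsername:matchpattern". A minimal sketch of validating and splitting it, assuming the split happens at the first ':' (so the regex part may itself contain colons) and taking the allowed names from the --parser possible values in the help output above. `split_parser_match` is a hypothetical helper, not tsumugu's argument parser.

```rust
/// Parser names copied from the `--parser` possible values in the help text.
const PARSERS: &[&str] = &[
    "nginx", "apache-f2", "docker", "directory-lister", "lighttpd",
    "caddy", "fancy-index", "gradle", "fallback",
];

/// Split a `--parser-match` value of the form "parsername:matchpattern".
/// Splitting at the first ':' lets the regex part contain ':' as well.
fn split_parser_match(arg: &str) -> Result<(&str, &str), String> {
    let (name, pattern) = arg
        .split_once(':')
        .ok_or_else(|| format!("missing ':' in {arg:?}"))?;
    if !PARSERS.contains(&name) {
        return Err(format!("unknown parser {name:?}"));
    }
    Ok((name, pattern))
}
```

For example, `split_parser_match("docker:^linux/")` yields the parser name `docker` and the relative path regex `^linux/`, which would then be compiled and matched against relative paths.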

For a very brief introduction to parsers, see ./docs/parser.md.

Exit code

  • 0: Success
  • 1: Failed to list
  • 2: Failed to download
  • 3: A panic!() occurred
  • 4: Error when cleaning up
  • 25: The limit stopped deletions (see --max-delete)
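A wrapper script or scheduler that runs tsumugu can branch on these codes. A minimal sketch of mapping them to messages, using only the codes listed above; `describe_exit_code` and `describe_status` are hypothetical helpers, and any code not in the list is reported as undocumented rather than guessed.

```rust
use std::process::ExitStatus;

/// Meaning of each exit code listed in the README.
fn describe_exit_code(code: i32) -> &'static str {
    match code {
        0 => "success",
        1 => "failed to list",
        2 => "failed to download",
        3 => "a panic!() occurred",
        4 => "error when cleaning up",
        25 => "the limit stopped deletions", // presumably the --max-delete limit
        _ => "undocumented exit code",
    }
}

/// `ExitStatus::code()` is `None` when the process was killed by a signal (Unix).
fn describe_status(status: ExitStatus) -> &'static str {
    match status.code() {
        Some(code) => describe_exit_code(code),
        None => "terminated by a signal",
    }
}
```

In a wrapper, `describe_status` would be fed the result of `std::process::Command::new("./tsumugu").args([...]).status()?`; code 25 in particular signals that a human should check why so many deletions were requested.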

Building with musl

Unfortunately, this requires openssl-sys, which is not included in cross's prebuilt images. Try https://github.com/clux/muslrust.

Evaluation

Default concurrency is 2 threads.
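Concurrency is set with --threads (the Proxmox run uses --threads 1 because that server tolerates only one connection). A minimal sketch of how a fixed number of workers can drain a shared list of jobs so that at most N requests are in flight, using only the standard library. `run_jobs` is hypothetical; tsumugu's real scheduler may be organized differently.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Run `job` on every item using exactly `threads` worker threads.
/// Each worker repeatedly claims the next unclaimed index, so at most
/// `threads` jobs (e.g. HTTP requests) run at the same time.
fn run_jobs<T, R, F>(items: &[T], threads: usize, job: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let next = AtomicUsize::new(0);
    let results = Mutex::new(Vec::with_capacity(items.len()));
    thread::scope(|s| {
        for _ in 0..threads.max(1) {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(item) = items.get(i) else { break };
                let r = job(item);
                results.lock().unwrap().push((i, r));
            });
        }
    });
    // Workers finish in arbitrary order; restore the input order.
    let mut results = results.into_inner().unwrap();
    results.sort_by_key(|(i, _)| *i);
    results.into_iter().map(|(_, r)| r).collect()
}
```

Claiming indices through an atomic counter avoids a channel and keeps the worker count, not the job count, as the concurrency bound.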

(Note: Please see examples/ for the latest sync commands.)

Proxmox uses a self-hosted CDN server architecture, and unfortunately its servers limit concurrency to only 1 (as far as I could test). With traditional lftp/rclone, a single sync could take more than 10 hours (even when your local files are identical to the remote ones).

Note: Consider using Proxmox Offline Mirror or other tools like apt-mirror if you only need its APT repository.

> time ./tsumugu sync --threads 1 --dry-run --exclude '^temp' http://download.proxmox.com/ /srv/repo/proxmox/
...
real    1m48.746s
user    0m3.468s
sys     0m3.385s

We used a special script for syncing docker-ce before, but tsumugu can now handle this as well. In addition, for 30x redirects inside linux/centos/ and linux/rhel/, tsumugu can create symlinks as that script did before.
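One way to materialize a 30x redirect locally is a relative symlink from the redirecting path to its target. A minimal sketch of computing that relative target, assuming both paths are known relative to the mirror root; `relative_link_target`, `make_redirect_symlink` and the example paths are hypothetical, and tsumugu's own handling may differ.

```rust
use std::path::{Component, Path, PathBuf};

/// Compute the relative target for a symlink at `link` that should point at
/// `target` (both relative to the mirror root). For hypothetical paths,
/// a redirect from linux/rhel/7 to linux/centos/7 becomes "../centos/7".
fn relative_link_target(link: &Path, target: &Path) -> PathBuf {
    let link_dir: Vec<Component> = link
        .parent()
        .map(|p| p.components().collect())
        .unwrap_or_default();
    let target_parts: Vec<Component> = target.components().collect();
    // Number of leading directories shared by both paths.
    let common = link_dir
        .iter()
        .zip(&target_parts)
        .take_while(|(a, b)| a == b)
        .count();
    let mut rel = PathBuf::new();
    // Climb out of the link's directories that are not shared...
    for _ in common..link_dir.len() {
        rel.push("..");
    }
    // ...then descend into the rest of the target path.
    for part in &target_parts[common..] {
        rel.push(part.as_os_str());
    }
    rel
}

/// Create the symlink under `root` (Unix only).
#[cfg(unix)]
fn make_redirect_symlink(root: &Path, link: &Path, target: &Path) -> std::io::Result<()> {
    std::os::unix::fs::symlink(relative_link_target(link, target), root.join(link))
}
```

A relative target keeps the symlink valid if the whole mirror directory is moved or served from a different mount point.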

> time ./tsumugu sync --timezone-file https://download.docker.com/linux/centos/docker-ce-staging.repo --parser docker --dry-run https://download.docker.com/ /srv/repo/docker-ce/
...
real    8m32.674s
user    0m4.532s
sys     0m2.855s
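Per the help text, --timezone-file names a URL used for guessing the listing's timezone, and --timezone sets it manually in hours. Listing pages usually show mtimes in the server's local time without a zone, while a HEAD response's Last-Modified header is in GMT, so one file present in both gives the offset. A minimal sketch of that arithmetic, assuming this is roughly what the guess does; `guess_offset_hours` is hypothetical, timestamps are plain seconds to avoid a date crate, and rounding to whole hours follows the (+- hrs) wording of --timezone.

```rust
/// Guess the listing's UTC offset in whole hours.
/// `listing_secs`: the file's mtime as shown in the listing, read as if it were UTC
///                 (listings usually have only minute precision).
/// `header_secs`:  the same file's Last-Modified header, which is always GMT.
fn guess_offset_hours(listing_secs: i64, header_secs: i64) -> i64 {
    let diff = listing_secs - header_secs;
    // Round to the nearest hour so the seconds missing from the listing do not matter.
    (diff as f64 / 3600.0).round() as i64
}
```

For instance, a listing time of 20:34 against a Last-Modified of 12:34:56 differs by 7 h 59 min 4 s, which rounds to an offset of +8.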

lftp/rclone fail to handle complex HTML.

> time ./tsumugu sync --parser apache-f2 --dry-run --exclude '^mageia' --exclude '^macosx' --exclude '^debian' --exclude '^ubuntu' --exclude '^fedora' --include '^debian/dists/${DEBIAN_CURRENT}' --include '^ubuntu/dists/${UBUNTU_LTS}' --include '^fedora/${FEDORA_CURRENT}' https://dl.winehq.org/wine-builds/ /srv/repo/wine/wine-builds/
...
<TIMESTAMP>  INFO ThreadId(01) tsumugu: (Estimated) Total objects: 17514, total size: 342.28 GiB

real    0m5.664s
user    0m1.475s
sys     0m0.294s

Notes

Yuki integration

See https://github.com/ustclug/ustcmirror-images#tsumugu.

YAML example:

envs:
  UPSTREAM: http://download.proxmox.com/
  TSUMUGU_EXCLUDE: --exclude ^temp --exclude pmg/dists/.+changelog$ --exclude devel/dists/.+changelog$
  TSUMUGU_TIMEZONEFILE: http://download.proxmox.com/images/aplinfo.dat
  TSUMUGU_THREADS: 1
image: ustcmirror/tsumugu:latest
interval: 12 3 * * *
logRotCycle: 10
name: proxmox
storageDir: /srv/repo/proxmox/

More examples in examples/.

Regex variables

See ./src/regex_manager/mod.rs.
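The WineHQ command above uses variables such as ${DEBIAN_CURRENT} inside --include. A minimal sketch of one way such a variable could be expanded into a regex alternation before compiling; the variable values in the usage are made-up placeholders, and the real variable names, their values and the expansion strategy are whatever ./src/regex_manager/mod.rs defines. `expand_vars` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Replace every `${NAME}` in `pattern` with a non-capturing alternation of
/// that variable's values. Values are inserted as-is (not regex-escaped).
/// Unknown variables and an unterminated `${` are left untouched.
fn expand_vars(pattern: &str, vars: &HashMap<&str, Vec<&str>>) -> String {
    let mut out = String::new();
    let mut rest = pattern;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                match vars.get(name) {
                    Some(values) => {
                        out.push_str("(?:");
                        out.push_str(&values.join("|"));
                        out.push(')');
                    }
                    // Unknown variable: copy "${NAME}" verbatim.
                    None => out.push_str(&rest[start..start + 2 + end + 1]),
                }
                rest = &after[end + 1..];
            }
            None => {
                // No closing brace: copy the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

With a placeholder mapping of DEBIAN_CURRENT to ["bookworm", "trixie"], `^debian/dists/${DEBIAN_CURRENT}` becomes `^debian/dists/(?:bookworm|trixie)`, so a single --include keeps tracking releases as the variable is updated.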

Exclusion and inclusion

There is a breaking change since 20240902: user regexes containing ^ and $ are affected.

See ./docs/exclusion.md.
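The rule stated in the --include help text (an included path is synced even if an exclude regex matches it) reduces to a small per-path decision. A minimal sketch with one loud simplification: to stay within the standard library, each "regex" here is only an anchored prefix (the `^prefix` case), and `Prefix`/`is_skipped` are hypothetical names. Real patterns are full regexes, and directory traversal plus the v2 semantics are specified in ./docs/exclusion.md, not here.

```rust
/// Simplified stand-in for an anchored regex like `^debian` (prefix match only).
struct Prefix(&'static str);

impl Prefix {
    fn is_match(&self, path: &str) -> bool {
        path.starts_with(self.0)
    }
}

/// A relative path is skipped only if some exclude matches it
/// and no include matches it (include wins over exclude).
fn is_skipped(path: &str, excludes: &[Prefix], includes: &[Prefix]) -> bool {
    let excluded = excludes.iter().any(|r| r.is_match(path));
    let included = includes.iter().any(|r| r.is_match(path));
    excluded && !included
}
```

This mirrors the WineHQ example: `^debian` excludes the whole tree while an include for one release's dists/ directory brings those files back.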

Deduplication

Tsumugu relies on local file size and mtime to decide whether a file should be downloaded. Some file-level deduplicators, like jdupes, ignore file mtime when deduplicating with hard links. Since hard-linked copies share a single inode, and therefore a single mtime, this can be an issue for some repos: the affected files are redownloaded on every sync because they no longer have the correct mtime locally.
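A minimal sketch of a size-plus-mtime freshness check of this kind, with a switch corresponding to --compare-size-only. The exact comparison rule here (sizes equal and mtimes equal at whole-second precision) is an assumption, and `needs_download` is a hypothetical helper rather than tsumugu's code.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Decide whether a remote file must be (re)downloaded, given the size and
/// mtime reported by the listing or HTTP headers. Missing local files always
/// download. With `size_only` (cf. --compare-size-only) the mtime is ignored.
fn needs_download(
    local: &Path,
    remote_size: u64,
    remote_mtime: SystemTime,
    size_only: bool,
) -> io::Result<bool> {
    let meta = match fs::metadata(local) {
        Ok(m) => m,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(true),
        Err(e) => return Err(e),
    };
    if meta.len() != remote_size {
        return Ok(true);
    }
    if size_only {
        return Ok(false);
    }
    // Whole-second comparison. A hard link left by a deduplicator carries the
    // mtime of whichever copy it kept, so this check fails and triggers a download.
    Ok(secs(meta.modified()?) != secs(remote_mtime))
}

fn secs(t: SystemTime) -> u64 {
    t.duration_since(UNIX_EPOCH).unwrap_or(Duration::ZERO).as_secs()
}
```

Setting `size_only` sidesteps the hard-link problem at the cost of missing upstream changes that keep the file size the same.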

Workarounds:

  • Set --compare-size-only.
  • Use filesystem-level/block-level deduplication like zfs dedup.
  • Use another file-level deduplicator which considers mtime (though I don't know which would do this).

Also, if you are sure that a directory is identical to another, you can manually create a symlink for it. Tsumugu ignores symlinks during syncing.
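A minimal sketch of a local directory walk that ignores symlinks in this way, so a manually created symlink to a duplicate directory is never descended into. `walk_skipping_symlinks` is a hypothetical helper; it relies on `DirEntry::file_type`, which does not follow symlinks.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect regular files under `dir`, skipping symlinks entirely.
fn walk_skipping_symlinks(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // DirEntry::file_type does not traverse symlinks, so a symlinked
        // directory reports is_symlink() rather than is_dir().
        let ft = entry.file_type()?;
        if ft.is_symlink() {
            continue;
        } else if ft.is_dir() {
            walk_skipping_symlinks(&entry.path(), out)?;
        } else if ft.is_file() {
            out.push(entry.path());
        }
    }
    Ok(())
}
```

Using `fs::metadata` instead would follow the link and treat the duplicate directory as real content, which is exactly what the symlink is meant to avoid.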

Acknowledgements

Special thanks to NJU Mirror for extensive testing and bug reporting.

Naming

The name "tsumugu", and the current branch name "pudding", are derived from the manga A Drift Girl and a Noble Moon.

And...

Tsumugu in the appearance of a very simplified version of Hitori (obviously, I am not very good at drawing).

The old (2020), unfinished Go version is named "traverse" and lives on the main-old branch.
