Adam Mateusz Brożyński

How to copy lots of data fast?

The best way to copy a lot of data fast on Linux is to use tar. It's much faster than cp or copying in any file manager. Here's a simple command with a progress bar (pv needs to be installed) that has to be executed inside the folder you want to copy recursively:

$ tar cf - . | pv | (cd /destination/path; tar xf -)
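
The same pipeline also works without first cd-ing into the source, using tar's -C option; a minimal sketch, assuming a source at /source/dir and an already existing /destination/path:

$ tar -C /source/dir -cf - . | pv | tar -C /destination/path -xf -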

Top comments (3)
Paweł bbkr Pabian:

This is not universal advice.

Tar (for the younger audience: Tape ARchiving) is a way to present a set of files as a single continuous file that can be streamed to/from magnetic tape storage. So for this method to be advantageous, the additional CPU time on both sides must be lower than the cost of doing full synchronous disk operations sequentially for each file. So tar | xxx | tar:

  • Will be faster in network transfers, much faster than scp, which needs to process each file sequentially and wait for network confirmation from the other side (see the sketch after this list).
  • Will be slightly slower for filesystems with very fast SSDs.
  • Will be much slower for filesystems with Copy on Write, which do not copy the actual file until some change is made to the copied version.
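
A minimal sketch of the network-transfer case, assuming ssh access to a hypothetical host named remote and an existing /destination/path there:

$ tar cf - . | pv | ssh remote "cd /destination/path && tar xf -"

Streaming one continuous tar archive through the ssh pipe avoids the per-file round trip that makes scp slow on many small files.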

I just did a quick benchmark on a 20G repository with 3000 files on an APFS filesystem on a PCIe NVMe 3.0 disk and:

  • tar | tar took 20s
  • regular cp -r managed to finish in 12s
  • cp -r -c (Copy on Write; see the note below) finished in 1.3s
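
The Copy-on-Write flag differs by platform; a sketch, assuming macOS cp on APFS and GNU cp on a reflink-capable Linux filesystem such as Btrfs or XFS:

# macOS (APFS): clone files instead of copying their data
$ cp -c -r /source/dir /destination/path
# Linux (GNU coreutils, Btrfs/XFS): reflink where the filesystem supports it
$ cp -r --reflink=auto /source/dir /destination/path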
Adam Mateusz Brożyński:

I had to copy files from HDD to SSD, SSD to SSD, SSD to NVMe 3.0, and NVMe 3.0 to NVMe 3.0, with data folders ranging from 80GB to 2TB. tar was always dramatically faster. What tar did in minutes took cp hours. So from my perspective it is universal advice if someone has lots of data (which for me is more than 100GB) and there are a lot of small files in it (cp fails completely in this case). I guess if someone is just copying a few big files, cp could work, but that's not the case in most backup situations.

Paweł bbkr Pabian:

Hours? I think something was really off with your system configuration (journaling issues? incomplete RAID array? not enough PCIe lanes? kernel setup?). That is extremely slow even for a SATA 3.0 SSD, which should crunch a 2TB folder with a moderate number of files in it in ~1h using pure cp.

Anyway, tar is helpful when the full, synchronous round trip of copying a single file is costly. But for those cases I prefer a find | parallel combo (sketched after the list below) because:

  • it performs nearly identically to tar
  • it can take advantage of CoW if the copy is on the same disk
  • the actual method of copying can easily be swapped: cp, scp, rsync, etc.
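
A minimal sketch of that combo, assuming GNU parallel and GNU cp, with /destination/path as a hypothetical, already existing target:

# copy every regular file in parallel, recreating the relative directory structure
$ find . -type f -print0 | parallel -0 cp --parents {} /destination/path

Swapping cp for rsync or scp inside the parallel command changes the transport without touching the rest of the pipeline.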
