Data validation and change detection

Cloud Storage encourages you to validate the data you transfer to and from your buckets. This page describes best practices for performing validations using either CRC32C or MD5 checksums and describes the change detection algorithm used by the gcloud storage rsync command.

Protect against data corruption by using hashes

There are a variety of ways that data can be corrupted while uploading to or downloading from the Cloud:

  • Memory errors on client or server computers, or routers along the path
  • Software bugs (e.g., in a library that customers use)
  • Changes to the source file when an upload occurs over an extended period of time

Cloud Storage supports two types of hashes you can use to check the integrity of your data: CRC32C and MD5. CRC32C is the recommended validation method for performing integrity checks. Customers that prefer MD5 can use that hash, but MD5 hashes are not supported for all objects.
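For illustration, the following minimal sketch computes both hash types for a local file in Python. It assumes the google-crc32c package is installed; Cloud Storage reports both values as base64-encoded strings in object metadata, so the digests are encoded the same way here.

```python
import base64
import hashlib

import google_crc32c


def local_hashes(path):
    """Return (crc32c, md5) for a local file, base64-encoded the way Cloud Storage reports them."""
    crc = google_crc32c.Checksum()
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Stream the file in 1 MiB chunks so large files don't need to fit in memory.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            crc.update(chunk)
            md5.update(chunk)
    return (
        base64.b64encode(crc.digest()).decode("utf-8"),
        base64.b64encode(md5.digest()).decode("utf-8"),
    )
```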

Client-side validation

You can perform an integrity check for downloaded data by hashing the data on the fly and comparing your results to the server-supplied checksums. Note, however, that the server-supplied checksums are based on the complete object as it's stored in Cloud Storage, which means the following types of downloads can't be validated using the server-supplied hashes:

  • Downloads which undergo decompressive transcoding, because the server-supplied checksum represents the object in its compressed state, while the served data has compression removed and consequently a different hash value.

  • A response that contains only a portion of the object data, which occurs when making a range request. Cloud Storage recommends using range requests only for restarting the download of a full object after the last received offset, because in that case you can compute and validate the checksum when the full download completes.

In cases where you can validate the download using checksums, you should discard downloaded data with incorrect hash values and use the recommended retry logic to retry the request.
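The sketch below illustrates this pattern with the Python client library, assuming the google-cloud-storage and google-crc32c packages and placeholder bucket and object names. It compares a locally computed CRC32C against the server-supplied value and retries on a mismatch; the client library may already perform similar validation on some download paths, but the comparison is spelled out here for clarity.

```python
import base64

import google_crc32c
from google.cloud import storage


def download_with_validation(bucket_name, object_name, attempts=3):
    """Download an object and verify it against the server-supplied CRC32C."""
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    for _ in range(attempts):
        blob.reload()                      # fetch metadata, including blob.crc32c
        data = blob.download_as_bytes()
        crc = google_crc32c.Checksum()
        crc.update(data)
        computed = base64.b64encode(crc.digest()).decode("utf-8")
        if computed == blob.crc32c:
            return data
        # Hash mismatch: discard the downloaded data and retry the request.
    raise RuntimeError(f"CRC32C mismatch for {object_name} after {attempts} attempts")
```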

Server-side validation

Cloud Storage performs server-side validation in the following cases:

  • When you perform a copy or rewrite request within Cloud Storage.

    • The server-side validation is performed automatically by Cloud Storage based on a non-editable checksum stored with the source object.
  • When you supply an object's expected MD5 or CRC32C hash in an upload request. Cloud Storage only creates the object if the hash you provide matches the value Cloud Storage calculates. If it does not match, the request is rejected with a BadRequestException: 400 error.

    • In applicable JSON API requests, you supply checksums as part of the objects resource.

    • In applicable XML API requests, you supply checksums using the x-goog-hash header. The XML API also accepts the standard HTTP Content-MD5 header (see the specification).

    • Alternatively, you can perform client-side validation of your uploads by issuing a request for the uploaded object's metadata, comparing the uploaded object's hash value to the expected value, and deleting the object in case of a mismatch. This method is useful if the object's MD5 or CRC32C hash isn't known at the start of the upload. A sketch of this approach follows this list.
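The following is a minimal sketch of that post-upload, client-side check, assuming the google-cloud-storage and google-crc32c Python packages and placeholder bucket, object, and file names: the object's CRC32C is computed locally, compared to the uploaded object's metadata, and the object is deleted on a mismatch.

```python
import base64

import google_crc32c
from google.cloud import storage


def upload_then_validate(bucket_name, object_name, path):
    """Upload a file, then verify the uploaded object's CRC32C and delete it on mismatch."""
    # Compute the expected CRC32C while reading the local file.
    crc = google_crc32c.Checksum()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            crc.update(chunk)
    expected = base64.b64encode(crc.digest()).decode("utf-8")

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(path)
    blob.reload()                          # fetch the server-computed checksum
    if blob.crc32c != expected:
        blob.delete()                      # discard the corrupted upload
        raise RuntimeError(f"CRC32C mismatch for {object_name}; object deleted")
```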

In the case of parallel composite uploads, users should perform an integrity check for each component upload and then use preconditions with their compose requests to protect against race conditions. Compose requests don't perform server-side validation, so users who want an end-to-end integrity check should perform client-side validation on the new composite object.
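As a sketch of those two steps, assuming the google-cloud-storage and google-crc32c Python packages, placeholder names, and that the component objects were split out of a single local file: an if_generation_match=0 precondition makes the compose request fail if the destination object already exists, and the composite's server-reported CRC32C is then compared against the CRC32C of the original local file.

```python
import base64

import google_crc32c
from google.cloud import storage


def compose_and_validate(bucket_name, component_names, composite_name, local_path):
    """Compose already-uploaded components with a precondition, then validate the result."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    composite = bucket.blob(composite_name)

    # Precondition: only create the composite if it doesn't already exist.
    composite.compose(
        [bucket.blob(name) for name in component_names],
        if_generation_match=0,
    )

    # Client-side end-to-end check: the composite's CRC32C should equal the
    # CRC32C of the original local file the components were split from.
    crc = google_crc32c.Checksum()
    with open(local_path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            crc.update(chunk)
    composite.reload()
    if composite.crc32c != base64.b64encode(crc.digest()).decode("utf-8"):
        raise RuntimeError(f"CRC32C mismatch for composite {composite_name}")
```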

Google Cloud CLI validation

For the Google Cloud CLI, data copied to or from a Cloud Storage bucket is validated. This applies to the cp, mv, and rsync commands. If the checksum of the source data does not match the checksum of the destination data, the gcloud CLI deletes the invalid copy and prints a warning message. This very rarely happens. If it does, you should retry the operation.

This automatic validation occurs after the object itself is finalized, so invalid objects are visible for 1-3 seconds before they're identified and deleted. Additionally, there is a chance that the gcloud CLI could be interrupted after the upload completes but before it performs the validation, leaving the invalid object in place. These issues can be avoided when uploading single files to Cloud Storage by using server-side validation, which occurs when using the --content-md5 flag.

Note: The --content-md5 flag is ignored for objects that don't have an MD5 hash.

Change detection for rsync

The gcloud storage rsync command can also use MD5 or CRC32C checksums to determine if there is a difference between the version of an object found at the source and the version found at the destination. The command compares checksums in the following cases:

  • The source and destination are both cloud buckets and the object has an MD5 or CRC32C checksum in both buckets.

  • The object does not have a file modification time (mtime) in either the source or destination.

In cases where the relevant object has an mtime value in both the source and destination, such as when the source and destination are file systems, the rsync command compares the objects' size and mtime instead of using checksums. Similarly, if the source is a cloud bucket and the destination is a local file system, the rsync command uses the time created for the source object as a substitute for mtime, and the command does not use checksums.

If neither mtime nor checksums are available, rsync only compares file sizes when determining if there is a change between the source version of an object and the destination version. For example, neither mtime nor checksums are available when comparing composite objects with objects at a cloud provider that doesn't support CRC32C, because composite objects don't have MD5 checksums.
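The decision order described above can be summarized with the following illustrative sketch. This is not the gcloud CLI's actual implementation, and the ObjectSummary record is a hypothetical stand-in for the metadata available on each side of the sync.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectSummary:
    size: int
    mtime: Optional[float] = None      # POSIX modification time, if known
    checksum: Optional[str] = None     # CRC32C or MD5, if known


def needs_copy(src: ObjectSummary, dst: ObjectSummary) -> bool:
    """Return True if rsync-style change detection would treat src and dst as different."""
    if src.mtime is not None and dst.mtime is not None:
        # Both sides have mtime (e.g., file system to file system):
        # compare size and mtime instead of checksums.
        return src.size != dst.size or src.mtime != dst.mtime
    if src.checksum is not None and dst.checksum is not None:
        # Both sides have a comparable checksum (e.g., bucket to bucket).
        return src.checksum != dst.checksum
    # Neither mtime nor checksums are available: fall back to size only.
    return src.size != dst.size
```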
