Data corruption

From Wikipedia, the free encyclopedia

Errors in computer data that introduce unintended changes to the original data

"Corrupted" redirects here. For the Japanese metal band, see Corrupted (band).

Photo data corruption; in this case, a result of a failed data recovery from a hard disk drive

Data corruption is the undesired alteration of computer data that occurs during writing, reading, storage, transmission, or processing. Computer systems use a number of measures to provide end-to-end data integrity, or lack of errors.

In general, when data corruption occurs, a file containing that data will produce unexpected results when accessed by the system or the related application. Results can range from a minor loss of data to a system crash. For example, if a document file is corrupted, a person who tries to open it with a document editor may get an error message; the file might not open at all, or it might open with some of the data corrupted (or, in some cases, completely corrupted, leaving the document unintelligible). The adjacent image is a corrupted image file in which most of the information has been lost.

Some types of malware may intentionally corrupt files as part of their payloads, usually by overwriting them with inoperative or garbage code, while a non-malicious virus may also unintentionally corrupt files when it accesses them. If a virus or trojan with this payload method manages to alter files critical to the running of the computer's operating system software or physical hardware, the entire system may be rendered unusable.

Some programs can offer to repair a corrupted file automatically once the error has been encountered, while others cannot. Whether repair is possible depends on the extent of the corruption and on the application's built-in error handling. Corruption has various causes.

Overview

Screen output of an Atari 2600 with corrupted RAM.

A video that has been corrupted, displaying bright flashing light and color

There are two types of data corruption associated with computer systems: undetected and detected. Undetected data corruption, also known as silent data corruption, results in the most dangerous errors as there is no indication that the data is incorrect. Detected data corruption may be permanent with the loss of data, or may be temporary when some part of the system is able to detect and correct the error; there is no data corruption in the latter case.

Data corruption can occur at any level in a system, from the host to the storage medium. Modern systems attempt to detect corruption at many layers and then recover from or correct it. This is almost always successful, but very rarely the information arriving in the system's memory is corrupted and can cause unpredictable results.

Data corruption during transmission has a variety of causes. Interruption of data transmission causes information loss. Environmental conditions can interfere with data transmission, especially when dealing with wireless transmission methods. Heavy clouds can block satellite transmissions. Wireless networks are susceptible to interference from devices such as microwave ovens.

Hardware and software failure are the two main causes of data loss. Background radiation, head crashes, and aging or wear of the storage device fall into the former category, while software failure typically occurs due to bugs in the code. Cosmic rays cause most soft errors in DRAM.^[1]

Silent

Some errors go unnoticed, without being detected by the disk firmware or the host operating system; these errors are known as silent data corruption.^[2]

There are many error sources beyond the disk storage subsystem itself. For instance, cables might be slightly loose, the power supply might be unreliable,^[3] external vibrations such as a loud sound might disturb the drive,^[4] the network might introduce undetected corruption,^[5] and cosmic radiation and many other causes can produce soft memory errors. In 39,000 storage systems that were analyzed, firmware bugs accounted for 5–10% of storage failures.^[6] The error rates as observed by a CERN study on silent corruption are far higher than one in every 10¹⁶ bits.^[7] Amazon Web Services acknowledged that data corruption was the cause of a widespread outage of their Amazon S3 storage network in 2008.^[8] In 2021, faulty processor cores were identified as an additional cause in publications by Google and Facebook; cores were found to be faulty at a rate of several in thousands of cores.^[9]^[10]

One problem is that hard disk drive capacities have increased substantially, but their error rates remain unchanged. The data corruption rate has always been roughly constant over time, meaning that modern disks are not much safer than old disks. In old disks the probability of data corruption was very small because they stored tiny amounts of data. In modern disks the probability is much larger because they store much more data while being no safer. As a result, silent data corruption was not a serious concern while storage devices remained relatively small and slow. With the advent of larger drives and very fast RAID setups, users are capable of transferring 10¹⁶ bits in a reasonably short time, thus easily reaching the data corruption thresholds.^[11]
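
The scaling argument above can be made concrete with a little arithmetic. The following is a minimal sketch, not a measurement: the bit error rate of one in 10¹⁶, the two drive sizes, and the 1 GB/s throughput are all assumptions chosen only for illustration, and errors are modelled as independent per bit.

```python
import math

# Assumption for illustration only: one corrupted bit per 10**16 bits read.
ASSUMED_BIT_ERROR_RATE = 1e-16


def p_at_least_one_error(capacity_bytes, bit_error_rate=ASSUMED_BIT_ERROR_RATE):
    """Probability of one or more corrupted bits when reading the whole device once."""
    expected_errors = capacity_bytes * 8 * bit_error_rate
    # Poisson model: P(at least one) = 1 - exp(-expected).
    # expm1 keeps precision when the expected count is tiny.
    return -math.expm1(-expected_errors)


def seconds_to_transfer(bits, bytes_per_second):
    """Time needed to move a given number of bits at a sustained throughput."""
    return bits / 8 / bytes_per_second


small_disk = p_at_least_one_error(1e9)         # hypothetical 1 GB drive
large_disk = p_at_least_one_error(10e12)       # hypothetical 10 TB drive
days = seconds_to_transfer(1e16, 1e9) / 86400  # 10**16 bits at an assumed 1 GB/s

print(f"1 GB full read:  {small_disk:.1e}")
print(f"10 TB full read: {large_disk:.1e}")
print(f"10**16 bits at 1 GB/s: {days:.1f} days")
```

With the per-bit rate held fixed, the chance of hitting at least one bad bit grows roughly in proportion to the amount of data read, which is the point the paragraph makes about larger drives.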

As an example, ZFS creator Jeff Bonwick stated that the fast database at Greenplum, which is a database software company specializing in large-scale data warehousing and analytics, faces silent corruption every 15 minutes.^[12] As another example, a real-life study performed by NetApp on more than 1.5 million HDDs over 41 months found more than 400,000 silent data corruptions, out of which more than 30,000 were not detected by the hardware RAID controller (only detected during scrubbing).^[13] Another study, performed by CERN over six months and involving about 97 petabytes of data, found that about 128 megabytes of data became permanently corrupted silently somewhere in the pathway from network to disk.^[14]

Silent data corruption may result in cascading failures, in which the system may run for a period of time with an undetected initial error causing increasingly more problems until it is ultimately detected.^[15] For example, a failure affecting file system metadata can result in multiple files being partially damaged or made completely inaccessible as the file system is used in its corrupted state.

Countermeasures

See also: Error detection and correction

When data corruption behaves as a Poisson process, where each bit of data has an independently low probability of being changed, data corruption can generally be detected by the use of checksums, and can often be corrected by the use of error correcting codes (ECC).
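
As an illustration of the difference between detecting and correcting an error, here is a minimal sketch. It uses CRC-32 from Python's standard `zlib` module as the checksum and a textbook Hamming(7,4) code as the error correcting code; both are choices made for this example, and the helper names are hypothetical. Real storage and memory hardware uses stronger codes.

```python
import zlib


def flip_bit(data, bit_index):
    """Return a copy of data with a single bit inverted (simulated corruption)."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def hamming74_encode(data_bits):
    """Encode 4 data bits as the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 0 means no error was detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


# Detection: the checksum stored with the data no longer matches after a bit flip.
payload = b"account balance: 1000"
stored_crc = zlib.crc32(payload)
damaged = flip_bit(payload, 3)
detected = zlib.crc32(damaged) != stored_crc

# Correction: the Hamming code locates the flipped bit and restores the data.
original_bits = [1, 0, 1, 1]
received = hamming74_encode(original_bits)
received[4] ^= 1  # simulate a single-bit error in transit or at rest
recovered_bits = hamming74_decode(received)
```

The checksum only reports that something changed, so recovery needs another copy of the data. The Hamming code carries enough redundancy to repair a single flipped bit by itself, but two flipped bits in one codeword would be miscorrected.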

If an uncorrectable data corruption is detected, procedures such as automatic retransmission or restoration from backups can be applied. Certain levels of RAID disk arrays have the ability to store and evaluate parity bits for data across a set of hard disks and can reconstruct corrupted data upon the failure of a single or multiple disks, depending on the level of RAID implemented. Some CPU architectures employ various transparent checks to detect and mitigate data corruption in CPU caches, CPU buffers and instruction pipelines; an example is Intel Instruction Replay technology, which is available on Intel Itanium processors.^[16]
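
The parity idea can be sketched with bytewise XOR, the scheme commonly associated with single-parity RAID levels. This is a simplified illustration with hypothetical function names and equal-sized in-memory blocks standing in for disks; it is not the layout used by any particular RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_missing(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three hypothetical data "disks" and one parity "disk".
disks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(disks)

# Simulate losing the second disk, then rebuild its contents.
survivors = [disks[0], disks[2]]
rebuilt = reconstruct_missing(survivors, parity)
```

Because XOR is its own inverse, combining the parity with every surviving block leaves exactly the missing block. A single parity block tolerates only one lost block per stripe; surviving several simultaneous failures requires additional, independent redundancy.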

Many errors are detected and corrected by the hard disk drives using the ECC^[17] which is stored on disk for each sector. If the disk drive detects multiple read errors on a sector it may make a copy of the failing sector on another part of the disk, by remapping the failed sector of the disk to a spare sector without the involvement of the operating system (though this may be delayed until the next write to the sector). This "silent correction" can be monitored using S.M.A.R.T., and tools are available for most operating systems to automatically check the disk drive for impending failures by watching for deteriorating SMART parameters.

Some file systems, such as Btrfs, HAMMER, ReFS, and ZFS, use internal data and metadata checksumming to detect silent data corruption. In addition, if a corruption is detected and the file system uses integrated RAID mechanisms that provide data redundancy, such file systems can also reconstruct corrupted data in a transparent way.^[18] This approach allows improved data integrity protection covering the entire data paths, which is usually known as end-to-end data protection, compared with other data integrity approaches that do not span different layers in the storage stack and allow data corruption to occur while the data passes boundaries between the different layers.^[19]
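
The combination of checksumming and redundancy can be illustrated with a toy two-way mirror. The `MirroredStore` class below is hypothetical and is not how Btrfs, HAMMER, ReFS, or ZFS are implemented; it only sketches the general idea of verifying each copy against a checksum kept apart from the data and repairing a bad copy from a good one.

```python
import hashlib


class MirroredStore:
    """Toy key-value store with two copies of each block and a separate checksum."""

    def __init__(self):
        self.copies = [{}, {}]  # two simulated mirror devices
        self.checksums = {}     # checksum kept apart from the data it protects

    def write(self, key, data):
        self.checksums[key] = hashlib.sha256(data).digest()
        for copy in self.copies:
            copy[key] = data

    def read(self, key):
        expected = self.checksums[key]
        # Find the first copy whose contents still match the stored checksum.
        good = next(
            (copy[key] for copy in self.copies
             if hashlib.sha256(copy[key]).digest() == expected),
            None,
        )
        if good is None:
            raise IOError(f"all copies of {key!r} are corrupted")
        # Self-heal: overwrite any copy that no longer matches.
        for copy in self.copies:
            if copy[key] != good:
                copy[key] = good
        return good


store = MirroredStore()
store.write("block-1", b"hello")
store.copies[0]["block-1"] = b"hellp"  # simulate silent corruption on one mirror
value = store.read("block-1")
```

The read succeeds and returns the intact data even though the first mirror was damaged, and the damaged copy is rewritten as a side effect. If every copy fails verification the error is at least reported rather than passed on silently.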

Data scrubbing is another method to reduce the likelihood of data corruption, as disk errors are caught and recovered from before multiple errors accumulate and overwhelm the number of parity bits. Instead of parity being checked on each read, the parity is checked during a regular scan of the disk, often done as a low priority background process. The "data scrubbing" operation activates a parity check. If a user simply runs a normal program that reads data from the disk, then the parity would not be checked unless parity-check-on-read was both supported and enabled on the disk subsystem.
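
A scrub pass can be sketched as a loop that visits every stripe, whether or not any program has asked for its data. The sketch below makes several assumptions that go beyond the text: stripes are in-memory dictionaries, each block has a stored CRC-32 to identify which block is bad, and XOR parity is used to rebuild it. The function names are hypothetical.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(blocks):
    """Build a stripe: data blocks, one XOR parity block, and per-block CRCs."""
    return {
        "blocks": [bytearray(b) for b in blocks],
        "parity": xor_blocks(blocks),
        "crcs": [zlib.crc32(b) for b in blocks],
    }


def scrub(stripes):
    """Verify every block in every stripe; repair single bad blocks from parity."""
    repaired = 0
    for stripe in stripes:
        blocks = stripe["blocks"]
        bad = [i for i, b in enumerate(blocks) if zlib.crc32(b) != stripe["crcs"][i]]
        if len(bad) == 1:
            i = bad[0]
            others = [bytes(b) for j, b in enumerate(blocks) if j != i]
            blocks[i] = bytearray(xor_blocks(others + [stripe["parity"]]))
            repaired += 1
        elif len(bad) > 1:
            # More errors than the parity can cover: scrubbing came too late.
            raise RuntimeError("multiple corrupted blocks in one stripe")
    return repaired


stripes = [make_stripe([b"AAAA", b"BBBB", b"CCCC"])]
stripes[0]["blocks"][1][0] ^= 0xFF  # simulate a latent error that nobody has read yet
first_pass = scrub(stripes)
second_pass = scrub(stripes)
```

Running the scan regularly matters because a single bad block per stripe is repairable here, while a second error in the same stripe before the next scrub is not.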

If appropriate mechanisms are employed to detect and remedy data corruption, data integrity can be maintained. This is particularly important in commercial applications (e.g. banking), where an undetected error could either corrupt a database index or change data to drastically affect an account balance, and in the use of encrypted or compressed data, where a small error can make an extensive dataset unusable.^[7]

References

^ Scientific American (2008-07-21). "Solar Storms: Fast Facts". Nature Publishing Group. Archived from the original on 2010-12-26. Retrieved 2009-12-08.
^ "Silent Data Corruption". Google Inc. 2023. Retrieved January 30, 2023. "Silent Data Corruption (SDC), sometimes referred to as Silent Data Error (SDE), is an industry-wide issue impacting not only long-protected memory, storage, and networking, but also computer CPUs."
^ Eric Lowe (16 November 2005). "ZFS saves the day(-ta)!". Oracle – Core Dumps of a Kernel Hacker's Brain – Eric Lowe's Blog. Oracle. Archived from the original (Blog) on 5 February 2012. Retrieved 9 June 2012.
^ bcantrill (31 December 2008). "Shouting in the Datacenter" (Video file). YouTube. Archived from the original on 3 July 2012. Retrieved 9 June 2012.
^ jforonda (31 January 2007). "Faulty FC port meets ZFS" (Blog). Blogger – Outside the Box. Archived from the original on 26 April 2012. Retrieved 9 June 2012.
^ "Are Disks the Dominant Contributor for Storage Failures? A Comprehensive Study of Storage Subsystem Failure Characteristics" (PDF). USENIX. Archived (PDF) from the original on 2022-01-25. Retrieved 2014-01-18.
^ Bernd Panzer-Steindel (8 April 2007). "Draft 1.3". Data integrity. CERN. Archived from the original on 27 October 2012. Retrieved 9 June 2012.
^ "AWS Service Availability". status.aws.amazon.com. Archived from the original on December 25, 2008. Retrieved 11 July 2025.
^ Hochschild, Peter H.; Turner, Paul Jack; Mogul, Jeffrey C.; Govindaraju, Rama Krishna; Ranganathan, Parthasarathy; Culler, David E.; Vahdat, Amin (2021). "Cores that don't count" (PDF). Proceedings of the Workshop on Hot Topics in Operating Systems. pp. 9–16. doi:10.1145/3458336.3465297. ISBN 9781450384384. S2CID 235311320. Archived (PDF) from the original on 2021-06-03. Retrieved 2021-06-02.
^ HotOS 2021: Cores That Don't Count (Fun Hardware), 27 May 2021, archived from the original on 2021-12-22, retrieved 2021-06-02.
^ "Silent data corruption in disk arrays: A solution". NEC. 2009. Archived from the original (PDF) on 29 October 2013. Retrieved 14 December 2020.
^ "A Conversation with Jeff Bonwick and Bill Moore". Association for Computing Machinery. November 15, 2007. Archived from the original on 16 July 2011. Retrieved 14 December 2020.
^ David S. H. Rosenthal (October 1, 2010). "Keeping Bits Safe: How Hard Can It Be?". ACM Queue. Archived from the original on December 17, 2013. Retrieved 2014-01-02.; Bairavasundaram, L., Goodson, G., Schroeder, B., Arpaci-Dusseau, A. C., Arpaci-Dusseau, R. H. 2008. An analysis of data corruption in the storage stack. In Proceedings of 6th Usenix Conference on File and Storage Technologies.
^ Kelemen, P. Silent corruptions (PDF). 8th Annual Workshop on Linux Clusters for Super Computing.
^ David Fiala; Frank Mueller; Christian Engelmann; Rolf Riesen; Kurt Ferreira; Ron Brightwell (November 2012). "Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing" (PDF). fiala.me. IEEE. Archived (PDF) from the original on 2014-11-07. Retrieved 2015-01-26.
^ Steve Bostian (2012). "Rachet Up Reliability for Mission-Critical Applications: Intel Instruction Replay Technology" (PDF). Intel. Archived (PDF) from the original on 2016-02-02. Retrieved 2016-01-27.
^ "Read Error Severities and Error Management Logic". Archived from the original on 7 April 2012. Retrieved 4 April 2012.
^ Margaret Bierman; Lenz Grimmer (August 2012). "How I Use the Advanced Capabilities of Btrfs". Oracle Corporation. Archived from the original on 2014-01-02. Retrieved 2014-01-02.
^ Yupu Zhang; Abhishek Rajimwale; Andrea Arpaci-Dusseau; Remzi H. Arpaci-Dusseau (2010). "End-to-end data integrity for file systems: a ZFS case study" (PDF). USENIX Conference on File and Storage Technologies. CiteSeerX 10.1.1.154.3979. S2CID 5722163. Wikidata Q111972797. Retrieved 2014-08-12.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Data_corruption&oldid=1321149983"