US10157199B2 - Data storage integrity validation - Google Patents

Data storage integrity validation

Info

Publication number
US10157199B2
US10157199B2 (US 10,157,199 B2); application US15/286,473 (US201615286473A)
Authority
US
United States
Prior art keywords
data
storage
job
partition size
verification value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US15/286,473
Other versions
US20170024428A1 (en)
Inventor
Kestutis Patiejunas
Colin L. Lazier
Mark C. Seigle
Bryan J. Donlan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Amazon Technologies Inc
Original Assignee
Amazon Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/569,994 (external priority patent US9213709B2)
Priority claimed from US13/569,665 (external priority patent US8959067B1)
Priority claimed from US13/569,984 (external priority patent US9779035B1)
Priority claimed from US13/570,057 (external priority patent US10120579B1)
Priority claimed from US13/570,074 (external priority patent US9225675B2)
Priority claimed from US13/570,029 (external priority patent US9092441B1)
Priority claimed from US13/569,714 (external priority patent US9830111B1)
Priority claimed from US13/569,591 (external priority patent US9354683B2)
Priority claimed from US13/570,005 (external priority patent US9250811B1)
Priority claimed from US13/570,030 (external priority patent US9652487B1)
Priority claimed from US13/570,088 (external priority patent US9767098B2)
Priority claimed from US13/570,092 (external priority patent US9563681B1)
Application filed by Amazon Technologies Inc
Priority to US15/286,473 (patent US10157199B2)
Publication of US20170024428A1
Application granted
Publication of US10157199B2
Legal status: Active (current)
Anticipated expiration

Abstract

Embodiments of the present disclosure are directed to, among other things, validating the integrity of received and/or stored data payloads. In some examples, a storage service may perform a first partitioning of a data object into first partitions based at least in part on a first operation. The storage service may also verify the data object, by utilizing a verification algorithm, to generate a first verification value. In some cases, the storage service may additionally perform a second partitioning of the data object into second partitions based at least in part on a second operation. The second partitions may be different from the first partitions. Additionally, the archival data storage service may verify the data object using the verification algorithm to generate a second verification value. Further, the storage service may determine whether the second verification value equals the first verification value.

Description

CROSS-REFERENCES TO RELATED APPLICATIONS
This application is a continuation application of U.S. patent application Ser. No. 14/456,844, filed on Aug. 11, 2014, entitled “DATA STORAGE INTEGRITY VALIDATION,” which is a continuation of U.S. patent application Ser. No. 13/570,151, filed Aug. 8, 2012, entitled “DATA STORAGE INTEGRITY VALIDATION,” the content of which are incorporated by reference herein in their entirety. This application also incorporates by reference for all purposes the full disclosure of co-pending U.S. patent application Ser. No. 13/569,984, filed Aug. 8, 2012, entitled “LOG-BASED DATA STORAGE ON SEQUENTIALLY WRITTEN MEDIA,” co-pending U.S. patent application Ser. No. 13/570,057, filed Aug. 8, 2012, entitled “DATA STORAGE MANAGEMENT FOR SEQUENTIALLY WRITTEN MEDIA,” co-pending U.S. patent application Ser. No. 13/570,005, filed Aug. 8, 2012, entitled “DATA WRITE CACHING FOR SEQUENTIALLY WRITTEN MEDIA,” co-pending U.S. patent application Ser. No. 13/570,030, filed Aug. 8, 2012, entitled “PROGRAMMABLE CHECKSUM CALCULATIONS ON DATA STORAGE DEVICES,” co-pending U.S. patent application Ser. No. 13/569,994, filed Aug. 8, 2012, entitled “ARCHIVAL DATA IDENTIFICATION,” co-pending U.S. patent application Ser. No. 13/570,029, filed Aug. 8, 2012, entitled “ARCHIVAL DATA ORGANIZATION AND MANAGEMENT,” co-pending U.S. patent application Ser. No. 13/570,092, filed Aug. 8, 2012, entitled “ARCHIVAL DATA FLOW MANAGEMENT,” co-pending U.S. patent application Ser. No. 13/570,088, filed Aug. 8, 2012, entitled “ARCHIVAL DATA STORAGE SYSTEM,” co-pending U.S. patent application Ser. No. 13/569,591, filed Aug. 8, 2012, entitled “DATA STORAGE POWER MANAGEMENT,” co-pending U.S. patent application Ser. No. 13/569,714, filed Aug. 8, 2012, entitled “DATA STORAGE SPACE MANAGEMENT,” co-pending U.S. patent application Ser. No. 13/570,074, filed Aug. 8, 2012, entitled “DATA STORAGE APPLICATION PROGRAMMING INTERFACE,” and co-pending U.S. patent application Ser. No. 13/569,665, filed Aug. 8, 2012, entitled “DATA STORAGE INVENTORY INDEXING.”
BACKGROUND
As more and more information is converted to digital form, the demand for durable and reliable data storage services is ever increasing. In particular, archive records, backup files, media files, and the like may be maintained or otherwise managed by government entities, businesses, libraries, individuals, etc. However, the storage of digital information, especially for long periods of time, has presented some challenges. In some cases, the cost of long-term data storage may be prohibitive for many because of the potentially massive amounts of data to be stored, particularly when considering archival or backup data. Additionally, durability and reliability issues may be difficult to solve for such large amounts of data and/or for data that is expected to be stored for relatively long periods of time. Magnetic tapes have traditionally been used in data backup systems because of their low cost. However, tape-based storage systems have been unable to fully exploit advances in storage technology. Additionally, drive-based storage systems may have difficulty validating data integrity without preserving information about how the data was broken up during upload.
BRIEF DESCRIPTION OF THE DRAWINGS
Various embodiments in accordance with the present disclosure will be described with reference to the drawings, in which:
FIG. 1 illustrates an example flow for describing an implementation of the data integrity validation described herein, according to at least one example.
FIG. 2 illustrates an example environment in which archival data storage services may be implemented, in accordance with at least one embodiment.
FIG. 3 illustrates an interconnection network in which components of an archival data storage system may be connected, in accordance with at least one embodiment.
FIG. 4 illustrates an interconnection network in which components of an archival data storage system may be connected, in accordance with at least one embodiment.
FIG. 5 illustrates an example process for storing data, in accordance with at least one embodiment.
FIG. 6 illustrates an example process for retrieving data, in accordance with at least one embodiment.
FIG. 7 illustrates an example process for deleting data, in accordance with at least one embodiment.
FIGS. 8 and 9 illustrate block diagrams for describing at least some features of the data integrity validation described here, according to at least some examples.
FIGS. 10-12 illustrate example flow diagrams of one or more processes for implementing at least some features of the data integrity validation described herein, according to at least some examples.
FIG. 13 illustrates an environment in which various embodiments can be implemented.
DETAILED DESCRIPTION
In the following description, various embodiments will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the embodiments. However, it will also be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the embodiment being described.
Embodiments of the present disclosure are directed to, among other things, validating or otherwise verifying the integrity of data payloads or portions of data payloads intended for storage. In some examples, an archival data storage service may be configured to receive requests to store data, in some cases large amounts of data, in logical data containers or other archival storage devices for varying, and often relatively long, periods of time. In some examples, the archival data storage service may operate or otherwise utilize many different storage devices. For example, the archival storage service may include one or more disk drives that, when operating, utilize spinning magnetic media. Additionally, the archival data storage service may include one or more racks located in one or more geographic locations (e.g., potentially associated to different postal codes). Each rack may further include one or more, or even hundreds of, hard drives such as, but not limited to, disk drives or the like.
In some aspects, the archival data storage service may provide storage, access, and/or placement of one or more computing resources through a service such as, but not limited to, a web service, a remote program execution service or other network-based data management service. For example, a user, client entity, computing resource or other computing device may access, via the archival data storage service, data storage, and/or management such that access mechanisms may be implemented and/or provided to the user or computing device. In some examples, computing resource services, such as those provided by the archival data storage service, may include one or more computing resources accessible across one or more networks through user interfaces (UIs), application programming interfaces (APIs) and/or other interfaces where the one or more computing resources may be scalable and/or expandable as desired.
In some examples, the archival data storage service may enable users or client entities such as, but not limited to, third-party services that utilize the archival data storage service or other web services associated with the archival data storage service to upload data for potentially long and persistent storage. Unless otherwise contradicted explicitly or clearly by context, the term “user” is used herein to describe any entity utilizing the archival data storage service. The archival data storage service may also wish to ensure the integrity of the archived data and/or guarantee the integrity of the archived data. In order to accomplish these goals, in some examples, the archival data storage service may provide a data object identifier for identifying the data once uploaded. In some cases, the data object identifier may also include a top-level tree digest for validating the archived data even after some extended period of time. Additionally, the data object identifier may also be configured in such a way that its integrity may be validated as well.
The archival data storage service may be configured to perform operations on data to be stored, such that the data is broken into parts, portions or other demarcated groupings. Once separated into parts, the archival data storage service may perform one or more encryption functions and/or algorithms on the parts including, but not limited to, a hash function or other cryptographic or other method to produce a digest (referred to also as a hash value, a hash code, a checksum, etc.). In some examples, the data may be validated such that the archival storage service can ensure that the data being stored matches the data received. In this way, data integrity may be validated. Additionally, the sender of the data may be able to independently determine a data chunk size (i.e., the size of the parts of the payload) and not be requested to maintain or persist this information. For example, a user or client entity may request to upload a 1 gigabyte (GB) file in 2 megabyte (MB) portions (chunks). And without saving or otherwise persisting the fact that 2 MB chunks were used, the data payload may still be validated. That is, based at least in part on generating one or more digests, checksums, hash codes, etc., for the chunks and at least a digest for the payload (in some examples based at least in part on combinations of the digests), the data may later be partitioned into different sized chunks without losing the ability to be validated.
In some aspects, this may be accomplished by providing instructions or an algorithm to a user or client entity that indicates the way in which the data should be partitioned and/or hashed. Additionally, in some cases, the archival storage service may expose or otherwise provide one or more API method calls and/or a software development kit (SDK) to enable users to appropriately upload the data in chunks and to facilitate the requested order of operations and/or inclusion of appropriate checksum information. For example, the users may be requested to select a chunk size of at least 1 MB, or some other predefined size, for uploading the data. As used herein, the data being uploaded by a user or client entity and/or stored by the archival data storage service may be referred to as a payload. Additionally, the user may select or otherwise instruct the archival data storage service that the payload is to be partitioned into sizes including powers of two of 1 MB (e.g., 1 MB, 2 MB, 4 MB, 8 MB, 16 MB, etc.). In other examples, the payload may be partitioned into sizes including other multiples of a predefined size (e.g., 1 MB). The other multiples may be based at least in part on the degree of children of a tree used to represent the data. For example, if a binary tree is used, the integer multiple may include integer powers of two, if a trinary tree is used, the integer multiple may be integer powers of three, if a tree of degree four (i.e., each node may have four children) is used, the integer multiple may include integer powers of four, and so on. The user may then follow an algorithm for generating one or more hash trees (e.g., one hash tree per part when the payload is partitioned) and/or one or more hash values for each partition. Additionally, the user may generate a hash value (or digest) for each 1 MB chunk independent of the selected partition size. In some examples, these digests corresponding to the 1 MB chunks of a partition may be included in a hash tree for the partition. Further, in some examples, a root node of each partition's hash tree may be provided to the archival data storage service along with the digests for each 1 MB sub-part of the partition. In this way, each sub-part may be validated by the archival data storage service, based at least in part on the 1 MB digests, and each partition may be validated by the archival data storage service, based at least in part on the root digest for each part. Additionally, a final root hash associated with a hash of each root hash for the parts may be used by the archival data storage service, based at least in part on comparing the received final root hash and a top-level hash value determined by the archival data storage service.
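To make the chunk-size independence concrete, here is a minimal sketch (not taken from the patent; it assumes SHA-256 as the verification algorithm and the fixed 1 MB sub-part size described above). It digests every 1 MB sub-part of each uploaded part and shows that the resulting leaf digests are identical whether the payload is uploaded in 2 MB or 4 MB parts, since every allowed part size is a multiple of 1 MB and the sub-part boundaries therefore align.

    import hashlib

    MB = 1024 * 1024
    SUB_PART = 1 * MB          # fixed sub-part size assumed by the service

    def leaf_digests(payload: bytes, part_size: int) -> list:
        """Split the payload into upload parts, then digest every 1 MB sub-part."""
        digests = []
        for p in range(0, len(payload), part_size):
            part = payload[p:p + part_size]
            for s in range(0, len(part), SUB_PART):
                digests.append(hashlib.sha256(part[s:s + SUB_PART]).digest())
        return digests

    payload = bytes(7 * MB)    # toy payload; the last part may be smaller than the rest
    # Same leaf digests whether the uploader chose 2 MB or 4 MB parts:
    assert leaf_digests(payload, 2 * MB) == leaf_digests(payload, 4 * MB)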
In at least one example, generating a hash tree or other hierarchical data structure (e.g., other types of binary trees such as, but not limited to, B-trees and/or other types of data structures such as, but not limited to, arrays, records, etc.) may include concatenating digests and running a hash function on the concatenated digest. For example, in a binary hash tree, a root node may have two children represented by one hash value each. In some cases, generating the root hash value may be based at least in part on concatenating the two children hash values to form a new piece of data and further running the hash function on the new piece of data. The resulting hash value may represent the root hash. As such, each partition of a payload may have its own root hash, although its root hash may be used in calculating the top-level hash for the payload. In some examples, it may also be possible to validate the payload and/or portions of the payload without recalling or otherwise persisting the partition size chosen by the user.
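Continuing the sketch above, the combining rule described in this paragraph might be implemented as follows; the use of SHA-256 and the promotion of an unpaired digest to the next level are assumptions for illustration, since the patent does not fix either choice here.

    import hashlib

    def combine(left: bytes, right: bytes) -> bytes:
        """Concatenate two child digests and hash the concatenation."""
        return hashlib.sha256(left + right).digest()

    def root_digest(leaves: list) -> bytes:
        """Fold leaf digests up a binary hash tree to a single root digest."""
        level = list(leaves)
        while len(level) > 1:
            nxt = [combine(level[i], level[i + 1])
                   for i in range(0, len(level) - 1, 2)]
            if len(level) % 2:          # unpaired digest is promoted as-is
                nxt.append(level[-1])
            level = nxt
        return level[0]

Feeding the leaf digests from the previous sketch into root_digest yields the same top-level digest for any allowed part size, which is what would let a service validate a payload without persisting the uploader's partition choice.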
FIG. 1 depicts an illustrative flow 100 in which techniques for the validation of data integrity may be implemented. These techniques are described in more detail below in connection with FIGS. 8-12. Returning to FIG. 1, in illustrative flow 100, operations may be performed by one or more processors of an archival data storage service and/or instructions for performing the operations may be stored in one or more memories of the archival data storage service. As desired, the flow 100 may begin at 102, where the archival data storage service may receive one or more parts of a data payload 104. In some examples, the data payload 104 may include any number of parts; however, in this example two parts are shown, Part 1 and Part 2. Each of Part 1 and Part 2 may include data 106 and 108, respectively. In some cases, the size of Part 1 and Part 2 may be selected by the uploader and/or may be the same. However, in some examples, the last part of a data payload 104 may be a different size from all the other consistently sized parts (e.g., as shown here in FIG. 1, wherein Part 1 is bigger than Part 2). At 110, the flow 100 may generate sub-parts of the parts of the payload 104. In some examples, the size of the sub-parts may be predefined by the archival data storage service (e.g., 1 MB).
In some examples, the flow 100 may calculate a digest for each sub-part at 112. The respective digests may be stored as nodes 114 of a data structure such as, but not limited to, the data structure 116 generated at 118. By way of example only, the data structure 116 may include one or more sub-part digests (e.g., at nodes 114) and/or one or more part digests (e.g., Part 1 digest 120 and Part 2 digest 122). Additionally, at 124, the flow 100 may determine a root digest 126 for the root of the data structure 116. In some examples, the root digest 126 may be determined or generated based at least in part on concatenating part digests and calculating a digest for the concatenated digests. The flow 100 may end at 128, where the archival data storage service may verify that the received payload 104 matches a stored payload 130. The stored payload may, in some examples, contain data 132 determined based at least in part on combining each of the parts 106, 108, and/or sub-parts, when received. In some examples, verifying the data payload may be based at least in part on comparing the root digest 126 against a second root digest received from the uploader.
FIG. 2 illustrates an example environment 200 in which an archival data storage system may be implemented, in accordance with at least one embodiment. One or more customers 202 connect, via a network 204, to an archival data storage system 206. As implied above, unless otherwise clear from context, the term “customer” refers to the system(s) of a customer entity (such as an individual, company, or other organization) that utilizes data storage services described herein. Such systems may include datacenters, mainframes, individual computing devices, distributed computing environments and customer-accessible instances thereof, or any other system capable of communicating with the archival data storage system. In some embodiments, a customer may refer to a machine instance (e.g., with direct hardware access) or virtual instance of a distributed computing system provided by a computing resource provider that also provides the archival data storage system. In some embodiments, the archival data storage system is integral to the distributed computing system and may include or be implemented by an instance, virtual or machine, of the distributed computing system. In various embodiments, network 204 may include the Internet, a local area network (“LAN”), a wide area network (“WAN”), a cellular data network, and/or other data network.
In an embodiment, archival data storage system 206 provides a multi-tenant or multi-customer environment where each tenant or customer may store, retrieve, delete or otherwise manage data in a data storage space allocated to the customer. In some embodiments, an archival data storage system 206 comprises multiple subsystems or “planes” that each provides a particular set of services or functionalities. For example, as illustrated in FIG. 2, archival data storage system 206 includes front end 208, control plane for direct I/O 210, common control plane 212, data plane 214, and metadata plane 216. Each subsystem or plane may comprise one or more components that collectively provide the particular set of functionalities. Each component may be implemented by one or more physical and/or logical computing devices, such as computers, data storage devices, and the like. Components within each subsystem may communicate with components within the same subsystem, components in other subsystems, or external entities such as customers. At least some of such interactions are indicated by arrows in FIG. 2. In particular, the main bulk data transfer paths in and out of archival data storage system 206 are denoted by bold arrows. It will be appreciated by those of ordinary skill in the art that various embodiments may have fewer or a greater number of systems, subsystems, and/or subcomponents than are illustrated in FIG. 2. Thus, the depiction of environment 200 in FIG. 2 should be taken as being illustrative in nature and not limiting to the scope of the disclosure.
In the illustrative embodiment, front end 208 implements a group of services that provides an interface between the archival data storage system 206 and external entities, such as one or more customers 202 described herein. In various embodiments, front end 208 provides an application programming interface (“API”) to enable a user to programmatically interface with the various features, components and capabilities of the archival data storage system. Such APIs may be part of a user interface that may include graphical user interfaces (GUIs), Web-based interfaces, programmatic interfaces such as application programming interfaces (APIs) and/or sets of remote procedure calls (RPCs) corresponding to interface elements, messaging interfaces in which the interface elements correspond to messages of a communication protocol, and/or suitable combinations thereof.
Capabilities provided by archival data storage system 206 may include data storage, data retrieval, data deletion, metadata operations, configuration of various operational parameters, and the like. Metadata operations may include requests to retrieve catalogs of data stored for a particular customer, data recovery requests, job inquiries, and the like. Configuration APIs may allow customers to configure account information, audit logs, policies, notification settings, and the like. A customer may request the performance of any of the above operations by sending API requests to the archival data storage system. Similarly, the archival data storage system may provide responses to customer requests. Such requests and responses may be submitted over any suitable communications protocol, such as Hypertext Transfer Protocol (“HTTP”), File Transfer Protocol (“FTP”), and the like, in any suitable format, such as REpresentational State Transfer (“REST”), Simple Object Access Protocol (“SOAP”), and the like. The requests and responses may be encoded, for example, using Base64 encoding, encrypted with a cryptographic key, or the like.
In some embodiments, archival data storage system 206 allows customers to create one or more logical structures, such as logical data containers, in which to store one or more archival data objects. As used herein, “data object” is used broadly and does not necessarily imply any particular structure or relationship to other data. A data object may be, for instance, simply a sequence of bits. Typically, such logical data structures may be created to meet certain business requirements of the customers and are independent of the physical organization of data stored in the archival data storage system. As used herein, the term “logical data container” refers to a grouping of data objects. For example, data objects created for a specific purpose or during a specific period of time may be stored in the same logical data container. Each logical data container may include nested data containers or data objects and may be associated with a set of policies, such as size limit of the container, maximum number of data objects that may be stored in the container, expiration date, access control list, and the like. In various embodiments, logical data containers may be created, deleted or otherwise modified by customers via API requests, by a system administrator or by the data storage system, for example, based on configurable information. For example, the following HTTP PUT request may be used, in an embodiment, to create a logical data container with name “logical-container-name” associated with a customer identified by an account identifier “accountId”.
    • PUT /{accountId}/logical-container-name HTTP/1.1
In an embodiment, archivaldata storage system206 provides the APIs for customers to store data objects into logical data containers. For example, the following HTTP POST request may be used, in an illustrative embodiment, to store a data object into a given logical container. In an embodiment, the request may specify the logical path of the storage location, data length, reference to the data payload, a digital digest of the data payload and other information. In one embodiment, the APIs may allow a customer to upload multiple data objects to one or more logical data containers in one request. In another embodiment where the data object is large, the APIs may allow a customer to upload the data object in multiple parts, each with a portion of the data object.
    • POST /{accountId}/logical-container-name/data HTTP/1.1
    • Content-Length: 1128192
    • x-ABC-data-description: “annual-result-2012.xls”
    • x-ABC-md5-tree-hash: 634d9a0688aff95c
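A client-side sketch of issuing the illustrative POST above might look like the following; the endpoint URL, the use of the Python requests library, and the omission of authentication/signing are assumptions made only for illustration, and the x-ABC-* header names are simply copied from the example request.

    import requests   # third-party HTTP client, used here for brevity

    def store_object(endpoint: str, account_id: str, container: str,
                     payload: bytes, tree_hash_hex: str, description: str):
        """Send the payload with its top-level tree digest, as in the example above."""
        url = f"{endpoint}/{account_id}/{container}/data"
        headers = {
            "x-ABC-data-description": description,
            "x-ABC-md5-tree-hash": tree_hash_hex,   # digest computed by the uploader
        }
        return requests.post(url, data=payload, headers=headers)

For a multi-part upload along the lines described earlier, a client would presumably issue one such request per part, carrying that part's root digest and sub-part digests.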
In response to a data storage request, in an embodiment, archival data storage system 206 provides a data object identifier if the data object is stored successfully. Such a data object identifier may be used to retrieve, delete or otherwise refer to the stored data object in subsequent requests. In some embodiments, such a data object identifier may be “self-describing” in that it includes (for example, with or without encryption) storage location information that may be used by the archival data storage system to locate the data object without the need for additional data structures such as a global namespace key map. In addition, in some embodiments, data object identifiers may also encode other information, such as payload digest, error-detection code, access control data, and other information that may be used to validate subsequent requests and data integrity. In some embodiments, the archival data storage system stores incoming data in a transient durable data store before moving it to archival data storage. Thus, although customers may perceive that data is persisted durably at the moment when an upload request is completed, actual storage to a long-term persisted data store may not commence until sometime later (e.g., 12 hours later). In some embodiments, the timing of the actual storage may depend on the size of the data object, the system load during a diurnal cycle, configurable information such as a service-level agreement between a customer and a storage service provider, and other factors.
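For illustration only, a self-describing identifier along these lines could be encoded as in the sketch below; the particular fields (volume, offset, payload digest) and the CRC32 over the identifier body are assumptions, not the encoding used by the patent.

    import base64, json, zlib

    def make_object_id(volume_id: str, offset: int, payload_digest_hex: str) -> str:
        """Pack storage-location info and a payload digest into an opaque, self-validating ID."""
        body = json.dumps({"vol": volume_id, "off": offset,
                           "digest": payload_digest_hex}).encode()
        crc = zlib.crc32(body).to_bytes(4, "big")   # error-detection code over the ID itself
        return base64.urlsafe_b64encode(body + crc).decode()

    def parse_object_id(object_id: str) -> dict:
        """Recover the embedded fields, rejecting identifiers that fail the checksum."""
        raw = base64.urlsafe_b64decode(object_id.encode())
        body, crc = raw[:-4], raw[-4:]
        if zlib.crc32(body).to_bytes(4, "big") != crc:
            raise ValueError("corrupt data object identifier")
        return json.loads(body)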
In some embodiments, archivaldata storage system206 provides the APIs for customers to retrieve data stored in the archival data storage system. In such embodiments, a customer may initiate a job to perform the data retrieval and may learn the completion of the job by a notification or by polling the system for the status of the job. As used herein, a “job” refers to a data-related activity corresponding to a customer request that may be performed temporally independently from the time the request is received. For example, a job may include retrieving, storing and deleting data, retrieving metadata and the like. A job may be identified by a job identifier that may be unique, for example, among all the jobs for a particular customer. For example, the following HTTP POST request may be used, in an illustrative embodiment, to initiate a job to retrieve a data object identified by a data object identifier “dataObjectId.” In other embodiments, a data retrieval request may request the retrieval of multiple data objects, data objects associated with a logical data container and the like.
    • POST /{accountId}/logical-data-container-name/data/{dataObjectId} HTTP/1.1
In response to the request, in an embodiment, archival data storage system 206 provides a job identifier, “job-id,” that is assigned to the job in the following response. The response provides, in this example, a path to the storage location where the retrieved data will be stored.
    • HTTP/1.1 202 ACCEPTED
    • Location: /{accountId}/logical-data-container-name/jobs/{job-id}
At any given point in time, the archival data storage system may have many jobs pending for various data operations. In some embodiments, the archival data storage system may employ job planning and optimization techniques, such as batch processing, load balancing, job coalescence, and the like, to optimize system metrics, such as cost, performance, scalability, and the like. In some embodiments, the timing of the actual data retrieval depends on factors such as the size of the retrieved data, the system load and capacity, active status of storage devices, and the like. For example, in some embodiments, at least some data storage devices in an archival data storage system may be activated or inactivated according to a power management schedule, for example, to reduce operational costs. Thus, retrieval of data stored in a currently active storage device (such as a rotating hard drive) may be faster than retrieval of data stored in a currently inactive storage device (such as a spun-down hard drive).
In an embodiment, when a data retrieval job is completed, the retrieved data is stored in a staging data store and made available for customer download. In some embodiments, a customer is notified of the change in status of a job by a configurable notification service. In other embodiments, a customer may learn of the status of a job by polling the system using a job identifier. The following HTTP GET request may be used, in an embodiment, to download data that is retrieved by a job identified by “job-id,” using a download path that has been previously provided.
    • GET /{accountId}/logical-data-container-name/jobs/{job-id}/output HTTP/1.1
In response to the GET request, in an illustrative embodiment, archivaldata storage system206 may provide the retrieved data in the following HTTP response, with a tree-hash of the data for verification purposes.
    • HTTP/1.1 200 OK
    • Content-Length: 1128192
    • x-ABC-archive-description: “retrieved stuff”
    • x-ABC-md5-tree-hash: 693d9a7838aff95c
    • [1112192 bytes of user data follows]
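Putting the retrieval exchanges above together, a client might initiate the job, poll for its output, and then check the returned tree hash; the endpoint, the polling of the output URL, and the one-minute interval are assumptions for this sketch (a notification service, as described below, would be the alternative to polling).

    import time
    import requests

    def retrieve_object(endpoint: str, account_id: str, container: str,
                        data_object_id: str):
        base = f"{endpoint}/{account_id}/{container}"
        # initiate the retrieval job; the response Location header names the job resource
        job_path = requests.post(f"{base}/data/{data_object_id}").headers["Location"]
        # poll until the job output is available
        while True:
            resp = requests.get(f"{endpoint}{job_path}/output")
            if resp.status_code == 200:
                break
            time.sleep(60)
        expected_tree_hash = resp.headers["x-ABC-md5-tree-hash"]
        # recompute the tree hash over resp.content (see the earlier sketches) and compare
        return resp.content, expected_tree_hash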
In an embodiment, a customer may request the deletion of a data object stored in an archival data storage system by specifying a data object identifier associated with the data object. For example, in an illustrative embodiment, a data object with data object identifier “dataObjectId” may be deleted using the following HTTP request. In another embodiment, a customer may request the deletion of multiple data objects, such as those associated with a particular logical data container.
    • DELETE /{accountId}/logical-data-container-name/data/{dataObjectId} HTTP/1.1
In various embodiments, data objects may be deleted in response to a customer request or may be deleted automatically according to a user-specified or default expiration date. In some embodiments, data objects may be rendered inaccessible to customers upon an expiration time but remain recoverable during a grace period beyond the expiration time. In various embodiments, the grace period may be based on configurable information, such as customer configuration, service-level agreement terms, and the like. In some embodiments, a customer may be provided the ability to query or receive notifications for pending data deletions and/or to cancel one or more of the pending data deletions. For example, in one embodiment, a customer may set up notification configurations associated with a logical data container such that the customer will receive notifications of certain events pertinent to the logical data container. Such events may include the completion of a data retrieval job request, the completion of a metadata request, deletion of data objects or logical data containers, and the like.
In an embodiment, archivaldata storage system206 also provides metadata APIs for retrieving and managing metadata, such as metadata associated with logical data containers. In various embodiments, such requests may be handled asynchronously (where results are returned later) or synchronously (where results are returned immediately).
Still referring to FIG. 2, in an embodiment, at least some of the API requests discussed above are handled by API request handler 218 as part of front end 208. For example, API request handler 218 may decode and/or parse an incoming API request to extract information, such as a uniform resource identifier (“URI”), requested action and associated parameters, identity information, data object identifiers, and the like. In addition, API request handler 218 may invoke other services (described below), where necessary, to further process the API request.
In an embodiment, front end 208 includes an authentication service 220 that may be invoked, for example, by API handler 218, to authenticate an API request. For example, in some embodiments, authentication service 220 may verify identity information submitted with the API request, such as username and password, Internet Protocol (“IP”) address, cookies, digital certificate, digital signature, and the like. In other embodiments, authentication service 220 may require the customer to provide additional information or perform additional steps to authenticate the request, such as required in a multifactor authentication scheme, under a challenge-response authentication protocol, and the like.
In an embodiment,front end208 includes anauthorization service222 that may be invoked, for example, byAPI handler218, to determine whether a requested access is permitted according to one or more policies determined to be relevant to the request. For example, in one embodiment,authorization service222 verifies that a requested access is directed to data objects contained in the requestor's own logical data containers or which the requester is otherwise authorized to access. In some embodiments,authorization service222 or other services offront end208 may check the validity and integrity of a data request based at least in part on information encoded in the request, such as validation information encoded by a data object identifier.
In an embodiment, front end 208 includes a metering service 224 that monitors service usage information for each customer, such as data storage space used, number of data objects stored, data requests processed, and the like. In an embodiment, front end 208 also includes an accounting service 226 that performs accounting and billing-related functionalities based, for example, on the metering information collected by the metering service 224, customer account information, and the like. For example, a customer may be charged a fee based on the storage space used by the customer, the size and number of the data objects, the types and number of requests submitted, customer account type, service-level agreement, and the like.
In an embodiment, front end 208 batch processes some or all incoming requests. For example, front end 208 may wait until a certain number of requests have been received before processing (e.g., authentication, authorization, accounting, and the like) the requests. Such batch processing of incoming requests may be used to gain efficiency.
In some embodiments,front end208 may invoke services provided by other subsystems of the archival data storage system to further process an API request. For example,front end208 may invoke services in metadata plane216 to fulfill metadata requests. For another example,front end208 may stream data in and out of control plane for direct I/O210 for data storage and retrieval requests, respectively.
Referring now to control plane for direct I/O210 illustrated inFIG. 2, in various embodiments, control plane for direct I/O210 provides services that create, track and manage jobs created as a result of customer requests. As discussed above, a job refers to a customer-initiated activity that may be performed asynchronously to the initiating request, such as data retrieval, storage, metadata queries or the like. In an embodiment, control plane for direct I/O210 includes ajob tracker230 that is configured to create job records or entries corresponding to customer requests, such as those received fromAPI request handler218, and monitor the execution of the jobs. In various embodiments, a job record may include information related to the execution of a job, such as a customer account identifier, job identifier, data object identifier, reference to payload data cache228 (described below), job status, data validation information, and the like. In some embodiments,job tracker230 may collect information necessary to construct a job record from multiple requests. For example, when a large amount of data is requested to be stored, data upload may be broken into multiple requests, each uploading a portion of the data. In such a case,job tracker230 may maintain information to keep track of the upload status to ensure that all data parts have been received before a job record is created. In some embodiments,job tracker230 also obtains a data object identifier associated with the data to be stored and provides the data object identifier, for example, to a front end service to be returned to a customer. In an embodiment, such data object identifier may be obtained fromdata plane214 services, such asstorage node manager244,storage node registrar248, and the like, described below.
In some embodiments, control plane for direct I/O 210 includes a job tracker store 232 for storing job entries or records. In various embodiments, job tracker store 232 may be implemented by a NoSQL data management system, such as a key-value data store, a relational database management system (“RDBMS”), or any other data storage system. In some embodiments, data stored in job tracker store 232 may be partitioned to enable fast enumeration of jobs that belong to a specific customer, facilitate efficient bulk record deletion, parallel processing by separate instances of a service, and the like. For example, job tracker store 232 may implement tables that are partitioned according to customer account identifiers and that use job identifiers as range keys. In an embodiment, job tracker store 232 is further sub-partitioned based on time (such as job expiration time) to facilitate job expiration and cleanup operations. In an embodiment, transactions against job tracker store 232 may be aggregated to reduce the total number of transactions. For example, in some embodiments, a job tracker 230 may aggregate multiple jobs corresponding to multiple requests into one single aggregated job before inserting it into job tracker store 232.
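As a toy illustration of that layout, the following models job records hashed by customer account identifier, range-keyed by job identifier, and sub-partitioned into expiration buckets; all field and method names are hypothetical.

    from collections import defaultdict

    class JobTrackerStore:
        """Toy model: hash key = customer account id, range key = job id."""
        def __init__(self):
            # {(account_id, expiration_bucket): {job_id: record}}
            self.partitions = defaultdict(dict)

        def put(self, account_id: str, job_id: str, record: dict):
            bucket = record["expires_at"] // 86400     # sub-partition by expiry day
            self.partitions[(account_id, bucket)][job_id] = record

        def jobs_for_customer(self, account_id: str):
            """Fast enumeration: only that customer's partitions are touched."""
            for (acct, _bucket), jobs in self.partitions.items():
                if acct == account_id:
                    yield from sorted(jobs.items())    # range-key order

        def drop_expired_bucket(self, account_id: str, bucket: int):
            """Bulk cleanup: delete a whole expired partition rather than row by row."""
            self.partitions.pop((account_id, bucket), None)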
In an embodiment,job tracker230 is configured to submit the job for further job scheduling and planning, for example, by services in common control plane212. Additionally,job tracker230 may be configured to monitor the execution of jobs and update corresponding job records injob tracker store232 as jobs are completed. In some embodiments,job tracker230 may be further configured to handle customer queries, such as job status queries. In some embodiments,job tracker230 also provides notifications of job status changes to customers or other services of the archival data storage system. For example, when a data retrieval job is completed,job tracker230 may cause a customer to be notified (for example, using a notification service) that data is available for download. As another example, when a data storage job is completed,job tracker230 may notify acleanup agent234 to remove payload data associated with the data storage job from a transientpayload data cache228, described below.
In an embodiment, control plane for direct I/O 210 includes a payload data cache 228 for providing transient data storage services for payload data transiting between data plane 214 and front end 208. Such data includes incoming data pending storage and outgoing data pending customer download. As used herein, “transient data store” is used interchangeably with temporary or staging data store to refer to a data store that is used to store data objects before they are stored in an archival data storage described herein or to store data objects that are retrieved from the archival data storage. A transient data store may provide volatile or non-volatile (durable) storage. In most embodiments, while potentially usable for persistently storing data, a transient data store is intended to store data for a shorter period of time than an archival data storage system and may be less cost-effective than the data archival storage system described herein. In one embodiment, transient data storage services provided for incoming and outgoing data may be differentiated. For example, data storage for the incoming data, which is not yet persisted in archival data storage, may provide higher reliability and durability than data storage for outgoing (retrieved) data, which is already persisted in archival data storage. In another embodiment, transient storage may be optional for incoming data; that is, incoming data may be stored directly in archival data storage without being stored in transient data storage, such as payload data cache 228, for example, when the system has sufficient bandwidth and/or capacity to do so.
In an embodiment, control plane for direct I/O210 also includes acleanup agent234 that monitorsjob tracker store232 and/orpayload data cache228 and removes data that is no longer needed. For example, payload data associated with a data storage request may be safely removed frompayload data cache228 after the data is persisted in permanent storage (e.g., data plane214). On the reverse path, data staged for customer download may be removed frompayload data cache228 after a configurable period of time (e.g., 30 days since the data is staged) or after a customer indicates that the staged data is no longer needed.
In some embodiments, cleanup agent 234 removes a job record from job tracker store 232 when the job status indicates that the job is complete or aborted. As discussed above, in some embodiments, job tracker store 232 may be partitioned to enable faster cleanup. In one embodiment where data is partitioned by customer account identifiers, cleanup agent 234 may remove an entire table that stores jobs for a particular customer account when the jobs are completed, instead of deleting individual jobs one at a time. In another embodiment where data is further sub-partitioned based on job expiration time, cleanup agent 234 may bulk-delete a whole partition or table of jobs after all the jobs in the partition expire. In other embodiments, cleanup agent 234 may receive instructions or control messages (such as an indication that jobs are completed) from other services, such as job tracker 230, that cause the cleanup agent 234 to remove job records from job tracker store 232 and/or payload data cache 228.
Referring now to common control plane 212 illustrated in FIG. 2, in various embodiments, common control plane 212 provides a queue-based load leveling service to dampen peak-to-average load levels (jobs) coming from control plane for direct I/O 210 and to deliver a manageable workload to data plane 214. In an embodiment, common control plane 212 includes a job request queue 236 for receiving jobs created by job tracker 230 in control plane for direct I/O 210, described above; a storage node manager job store 240 from which services of data plane 214 (e.g., storage node managers 244) pick up work to execute; and a request balancer 238 for transferring job items from job request queue 236 to storage node manager job store 240 in an intelligent manner.
In an embodiment,job request queue236 provides a service for inserting items into and removing items from a queue (e.g., first-in-first-out (FIFO) or first-in-last-out (FILO)), a set or any other suitable data structure. Job entries in thejob request queue236 may be similar to or different from job records stored injob tracker store232, described above.
In an embodiment, common control plane 212 also provides a durable, high-efficiency job store, storage node manager job store 240, that allows services from data plane 214 (e.g., storage node manager 244, anti-entropy watcher 252) to perform job planning optimization, checkpointing, and recovery. For example, in an embodiment, storage node manager job store 240 allows job optimization such as batch processing, operation coalescing, and the like by supporting scanning, querying, sorting, or otherwise manipulating and managing job items stored in storage node manager job store 240. In an embodiment, a storage node manager 244 scans incoming jobs and sorts the jobs by the type of data operation (e.g., read, write, or delete), storage location (e.g., volume, disk), customer account identifier, and the like. The storage node manager 244 may then reorder, coalesce, group in batches, or otherwise manipulate and schedule the jobs for processing. For example, in one embodiment, the storage node manager 244 may batch process all the write operations before all the read and delete operations. In another embodiment, the storage node manager 244 may perform operation coalescing. For another example, the storage node manager 244 may coalesce multiple retrieval jobs for the same object into one job, or cancel a storage job and a deletion job for the same data object where the deletion job comes after the storage job.
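A simplified version of such planning might look like the sketch below, which sorts jobs by operation type and location, coalesces duplicate retrievals of the same object, and cancels a store/delete pair; the job record fields are hypothetical.

    def plan_jobs(jobs: list) -> list:
        """Sort by (operation, volume), coalesce duplicate reads, cancel store+delete pairs."""
        order = {"write": 0, "read": 1, "delete": 2}       # batch writes first
        jobs = sorted(jobs, key=lambda j: (order[j["op"]], j["volume"]))

        planned, seen_reads, pending_writes = [], set(), {}
        for job in jobs:
            key = job["data_object_id"]
            if job["op"] == "read":
                if key in seen_reads:                      # coalesce duplicate retrievals
                    continue
                seen_reads.add(key)
            elif job["op"] == "write":
                pending_writes[key] = job
            elif job["op"] == "delete" and key in pending_writes:
                planned.remove(pending_writes.pop(key))    # cancel the store/delete pair
                continue
            planned.append(job)
        return planned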
In an embodiment, storage nodemanager job store240 is partitioned, for example, based on job identifiers, so as to allow independent processing of multiplestorage node managers244 and to provide even distribution of the incoming workload to all participatingstorage node managers244. In various embodiments, storage nodemanager job store240 may be implemented by a NoSQL data management system, such as a key-value data store, a RDBMS, or any other data storage system.
In an embodiment, request balancer 238 provides a service for transferring job items from job request queue 236 to storage node manager job store 240 so as to smooth out variation in workload and to increase system availability. For example, request balancer 238 may transfer job items from job request queue 236 at a lower rate or at a smaller granularity when there is a surge in job requests coming into the job request queue 236, and vice versa when there is a lull in incoming job requests, so as to maintain a relatively sustainable level of workload in the storage node manager job store 240. In some embodiments, such a sustainable level of workload is around the same as, or below, the average workload of the system.
In an embodiment, job items that are completed are removed from storage node manager job store 240 and added to the job result queue 242. In an embodiment, data plane 214 services (e.g., storage node manager 244) are responsible for removing the job items from the storage node manager job store 240 and adding them to job result queue 242. In some embodiments, job result queue 242 is implemented in a similar manner as job request queue 236, discussed above.
Referring now todata plane214 illustrated inFIG. 2. In various embodiments,data plane214 provides services related to long-term archival data storage, retrieval and deletion, data management and placement, anti-entropy operations, and the like. In various embodiments,data plane214 may include any number and type of storage entities, such as data storage devices (such as tape drives, hard disk drives, solid state devices, and the like), storage nodes or servers, datacenters, and the like. Such storage entities may be physical, virtual, or any abstraction thereof (e.g., instances of distributed storage and/or computing systems) and may be organized into any topology, including hierarchical, or tiered topologies. Similarly, the components of the data plane may be dispersed, local or any combination thereof. For example, various computing or storage components may be local or remote to any number of datacenters, servers or data storage devices, which in turn may be local or remote relative to one another. In various embodiments, physical storage entities may be designed for minimizing power and cooling costs by controlling the portions of physical hardware that are active (e.g., the number of hard drives that are actively rotating). In an embodiment, physical storage entities implement techniques, such as Shingled Magnetic Recording (SMR), to increase storage capacity.
In the environment illustrated by FIG. 2, one or more storage node managers 244 each controls one or more storage nodes 246 by sending and receiving data and control messages. Each storage node 246 in turn controls a (potentially large) collection of data storage devices, such as hard disk drives. In various embodiments, a storage node manager 244 may communicate with one or more storage nodes 246, and a storage node 246 may communicate with one or more storage node managers 244. In an embodiment, storage node managers 244 are implemented by one or more computing devices that are capable of performing relatively complex computations, such as digest computation, data encoding and decoding, job planning and optimization, and the like. In some embodiments, storage nodes 246 are implemented by one or more computing devices with less powerful computation capabilities than storage node managers 244. Further, in some embodiments the storage node manager 244 may not be included in the data path. For example, data may be transmitted from the payload data cache 228 directly to the storage nodes 246, or from one or more storage nodes 246 to the payload data cache 228. In this way, the storage node manager 244 may transmit instructions to the payload data cache 228 and/or the storage nodes 246 without receiving the payloads directly from the payload data cache 228 and/or storage nodes 246. In various embodiments, a storage node manager 244 may send instructions or control messages to any other components of the archival data storage system 206 described herein to direct the flow of data.
In an embodiment, astorage node manager244 serves as an entry point for jobs coming into and out ofdata plane214 by picking job items from common control plane212 (e.g., storage node manager job store240), retrieving staged data frompayload data cache228 and performing necessary data encoding for data storage jobs and requestingappropriate storage nodes246 to store, retrieve or delete data. Once thestorage nodes246 finish performing the requested data operations, thestorage node manager244 may perform additional processing, such as data decoding and storing retrieved data inpayload data cache228 for data retrieval jobs, and update job records in common control plane212 (e.g., removing finished jobs from storage nodemanager job store240 and adding them to job result queue242).
In an embodiment, storage node manager 244 performs data encoding according to one or more data encoding schemes before data storage to provide data redundancy, security, and the like. Such data encoding schemes may include encryption schemes and redundancy encoding schemes, such as erasure encoding, redundant array of independent disks (RAID) encoding schemes, replication, and the like. Likewise, in an embodiment, storage node managers 244 perform corresponding data decoding schemes, such as decryption, erasure decoding, and the like, after data retrieval to restore the original data.
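To illustrate redundancy encoding in the simplest possible terms, the sketch below uses a single XOR parity shard (a toy, RAID-4-like scheme) rather than the erasure codes a production system would likely use; it shows only the general shape of encode/recover, not the patent's actual scheme.

    def encode_with_parity(data: bytes, k: int) -> list:
        """Split data into k equal shards plus one XOR parity shard (toy, RAID-4-like)."""
        size = -(-len(data) // k)                          # ceiling division
        shards = [bytearray(data[i * size:(i + 1) * size].ljust(size, b"\0"))
                  for i in range(k)]
        parity = bytearray(size)
        for shard in shards:
            for i, b in enumerate(shard):
                parity[i] ^= b
        return [bytes(s) for s in shards] + [bytes(parity)]

    def rebuild_shard(shards: list, missing_index: int) -> bytes:
        """Any single missing shard is the XOR of all surviving shards."""
        survivors = [s for i, s in enumerate(shards) if i != missing_index]
        out = bytearray(len(survivors[0]))
        for shard in survivors:
            for i, b in enumerate(shard):
                out[i] ^= b
        return bytes(out)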
As discussed above in connection with storage nodemanager job store240,storage node managers244 may implement job planning and optimizations, such as batch processing, operation coalescing, and the like, to increase efficiency. In some embodiments, jobs are partitioned among storage node managers so that there is little or no overlap between the partitions. Such embodiments facilitate parallel processing by multiple storage node managers, for example, by reducing the probability of racing or locking.
In various embodiments, data plane 214 is implemented to facilitate data integrity. For example, storage entities handling bulk data flows, such as storage node managers 244 and/or storage nodes 246, may validate the digest of data stored or retrieved, check the error-detection code to ensure integrity of metadata, and the like.
In various embodiments,data plane214 is implemented to facilitate scalability and reliability of the archival data storage system. For example, in one embodiment,storage node managers244 maintain no or little internal state so that they can be added, removed or replaced with little adverse impact. In one embodiment, each storage device is a self-contained and self-describing storage unit capable of providing information about data stored thereon. Such information may be used to facilitate data recovery in case of data loss. Furthermore, in one embodiment, eachstorage node246 is capable of collecting and reporting information about the storage node including the network location of the storage node and storage information of connected storage devices to one or morestorage node registrars248 and/or storage node registrar stores250. In some embodiments,storage nodes246 perform such self-reporting at system start up time and periodically provide updated information. In various embodiments, such a self-reporting approach provides dynamic and up-to-date directory information without the need to maintain a global namespace key map or index which can grow substantially as large amounts of data objects are stored in the archival data system.
In an embodiment,data plane214 may also include one or morestorage node registrars248 that provide directory information for storage entities and data stored thereon, data placement services and the like.Storage node registrars248 may communicate with and act as a front end service to one or more storage node registrar stores250, which provide storage for thestorage node registrars248. In various embodiments, storage node registrar store250 may be implemented by a NoSQL data management system, such as a key-value data store, a RDBMS or any other data storage system. In some embodiments, storage node registrar stores250 may be partitioned to enable parallel processing by multiple instances of services. As discussed above, in an embodiment, information stored at storage node registrar store250 is based at least partially on information reported bystorage nodes246 themselves.
In some embodiments,storage node registrars248 provide directory service, for example, tostorage node managers244 that want to determine whichstorage nodes246 to contact for data storage, retrieval and deletion operations. For example, given a volume identifier provided by astorage node manager244,storage node registrars248 may provide, based on a mapping maintained in a storage node registrar store250, a list of storage nodes that host volume components corresponding to the volume identifier. Specifically, in one embodiment, storage node registrar store250 stores a mapping between a list of identifiers of volumes or volume components and endpoints, such as Domain Name System (DNS) names, of storage nodes that host the volumes or volume components.
As used herein, a “volume” refers to a logical storage space within a data storage system in which data objects may be stored. A volume may be identified by a volume identifier. A volume may reside in one physical storage device (e.g., a hard disk) or span across multiple storage devices. In the latter case, a volume comprises a plurality of volume components each residing on a different storage device. As used herein, a “volume component” refers to a portion of a volume that is physically stored in a storage entity, such as a storage device. Volume components for the same volume may be stored on different storage entities. In one embodiment, when data is encoded by a redundancy encoding scheme (e.g., erasure coding scheme, RAID, replication), each encoded data component or “shard” may be stored in a different volume component to provide fault tolerance and isolation. In some embodiments, a volume component is identified by a volume component identifier that includes a volume identifier and a shard slot identifier. As used herein, a shard slot identifies a particular shard, row or stripe of data in a redundancy encoding scheme. For example, in one embodiment, a shard slot corresponds to an erasure coding matrix row. In some embodiments, storage node registrar store250 also stores information about volumes or volume components such as total, used, and free space, number of data objects stored, and the like.
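By way of illustration only, the following is a minimal sketch, in Python with hypothetical names, of how a volume component identifier (volume identifier plus shard slot) and a registrar-style mapping from volume identifiers to storage node endpoints might be represented; it is not a description of any particular embodiment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VolumeComponentId:
    """Identifies one shard slot of a volume."""
    volume_id: str
    shard_slot: int  # e.g., the erasure coding matrix row holding this shard

# Hypothetical registrar-store mapping: volume identifier -> DNS names of the
# storage nodes hosting that volume's components (one component per node).
registrar_store = {
    "vol-0001": ["node-a.storage.example", "node-b.storage.example",
                 "node-c.storage.example"],
}

def locate_volume_components(volume_id: str) -> list:
    """Return one component identifier per storage node hosting the volume."""
    endpoints = registrar_store.get(volume_id, [])
    return [VolumeComponentId(volume_id, slot) for slot in range(len(endpoints))]

print(locate_volume_components("vol-0001"))
```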
In some embodiments,data plane214 also includes astorage allocator256 for allocating storage space (e.g., volumes) on storage nodes to store new data objects, based at least in part on information maintained by storage node registrar store250, to satisfy data isolation and fault tolerance constraints. In some embodiments,storage allocator256 requires manual intervention.
In some embodiments,data plane214 also includes ananti-entropy watcher252 for detecting entropic effects and initiating anti-entropy correction routines. For example,anti-entropy watcher252 may be responsible for monitoring activities and status of all storage entities such as storage nodes, reconciling live or actual data with maintained data, and the like. In various embodiments, entropic effects include, but are not limited to, performance degradation due to data fragmentation resulting from repeated write and rewrite cycles, hardware wear (e.g., of magnetic media), data unavailability and/or data loss due to hardware/software malfunction, environmental factors, physical destruction of hardware, random chance, or other causes.Anti-entropy watcher252 may detect such effects and in some embodiments may preemptively and/or reactively institute anti-entropy correction routines and/or policies.
In an embodiment,anti-entropy watcher252 causesstorage nodes246 to perform periodic anti-entropy scans on storage devices connected to the storage nodes.Anti-entropy watcher252 may also inject requests in job request queue236 (and subsequently job result queue242) to collect information, recover data and the like. In some embodiments,anti-entropy watcher252 may perform scans, for example, oncold index store262, described below, andstorage nodes246, to ensure referential integrity.
In an embodiment, information stored at storage node registrar store250 is used by a variety of services, such asstorage node registrar248,storage allocator256,anti-entropy watcher252, and the like. For example,storage node registrar248 may provide data location and placement services (e.g., to storage node managers244) during data storage, retrieval, and deletion. For example, given the size of a data object to be stored and information maintained by storage node registrar store250, astorage node registrar248 may determine where (e.g., which volume) to store the data object and provide an indication of the storage location of the data object, which may be used to generate a data object identifier associated with the data object. As another example, in an embodiment,storage allocator256 uses information stored in storage node registrar store250 to create and place volume components for new volumes in specific storage nodes to satisfy isolation and fault tolerance constraints. As yet another example, in an embodiment,anti-entropy watcher252 uses information stored in storage node registrar store250 to detect entropic effects, such as data loss, hardware failure, and the like.
In some embodiments,data plane214 also includes an orphancleanup data store254, which is used to track orphans in the storage system. As used herein, an orphan is a stored data object that is not referenced by any external entity. In various embodiments, orphancleanup data store254 may be implemented by a NoSQL data management system, such as a key-value data store, an RDBMS, or any other data storage system. In some embodiments,storage node registrars248 store object placement information in orphancleanup data store254. Subsequently, information stored in orphancleanup data store254 may be compared, for example, by ananti-entropy watcher252, with information maintained in metadata plane216. If an orphan is detected, in some embodiments, a request is inserted in the common control plane212 to delete the orphan.
Referring now to metadata plane216 illustrated inFIG. 2, in various embodiments, metadata plane216 provides information about data objects stored in the system for inventory and accounting purposes, to satisfy customer metadata inquiries, and the like. In the illustrated embodiment, metadata plane216 includes a metadatamanager job store258 which stores information about executed transactions based on entries fromjob result queue242 in common control plane212. In various embodiments, metadatamanager job store258 may be implemented by a NoSQL data management system, such as a key-value data store, an RDBMS, or any other data storage system. In some embodiments, metadatamanager job store258 is partitioned and sub-partitioned, for example, based on logical data containers, to facilitate parallel processing by multiple instances of services, such as metadata manager260.
In the illustrative embodiment, metadata plane216 also includes one or more metadata managers260 for generating a cold index of data objects (e.g., stored in cold index store262) based on records in metadatamanager job store258. As used herein, a “cold” index refers to an index that is updated infrequently. In various embodiments, a cold index is maintained to reduce cost overhead. In some embodiments, multiple metadata managers260 may periodically read and process records from different partitions in metadatamanager job store258 in parallel and store the result in acold index store262.
In some embodiments,cold index store262 may be implemented by a reliable and durable data storage service. In some embodiments,cold index store262 is configured to handle metadata requests initiated by customers. For example, a customer may issue a request to list all data objects contained in a given logical data container. In response to such a request,cold index store262 may provide a list of identifiers of all data objects contained in the logical data container based on information maintained bycold index store262. In some embodiments, an operation may take a relatively long period of time, and the customer may be provided a job identifier to retrieve the result when the job is done. In other embodiments,cold index store262 is configured to handle inquiries from other services, for example, fromfront end208 for inventory, accounting and billing purposes.
In some embodiments, metadata plane216 may also include a container metadata store264 that stores information about logical data containers, such as container ownership, policies, usage, and the like. Such information may be used, for example, byfront end208 services, to perform authorization, metering, accounting, and the like. In various embodiments, container metadata store264 may be implemented by a NoSQL data management system, such as a key-value data store, a RDBMS or any other data storage system.
As described herein, in various embodiments, archivaldata storage system206 is implemented to be efficient and scalable. For example, in an embodiment, batch processing and request coalescing are used at various stages (e.g., front end request handling, control plane job request handling, data plane data request handling) to improve efficiency. For another example, in an embodiment, processing of metadata, such as jobs, requests, and the like, is partitioned so as to facilitate parallel processing of the partitions by multiple instances of services.
In an embodiment, data elements stored in the archival data storage system (such as data components, volumes, described below) are self-describing so as to avoid the need for a global index data structure. For example, in an embodiment, data objects stored in the system may be addressable by data object identifiers that encode storage location information. For another example, in an embodiment, volumes may store information about which data objects are stored in the volume and storage nodes and devices storing such volumes may collectively report their inventory and hardware information to provide a global view of the data stored in the system (such as evidenced by information stored in storage node registrar store250). In such an embodiment, the global view is provided for efficiency only and not required to locate data stored in the system.
In various embodiments, the archival data storage system described herein is implemented to improve data reliability and durability. For example, in an embodiment, a data object is redundantly encoded into a plurality of data components and stored across different data storage entities to provide fault tolerance. For another example, in an embodiment, data elements have multiple levels of integrity checks. In an embodiment, parent/child relations always have additional information to ensure full referential integrity. For example, in an embodiment, bulk data transmission and storage paths are protected by having the initiator pre-calculate the digest on the data before transmission and subsequently supply the digest with the data to a receiver. The receiver of the data transmission is responsible for recalculating the digest, comparing it with the digest received, and then sending the sender an acknowledgement that includes the recalculated digest. Such data integrity checks may be implemented, for example, by front end services, transient data storage services, data plane storage entities and the like described above.
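By way of illustration only, the following is a minimal sketch of the digest handshake described above, assuming SHA-256 as the digest function and using hypothetical function and field names.

```python
import hashlib

def send_with_digest(payload: bytes) -> dict:
    """Initiator: pre-calculate the digest and supply it with the data."""
    return {"data": payload, "digest": hashlib.sha256(payload).hexdigest()}

def receive_and_acknowledge(message: dict) -> dict:
    """Receiver: recalculate, compare, and acknowledge with the recalculated digest."""
    recalculated = hashlib.sha256(message["data"]).hexdigest()
    if recalculated != message["digest"]:
        raise ValueError("digest mismatch: data corrupted in transit")
    return {"status": "stored", "digest": recalculated}

ack = receive_and_acknowledge(send_with_digest(b"archival payload bytes"))
print(ack["digest"])
```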
FIG. 3 illustrates aninterconnection network300 in which components of an archival data storage system may be connected, in accordance with at least one embodiment. In particular, the illustrated example shows how data plane components are connected to theinterconnection network300. In some embodiments, theinterconnection network300 may include a fat tree interconnection network where the link bandwidth grows higher or “fatter” towards the root of the tree. In the illustrated example, data plane includes one ormore datacenters301. Eachdatacenter301 may include one or more storage node manager server racks302 where each server rack hosts one or more servers that collectively provide the functionality of a storage node manager, such as described in connection withFIG. 2. In other embodiments, each storage node manager server rack may host more than one storage node manager. Configuration parameters, such as number of storage node managers per rack, number of storage node manager racks, and the like, may be determined based on factors such as cost, scalability, redundancy, and performance requirements, hardware, and software resources, and the like.
Each storage nodemanager server rack302 may have a storage nodemanager rack connection314 to aninterconnect308 used to connect to theinterconnection network300. In some embodiments, the storage nodemanager rack connection314 is implemented using anetwork switch303 that may include a top-of-rack Ethernet switch or any other type of network switch. In various embodiments,interconnect308 is used to enable high-bandwidth and low-latency bulk data transfers. For example, interconnect may include a Clos network, a fat tree interconnect, an Asynchronous Transfer Mode (ATM) network, a Fast or Gigabit Ethernet and the like.
In various embodiments, the bandwidth of storage nodemanager rack connection314 may be configured to enable high-bandwidth and low-latency communications between storage node managers and storage nodes located within the same or different data centers. For example, in an embodiment, the storage nodemanager rack connection314 has a bandwidth of 10 Gigabit per second (Gbps).
In some embodiments, eachdatacenter301 may also include one or more storagenode server racks304 where each server rack hosts one or more servers that collectively provide the functionalities of a number of storage nodes, such as described in connection withFIG. 2. Configuration parameters, such as number of storage nodes per rack, number of storage node racks, ratio between storage node managers and storage nodes, and the like, may be determined based on factors such as cost, scalability, redundancy, and performance requirements, hardware and software resources, and the like. For example, in one embodiment, there are 3 storage nodes per storage node server rack, 30-80 racks per data center and a storage nodes/storage node manager ratio of 10 to 1.
Each storagenode server rack304 may have a storagenode rack connection316 to aninterconnection network switch308 used to connect to theinterconnection network300. In some embodiments, the storagenode rack connection316 is implemented using anetwork switch305 that may include a top-of-rack Ethernet switch or any other type of network switch. In various embodiments, the bandwidth of storagenode rack connection316 may be configured to enable high-bandwidth and low-latency communications between storage node managers and storage nodes located within the same or different data centers. In some embodiments, a storagenode rack connection316 has a higher bandwidth than a storage nodemanager rack connection314. For example, in an embodiment, the storagenode rack connection316 has a bandwidth of 20 Gbps while a storage nodemanager rack connection314 has a bandwidth of 10 Gbps.
In some embodiments, datacenters301 (including storage node managers and storage nodes) communicate, viaconnection310, with othercomputing resources services306, such aspayload data cache228, storage nodemanager job store240,storage node registrar248, storage node registrar store250, orphancleanup data store254, metadatamanager job store258, and the like, as described in connection withFIG. 2.
In some embodiments, one ormore datacenters301 may be connected viainter-datacenter connection312. In some embodiments,connections310 and312 may be configured to achieve effective operations and use of hardware resources. For example, in an embodiment,connection310 has a bandwidth of 30-100 Gbps per datacenter andinter-datacenter connection312 has a bandwidth of 100-250 Gbps.
FIG. 4 illustrates aninterconnection network400 in which components of an archival data storage system may be connected, in accordance with at least one embodiment. In particular, the illustrated example shows how non-data plane components are connected to theinterconnection network400. As illustrated, front end services, such as described in connection withFIG. 2, may be hosted by one or more front end server racks402. For example, each frontend server rack402 may host one or more web servers. The front end server racks402 may be connected to theinterconnection network400 via anetwork switch408. In one embodiment, configuration parameters, such as number of front end services, number of services per rack, bandwidth for storage nodemanager rack connection314, and the like, may roughly correspond to those for storage node managers, as described in connection withFIG. 3.
In some embodiments, control plane services and metadata plane services, as described in connection withFIG. 2, may be hosted by one or more server racks404. Such services may includejob tracker230, metadata manager260,cleanup agent234,job request balancer238, and other services. In some embodiments, such services include services that do not handle frequent bulk data transfers. Finally, components described herein may communicate, viaconnection410, with othercomputing resource services406, such aspayload data cache228,job tracker store232, metadatamanager job store258, and the like, as described in connection withFIG. 2.
FIG. 5 illustrates anexample process500 for storing data, in accordance with at least one embodiment. Some or all of process500 (or any other processes described herein or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs or one or more applications) executing collectively on one or more processors, by hardware or combinations thereof. The code may be stored on a computer-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable storage medium may be non-transitory. In an embodiment, one or more components of archivaldata storage system206, as described in connection withFIG. 2, may performprocess500.
In an embodiment,process500 includes receiving502 a data storage request to store archival data, such as a document, a video or audio file, or the like. Such a data storage request may include payload data and metadata, such as size and digest of the payload data, user identification information (e.g., user name, account identifier, and the like), a logical data container identifier, and the like. In some embodiments,process500 may include receiving502 multiple storage requests each including a portion of larger payload data. In other embodiments, a storage request may include multiple data objects to be uploaded. In an embodiment, step502 ofprocess500 is implemented by a service, such asAPI request handler218 offront end208, as described in connection withFIG. 2.
In an embodiment,process500 includes processing504 the storage request upon receiving502 the request. Such processing may include, for example, verifying the integrity of data received, authenticating the customer, authorizing requested access against access control policies, performing meter- and accounting-related activities and the like. In an embodiment, such processing may be performed by services offront end208, such as described in connection withFIG. 2. In an embodiment, such a request may be processed in connection with other requests, for example, in batch mode.
In an embodiment,process500 includes storing506 the data associated with the storage request in a staging data store. Such staging data store may include a transient data store, such as provided bypayload data cache228, as described in connection withFIG. 2. In some embodiments, only payload data is stored in the staging store. In other embodiments, metadata related to the payload data may also be stored in the staging store. In an embodiment, data integrity is validated (e.g., based on a digest) before being stored at a staging data store.
In an embodiment,process500 includes providing508 a data object identifier associated with the data to be stored, for example, in a response to the storage request. As described above, a data object identifier may be used by subsequent requests to retrieve, delete or otherwise reference data stored. In an embodiment, a data object identifier may encode storage location information that may be used to locate the stored data object, payload validation information, such as size, digest, timestamp, and the like, that may be used to validate the integrity of the payload data, metadata validation information, such as error-detection codes that may be used to validate the integrity of metadata, such as the data object identifier itself and information encoded in the data object identifier, and the like. In an embodiment, a data object identifier may also encode information used to validate or authorize subsequent customer requests. For example, a data object identifier may encode the identifier of the logical data container that the data object is stored in. In a subsequent request to retrieve this data object, the logical data container identifier may be used to determine whether the requesting entity has access to the logical data container and hence the data objects contained therein. In some embodiments, the data object identifier may encode information based on information supplied by a customer (e.g., a global unique identifier, GUID, for the data object and the like) and/or information collected or calculated by the system performing process500 (e.g., storage location information). In some embodiments, generating a data object identifier may include encrypting some or all of the information described above using a cryptographic private key. In some embodiments, the cryptographic private key may be periodically rotated. In some embodiments, a data object identifier may be generated and/or provided at a different time than described above. For example, a data object identifier may be generated and/or provided after a storage job (described below) is created and/or completed.
In an embodiment, providing508 a data object identifier may include determining a storage location for the data before the data is actually stored there. For example, such determination may be based at least in part on inventory information about existing data storage entities, such as operational status (e.g., active or inactive), available storage space, data isolation requirements, and the like. In an environment, such asenvironment200 illustrated byFIG. 2, such determination may be implemented by a service, such asstorage node registrar248, as described above in connection withFIG. 2. In some embodiments, such determination may include allocating new storage space (e.g., volume) on one or more physical storage devices by a service, such asstorage allocator256, as described in connection withFIG. 2.
In an embodiment, a storage location identifier may be generated to represent the storage location determined above. Such a storage location identifier may include, for example, a volume reference object which comprises a volume identifier component and data object identifier component. The volume reference component may identify the volume the data is stored on and the data object identifier component may identify where in the volume the data is stored. In general, the storage location identifier may comprise components that identify various levels within a logical or physical data storage topology (such as a hierarchy) in which data is organized. In some embodiments, the storage location identifier may point to where actual payload data is stored or a chain of reference to where the data is stored.
In an embodiment, a data object identifier encodes a digest (e.g., a hash) of at least a portion of the data to be stored, such as the payload data. In some embodiments, the digest may be based at least in part on a customer-provided digest. In other embodiments, the digest may be calculated from scratch based on the payload data.
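By way of illustration only, the following sketch assumes a hypothetical identifier layout in which storage location information, a payload digest, and an error-detection code over the identifier's own metadata are encoded together; encryption of the identifier with a rotated private key, as described above, is omitted for brevity.

```python
import hashlib, json, zlib

def make_object_id(volume_id: str, offset: int, payload: bytes) -> str:
    """Encode storage location, payload digest, and a CRC over the metadata."""
    meta = {
        "volume": volume_id,            # storage location information
        "offset": offset,
        "size": len(payload),
        "payload_digest": hashlib.sha256(payload).hexdigest(),
    }
    body = json.dumps(meta, sort_keys=True)
    crc = zlib.crc32(body.encode())     # error-detection code over the metadata
    return json.dumps({"meta": meta, "crc": crc})

def validate_object_id(object_id: str) -> dict:
    """Check the identifier's own integrity before acting on it."""
    wrapper = json.loads(object_id)
    body = json.dumps(wrapper["meta"], sort_keys=True)
    if zlib.crc32(body.encode()) != wrapper["crc"]:
        raise ValueError("corrupt data object identifier")
    return wrapper["meta"]

oid = make_object_id("vol-0001", offset=0, payload=b"example payload")
print(validate_object_id(oid)["payload_digest"])
```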
In an embodiment,process500 includes creating510 a storage job for persisting data to a long-term data store andscheduling512 the storage job for execution. Inenvironment200, as described in connection withFIG. 2,steps508,510, and512 may be implemented at least in part by components of control plane for direct I/O210 and common control plane212, as described above. Specifically, in an embodiment,job tracker230 creates a job record and stores the job record injob tracker store232. As described above,job tracker230 may perform batch processing to reduce the total number of transactions againstjob tracker store232. Additionally,job tracker store232 may be partitioned or otherwise optimized to facilitate parallel processing, cleanup operations and the like. A job record, as described above, may include job-related information, such as a customer account identifier, job identifier, storage location identifier, reference to data stored inpayload data cache228, job status, job creation, and/or expiration time, and the like. In some embodiments, a storage job may be created before a data object identifier is generated and/or provided. For example, a storage job identifier, instead of or in addition to a data object identifier, may be provided in response to a storage request atstep508 above.
In an embodiment, scheduling512 the storage job for execution includes performing job planning and optimization, such as queue-based load leveling or balancing, job partitioning and the like, as described in connection with common control plane212 ofFIG. 2. For example, in an embodiment,job request balancer238 transfers job items fromjob request queue236 to storage nodemanager job store240 according to a scheduling algorithm so as to dampen peak to average load levels (jobs) coming from control plane for I/O210 and to deliver manageable workload todata plane214. As another example, storage nodemanager job store240 may be partitioned to facilitate parallel processing of the jobs by multiple workers, such asstorage node managers244. As yet another example, storage nodemanager job store240 may provide querying, sorting and other functionalities to facilitate batch processing and other job optimizations.
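By way of illustration only, the following is a minimal sketch, with hypothetical names, of queue-based load leveling in which at most a fixed number of job items per cycle are transferred from a request queue into partitioned job storage.

```python
from collections import deque

def level_load(job_request_queue, job_store_partitions, max_per_cycle):
    """Transfer at most max_per_cycle job items, spreading them across partitions."""
    moved = 0
    while job_request_queue and moved < max_per_cycle:
        job = job_request_queue.popleft()
        # Partition on a stable key so related jobs can later be batch processed.
        partition = hash(job["volume"]) % len(job_store_partitions)
        job_store_partitions[partition].append(job)
        moved += 1
    return moved

queue = deque({"id": i, "volume": "vol-%d" % (i % 3)} for i in range(10))
partitions = [[], [], []]
print(level_load(queue, partitions, max_per_cycle=4), [len(p) for p in partitions])
```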
In an embodiment,process500 includes selecting514 the storage job for execution, for example, by astorage node manager244 from storage node manager job stored240, as described in connection withFIG. 2. The storage job may be selected514 with other jobs for batch processing or otherwise selected as a result of job planning and optimization described above.
In an embodiment,process500 includes obtaining516 data from a staging store, such aspayload data cache228 described above in connection withFIG. 2. In some embodiments, the integrity of the data may be checked, for example, by verifying the size, digest, an error-detection code and the like.
In an embodiment,process500 includes obtaining518 one or more data encoding schemes, such as an encryption scheme, a redundancy encoding scheme such as erasure encoding, redundant array of independent disks (RAID) encoding schemes, replication, and the like. In some embodiments, such encoding schemes evolve to adapt to different requirements. For example, encryption keys may be rotated periodically and the stretch factor of an erasure coding scheme may be adjusted over time to accommodate different hardware configurations, redundancy requirements, and the like.
In an embodiment,process500 includes encoding520 the data with the obtained encoding schemes. For example, in an embodiment, data is encrypted and the encrypted data is erasure-encoded. In an embodiment,storage node managers244 described in connection withFIG. 2 may be configured to perform the data encoding described herein. In an embodiment, application of such encoding schemes generates a plurality of encoded data components or shards, which may be stored across different storage entities, such as storage devices, storage nodes, datacenters, and the like, to provide fault tolerance. In an embodiment where data may comprise multiple parts (such as in the case of a multi-part upload), each part may be encoded and stored, as described herein.
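By way of illustration only, the following sketch uses a simple single-parity scheme as a stand-in for the richer erasure coding schemes described above (encryption is omitted); it splits data into k equal shards plus one XOR parity shard so that any single lost shard can later be rebuilt.

```python
def erasure_encode(data: bytes, k: int) -> list:
    """Split data into k equal-size shards plus one XOR parity shard (k+1 total)."""
    shard_len = -(-len(data) // k)                  # ceiling division
    padded = data.ljust(shard_len * k, b"\x00")     # pad so shards are equal length
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = bytearray(shard_len)
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return shards + [bytes(parity)]

components = erasure_encode(b"payload to be stored across storage nodes", k=4)
print(len(components), "components of", len(components[0]), "bytes each")
```

The original payload size would need to be carried elsewhere (for example, encoded in the data object identifier, as described above) so that padding can be stripped on reconstruction.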
In an embodiment,process500 includes determining522 the storage entities for such encoded data components. For example, in anenvironment200 illustrated byFIG. 2, astorage node manager244 may determine the plurality ofstorage nodes246 to store the encoded data components by querying astorage node registrar248 using a volume identifier. Such a volume identifier may be part of a storage location identifier associated with the data to be stored. In response to the query with a given volume identifier, in an embodiment,storage node registrar248 returns a list of network locations (including endpoints, DNS names, IP addresses and the like) ofstorage nodes246 to store the encoded data components. As described in connection withFIG. 2,storage node registrar248 may determine such a list based on self-reported and dynamically provided and/or updated inventory information fromstorage nodes246 themselves. In some embodiments, such determination is based on data isolation, fault tolerance, load balancing, power conservation, data locality, and other considerations. In some embodiments,storage registrar248 may cause new storage space to be allocated, for example, by invokingstorage allocator256, as described in connection withFIG. 2.
In an embodiment,process500 includes causing524 storage of the encoded data component(s) at the determined storage entities. For example, in anenvironment200 illustrated byFIG. 2, astorage node manager244 may request each of thestorage nodes246 determined above to store a data component at a given storage location. Each of thestorage nodes246, upon receiving the storage request fromstorage node manager244 to store a data component, may cause the data component to be stored in a connected storage device. In some embodiments, at least a portion of the data object identifier is stored with all or some of the data components in either encoded or unencoded form. For example, the data object identifier may be stored in the header of each data component and/or in a volume component index stored in a volume component. In some embodiments, astorage node246 may perform batch processing or other optimizations to process requests fromstorage node managers244.
In an embodiment, astorage node246 sends an acknowledgement to the requestingstorage node manager244 indicating whether data is stored successfully. In some embodiments, astorage node246 returns an error message, when for some reason, the request cannot be fulfilled. For example, if a storage node receives two requests to store to the same storage location, one or both requests may fail. In an embodiment, astorage node246 performs validation checks prior to storing the data and returns an error if the validation checks fail. For example, data integrity may be verified by checking an error-detection code or a digest. As another example,storage node246 may verify, for example, based on a volume index, that the volume identified by a storage request is stored by the storage node and/or that the volume has sufficient space to store the data component.
In some embodiments, data storage is considered successful whenstorage node manager244 receives positive acknowledgement from at least a subset (a storage quorum) of requestedstorage nodes246. In some embodiments, astorage node manager244 may wait until the receipt of a quorum of acknowledgements before removing the state necessary to retry the job. Such state information may include encoded data components for which an acknowledgement has not been received. In other embodiments, to improve the throughput, astorage node manager244 may remove the state necessary to retry the job before receiving a quorum of acknowledgements.
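By way of illustration only, a minimal sketch of the quorum-based success check and retry-state handling described above, with hypothetical names:

```python
def storage_succeeded(acknowledgements, quorum):
    """True when at least `quorum` storage nodes positively acknowledged the write."""
    return sum(1 for ok in acknowledgements.values() if ok) >= quorum

def retry_state(acknowledgements, shards_by_node, quorum):
    """Keep the shards whose writes were not acknowledged until a quorum is reached."""
    if storage_succeeded(acknowledgements, quorum):
        return {}  # success: the state needed to retry the job can be removed
    return {node: shards_by_node[node]
            for node, ok in acknowledgements.items() if not ok}

acks = {"node-a": True, "node-b": True, "node-c": False, "node-d": True}
shards = {node: b"encoded component bytes" for node in acks}
print(storage_succeeded(acks, quorum=3), retry_state(acks, shards, quorum=3))
```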
In an embodiment,process500 includes updating526 metadata information including, for example, metadata maintained by data plane214 (such as index and storage space information for a storage device, mapping information stored at storage node registrar store250 and the like), metadata maintained bycontrol planes210 and212 (such as job-related information), metadata maintained by metadata plane216 (such as a cold index), and the like. In various embodiments, some of such metadata information may be updated via batch processing and/or on a periodic basis to reduce performance and cost impact. For example, indata plane214, information maintained by storage node registrar store250 may be updated to provide additional mapping of the volume identifier of the newly stored data and thestorage nodes246 on which the data components are stored, if such a mapping is not already there. For another example, volume index on storage devices may be updated to reflect newly added data components.
In common control plane212, job entries for completed jobs may be removed from storage nodemanager job store240 and added to job resultqueue242, as described in connection withFIG. 2. In control plane for direct I/O210, statuses of job records injob tracker store232 may be updated, for example, byjob tracker230 which monitors thejob result queue242. In various embodiments, a job that fails to complete may be retried for a number of times. For example, in an embodiment, a new job may be created to store the data at a different location. As another example, an existing job record (e.g., in storage nodemanager job store240,job tracker store232 and the like) may be updated to facilitate retry of the same job.
In metadata plane216, metadata may be updated to reflect the newly stored data. For example, completed jobs may be pulled fromjob result queue242 into metadatamanager job store258 and batch-processed by metadata manager260 to generate an updated index, such as stored incold index store262. For another example, customer information may be updated to reflect changes for metering and accounting purposes.
Finally, in some embodiments, once a storage job is completed successfully, job records, payload data and other data associated with a storage job may be removed, for example, by acleanup agent234, as described in connection withFIG. 2. In some embodiments, such removal may be processed by batch processing, parallel processing or the like.
FIG. 6 illustrates anexample process600 for retrieving data, in accordance with at least one embodiment. In an embodiment, one or more components of archivaldata storage system206, as described in connection withFIG. 2, collectively performprocess600.
In an embodiment,process600 includes receiving602 a data retrieval request to retrieve data, such as stored byprocess500, described above. Such a data retrieval request may include a data object identifier, such as provided bystep508 ofprocess500, described above, or any other information that may be used to identify the data to be retrieved.
In an embodiment,process600 includes processing604 the data retrieval request upon receiving602 the request. Such processing may include, for example, authenticating the customer, authorizing requested access against access control policies, performing meter and accounting related activities and the like. In an embodiment, such processing may be performed by services offront end208, such as described in connection withFIG. 2. In an embodiment, such request may be processed in connection with other requests, for example, in batch mode.
In an embodiment, processing604 the retrieval request may be based at least in part on the data object identifier that is included in the retrieval request. As described above, data object identifier may encode storage location information, payload validation information, such as size, creation timestamp, payload digest, and the like, metadata validation information, policy information and the like. In an embodiment, processing604 the retrieval request includes decoding the information encoded in the data object identifier, for example, using a private cryptographic key and using at least some of the decoded information to validate the retrieval request. For example, policy information may include access control information that may be used to validate that the requesting entity of the retrieval request has the required permission to perform the requested access. As another example, metadata validation information may include an error-detection code, such as a cyclic redundancy check (“CRC”) that may be used to verify the integrity of data object identifier or a component of it.
In an embodiment,process600 includes creating606 a data retrieval job corresponding to the data retrieval request and providing608 a job identifier associated with the data retrieval job, for example, in a response to the data retrieval request. In some embodiments, creating606 a data retrieval job is similar to creating a data storage job, as described in connection withstep510 ofprocess500 illustrated inFIG. 5. For example, in an embodiment, ajob tracker230 may create a job record that includes at least some information encoded in the data object identifier and/or additional information, such as a job expiration time and the like, and store the job record injob tracker store232. As described above,job tracker230 may perform batch processing to reduce the total number of transactions againstjob tracker store232. Additionally,job tracker store232 may be partitioned or otherwise optimized to facilitate parallel processing, cleanup operations and the like.
In an embodiment,process600 includesscheduling610 the data retrieval job created above. In some embodiments, scheduling610 the data retrieval job for execution includes performing job planning and optimization, such as described in connection withstep512 ofprocess500 ofFIG. 5. For example, the data retrieval job may be submitted into a job queue and scheduled for batch processing with other jobs based at least in part on costs, power management schedules and the like. For another example, the data retrieval job may be coalesced with other retrieval jobs based on data locality and the like.
In an embodiment,process600 includes selecting612 the data retrieval job for execution, for example, by astorage node manager244 from storage nodemanager job store240, as described in connection withFIG. 2. The retrieval job may be selected612 with other jobs for batch processing or otherwise selected as a result of job planning and optimization described above.
In an embodiment,process600 includes determining614 the storage entities that store the encoded data components that are generated by a storage process, such asprocess500 described above. In an embodiment, astorage node manager244 may determine a plurality ofstorage nodes246 to retrieve the encoded data components in a manner similar to that discussed in connection withstep522 ofprocess500, above. For example, such determination may be based on load balancing, power conservation, efficiency and other considerations.
In an embodiment,process600 includes determining616 one or more data decoding schemes that may be used to decode retrieved data. Typically, such decoding schemes correspond to the encoding schemes applied to the original data when the original data is previously stored. For example, such decoding schemes may include decryption with a cryptographic key, erasure-decoding and the like.
In an embodiment,process600 includes causing618 retrieval of at least some of the encoded data components from the storage entities determined instep614 ofprocess600. For example, in anenvironment200 illustrated byFIG. 2, astorage node manager244 responsible for the data retrieval job may request a subset ofstorage nodes246 determined above to retrieve their corresponding data components. In some embodiments, a minimum number of encoded data components is needed to reconstruct the original data where the number may be determined based at least in part on the data redundancy scheme used to encode the data (e.g., stretch factor of an erasure coding). In such embodiments, the subset of storage nodes may be selected such that no less than the minimum number of encoded data components is retrieved.
Each of the subset ofstorage nodes246, upon receiving a request fromstorage node manager244 to retrieve a data component, may validate the request, for example, by checking the integrity of a storage location identifier (that is part of the data object identifier), verifying that the storage node indeed holds the requested data component and the like. Upon a successful validation, the storage node may locate the data component based at least in part on the storage location identifier. For example, as described above, the storage location identifier may include a volume reference object which comprises a volume identifier component and a data object identifier component, where the volume reference component identifies the volume on which the data is stored and the data object identifier component identifies where in the volume the data is stored. In an embodiment, the storage node reads the data component, for example, from a connected data storage device and sends the retrieved data component to the storage node manager that requested the retrieval. In some embodiments, the data integrity is checked, for example, by verifying that the data component identifier, or a portion thereof, is identical to that indicated by the retrieval job. In some embodiments, a storage node may perform batching or other job optimization in connection with retrieval of a data component.
In an embodiment,process600 includesdecoding620, at least the minimum number of the retrieved encoded data components with the one or more data decoding schemes determined atstep616 ofprocess600. For example, in one embodiment, the retrieved data components may be erasure decoded and then decrypted. In some embodiments, a data integrity check is performed on the reconstructed data, for example, using payload integrity validation information encoded in the data object identifier (e.g., size, timestamp, digest). In some cases, the retrieval job may fail due to a less-than-minimum number of retrieved data components, failure of data integrity check and the like. In such cases, the retrieval job may be retried in a fashion similar to that described in connection withFIG. 5. In some embodiments, the original data comprises multiple parts of data and each part is encoded and stored. In such embodiments, during retrieval, the encoded data components for each part of the data may be retrieved and decoded (e.g., erasure-decoded and decrypted) to form the original part and the decoded parts may be combined to form the original data.
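By way of illustration only, and paired with the single-parity encoding sketched earlier, the following sketch rebuilds one missing shard from the remaining components and strips the padding using the original size (which might, for example, be carried in the data object identifier).

```python
def erasure_decode(components, k, original_size):
    """components: shard index (k is the parity slot) -> bytes; one entry may be missing."""
    shard_len = len(next(iter(components.values())))
    missing = [i for i in range(k + 1) if i not in components]
    if len(missing) > 1:
        raise ValueError("fewer than the minimum number of components retrieved")
    if missing:
        # XOR of all remaining shards (data and parity) reproduces the missing one.
        rebuilt = bytearray(shard_len)
        for shard in components.values():
            for i, byte in enumerate(shard):
                rebuilt[i] ^= byte
        components[missing[0]] = bytes(rebuilt)
    data = b"".join(components[i] for i in range(k))  # drop the parity shard
    return data[:original_size]                       # strip the padding

# e.g., with the earlier sketch: shards = erasure_encode(data, k=4); if shard 2 is lost:
# recovered = erasure_decode({i: s for i, s in enumerate(shards) if i != 2}, 4, len(data))
```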
In an embodiment,process600 includes storing622 reconstructed data in a staging store, such aspayload data cache228 described in connection withFIG. 2. In some embodiments, data stored622 in the staging store may be available for download by a customer for a period of time or indefinitely. In an embodiment, data integrity may be checked (e.g., using a digest) before the data is stored in the staging store.
In an embodiment,process600 includes providing624 a notification of the completion of the retrieval job to the requestor of the retrieval request or another entity or entities otherwise configured to receive such a notification. Such notifications may be provided individually or in batches. In other embodiments, the status of the retrieval job may be provided upon a polling request, for example, from a customer.
FIG. 7 illustrates anexample process700 for deleting data, in accordance with at least one embodiment. In an embodiment, one or more components of archivaldata storage system206, as described in connection withFIG. 2, collectively performprocess700.
In an embodiment,process700 includes receiving702 a data deletion request to delete data, such as stored byprocess500, described above. Such a data deletion request may include a data object identifier, such as provided bystep508 ofprocess500, described above, or any other information that may be used to identify the data to be deleted.
In an embodiment,process700 includes processing704 the data deletion request upon receiving702 the request. In some embodiments, theprocessing704 is similar to that forstep504 ofprocess500 and step604 ofprocess600, described above. For example, in an embodiment, theprocessing704 is based at least in part on the data object identifier that is included in the data deletion request.
In an embodiment,process700 includes creating706 a data deletion job corresponding to the data deletion request. Such a deletion job may be created similar to the creation of the storage job described in connection withstep510 ofprocess500 and the creation of the retrieval job described in connection withstep606 ofprocess600.
In an embodiment,process700 includes providing708 an acknowledgement that the data is deleted. In some embodiments, such acknowledgement may be provided in response to the data deletion request so as to provide a perception that the data deletion request is handled synchronously. In other embodiments, a job identifier associated with the data deletion job may be provided similar to the providing of job identifiers for data retrieval requests.
In an embodiment,process700 includesscheduling710 the data deletion job for execution. In some embodiments, scheduling710 of data deletion jobs may be implemented similar to that described in connection withstep512 ofprocess500 and in connection withstep610 ofprocess600, described above. For example, data deletion jobs for closely-located data may be coalesced and/or batch processed. For another example, data deletion jobs may be assigned a lower priority than data retrieval jobs.
In some embodiments, data stored may have an associated expiration time that is specified by a customer or set by default. In such embodiments, a deletion job may be created706 andscheduled710 automatically on or near the expiration time of the data. In some embodiments, the expiration time may be further associated with a grace period during which data is still available or recoverable. In some embodiments, a notification of the pending deletion may be provided before, on or after the expiration time.
In some embodiments,process700 includes selecting712 the data deletion job for execution, for example, by astorage node manager244 from storage nodemanager job store240, as described in connection withFIG. 2. The deletion job may be selected712 with other jobs for batch processing or otherwise selected as a result of job planning and optimization described above.
In some embodiments,process700 includes determining714 the storage entities that store the data components generated by a storage process, such asprocess500 described above. In an embodiment, astorage node manager244 may determine a plurality ofstorage nodes246 that store the encoded data components in a manner similar to that discussed in connection withstep614 ofprocess600 described above.
In some embodiments,process700 includes causing716 the deletion of at least some of the data components. For example, in anenvironment200 illustrated byFIG. 2, astorage node manager244 responsible for the data deletion job may identify a set of storage nodes that store the data components for the data to be deleted and request at least a subset of those storage nodes to delete their respective data components. Each of the subset ofstorage nodes246, upon receiving a request fromstorage node manager244 to delete a data component, may validate the request, for example, by checking the integrity of a storage location identifier (that is part of the data object identifier), verifying that the storage node indeed holds the requested data component and the like. Upon a successful validation, the storage node may delete the data component from a connected storage device and send an acknowledgement tostorage node manager244 indicating whether the operation was successful. In an embodiment, multiple data deletion jobs may be executed in a batch such that data objects located close together may be deleted as a whole. In some embodiments, data deletion is considered successful whenstorage node manager244 receives positive acknowledgement from at least a subset ofstorage nodes246. The size of the subset may be configured to ensure that data cannot be reconstructed later on from undeleted data components. Failed or incomplete data deletion jobs may be retried in a manner similar to the retrying of data storage jobs and data retrieval jobs, described in connection withprocess500 andprocess600, respectively.
In an embodiment,process700 includes updating718 metadata information, such as that described in connection withstep526 ofprocess500. For example, storage nodes executing the deletion operation may update storage information including index, free space information and the like. In an embodiment, storage nodes may provide updates to storage node registrar or storage node registrar store. In various embodiments, some of such metadata information may be updated via batch processing and/or on a periodic basis to reduce performance and cost impact.
FIG. 8 depicts anillustrative data structure800 in which additional techniques for the validation of data integrity may be implemented.Illustrative data structure800 is but one of many different types of data structures that may be utilized to implement the techniques described herein. By way of example only, a user or client entity may wish to upload adata payload802 to the archival data storage service. The archival data storage service may then be configured to receive the data payload802 (in one or more parts) and allow the user to verify, at some point (e.g., immediately after upload or after some time, in some cases, after a relatively long time), that the data stored in the archival data storage service is, in fact, the same as thedata payload802 that was uploaded without requesting any size partitioning information from the user. In other words, the archival data storage service may provide a data object identifier that the user may return in order to retrieve stored data; however, the user may not need to store any information other than the data object identifier.
In some examples, in order to accept data from the user, the archival data storage service may request that the user provide a tree digest like thedata structure800 ofFIG. 8. Providing thedata structure800 may be performed in multiple ways in accordance with various embodiments. For example, all of the data illustrated in thedata structure800 may be provided. As an alternative, in embodiments where thedata structure800 is constructible solely from the data for the leaf nodes, data for the leaf nodes may be provided without providing information for other, higher-level nodes. Additionally, the archival data storage service may provide instructions in the form of an algorithm, API, and/or SDK for generating thedata structure800. In some instances, limitations on the size of upload chunks and their respective offsets may be imposed. For example, the chunks or parts of thedata payload802 may be limited to powers of two of 1 MB. Additionally, in some examples, the determined size of each chunk may not be changed within a particular upload. Further, for each part received, the archival data storage service may calculate its own digest, based at least in part on the same algorithm used by the user, and provide the digest for each part. Upon completion of the storage job, the archival data storage service may provide the top-level digest value in the form of a data object identifier. Retrieval of the data may, in some examples, be implemented in a similar fashion, with restrictions on chunk sizes and offsets limited to powers of two of 1 MB, messages prepended with the digest of the data that is in the message and the top-level digest available upon completion of the job. However, based at least in part on this implementation, thedata payload802 should be able to be verified or validated independent of the chunk size selected by the user. A digest may be calculated by applying a cryptographic hash function, such as those associated with SHA-1, SHA-2, MD5, MD6, and the like, or a checksum or error-detection code, such as a cyclic redundancy check, to at least a portion of the payload data.
Thedata structure800 ofFIG. 8 may illustrate an appropriate digest tree for adata payload802 where the user has chosen to upload the data payload in a single part. As such, there is no part size for the user to select in this example. However, the resulting root digest806 should be calculable using the techniques described herein even if the user had selected to upload thedata payload802 in multiple parts, and even if the user had selected a part size unknown to the archival data storage service and/or not recorded by the user. In this example, for the sake of simplicity, it will be assumed that thedata payload802 is 7 MBs in size. As such, and since the user has requested to upload theentire payload802 in one part, thedata payload802 may be partitioned into seven 1 MB chunks, Sub1-Sub7. In some examples, however, if the size of thepayload802 were not divisible by 1 MB, the last chunk,Sub7, may be smaller than 1 MB. The archival data storage service may, based at least in part on the hash tree algorithm, generate a hash value (or digest) for each 1 MB chunk (i.e., Sub1-Sub7). Each of these hash values may be represented at the lowestchild node level808 of thedata structure800. In order to generate the nodes of the secondchild node level810, the archival data storage service may concatenate each pair of second-level node children and run the hash function on the concatenated data. In other words, thelowest level808 of the data structure may include a digest of payload data, while parent nodes may include digests of digests. Moving up the data structure, the described operations may be repeated until a root digest806 is generated.
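By way of illustration only, the following is a minimal sketch of such a digest tree, assuming SHA-256 as the digest function, 1 MB chunks, and promotion of an unpaired node to the next level; the actual algorithm provided by the service may differ.

```python
import hashlib

MB = 1024 * 1024

def tree_digest(payload: bytes) -> bytes:
    """Hash each 1 MB chunk, then repeatedly hash concatenated pairs up to the root."""
    level = [hashlib.sha256(payload[i:i + MB]).digest()
             for i in range(0, len(payload), MB)] or [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                nxt.append(level[i])        # odd node promoted unchanged
        level = nxt
    return level[0]

root = tree_digest(b"x" * (7 * MB))         # seven 1 MB chunks, as in the example above
print(root.hex())
```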
As described, in some cases, the archival data storage service may provide intermediary root digests for individual parts of thepayload802. However, in this example, since the payload was not broken into parts, the archival data storage service may only provide the root digest806 to the user. In some cases, though, the archival data storage service may also provide each 1 MB digest generated. As such, either the user or the archival data storage service should be able to verify that the data was uploaded correctly (including at the 1 MB sub-part level) based at least in part on comparing each other's generated root digest806.
FIG. 9 depicts anotherillustrative data structure900 in which additional techniques for the validation of data integrity may be implemented. As noted with reference toFIG. 8, theillustrative data structure900 is but one of many different types of data structures that may be utilized to implement the techniques described herein. By way of example only, a user or client entity may wish to upload adata payload902 to the archival data storage service. The archival data storage service may then be configured to receive the data payload902 (in this example, in two parts) and allow the user to verify that the data stored in the archival data storage service is, in fact, the same as thedata payload902 that was uploaded. This validation may be done without requesting any size partitioning information from the user. In other words, the archival data storage service may provide a data object identifier that the user may return in order to retrieve stored data; however, the user may not need to store any information other than the data object identifier in order to request and/or validate the stored data.
In generating thedata structure900, the user or the archival data storage service may once again break the data into sub-parts; however, in this example, eachpart Part1 orPart2 may be broken up separately (e.g., Sub1-Sub4 ofPart1 and Sub1-Sub3 ofPart2). Again, a digest for each sub-part may be generated and included in the data structure at thechild level904 and digests of concatenated digests may be generated and included in the data structure at afirst parent level906. In this example, however, since thepayload902 has been broken into two parts, a top-level digest may be generated for each part. As such,Part1 digest908 andPart2 digest910 may be generated and included in thedata structure900. Additionally, as thepayload902 is uploaded, each of the sub-part digests (e.g., those at904) and the part digests (e.g., those at908) may be included in the upload. Further, a root digest912 may be generated in the same fashion that the other parent nodes are generated, that is, based at least in part on concatenating the children digests and running the hash function on the concatenated information. In this example, this process would entail concatenatingPart1 digest908 andPart2 digest910 to generate a part-level digest. The archival data storage service may then run the hash function on the part-level digest to generate the root digest912. In some examples, the root digest may be received at the beginning of upload and once the upload is completed. Additionally, the archival data storage service may generate its own version of thedata structure900 and/or the root digest912 in order to validate the integrity of the data. Further, in some examples, the root digest912 generated by the archival data storage service may be provided to the user as part of a data object identifier that the user may utilize to make read, delete or index viewing requests.
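By way of illustration only, and assuming the tree_digest function from the earlier sketch is in scope, the following shows the part digests of a two-part upload being combined into the root digest; because the part size chosen here is a power-of-two multiple of 1 MB, the result matches the root digest computed over the undivided payload, consistent with validation being independent of the part size selected by the user.

```python
import hashlib

MB = 1024 * 1024
payload = bytes(range(256)) * (7 * MB // 256)        # an example 7 MB payload

part1, part2 = payload[:4 * MB], payload[4 * MB:]    # a 4 MB part and a 3 MB part
part1_digest = tree_digest(part1)                    # part-level digest for Part1
part2_digest = tree_digest(part2)                    # part-level digest for Part2

# Root digest: hash the concatenation of the part-level digests.
root_from_parts = hashlib.sha256(part1_digest + part2_digest).digest()
assert root_from_parts == tree_digest(payload)       # same root as the single-part case
print(root_from_parts.hex())
```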
In some examples, the archival data storage service may assume that data corruption can occur anywhere in the system and may be caused by hardware bugs, bit flips, and/or bugs in software code implemented by the archival data storage service or the user. For at least this reason, the archival data storage service may review all, or a subset, of the data paths and operations to ensure that data integrity is provided throughout the system and that corrupt data is detected. In some cases, this may apply to the data payload (e.g., that stored in the archival data storage service) and to the metadata. As such, data integrity validation may be performed on the data object identifiers as well to ensure that requests to delete data are not pointing at the wrong data.
In some aspects, the archival data storage service 206 may be configured to expect that a selected or otherwise determined digest function may be acceptable for the validation of data integrity. In some examples, the digest function may not be used for some cases related to data transformation. Otherwise, it may be selected and/or provided for use with validating some, all, or portions of the data and/or metadata of the archival data storage service. Additionally, as noted, in some examples, the initiator (i.e., the user) may pre-calculate the digest of the data payload 902 before transmission and then later supply the digest again with the data to the archival data storage service. The archival data storage service may then recalculate the digest (e.g., the top-level digest), compare it with the digest received from the initiator, and/or acknowledge that the integrity of the data was validated by providing the archival data storage service-generated digest to the user. Additionally, each data subdivision and/or aggregation (e.g., the sub-parts, the parts, the part-level digests, and/or the root digests) may be re-validated by calculating the independent digest on the split of the aggregate data and comparing the digests, or even by performing a bit-by-bit comparison. In other words, given any data payload of any size, calculations may be performed to generate any number or type of the split or aggregated digests and, thus, validate the data and/or the parts.
Additionally, in some aspects, data transformations such as, but not limited to, erasure coding or encryption can be re-validated by performing the reverse transformation. The results of the reverse transformation may then be cross-checked by comparing the digests and/or by bit-by-bit comparison. As such, the transformed data may include two digests: one may testify to the integrity of the transformed data and the other may testify to the integrity of the original data. In some examples, referential items such as, but not limited to, the data object identifier that may reference the content may include the digest of the data being referenced. Additionally, the archival data storage service may also include information about the parent node that is being referenced. In some cases, messages from the control plane that are persisted in the storage node registrar store 250, the data object identifier, and/or other data structures may include digests that self-validate. These digests may be produced after the structures are created and/or verified upon retrieval or before the action. In some examples, this prevents things such as bugs in the code, memory corruption, or bit rot from flipping data object retrieve commands into delete commands. Further, on the return path, the archival data storage service may be configured to re-validate that the data being returned to the customer matches the request and/or that no substitution happens during execution due to a bug in the code or the like.
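As a rough, hypothetical illustration of this two-digest pattern, the sketch below uses compression as a stand-in for an arbitrary reversible transformation such as encryption or erasure coding; the variable names are illustrative only and do not reflect the service's internals.

    import hashlib
    import zlib

    original = b"archival payload" * 4096
    transformed = zlib.compress(original)     # stand-in for encryption or erasure coding

    # Two digests accompany the stored, transformed data: one testifies to the
    # transformed bytes and the other to the original bytes.
    digest_of_transformed = hashlib.sha256(transformed).hexdigest()
    digest_of_original = hashlib.sha256(original).hexdigest()

    # Re-validation: check the stored form, reverse the transformation, and check
    # that the recovered data still matches the digest of the original.
    assert hashlib.sha256(transformed).hexdigest() == digest_of_transformed
    recovered = zlib.decompress(transformed)
    assert hashlib.sha256(recovered).hexdigest() == digest_of_original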
FIGS. 10-12 illustrate example flow diagrams showing respective processes 1000-1200 for providing validation of data integrity. These processes are illustrated as logical flow diagrams, each operation of which represents a sequence of operations that can be implemented in hardware, computer instructions, or a combination thereof. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures and the like that perform particular functions or implement particular data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.
Additionally, some, any, or all of the processes may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. As noted above, the code may be stored on a computer-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable storage medium may be non-transitory.
In some aspects, the API request handler 218, the payload data cache 228, the storage node manager 244, and/or the storage nodes 246 of the one or more archival data storage services 206, shown in FIG. 2, may perform the process 1000 of FIG. 10. The process 1000 may begin by providing (e.g., to a user or client entity of the archival data storage service) a function call for requesting data storage at 1002. The function call may be part of an API or an SDK for interacting with and/or interfacing with the archival data storage service. At 1004, the process 1000 may include receiving a plurality of portions of a data payload from a remote computing device (i.e., the user). In some cases, the size of each portion may be consistent. In other cases, the size of each portion may be consistent except that the last portion may be different. Additionally, the size may be selected or otherwise determined by the user. At 1006, the process 1000 may include receiving an indication of the size of each portion. In some instances, the actions performed at 1004 and 1006 may be performed together as a single action. However, some restrictions may apply regarding portion size. For example, the portions may be limited to a consistent size (i.e., they may be required to be the same size); however, the last portion may be a remainder of the data payload (i.e., the payload minus each other consistently sized portion). The size selection may, for instance, be limited to 1 MB or an integer multiple of 1 MB. In other examples, the size may be limited to 1 MB or a power of two of 1 MB.
The process 1000 may also include generating one or more sub-portions of a predefined size for at least some of the portions of the payload at 1008. As noted above regarding the portion size, while the sub-portion size may be predefined and constant, the last sub-portion may be a different size. The predefined size may be 1 MB or any other size. At 1010, the process 1000 may include calculating one or more digests or hash values based at least in part on the sub-portions. The digests may be calculated based at least in part on a published and/or otherwise provided algorithm. In some examples, at 1012, the process 1000 may include generating a root node of a data structure. The data structure may be based at least in part on the sub-portion digests and/or aggregated digests, as described above. At 1014, the process 1000 may include determining a top-level digest of the data structure. The top-level digest may be based at least in part on the root node of the data structure and/or on a parent node associated with one of the portions of data. At 1016, the process 1000 may include providing instructions configured to enable the remote computing device to generate the data structure. In this way, the user may generate the data structure along with the archival data storage service. The process 1000 may then include receiving a top-level digest generated by the remote computing device at 1018. The process 1000 may end at 1020 by verifying that the stored data payload matches a received data payload. In other words, the process 1000 may validate or verify the integrity of the data.
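A minimal sketch of the verification at 1020 follows, under the same assumptions as the earlier tree-hash example (SHA-256 over 1 MB sub-portions, pairwise concatenate-and-hash folding); the function names are hypothetical, and this is only one possible construction of the data structure.

    import hashlib

    MB = 1024 * 1024

    def top_level_digest(portions):
        # Digest every 1 MB sub-portion of every received portion, in order, then
        # fold pairwise (concatenate two digests, hash) until a single digest remains.
        digests = [hashlib.sha256(portion[i:i + MB]).digest()
                   for portion in portions
                   for i in range(0, len(portion), MB)]
        while len(digests) > 1:
            paired = [hashlib.sha256(digests[i] + digests[i + 1]).digest()
                      for i in range(0, len(digests) - 1, 2)]
            if len(digests) % 2:
                paired.append(digests[-1])
            digests = paired
        return digests[0]

    def verify_upload(received_portions, remote_top_level_digest):
        # Corresponds to 1018-1020: compare the locally computed top-level digest
        # with the one supplied by the remote computing device.
        return top_level_digest(received_portions) == remote_top_level_digest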
FIG. 11 illustrates another example flow diagram showing process 1100 for validating the integrity of data. In some aspects, the API request handler 218, the payload data cache 228, the storage node manager 244, and/or the storage nodes 246 of the one or more archival data storage services 206 shown in FIG. 2 may perform the process 1100 of FIG. 11. The process 1100 may begin by receiving one or more parts of a data payload at 1102. As noted above, the parts may be any size. However, in some examples, the part size may be limited to 1 MB or multiples of 1 MB. In this way, a data structure may be composed independent of the chosen size. At 1104, the process 1100 may include generating a sub-part for the one or more parts. Again, these sub-parts may be any size or may be limited to 1 MB or another size limitation such as, but not limited to, 2 MB, 10 MB, etc. The process 1100 may include calculating a value based on the sub-part at 1106. The value may, in some cases, be a hash value, a digest, or another result of encryption. In some examples, the process 1100 may include generating a root node of a data structure at 1108. At 1110, the process 1100 may include determining a top-level value of the data structure based at least in part on traversing the data structure to the root node.
In some examples, the process 1100 may also include storing the data payload at 1112. The payload may be stored based at least in part on combining each of the one or more parts received at 1102. As such, in some cases, the archival data storage service 206 may not be able to store the payload at 1112 until the data transmission of all the parts is complete. At 1114, the process 1100 may include validating that the stored data payload matches the received data payload. This may be performed by comparing the received top-level value with a calculated top-level value. At 1116, the process 1100 may include providing a data object identifier including the top-level value. The identifier may later be utilized by the user to retrieve and/or delete the stored data payload. In some examples, the process 1100 may include receiving a request for the stored payload at 1118. The stored payload may be provided back to the user in a similar fashion (with the integrity of the data being validated each step of the way). However, in some cases, the process 1100 may end at 1120 by verifying that the stored data payload has not changed prior to providing the payload.
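One way to picture the self-describing identifier mentioned at 1116 is sketched below; the identifier layout (an opaque ID joined to the hex-encoded top-level digest) is purely an assumption for illustration and not the service's actual format.

    import uuid

    def make_object_identifier(top_level_digest):
        # Hypothetical layout: opaque identifier plus the embedded top-level digest.
        return f"{uuid.uuid4().hex}:{top_level_digest.hex()}"

    def verify_before_return(stored_top_level_digest, object_identifier):
        # Corresponds to 1118-1120: recompute the stored payload's top-level digest
        # and check it against the digest embedded in the identifier before
        # returning the data to the user.
        _, embedded_hex = object_identifier.split(":")
        return stored_top_level_digest.hex() == embedded_hex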
FIG. 12 illustrates another example flow diagram showing process 1200 for validating the integrity of data. In some aspects, the API request handler 218, the payload data cache 228, the storage node manager 244, and/or the storage nodes 246 of the one or more archival data storage services 206 shown in FIG. 2 may perform the process 1200 of FIG. 12. The process 1200 may begin by providing instructions for making method calls to perform operations on data at 1202. In some examples, these method calls may be exposed via one or more APIs or provided in one or more SDKs. At 1204, the process 1200 may include performing a first operation, using a verification algorithm, based on a first partitioning of a data object into first partitions. The first partitions, in some examples, may include 1 MB or other consistently sized chunks that may be utilized to generate a data structure such as, but not limited to, a hash tree or other binary tree of digests. In some examples, the first operation may include receiving the data from a user over a network. At 1206, the process 1200 may include verifying the data object to generate a first verification value (e.g., a hash code, checksum, etc.) based on the first partitions. The process 1200 may also include performing a second operation on the data object, utilizing the same verification algorithm, based at least in part on a second partitioning of the data object into second partitions at 1208. The second partitions may be a different size from the first partitions. Based at least in part on the second partitions, the process 1200 may include verifying the data object to generate a second verification value at 1210. Here, the second operation may also include transmitting data to the archival data storage service. The second verification value, like the first, may include, but is not limited to, a digest for a partition, a digest of digests formed by aggregating partition digests, and/or a top-level digest of a data structure. At 1212, the process 1200 may end by determining whether the second verification value equals the first verification value. This may be determined based at least in part on comparing the two values. In some examples, if the verification algorithm is properly performed, and the data has maintained its integrity, the two values are expected to be equal. That is, independent of the size of the two sets of partitions (i.e., the first partitioning and the second partitioning), the verification values should be equal.
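To make the partition-independence property concrete, the following self-contained sketch (illustrative only; the helper names are hypothetical) computes a top-level digest for a 7 MB payload in two ways: directly over its 1 MB chunks, and by combining per-part digests for a 4 MB part plus a 3 MB remainder. Under the assumed pairwise concatenate-and-hash folding, the two values come out identical.

    import hashlib
    import os

    MB = 1024 * 1024

    def chunk_digests(data, chunk_size=MB):
        return [hashlib.sha256(data[i:i + chunk_size]).digest()
                for i in range(0, len(data), chunk_size)]

    def fold(digests):
        # Pairwise concatenate-and-hash; an unpaired trailing digest is carried up.
        while len(digests) > 1:
            level = [hashlib.sha256(digests[i] + digests[i + 1]).digest()
                     for i in range(0, len(digests) - 1, 2)]
            if len(digests) % 2:
                level.append(digests[-1])
            digests = level
        return digests[0]

    payload = os.urandom(7 * MB)   # hypothetical 7 MB data object

    # First partitioning: the whole payload treated as a sequence of 1 MB chunks.
    first_value = fold(chunk_digests(payload))

    # Second partitioning: a 4 MB part plus a 3 MB remainder, digested per part
    # and then combined at the part level.
    parts = [payload[:4 * MB], payload[4 * MB:]]
    second_value = fold([fold(chunk_digests(p)) for p in parts])

    assert first_value == second_value   # same top-level digest despite different partitions

The composition works here because the 4 MB part boundary falls on a power-of-two multiple of 1 MB, consistent with the size restrictions discussed above; arbitrary, unaligned part sizes would not, in general, compose this way.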
Illustrative methods and systems for validating the integrity of data are described above. Some or all of these systems and methods may, but need not, be implemented at least partially by architectures such as those shown above.
FIG. 13 illustrates aspects of an example environment 1300 for implementing aspects in accordance with various embodiments. As will be appreciated, although a Web-based environment is used for purposes of explanation, different environments may be used, as appropriate, to implement various embodiments. The environment includes an electronic client device 1302, which can include any appropriate device operable to send and receive requests, messages or information over an appropriate network 1304 and convey information back to a user of the device. Examples of such client devices include personal computers, cell phones, handheld messaging devices, laptop computers, set top boxes, personal data assistants, electronic book readers, and the like. The network can include any appropriate network, including an intranet, the Internet, a cellular network, a local area network or any other such network or combination thereof. Components used for such a system can depend at least in part upon the type of network and/or environment selected. Protocols and components for communicating via such a network are well known and will not be discussed herein in detail. Communication over the network can be enabled by wired or wireless connections and combinations thereof. In this example, the network includes the Internet, as the environment includes a Web server 1306 for receiving requests and serving content in response thereto, although for other networks an alternative device serving a similar purpose could be used as would be apparent to one of ordinary skill in the art.
The illustrative environment includes at least one application server 1308 and a data store 1310. It should be understood that there can be several application servers, layers, or other elements, processes, or components, which may be chained or otherwise configured, which can interact to perform tasks such as obtaining data from an appropriate data store. As used herein the term “data store” refers to any device or combination of devices capable of storing, accessing, and retrieving data, which may include any combination and number of data servers, databases, data storage devices, and data storage media, in any standard, distributed or clustered environment. The application server can include any appropriate hardware and software for integrating with the data store as needed to execute aspects of one or more applications for the client device, handling a majority of the data access and business logic for an application. The application server provides access control services in cooperation with the data store, and is able to generate content such as text, graphics, audio, and/or video to be transferred to the user, which may be served to the user by the Web server in the form of HTML, XML, or another appropriate structured language in this example. The handling of all requests and responses, as well as the delivery of content between the client device 1302 and the application server 1308, can be handled by the Web server. It should be understood that the Web and application servers are not required and are merely example components, as structured code discussed herein can be executed on any appropriate device or host machine as discussed elsewhere herein.
The data store 1310 can include several separate data tables, databases or other data storage mechanisms and media for storing data relating to a particular aspect. For example, the data store illustrated includes mechanisms for storing production data 1312 and user information 1316, which can be used to serve content for the production side. The data store also is shown to include a mechanism for storing log data 1314, which can be used for reporting, analysis or other such purposes. It should be understood that there can be many other aspects that may need to be stored in the data store, such as page image information and access rights information, which can be stored in any of the above listed mechanisms as appropriate or in additional mechanisms in the data store 1310. The data store 1310 is operable, through logic associated therewith, to receive instructions from the application server 1308 and obtain, update, or otherwise process data in response thereto. In one example, a user might submit a search request for a certain type of item. In this case, the data store might access the user information to verify the identity of the user, and can access the catalog detail information to obtain information about items of that type. The information then can be returned to the user, such as in a results listing on a Web page that the user is able to view via a browser on the user device 1302. Information for a particular item of interest can be viewed in a dedicated page or window of the browser.
Each server typically will include an operating system that provides executable program instructions for the general administration and operation of that server, and typically will include a computer-readable storage medium (e.g., a hard disk, random access memory, read only memory, etc.) storing instructions that, when executed by a processor of the server, allow the server to perform its intended functions. Suitable implementations for the operating system and general functionality of the servers are known or commercially available, and are readily implemented by persons having ordinary skill in the art, particularly in light of the disclosure herein.
The environment in one embodiment is a distributed computing environment utilizing several computer systems and components that are interconnected via communication links, using one or more computer networks or direct connections. However, it will be appreciated by those of ordinary skill in the art that such a system could operate equally well in a system having fewer or a greater number of components than are illustrated in FIG. 13. Thus, the depiction of the system 1300 in FIG. 13 should be taken as being illustrative in nature, and not limiting to the scope of the disclosure.
The various embodiments further can be implemented in a wide variety of operating environments, which in some cases can include one or more user computers, computing devices or processing devices which can be used to operate any of a number of applications. User or client devices can include any of a number of general purpose personal computers, such as desktop or laptop computers running a standard operating system, as well as cellular, wireless, and handheld devices running mobile software and capable of supporting a number of networking and messaging protocols. Such a system also can include a number of workstations running any of a variety of commercially available operating systems and other known applications for purposes such as development and database management. These devices also can include other electronic devices, such as dummy terminals, thin-clients, gaming systems, and other devices capable of communicating via a network.
Most embodiments utilize at least one network that would be familiar to those skilled in the art for supporting communications using any of a variety of commercially available protocols, such as TCP/IP, OSI, FTP, UPnP, NFS, CIFS, and AppleTalk. The network can be, for example, a local area network, a wide-area network, a virtual private network, the Internet, an intranet, an extranet, a public switched telephone network, an infrared network, a wireless network, and any combination thereof.
In embodiments utilizing a Web server, the Web server can run any of a variety of server or mid-tier applications, including HTTP servers, FTP servers, CGI servers, data servers, Java servers, and business application servers. The server(s) also may be capable of executing programs or scripts in response to requests from user devices, such as by executing one or more Web applications that may be implemented as one or more scripts or programs written in any programming language, such as Java®, C, C#, or C++, or any scripting language, such as Perl, Python, or TCL, as well as combinations thereof. The server(s) may also include database servers, including without limitation those commercially available from Oracle®, Microsoft®, Sybase®, and IBM®.
The environment can include a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across the network. In a particular set of embodiments, the information may reside in a storage-area network (“SAN”) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers, servers or other network devices may be stored locally and/or remotely, as appropriate. Where a system includes computerized devices, each such device can include hardware elements that may be electrically coupled via a bus, the elements including, for example, at least one central processing unit (CPU), at least one input device (e.g., a mouse, keyboard, controller, touch screen, or keypad), and at least one output device (e.g., a display device, printer, or speaker). Such a system may also include one or more storage devices, such as disk drives, optical storage devices, and solid-state storage devices, such as random access memory (“RAM”) or read-only memory (“ROM”), as well as removable media devices, memory cards, flash cards, etc.
Such devices also can include a computer-readable storage media reader, a communications device (e.g., a modem, a network card (wireless or wired), an infrared communication device, etc.), and working memory, as described above. The computer-readable storage media reader can be connected with, or configured to receive, a computer-readable storage medium, representing remote, local, fixed and/or removable storage devices as well as storage media for temporarily and/or more permanently containing, storing, transmitting, and retrieving computer-readable information. The system and various devices also typically will include a number of software applications, modules, services, or other elements located within at least one working memory device, including an operating system and application programs, such as a client application or Web browser. It should be appreciated that alternate embodiments may have numerous variations from that described above. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software (including portable software, such as applets), or both. Further, connection to other computing devices such as network input/output devices may be employed.
Storage media and computer readable media for containing code, or portions of code, can include any appropriate media known or used in the art, including storage media and communication media, such as but not limited to volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information, such as computer readable instructions, data structures, program modules, or other data, including RAM, ROM, EEPROM, flash memory, or other memory technology, CD-ROM, digital versatile disk (DVD), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a system device. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.
The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the claims.
Other variations are within the spirit of the present disclosure. Thus, while the disclosed techniques are susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific form or forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions and equivalents falling within the spirit and scope of the invention, as defined in the appended claims.
The use of the terms “a” and “an” and “the” and similar referents in the context of describing the disclosed embodiments (especially in the context of the following claims) are to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. The term “connected” is to be construed as partly or wholly contained within, attached to, or joined together, even if there is something intervening. Recitation of ranges of values herein are merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.
Preferred embodiments of this disclosure are described herein, including the best mode known to the inventors for carrying out the invention. Variations of those preferred embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for the invention to be practiced otherwise than as specifically described herein. Accordingly, this invention includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.
All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.

Claims (20)

What is claimed is:
1. A computer-implemented method, comprising:
receiving a request to store a data object comprising a first plurality of partitions, each partition of the first plurality of partitions having a first partition size selected by a requestor associated with the request;
verifying the data object using a data verification algorithm to generate a first verification value based at least in part on the first partition size, wherein the first verification value is derived from a second verification value generated based at least in part on at least two of the first plurality of partitions;
storing the data object and the first verification value;
partitioning the data object into a second plurality of partitions, each partition of the second plurality of partitions having a second partition size different than the first partition size; and
verifying the data object using the data verification algorithm by at least:
generating a third verification value based at least on the second partition size; and
comparing the third verification value with the first verification value.
2. The computer-implemented method of claim 1, wherein the data verification algorithm generates the second plurality of partitions from the data object.
3. The computer-implemented method of claim 2, wherein the first partition size is an integer multiple of the second partition size.
4. The computer-implemented method of claim 3, wherein the integer multiple is based at least in part on integer exponentiation of a degree of a data structure associated with the data object.
5. The computer-implemented method of claim 3, wherein the data verification algorithm determines the integer multiple.
6. The computer-implemented method of claim 1, wherein the first partition size selected by the requestor is part of the request.
7. A system, comprising at least one computing device configured to implement one or more services, wherein the one or more services at least:
receive a selection for a first partition size for a data object;
generate, based at least in part on a first verification value generated based at least in part on at least two partitions of the data object partitioned according to the first partition size, a second verification value;
generate a third verification value for the data object according to a second partition size for the data object, the second partition size differing from the first partition size; and
provide, to an entity from which the selection was received, a verification of the data object based at least in part on a comparison of the second verification value and the third verification value.
8. The system of claim 7, wherein the one or more services further generate a first plurality of partitions according to the first partition size.
9. The system of claim 7, wherein the one or more services further generate a second plurality of partitions according to the second partition size.
10. The system of claim 7, wherein the second partition size is an integer multiple of the first partition size.
11. The system of claim 7, wherein the first partition size is an integer multiple of the second partition size.
12. The system of claim 7, wherein the one or more services further:
perform a first data operation based at least in part on the first partition size;
use the second verification value to verify successful completion of the first data operation; and
perform a second data operation based at least in part on the second partition size.
13. The system of claim 12, wherein the first data operation includes storing the data object in a first manner and the second data operation includes persistently storing the data object in a second manner different from the first manner.
14. A non-transitory computer-readable storage medium having stored thereon executable instructions that, as a result of being executed by one or more processors of a computer system, cause the computer system to at least:
obtain a first verification value for a data object according to a first partition size selected by a requestor;
use a data verification algorithm to generate a second verification value, the second verification value being derived from a third verification value generated based at least in part on at least two partitions of the data object partitioned based at least in part on a second partition size for the data object, the second partition size being different than the first partition size;
verify the data object based at least in part on whether the second verification value matches the first verification value; and
provide an outcome of verifying the data object to the requestor.
15. The non-transitory computer-readable storage medium of claim 14, wherein the second partition size is an integer multiple of the first partition size.
16. The non-transitory computer-readable storage medium of claim 15, wherein the integer multiple of the first partition size is based at least in part on integer exponentiation of a degree of a data structure associated with the data object.
17. The non-transitory computer-readable storage medium of claim 15, wherein the executable instructions further comprise instructions that, when executed by the one or more processors, cause the computer system to perform a data storage operation based at least in part on the second partition size, wherein the second verification value matching the first verification value indicates the data storage operation was successful.
18. The non-transitory computer-readable storage medium of claim 14, wherein the second verification value matches the first verification value as a result of the second verification value being equal to the first verification value.
19. The non-transitory computer-readable storage medium of claim 14, wherein the first verification value and the second verification value are top-level tree digests.
20. The non-transitory computer-readable storage medium of claim 14, wherein the first partition size is a uniform size of each partition of a plurality of partitions of the data object.
US15/286,4732012-08-082016-10-05Data storage integrity validationActiveUS10157199B2 (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
US15/286,473US10157199B2 (en)2012-08-082016-10-05Data storage integrity validation

Applications Claiming Priority (15)

Application NumberPriority DateFiling DateTitle
US13/570,029US9092441B1 (en)2012-08-082012-08-08Archival data organization and management
US13/569,714US9830111B1 (en)2012-08-082012-08-08Data storage space management
US13/569,591US9354683B2 (en)2012-08-082012-08-08Data storage power management
US13/570,030US9652487B1 (en)2012-08-082012-08-08Programmable checksum calculations on data storage devices
US13/570,057US10120579B1 (en)2012-08-082012-08-08Data storage management for sequentially written media
US13/569,665US8959067B1 (en)2012-08-082012-08-08Data storage inventory indexing
US13/570,092US9563681B1 (en)2012-08-082012-08-08Archival data flow management
US13/569,994US9213709B2 (en)2012-08-082012-08-08Archival data identification
US13/570,151US8805793B2 (en)2012-08-082012-08-08Data storage integrity validation
US13/570,088US9767098B2 (en)2012-08-082012-08-08Archival data storage system
US13/570,005US9250811B1 (en)2012-08-082012-08-08Data write caching for sequentially written media
US13/570,074US9225675B2 (en)2012-08-082012-08-08Data storage application programming interface
US13/569,984US9779035B1 (en)2012-08-082012-08-08Log-based data storage on sequentially written media
US14/456,844US9465821B1 (en)2012-08-082014-08-11Data storage integrity validation
US15/286,473US10157199B2 (en)2012-08-082016-10-05Data storage integrity validation

Related Parent Applications (1)

Application NumberTitlePriority DateFiling Date
US14/456,844ContinuationUS9465821B1 (en)2012-08-082014-08-11Data storage integrity validation

Publications (2)

Publication NumberPublication Date
US20170024428A1 US20170024428A1 (en)2017-01-26
US10157199B2true US10157199B2 (en)2018-12-18

Family

ID=50066958

Family Applications (3)

Application NumberTitlePriority DateFiling Date
US13/570,151Expired - Fee RelatedUS8805793B2 (en)2012-08-082012-08-08Data storage integrity validation
US14/456,844ActiveUS9465821B1 (en)2012-08-082014-08-11Data storage integrity validation
US15/286,473ActiveUS10157199B2 (en)2012-08-082016-10-05Data storage integrity validation

Family Applications Before (2)

Application NumberTitlePriority DateFiling Date
US13/570,151Expired - Fee RelatedUS8805793B2 (en)2012-08-082012-08-08Data storage integrity validation
US14/456,844ActiveUS9465821B1 (en)2012-08-082014-08-11Data storage integrity validation

Country Status (1)

CountryLink
US (3)US8805793B2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US10853166B2 (en)*2017-04-282020-12-01Netapp Inc.Object format resilient to remote object store errors
US11128535B2 (en)*2019-03-192021-09-21Hitachi, Ltd.Computer system and data management method
US11263349B2 (en)2019-12-232022-03-01Bank Of America CorporationSystem for discovery and analysis of software distributed across an electronic network platform
RU2785484C1 (en)*2021-12-072022-12-08федеральное государственное казенное военное образовательное учреждение высшего образования "Краснодарское высшее военное орденов Жукова и Октябрьской Революции Краснознаменное училище имени генерала армии С.М. Штеменко" Министерства обороны Российской ФедерацииMethod for cryptographic recursive integrity control of a relational database

Families Citing this family (143)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US9207964B1 (en)2012-11-152015-12-08Google Inc.Distributed batch matching of videos with dynamic resource allocation based on global score and prioritized scheduling score in a heterogeneous computing environment
US9992269B1 (en)*2013-02-252018-06-05EMC IP Holding Company LLCDistributed complex event processing
US9020893B2 (en)*2013-03-012015-04-28Datadirect Networks, Inc.Asynchronous namespace maintenance
US9329881B2 (en)*2013-04-232016-05-03Sap SeOptimized deployment of data services on the cloud
US9594766B2 (en)2013-07-152017-03-14International Business Machines CorporationReducing activation of similarity search in a data deduplication system
US10339109B2 (en)2013-07-152019-07-02International Business Machines CorporationOptimizing hash table structure for digest matching in a data deduplication system
US10229131B2 (en)*2013-07-152019-03-12International Business Machines CorporationDigest block segmentation based on reference segmentation in a data deduplication system
US10229132B2 (en)*2013-07-152019-03-12International Business Machines CorporationOptimizing digest based data matching in similarity based deduplication
US9836474B2 (en)2013-07-152017-12-05International Business Machines CorporationData structures for digests matching in a data deduplication system
US10789213B2 (en)2013-07-152020-09-29International Business Machines CorporationCalculation of digest segmentations for input data using similar data in a data deduplication system
US10296598B2 (en)2013-07-152019-05-21International Business Machines CorporationDigest based data matching in similarity based deduplication
US9529834B2 (en)*2014-02-262016-12-27International Business Machines CorporationConcatenating data objects for storage in a dispersed storage network
US10289713B2 (en)*2014-03-242019-05-14Ca, Inc.Logical validation for metadata builder
US9735967B2 (en)*2014-04-302017-08-15International Business Machines CorporationSelf-validating request message structure and operation
US9575800B2 (en)2014-05-132017-02-21International Business Machines CorporationUsing queues corresponding to attribute values and priorities associated with units of work and sub-units of the unit of work to select the units of work and their sub-units to process
US9563366B2 (en)*2014-05-132017-02-07International Business Machines CorporationUsing queues corresponding to attribute values associated with units of work and sub-units of the unit of work to select the units of work and their sub-units to process
US9921879B2 (en)2014-05-132018-03-20International Business Machines CorporationUsing queues corresponding to attribute values associated with units of work to select the units of work to process
US9361937B2 (en)*2014-08-262016-06-07Seagate Technology LlcShingled magnetic recording data store
US20160117318A1 (en)*2014-10-282016-04-28Salesforce.Com, Inc.Facilitating dynamically unified system of record in an on-demand services environment
US20160171071A1 (en)*2014-12-112016-06-16International Business Machines CorporationDynamic creation and configuration of partitioned index through analytics based on existing data population
US11188665B2 (en)*2015-02-272021-11-30Pure Storage, Inc.Using internal sensors to detect adverse interference and take defensive actions
US9710320B2 (en)2015-03-232017-07-18Microsoft Technology Licensing, LlcData processing validation
US10963430B2 (en)2015-04-012021-03-30Dropbox, Inc.Shared workspaces with selective content item synchronization
US9922201B2 (en)2015-04-012018-03-20Dropbox, Inc.Nested namespaces for selective content sharing
US10528530B2 (en)2015-04-082020-01-07Microsoft Technology Licensing, LlcFile repair of file stored across multiple data stores
EP3289483A1 (en)2015-05-012018-03-07Entit Software LLCSecure multi-party information retrieval
US10262024B1 (en)*2015-05-192019-04-16Amazon Technologies, Inc.Providing consistent access to data objects transcending storage limitations in a non-relational data store
US10270476B1 (en)2015-06-162019-04-23Amazon Technologies, Inc.Failure mode-sensitive layered redundancy coding techniques
US10977128B1 (en)2015-06-162021-04-13Amazon Technologies, Inc.Adaptive data loss mitigation for redundancy coding systems
US9998150B1 (en)*2015-06-162018-06-12Amazon Technologies, Inc.Layered data redundancy coding techniques for layer-local data recovery
US10270475B1 (en)2015-06-162019-04-23Amazon Technologies, Inc.Layered redundancy coding for encoded parity data
US10298259B1 (en)2015-06-162019-05-21Amazon Technologies, Inc.Multi-layered data redundancy coding techniques
US9838041B1 (en)*2015-06-172017-12-05Amazon Technologies, Inc.Device type differentiation for redundancy coded data storage systems
US10009044B1 (en)*2015-06-172018-06-26Amazon Technologies, Inc.Device type differentiation for redundancy coded data storage systems
US9825652B1 (en)2015-06-172017-11-21Amazon Technologies, Inc.Inter-facility network traffic optimization for redundancy coded data storage systems
US9866242B1 (en)2015-06-172018-01-09Amazon Technologies, Inc.Throughput optimization for redundancy coded data storage systems
US9853662B1 (en)2015-06-172017-12-26Amazon Technologies, Inc.Random access optimization for redundancy coded data storage systems
US9838042B1 (en)2015-06-172017-12-05Amazon Technologies, Inc.Data retrieval optimization for redundancy coded data storage systems with static redundancy ratios
US10311020B1 (en)2015-06-172019-06-04Amazon Technologies, Inc.Locality-sensitive data retrieval for redundancy coded data storage systems
US10162704B1 (en)2015-07-012018-12-25Amazon Technologies, Inc.Grid encoded data storage systems for efficient data repair
US10089176B1 (en)2015-07-012018-10-02Amazon Technologies, Inc.Incremental updates of grid encoded data storage systems
US10198311B1 (en)2015-07-012019-02-05Amazon Technologies, Inc.Cross-datacenter validation of grid encoded data storage systems
US9959167B1 (en)2015-07-012018-05-01Amazon Technologies, Inc.Rebundling grid encoded data storage systems
US9998539B1 (en)2015-07-012018-06-12Amazon Technologies, Inc.Non-parity in grid encoded data storage systems
US10108819B1 (en)2015-07-012018-10-23Amazon Technologies, Inc.Cross-datacenter extension of grid encoded data storage systems
US10394762B1 (en)2015-07-012019-08-27Amazon Technologies, Inc.Determining data redundancy in grid encoded data storage systems
US9904589B1 (en)2015-07-012018-02-27Amazon Technologies, Inc.Incremental media size extension for grid encoded data storage systems
CN106549990A (en)*2015-09-182017-03-29阿里巴巴集团控股有限公司A kind of processing method and system of distributed data
US9928141B1 (en)2015-09-212018-03-27Amazon Technologies, Inc.Exploiting variable media size in grid encoded data storage systems
US11386060B1 (en)2015-09-232022-07-12Amazon Technologies, Inc.Techniques for verifiably processing data in distributed computing systems
US9940474B1 (en)*2015-09-292018-04-10Amazon Technologies, Inc.Techniques and systems for data segregation in data storage systems
US9571573B1 (en)2015-10-292017-02-14Dropbox, Inc.Peer-to-peer synchronization protocol for multi-premises hosting of digital content items
US10691718B2 (en)2015-10-292020-06-23Dropbox, Inc.Synchronization protocol for multi-premises hosting of digital content items
WO2017082875A1 (en)2015-11-102017-05-18Hewlett Packard Enterprise Development LpData allocation based on secure information retrieval
US10346424B2 (en)*2015-12-012019-07-09International Business Machines CorporationObject processing
US10394789B1 (en)2015-12-072019-08-27Amazon Technologies, Inc.Techniques and systems for scalable request handling in data processing systems
US9785495B1 (en)2015-12-142017-10-10Amazon Technologies, Inc.Techniques and systems for detecting anomalous operational data
US10642813B1 (en)2015-12-142020-05-05Amazon Technologies, Inc.Techniques and systems for storage and processing of operational data
US10248793B1 (en)2015-12-162019-04-02Amazon Technologies, Inc.Techniques and systems for durable encryption and deletion in data storage systems
US10324790B1 (en)2015-12-172019-06-18Amazon Technologies, Inc.Flexible data storage device mapping for data storage systems
US10235402B1 (en)2015-12-172019-03-19Amazon Technologies, Inc.Techniques for combining grid-encoded data storage systems
US10127105B1 (en)2015-12-172018-11-13Amazon Technologies, Inc.Techniques for extending grids in data storage systems
US10102065B1 (en)2015-12-172018-10-16Amazon Technologies, Inc.Localized failure mode decorrelation in redundancy encoded data storage systems
US10180912B1 (en)2015-12-172019-01-15Amazon Technologies, Inc.Techniques and systems for data segregation in redundancy coded data storage systems
US11468053B2 (en)*2015-12-302022-10-11Dropbox, Inc.Servicing queries of a hybrid event index
US10114550B2 (en)2016-01-072018-10-30Samsung Electronics Co., Ltd.Data storage device and data processing system including the data storage device
US9537952B1 (en)2016-01-292017-01-03Dropbox, Inc.Apparent cloud access for hosted content items
KR101772554B1 (en)*2016-02-022017-08-30주식회사 코인플러그Method and server for providing notary service with respect to file and verifying the recorded file by using the notary service
US10791109B2 (en)*2016-02-102020-09-29Red Hat, Inc.Certificate based expiration of file system objects
US10649846B2 (en)*2016-02-122020-05-12Red Hat, Inc.Disassembly and reassembly of a tar archive
US10592336B1 (en)2016-03-242020-03-17Amazon Technologies, Inc.Layered indexing for asynchronous retrieval of redundancy coded data
US10621041B2 (en)2016-03-252020-04-14Intel CorporationMethods and apparatus to assign indices and relocate object fragments in distributed storage systems
US10678664B1 (en)2016-03-282020-06-09Amazon Technologies, Inc.Hybridized storage operation for redundancy coded data storage systems
US10061668B1 (en)2016-03-282018-08-28Amazon Technologies, Inc.Local storage clustering for redundancy coded data storage system
US10366062B1 (en)2016-03-282019-07-30Amazon Technologies, Inc.Cycled clustering for redundancy coded data storage systems
US10511598B2 (en)*2016-03-292019-12-17Intel CorporationTechnologies for dynamic loading of integrity protected modules into secure enclaves
CN106230880B (en)*2016-07-122019-05-28何晓行A kind of storage method and application server of data
AT518910B1 (en)*2016-08-042018-10-15Ait Austrian Inst Tech Gmbh Method for checking the availability and integrity of a distributed data object
US11137980B1 (en)2016-09-272021-10-05Amazon Technologies, Inc.Monotonic time-based data storage
US11080301B2 (en)2016-09-282021-08-03Hewlett Packard Enterprise Development LpStorage allocation based on secure data comparisons via multiple intermediaries
US11204895B1 (en)2016-09-282021-12-21Amazon Technologies, Inc.Data payload clustering for data storage systems
US10657097B1 (en)2016-09-282020-05-19Amazon Technologies, Inc.Data payload aggregation for data storage systems
US11281624B1 (en)2016-09-282022-03-22Amazon Technologies, Inc.Client-based batching of data payload
US10810157B1 (en)2016-09-282020-10-20Amazon Technologies, Inc.Command aggregation for data storage operations
US10437790B1 (en)2016-09-282019-10-08Amazon Technologies, Inc.Contextual optimization for data storage systems
US10496327B1 (en)2016-09-282019-12-03Amazon Technologies, Inc.Command parallelization for data storage systems
US10614239B2 (en)2016-09-302020-04-07Amazon Technologies, Inc.Immutable cryptographically secured ledger-backed databases
US10505729B2 (en)*2016-11-092019-12-10Sap SeSecure database featuring separate operating system user
US10296764B1 (en)2016-11-182019-05-21Amazon Technologies, Inc.Verifiable cryptographically secured ledgers for human resource systems
US11269888B1 (en)2016-11-282022-03-08Amazon Technologies, Inc.Archival data storage for structured data
CN107016041B (en)*2017-01-192020-05-05阿里巴巴集团控股有限公司Method and device for controlling single data to be exported
US10909097B2 (en)2017-02-052021-02-02Veritas Technologies LlcMethod and system for dependency analysis of workloads for orchestration
US12326841B1 (en)*2017-03-152025-06-10Amazon Technologies, Inc.Background incremental deletion cleanup techniques at storage services
US10621055B2 (en)2017-03-282020-04-14Amazon Technologies, Inc.Adaptive data recovery for clustered data devices
US11356445B2 (en)2017-03-282022-06-07Amazon Technologies, Inc.Data access interface for clustered devices
US10530752B2 (en)2017-03-282020-01-07Amazon Technologies, Inc.Efficient device provision
US10552389B2 (en)*2017-04-282020-02-04Oath Inc.Object and sequence number management
US10545826B2 (en)*2017-05-252020-01-28Scality, S.A.Layered error correction encoding for large scale distributed object storage system
US10853307B2 (en)*2017-07-282020-12-01Dell Products, L.P.System and method for a host application to access and verify contents within non-volatile storage of an information handling system
CN107480076A (en)*2017-07-312017-12-15北京小米移动软件有限公司Protection processing method, device and the terminal of system partitioning
US11258824B1 (en)2017-08-022022-02-22Styra, Inc.Method and apparatus for authorizing microservice APIs
US10922303B1 (en)*2017-08-172021-02-16Amazon Technologies, Inc.Early detection of corrupt data partition exports
US10740306B1 (en)*2017-12-042020-08-11Amazon Technologies, Inc.Large object partitioning system
US10715184B2 (en)*2017-12-112020-07-14Rubrik, Inc.Techniques for fast IO and low memory consumption while using erasure codes
CN110019363A (en)*2017-12-112019-07-16北京京东尚科信息技术有限公司A kind of method and apparatus verifying data
US10713231B2 (en)*2017-12-192020-07-14Mastercard International IncorporatedSystems and methods for evaluating data included in disparate databases and/or data structures
US11120052B1 (en)*2018-06-282021-09-14Amazon Technologies, Inc.Dynamic distributed data clustering using multi-level hash trees
US10719373B1 (en)*2018-08-232020-07-21Styra, Inc.Validating policies and data in API authorization system
US11853463B1 (en)2018-08-232023-12-26Styra, Inc.Leveraging standard protocols to interface unmodified applications and services
US11080410B1 (en)2018-08-242021-08-03Styra, Inc.Partial policy evaluation
US11470121B1 (en)2018-10-162022-10-11Styra, Inc.Deducing policies for authorizing an API
US10754736B2 (en)*2018-10-252020-08-25EMC IP Holding Company LLCStorage system with scanning and recovery of internal hash metadata structures
US11294805B2 (en)*2019-04-112022-04-05EMC IP Holding Company LLCFast and safe storage space reclamation for a data storage system
US11593525B1 (en)2019-05-102023-02-28Styra, Inc.Portable policy execution using embedded machines
US11182097B2 (en)*2019-05-142021-11-23International Business Machines CorporationLogical deletions in append only storage devices
US11055112B2 (en)2019-09-272021-07-06Amazon Technologies, Inc.Inserting executions of owner-specified code into input/output path of object storage service
US10908927B1 (en)2019-09-272021-02-02Amazon Technologies, Inc.On-demand execution of object filter code in output path of object storage service
US11106477B2 (en)*2019-09-272021-08-31Amazon Technologies, Inc.Execution of owner-specified code during input/output path to object storage service
US11360948B2 (en)2019-09-272022-06-14Amazon Technologies, Inc.Inserting owner-specified data processing pipelines into input/output path of object storage service
US11263220B2 (en)2019-09-272022-03-01Amazon Technologies, Inc.On-demand execution of object transformation code in output path of object storage service
US11656892B1 (en)2019-09-272023-05-23Amazon Technologies, Inc.Sequential execution of user-submitted code and native functions
US11386230B2 (en)2019-09-272022-07-12Amazon Technologies, Inc.On-demand code obfuscation of data in input path of object storage service
US11023311B2 (en)2019-09-272021-06-01Amazon Technologies, Inc.On-demand code execution in input path of data uploaded to storage service in multiple data portions
US11416628B2 (en)2019-09-272022-08-16Amazon Technologies, Inc.User-specific data manipulation system for object storage service based on user-submitted code
US10996961B2 (en)2019-09-272021-05-04Amazon Technologies, Inc.On-demand indexing of data in input path of object storage service
US11250007B1 (en)2019-09-272022-02-15Amazon Technologies, Inc.On-demand execution of object combination code in output path of object storage service
US11394761B1 (en)2019-09-272022-07-19Amazon Technologies, Inc.Execution of user-submitted code on a stream of data
US11550944B2 (en)2019-09-272023-01-10Amazon Technologies, Inc.Code execution environment customization system for object storage service
CN110719522B (en)*2019-10-312021-12-24广州视源电子科技股份有限公司Video display method and device, storage medium and electronic equipment
US11290531B2 (en)2019-12-042022-03-29Dropbox, Inc.Immediate cloud content item creation from local file system interface
US11582235B1 (en)2020-01-272023-02-14Styra, Inc.Local controller for local API authorization method and apparatus
WO2021188604A1 (en)*2020-03-172021-09-23Centerboard, LlcDigital file forensic accounting and management system
US20210303633A1 (en)*2020-03-302021-09-30International Business Machines CorporationShard hashing
US11928030B2 (en)*2020-03-312024-03-12Veritas Technologies LlcOptimize backup from universal share
US11626998B2 (en)*2020-07-212023-04-11Servicenow, Inc.Validated payload execution
US12003543B1 (en)2020-07-242024-06-04Styra, Inc.Method and system for modifying and validating API requests
US11593363B1 (en)2020-09-232023-02-28Styra, Inc.Comprehension indexing feature
US12216549B2 (en)*2020-10-232025-02-04EMC IP Holding Company LLCCloud-based processing of backup data for storage onto various types of object storage systems
US11520579B1 (en)2020-11-302022-12-06Styra, Inc.Automated asymptotic analysis
US20220237191A1 (en)*2021-01-252022-07-28Salesforce.Com, Inc.System and method for supporting very large data sets in databases
CN114237967A (en)*2022-02-222022-03-25阿里云计算有限公司Data reconstruction method and device
DE102022205719B3 (en)*2022-06-032023-11-09Siemens Healthcare Gmbh Method and device for trustworthy provision of data elements and method for checking a data record with multiple data elements
US20240354005A1 (en)*2023-04-242024-10-24Micron Technology, Inc.Managed non-volatile memory device with data verification

Citations (197)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JPH05113963A (en)1991-10-231993-05-07Nec CorpJob state display system
US5239640A (en)1991-02-011993-08-24International Business Machines CorporationData storage system and method including data and checksum write staging storage
JPH06149739A (en)1992-11-111994-05-31Hitachi Ltd How to confirm acceptance of job execution
US5506809A (en)1994-06-291996-04-09Sharp Kabushiki KaishaPredictive status flag generation in a first-in first-out (FIFO) memory device method and apparatus
US5586291A (en)1994-12-231996-12-17Emc CorporationDisk controller with volatile and non-volatile cache memories
US5701407A (en)1992-08-261997-12-23Mitsubishi Denki Kabushiki KaishaRedundant array of disks with improved data reconstruction
US5751997A (en)1993-01-211998-05-12Apple Computer, Inc.Method and apparatus for transferring archival data among an arbitrarily large number of computer devices in a networked computer environment
JPH10261075A (en)1997-03-211998-09-29Matsushita Electric Ind Co Ltd Image buffer control device
JPH1124997A (en)1997-06-301999-01-29Hitachi Haisofuto:KkSecurity method for recording computer generated file and computer readable recording medium to store security program
US5900007A (en)1992-12-171999-05-04International Business Machines CorporationData storage disk array having a constraint function for spatially dispersing disk files in the disk array
JPH11259321A (en)1997-10-311999-09-24Hewlett Packard Co <Hp>Method for redundantly encoding data
JP2000023075A (en)1998-07-032000-01-21Hitachi Ltd Digital video / audio recording / reproducing device
US6023710A (en)1997-12-232000-02-08Microsoft CorporationSystem and method for long-term administration of archival storage
US6138126A (en)1995-05-312000-10-24Network Appliance, Inc.Method for allocating files in a file system integrated with a raid disk sub-system
US6208999B1 (en)1996-12-122001-03-27Network Associates, Inc.Recoverable computer file system with a signature area containing file integrity information located in the storage blocks
WO2002027489A2 (en)2000-09-282002-04-04Curl CorporationPersistent data storage for client computer software programs
US6374264B1 (en)1998-09-042002-04-16Lucent Technologies Inc.Method and apparatus for detecting and recovering from data corruption of a database via read prechecking and deferred maintenance of codewords
US20020055942A1 (en)2000-10-262002-05-09Reynolds Mark L.Creating, verifying, managing, and using original digital files
US20020091903A1 (en)2001-01-092002-07-11Kabushiki Kaisha ToshibaDisk control system and method
US20020103815A1 (en)2000-12-122002-08-01Fresher Information CorporationHigh speed data updates implemented in an information storage and retrieval system
US20020122203A1 (en)2001-03-022002-09-05Hiroshi MatsudaImage processing device, information processing method, and control program
JP2002278844A (en)2001-01-192002-09-27Xerox CorpMethod/system for protecting electronic document, and confidential connect object
US20020161972A1 (en)2001-04-302002-10-31Talagala Nisha D.Data storage array employing block checksums and dynamic striping
KR20020088574A (en)2001-05-182002-11-29엘지전자 주식회사Memory card to adapt digital player and file write/read method thereof
US20020186844A1 (en)2000-12-182002-12-12Levy Kenneth L.User-friendly rights management systems and methods
US20030033308A1 (en)2001-08-032003-02-13Patel Sujal M.System and methods for providing a distributed file system utilizing metadata to track information about data stored throughout the system
US6543029B1 (en)1999-09-292003-04-01Emc CorporationError corrector
US6578127B1 (en)1996-04-022003-06-10Lexar Media, Inc.Memory devices
US6604224B1 (en)1999-03-312003-08-05Diva Systems CorporationMethod of performing content integrity analysis of a data stream
US20030149717A1 (en)2002-02-052003-08-07William HeinzmanBatch processing job streams using and/or precedence logic
US6606629B1 (en)2000-05-172003-08-12Lsi Logic CorporationData structures containing sequence and revision number metadata used in mass storage data integrity-assuring technique
US20040003272A1 (en)2002-06-282004-01-01International Business Machines CorporationDistributed autonomic backup
CN1487451A (en)2003-08-122004-04-07上海交通大学 Streaming Media Retrieval System in the Field of Distance Education Based on MPEG-7
US20040098565A1 (en)1998-12-302004-05-20Joseph RohlmanInstruction queue for an instruction pipeline
US6747825B1 (en)1999-08-272004-06-08Jpmorgan Chase Bank, As Collateral AgentDisc drive with fake defect entries
US6768863B2 (en)1999-02-182004-07-27Kabushiki Kaisha ToshibaRecording medium of stream data, and recording method and playback method of the same
CN1534949A (en)2003-03-272004-10-06国际商业机器公司Method and equipment used for obtaining state information in network
US20040243737A1 (en)2003-05-282004-12-02International Business Machines CorporationMethod, apparatus and program storage device for providing asynchronous status messaging in a data storage system
US20050050342A1 (en)2003-08-132005-03-03International Business Machines CorporationSecure storage utility
JP2005122311A (en)2003-10-142005-05-12Nippon Telegraph & Telephone East Corp Advertisement presenting method, apparatus and program
CN1619479A (en)2003-07-312005-05-25株式会社理光Printing processing device and method thereof
US20050114338A1 (en)2003-11-262005-05-26Veritas Operating Corp.System and method for determining file system data integrity
US20050160427A1 (en)2003-12-162005-07-21Eric UstarisSystem and method for managing log files
US20050187897A1 (en)2004-02-112005-08-25Microsoft CorporationSystem and method for switching a data partition
US20050203976A1 (en)2004-02-112005-09-15University Of CaliforniaSystems, tools and methods for transferring files and metadata to and from a storage means
US6950967B1 (en)2001-09-262005-09-27Maxtor CorporationMethod and apparatus for manufacture test processing a disk drive installed in a computer system
US6959326B1 (en)2000-08-242005-10-25International Business Machines CorporationMethod, system, and program for gathering indexable metadata on content at a data repository
US20050262378A1 (en)2004-05-032005-11-24Microsoft CorporationSystems and methods for automatic maintenance and repair of enitites in a data model
US20050267935A1 (en)1999-06-112005-12-01Microsoft CorporationData driven remote device control model with general programming interface-to-network messaging adaptor
US20060005074A1 (en)1993-04-232006-01-05Moshe YanaiRemote data mirroring
US20060015529A1 (en)2004-07-152006-01-19Hitachi, Ltd.Method and apparatus of hierarchical storage management based on data value
US20060020594A1 (en)*2004-07-212006-01-26Microsoft CorporationHierarchical drift detection of data sets
US20060095741A1 (en)2004-09-102006-05-04Cavium NetworksStore instruction ordering for multi-core processor
US20060107266A1 (en)2003-12-042006-05-18The Mathworks, Inc.Distribution of job in a portable format in distributed computing environments
CN1799051A (en)2003-06-032006-07-05株式会社爱可信Method for browsing contents using page storing file
US7076604B1 (en)2002-12-242006-07-11Western Digital Technologies, Inc.Disk drive employing a disk command data structure for tracking a write verify status of a write command
US20060190510A1 (en)2005-02-232006-08-24Microsoft CorporationWrite barrier for data storage integrity
JP2006285969A (en)2005-03-042006-10-19Sharp Corp Authentication method, authentication system, remote computing device, communication program and recording medium thereof
US20060272023A1 (en)1998-11-162006-11-30Yonah SchmeidlerMethod and apparatus for secure content delivery over broadband access networks
US20070011472A1 (en)2005-07-052007-01-11Ju-Soft Co., LtdSecure power-saving harddisk storage system and method
US20070050479A1 (en)2005-08-242007-03-01Sony CorporationContent receiving apparatus and content receiving method
US20070079087A1 (en)2005-09-292007-04-05Copan Systems, Inc.User interface for archival storage of data
US20070101095A1 (en)2005-10-272007-05-03Sandisk CorporationMethods for adaptively handling data writes in non-volatile memories
KR20070058281A (en)2004-03-312007-06-08마이크로소프트 코포레이션 System and method for consistency check of database backup
US20070156842A1 (en)2005-12-292007-07-05Vermeulen Allan HDistributed storage system with web services client interface
US20070174362A1 (en)2006-01-182007-07-26Duc PhamSystem and methods for secure digital data archiving and access auditing
US20070198789A1 (en)2003-05-062007-08-23Aptare, Inc.System to capture, transmit and persist backup and recovery meta data
US7269733B1 (en)2003-04-102007-09-11Cisco Technology, Inc.Reliable embedded file content addressing
CN101043372A (en)2006-03-222007-09-26比特福恩公司Equipment simple document search of management network
JP2007257566A (en)2006-03-272007-10-04Fujitsu Ltd Hash value generation program, storage management program, and storage system
US20070250674A1 (en)2006-04-252007-10-25Fineberg Samuel AMethod and system for scaleable, distributed, differential electronic-data backup and archiving
JP2007299308A (en)2006-05-022007-11-15Ricoh Co Ltd Job processing system, job processing method, program, and recording medium
US20070266037A1 (en)2004-11-052007-11-15Data Robotics IncorporatedFilesystem-Aware Block Storage System, Apparatus, and Method
US20070282969A1 (en)2006-06-012007-12-06Bradley DietrichMethods and apparatus for transferring media across a network using a network interface device
US7340490B2 (en)2001-07-132008-03-04Sun Microsystems, Inc.Storage network data replicator
US20080059483A1 (en)2006-08-312008-03-06Realnetworks, Inc.Api-accessible media distribution system
US20080068899A1 (en)2006-09-152008-03-20Fujitsu LimitedStorage management process, storage management apparatus, and computer-readable medium storing storage management program
US20080109478A1 (en)2006-11-022008-05-08Fujitsu LimitedDigital-content retrieval apparatus, digital-content retrieval method, and computer product
US20080120164A1 (en)2006-11-172008-05-22Avaya Technology LlcContact center agent work awareness algorithm
US20080168108A1 (en)2007-01-042008-07-10Hitachi Global Storage Technologies Netherlands, B.V.Techniques For Improving The Reliability of File Systems
US20080177697A1 (en)2005-03-172008-07-24International Business Machines CorporationMonitoring usage of components in a database index
US7409495B1 (en)2004-12-222008-08-05Symantec Operating CorporationMethod and apparatus for providing a temporal storage appliance with block virtualization in storage networks
US20080212225A1 (en)2007-02-212008-09-04Sony CorporationInformation processing apparatus, information recording medium, and information processing method, and computer program
US20080235485A1 (en)2007-03-202008-09-25Advanced Micro Devices, Inc.ECC implementation in non-ECC components
US20080285366A1 (en)2006-04-142008-11-20Advantest CorporationTest apparatus, program, and test method
US20080294764A1 (en)2007-05-212008-11-27Fujitsu LimitedStorage medium bearing hba information provision program, hba information provision method and hba information provision apparatus
JP2008299396A (en)2007-05-292008-12-11On Site:KkIntroduction support device, program for introduction support device and introduction support method
US20090013123A1 (en)2007-07-022009-01-08Chun-Yu HsiehStorage Bridge and Storage Device and Method Applying the Storage Bridge
US7487316B1 (en)2001-09-172009-02-03Rockwell Automation Technologies, Inc.Archive and restore system and methodology for on-line edits utilizing non-volatile buffering
US7487385B2 (en)2004-11-012009-02-03Netapp, Inc.Apparatus and method for recovering destroyed data volumes
US20090070537A1 (en)2007-09-102009-03-12Cho Chung HeeMethod and apparatus for formatting storage medium
US20090083476A1 (en)2007-09-212009-03-26Phison Electronics Corp.Solid state disk storage system with parallel accesssing architecture and solid state disck controller
US20090113167A1 (en)2007-10-252009-04-30Peter Thomas CambleData processing apparatus and method of processing data
US20090132676A1 (en)2007-11-202009-05-21Mediatek, Inc.Communication device for wireless virtual storage and method thereof
US20090150641A1 (en)2007-12-062009-06-11David FlynnApparatus, system, and method for efficient mapping of virtual and physical addresses
US20090157700A1 (en)2007-12-122009-06-18International Business Machines CorporationGenerating unique object identifiers for network management objects
US20090164506A1 (en)2007-12-192009-06-25Casdex, Inc.System and Method for Content-Based Email Authentication
CN101477543A (en)2008-01-032009-07-08埃森哲环球服务有限公司System and method for automating ETL application
CN101496005A (en)2005-12-292009-07-29亚马逊科技公司Distributed storage system with web services client interface
US20090193223A1 (en)2008-01-242009-07-30George SalibaMethods and systems for vectored data de-duplication
US20090198736A1 (en)*2008-01-312009-08-06Jinmei ShenTime-Based Multiple Data Partitioning
US20090198889A1 (en)2008-02-052009-08-06Sony CorporationRecording apparatus, recording method, program for recording method, and storage medium that stores program for recording method
US7577689B1 (en)2005-06-152009-08-18Adobe Systems IncorporatedMethod and system to archive data
US20090213487A1 (en)2008-02-222009-08-27International Business Machines CorporationEfficient method to detect disk write errors
US20090234883A1 (en)2008-03-112009-09-17Hurst Lawrence AFlexible and resilient information collaboration management infrastructure
US20090240750A1 (en)2008-03-242009-09-24Samsung Electronics Co., Ltd.Memory system and data access method
US20090254572A1 (en)2007-01-052009-10-08Redlich Ron MDigital information infrastructure and method
US20090265568A1 (en)2008-04-212009-10-22Cluster Resources, Inc.System and method for managing energy consumption in a compute environment
US20090300403A1 (en)2008-05-302009-12-03Mark Cameron LittleFine grained failure detection in distributed computing
US7644061B1 (en)2003-12-042010-01-05Sprint Communications Company L.P.Managing resource indexes in a networking environment
US20100017446A1 (en)2008-07-172010-01-21Samsung Electronics Co., Ltd.File system configuration method and apparatus for data security and for accessing same, and storage device accessed by same
US20100037056A1 (en)2008-08-072010-02-11Follis Benjamin DMethod to support privacy preserving secure data management in archival systems
US7685309B2 (en)1999-12-102010-03-23Sun Microsystems, Inc.System and method for separating addresses from the delivery scheme in a virtual private network
US20100094819A1 (en)2008-10-102010-04-15Sap AgConcurrent collaborative process for data management and retrieval
US7730071B2 (en)2006-09-132010-06-01Hitachi, Ltd.Data management system and data management method
US20100169544A1 (en)2008-12-312010-07-01Eom Young-IkMethods for distributing log block associativity for real-time system and flash memory devices performing the same
US7774466B2 (en)2002-10-172010-08-10Intel CorporationMethods and apparatus for load balancing storage nodes in a distributed storage area network system
US7783600B1 (en)2006-02-272010-08-24Symantec Operating CorporationRedundancy management service for peer-to-peer networks
US20100217927A1 (en)2004-12-212010-08-26Samsung Electronics Co., Ltd.Storage device and user device including the same
US20100223259A1 (en)2007-10-052010-09-02Aharon Ronen MizrahiSystem and method for enabling search of content
US20100228711A1 (en)2009-02-242010-09-09Microsoft CorporationEnterprise Search Method and System
US20100235409A1 (en)2009-03-102010-09-16Global Relay Communications Inc.System and method for managing data stored in a data network
US20100242096A1 (en)2009-03-202010-09-23Prakash VaradharajanManaging connections in a data storage system
US7814078B1 (en)2005-06-202010-10-12Hewlett-Packard Development Company, L.P.Identification of files with similar content
US7827201B1 (en)2007-04-272010-11-02Network Appliance, Inc.Merging containers in a multi-container system
JP2010251877A (en)2009-04-132010-11-04Hitachi Kokusai Electric Inc Signature device
US7840878B1 (en)2006-04-112010-11-23Marvell International Ltd.Systems and methods for data-path protection
WO2010151813A1 (en)2009-06-262010-12-29Simplivt CorporationFile system
US20110026942A1 (en)2009-07-282011-02-03Canon Kabushiki KaishaMonitoring apparatus and method for the same
US20110035757A1 (en)2006-04-282011-02-10Michael ComerSystem and method for management of jobs in a cluster environment
JP2011043968A (en)2009-08-202011-03-03Hitachi Solutions LtdBatch job processing device and batch job processing system
US20110058277A1 (en)2009-09-092011-03-10De La Fuente Anton RAsymmetric writer for shingled magnetic recording
US20110060775A1 (en)2007-12-282011-03-10Nokia CorporationFile storage method and system
US20110071988A1 (en)2007-10-092011-03-24Cleversafe, Inc.Data revision synchronization in a dispersed storage network
US20110078407A1 (en)2009-09-252011-03-31Russell Lee LewisDetermining an end of valid log in a log of write records
US20110099324A1 (en)2009-10-282011-04-28Phison Electronics Corp.Flash memory storage system and flash memory controller and data processing method thereof
US7937369B1 (en)2005-09-302011-05-03Emc CorporationData mover discovery of object extent
JP2011518381A (en)2008-04-062011-06-23フュージョン−アイオー・インコーポレーテッド Apparatus, system and method for verifying that a correct data segment is read from a data storage device
US20110161679A1 (en)2009-12-292011-06-30Cleversafe, Inc.Time based dispersed storage access
US8006125B1 (en)2005-04-292011-08-23Microsoft CorporationAutomatic detection and recovery of corrupt disk metadata
JP2011170667A (en)2010-02-192011-09-01Nec CorpFile-synchronizing system, file synchronization method, and file synchronization program
US8015158B1 (en)2007-04-232011-09-06Netapp, Inc.Copy-less restoring of transaction files of a database system
US8019925B1 (en)2004-05-062011-09-13Seagate Technology LlcMethods and structure for dynamically mapped mass storage device
US20110225417A1 (en)2006-12-132011-09-15Kavi MaharajhDigital rights management in a mobile environment
US20110231597A1 (en)2010-03-172011-09-22Phison Electronics Corp.Data access method, memory controller and memory storage system
US20110247074A1 (en)2010-03-302011-10-06Manring Bradley A CMetadata-based access, security, and compliance control of software generated files
JP2011197977A (en)2010-03-192011-10-06Nec CorpStorage system
US20110246716A1 (en)2010-03-302011-10-06Lenovo (Singapore) Pte, Ltd.Concatenating a first raid with a second raid
US20110258630A1 (en)2010-04-202011-10-20Salesforce.Com, Inc.Methods and systems for batch processing in an on-demand service environment
US20110264717A1 (en)2010-04-262011-10-27Cleversafe, Inc.Storage and retrieval of required slices in a dispersed storage network
US20110276656A1 (en)2010-05-052011-11-10The Go Daddy Group, Inc.Writing a file to a cloud storage solution
US8060473B1 (en)2006-01-172011-11-15Symantec Operating CorporationSystem and method for conveying backup and restore data via email
US20110282839A1 (en)2010-05-142011-11-17Mustafa PaksoyMethods and systems for backing up a search index in a multi-tenant database environment
US20110289383A1 (en)2010-05-192011-11-24Cleversafe, Inc.Retrieving data from a dispersed storage network in accordance with a retrieval threshold
US20110307657A1 (en)2010-06-142011-12-15Veeam Software International Ltd.Selective Processing of File System Objects for Image Level Backups
US20120030411A1 (en)2010-07-292012-02-02Phison Electronics Corp.Data protecting method, memory controller and portable memory storage apparatus
US8130554B1 (en)2008-09-292012-03-06Emc CorporationSecurely erasing flash-based memory
US20120079562A1 (en)2010-09-242012-03-29Nokia CorporationMethod and apparatus for validating resource identifier
US8156381B2 (en)2007-12-202012-04-10Fujitsu LimitedStorage management apparatus and storage system
US20120137062A1 (en)2010-11-302012-05-31International Business Machines CorporationLeveraging coalesced memory
US20120143830A1 (en)2010-12-022012-06-07At&T Intellectual Property I, L.P.Interactive proof to validate outsourced data stream processing
US20120150528A1 (en)2008-10-152012-06-14Oracle International CorporationBatch processing system
US20120166576A1 (en)2010-08-122012-06-28Orsini Rick LSystems and methods for secure remote storage
US20120173392A1 (en)2010-12-312012-07-05Sean KirbyMethod, system and apparatus for managing inventory
US20120210092A1 (en)2011-02-142012-08-16Seagate Technology LlcDynamic storage regions
US20120233432A1 (en)2011-03-092012-09-13Seagate Techonology LlcDynamic guarding of a storage media
US8291170B1 (en)2010-08-192012-10-16Symantec CorporationSystem and method for event driven backup data storage
US20120284719A1 (en)2011-05-032012-11-08Microsoft CorporationDistributed multi-phase batch job processing
US20120306912A1 (en)2011-06-022012-12-06Microsoft CorporationGlobal Composition System
US20120311260A1 (en)2011-06-022012-12-06Hitachi, Ltd.Storage managing system, computer system, and storage managing method
US8352439B1 (en)2004-06-032013-01-08Emc CorporationDatabase verification following database write
US8370315B1 (en)2010-05-282013-02-05Symantec CorporationSystem and method for high performance deduplication indexing
US20130046974A1 (en)2011-08-162013-02-21Microsoft CorporationDynamic symmetric searchable encryption
US20130145371A1 (en)2011-12-012013-06-06Sap AgBatch processing of business objects
US8464133B2 (en)2009-10-302013-06-11Cleversafe, Inc.Media content distribution in a social network utilizing dispersed storage
US8473816B2 (en)2011-09-012013-06-25International Business Machines CorporationData verification using checksum sidefile
US20130254166A1 (en)2005-12-192013-09-26Commvault Systems, Inc.Systems and methods for performing replication copy storage operations
US8554918B1 (en)2011-06-082013-10-08Emc CorporationData migration with load balancing and optimization
US20130290263A1 (en)2009-06-262013-10-31Simplivity CorporationFile system
US8595596B2 (en)2009-10-052013-11-26Cleversafe, Inc.Method and apparatus for dispersed storage of streaming data
US8620870B2 (en)2010-09-302013-12-31Commvault Systems, Inc.Efficient data management improvements, such as docking limited-feature data management modules to a full-featured data management system
US20140052706A1 (en)2011-04-292014-02-20Prateep MisraArchival storage and retrieval system
US20140068208A1 (en)2012-08-282014-03-06Seagate Technology LlcSeparately stored redundancy
US8671076B2 (en)2007-05-082014-03-11Bmc Software, Inc.Database recovery using logs applied to consistent copies
US8699159B1 (en)2012-06-182014-04-15Western Digital Technologies, Inc.Reducing effects of wide area track erasure in a disk drive
US20140149794A1 (en)2011-12-072014-05-29Sachin ShettySystem and method of implementing an object storage infrastructure for cloud-based services
US20140161123A1 (en)2010-03-112014-06-12Microsoft CorporationMulti-stage large send offload
US8806502B2 (en)2010-09-152014-08-12Qualcomm IncorporatedBatching resource requests in a portable computing device
US8838911B1 (en)2011-03-092014-09-16Verint Systems Inc.Systems, methods, and software for interleaved data stream storage
US8898114B1 (en)2010-08-272014-11-25Dell Software Inc.Multitier deduplication systems and methods
US8959067B1 (en)2012-08-082015-02-17Amazon Technologies, Inc.Data storage inventory indexing
US8972677B1 (en)2009-06-292015-03-03Symantec CorporationSystems and methods for implementing a storage interface specific to an archiving platform
US20150082458A1 (en)2009-07-092015-03-19Apple Inc.Methods and systems for upgrade and synchronization of securely installed applications on a computing device
US8990215B1 (en)2007-05-212015-03-24Amazon Technologies, Inc.Obtaining and verifying search indices
US9047306B1 (en)2005-10-172015-06-02Hewlett-Packard Development Company, L.P.Method of writing data
US9053212B2 (en)2008-08-062015-06-09Intelli-Services, Inc.Multi-dimensional metadata in research recordkeeping
US9372854B2 (en)2010-11-082016-06-21Hewlett Packard Enterprise Development LpLoad balancing backup jobs in a virtualized storage system having a plurality of physical nodes

Patent Citations (204)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5239640A (en)1991-02-011993-08-24International Business Machines CorporationData storage system and method including data and checksum write staging storage
JPH05113963A (en)1991-10-231993-05-07Nec CorpJob state display system
US5701407A (en)1992-08-261997-12-23Mitsubishi Denki Kabushiki KaishaRedundant array of disks with improved data reconstruction
US5737745A (en)1992-08-261998-04-07Mitsubishi Denki Kabushiki KaishaRedundant array of disks with host notification process for improved storage and recovery speed
JPH06149739A (en)1992-11-111994-05-31Hitachi Ltd How to confirm acceptance of job execution
US5900007A (en)1992-12-171999-05-04International Business Machines CorporationData storage disk array having a constraint function for spatially dispersing disk files in the disk array
US5751997A (en)1993-01-211998-05-12Apple Computer, Inc.Method and apparatus for transferring archival data among an arbitrarily large number of computer devices in a networked computer environment
US20060005074A1 (en)1993-04-232006-01-05Moshe YanaiRemote data mirroring
US5506809A (en)1994-06-291996-04-09Sharp Kabushiki KaishaPredictive status flag generation in a first-in first-out (FIFO) memory device method and apparatus
US5586291A (en)1994-12-231996-12-17Emc CorporationDisk controller with volatile and non-volatile cache memories
US6138126A (en)1995-05-312000-10-24Network Appliance, Inc.Method for allocating files in a file system integrated with a raid disk sub-system
US6578127B1 (en)1996-04-022003-06-10Lexar Media, Inc.Memory devices
US6208999B1 (en)1996-12-122001-03-27Network Associates, Inc.Recoverable computer file system with a signature area containing file integrity information located in the storage blocks
JPH10261075A (en)1997-03-211998-09-29Matsushita Electric Ind Co Ltd Image buffer control device
JPH1124997A (en)1997-06-301999-01-29Hitachi Haisofuto:KkSecurity method for recording computer generated file and computer readable recording medium to store security program
JPH11259321A (en)1997-10-311999-09-24Hewlett Packard Co <Hp>Method for redundantly encoding data
US6023710A (en)1997-12-232000-02-08Microsoft CorporationSystem and method for long-term administration of archival storage
JP2000023075A (en)1998-07-032000-01-21Hitachi Ltd Digital video / audio recording / reproducing device
US6374264B1 (en)1998-09-042002-04-16Lucent Technologies Inc.Method and apparatus for detecting and recovering from data corruption of a database via read prechecking and deferred maintenance of codewords
US20060272023A1 (en)1998-11-162006-11-30Yonah SchmeidlerMethod and apparatus for secure content delivery over broadband access networks
US20040098565A1 (en)1998-12-302004-05-20Joseph RohlmanInstruction queue for an instruction pipeline
US6768863B2 (en)1999-02-182004-07-27Kabushiki Kaisha ToshibaRecording medium of stream data, and recording method and playback method of the same
US6604224B1 (en)1999-03-312003-08-05Diva Systems CorporationMethod of performing content integrity analysis of a data stream
US20050267935A1 (en)1999-06-112005-12-01Microsoft CorporationData driven remote device control model with general programming interface-to-network messaging adaptor
US6747825B1 (en)1999-08-272004-06-08Jpmorgan Chase Bank, As Collateral AgentDisc drive with fake defect entries
US6543029B1 (en)1999-09-292003-04-01Emc CorporationError corrector
US7685309B2 (en)1999-12-102010-03-23Sun Microsystems, Inc.System and method for separating addresses from the delivery scheme in a virtual private network
US6606629B1 (en)2000-05-172003-08-12Lsi Logic CorporationData structures containing sequence and revision number metadata used in mass storage data integrity-assuring technique
US6959326B1 (en)2000-08-242005-10-25International Business Machines CorporationMethod, system, and program for gathering indexable metadata on content at a data repository
WO2002027489A2 (en)2000-09-282002-04-04Curl CorporationPersistent data storage for client computer software programs
US20020055942A1 (en)2000-10-262002-05-09Reynolds Mark L.Creating, verifying, managing, and using original digital files
US20020103815A1 (en)2000-12-122002-08-01Fresher Information CorporationHigh speed data updates implemented in an information storage and retrieval system
US20020186844A1 (en)2000-12-182002-12-12Levy Kenneth L.User-friendly rights management systems and methods
US20020091903A1 (en)2001-01-092002-07-11Kabushiki Kaisha ToshibaDisk control system and method
JP2002278844A (en)2001-01-192002-09-27Xerox CorpMethod/system for protecting electronic document, and confidential connect object
US20020122203A1 (en)2001-03-022002-09-05Hiroshi MatsudaImage processing device, information processing method, and control program
US20020161972A1 (en)2001-04-302002-10-31Talagala Nisha D.Data storage array employing block checksums and dynamic striping
KR20020088574A (en)2001-05-182002-11-29엘지전자 주식회사Memory card to adapt digital player and file write/read method thereof
US7340490B2 (en)2001-07-132008-03-04Sun Microsystems, Inc.Storage network data replicator
US20030033308A1 (en)2001-08-032003-02-13Patel Sujal M.System and methods for providing a distributed file system utilizing metadata to track information about data stored throughout the system
US7487316B1 (en)2001-09-172009-02-03Rockwell Automation Technologies, Inc.Archive and restore system and methodology for on-line edits utilizing non-volatile buffering
US6950967B1 (en)2001-09-262005-09-27Maxtor CorporationMethod and apparatus for manufacture test processing a disk drive installed in a computer system
US20030149717A1 (en)2002-02-052003-08-07William HeinzmanBatch processing job streams using and/or precedence logic
US20040003272A1 (en)2002-06-282004-01-01International Business Machines CorporationDistributed autonomic backup
US7774466B2 (en)2002-10-172010-08-10Intel CorporationMethods and apparatus for load balancing storage nodes in a distributed storage area network system
US7076604B1 (en)2002-12-242006-07-11Western Digital Technologies, Inc.Disk drive employing a disk command data structure for tracking a write verify status of a write command
US7120737B1 (en)2002-12-242006-10-10Western Digital Technologies, Inc.Disk drive employing a disk command data structure for tracking a write verify status of a write command
CN1534949A (en)2003-03-272004-10-06国际商业机器公司Method and equipment used for obtaining state information in network
US7269733B1 (en)2003-04-102007-09-11Cisco Technology, Inc.Reliable embedded file content addressing
US20070198789A1 (en)2003-05-062007-08-23Aptare, Inc.System to capture, transmit and persist backup and recovery meta data
US20040243737A1 (en)2003-05-282004-12-02International Business Machines CorporationMethod, apparatus and program storage device for providing asynchronous status messaging in a data storage system
JP2006526837A (en)2003-06-032006-11-24株式会社Access How to browse content using page save file
CN1799051A (en)2003-06-032006-07-05株式会社爱可信Method for browsing contents using page storing file
CN1619479A (en)2003-07-312005-05-25株式会社理光Printing processing device and method thereof
CN1487451A (en)2003-08-122004-04-07上海交通大学 Streaming Media Retrieval System in the Field of Distance Education Based on MPEG-7
US20050050342A1 (en)2003-08-132005-03-03International Business Machines CorporationSecure storage utility
JP2005122311A (en)2003-10-142005-05-12Nippon Telegraph & Telephone East Corp Advertisement presenting method, apparatus and program
JP2007515002A (en)2003-11-262007-06-07ヴェリタス・オペレーティング・コーポレーション System and method for creating extensible file system metadata and processing file system content
US20050114338A1 (en)2003-11-262005-05-26Veritas Operating Corp.System and method for determining file system data integrity
US7644061B1 (en)2003-12-042010-01-05Sprint Communications Company L.P.Managing resource indexes in a networking environment
US20060107266A1 (en)2003-12-042006-05-18The Mathworks, Inc.Distribution of job in a portable format in distributed computing environments
US20050160427A1 (en)2003-12-162005-07-21Eric UstarisSystem and method for managing log files
US20050203976A1 (en)2004-02-112005-09-15University Of CaliforniaSystems, tools and methods for transferring files and metadata to and from a storage means
US20050187897A1 (en)2004-02-112005-08-25Microsoft CorporationSystem and method for switching a data partition
KR20070058281A (en)2004-03-312007-06-08마이크로소프트 코포레이션 System and method for consistency check of database backup
US20050262378A1 (en)2004-05-032005-11-24Microsoft CorporationSystems and methods for automatic maintenance and repair of enitites in a data model
US8019925B1 (en)2004-05-062011-09-13Seagate Technology LlcMethods and structure for dynamically mapped mass storage device
US8352439B1 (en)2004-06-032013-01-08Emc CorporationDatabase verification following database write
US20060015529A1 (en)2004-07-152006-01-19Hitachi, Ltd.Method and apparatus of hierarchical storage management based on data value
US20060020594A1 (en)*2004-07-212006-01-26Microsoft CorporationHierarchical drift detection of data sets
US20060095741A1 (en)2004-09-102006-05-04Cavium NetworksStore instruction ordering for multi-core processor
US7487385B2 (en)2004-11-012009-02-03Netapp, Inc.Apparatus and method for recovering destroyed data volumes
US20070266037A1 (en)2004-11-052007-11-15Data Robotics IncorporatedFilesystem-Aware Block Storage System, Apparatus, and Method
US20100217927A1 (en)2004-12-212010-08-26Samsung Electronics Co., Ltd.Storage device and user device including the same
US7409495B1 (en)2004-12-222008-08-05Symantec Operating CorporationMethod and apparatus for providing a temporal storage appliance with block virtualization in storage networks
US20060190510A1 (en)2005-02-232006-08-24Microsoft CorporationWrite barrier for data storage integrity
JP2006285969A (en)2005-03-042006-10-19Sharp Corp Authentication method, authentication system, remote computing device, communication program and recording medium thereof
US20080177697A1 (en)2005-03-172008-07-24International Business Machines CorporationMonitoring usage of components in a database index
US8006125B1 (en)2005-04-292011-08-23Microsoft CorporationAutomatic detection and recovery of corrupt disk metadata
US7577689B1 (en)2005-06-152009-08-18Adobe Systems IncorporatedMethod and system to archive data
US7814078B1 (en)2005-06-202010-10-12Hewlett-Packard Development Company, L.P.Identification of files with similar content
US20070011472A1 (en)2005-07-052007-01-11Ju-Soft Co., LtdSecure power-saving harddisk storage system and method
US20070050479A1 (en)2005-08-242007-03-01Sony CorporationContent receiving apparatus and content receiving method
US20070079087A1 (en)2005-09-292007-04-05Copan Systems, Inc.User interface for archival storage of data
US7937369B1 (en)2005-09-302011-05-03Emc CorporationData mover discovery of object extent
US9047306B1 (en)2005-10-172015-06-02Hewlett-Packard Development Company, L.P.Method of writing data
US20070101095A1 (en)2005-10-272007-05-03Sandisk CorporationMethods for adaptively handling data writes in non-volatile memories
US20130254166A1 (en)2005-12-192013-09-26Commvault Systems, Inc.Systems and methods for performing replication copy storage operations
US20070156842A1 (en)2005-12-292007-07-05Vermeulen Allan HDistributed storage system with web services client interface
CN101496005A (en)2005-12-292009-07-29亚马逊科技公司Distributed storage system with web services client interface
US8060473B1 (en)2006-01-172011-11-15Symantec Operating CorporationSystem and method for conveying backup and restore data via email
US20070174362A1 (en)2006-01-182007-07-26Duc PhamSystem and methods for secure digital data archiving and access auditing
US7783600B1 (en)2006-02-272010-08-24Symantec Operating CorporationRedundancy management service for peer-to-peer networks
CN101043372A (en)2006-03-222007-09-26比特福恩公司Equipment simple document search of management network
JP2007257566A (en)2006-03-272007-10-04Fujitsu Ltd Hash value generation program, storage management program, and storage system
US7840878B1 (en)2006-04-112010-11-23Marvell International Ltd.Systems and methods for data-path protection
US20080285366A1 (en)2006-04-142008-11-20Advantest CorporationTest apparatus, program, and test method
US20070250674A1 (en)2006-04-252007-10-25Fineberg Samuel AMethod and system for scaleable, distributed, differential electronic-data backup and archiving
US20110035757A1 (en)2006-04-282011-02-10Michael ComerSystem and method for management of jobs in a cluster environment
JP2007299308A (en)2006-05-022007-11-15Ricoh Co Ltd Job processing system, job processing method, program, and recording medium
US7929551B2 (en)2006-06-012011-04-19Rovi Solutions CorporationMethods and apparatus for transferring media across a network using a network interface device
US20070283046A1 (en)2006-06-012007-12-06Bradley DietrichMethods and apparatus for providing media from content providers using a network interface device
US20070282969A1 (en)2006-06-012007-12-06Bradley DietrichMethods and apparatus for transferring media across a network using a network interface device
US20080059483A1 (en)2006-08-312008-03-06Realnetworks, Inc.Api-accessible media distribution system
US7730071B2 (en)2006-09-132010-06-01Hitachi, Ltd.Data management system and data management method
US20080068899A1 (en)2006-09-152008-03-20Fujitsu LimitedStorage management process, storage management apparatus, and computer-readable medium storing storage management program
US20080109478A1 (en)2006-11-022008-05-08Fujitsu LimitedDigital-content retrieval apparatus, digital-content retrieval method, and computer product
US20080120164A1 (en)2006-11-172008-05-22Avaya Technology LlcContact center agent work awareness algorithm
US20110225417A1 (en)2006-12-132011-09-15Kavi MaharajhDigital rights management in a mobile environment
US20080168108A1 (en)2007-01-042008-07-10Hitachi Global Storage Technologies Netherlands, B.V.Techniques For Improving The Reliability of File Systems
US20090254572A1 (en)2007-01-052009-10-08Redlich Ron MDigital information infrastructure and method
US20080212225A1 (en)2007-02-212008-09-04Sony CorporationInformation processing apparatus, information recording medium, and information processing method, and computer program
US20080235485A1 (en)2007-03-202008-09-25Advanced Micro Devices, Inc.ECC implementation in non-ECC components
US8015158B1 (en)2007-04-232011-09-06Netapp, Inc.Copy-less restoring of transaction files of a database system
US7827201B1 (en)2007-04-272010-11-02Network Appliance, Inc.Merging containers in a multi-container system
US8671076B2 (en)2007-05-082014-03-11Bmc Software, Inc.Database recovery using logs applied to consistent copies
US20080294764A1 (en)2007-05-212008-11-27Fujitsu LimitedStorage medium bearing hba information provision program, hba information provision method and hba information provision apparatus
US8990215B1 (en)2007-05-212015-03-24Amazon Technologies, Inc.Obtaining and verifying search indices
JP2008299396A (en)2007-05-292008-12-11On Site:KkIntroduction support device, program for introduction support device and introduction support method
US20090013123A1 (en)2007-07-022009-01-08Chun-Yu HsiehStorage Bridge and Storage Device and Method Applying the Storage Bridge
US20090070537A1 (en)2007-09-102009-03-12Cho Chung HeeMethod and apparatus for formatting storage medium
US20090083476A1 (en)2007-09-212009-03-26Phison Electronics Corp.Solid state disk storage system with parallel accesssing architecture and solid state disck controller
US20100223259A1 (en)2007-10-052010-09-02Aharon Ronen MizrahiSystem and method for enabling search of content
US20110071988A1 (en)2007-10-092011-03-24Cleversafe, Inc.Data revision synchronization in a dispersed storage network
US20090113167A1 (en)2007-10-252009-04-30Peter Thomas CambleData processing apparatus and method of processing data
US20090132676A1 (en)2007-11-202009-05-21Mediatek, Inc.Communication device for wireless virtual storage and method thereof
US20090150641A1 (en)2007-12-062009-06-11David FlynnApparatus, system, and method for efficient mapping of virtual and physical addresses
US20090157700A1 (en)2007-12-122009-06-18International Business Machines CorporationGenerating unique object identifiers for network management objects
US20090164506A1 (en)2007-12-192009-06-25Casdex, Inc.System and Method for Content-Based Email Authentication
US8156381B2 (en)2007-12-202012-04-10Fujitsu LimitedStorage management apparatus and storage system
US20110060775A1 (en)2007-12-282011-03-10Nokia CorporationFile storage method and system
CN101477543A (en)2008-01-032009-07-08埃森哲环球服务有限公司System and method for automating ETL application
US20090193223A1 (en)2008-01-242009-07-30George SalibaMethods and systems for vectored data de-duplication
US20090198736A1 (en)*2008-01-312009-08-06Jinmei ShenTime-Based Multiple Data Partitioning
US20090198889A1 (en)2008-02-052009-08-06Sony CorporationRecording apparatus, recording method, program for recording method, and storage medium that stores program for recording method
US20090213487A1 (en)2008-02-222009-08-27International Business Machines CorporationEfficient method to detect disk write errors
US20090234883A1 (en)2008-03-112009-09-17Hurst Lawrence AFlexible and resilient information collaboration management infrastructure
US20090240750A1 (en)2008-03-242009-09-24Samsung Electronics Co., Ltd.Memory system and data access method
JP2011518381A (en)2008-04-062011-06-23フュージョン−アイオー・インコーポレーテッド Apparatus, system and method for verifying that a correct data segment is read from a data storage device
US20090265568A1 (en)2008-04-212009-10-22Cluster Resources, Inc.System and method for managing energy consumption in a compute environment
US20090300403A1 (en)2008-05-302009-12-03Mark Cameron LittleFine grained failure detection in distributed computing
US20100017446A1 (en)2008-07-172010-01-21Samsung Electronics Co., Ltd.File system configuration method and apparatus for data security and for accessing same, and storage device accessed by same
US9053212B2 (en)2008-08-062015-06-09Intelli-Services, Inc.Multi-dimensional metadata in research recordkeeping
US20100037056A1 (en)2008-08-072010-02-11Follis Benjamin DMethod to support privacy preserving secure data management in archival systems
US8130554B1 (en)2008-09-292012-03-06Emc CorporationSecurely erasing flash-based memory
US20100094819A1 (en)2008-10-102010-04-15Sap AgConcurrent collaborative process for data management and retrieval
US20120150528A1 (en)2008-10-152012-06-14Oracle International CorporationBatch processing system
US20100169544A1 (en)2008-12-312010-07-01Eom Young-IkMethods for distributing log block associativity for real-time system and flash memory devices performing the same
US20100228711A1 (en)2009-02-242010-09-09Microsoft CorporationEnterprise Search Method and System
US20100235409A1 (en)2009-03-102010-09-16Global Relay Communications Inc.System and method for managing data stored in a data network
US20100242096A1 (en)2009-03-202010-09-23Prakash VaradharajanManaging connections in a data storage system
JP2010251877A (en)2009-04-132010-11-04Hitachi Kokusai Electric Inc Signature device
WO2010151813A1 (en)2009-06-262010-12-29Simplivt CorporationFile system
US20130290263A1 (en)2009-06-262013-10-31Simplivity CorporationFile system
US8972677B1 (en)2009-06-292015-03-03Symantec CorporationSystems and methods for implementing a storage interface specific to an archiving platform
US20150082458A1 (en)2009-07-092015-03-19Apple Inc.Methods and systems for upgrade and synchronization of securely installed applications on a computing device
US20110026942A1 (en)2009-07-282011-02-03Canon Kabushiki KaishaMonitoring apparatus and method for the same
JP2011043968A (en)2009-08-202011-03-03Hitachi Solutions LtdBatch job processing device and batch job processing system
US20110058277A1 (en)2009-09-092011-03-10De La Fuente Anton RAsymmetric writer for shingled magnetic recording
US20110078407A1 (en)2009-09-252011-03-31Russell Lee LewisDetermining an end of valid log in a log of write records
US8595596B2 (en)2009-10-052013-11-26Cleversafe, Inc.Method and apparatus for dispersed storage of streaming data
US20110099324A1 (en)2009-10-282011-04-28Phison Electronics Corp.Flash memory storage system and flash memory controller and data processing method thereof
US8464133B2 (en)2009-10-302013-06-11Cleversafe, Inc.Media content distribution in a social network utilizing dispersed storage
US20110161679A1 (en)2009-12-292011-06-30Cleversafe, Inc.Time based dispersed storage access
JP2011170667A (en)2010-02-192011-09-01Nec CorpFile-synchronizing system, file synchronization method, and file synchronization program
US20140161123A1 (en)2010-03-112014-06-12Microsoft CorporationMulti-stage large send offload
US20110231597A1 (en)2010-03-172011-09-22Phison Electronics Corp.Data access method, memory controller and memory storage system
JP2011197977A (en)2010-03-192011-10-06Nec CorpStorage system
US20110246716A1 (en)2010-03-302011-10-06Lenovo (Singapore) Pte, Ltd.Concatenating a first raid with a second raid
US20110247074A1 (en)2010-03-302011-10-06Manring Bradley A CMetadata-based access, security, and compliance control of software generated files
US20110258630A1 (en)2010-04-202011-10-20Salesforce.Com, Inc.Methods and systems for batch processing in an on-demand service environment
US20110265143A1 (en)2010-04-262011-10-27Cleversafe, Inc.Slice retrieval in accordance with an access sequence in a dispersed storage network
US20110264717A1 (en)2010-04-262011-10-27Cleversafe, Inc.Storage and retrieval of required slices in a dispersed storage network
US20110276656A1 (en)2010-05-052011-11-10The Go Daddy Group, Inc.Writing a file to a cloud storage solution
US20110282839A1 (en)2010-05-142011-11-17Mustafa PaksoyMethods and systems for backing up a search index in a multi-tenant database environment
US20110289383A1 (en)2010-05-192011-11-24Cleversafe, Inc.Retrieving data from a dispersed storage network in accordance with a retrieval threshold
US8370315B1 (en)2010-05-282013-02-05Symantec CorporationSystem and method for high performance deduplication indexing
US20110307657A1 (en)2010-06-142011-12-15Veeam Software International Ltd.Selective Processing of File System Objects for Image Level Backups
US20120030411A1 (en)2010-07-292012-02-02Phison Electronics Corp.Data protecting method, memory controller and portable memory storage apparatus
US20120166576A1 (en)2010-08-122012-06-28Orsini Rick LSystems and methods for secure remote storage
US8291170B1 (en)2010-08-192012-10-16Symantec CorporationSystem and method for event driven backup data storage
US8898114B1 (en)2010-08-272014-11-25Dell Software Inc.Multitier deduplication systems and methods
US8806502B2 (en)2010-09-152014-08-12Qualcomm IncorporatedBatching resource requests in a portable computing device
US20120079562A1 (en)2010-09-242012-03-29Nokia CorporationMethod and apparatus for validating resource identifier
US8620870B2 (en)2010-09-302013-12-31Commvault Systems, Inc.Efficient data management improvements, such as docking limited-feature data management modules to a full-featured data management system
US9372854B2 (en)2010-11-082016-06-21Hewlett Packard Enterprise Development LpLoad balancing backup jobs in a virtualized storage system having a plurality of physical nodes
US20120137062A1 (en)2010-11-302012-05-31International Business Machines CorporationLeveraging coalesced memory
US20120143830A1 (en)2010-12-022012-06-07At&T Intellectual Property I, L.P.Interactive proof to validate outsourced data stream processing
US20120173392A1 (en)2010-12-312012-07-05Sean KirbyMethod, system and apparatus for managing inventory
US20120210092A1 (en)2011-02-142012-08-16Seagate Technology LlcDynamic storage regions
US8838911B1 (en)2011-03-092014-09-16Verint Systems Inc.Systems, methods, and software for interleaved data stream storage
US20120233432A1 (en)2011-03-092012-09-13Seagate Techonology LlcDynamic guarding of a storage media
US20140052706A1 (en)2011-04-292014-02-20Prateep MisraArchival storage and retrieval system
US20120284719A1 (en)2011-05-032012-11-08Microsoft CorporationDistributed multi-phase batch job processing
US20120306912A1 (en)2011-06-022012-12-06Microsoft CorporationGlobal Composition System
US20120311260A1 (en)2011-06-022012-12-06Hitachi, Ltd.Storage managing system, computer system, and storage managing method
US8554918B1 (en)2011-06-082013-10-08Emc CorporationData migration with load balancing and optimization
US20130046974A1 (en)2011-08-162013-02-21Microsoft CorporationDynamic symmetric searchable encryption
US8473816B2 (en)2011-09-012013-06-25International Business Machines CorporationData verification using checksum sidefile
US20130145371A1 (en)2011-12-012013-06-06Sap AgBatch processing of business objects
US20140149794A1 (en)2011-12-072014-05-29Sachin ShettySystem and method of implementing an object storage infrastructure for cloud-based services
US8699159B1 (en)2012-06-182014-04-15Western Digital Technologies, Inc.Reducing effects of wide area track erasure in a disk drive
US8959067B1 (en)2012-08-082015-02-17Amazon Technologies, Inc.Data storage inventory indexing
US20140068208A1 (en)2012-08-282014-03-06Seagate Technology LlcSeparately stored redundancy

Non-Patent Citations (44)

* Cited by examiner, † Cited by third party
Title
"Decision of Patent Grant, dated Nov. 1, 2017," Korean Patent Application No. 10-2017-7021593, filed Aug. 6, 2013, 3 pages.
"Notice on Grant of Patent Right for Invention, dated Nov. 17, 2017," Chinese Patent Application No. 201380042169.7, filed Aug. 6, 2013, 2 pages.
"Office Action dated Nov. 20, 2017," Canadian Patent Application No. 2881567, filed Aug. 6, 2013, 7 pages.
"Second Office Action dated Jan. 12, 2018," Chinese Patent Application No. 201380042170.X, filed Aug. 6, 2013, 8 pages.
Advanced Computer & Network Corporation, "RAID Level 5: Independent Data Disks With Distributed Parity Blocks", May 12, 2011, from https://web.archive.org/web/20110512213916/http://www.acnc.com/raidedu/5, 2 pages.
Advanced Computer & Network Corporation, "RAID level 6: Independent Data Disks With Two Independent Parity Schemes", May 7, 2011, from https://web.archive.org/web/20110507215950/http://www.acnc.com/raidedu/6, 2 pages.
Amazon Web Services, "Amazon Elastic MapReduce Developer Guide," API Version Nov. 30, 2009, dated Jun. 12, 2012, retrieved on Jun. 22, 2015, from https://web.archive.org/web/20120612043953/http://s3.amazonaws.com/awsdocs/ElasticMapReduce/latest/emr-dg.pdf, 318 pages.
Amazon Web Services, "Amazon Glacier Developer Guide," API Version Jun. 1, 2012, dated Aug. 20, 2012, retrieved Jun. 22, 2015, from https://web.archive.org/web/20120908043705/http://awsdocs.s3.amazonaws.com/glacier/latest/glacier-dg.pdf, 209 pages.
Amazon Web Services, "AWS Import/Export Developer Guide," API Version Jun. 3, 2010, dated Jun. 12, 2012, retrieved Jun. 22, 2015, from https://web.archive.org/web/20120612051330/http://s3.amazonaws.com/awsdocs/ImportExpert/latest/AWSImportExport-dg.pdf, 104 pages.
Amer et al., "Design Issues for a Shingled Write Disk System," 26th IEEE Symposium on Massive Storage Systems and Technologies: Research Track (MSST 2010):1-12, May 2010.
Canadian Office Action dated Apr. 17, 2018, Patent Application No. 2881490, filed Aug. 6, 2013, 4 pages.
Canadian Office Action dated Apr. 25, 2018, Patent Application No. 2881475, filed Aug. 6, 2013, 5 pages.
Chen et al., "RAID: High-Performance, Reliable Secondary Storage," ACM Computing Surveys 1994, 26:145-185, retrieved on Jan. 11, 2016, from https://web.archive.org/web/20040721062927/http://meseec.ce.rit.edu/eecc722-fall2002/papers/io/3/chen94raid.pdf, 69 pages.
Chinese Decision on Rejection dated Sep. 5, 2018, Patent Application No. 201380042170.X, filed Aug. 6, 2013, 7 pages.
Chinese Notice on Grant of Patent Right for Invention dated Sep. 26, 2018, Patent Application No. 201380042166.3, filed Aug. 6, 2013, 2 pages.
Chinese Notice on the Third Office Action dated Mar. 19, 2018, Patent Application No. 201380042166.3, filed Aug. 6, 2013, 5 pages.
Cisco, "Cisco Standalone HDD Firmware Update Version 3.0-IBM Servers," Nov. 16, 2010, 5 pages.
Cisco, "Cisco Standalone HDD Firmware Update Version 3.0—IBM Servers," Nov. 16, 2010, 5 pages.
Duan, "Research and Application of Distributed Parallel Search Hadoop Algorithm," 2012 International Conference on Systems and Informatics (ICSAI 2012), IEEE, May 19, 2012, pp. 2462-2465.
European Communication Pursuant to Article 94(3) EPC dated Feb. 19, 2018, Patent Application No. 13827419.6, filed Aug. 6, 2013, 3 pages.
Extended European Search Report dated Mar. 5, 2018, European Patent Application No. 17196030.5, filed Oct. 11, 2017, 9 pages.
Gibson et al., "Directions for Shingled-Write and Two-Dimensional Magnetic Recording System Architectures: Synergies with Solid-State Disks (CMU-PDL-09-104)," Carnegie Mellon University Research Showcase, Parallel Data Laboratory, Research Centers and Institutes:1-3, May 2009.
IEEE, "The Authoritative Dictionary of IEEE Standards Terms," Seventh Edition, 2000, p. 836.
International Search Report and Written Opinion dated Feb. 14, 2014, in International Patent Application No. PCT/US2013/053828, filed Aug. 6, 2013.
International Search Report and Written Opinion dated Feb. 14, 2014, in International Patent Application No. PCT/US2013/053853, filed Aug. 6, 2013.
International Search Report and Written Opinion dated Mar. 6, 2014, in International Patent Application No. PCT/US2013/053852, filed Aug. 6, 2013.
Jacobs et al., "Memory Systems, Cache, DRAM, Disk," Copyright 2007, Morgan Kaufmann, 9 pages.
Japanese Notice of Rejection dated Jul. 24, 2018, Patent Application No. 2017-080044, filed Aug. 6, 2013, 2 pages.
Japanese Official Notice of Rejection dated Aug. 7, 2018, Patent Application No. 2017-152756, filed Aug. 6, 2013, 4 pages.
Japanese Official Notice of Rejection dated Jun. 5, 2018, Patent Application No. 2017-094235, filed Aug. 6, 2013, 3 pages.
Kozierok, "File Allocation Tables," The PC Guide, Apr. 17, 2001, from http://www.pcguide.com/ref/hdd/tile/fatFATs-c.html, 2 pages.
Massiglia, "The RAID Book: The Storage System Technology Handbook", 6th Edition, 1997, pp. 26-27, 84-91, 136-143, and 270-271.
Merriam-Webster, "Predetermine," Current Edition of Dictionary, retrieved Dec. 15, 2014, from www.merriam-webster.com/dictionary.
Micheloni et al., "Inside NAND Flash Memories," Springer First Edition (ISBN 978-90-481-9430-8):40-42, Aug. 2010.
Roos, "How to Leverage an API for Conferencing," Jan. 2012, from http://money.howstuffworks.com/businesscommunications/ how-to-leverage-an-api-for-conferencing1.htm.
Rosenblum et al., "The Design and Implementation of a Log-Structured File System," University of California at Berkeley, ACM Transactions on Computer Systems 10(1):26-52, Feb. 1992.
Seagate, "Firmware Updates for Seagate Products," Feb. 2012, retrieved from http://knowledge.seagate.com/articles/en US/FAQ/207931en, 1 page.
Singaporean Second Invitation to Respond to Written Opinion and Second Written Opinion dated Aug. 17, 2018, Patent Application No. 10201600997Y, filed Aug. 6, 2013, 6 pages.
Wikipedia, "Checksum," retrieved Mar. 2011, from Wayback/Wikipedia at en.wikipedia.org/wiki/checksum, 5 pages.
Wikipedia, "Error Correction," retrieved Sep. 2010, from Wayback/Wikipedia.org at en.wikipedia.org/wiki/Error-correcting_code, 7 pages.
Wikipedia, "Hash Tree," retrieved Jul. 12, 2012, from http://en.wikipedia.org/wiki/Hash_tree, 1 page.
Wikipedia, "Process identifier," dated Sep. 3, 2010, retrieved Jul. 9, 2015, from https://en.wikipedia.org/w/index.php?title=Process_identifier&amp;oldid=382695536, 2 pages.
Wikipedia, "Process identifier," dated Sep. 3, 2010, retrieved Jul. 9, 2015, from https://en.wikipedia.org/w/index.php?title=Process_identifier&oldid=382695536, 2 pages.
Yu et al., "Exploiting sequential access when declustering data over disks and MEMS-based storage," Distributed and Parallel Databases 19(2-3):147-168, May 25, 2006.

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US10853166B2 (en)*2017-04-282020-12-01Netapp Inc.Object format resilient to remote object store errors
US11573855B2 (en)2017-04-282023-02-07Netapp, Inc.Object format resilient to remote object store errors
US11934262B2 (en)2017-04-282024-03-19Netapp, Inc.Object format resilient to remote object store errors
US11128535B2 (en)*2019-03-192021-09-21Hitachi, Ltd.Computer system and data management method
US11263349B2 (en)2019-12-232022-03-01Bank Of America CorporationSystem for discovery and analysis of software distributed across an electronic network platform
RU2785484C1 (en)*2021-12-072022-12-08федеральное государственное казенное военное образовательное учреждение высшего образования "Краснодарское высшее военное орденов Жукова и Октябрьской Революции Краснознаменное училище имени генерала армии С.М. Штеменко" Министерства обороны Российской ФедерацииMethod for cryptographic recursive integrity control of a relational database
RU2840783C1 (en)*2024-12-052025-05-28федеральное государственное казенное военное образовательное учреждение высшего образования "Краснодарское высшее военное орденов Жукова и Октябрьской Революции Краснознаменное училище имени генерала армии С.М. Штеменко" Министерства обороны Российской ФедерацииMethod of forming and monitoring integrity of multidimensional structure of electronic documents

Also Published As

Publication number | Publication date
US9465821B1 (en)2016-10-11
US8805793B2 (en)2014-08-12
US20140046909A1 (en)2014-02-13
US20170024428A1 (en)2017-01-26

Similar Documents

Publication | Publication Date | Title
US10157199B2 (en)Data storage integrity validation
AU2018204309B2 (en)Archival data storage system
US20200327113A1 (en)Data storage application programming interface
US9767129B2 (en)Data storage inventory indexing
US9092441B1 (en)Archival data organization and management
US9767098B2 (en)Archival data storage system
US9563681B1 (en)Archival data flow management
US9213709B2 (en)Archival data identification
US9830111B1 (en)Data storage space management
US9354683B2 (en)Data storage power management
US10558581B1 (en)Systems and techniques for data recovery in a keymapless data storage system
US9652487B1 (en)Programmable checksum calculations on data storage devices
US9779035B1 (en)Log-based data storage on sequentially written media

Legal Events

Date | Code | Title | Description
AS | Assignment

Owner name: AMAZON TECHNOLOGIES, INC., WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PATIEJUNAS, KESTUTIS;LAZIER, COLIN L.;SEIGLE, MARK C.;AND OTHERS;SIGNING DATES FROM 20140827 TO 20160118;REEL/FRAME:039949/0580

STCF | Information on status: patent grant

Free format text: PATENTED CASE

MAFP | Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

