Disclosure of Invention
Embodiments of the invention provide a method and a system for synchronizing aggregate objects in an object storage system, which solve the problem that the synchronization feature and the aggregation feature are incompatible when data is synchronized between sites of an object storage system.
Embodiments of the invention disclose the following technical solutions:
 A first aspect of the present invention provides a method for synchronizing aggregate objects in an object storage system, the method comprising:
 the slave site sends an HTTP request and acquires the bucket shard information in the data_log of the master site;
 the slave site analyzes, from the index information of the objects in the bucket shard information, whether the objects need to be synchronized;
 if synchronization is needed, the slave site sends a GET request and obtains the object from the master site;
 the master site reads the data information of the object according to the aggregation characteristic of the object and returns the data information of the object to the slave site;
 and the slave site synchronizes the data information of the object into its own storage cluster according to a preset storage rule.
Further, the index information of the objects is distributed across the bucket shards according to a hash rule.
Further, the master site reads the data information of the object according to the aggregation characteristic of the object as follows:
 if the object is marked with the aggregation characteristic, the data information of the object is read from the SSD storage pool of the master site;
 if the object is not marked with the aggregation characteristic, the metadata information of the object is read from the SSD storage pool of the master site, and the data information and offset of the object are obtained from the HDD storage pool of the master site by means of that metadata information.
Further, the preset storage rule comprises a tiering characteristic and an aggregation characteristic:
 the tiering characteristic is used for storing objects in the SSD storage pool of the slave site and marking objects with the aggregation characteristic;
 the aggregation characteristic is used for aggregating the data information of the objects marked with the aggregation characteristic into one aggregate object and storing the aggregate object in the HDD storage pool of the slave site.
Further, the tiering characteristic specifically includes:
 storing the metadata information and the data information of the object in the SSD storage pool of the slave site;
 if the capacity value of the object is smaller than a first preset value, marking the object with the aggregation characteristic;
 and if the capacity value of the object is greater than or equal to the first preset value, not marking the object with the aggregation characteristic.
Further, the aggregation characteristic specifically includes:
 reading out from the SSD storage pool, according to a preset period, the data information of a number of objects marked with the aggregation characteristic, the number being equal to a second preset value;
 aggregating the data information of these objects into one aggregate object;
 storing the data information and the offset of the aggregate object in the HDD storage pool of the slave site;
 and deleting the data information of these objects from the SSD storage pool.
A second aspect of the present invention provides a system for synchronizing aggregate objects in an object storage system, implemented on the basis of the above method, the system comprising:
 a request sending module, used for sending HTTP requests and GET requests;
 a master site, used for storing the objects of the master site and the bucket shard information in the data_log;
 a synchronization judging module, used for judging whether the objects on the master site need to be synchronized;
 a slave site, used for storing the data information of the objects synchronized to it;
 a preset storage rule module, used for presetting the storage rules of the objects in the master site and the slave site;
 and an object analysis module, used for reading, through the request sending module, the objects in the master site and the bucket shard information in the data_log, analyzing the index information in the bucket shard information, and writing the objects to be synchronized into the slave site.
Further, the preset storage rule module includes:
 a tiering characteristic unit, used for storing the object in the SSD storage pool of the slave site and marking the object with the aggregation characteristic;
 and an aggregation characteristic unit, used for aggregating the data information of the objects marked with the aggregation characteristic into one aggregate object and storing the aggregate object in the HDD storage pool of the slave site.
Further, the object analysis module includes:
 an object reading unit, used for reading, according to the aggregation characteristic of the object, the metadata information and data information of the object in the master site, and for reading the bucket shard information in the data_log;
 an object writing unit, used for writing the object to be synchronized into the slave site;
 and an object analysis unit, used for analyzing the index information in the bucket shard information.
Further, the process by which the object reading unit reads the object specifically includes:
 if the object is marked with the aggregation characteristic, the data information of the object is read from the SSD storage pool of the master site;
 if the object is not marked with the aggregation characteristic, the metadata information of the object is read from the SSD storage pool of the master site, and the data information and offset of the object are obtained from the HDD storage pool of the master site by means of that metadata information.
The effects described in this summary are merely effects of the embodiments, not all effects of the invention. One of the above technical solutions has the following advantages or beneficial effects:
 In the method for synchronizing aggregate objects in an object storage system, the slave site obtains the bucket shard information in the data_log of the master site and analyzes, from the index information of the objects in the bucket shard information, whether the objects need to be synchronized. If synchronization is needed, the master site reads the data information of each object according to its aggregation characteristic: if the object is marked with the aggregation characteristic, its data information is read from the SSD storage pool of the master site; if it is not marked, its metadata information is read from the SSD storage pool of the master site and used to obtain its data information and offset from the HDD storage pool of the master site. The master site then returns the data information of the object to the slave site, which synchronizes it into its own storage cluster according to the preset storage rule. The invention therefore resolves the incompatibility between the synchronization feature and the aggregation feature during synchronization between sites of an object storage system, improves the applicability of storage products, and improves the competitiveness of object storage.
The system for synchronizing aggregate objects in an object storage system comprises a request sending module, a master site, a synchronization judging module, a slave site, a preset storage rule module, and an object analysis module. The request sending module sends HTTP requests and GET requests; the master site stores its objects and the bucket shard information in the data_log; the synchronization judging module judges whether the objects on the master site need to be synchronized; the slave site stores the data information of the objects synchronized to it; the preset storage rule module presets the storage rules of the objects in the master site and the slave site; and the object analysis module reads, through the request sending module, the objects in the master site and the bucket shard information in the data_log, analyzes the index information in the bucket shard information, and writes the objects to be synchronized into the slave site. If synchronization is needed, the master site reads the data information of the object according to its aggregation characteristic: if the object is marked with the aggregation characteristic, its data information is read from the SSD storage pool of the master site; otherwise, its metadata information is read from the SSD storage pool of the master site and used to obtain its data information and offset from the HDD storage pool of the master site. The master site returns the data information of the object to the slave site, and the slave site synchronizes it into its own storage cluster according to the preset storage rule.
The system thus resolves the incompatibility between the synchronization feature and the aggregation feature during synchronization between sites of an object storage system, improves the compatibility of multi-site operation with aggregation, improves the applicability of storage products, and improves the competitiveness of object storage.
Detailed Description
To illustrate the technical features of this solution clearly, the present invention is described in detail below with reference to specific embodiments and the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different structures of the invention. To simplify the disclosure, the components and arrangements of specific examples are described below. Furthermore, reference numerals and/or letters may be repeated in the various examples. This repetition is for simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components, processing techniques, and processes are omitted so as not to obscure the present invention unnecessarily.
A storage system (memory system) in a computer consists of the various storage devices that hold programs and data, the control units, and the hardware and software (algorithms) that manage information scheduling. Because the main memory of a computer cannot simultaneously provide high access speed, large capacity, and low cost, a multi-level storage hierarchy that adds slower, larger-capacity storage is needed, so that an optimized control and scheduling algorithm can deliver acceptable performance at reasonable cost. Storage system performance is increasingly important: the quality of storage management and organization affects overall efficiency, and modern information processing tasks such as image processing, databases, and knowledge bases place high demands on storage systems.
As the demands that scientific computing and data processing place on storage systems keep growing, existing storage technologies must be continuously improved, new storage media researched, and the structure and management of storage systems refined. Large-scale integrated circuits and magnetic disks remain the primary storage media.
Synchronization refers to redundant data synchronization between two storage clusters: one cluster is in normal use and the other serves as a redundant backup, so that after a disaster one site can immediately take over the role of the other. The storage cluster in daily use is the master site, and the backup storage cluster is the slave site.
Bucket shards (shards) hold the index information of the objects in a bucket. In bucket sharding mode, one bucket corresponds to one or more RADOS objects (distributed objects).
After the multi-site feature is enabled, every time an object is uploaded or deleted, the information of the bucket shard in which the object is indexed is recorded in the data_log.
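A minimal sketch of this logging behavior follows, assuming an in-memory append-only list and a hypothetical MD5-based mapping from object name to one of 128 shards. `DataLog`, `shard_of`, and the `op` field are illustrative names, not part of any real gateway API. The point it shows is that only the changed bucket shard is logged, which is what the slave site later reads.

```python
import hashlib
from dataclasses import dataclass, field

NUM_SHARDS = 128  # shard count used in this embodiment


def shard_of(object_name: str, num_shards: int = NUM_SHARDS) -> int:
    # Hypothetical hash rule: first 8 bytes of the MD5 digest, modulo shard count.
    digest = hashlib.md5(object_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


@dataclass
class DataLogEntry:
    bucket: str
    shard_id: int
    op: str  # hypothetical field: "put" or "delete"


@dataclass
class DataLog:
    entries: list = field(default_factory=list)

    def record(self, bucket: str, object_name: str, op: str) -> None:
        # Log which bucket shard changed; the object data itself is not logged.
        self.entries.append(DataLogEntry(bucket, shard_of(object_name), op))

    def read_since(self, position: int) -> list:
        # A slave site can poll only the entries after its last processed position.
        return self.entries[position:]
```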
Embodiment one:
 As shown in Fig. 1, the method for synchronizing aggregate objects in an object storage system according to an embodiment of the present invention comprises:
 the slave site sends an HTTP request and acquires the bucket shard information in the data_log of the master site;
 the slave site analyzes, from the index information of the objects in the bucket shard information, whether the objects need to be synchronized;
 if synchronization is needed, the slave site sends a GET request to obtain the object from the master site;
 the master site reads the data information of the object according to the aggregation characteristic of the object and returns the data information of the object to the slave site;
 and the slave site synchronizes the data information of the object into its own storage cluster according to a preset storage rule.
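The steps above can be sketched as a single polling loop on the slave site. This is a simplified in-memory model rather than an implementation: the `Master` methods stand in for the HTTP and GET requests, and comparing a per-object etag is an assumed criterion for "needs to be synchronized", since the text does not specify how the index information is evaluated. The `store` method is a placeholder for the preset storage rule.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    shard_index: dict  # bucket shard id -> {object name: etag} (bucket index stand-in)
    objects: dict      # object name -> bytes (data information)
    data_log: list     # shard ids changed since the slave last polled

    def http_get_data_log(self) -> list:
        # Stands in for the slave's HTTP request for data_log shard information.
        return list(self.data_log)

    def http_list_shard(self, shard_id: int) -> dict:
        return dict(self.shard_index.get(shard_id, {}))

    def http_get_object(self, name: str) -> bytes:
        # In the method, the master picks SSD or HDD by the aggregation mark.
        return self.objects[name]


@dataclass
class Slave:
    etags: dict = field(default_factory=dict)   # object name -> etag already synced
    stored: dict = field(default_factory=dict)  # object name -> bytes

    def store(self, name: str, data: bytes, etag: str) -> None:
        # Placeholder for the preset storage rule (tiering + aggregation).
        self.stored[name] = data
        self.etags[name] = etag

    def sync_from(self, master: Master) -> list:
        synced = []
        for shard_id in master.http_get_data_log():
            for name, etag in master.http_list_shard(shard_id).items():
                if self.etags.get(name) == etag:
                    continue  # index says the slave already holds this version
                data = master.http_get_object(name)  # the GET request
                self.store(name, data, etag)
                synced.append(name)
        return synced
```

For example, if shard 3 indexes objects `a` and `b` and the slave already holds `a` with the same etag, a call to `sync_from` fetches only `b`; a second call fetches nothing.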
As shown in Fig. 2, to improve the read-write performance of object storage, the index information of the objects in a bucket is distributed uniformly over 128 bucket shards according to a hash rule, so that the index information of each object is stored in one of the bucket shards.
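To illustrate the hash rule, the sketch below maps object names to 128 bucket shards and counts the load on each shard. The MD5-based `shard_for` function is an assumption chosen for illustration; the hash actually used by a given object storage system may differ.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 128


def shard_for(object_name: str, num_shards: int = NUM_SHARDS) -> int:
    # Hypothetical hash rule (MD5-based); a real system may use another hash.
    digest = hashlib.md5(object_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_load(names, num_shards: int = NUM_SHARDS) -> list:
    # Count how many index entries land on each bucket shard.
    counts = Counter(shard_for(n, num_shards) for n in names)
    return [counts[i] for i in range(num_shards)]
```

With 128,000 names, each shard receives close to 1,000 index entries, which is the uniform spread the hash rule is meant to provide.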
The master site reads the data information of the object according to the aggregation characteristic of the object as follows:
 if the object is marked with the aggregation characteristic, the data information of the object is read from the SSD storage pool of the master site;
 if the object is not marked with the aggregation characteristic, the metadata information of the object is read from the SSD storage pool of the master site, and the data information and offset of the object (the offset being the starting position at which the data information of the object is stored in the HDD storage pool) are obtained from the HDD storage pool of the master site by means of that metadata information.
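The branch on the aggregation characteristic can be sketched as follows. The dictionaries standing in for the SSD metadata pool, SSD data pool, and HDD pool, as well as the `Xattr` fields `hdd_key` and `size`, are hypothetical; only the branching logic and the use of the offset follow the description above.

```python
from dataclasses import dataclass


@dataclass
class Xattr:
    aggregated: bool   # aggregation characteristic mark
    hdd_key: str = ""  # hypothetical: HDD-pool object holding the data
    offset: int = 0    # starting position of the data inside hdd_key
    size: int = 0      # hypothetical: length of the data


def read_object(name: str, ssd_meta: dict, ssd_data: dict, hdd_pool: dict) -> bytes:
    meta = ssd_meta[name]  # metadata is kept in the SSD pool
    if meta.aggregated:
        # Marked object: its data information is read from the SSD pool.
        return ssd_data[name]
    # Unmarked object: the metadata locates the data and offset in the HDD pool.
    blob = hdd_pool[meta.hdd_key]
    return blob[meta.offset:meta.offset + meta.size]
```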
The preset storage rule comprises a tiering characteristic and an aggregation characteristic:
 the tiering characteristic is used for storing objects in the SSD storage pool of the slave site and marking objects with the aggregation characteristic;
 the aggregation characteristic is used for aggregating the data information of a plurality of objects marked with the aggregation characteristic into one aggregate object and storing the aggregate object in the HDD storage pool of the slave site.
In an object storage system without the tiering characteristic enabled, massive storage data contains files numbering in the hundreds of millions, and tens of billions of small files are stored in the same storage cluster in the ordinary file-data storage manner. Such storage may contain hundreds of billions of underlying objects. When the number of underlying storage objects is this high, it affects file system performance, data recovery in failure scenarios, disk utilization, and so on.
The tiering characteristic specifically comprises: storing the metadata information and data information of an object in the SSD storage pool of the slave site; if the capacity value of the object is smaller than a first preset value, marking the object with the aggregation characteristic; and if the capacity value of the object is greater than or equal to the first preset value, not marking the object with the aggregation characteristic, and reading the data information of the unmarked object out of the SSD storage pool and writing it into the HDD storage pool of the slave site. In this embodiment, the first preset value is set to 512 KB.
In this embodiment, once the tiering characteristic is enabled, the storage cluster stores objects smaller than 512 KB in the SSD storage pool and objects of 512 KB or larger in the HDD storage pool.
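A sketch of the tiering rule with the 512 KB first preset value follows, again under in-memory assumptions: the `Xattr` class and the `ssd_meta`, `ssd_data`, and `hdd_pool` dictionaries are hypothetical stand-ins for the storage pools. Small objects stay in the SSD pool with the aggregation mark; objects at or above the threshold are left unmarked and their data is moved to the HDD pool.

```python
from dataclasses import dataclass

FIRST_PRESET = 512 * 1024  # 512 KB threshold from this embodiment


@dataclass
class Xattr:
    aggregated: bool = False  # aggregation characteristic mark
    hdd_key: str = ""         # hypothetical: where the data lives in the HDD pool
    offset: int = 0
    size: int = 0


def apply_tiering(name, data, ssd_meta, ssd_data, hdd_pool, threshold=FIRST_PRESET):
    # Metadata and data both land in the SSD pool first.
    ssd_meta[name] = Xattr(size=len(data))
    ssd_data[name] = data
    if len(data) < threshold:
        # Small object: mark it so a later aggregation pass can pack it.
        ssd_meta[name].aggregated = True
    else:
        # Large object: leave it unmarked and move its data to the HDD pool;
        # the metadata stays in the SSD pool and records the data location.
        hdd_pool[name] = ssd_data.pop(name)
        ssd_meta[name].hdd_key = name
        ssd_meta[name].offset = 0
```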
The aggregation characteristic comprises: reading out from the SSD storage pool, according to a preset period, the data information of a number of objects marked with the aggregation characteristic, the number being equal to a second preset value; aggregating the data information of these objects into one aggregate object; storing the data information and offset of the aggregate object in the HDD storage pool of the slave site; and deleting the data information of these objects from the SSD storage pool. In this embodiment, the second preset value is set to 1024.
The aggregation characteristic depends on the tiering characteristic. After the tiered aggregation feature is enabled, the storage cluster reads out the data information of the objects stored in the SSD storage pool, combines every 1024 objects into one aggregate object, writes the aggregate object into the HDD storage pool, and then deletes the data information of those 1024 objects from the SSD storage pool.
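The periodic aggregation pass can be sketched as below, assuming objects are tracked by name in in-memory dictionaries and that each aggregate keeps a per-object offset table. `aggregate_pass`, the `aggregate-NNNNNNNN` key format, and `hdd_offsets` are illustrative names. Only full batches of `batch_size` objects (1024 in this embodiment) are aggregated, and the SSD copies are deleted only after the aggregate has been written.

```python
def aggregate_pass(marked, ssd_data, hdd_pool, hdd_offsets, batch_size=1024, start_id=0):
    """One periodic pass; `marked` is the set of names carrying the aggregation mark."""
    pending = sorted(n for n in ssd_data if n in marked)
    agg_id = start_id
    created = []
    while len(pending) >= batch_size:
        batch, pending = pending[:batch_size], pending[batch_size:]
        key = f"aggregate-{agg_id:08d}"  # hypothetical aggregate object name
        parts, table, offset = [], {}, 0
        for name in batch:
            data = ssd_data[name]
            table[name] = (offset, len(data))  # where this object sits in the aggregate
            parts.append(data)
            offset += len(data)
        hdd_pool[key] = b"".join(parts)  # write the aggregate to the HDD pool
        hdd_offsets[key] = table         # keep the offsets alongside it
        for name in batch:
            del ssd_data[name]           # delete SSD copies after the write
        created.append(key)
        agg_id += 1
    return created
```

Marked objects left over after the last full batch simply wait in the SSD pool for a later period.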
Fig. 3 shows the structure of an object stored in a site according to the method of the present invention. Each object contains two parts: the data information (data) and the metadata information (xattr), where xattr records the metadata of the object, such as its creation time, size, aggregation characteristic, and other custom metadata.
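The two-part layout can be modeled with a small data class. The xattr key names used here (`create_time`, `size`, `aggregate`) are hypothetical placeholders for the metadata fields listed above; any custom metadata passed by the caller is merged in as well.

```python
import time
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes                                # data information
    xattr: dict = field(default_factory=dict)  # metadata information


def make_object(data: bytes, aggregate: bool, **custom) -> StoredObject:
    # Hypothetical xattr keys for the metadata fields named in the text.
    xattr = {"create_time": time.time(), "size": len(data), "aggregate": aggregate}
    xattr.update(custom)  # user-defined metadata
    return StoredObject(data, xattr)
```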
After the tiered aggregation feature is enabled, an object whose capacity value is greater than or equal to 512 KB is stored in the site as follows:
 the metadata information and data information of the object are stored in the SSD storage pool of the site without marking the aggregation characteristic, and the data information of the unmarked object is then read out of the SSD storage pool and written into the HDD storage pool of the site.
After the tiered aggregation feature is enabled, an object whose capacity value is smaller than 512 KB is stored in the site as follows:
 the metadata information and data information of the object are stored in the SSD storage pool of the site, and the object is marked with the aggregation characteristic;
 according to a preset period, the data information of 1024 objects marked with the aggregation characteristic is read out of the SSD storage pool and aggregated into one aggregate object, the data information and offset of the aggregate object are stored in the HDD storage pool of the slave site, and the data information of these 1024 objects is deleted from the SSD storage pool.
Thus, in the method of the present invention, the storage cluster of the slave site comprises an SSD storage pool and an HDD storage pool, and the storage cluster of the master site likewise comprises an SSD storage pool and an HDD storage pool.
In the method for synchronizing aggregate objects in an object storage system, the slave site obtains the bucket shard information in the data_log of the master site and analyzes, from the index information of the objects in the bucket shard information, whether the objects need to be synchronized. If synchronization is needed, the master site reads the data information of each object according to its aggregation characteristic: if the object is marked with the aggregation characteristic, its data information is read from the SSD storage pool of the master site; if it is not marked, its metadata information is read from the SSD storage pool of the master site and used to obtain its data information and offset from the HDD storage pool of the master site. The master site then returns the data information of the object to the slave site, which synchronizes it into its own storage cluster according to the preset storage rule. The method therefore resolves the incompatibility between the synchronization feature and the aggregation feature during synchronization between sites of an object storage system. The synchronization method provided by the invention increases the compatibility of multi-site operation with aggregation, improves the applicability of storage products, and improves the competitiveness of object storage.
Embodiment two:
 As shown in Fig. 2, the synchronization system provided by the present invention is implemented on the basis of the above method and comprises:
 a request sending module, used for sending HTTP requests and GET requests;
 a master site, used for storing the objects of the master site and the bucket shard information in the data_log;
 a synchronization judging module, used for judging whether the objects on the master site need to be synchronized;
 a slave site, used for storing the data information of the objects synchronized to it;
 a preset storage rule module, used for presetting the storage rules of the objects in the master site and the slave site;
 and an object analysis module, used for reading, through the request sending module, the objects in the master site and the bucket shard information in the data_log, analyzing the index information in the bucket shard information, and writing the objects to be synchronized into the slave site.
To improve the read-write performance of object storage, the index information of the objects in a bucket is distributed uniformly over 128 bucket shards according to a hash rule, so that the index information of each object is stored in one of the bucket shards.
The preset storage rule module comprises:
 a tiering characteristic unit, used for storing the object in the SSD storage pool of the slave site and marking the object with the aggregation characteristic;
 and an aggregation characteristic unit, used for aggregating the data information of the plurality of objects marked with the aggregation characteristic into one aggregate object and storing the aggregate object in the HDD storage pool of the slave site.
In an object storage system without the tiering characteristic enabled, massive storage data contains files numbering in the hundreds of millions, and tens of billions of small files are stored in the same storage cluster in the ordinary file-data storage manner. Such storage may contain hundreds of billions of underlying objects. When the number of underlying storage objects is this high, it affects file system performance, data recovery in failure scenarios, disk utilization, and so on.
The tiering characteristic unit defines the tiering characteristic, which specifically comprises storing the metadata information and data information of an object in the SSD storage pool of the slave site;
 if the capacity value of the object is smaller than a first preset value, the object is marked with the aggregation characteristic; if the capacity value of the object is greater than or equal to the first preset value, the object is not marked, and the data information of the unmarked object is read out of the SSD storage pool and written into the HDD storage pool of the site. In this embodiment, the first preset value is set to 512 KB.
In this embodiment, once the tiering characteristic is enabled, the storage cluster stores objects smaller than 512 KB in the SSD storage pool and objects of 512 KB or larger in the HDD storage pool.
The aggregation characteristic unit defines the aggregation characteristic, which specifically comprises: reading out from the SSD storage pool, according to a preset period, the data information of a number of objects marked with the aggregation characteristic, the number being equal to a second preset value; aggregating the data information of these objects into one aggregate object; storing the data information and offset of the aggregate object in the HDD storage pool of the slave site; and deleting the data information of these objects from the SSD storage pool. In this embodiment, the second preset value is set to 1024.
The aggregation characteristic depends on the tiering characteristic. After the tiered aggregation feature is enabled, the preset storage rule module works as follows: the storage cluster reads out the data information of the objects stored in the SSD storage pool, combines every 1024 objects into one aggregate object, writes the aggregate object into the HDD storage pool, and then deletes the data information of those 1024 objects from the SSD storage pool.
After the tiered aggregation feature is enabled, the preset storage rule module stores an object whose capacity value is greater than or equal to 512 KB in the site as follows: the metadata information and data information of the object are stored in the SSD storage pool of the site, the object is not marked with the aggregation characteristic, and the data information of the unmarked object is read out of the SSD storage pool and written into the HDD storage pool of the site.
After the tiered aggregation feature is enabled, the preset storage rule module stores the metadata information and data information of an object whose capacity value is smaller than 512 KB in the SSD storage pool of the site and marks the object with the aggregation characteristic. According to a preset period, the data information of 1024 objects marked with the aggregation characteristic is read out of the SSD storage pool and aggregated into one aggregate object, the data information and offset of the aggregate object are stored in the HDD storage pool of the slave site, and the data information of these objects is deleted from the SSD storage pool.
The object analysis module includes:
 an object reading unit, used for reading, according to the aggregation characteristic of the object, the metadata information and data information of the object in the master site, and for reading the bucket shard information in the data_log;
 an object writing unit, used for writing the object to be synchronized into the slave site;
 and an object analysis unit, used for analyzing the index information in the bucket shard information.
The process by which the object reading unit reads the object specifically comprises:
 if the object is marked with the aggregation characteristic, the data information of the object is read from the SSD storage pool of the master site;
 if the object is not marked with the aggregation characteristic, the metadata information of the object is read from the SSD storage pool of the master site, and the data information and offset of the object (the offset being the starting position at which the data information of the object is stored in the HDD storage pool) are obtained from the HDD storage pool of the master site by means of that metadata information.
The object analysis module works as follows:
 when the slave site sends an HTTP request to the master site through the request sending module, the object reading unit obtains the bucket shard information in the data_log of the master site;
 the object analysis unit analyzes the index information in the bucket shard information and sends it to the synchronization judging module, which judges whether the objects on the master site need to be synchronized;
 if synchronization is needed, the slave site sends a GET request to the master site through the request sending module to obtain the object, and the object reading unit reads the metadata information and data information of the object in the master site according to the aggregation characteristic of the object;
 and the object writing unit writes the object read by the object reading unit into the storage cluster of the slave site according to the tiering characteristic and the aggregation characteristic defined by the preset storage rule module.
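How the synchronization judging module reaches its decision is not specified in detail. One plausible sketch, assuming each bucket shard's index entries can be compared as `name -> etag` maps on both sites, classifies objects as needing a GET (new or changed on the master) or a delete (removed on the master, an operation the data_log also records). `judge_sync` and the etag criterion are assumptions, not the method's defined behavior.

```python
def judge_sync(master_entries: dict, slave_entries: dict) -> dict:
    """Compare one bucket shard's index entries (name -> etag) on both sites."""
    actions = {}
    for name, etag in master_entries.items():
        if slave_entries.get(name) != etag:
            actions[name] = "fetch"   # new or changed on the master: send a GET
    for name in slave_entries:
        if name not in master_entries:
            actions[name] = "delete"  # removed on the master since the last sync
    return actions
```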
Thus, in the system of the present invention, the storage cluster of the slave site comprises an SSD storage pool and an HDD storage pool, and the storage cluster of the master site likewise comprises an SSD storage pool and an HDD storage pool.
The system for synchronizing aggregate objects in an object storage system comprises a request sending module, a master site, a synchronization judging module, a slave site, a preset storage rule module, and an object analysis module. The request sending module sends HTTP requests and GET requests; the master site stores its objects and the bucket shard information in the data_log; the synchronization judging module judges whether the objects on the master site need to be synchronized; the slave site stores the data information of the objects synchronized to it; the preset storage rule module presets the storage rules of the objects in the master site and the slave site; and the object analysis module reads, through the request sending module, the objects in the master site and the bucket shard information in the data_log, analyzes the index information in the bucket shard information, and writes the objects to be synchronized into the slave site. If synchronization is needed, the master site reads the data information of the object according to its aggregation characteristic: if the object is marked with the aggregation characteristic, its data information is read from the SSD storage pool of the master site; otherwise, its metadata information is read from the SSD storage pool of the master site and used to obtain its data information and offset from the HDD storage pool of the master site. The master site returns the data information of the object to the slave site, and the slave site synchronizes it into its own storage cluster according to the preset storage rule.
The system thus resolves the incompatibility between the synchronization feature and the aggregation feature during synchronization between sites of an object storage system, improves the compatibility of multi-site operation with aggregation, improves the applicability of storage products, and improves the competitiveness of object storage.
The foregoing describes only preferred embodiments of the present invention. Those skilled in the art can make numerous modifications and variations without departing from the principles of the invention, and such modifications and variations are also considered to fall within the scope of the invention.