CN113946542A


Info

Publication number: CN113946542A
Application number: CN202111050978.5A
Authority: CN
Inventors: 胡胜蓝; 柳永康; 吴剑
Original assignee: Alibaba China Co Ltd; Alibaba Cloud Computing Ltd
Current assignee: Alibaba China Co Ltd; Alibaba Cloud Computing Ltd
Priority date: 2021-09-08
Filing date: 2021-09-08
Publication date: 2022-01-18

Abstract

Translated from Chinese

Embodiments of this specification provide a data processing method and apparatus. The data processing method includes: receiving a capacity reduction instruction for a shared storage space in a distributed database, where the capacity reduction instruction carries capacity reduction configuration information; determining, according to the capacity reduction configuration information, a storage block to be reduced and a target storage block in the shared storage space, and migrating the data to be migrated in the storage block to be reduced to the target storage block; and, according to the migration result, establishing a mapping relationship between the historical reference address of the data to be migrated and its storage address after migration, and locking the index node (inode) number corresponding to the data to be migrated, so as to respond to the capacity reduction instruction.

Description

Data processing method and device

Technical Field

The embodiment of the specification relates to the technical field of computers, in particular to a data processing method.

Background

In the field of big data, distributed computing is generally used for large-scale parallel processing, and big data is not only reflected in huge input data amount, but also often huge intermediate data amount. At present, a big data cluster system usually adopts common commercial servers, and each server has a plurality of physical disks.

To make full use of the kernel features of the file system, or because of constraints imposed by a third-party native library, the threads or processes of a distributed computing framework often store intermediate data directly on local disks. To save disk storage resources, the storage space they use needs to be reduced in capacity. Currently, offline capacity reduction is usually adopted, which requires interrupting the project for a relatively long time and therefore affects the running speed and efficiency of the project.

Disclosure of Invention

In view of this, the embodiments of the present specification provide a data processing method. One or more embodiments of the present specification also relate to a data processing apparatus, a computing device, a computer-readable storage medium, and a computer program, so as to solve the technical deficiencies of the prior art.

According to a first aspect of embodiments herein, there is provided a data processing method including:

receiving a capacity reduction instruction aiming at a shared storage space in a distributed database, wherein the capacity reduction instruction carries capacity reduction configuration information;

determining a storage block to be subjected to capacity reduction and a target storage block in the shared storage space according to the capacity reduction configuration information, and transferring data to be transferred in the storage block to be subjected to capacity reduction to the target storage block;

and according to the migration result, establishing a mapping relation between the historical reference address of the data to be migrated and the migrated storage address, and locking an index node number corresponding to the data to be migrated so as to respond to the capacity reduction instruction.

Optionally, the determining, according to the capacity reduction configuration information, a to-be-reduced storage block and a target storage block in the shared storage space includes:

determining the size of a target space of the shared storage space according to the capacity reduction configuration information carried in the capacity reduction instruction;

determining the number of storage blocks to be subjected to capacity reduction in the shared storage space according to the size of the target space and the size of a preset storage block corresponding to the shared storage space;

and determining the storage blocks to be subjected to capacity reduction and the target storage blocks in the shared storage space according to the number of the storage blocks to be subjected to capacity reduction.

Optionally, the migrating the data to be migrated in the storage block to be subjected to capacity reduction to the target storage block includes:

and calling data read-write nodes in the distributed database, and transferring the data to be transferred in the storage block to be subjected to capacity reduction to the target storage block.

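Optionally, the migrating the data to be migrated in the storage block to be subjected to capacity reduction to the target storage block includes:

migrating the metadata corresponding to the file to be migrated in the storage block to be subjected to capacity reduction to the target storage block;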

correspondingly, after the data to be migrated in the storage block to be subjected to capacity reduction is migrated to the target storage block, the method further includes:

and constructing a to-be-processed transaction based on the migration result of the metadata, and persistently storing the to-be-processed transaction to a log file of the shared storage space.

Optionally, after the migrating the data to be migrated in the storage block to be capacity reduced to the target storage block, the method further includes:

and constructing a transaction to be processed based on the migration result of the data to be migrated, and persistently storing the transaction to be processed to a log file of the shared storage space.

Optionally, the data processing method further includes:

calling a data read-only node in the distributed database, acquiring and analyzing the to-be-processed transaction in the log file, and acquiring a migration result of the to-be-migrated data;

and according to the migration result, establishing a mapping relation between the historical reference address of the data to be migrated and the migrated storage address after migration.

creating a data migration thread based on the capacity reduction instruction, calling the data migration thread, and migrating the data to be migrated in the storage block to be subjected to capacity reduction to the target storage block;

correspondingly, the establishing a mapping relationship between the history reference address of the data to be migrated and the migrated storage address according to the migration result includes:

and acquiring a historical reference address of the data to be migrated in the address storage thread, and establishing a mapping relation between the historical reference address and a migration storage address of the data to be migrated according to a migration result.

Optionally, the data processing method further includes:

receiving a data access request aiming at the data to be migrated;

determining a data migration state of the data to be migrated;

if the migration is completed, determining whether the data reference type associated with the data to be migrated is a target data reference type;

if yes, obtaining a mapping relation between the historical reference address of the data to be migrated and the migrated storage address, and obtaining the data to be migrated based on the migrated storage address in the mapping relation and returning.

Optionally, the data processing method further includes:

receiving a data access request aiming at the data to be migrated;

determining a node identifier of an index node corresponding to the data to be migrated, and determining a migration state of the data to be migrated according to the node identifier;

if the data is being migrated, returning an access failure result.

detecting a data reference result of the target data reference type;

and under the condition that the data reference result is determined to be the end reference, deleting the mapping relation, and releasing the storage block to be reduced and the metadata corresponding to the file to be migrated in the storage block to be reduced.
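The optional access-handling steps above can be sketched as follows. This is a minimal illustration only: the class and method names are hypothetical, the "target data reference type" is assumed to mean a reference that still holds the historical address, and the behaviour for other reference types (reading the given address directly) is an assumption, since the text does not specify it.

```python
from enum import Enum


class MigrationState(Enum):
    NOT_MIGRATED = 0
    IN_PROGRESS = 1
    COMPLETED = 2


class AccessFailed(Exception):
    """Raised when data is requested while its migration is in progress."""


class ShrinkAwareReader:
    """Hypothetical helper that serves reads during and after migration."""

    def __init__(self, storage):
        self.storage = storage  # storage address -> stored bytes
        self.state = {}         # inode number -> MigrationState
        self.mapping = {}       # historical reference address -> migrated address

    def read(self, ino, addr, is_target_reference):
        state = self.state.get(ino, MigrationState.NOT_MIGRATED)
        if state is MigrationState.IN_PROGRESS:
            # Migration in progress: return an access failure.
            raise AccessFailed(f"inode {ino} is being migrated")
        if state is MigrationState.COMPLETED and is_target_reference:
            # Assumption: this reference still holds the historical address,
            # so it is translated through the mapping relation.
            addr = self.mapping[addr]
        return self.storage[addr]

    def on_references_ended(self, old_addr):
        # Once the old-address references have ended, the mapping is deleted;
        # releasing the reduced block and its metadata would happen here too.
        self.mapping.pop(old_addr, None)
```

In this sketch a reader holding the old address keeps working after migration, and the mapping entry is dropped only once such references have ended.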

According to a second aspect of embodiments herein, there is provided a data processing apparatus comprising:

a receiving module, configured to receive a capacity reduction instruction for a shared storage space in a distributed database, wherein the capacity reduction instruction carries capacity reduction configuration information;

the migration module is configured to determine a storage block to be subjected to capacity reduction and a target storage block in the shared storage space according to the capacity reduction configuration information, and migrate data to be migrated in the storage block to be subjected to capacity reduction to the target storage block;

and the establishing module is configured to establish a mapping relation between the historical reference address of the data to be migrated and the migrated storage address according to the migration result, and lock an index node number corresponding to the data to be migrated so as to respond to the capacity reduction instruction.

According to a third aspect of embodiments herein, there is provided a computing device comprising:

a memory and a processor;

the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:

According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of any one of the data processing methods.

According to a fifth aspect of embodiments herein, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the above-mentioned data processing method.

In an embodiment of the present specification, a capacity reduction instruction for a shared storage space in a distributed database is received, where the capacity reduction instruction carries capacity reduction configuration information, a storage block to be subjected to capacity reduction and a target storage block in the shared storage space are determined according to the capacity reduction configuration information, data to be migrated in the storage block to be subjected to capacity reduction is migrated to the target storage block, a mapping relationship between a history reference address of the data to be migrated and a migrated storage address after migration is established according to a migration result, and an index node number corresponding to the data to be migrated is locked to respond to the capacity reduction instruction.

In the embodiment of the present description, in the case of performing data migration to implement capacity reduction, by establishing a mapping relationship between storage addresses before and after data migration to be migrated, on one hand, it can be ensured that data migration can be performed to perform capacity reduction without interrupting the execution process of the project, and on the other hand, correct access to data can be performed through the mapping relationship after data migration, so that it is beneficial to ensure the operation speed of the project, improve the operation efficiency of the project, and save storage resources of a shared storage space.

Drawings

FIG. 1 is a flow chart of a data processing method provided by an embodiment of the present description;

FIG. 2 is a flow chart of a data processing method according to an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present specification;

FIG. 4 is a block diagram of a computing device according to an embodiment of the present disclosure.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.

The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can also be referred to as a second and, similarly, a second can also be referred to as a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.

First, the terms used in one or more embodiments of the present specification are explained.

PBD: a shared-storage virtual device, similar to a disk device, that can be mounted on different physical machines at the same time, so that different physical machines can share access to the same PBD.

POLARSTORE: the underlying implementation of the PBD, which provides backup and disaster recovery capabilities for the data stored on the PBD.

chunk: the linear address space of a PBD is divided into multiple regions of equal size, each of which is called a chunk. The data stored in the linear address space of a chunk is the minimum unit of disaster recovery and migration in POLARSTORE. Chunks are currently divided at a granularity of 10G.

PFS (Polar File System): a general-purpose file system implemented on top of the PBD that serves distributed databases. It can run on a PBD provided by POLARSTORE (called PFS on POLARSTORE), on a disk of an ordinary physical machine (called PFS on DISK), or on a cloud disk provided by a general vendor (called PFS on ESSD).

In the present specification, a data processing method is provided, and the present specification relates to a data processing apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.

PFS currently supports only online capacity expansion, not online capacity reduction. For PFS on POLARSTORE, once a PBD has been expanded, the number of chunks it contains cannot be reduced, which wastes the management resources associated with those chunks. For PFS on DISK or PFS on ESSD, each chunk actually occupies 10G of disk device space regardless of whether any data has been written to it, so the disk device space must always be provisioned for the user's maximum requirement. For users whose storage usage fluctuates widely or keeps decreasing, a large amount of unneeded disk device space is wasted.

Most file systems support only offline capacity reduction, not online capacity reduction. Offline capacity reduction requires interrupting the project for a long time, whereas online capacity reduction does not and has very little impact on the project. In addition, migrating data or metadata during capacity reduction often runs into the problem of updating long-life and short-life references held in memory, where the short-life references may not be modifiable.

Based on this, an embodiment of the present specification provides an online capacity reduction data processing method, where a capacity reduction instruction for a shared storage space in a distributed database is received, where the capacity reduction instruction carries capacity reduction configuration information, a storage block to be subjected to capacity reduction and a target storage block in the shared storage space are determined according to the capacity reduction configuration information, data to be migrated in the storage block to be subjected to capacity reduction is migrated to the target storage block, a mapping relationship between a history reference address of the data to be migrated and a migrated storage address is established according to a migration result, and an index node number corresponding to the data to be migrated is locked to respond to the capacity reduction instruction.

Fig. 1 shows a flowchart of a data processing method according to an embodiment of the present specification, which specifically includes the following steps.

Step 102: receiving a capacity reduction instruction for a shared storage space in a distributed database, wherein the capacity reduction instruction carries capacity reduction configuration information.

Specifically, the data processing method provided in the embodiments of the present specification is used to implement online capacity reduction of a shared storage space of a distributed database, that is, capacity reduction of a storage space can be implemented without interrupting an execution process of a related item.

Since data in the distributed database generally needs to be stored in a storage space with data storage capability, the distributed database in the embodiment of the present specification is a relational database implemented based on shared storage, and thus, the storage space is the shared storage space.

In practical applications, the shared storage space includes, but is not limited to, a magnetic disk, a cloud disk, and the like. In the embodiments of the present description, a magnetic disk is taken as an example of the shared storage space. A file system is a format for storing information on a disk; the disk may be divided into one or more partitions, and each partition contains one file system.

In the embodiments of the present specification, the disk is divided into a plurality of storage blocks (chunks) at a granularity of 10G. Each chunk contains a linear space of 10G, and each chunk stores metadata and data according to certain rules.
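The fixed-granularity division described above means that the chunk holding a given byte offset can be found with integer division. The following is a minimal sketch; the function names are hypothetical, and reading "10G" as 10 GiB is an assumption.

```python
CHUNK_SIZE = 10 * 1024**3  # assumption: "10G" is read as 10 GiB


def chunk_index(offset: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Return the index of the chunk that contains a byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset // chunk_size


def chunk_range(index: int, chunk_size: int = CHUNK_SIZE) -> tuple:
    """Return the half-open byte range [start, end) covered by a chunk."""
    start = index * chunk_size
    return start, start + chunk_size
```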

At present, the file system supports only online capacity expansion or offline capacity reduction, not online capacity reduction. For PFS on POLARSTORE, once the disk has been expanded, the number of chunks it contains cannot be reduced, which wastes the management resources associated with those chunks. For PFS on DISK or PFS on ESSD, each chunk actually occupies 10G of disk device space regardless of whether any data has been written to it, so the disk device space must always be provisioned for the user's maximum requirement, and for users whose storage usage fluctuates widely or keeps decreasing, a large amount of unneeded disk device space is wasted. To overcome these problems, the disk can be reduced in capacity whenever capacity reduction is required.

Capacity reduction is the process of reducing the capacity of the shared storage space. Most existing disks support only offline capacity reduction, not online capacity reduction, and offline capacity reduction requires interrupting the project for a long time, which affects the running speed and efficiency of the project.

Therefore, the embodiments of the present specification reduce the capacity of the shared storage space by migrating the data in the storage blocks ordered toward the end of the shared storage space to storage blocks that are ordered earlier and have free space.

Further, the capacity reduction configuration information may be a target size to which the shared storage space needs to be reduced.

The distributed database in the embodiment of the present specification includes a control module, where the control module, when determining that a shared storage space used by a certain instance in the distributed database can be subjected to capacity reduction, may notify the distributed database to start capacity reduction, and transmit a target size to which the shared storage space needs to be reduced to the distributed database, that is, the distributed database receives a capacity reduction instruction for the shared storage space in the distributed database, where the instruction carries the capacity reduction configuration information.

Step 104: determining a storage block to be subjected to capacity reduction and a target storage block in the shared storage space according to the capacity reduction configuration information, and migrating the data to be migrated in the storage block to be subjected to capacity reduction to the target storage block.

Specifically, after the capacity reduction instruction is received, capacity reduction may be performed on the shared storage space based on the capacity reduction configuration information it carries: the storage block to be reduced and the target storage block in the shared storage space are determined according to the capacity reduction configuration information, and the data to be migrated in the storage block to be reduced is then migrated to the target storage block.

A storage block to be reduced is a storage block that needs to be removed by capacity reduction, and a target storage block is any other storage block in the shared storage space that has free storage space. Performing capacity reduction on a storage block to be reduced means migrating the data in it to a target storage block.

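In a specific implementation, determining the storage block to be reduced and the target storage block in the shared storage space according to the capacity reduction configuration information may be implemented as follows:

determining the target space size of the shared storage space according to the capacity reduction configuration information carried in the capacity reduction instruction;

determining the number of storage blocks to be reduced in the shared storage space according to the target space size and the preset storage block size corresponding to the shared storage space;

and determining the storage blocks to be reduced and the target storage blocks in the shared storage space according to the number of storage blocks to be reduced.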

Specifically, the capacity reduction instruction carries capacity reduction configuration information, which may be the target size to which the shared storage space needs to be reduced. The target space size of the shared storage space, that is, its size after capacity reduction, can therefore be determined directly from the capacity reduction configuration information. The number of storage blocks that need to be reduced is then determined from the division granularity of the storage blocks in the shared storage space and the size of the shared storage space after capacity reduction. Based on that number, it is determined which storage blocks in the shared storage space are the storage blocks to be reduced and which are the target storage blocks.

In practical applications, the storage blocks to be reduced and the target storage blocks may be determined according to the order of the storage blocks in the shared storage space. For example, in this embodiment the disk is divided into storage blocks at a granularity of 10G and contains 10 storage blocks before capacity reduction. After a capacity reduction instruction is received, the target size of the shared storage space is determined to be 70G according to the capacity reduction configuration information, so the number of storage blocks in the shared storage space needs to be reduced to 7. According to the order of the 10 storage blocks, the first 7 storage blocks may be determined to be the target storage blocks and the last 3 storage blocks to be the storage blocks to be reduced, and the data in the storage blocks to be reduced is migrated to the target storage blocks.
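The block-selection arithmetic in this example can be sketched as follows. The function name and the zero-based block indices are hypothetical, and rounding up when the target size is not a multiple of the block size is an assumption that the text does not address.

```python
def plan_shrink(total_blocks: int, target_size_g: int, block_size_g: int = 10):
    """Split block indices into (target blocks, blocks to be reduced).

    Sizes are expressed in the same unit as the block granularity (here "G").
    """
    if block_size_g <= 0 or target_size_g <= 0:
        raise ValueError("sizes must be positive")
    # Assumption: round up so the remaining blocks can hold the target size.
    keep = -(-target_size_g // block_size_g)
    if keep > total_blocks:
        raise ValueError("target size exceeds the current space")
    target_blocks = list(range(keep))                  # ordered first: kept
    blocks_to_reduce = list(range(keep, total_blocks)) # ordered last: reduced
    return target_blocks, blocks_to_reduce


# The example from the text: 10 blocks of 10G, reduced to 70G.
targets, to_reduce = plan_shrink(total_blocks=10, target_size_g=70)
```

With the values from the text, `targets` holds the first 7 block indices and `to_reduce` holds the last 3.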

Further, the data to be migrated in the storage block to be reduced is migrated to the target storage block, specifically, a data read-write node in the distributed database is called, and the data to be migrated in the storage block to be reduced is migrated to the target storage block.

Specifically, as described above, the distributed database in the embodiments of the present specification is a relational database implemented based on shared storage. A database of this type generally includes data read-write nodes and data read-only nodes. There is generally only one data read-write node, which can both read and write data, and there are a plurality of data read-only nodes, which can only read data. The stored data accessed by the data read-write node and the data read-only nodes is the same, which is what shared storage means.

Therefore, when capacity reduction needs to be performed on the shared storage space, the data read-write node may be invoked to migrate the data to be migrated in the storage blocks to be reduced, that is, the data in the storage blocks ordered toward the end of the shared storage space is migrated to target storage blocks ordered earlier in the shared storage space. Specifically, a data read-write instance of the distributed database may call the PFS interface PFS_mount_shrinkfs_start to trigger the PFS component inside the instance to start capacity reduction and migrate data, and a message indicating that data migration has started is returned to the control module.

In addition, once the number of storage blocks to be reduced in the shared storage space has been determined, the number of storage blocks in the shared storage space after capacity reduction can be determined. That number can then be persisted at a certain position in the first storage block of the shared storage space, and the capacity reduction state of the shared storage space is maintained. After that, new metadata allocation from the storage blocks to be reduced at the end of the shared storage space is forbidden, that is, metadata is no longer allocated to files from those storage blocks.
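The persistence step above can be illustrated with a small sketch. The record layout, the offset inside the first storage block, and all function names are hypothetical; the text only says that the post-reduction block count is persisted at a certain position of the first storage block and that allocation from the tail blocks is then forbidden.

```python
import struct

STATE_OFFSET = 0   # hypothetical position inside the first storage block
STATE_FORMAT = "<IB"  # block count after reduction (uint32), in-progress flag (uint8)


def persist_shrink_state(first_block: bytearray, blocks_after: int, shrinking: bool) -> None:
    """Write the post-reduction block count and state into the first block."""
    struct.pack_into(STATE_FORMAT, first_block, STATE_OFFSET, blocks_after, int(shrinking))


def load_shrink_state(first_block: bytearray) -> tuple:
    """Read back (blocks_after, shrinking) from the first block."""
    blocks_after, flag = struct.unpack_from(STATE_FORMAT, first_block, STATE_OFFSET)
    return blocks_after, bool(flag)


def may_allocate_metadata(block_index: int, first_block: bytearray) -> bool:
    """Forbid new metadata allocation from blocks that are being reduced."""
    blocks_after, shrinking = load_shrink_state(first_block)
    return not (shrinking and block_index >= blocks_after)
```

Keeping the state in the first block means it survives a restart, so an allocator that consults it would continue to avoid the tail blocks.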

In addition, the data to be migrated in the storage block to be reduced is migrated to the target storage block, and specifically, the metadata corresponding to the file to be migrated in the storage block to be reduced can be migrated to the target storage block;

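correspondingly, after the data to be migrated in the storage block to be reduced is migrated to the target storage block, the method further includes:

constructing a to-be-processed transaction based on the migration result of the metadata, and persisting the to-be-processed transaction to a log file of the shared storage space.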

Specifically, each storage block stores metadata and data according to certain rules. Metadata is mainly information describing the properties of data, and supports functions such as indicating the storage location of a file or data, recording history, looking up resources, and recording files. The data to be migrated therefore actually involves two parts: the migration of metadata and the migration of data. Migrating the metadata means migrating the metadata corresponding to the file to be migrated in the storage block to be reduced to a target storage block.

In addition, after a mapping relationship is generated in a metadata migration transaction on the data read-write node, the mapping relationship can be persisted to a journal file. Specifically, when the PFS module in the data read-write node modifies a file in a storage block as part of some operation, it constructs a to-be-processed transaction that includes the migration and modification of all metadata involved in the operation, and then persists the content of the to-be-processed transaction to a log file (journal file) of the shared storage space.

Because the to-be-processed transaction includes the mapping relationship before and after the metadata migration, that is, the mapping between the storage address before migration and the storage address after migration, the PFS module of the data read-write node and the PFS module of a data read-only node can synchronize metadata modifications through the log file. As a result, after a file is modified on the data read-write node, its latest metadata can be accessed on the data read-only node.
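As an illustration of the write side of this mechanism, the sketch below builds a to-be-processed transaction that carries the before/after address mapping and appends it to a journal. The record format (one JSON object per line), the field names, and the function names are all assumptions made for the example; the text does not describe the journal encoding used by PFS.

```python
import io
import json


def build_pending_transaction(txid, moves):
    """Build a transaction record from (inode number, old address, new address) tuples."""
    return {
        "txid": txid,
        "type": "migrate_metadata",
        "moves": [{"ino": ino, "old": old, "new": new} for ino, old, new in moves],
    }


def persist_transaction(journal, tx):
    """Append the transaction to the journal as one line and flush it."""
    journal.write(json.dumps(tx, sort_keys=True) + "\n")
    journal.flush()


# Usage: an in-memory stream stands in for the journal file on shared storage.
journal = io.StringIO()
tx = build_pending_transaction(1, [(42, 7000, 1000)])
persist_transaction(journal, tx)
```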

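Alternatively, for the migration of data, after the data to be migrated in the storage block to be reduced is migrated to the target storage block, the method similarly further includes:

constructing a to-be-processed transaction based on the migration result of the data to be migrated, and persisting the to-be-processed transaction to a log file of the shared storage space.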

Specifically, the PFS module of the data read-write node also constructs a to-be-processed transaction when migrating data, and then persists the modifications involved in the to-be-processed transaction to the journal file.

After the content of the to-be-processed transaction is persisted to the log file of the shared storage space, the PFS module of the data read-write node and the PFS module of a data read-only node can synchronize data modifications through the log file. As a result, after data is migrated on the data read-write node, the data can be accessed on the data read-only node according to the mapping relationship.

Step 106: according to the migration result, establishing a mapping relation between the historical reference address of the data to be migrated and the migrated storage address, and locking an index node number corresponding to the data to be migrated so as to respond to the capacity reduction instruction.

Specifically, migrating each piece of data to be migrated involves moving data or metadata from its historical storage address to a new storage address, and references to the data to be migrated may already exist before the migration. Because the storage address (reference address) of the data changes after migration, those references need to be updated accordingly. To ensure that normal data access requests of the distributed database reach the correct data or metadata after migration, an embodiment of the present specification may determine the historical storage address of the data to be migrated before migration and its new storage address after migration, and establish a mapping relationship between the two, so as to implement online capacity reduction of the shared storage space.

In addition, the index node in the embodiments of this specification may be an inode (i-node), and the index node number is the node number corresponding to that inode.

The purpose of locking is to protect normal access to the data to be migrated, so the index node number corresponding to the data to be migrated may be locked during data migration. In practical applications, there are three lock types: rdlock (read), wrlock (write), and xlock (mutual exclusion). The embodiments of this specification use xlock. Once the lock is taken, the transaction that migrates metadata or data is mutually exclusive with the transactions of related I/O requests in the distributed database; that is, while the data is being migrated and the migration has not completed, the data cannot be accessed, which prevents access to wrong data. Thus, each time an item of metadata or data is migrated, a fine-grained inode-level lock makes the migration transaction and normal data access transactions mutually exclusive.
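As a rough illustration of inode-level exclusive locking, the following Python sketch keeps one mutex per inode number. The `InodeLockTable` class and its method names are assumptions made for this example and are not the PFS implementation; only the xlock (mutual exclusion) mode is modelled.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class InodeLockTable:
    """One exclusive lock per inode number (hypothetical sketch)."""

    def __init__(self) -> None:
        self._guard = threading.Lock()              # protects the lock table itself
        self._locks = defaultdict(threading.Lock)   # inode number -> lock

    def _lock_for(self, ino: int) -> threading.Lock:
        with self._guard:
            return self._locks[ino]

    @contextmanager
    def xlock(self, ino: int):
        # Held by the migration transaction for the inode being moved.
        lock = self._lock_for(ino)
        lock.acquire()
        try:
            yield
        finally:
            lock.release()

    def try_xlock(self, ino: int) -> bool:
        # Non-blocking attempt, as a normal I/O transaction might make.
        return self._lock_for(ino).acquire(blocking=False)

    def unlock(self, ino: int) -> None:
        self._lock_for(ino).release()


locks = InodeLockTable()
with locks.xlock(42):                      # migration of inode 42 in progress
    busy_same = locks.try_xlock(42)        # same inode: excluded
    free_other = locks.try_xlock(7)        # other inode: unaffected
    locks.unlock(7)
after_migration = locks.try_xlock(42)      # lock is free again
locks.unlock(42)
```

Because the lock is keyed by inode number, a migration in progress blocks access only to the item being moved, which is the fine-grained behaviour the paragraph above describes.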

Further, after the to-be-processed transaction is persisted to the log file, the data read-only node in the distributed database may acquire the to-be-processed transaction from the log file, and implement synchronization with data of the data read-write node based on information included in the to-be-processed transaction, which may be specifically implemented in the following manner:

Specifically, the data read-write node migrates the data to be migrated to the target storage block and, based on the migration result, generates a mapping relationship between the storage addresses of the data before and after migration. It then constructs a to-be-processed transaction based on the mapping relationship and persists the transaction to a log file. The distributed database can use a data read-only node to parse the to-be-processed transaction and obtain the migration result, which includes the storage addresses of the data before and after migration. After obtaining the parsing result, the data read-only node can establish the mapping relationship between the historical storage address of the data to be migrated and the migrated storage address, so that the information on the data read-only node is synchronized with that on the data read-write node, thereby implementing online capacity reduction of the shared storage space.
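The log-based synchronization above can be sketched as follows. The JSON record layout, the `build_transaction` function, and the `ReadOnlyNode` class are assumptions for illustration, and an in-memory list stands in for the log file on shared storage.

```python
import json
from typing import Dict, List


def build_transaction(txid: int, moves: Dict[int, int]) -> str:
    """Serialize one to-be-processed transaction (hypothetical format)."""
    return json.dumps({"txid": txid, "moves": sorted(moves.items())})


class ReadOnlyNode:
    """Rebuilds the address mapping by replaying the shared log."""

    def __init__(self) -> None:
        self.applied_txid = 0
        self.address_map: Dict[int, int] = {}

    def replay(self, journal: List[str]) -> None:
        for line in journal:
            tx = json.loads(line)
            if tx["txid"] <= self.applied_txid:
                continue  # already applied, so replay is idempotent
            for old_addr, new_addr in tx["moves"]:
                self.address_map[old_addr] = new_addr
            self.applied_txid = tx["txid"]


# The read-write node appends one transaction per migration step.
journal: List[str] = []
journal.append(build_transaction(1, {900: 100}))
journal.append(build_transaction(2, {901: 101}))

ro = ReadOnlyNode()
ro.replay(journal)
ro.replay(journal)  # a second replay changes nothing
```

Tracking the last applied transaction number is a design choice of this sketch; it lets the read-only node re-read the log safely without applying a migration twice.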

In addition, migrating the data to be migrated in the storage block to be reduced to the target storage block includes:

Specifically, after receiving the capacity reduction instruction, the data read-write node creates a data migration thread to process the capacity reduction task, which includes migrating metadata and data. That is, the data migration thread is called to migrate the data to be migrated in the storage block to be reduced to the target storage block. After data migration, a mapping relationship is established between the historical reference address of the data to be migrated and its new migrated storage address, so that the information on the data read-only node is synchronized with that on the data read-write node, thereby implementing online capacity reduction of the shared storage space.
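A dedicated migration thread of this kind might look like the sketch below. The `migrate_block` function, the dictionary used as storage, and the address values are hypothetical; a real implementation would also take the inode lock and write a log transaction for each move.

```python
import threading
from typing import Dict, List


def migrate_block(storage: Dict[int, bytes],
                  src_addrs: List[int],
                  free_addrs: List[int],
                  address_map: Dict[int, int]) -> None:
    """Move each item to a free address in the target block and record the move."""
    for old_addr, new_addr in zip(src_addrs, free_addrs):
        storage[new_addr] = storage.pop(old_addr)
        address_map[old_addr] = new_addr  # historical address -> migrated address


storage = {900: b"a", 901: b"b"}   # items in the storage block to be reduced
address_map: Dict[int, int] = {}

worker = threading.Thread(
    target=migrate_block,
    args=(storage, [900, 901], [100, 101], address_map),
)
worker.start()
worker.join()
```

Running the migration on its own thread keeps the capacity reduction task separate from the threads that serve ordinary requests.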

Further, after data migration is performed, a data access request for the data to be migrated may be received;

determining a data migration state of the data to be migrated;

Specifically, the data reference types of the data to be migrated include, but are not limited to, strong, soft, weak, and phantom references. A strong reference is an ordinary reference and has the longest lifetime; a soft reference has a shorter lifetime than a strong reference, and a weak reference has a shorter lifetime than a soft reference. If the four types are further grouped by life cycle, strong references are long life cycle references, while soft, weak, and phantom references are short life cycle references. The target data reference type is the short life cycle reference.
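The grouping of the four reference types by life cycle can be written down as a small Python sketch. The enum and function names are illustrative assumptions only.

```python
from enum import Enum


class RefType(Enum):
    STRONG = "strong"
    SOFT = "soft"
    WEAK = "weak"
    PHANTOM = "phantom"


# Grouping stated in the text: only strong references are long life cycle.
SHORT_LIFE_CYCLE = frozenset({RefType.SOFT, RefType.WEAK, RefType.PHANTOM})


def is_short_life_cycle(ref_type: RefType) -> bool:
    return ref_type in SHORT_LIFE_CYCLE


short_types = sorted(t.value for t in RefType if is_short_life_cycle(t))
```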

When data to be migrated is migrated, its storage address changes, so references to the data need to be modified synchronously. However, a short life cycle reference is usually a temporary variable on some thread's stack, so the capacity reduction thread cannot modify it during data migration.

For this reason, the embodiments of this specification introduce a mapping relationship between the storage addresses before and after data migration. When the data to be migrated is referenced through a short life cycle reference, its new storage address can be determined from the mapping relationship. After data migration, data held through short life cycle references can therefore still be accessed correctly, without recovering the short life cycle references and without interrupting running operations.

In addition, for long life cycle references to the data to be migrated, the historical reference address held by the reference can be synchronously modified to the new migrated storage address during data migration, so that after the migration the data can be accessed correctly and directly through the modified address. In practical applications, however, a mapping relationship between the storage addresses before and after migration can still be established for data that has long life cycle references, to ensure that normal data access requests of the distributed database reach the correct data or metadata after data migration.

Further, if an access request for the data to be migrated is received during data migration, the data migration state of the data to be migrated may be determined. If the data migration is completed and the data reference type associated with the data to be migrated is determined to be a short life cycle reference, the mapping relationship between the historical reference address of the data to be migrated and the migrated storage address is obtained, and the data is read from the migrated storage address in the mapping relationship and returned.
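The access path for a completed migration can be sketched as below. The function name, its parameters, and the dictionary-based storage are hypothetical; the sketch only shows that a short life cycle reference is translated through the mapping, while a long life cycle reference is assumed to have been rewritten already.

```python
from typing import Dict


def read_via_reference(storage: Dict[int, bytes],
                       address_map: Dict[int, int],
                       ref_addr: int,
                       short_lived: bool) -> bytes:
    """Read data through a reference after migration has completed."""
    if short_lived:
        # The reference still holds the historical address: translate it.
        addr = address_map.get(ref_addr, ref_addr)
    else:
        # Long life cycle references were rewritten during migration.
        addr = ref_addr
    return storage[addr]


storage = {100: b"row-data"}   # the data now lives at migrated address 100
address_map = {900: 100}       # historical address 900 -> migrated address 100

via_stale_ref = read_via_reference(storage, address_map, 900, short_lived=True)
via_updated_ref = read_via_reference(storage, address_map, 100, short_lived=False)
```

Both calls return the same bytes, which is the point of keeping the mapping: the stale temporary reference does not need to be found and patched.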

Or after data migration, receiving a data access request for the data to be migrated;

if the migration is in progress, the result of the access failure is returned.

Specifically, the index node may be an inode.

In the embodiments of this specification, after a data access request for the data to be migrated is received, the node identifier (node number) of the inode corresponding to the data to be migrated may be determined, and the migration state of the data to be migrated is determined according to the node identifier. If the data is currently being migrated, an access failure result is returned.

In practical applications, after a user supplies the node identifier of an inode, the inode can be opened for data access only if the node identifier is found in the inode table. If the data the user accesses is currently being migrated, the node identifier cannot be found in the inode table, so normal data access cannot proceed and an access failure result can be returned.
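The failure path during migration can be illustrated with a toy inode table. The `InodeTable` class, the `AccessFailed` exception, and the method names are assumptions for this sketch; it models an inode under migration as temporarily not findable in the table.

```python
from typing import Dict, Set


class AccessFailed(Exception):
    """Raised when an inode cannot be opened."""


class InodeTable:
    """Toy inode table: inodes under migration cannot be opened."""

    def __init__(self) -> None:
        self._addr: Dict[int, int] = {}
        self._migrating: Set[int] = set()

    def add(self, ino: int, addr: int) -> None:
        self._addr[ino] = addr

    def begin_migration(self, ino: int) -> None:
        self._migrating.add(ino)

    def end_migration(self, ino: int, new_addr: int) -> None:
        self._addr[ino] = new_addr
        self._migrating.discard(ino)

    def open(self, ino: int) -> int:
        # An inode that is being migrated is treated as not found.
        if ino in self._migrating or ino not in self._addr:
            raise AccessFailed(f"inode {ino} is unavailable")
        return self._addr[ino]


table = InodeTable()
table.add(42, 900)
table.begin_migration(42)
try:
    table.open(42)
    during = "ok"
except AccessFailed:
    during = "access failed"
table.end_migration(42, 100)
after = table.open(42)
```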

In addition, after the data to be migrated in the storage block to be reduced is migrated to the target storage block, the method further includes:

detecting a data reference result of the target data reference type;

Specifically, the management and control module may use a PFS tool to poll the capacity reduction state of the shared storage space. If it determines that the capacity reduction is completed, it may wait for the data reference processes that hold short life cycle references to finish, which avoids subsequently reading or writing data at a wrong location. It then deletes the mapping relationship, changes the number of storage blocks of the shared storage space to the number after capacity reduction, changes the state of the shared storage space to a non-capacity-reduction state, and releases the in-memory metadata corresponding to the storage block to be migrated.
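The finishing sequence above can be sketched as a single function. The dictionary keys, the state strings, and the `finish_shrink` name are hypothetical; the count of active short life cycle references is passed in because the specification does not say how it is tracked.

```python
from typing import Any, Dict


def finish_shrink(space: Dict[str, Any], active_short_refs: int) -> bool:
    """Return True once cleanup has run, False if it must be retried later."""
    if space["state"] != "shrink_completed":
        return False
    if active_short_refs > 0:
        # Short life cycle references may still hold historical addresses.
        return False
    space["address_map"].clear()                              # delete the mapping
    space["block_count"] = space["block_count_after_shrink"]  # new block count
    space["state"] = "not_shrinking"
    for block_id in space["blocks_to_release"]:
        space["block_metadata"].pop(block_id, None)           # free in-memory metadata
    return True


space = {
    "state": "shrink_completed",
    "address_map": {900: 100},
    "block_count": 4,
    "block_count_after_shrink": 3,
    "blocks_to_release": [3],
    "block_metadata": {0: "m0", 1: "m1", 2: "m2", 3: "m3"},
}
deferred = finish_shrink(space, active_short_refs=1)
finished = finish_shrink(space, active_short_refs=0)
```

The first call is deferred because a short life cycle reference is still active; deleting the mapping at that point would leave that reference pointing at a released address.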

Further, after determining that the capacity reduction is completed, the management and control module may notify the data read-only node in the distributed database that the capacity reduction has been completed or cancelled. The data read-only node then calls the PFS interface PFS_mount_shrinkfs to trigger the capacity reduction finishing work inside the instance, which includes waiting until all data reference processes holding short life cycle references have finished, destroying the mapping relationship, and releasing the in-memory metadata corresponding to the storage block to be reduced.

Then, the management and control module can notify the underlying POLARSTORE to release the storage space of the storage block to be migrated, so as to achieve the purpose of capacity reduction of the shared storage space.

The following will further describe the data processing method by taking the application of the data processing method provided in this specification to a hard disk capacity reduction as an example with reference to fig. 2. Fig. 2 shows a flowchart of a processing procedure of a data processing method according to an embodiment of the present specification, which specifically includes the following steps.

Step 202: The management and control module sends a capacity reduction instruction for hard disk 1 to the information processing module.

Step 204: The information processing module calls the data read-write node to perform capacity reduction processing.

Step 206: The data read-write node determines the storage block to be reduced and the target storage block in hard disk 1 according to the capacity reduction configuration information carried in the capacity reduction instruction, and migrates the data to be migrated in the storage block to be reduced to the target storage block.

Step 208: The data read-write node sends a prompt message to the management and control module indicating that data migration has started.

Step 210: The data read-write node establishes a mapping relationship between the historical reference address of the data to be migrated and the migrated storage address according to the migration result.

Step 212: The management and control module polls whether the migration is complete.

Step 214: When the data read-write node determines that the data reference result of the short life cycle references is that the references have ended, it deletes the mapping relationship and releases the metadata corresponding to the storage block to be reduced and to the files to be migrated in that storage block.

Step 216: The management and control module polls whether the migration is completed.

After the migration is complete, step 218 is performed.

Step 218: The management and control module sends a migration completion prompt message to the information processing module.

Step 220: The information processing module calls the data read-only node to establish a mapping relationship between the historical reference address of the data to be migrated and the migrated storage address of the data after migration.

Step 222: When the data read-only node determines that the data reference result of the short life cycle references is that the references have ended, it deletes the mapping relationship and releases the storage block to be reduced and the metadata corresponding to the files to be migrated in that storage block.

Step 224: The data read-only node sends a prompt message for releasing the storage block to the management and control module.

Step 226: The management and control module releases the storage blocks.
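The polling performed by the management and control module in steps 212 and 216 could be sketched as a bounded loop. The `poll_until_done` function, its parameters, and the simulated status source are assumptions for illustration; the actual PFS tool interface is not described in this specification.

```python
import time
from typing import Callable


def poll_until_done(is_done: Callable[[], bool],
                    interval_s: float = 0.0,
                    max_attempts: int = 100) -> int:
    """Poll a status callback; return the number of attempts it took."""
    for attempt in range(1, max_attempts + 1):
        if is_done():
            return attempt
        time.sleep(interval_s)
    raise TimeoutError("migration did not complete within the attempt limit")


# Simulated status source: reports completion on the third poll.
progress = iter([False, False, True])
attempts = poll_until_done(lambda: next(progress))
```

Bounding the number of attempts is a choice made for this sketch so that a stalled migration surfaces as an error instead of an endless loop.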

Corresponding to the above method embodiment, this specification further provides a data processing apparatus embodiment, and fig. 3 shows a schematic structural diagram of a data processing apparatus provided in an embodiment of this specification. As shown in fig. 3, the apparatus includes:

a receiving module 302, configured to receive a capacity reduction instruction for a shared storage space in a distributed database, where the capacity reduction instruction carries capacity reduction configuration information;

a migration module 304, configured to determine a storage block to be subjected to capacity reduction and a target storage block in the shared storage space according to the capacity reduction configuration information, and migrate data to be migrated in the storage block to be subjected to capacity reduction to the target storage block;

an establishing module 306, configured to establish a mapping relationship between the historical reference address of the data to be migrated and the migrated storage address according to the migration result, and lock the inode number corresponding to the data to be migrated, so as to respond to the capacity reduction instruction.

Optionally, the migration module 304 is further configured to:

correspondingly, the data processing apparatus further includes a first building module configured to:

Optionally, the data processing apparatus further includes a second building module configured to:

Optionally, the data processing apparatus further includes a calling module configured to:

Optionally, the migration module 304 is further configured to:

accordingly, the establishing module 306 is further configured to:

Optionally, the data processing apparatus further includes a first receiving module configured to:

receiving a data access request aiming at the data to be migrated;

determining a data migration state of the data to be migrated;

Optionally, the data processing apparatus further includes a second receiving module configured to:

receiving a data access request aiming at the data to be migrated;

if the migration is in progress, the result of the access failure is returned.

Optionally, the data processing apparatus further includes a releasing module configured to:

detecting a data reference result of the target data reference type;

The above is a schematic configuration of a data processing apparatus of the present embodiment. It should be noted that the technical solution of the data processing apparatus and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the data processing apparatus can be referred to the description of the technical solution of the data processing method.

FIG. 4 illustrates a block diagram of a computing device 400 provided in accordance with one embodiment of the present description. The components of the computing device 400 include, but are not limited to, a memory 410 and a processor 420. Processor 420 is coupled to memory 410 via bus 430 and database 450 is used to store data.

Computing device 400 also includes access device 440, which enables computing device 400 to communicate via one or more networks 460. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 440 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)), whether wired or wireless, such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 400, as well as other components not shown in FIG. 4, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 4 is for purposes of example only and is not limiting as to the scope of the present description. Those skilled in the art may add or replace other components as desired.

Computing device 400 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 400 may also be a mobile or stationary server.

Wherein the processor 420 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the data processing method described above.

The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the data processing method.

An embodiment of the present specification further provides a computer-readable storage medium storing computer-executable instructions, which when executed by a processor implement the steps of the data processing method described above.

The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the data processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the data processing method.

An embodiment of the present specification further provides a computer program, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the data processing method.

The above is an illustrative scheme of a computer program of the present embodiment. It should be noted that the technical solution of the computer program and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computer program can be referred to the description of the technical solution of the data processing method.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the described acts, because some steps may be performed in other sequences or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims

1. A method of data processing, comprising:

2. The data processing method according to claim 1, wherein the determining, according to the capacity reduction configuration information, a to-be-reduced storage block and a target storage block in the shared storage space comprises:

3. The data processing method according to claim 1 or 2, wherein the migrating the data to be migrated in the storage block to be reduced to the target storage block comprises:

4. The data processing method according to claim 1, wherein the migrating the data to be migrated in the storage block to be reduced to the target storage block comprises:

5. The data processing method according to claim 1, further comprising, after the migrating the data to be migrated in the storage block to be reduced to the target storage block:

6. The data processing method of claim 4 or 5, further comprising:

7. The data processing method according to claim 1 or 2, wherein the migrating the data to be migrated in the storage block to be reduced to the target storage block comprises:

8. The data processing method of any of claims 1 to 5, further comprising:

receiving a data access request aiming at the data to be migrated;

determining a data migration state of the data to be migrated;

9. The data processing method of any of claims 1 to 5, further comprising:

receiving a data access request aiming at the data to be migrated;

if the migration is in progress, the result of the access failure is returned.

10. The data processing method according to any one of claims 1 to 5, further comprising, after the migrating the data to be migrated in the storage block to be reduced to the target storage block:

detecting a data reference result of the target data reference type;

11. A data processing apparatus comprising:

12. A computing device, comprising:

a memory and a processor;

the memory is for storing computer-executable instructions and the processor is for executing the computer-executable instructions, which when executed by the processor implement the steps of the data processing method of any one of claims 1 to 10.

13. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the data processing method of any one of claims 1 to 10.

14. A computer program which, when executed in a computer, causes the computer to carry out the steps of the data processing method according to any one of claims 1 to 10.