Detailed Description
In order to ensure the reliability of data that a terminal requests to write, a distributed system generally needs to write the data to be written into a master node and also into a plurality of standby nodes of that master node. At present, data writing methods in distributed systems maintain strong consistency; that is, the data must be written successfully on every node before the write data request is considered complete and success is fed back to the corresponding terminal.
Fig. 1 is a schematic diagram of a process for writing data in a specific distributed system. As shown in fig. 1, the scenario may specifically include: a distributed system 10 and a terminal 20, wherein the distributed system 10 comprises a master node 100, a standby node 101, and a standby node 102. To maintain strong consistency, the specific process of writing data may be: first, the terminal 20 sends a write data request to the master node 100 in the distributed system 10, where the write data request carries data x to be written; the master node 100 writes data x on itself and writes data x into the standby node 101 and the standby node 102, respectively; then, the standby node 101 feeds back a response message a to the master node 100 after data x is successfully written, and the standby node 102 feeds back a response message b to the master node 100 after data x is successfully written. Only when the master node 100 has received both response message a and response message b is data x regarded as successfully written in the distributed system 10, which triggers the feedback of a successful write to the terminal 20.
However, operations such as disk flushing often occupy a large amount of CPU resources and may even cause network jitter, so some nodes in the distributed system may suffer short-term performance degradation. Once data needs to be written on such a performance-degraded node, for example, if the performance of the standby node 101 in fig. 1 is degraded, the time required for the master node 100 to write data x on the standby node 101 is far longer than the time required to write data x on the standby node 102 with normal performance. The master node 100 then waits a long time after receiving the feedback of the standby node 102 before it receives the feedback of the standby node 101 and can inform the terminal 20 that data x is successfully written. That is, because the write efficiency of a performance-degraded standby node is low, it is difficult to complete data writing quickly and efficiently.
Based on this, an embodiment of the present application provides a method for writing data in a distributed system. When a master node writes data to be written into a plurality of standby nodes corresponding to the master node, the master node may judge whether the performance of each standby node is lower than a set performance threshold, and thereby determine the standby nodes whose feedback response messages it needs to wait for (that is, the standby nodes whose performance is not lower than the set performance threshold); the response messages fed back by those standby nodes trigger the master node to feed back a successful write to the terminal, while the master node ignores whether a standby node that does not meet the set performance threshold (that is, a standby node whose performance is lower than the set performance threshold) feeds back a response message. Therefore, even if the performance of some standby nodes is reduced by problems such as network jitter, the master node can identify the performance-degraded standby nodes through this judgment, temporarily ignore their feedback, and consider only the feedback of the standby nodes with good performance. This effectively solves the problem that slow feedback from a low-performance standby node makes the whole distributed system slow and inefficient at feeding back a successful write, and thereby improves the data writing efficiency of the distributed system.
For example, still taking the scenario shown in fig. 1, assuming the terminal 20 is to write data x into the distributed system 10 comprising the master node 100, the standby node 101, and the standby node 102, the method provided in the embodiment of the present application may specifically be: first, the terminal 20 sends a write data request to the master node 100 in the distributed system 10, where the write data request carries data x to be written; the master node 100 writes data x on itself and writes data x into the standby node 101 and the standby node 102, respectively; then, the master node 100 compares the performance of the standby node 101 and the performance of the standby node 102 with the set performance threshold and finds that the performance of the standby node 101 is lower than the set performance threshold while the performance of the standby node 102 is not. The master node 100 therefore ignores whether the standby node 101 sends a response message a indicating that data x is successfully written, and as soon as it receives the response message b fed back by the standby node 102, the master node 100 can feed back a successful write to the terminal 20.
It can be understood that, in the process of writing data x, suppose it takes the standby node 101 10 milliseconds to successfully write data x because of network jitter, while the standby node 102 with normal performance can successfully write data x in only 1 millisecond. With the method of maintaining strong consistency, the master node 100 can feed back a successful write to the terminal 20 only after at least 10 milliseconds; with the method provided by the embodiment of the present application, regardless of whether the standby node 101 feeds back, the master node 100 can feed back a successful write to the terminal 20 after only 1 millisecond, which greatly improves the efficiency of writing data.
It should be noted that the master node 100, the standby node 101, and the standby node 102 in the distributed system 10 may be deployed in three mutually independent devices; any two of the nodes may be deployed in the same device with the remaining node deployed in an independent device; or all three nodes may be deployed in the same device, which is not specifically limited in this embodiment of the present application. The terminal 20 may be any terminal that can communicate with the distributed system 10; in particular, the terminal 20 may write data into the distributed system 10, and in some examples the terminal 20 may specifically be a smart phone, a laptop personal computer, a desktop personal computer, a tablet computer, or the like.
It is to be understood that the above scenario is only one example of a scenario provided in the embodiment of the present application, and the embodiment of the present application is not limited to this scenario.
A specific implementation manner of a method and an apparatus for writing data in a distributed system in the embodiments of the present application is described in detail below with reference to the accompanying drawings.
Referring to fig. 2, a flowchart of a method for writing data in a distributed system in an embodiment of the present application is shown. The data writing method provided in this embodiment may be applied to the scenario corresponding to fig. 1, for example, and then the data writing method in the distributed system may specifically include:
step 201, a master node receives a data writing request, where the data writing request includes data to be written.
It is understood that the write data request may be a message sent by the terminal to the distributed system to request that data to be written is written into the distributed system. The data to be written is carried in the write data request and is obtained by parsing the write data request.
As an example, the data writing request may be directly sent by the terminal to the master node of the distributed system, and then, when receiving the data writing request, the master node may parse the data writing request, obtain the data to be written, and then store the data to be written into the master node.
As another example, the data writing request may also be sent by a terminal to any node of the distributed system, and then the node that receives the data writing request may acquire the master node corresponding to the data writing request and send the data writing request to the master node, and then the master node may obtain data to be written by parsing the data writing request and store the data to be written in the master node.
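To make the request structure concrete, the following is a minimal sketch in Python; the field and function names are hypothetical illustrations, since the embodiment does not prescribe a concrete message format.

```python
from dataclasses import dataclass

@dataclass
class WriteDataRequest:
    """Illustrative write data request (field names are assumptions)."""
    logical_address: int  # used later to determine the partition
    data: bytes           # the data to be written

def parse_write_request(request: WriteDataRequest) -> tuple[int, bytes]:
    # The receiving node parses the request to obtain the logical
    # address and the data to be written.
    return request.logical_address, request.data

request = WriteDataRequest(logical_address=0x2A, data=b"x")
logical_address, data = parse_write_request(request)
print(logical_address, data)  # 42 b'x'
```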
In some possible implementation manners, the write data request may further include a logical address of the data to be written, and the logical address is used to determine the master node and the standby nodes into which the data to be written needs to be written. For the distributed system, as shown in fig. 3, the manner of determining the master node and the standby nodes into which the data to be written in the write data request needs to be written may specifically include:
step 301, determining, by a node in the distributed system, a partition corresponding to the data to be written according to the logical address.
It should be understood that the node in the distributed system refers to the node in the distributed system that receives the write data request sent by the terminal; for example, it may be the master node, or it may be any other node in the distributed system.
In specific implementation, the node in the distributed system may first obtain the logical address and the data to be written by parsing the write data request, and then determine the partition corresponding to the data to be written according to the logical address. As an example, the manner of determining the partition corresponding to the data to be written according to the logical address may specifically be: the node in the distributed system performs a hash calculation on the logical address, and the obtained calculation result identifies the partition corresponding to the data to be written.
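A minimal sketch of this partition calculation, assuming the hash result is mapped onto a fixed number of partitions; the hash function and partition count are illustrative assumptions, since the embodiment only requires that the calculation result identify a partition.

```python
import hashlib

NUM_PARTITIONS = 1024  # assumed fixed partition count

def partition_of(logical_address: int) -> int:
    # Hash the logical address (MD5 here is an assumption; any stable
    # hash works) and map the digest onto one of the partitions.
    digest = hashlib.md5(str(logical_address).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

print(partition_of(0x2A))  # a deterministic partition for logical address y
```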
Step 302, a node in the distributed system determines the master node, the first standby node, and the second standby node according to the partition through the correspondence between the partition and the master node stored on the metadata server.
It is understood that, when storing data to be written, for the sake of reliability of the stored data, the data is generally stored in the master node and in a corresponding plurality of standby nodes. It should be noted that, for a given piece of data to be written, once the corresponding partition is determined according to the logical address, that partition corresponds to a fixed master node and fixed standby nodes that are used for storing the data to be written.
In this embodiment of the present application, the correspondence between the partition and the master node may be stored on the metadata server. After the node in the distributed system obtains the partition according to step 301, the master node corresponding to the partition, as well as the corresponding first standby node and second standby node, may be determined according to the correspondence between the partition and the master node.
The metadata server refers to a server having the function of storing the correspondence between partitions and master nodes. It may specifically be a dedicated server used only for storing this correspondence, or it may be another server that also has this function, which is not specifically limited in the embodiment of the present application.
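A sketch of the lookup in step 302, assuming the metadata server exposes the Partition View as a mapping from a partition to its (master node, first standby node, second standby node) triple; the mapping structure and node names are illustrative assumptions.

```python
# Illustrative Partition View: partition -> (master, first standby, second standby).
PARTITION_VIEW = {
    7: ("node_431", "node_432", "node_433"),
}

def nodes_for_partition(partition: int) -> tuple[str, str, str]:
    # Look up the master node and standby nodes that store the data
    # belonging to this partition.
    return PARTITION_VIEW[partition]

master, standby1, standby2 = nodes_for_partition(7)
print(master, standby1, standby2)
```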
In a specific implementation, in one case, when the node in the distributed system is the master node determined in step 302, step 202 may be directly performed, that is, the master node writes the data to be written into the determined first standby node and second standby node, respectively; in another case, when the node in the distributed system is not the master node, the embodiment of the present application may further include: the node in the distributed system sends the write data request received from the terminal to the corresponding master node, which then executes the embodiment shown in fig. 2.
For example, the embodiment of the present application may be applied to a scenario as shown in fig. 4, where the scenario includes a metadata server 410, a terminal 420, and a distributed system 430. The metadata server 410 stores, for each piece of data, a Partition View describing the correspondence between the partition and the master node, the first standby node, and the second standby node; for example, one Partition View may include: partition-master node-first standby node-second standby node.
As an implementation manner, if the node in the distributed system is the master node, the process of determining each node storing the data specifically includes: when the terminal 420 needs to write data, the metadata server 410 may send the Partition View corresponding to the data to the terminal 420; upon receiving the Partition View, the terminal 420 can determine the corresponding partition h(y) according to the logical address y, search the Partition View for the master node corresponding to h(y), and determine it to be the master node 431; then, the terminal 420 may send a write data request carrying the data to the determined master node 431. The metadata server 410 may further send the Partition View corresponding to the data to the master node 431, and the master node 431 calculates the corresponding partition h(y) according to the logical address y in the received write data request and determines, according to the partition h(y) and the received Partition View, that the master node 431 is the master node storing the data and that the corresponding standby nodes are the first standby node 432 and the second standby node 433.
As another implementation manner, if the node in the distributed system is not the master node, the process of determining each node storing the data specifically includes: when the terminal 420 needs to write data, the terminal 420 may send a write data request corresponding to the data to any node 43N in the distributed system 430, and at this time, the metadata server 410 may send the Partition View corresponding to the data to the node 43N; upon receiving the Partition View, the node 43N can determine the corresponding partition h(y) according to the logical address y, search the Partition View for the master node corresponding to h(y), and determine it to be the master node 431; then, the node 43N may send the write data request carrying the data to the determined master node 431. The metadata server 410 may further send the Partition View corresponding to the data to the master node 431, and the master node 431 calculates the corresponding partition h(y) according to the logical address y in the received write data request and determines, according to the partition h(y) and the received Partition View, that the master node 431 is the master node storing the data and that the corresponding standby nodes are the first standby node 432 and the second standby node 433.
It should be noted that the metadata server 410 may be a functional node located in the distributed system 430 and having a function of storing a correspondence between partitions and a master node, and specifically may be a server device separately deployed independently from the master node 431, the first standby node 432, and the second standby node 433, or may be deployed in the same device with any one of the nodes, or may be deployed in the same device with any two of the nodes, or may be deployed in the same device with all of the three nodes, which is not specifically limited in this embodiment of the present application.
In this way, the master node is determined, and the corresponding standby nodes are determined once the master node receives the write data request, preparing for the data to be written into the distributed system.
Step 202, the master node writes the data to be written into the first standby node and the second standby node respectively.
In specific implementation, when the master node receives the write data request, parses the data to be written from it, and determines that the standby nodes corresponding to the data to be written are the first standby node and the second standby node, the master node can directly send the data to be written to the first standby node and the second standby node, respectively, and instruct them to store the data to be written.
Step 203, when determining that the performance of the first standby node is lower than a set performance threshold, the master node ignores whether the first standby node sends a first response message that the data to be written is successfully written to the master node;
Step 204, when receiving a second response message, sent by the second standby node, indicating that the data to be written is successfully written, the master node feeds back a successful write to the terminal.
It is understood that each node in the distributed system may detect the performance of the node when writing data, and the performance may represent the efficiency of the node in writing data, for example, the performance may be the delay time of the node in writing data, and for example, the performance may also be the speed of the node in writing data.
It is understood that the performance threshold may be a preset minimum allowable value of the performance of a standby node whose feedback is not ignored. When the master node detects that the performance of a standby node is lower than the set performance threshold, the current performance of that standby node is poor, and the master node needs to ignore whether that standby node sends a response message indicating that the data to be written is successfully written; when the master node detects that the performance of a standby node is not lower than the set performance threshold, the current performance of that standby node is good, and the master node needs to wait for that standby node to send a response message indicating that the data to be written is successfully written.
In a specific implementation, a standby node with a performance lower than a set performance threshold is referred to as a first standby node, where the first standby node is a standby node with a poor performance, and may specifically be a standby node with slow data writing in a short time due to network jitter and other problems. It should be noted that the first standby node may refer to one node or may refer to multiple nodes, and is not specifically limited in this embodiment. Conversely, a standby node with performance not lower than the set performance threshold is marked as a second standby node, and the performance of the second standby node is good, for example, writing data is fast. It should be noted that the second standby node may refer to one node or multiple nodes, and is not specifically limited in this embodiment.
After the master node writes the data to be written into the first standby node, the first standby node may, once the write succeeds, generate a first response message and send it to the master node to inform the master node that the data to be written has been successfully written into the first standby node. Since the performance of the first standby node is lower than the set performance threshold, data is likely to be written slowly on it, and a long time may be needed before the first response message can be generated. The master node may therefore ignore whether the first response message is received from the first standby node; that is, the master node no longer needs to pay attention to whether the first standby node writes successfully, and whether it does so does not affect the master node feeding back a successful write to the terminal. In addition, after the master node writes the data to be written into the second standby node, the second standby node may, once the write succeeds, generate a second response message and send it to the master node to inform the master node that the data to be written has been successfully written into the second standby node. If there are multiple second standby nodes, the master node may feed back a successful write to the terminal only when it has received the second response messages fed back by all of the second standby nodes.
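The waiting rule of steps 203 and 204 can be sketched as follows, with hypothetical helper names: the master regards the write as complete once every standby node it has not ignored has responded, while responses from ignored standby nodes have no effect either way.

```python
def write_complete(responded: set[str], ignored: set[str],
                   all_standbys: set[str]) -> bool:
    # The write is complete once every non-ignored (second) standby
    # node has responded; first standby nodes in the ignored set do
    # not affect the result, whether their responses arrive or not.
    awaited = all_standbys - ignored
    return awaited <= responded  # subset test: all awaited nodes responded

# Example from fig. 1: standby node 101 is ignored, 102 has responded.
print(write_complete(responded={"node_102"},
                     ignored={"node_101"},
                     all_standbys={"node_101", "node_102"}))  # True
```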
It should be noted that the feedback of successful writing may be directly sent by the master node to the terminal, or may be fed back to the terminal by the master node through other nodes in the distributed system.
In some possible implementations, the master node may record a first standby node with a performance lower than a set performance threshold as a temporary failure state, where the temporary failure state is used to instruct the master node to determine that the performance of the first standby node is lower than the set performance threshold, and further, to instruct the master node to ignore whether the first standby node is successfully written when executing the write data request. Then, step 203 may specifically be: when the master node detects that the state of the first standby node is a temporary failure state, the master node determines that the performance of the first standby node is lower than a set performance threshold, and ignores whether the first standby node sends a first response message that the data to be written is successfully written to the master node.
It will be appreciated that, before step 203, the master node may also: detect the performance of each standby node and judge whether the performance of each standby node is lower than the set performance threshold; and record a standby node whose performance is lower than the set performance threshold as a first standby node and mark it as being in the temporary failure state.
For example, suppose the set performance threshold is specifically a time threshold; then, when the data to be written all need to be written into the same data entry, the master node may detect the write duration of each standby node storing each piece of data to be written.
As an example, for a given write data request, the master node may obtain the write durations of each standby node for the last n writes, calculate each standby node's average write duration, and, when the difference between average write durations is greater than the time threshold, determine the standby node with the larger average write duration to be a standby node whose performance is lower than the set performance threshold; when the difference between the average write durations is not greater than the time threshold, all of the standby nodes are determined to have performance not lower than the set performance threshold. For example: assume the time threshold is 300 microseconds and n is 4, with standby node 1, standby node 2, and standby node 3. The last 4 write durations of standby node 1 are T11, T12, T13, and T14; the last 4 write durations of standby node 2 are T21, T22, T23, and T24; and the last 4 write durations of standby node 3 are T31, T32, T33, and T34. The average write duration of standby node 1 is calculated as T1 = (T11 + T12 + T13 + T14)/4, that of standby node 2 as T2 = (T21 + T22 + T23 + T24)/4, and that of standby node 3 as T3 = (T31 + T32 + T33 + T34)/4. If T1 - T2 > 300 microseconds, the standby node 1 corresponding to T1 is determined to be a standby node whose performance is lower than the set performance threshold; if T2 - T3 ≤ 300 microseconds, the standby nodes 2 and 3 corresponding to T2 and T3 are determined to be standby nodes whose performance is not lower than the set performance threshold.
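The averaging and comparison above can be sketched as follows; comparing each node's average against the fastest node's average is one reasonable reading of the pairwise example, and the durations are made-up figures.

```python
def degraded_standbys(write_durations: dict[str, list[float]],
                      time_threshold_us: float) -> set[str]:
    # Average the last n write durations per standby node, then mark a
    # node as below the performance threshold when its average exceeds
    # the fastest node's average by more than the time threshold.
    averages = {node: sum(ts) / len(ts) for node, ts in write_durations.items()}
    fastest = min(averages.values())
    return {node for node, avg in averages.items()
            if avg - fastest > time_threshold_us}

# Example: n = 4 write durations per node, time threshold 300 microseconds.
durations = {
    "standby_1": [900.0, 950.0, 1000.0, 950.0],  # average 950: degraded
    "standby_2": [100.0, 110.0, 90.0, 100.0],    # average 100
    "standby_3": [120.0, 130.0, 110.0, 120.0],   # average 120
}
print(degraded_standbys(durations, 300.0))  # {'standby_1'}
```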
It is understood that, in order not to affect the overall data writing performance (especially the data writing efficiency) of the distributed system, the first standby node whose performance is lower than the set performance threshold may be marked by the master node as being in a temporary failure state because of its poor performance. For example, in the above example including standby node 1, standby node 2, and standby node 3, the master node may mark standby node 1, whose performance is below the set performance threshold, as being in the temporary failure state.
The temporary failure state is used for indicating the main node to determine that the performance of the first standby node is lower than a set performance threshold. In a specific implementation, when the master node receives a data writing request, as long as the first standby node marked as a temporary failure state is found, it may be determined that the performance of the first standby node is lower than a set performance threshold, that is, the performance of the first standby node is poor. It is understood that, during the period when the first standby node is marked as a temporary failure state, after the master node executes the write data request to write the data to be written to the first standby node, the master node may ignore whether the write of the first standby node is successful, that is, the master node does not need to consider whether the first standby node sends the first response message. For example, in the above example including the standby node 1, the standby node 2, and the standby node 3, the master node may ignore whether the standby node 1 in the temporary failure state feeds back the response message, and may feed back success in writing to the terminal as long as it waits for the response message fed back by the standby node 2 and the standby node 3.
In some possible implementation manners, after determining that the performance of the first standby node is lower than the threshold and ignoring whether the first standby node feeds back the first response message, it should be considered that the first standby node's performance has likely degraded only for a short time, because of short-term network jitter or the like; when the network jitter or other cause disappears, the performance of the first standby node is likely to recover. At this point, in order to ensure the reliability of data writing to the maximum extent, the embodiment of the present application may further include: when detecting that the performance of the first standby node reaches the set performance threshold, the master node records the state of the first standby node as an incomplete normal state, where the incomplete normal state is used to indicate that the data stored by the first standby node may be inconsistent with the data stored by the master node and that the master node needs to determine whether the first standby node writes successfully when executing a write data request.
It is understood that, after step 203, the master node may also continuously or periodically detect the performance of the first standby node and determine whether the performance of the first standby node reaches the set performance threshold.
For example, suppose the set performance threshold is specifically a time threshold; then, when the data to be written all need to be written into the same data entry, the master node may detect the write duration for storing each piece of data to be written after the first standby node has been marked as being in the temporary failure state.
As an example, for a given write data request, the master node may obtain the write durations of the first standby node for the last m writes, calculate the first standby node's average write duration, and determine that the performance of the first standby node reaches the set performance threshold when the difference between its average write duration and the average write durations of the other standby nodes is smaller than the time threshold. For example: assume the time threshold is 50 microseconds and m is 3, with standby node 2 in the normal state and standby node 1 in the temporary failure state. The last 3 write durations of standby node 1 are T11', T12', and T13', and the last 3 write durations of standby node 2 are T21', T22', and T23'. The average write duration of standby node 1 is calculated as T1' = (T11' + T12' + T13')/3, and that of standby node 2 as T2' = (T21' + T22' + T23')/3. If T1' - T2' < 50 microseconds, the standby node 1 corresponding to T1' is determined to be a first standby node whose performance has reached the set performance threshold.
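The recovery check can be sketched the same way; the figures are again made up, and comparing against a single normal standby node follows the two-node example above.

```python
def performance_recovered(failed_durations: list[float],
                          normal_durations: list[float],
                          time_threshold_us: float) -> bool:
    # Average the last m write durations of the temporarily failed node
    # and of a normal standby node; the failed node is deemed to have
    # reached the set performance threshold again once the difference
    # between the averages falls below the time threshold.
    avg_failed = sum(failed_durations) / len(failed_durations)
    avg_normal = sum(normal_durations) / len(normal_durations)
    return avg_failed - avg_normal < time_threshold_us

# Example: m = 3 write durations per node, time threshold 50 microseconds.
print(performance_recovered([130.0, 120.0, 125.0],   # standby node 1
                            [100.0, 95.0, 105.0],    # standby node 2
                            50.0))                   # True
```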
It is understood that, when the performance of the first standby node reaches the set performance threshold, which indicates that the performance of the first standby node has been restored, the master node may mark the first standby node as an incomplete normal state in order to improve the reliability of the stored data of the distributed system.
It should be noted that the set performance threshold used to determine whether the performance of the first standby node has recovered may be the same as or different from the set performance threshold used in step 203 to determine that the performance of the first standby node is poor; generally, however, the threshold here is less than or equal to the threshold used in step 203. The two may be set separately according to actual conditions and requirements, and are not limited here.
On one hand, the incomplete normal state indicates that the master node needs to determine whether the first standby node writes successfully when executing a subsequent write data request. In specific implementation, when the master node receives a write data request, as soon as it finds a first standby node in the incomplete normal state, it can determine that this is a standby node whose performance has reached the set performance threshold. After the master node executes the write data request and writes data to that first standby node, it waits for the first standby node to write successfully; that is, the master node can feed back a successful write to the terminal only after receiving the first response message, sent by the first standby node, indicating that the first standby node wrote successfully.
On the other hand, the incomplete normal state is also used to indicate that there may be inconsistency between the data stored by the first standby node and the data stored by the master node.
It can be understood that, after the performance of the first standby node fell below the set performance threshold, the master node ignored the first response message fed back by the first standby node; that is, the master node may or may not have received it. If, while the first standby node was in the temporary failure state, the master node did not receive the first response message, it is quite possible that a write on the first standby node was unsuccessful. If, during the temporary failure state, some data to be written was not successfully written to the first standby node, then data present on the master node is missing from the first standby node, and after the performance of the first standby node recovers, the data stored by the first standby node is inconsistent with the data stored by the master node. Conversely, if no write to the first standby node actually failed and the writes were merely slow, then after the performance of the first standby node recovers, the data stored by the first standby node is consistent with the data stored by the master node.
It can be understood that, when the first standby node recovers from the temporary failure state into the incomplete normal state, the master node ignored whether the first standby node's writes succeeded while it was in the temporary failure state, so during that period writes may well have failed on the first standby node, leaving the data stored by the first standby node inconsistent with that of the master node. Therefore, in order to improve the reliability of the data stored in the distributed system, the embodiment of the present application further includes an operation of performing data synchronization on the first standby node.
Fig. 5 shows the processing manner for a first standby node that recovers its performance after step 203 in this embodiment of the application; that is, this embodiment may specifically further include:
step 501, when the state of the first standby node is in an incomplete normal state, the master node sends the data lost by the first standby node to the first standby node.
It can be understood that the first standby node being in the incomplete normal state indicates that it has recovered from the poor-performance temporary failure state into a standby node with good performance. At this time, the master node needs to compare whether the data stored in the first standby node is consistent with the data stored in the master node. If they are consistent, the first standby node merely went through a period of poor performance with slow storage but lost no data, and the master node may directly modify the state of the first standby node to the normal state without performing steps 501 to 503.
If the master node finds through comparison that the data stored in the first standby node is inconsistent with the data stored in the master node, the first standby node has lost part of the data it should have stored. The master node then needs to compare the difference between the data stored in the first standby node and the data stored in the master node, and send the differing part to the first standby node as the data lost by the first standby node.
Step 502, after the first standby node receives the lost data, the master node determines whether the data stored in the first standby node is consistent with the data stored in the master node.
It can be understood that, when the first standby node receives the lost data, it stores the lost data accordingly. At this time, in order to ensure that, after the first standby node synchronizes the lost data, the data it stores is completely consistent with the data stored by the master node and it is a reliable data source, the master node needs to compare whether the data stored in the first standby node is consistent with the data stored in the master node. If they are consistent, step 503 is executed; otherwise, data synchronization between the first standby node and the master node continues through step 501 and step 502.
Step 503, when the data stored in the first standby node is consistent with the data stored in the master node, the master node modifies the state of the first standby node to a normal state, where the normal state is used to indicate that the data stored in the first standby node is consistent with the data stored in the master node, and the master node needs to determine whether the first standby node is successfully written when executing the data writing request.
It can be understood that the normal state of the first standby node not only means that the performance of the first standby node is restored above the set performance threshold, but also requires that the data stored in the first standby node is consistent with the data stored in the main node, that is, the first standby node and the main node achieve complete data synchronization.
In a specific implementation, when the state of the first standby node is the normal state, not only does the master node need to consider whether the first standby node writes successfully for each write data request, but also, even if the master node and/or the second standby node goes down, the data in the first standby node can be relied upon.
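The three node states and the synchronization loop of fig. 5 can be sketched as follows; the names are hypothetical, node logs are modeled as in-memory lists, and a real system would exchange the missing data over the network.

```python
from enum import Enum

class StandbyState(Enum):
    NORMAL = "normal"                        # data consistent, feedback awaited
    TEMPORARY_FAILURE = "temporary failure"  # feedback ignored
    INCOMPLETE_NORMAL = "incomplete normal"  # performance back, data may differ

def synchronize(master_log: list[str], standby_log: list[str],
                state: StandbyState) -> StandbyState:
    # Steps 501-503: while the standby node is in the incomplete normal
    # state, the master sends the entries the standby is missing, then
    # re-checks consistency; once the logs match, the standby node is
    # marked as normal again.
    while state is StandbyState.INCOMPLETE_NORMAL:
        missing = [e for e in master_log if e not in standby_log]  # step 501
        standby_log.extend(missing)
        if standby_log == master_log:                              # step 502
            state = StandbyState.NORMAL                            # step 503
    return state

master_log = ["log 1", "log 2", "log 3"]
standby_log = ["log 1"]  # lost log 2 and log 3 while temporarily failed
print(synchronize(master_log, standby_log, StandbyState.INCOMPLETE_NORMAL))
```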
In addition, when the distributed system encounters a master node replacement (for example, the master node goes down), a new master node needs to be selected from the standby nodes corresponding to the master node. In one case, if the new master node determined by the metadata server is a first standby node in the temporary failure state, the metadata server may be informed that the first standby node is a temporarily failed node and that the master node promotion fails, prompting a new selection of the master node. In another case, if the new master node determined by the metadata server is a first standby node in the incomplete normal state, then, as one example, in consideration of data integrity, the metadata server may be informed that the first standby node is a node whose data may be incomplete and that the master node promotion fails, prompting a new selection of the master node; or, as another example, the data difference between the first standby node and the master node may be compared, and if the difference is within a certain threshold range, data synchronization between the first standby node and the master node may be performed in the manner shown in fig. 5, after which the metadata server is informed that the master node promotion succeeds. In yet another case, if the new master node determined by the metadata server is a first standby node or a second standby node in the normal state, the metadata server may be directly informed that the master node promotion succeeds.
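A sketch of these failover rules, using the same StandbyState enumeration as in the previous sketch and taking the stricter option for the incomplete normal case (reject for data integrity rather than synchronize first); the function name is an assumption.

```python
from enum import Enum

class StandbyState(Enum):
    NORMAL = "normal"
    TEMPORARY_FAILURE = "temporary failure"
    INCOMPLETE_NORMAL = "incomplete normal"

def try_promote(candidate: str, states: dict[str, StandbyState]) -> bool:
    # A temporarily failed candidate must not become master; an
    # incomplete normal candidate is rejected here because its data may
    # be incomplete (synchronizing first, as in fig. 5, is the other
    # option described above); a normal candidate is promoted directly.
    state = states[candidate]
    if state is StandbyState.TEMPORARY_FAILURE:
        return False  # inform the metadata server: promotion fails, reselect
    if state is StandbyState.INCOMPLETE_NORMAL:
        return False  # data may be incomplete: promotion fails, reselect
    return True       # normal state: promotion succeeds

states = {"B": StandbyState.TEMPORARY_FAILURE, "C": StandbyState.NORMAL}
print(try_promote("B", states), try_promote("C", states))  # False True
```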
It can be seen that, when data is requested to be written in the distributed system, the master node may detect whether the performance of each standby node meets the requirement, temporarily ignore the feedback of standby nodes with lower performance (for example, those writing data too slowly), and wait only for the successful write feedback of the standby nodes whose performance meets the requirement; that is, as soon as those standby nodes return response messages, the master node may feed back a successful write to the terminal. Compared with a data writing manner in which the master node waits for all standby nodes to feed back successful write response messages before informing the terminal, regardless of whether standby nodes with poor performance exist, this effectively avoids the problem that slow feedback from a low-performance standby node makes the whole distributed system slow and inefficient at feeding back a successful write, and therefore improves the data writing efficiency of the distributed system.
In order to make the method for writing data in a distributed system provided by the embodiment of the present application clearer and easier to understand, the embodiment of the present application shown in fig. 2 is described below with reference to fig. 6, taking an example of writing different logs in one data entry.
For example, after the master node A, the first standby node B, and the second standby node C for the data to be written are determined, in order to provide a basis for the write flow of the logs subsequently written into the data entry, in the embodiment of the present application a fast synchronization table (English: FAST SYNC Table) may further be generated, which contains the master node and the nodes whose performance is not lower than the set performance threshold. It can be understood that all the nodes whose feedback response messages the master node needs to wait for are recorded in the fast synchronization table, and the state of the nodes in the fast synchronization table can be considered to be the normal state; a node marked as temporarily failed is deleted from the fast synchronization table; a node marked as incomplete normal may be added back to the fast synchronization table in a modified form, for example, adding the deleted node B back to the fast synchronization table as B'; and after data synchronization is performed on a node in the incomplete normal state, its mark may be modified to the normal state, for example, B' is modified to B. It should be noted that the master node A writes data in the distributed system strictly according to the fast synchronization table.
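The fast synchronization table operations just described can be sketched as a small list of marked entries; mirroring the figures, a prime mark (e.g. B') denotes a node that has been added back in the incomplete normal state, and the data structure itself is an illustrative assumption.

```python
class FastSyncTable:
    # Nodes whose feedback the master must wait for; entries such as
    # "B'" mark nodes that are back but still in the incomplete normal
    # state.
    def __init__(self, nodes: list[str]):
        self.entries = list(nodes)

    def mark_temporary_failure(self, node: str) -> None:
        self.entries.remove(node)           # e.g. delete "B"

    def mark_incomplete_normal(self, node: str) -> None:
        self.entries.append(node + "'")     # e.g. add "B'" back

    def mark_normal(self, node: str) -> None:
        i = self.entries.index(node + "'")  # e.g. replace "B'" with "B"
        self.entries[i] = node

table = FastSyncTable(["A", "B", "C"])
table.mark_temporary_failure("B")  # table: A, C      (steps 601-602 below)
table.mark_incomplete_normal("B")  # table: A, C, B'  (step 603 below)
table.mark_normal("B")             # table: A, C, B   (step 604 below)
print(table.entries)
```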
Fig. 6 is a schematic flow chart of the present embodiment. The embodiment may specifically include:
Step 601, the master node A issues a fast synchronization table comprising A, B, and C, and log 4 is written.
As shown in fig. 7, which presents a schematic diagram of the distributed system when no abnormality is detected, the master node A, the first standby node B, and the second standby node C each store log 1, log 2, log 3, and log 4. Further, the fast synchronization table includes A, B, and C, so the master node A must receive both a first response message indicating that the first standby node B wrote successfully and a second response message indicating that the second standby node C wrote successfully before it can regard the write as complete.
Step 602, when the master node A writes log 5 into the first standby node B and the second standby node C, it detects that the performance of the first standby node B is lower than the set performance threshold, and deletes B from the fast synchronization table.
In a specific implementation, when the master node A receives a write data request including log 5 and detects that the performance of the first standby node B is lower than the set performance threshold, then, as shown in fig. 8, which presents a schematic diagram of the distributed system when an anomaly is detected, the first standby node B is marked as being in the temporary failure state and B is deleted from the fast synchronization table. The fast synchronization table now includes A and C, so the master node A needs to receive only the second response message indicating that the second standby node C wrote successfully before the write can be regarded as complete.
Step 603, after the master node A writes logs 6 and 7 into the first standby node B and the second standby node C, and when writing log 8 into the first standby node B and the second standby node C it detects that the performance of the first standby node B reaches the set performance threshold, B is marked as being in the incomplete normal state as B', and B' is added back to the fast synchronization table.
In a specific implementation, when the master node A receives the write data request including log 8 after the writes of logs 6 and 7 have completed, and detects that the performance of the first standby node B reaches the set performance threshold, then, as shown in fig. 9, which presents a schematic diagram of the distributed system recovering its performance after the detected abnormality, the first standby node is marked as being in the incomplete normal state and B' rejoins the fast synchronization table. The fast synchronization table now includes A, C, and B', so the master node A must receive both a first response message indicating that the first standby node B wrote successfully and a second response message indicating that the second standby node C wrote successfully before it can regard the write as complete.
Step 604, the master node A marks the first standby node B as being in the normal state after completing the data synchronization of the first standby node B.
It should be noted that "B '" is added to the fast synchronization table so that the first standby node B is in an incomplete normal state at this time, that is, there may be data difference with the master node a, at this time, the master node a may send the data (for example, log 6 and log 7) lost by the first standby node B to the first standby node B by comparing the difference between the data stored by the first standby node B and the data stored by itself, complete the data synchronization between the first standby node B and the master node a, at this time, the first standby node B may be marked as a normal state, and "B'" in the fast synchronization table may be replaced by "B", then, as shown in fig. 10, a schematic diagram of the distributed system is presented, the fast synchronization table includes "A, C and" B ", so that the first response message indicating that the first standby node B successfully writes and the second response message indicating that the second standby node C successfully writes, can it be considered a write completion.
Therefore, according to the method for writing data in a distributed system provided by the embodiment of the application, by detecting whether the performance of each standby node is lower than the set performance threshold, the master node can temporarily ignore the feedback of standby nodes with lower performance (for example, those writing data too slowly) and wait only for the feedback of the standby nodes meeting the requirement; moreover, by continuing to detect the performance of a temporarily ignored standby node, the feedback of a standby node whose performance has recovered is taken into consideration again, which improves the reliability of data storage to the maximum extent; and by establishing the fast synchronization table, the master node is simply and conveniently provided with the list of standby nodes whose feedback needs to be considered, indicating the condition under which the master node feeds back a successful write to the terminal. Thus, the method and the device for writing data in a distributed system can effectively solve the problem that slow feedback from a low-performance standby node makes the feedback of a successful write by the whole distributed system slow and inefficient, in particular the problem of node performance degradation caused by short-term network jitter, and improve the data writing efficiency of the distributed system.
Correspondingly, the embodiment of the application also provides a device for writing data in the distributed system. Fig. 11 is a schematic structural diagram illustrating an apparatus for writing data in a distributed system in an embodiment of the present application, where the apparatus 1100 may specifically include:
A receiving unit 1101, configured for the master node to receive a write data request, where the write data request includes data to be written.
A writing unit 1102, configured for the master node to write the data to be written into the first standby node and the second standby node, respectively.
A first determining unit 1103, configured to, when it is determined that the performance of the first standby node is lower than a set performance threshold, ignore whether the first standby node sends a first response message, indicating that the data to be written is successfully written, to the master node.
A feedback unit 1104, configured for the master node to feed back a successful write to the terminal when receiving a second response message, sent by the second standby node, indicating that the data to be written is successfully written.
In a specific implementation, the receiving unit 1101 may be specifically configured to execute the method in step 201; refer to the description of step 201 in the method embodiment shown in fig. 2. The writing unit 1102 may be specifically configured to execute the method in step 202; refer to the description of step 202 in the method embodiment shown in fig. 2. The first determining unit 1103 may be specifically configured to execute the method in step 203; refer to the description of step 203 in the method embodiment shown in fig. 2. The feedback unit 1104 may be specifically configured to execute the method in step 204; refer to the description of step 204 in the method embodiment shown in fig. 2, which is not repeated here.
Optionally, the write data request further includes a logical address of the data to be written; then, the apparatus 1100 further comprises:
a second determining unit, configured to determine, by the master node, a partition corresponding to the data to be written according to the logical address; a third determining unit, configured to determine, by the master node, the first standby node, and the second standby node according to the partition and through a correspondence between the partition and the master node, where the correspondence is stored in the metadata server.
In a specific implementation, the second determining unit and the third determining unit may be specifically configured to execute the methods insteps 301 to 302, which please refer to the related description of the method embodiment shown in fig. 3, and are not described herein again.
Optionally, the apparatus 1100 further comprises:
a first recording unit, configured to record, by the master node, a state of the first standby node as a temporary failure state, where the temporary failure state is used to indicate that the master node ignores whether writing of the first standby node is successful when executing the write data request.
In a specific implementation, for the first recording unit, refer to the description related to the state marking part of step 203 in the method embodiment shown in fig. 2, which is not repeated here.
Optionally, the apparatus 1100 further comprises:
a second recording unit, configured to record, after the master node determines that the performance of the first standby node is lower than a threshold, the state of the first standby node as an incomplete normal state when the master node detects that the performance of the first standby node reaches the set performance threshold, where the incomplete normal state is used to indicate that data stored by the first standby node is inconsistent with data stored by the master node, and the master node needs to determine whether the first standby node is successfully written in when executing a write data request.
In a specific implementation, for the second recording unit, refer to the description of the performance recovery part of the first standby node after step 204 in the method embodiment shown in fig. 2, which is not repeated here.
Optionally, the apparatus 1100 further comprises:
a sending unit, configured to send, by the master node, data lost by the first standby node to the first standby node when the state of the first standby node is in an incomplete normal state;
a determining unit, configured to, after the first standby node receives the lost data, the master node determine whether data stored in the first standby node is consistent with data stored in the master node;
a state modifying unit, configured to, when the data stored in the first standby node is consistent with the data stored in the master node, modify, by the master node, the state of the first standby node into a normal state, where the normal state is used to indicate that the data stored in the first standby node is consistent with the data stored in the master node, and the master node needs to determine whether the first standby node is successfully written in when executing a write data request.
In a specific implementation, the sending unit, the determining unit, and the state modifying unit may specifically refer to the description related to the data synchronization part of the first standby node and the master node in the method embodiment shown in fig. 5, and are not described herein again.
Thus, with the apparatus for writing data in a distributed system provided in this embodiment of the present application, when data is requested to be written in the distributed system, the master node may detect whether the performance of each standby node meets the requirement, temporarily ignore the feedback of standby nodes whose performance does not (for example, those writing data too slowly), and wait only for the successful write feedback of the standby nodes whose performance meets the requirement; that is, as soon as those standby nodes return response messages, the master node may feed back a successful write to the terminal. This avoids the problem of slow, inefficient feedback of a successful write, and therefore improves the data writing efficiency of the distributed system.
In addition, the embodiment of the application also provides a device for writing data in the distributed system. Fig. 12 is a schematic structural diagram illustrating a device for writing data in a distributed system according to an embodiment of the present application, where the device 1200 includes: at least one processor 1201 connected to a memory 1202, where the memory 1202 is used for storing program code, and the processor 1201 is used for calling the program code in the memory to execute the method for writing data in the distributed system provided in fig. 2.
In a specific implementation, the processor 1201 is configured to implement the following operations:
the master node receives a write data request, where the write data request includes data to be written;
the master node writes the data to be written into a first standby node and a second standby node respectively;
when the master node determines that the performance of the first standby node is lower than a set performance threshold, ignoring whether the first standby node sends a first response message that the data to be written is successfully written to the master node;
and the master node feeds back a successful write to the terminal when receiving a second response message, sent by the second standby node, indicating that the data to be written is successfully written.
Optionally, the write data request further includes a logical address of the data to be written;
the processor 1201 is further configured to implement the following operations:
the master node determines the partition corresponding to the data to be written according to the logical address;
and the master node determines the master node, the first standby node, and the second standby node according to the partition and the correspondence, stored in the metadata server, between the partition and the master node.
Optionally, the processor 1201 is further configured to implement the following operations:
and the master node records the state of the first standby node as a temporary failure state, where the temporary failure state is used for indicating that the master node ignores whether the first standby node writes successfully when executing the write data request.
Optionally, after the master node determines that the performance of the first standby node is lower than a threshold, the processor 1201 is further configured to:
when detecting that the performance of the first standby node reaches the set performance threshold, the master node records the state of the first standby node as an incomplete normal state, where the incomplete normal state is used to indicate that data stored by the first standby node is inconsistent with data stored by the master node, and the master node needs to determine whether the first standby node is successfully written in when executing a data writing request.
Optionally, the processor 1201 is further configured to implement the following operations:
when the first standby node is in the incomplete normal state, the master node sends the data lost by the first standby node to the first standby node;
after the first standby node receives the lost data, the master node judges whether the data stored in the first standby node is consistent with the data stored in the master node;
when the data stored in the first standby node is consistent with the data stored in the master node, the master node modifies the state of the first standby node into the normal state, where the normal state is used for indicating that the data stored in the first standby node is consistent with the data stored in the master node, and the master node needs to determine whether the first standby node writes successfully when executing a write data request.
In addition, the embodiment of the application also provides a system for writing data in a distributed system. Fig. 13 is a schematic structural diagram illustrating a system for writing data in a distributed system according to an embodiment of the present application, where the system 1300 includes: a master node 1301, a first standby node 1302, and a second standby node 1303; the master node 1301 is configured to execute the method for writing data in the distributed system provided in fig. 2, which is not repeated here.
Furthermore, an embodiment of the present application also provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform one or more steps of any one of the methods described above. The respective constituent modules of the above-described apparatus for writing data in a distributed system may be stored in the computer-readable storage medium if they are implemented in the form of software functional units and sold or used as independent products.
Based on such understanding, the embodiments of the present application also provide a computer program product containing instructions. The part of the technical solution that substantially contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, and the computer program product contains instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor therein to execute all or part of the steps of the methods described in the embodiments of the present application.
The "first" in names such as "first standby node" and "first response message" in the embodiments of the present application is used only for identification and does not denote order; the same applies to "second" and so on.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be implemented by software plus a general hardware platform. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as a read-only memory (ROM)/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a router) to execute the method according to the embodiments or some parts of the embodiments of the present application.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, device, and system embodiments, which are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some descriptions of the method embodiments for related points. The above-described embodiments of the apparatus, device and system are merely illustrative, wherein modules described as separate parts may or may not be physically separate, and parts shown as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only an exemplary embodiment of the present application, and is not intended to limit the scope of the present application.