Summary of the invention
In view of this, the present application provides a data recovery method and apparatus, to prevent data recovery from increasing the load on OSD devices and thereby affecting the Ceph cluster's processing of client services.
Specifically, the present application is implemented through the following technical solutions:
According to a first aspect of the present application, a data recovery method is provided. The method is applied to a monitor in a Ceph cluster of a distributed storage system. When the topology of the object storage devices (OSDs) of the Ceph cluster changes, the method comprises:
Determining a target placement group (PG) whose data is to be recovered;
Detecting whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to a preset minimum number of copies, and detecting the load state of the Ceph cluster;
If the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and the Ceph cluster is in a busy state, delaying recovery of the data to be recovered in the target PG.
Optionally, the method also includes:
If the number of normal OSD copies corresponding to the target PG is less than the minimum number of copies, or the Ceph cluster is in a non-busy state, recovering the data to be recovered in the target PG.
Optionally, detecting the load state of the Ceph cluster comprises:
Detecting whether the current value of a cluster load parameter, which reflects the current load of the Ceph cluster, is greater than a first preset value;
If so, determining that the Ceph cluster is in the busy state;
If not, further detecting the current value of a node load parameter that reflects the current load of each OSD in the Ceph cluster; if the Ceph cluster contains an OSD whose node load parameter has a current value greater than a second preset value, determining that the Ceph cluster is in the busy state; if the current values of the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, determining that the Ceph cluster is in the non-busy state.
Optionally, after determining the target PG whose data is to be recovered, the method comprises:
Starting a preset timer;
Detecting whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum number of copies, and detecting the load state of the Ceph cluster, comprises:
Detecting whether the timer has expired;
If the timer has expired, detecting whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum number of copies, and detecting the load state of the Ceph cluster;
Delaying recovery of the data in the target PG, if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and the Ceph cluster is in the busy state, comprises:
If the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and the Ceph cluster is in the busy state, returning to the step of detecting whether the timer has expired.
Optionally, recovering the data to be recovered in the target PG comprises:
Starting a preset second timer when recovery of the data to be recovered in the target PG begins;
When the second timer expires, detecting whether all data to be recovered in the target PG has been recovered;
If not, stopping recovery of the unrecovered data in the target PG, closing the second timer, and returning to the step of detecting whether the timer has expired.
Optionally, the cluster load parameter comprises: the ratio of the current number of business IOs of the Ceph cluster to the current number of all IOs of the Ceph cluster;
The node load parameter comprises: hard disk utilization and IOPS (the number of read/write operations per second).
Optionally, before determining the target PG whose data is to be recovered, the method further comprises:
Calculating the OSD group corresponding to each PG in the Ceph cluster;
Determining the target PG whose data is to be recovered comprises:
For each PG, if the calculated OSD group corresponding to the PG is inconsistent with the OSD group currently corresponding to the PG, determining that the PG is a target PG whose data is to be recovered.
According to a second aspect of the present application, a data recovery apparatus is provided. The apparatus is applied to a monitor in a Ceph cluster of a distributed storage system. When the topology of the object storage devices (OSDs) of the Ceph cluster changes, the apparatus comprises:
A determining unit, configured to determine a target placement group (PG) whose data is to be recovered;
A detection unit, configured to detect whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to a preset minimum number of copies, and to detect the load state of the Ceph cluster;
A delay unit, configured to delay recovery of the data to be recovered in the target PG if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and the Ceph cluster is in a busy state.
Optionally, described device further include:
A recovery unit, configured to recover the data to be recovered in the target PG if the number of normal OSD copies corresponding to the target PG is less than the minimum number of copies, or the Ceph cluster is in a non-busy state.
Optionally, the detection unit is configured to detect whether the current value of a cluster load parameter, which reflects the current load of the Ceph cluster, is greater than a first preset value; if so, determine that the Ceph cluster is in the busy state; if not, further detect the current value of a node load parameter that reflects the current load of each OSD in the Ceph cluster; if the Ceph cluster contains an OSD whose node load parameter has a current value greater than a second preset value, determine that the Ceph cluster is in the busy state; if the current values of the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, determine that the Ceph cluster is in the non-busy state.
Optionally, described device further include:
A starting unit, configured to start a preset timer;
The detection unit is specifically configured to detect whether the timer has expired; and, if the timer has expired, to detect whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum number of copies, and to detect the load state of the Ceph cluster;
The delay unit is specifically configured to return to the step of detecting whether the timer has expired if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and the Ceph cluster is in the busy state.
Optionally, when recovering the data to be recovered in the target PG, the recovery unit is specifically configured to start a preset second timer when recovery of the data to be recovered in the target PG begins; when the second timer expires, detect whether all data to be recovered in the target PG has been recovered; and if not, stop recovery of the unrecovered data in the target PG, close the second timer, and return to the step of detecting whether the timer has expired.
Optionally, the cluster load parameter comprises: the ratio of the current number of business IOs of the Ceph cluster to the current number of all IOs of the Ceph cluster;
The node load parameter comprises: hard disk utilization and IOPS (the number of read/write operations per second).
Optionally, described device further include:
A computing unit, configured to calculate the OSD group corresponding to each PG in the Ceph cluster;
The determining unit is specifically configured to, for each PG, determine that the PG is a target PG whose data is to be recovered if the calculated OSD group corresponding to the PG is inconsistent with the OSD group currently corresponding to the PG.
As can be seen from the above description, on the one hand, when the OSD topology in the Ceph cluster changes, the present application does not immediately recover the data to be recovered in the target PG. Instead, it first determines the current load state of the Ceph cluster and, when the Ceph cluster is in the busy state, delays recovery of the data in the target PG. This effectively prevents data recovery from increasing the load on OSD devices and thereby affecting the Ceph cluster's processing of client services.
On the other hand, the present application also compares the number of normal OSD copies corresponding to the target PG with the preset minimum number of copies. This guarantees that, even while recovery of the data in the target PG is delayed, enough normal OSD copies remain to process read and write services directed at the target PG.
Specific embodiment
Example embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings indicate the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present application as detailed in the appended claims.
The terms used in the present application are for the purpose of describing particular embodiments only and are not intended to limit the present application. The singular forms "a", "said", and "the" used in the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in the present application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present application, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
Referring to Fig. 1, Fig. 1 is a networking architecture diagram based on Ceph storage according to an exemplary embodiment of the present application.
The networking includes a Ceph cluster and a client.
The client, also referred to as the client of the Ceph cluster, is mainly used to interact with the Ceph cluster so that the Ceph cluster can process the client's read and write services.
Specifically, the Ceph cluster receives the business IOs sent by the client (for example, read IOs and write IOs) and executes the client's read and write services based on those business IOs.
For example, when the Ceph cluster receives a write IO sent by the client, the Ceph cluster writes the data carried by the write IO to local storage. When the Ceph cluster receives a read IO sent by the client, the Ceph cluster reads data according to the read IO and returns the data it read to the client.
The Ceph cluster may include a monitor and multiple OSDs. The Ceph cluster may of course also include other devices such as a metadata server. The Ceph cluster is described here only by way of example, and the devices it includes are not specifically limited.
The monitor is mainly used to manage the devices in the Ceph cluster. The monitor may be a single physical device or a cluster made up of multiple physical devices. This is only an example and is not specifically limited here.
The OSD is mainly responsible for storing data, for example writing data after receiving a write IO request and reading data after receiving a read IO request. An OSD is usually a hard disk on a physical server in the Ceph cluster. The function and device form of the OSD are described here only by way of example and are not specifically limited.
Several concepts involved in a Ceph cluster are introduced below.
1) PG
A PG, also called a placement group, is the smallest unit of data recovery and change in a Ceph cluster. A PG is a logical concept, equivalent to a logical collection containing a set of data. The data contained in a PG is stored in the OSD group corresponding to the PG.
The OSD group corresponding to a PG includes multiple OSDs. The data in the PG may be replicated into multiple copies, which are stored separately on each OSD in the OSD group corresponding to the PG.
For example, if the OSD group corresponding to PG1 is [1,2,3], the OSD group corresponding to PG1 includes 3 OSDs, identified as OSD1, OSD2, and OSD3.
When data directed at PG1 is written, the data may be written to OSD1, and OSD1 synchronizes it to OSD2 and OSD3, so that each OSD holds one copy of the data in PG1.
2) OSD copies corresponding to a PG
An OSD in the OSD group corresponding to a PG is called an OSD copy corresponding to the PG. The number of OSDs in the OSD group corresponding to a PG is called the number of OSD copies corresponding to the PG.
Taking the OSD group [1,2,3] corresponding to PG1 as an example again, OSD1, OSD2, and OSD3 are called the OSD copies corresponding to PG1. The number of OSD copies corresponding to PG1 is 3.
3) Business IO and recovery IO
The IOs present in a Ceph cluster may include business IOs and recovery IOs.
Business IO: a business IO is an IO from a client, mainly used to instruct the Ceph cluster to perform the client's read and write services.
Business IOs may include read IOs from clients, write IOs from clients, and so on.
For example, when the Ceph cluster receives a write IO sent by the client, the Ceph cluster writes the data carried by the write IO to local storage. When the Ceph cluster receives a read IO sent by the client, the Ceph cluster reads data according to the read IO and returns the data it read to the client.
Recovery IO: when the data in a PG is recovered, recovery IOs are generated in the Ceph cluster. Recovery IOs are mainly used to guide the recovery of PGs in the Ceph cluster.
4) Recovering the data in a PG
After the OSD topology in the Ceph cluster changes, the monitor calculates the OSD group corresponding to each PG in the cluster. Then, for each PG, if the OSD group currently corresponding to the PG differs from the calculated OSD group, the PG is determined to be a PG whose data is to be recovered.
Recovering the data in such a PG means recovering the data in the PG to the OSDs in the calculated OSD group corresponding to the PG.
The existing data recovery approach is as follows: when the monitor detects that an OSD in the Ceph cluster has failed, it immediately recovers the data in the PGs corresponding to the failed OSD to a normal OSD. During data recovery, recovery IOs that guide the recovery are generated, and each recovery IO carries the data that needs to be recovered. After receiving a recovery IO, the normal OSD writes the PG data carried in the recovery IO to local storage.
However, if the normal OSD is processing a large number of business IOs from clients at that moment, processing recovery IOs adds to its workload, degrades its performance, and seriously affects the processing of the business IOs sent by clients.
In view of this, the present application proposes a data recovery method. After the monitor determines the target PG to be recovered, if the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum number of copies and the Ceph cluster is currently in the busy state, recovery of the data in the target PG to the destination OSD is delayed.
On the one hand, when the OSD topology in the Ceph cluster changes, the present application does not immediately recover the data in the target PG to be recovered. Instead, it first determines the current busy/idle state of the Ceph cluster and, when the Ceph cluster is in the busy state, delays recovery of the data in the target PG. This effectively prevents data recovery from increasing the load on OSD devices and thereby affecting the Ceph cluster's processing of client services.
On the other hand, the present application also compares the number of normal OSD copies corresponding to the target PG with the preset minimum number of copies. This guarantees that, even while recovery of the data in the target PG is delayed, enough normal OSD copies remain to process read and write services directed at the target PG.
In summary, when performing data recovery, the present application delays recovery of the data in the target PG while the Ceph cluster is busy, which prevents data recovery from increasing the load on OSD devices and affecting the Ceph cluster's processing of client services. It also ensures that read and write services directed at the target PG are not affected while recovery of the data in the target PG is delayed.
Referring to Fig. 2, Fig. 2 is a flowchart of a data recovery method according to an exemplary embodiment of the present application. The method may be applied to the monitor of a Ceph cluster. When the OSD topology of the Ceph cluster changes, the following steps may be performed.
Step 201: Determine the target PG whose data is to be recovered.
In a Ceph cluster, the OSD topology changes when an OSD goes offline because of a network anomaly, when a new OSD is added to the cluster, or when an OSD fails.
When the monitor detects that the OSD topology in the Ceph cluster has changed, the monitor may determine the target PG to be recovered.
First, how the monitor detects whether the OSD topology in the Ceph cluster has changed is described.
In implementation, the monitor may detect whether the OSD topology in the Ceph cluster has changed by detecting whether the OSD cluster map has changed.
Specifically, the monitor stores an OSD cluster map, which records the current OSD topology of the Ceph cluster. If the monitor detects that the OSD cluster map has changed, the monitor determines that the OSD topology in the Ceph cluster has changed. If the OSD cluster map has not changed, the monitor determines that the OSD topology in the Ceph cluster has not changed.
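The map comparison described above can be sketched as follows. This is a minimal illustration only: the `OsdMap` type (a dictionary from OSD id to a state string) and the function name `osd_topology_changed` are hypothetical and are not part of Ceph or of the original text.

```python
from typing import Dict

# Hypothetical snapshot of the OSD cluster map: OSD id -> state string.
OsdMap = Dict[int, str]


def osd_topology_changed(previous: OsdMap, current: OsdMap) -> bool:
    """Return True if an OSD was added, removed, or changed state."""
    return previous != current


# OSD 1 fails and OSD 4 is added: the map differs, so the topology changed.
before = {1: "up", 2: "up", 3: "up"}
after = {1: "down", 2: "up", 3: "up", 4: "up"}
print(osd_topology_changed(before, after))   # True
print(osd_topology_changed(before, before))  # False
```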
Second, how the monitor determines the target PG to be recovered once the OSD topology in the Ceph cluster has changed is described.
In implementation, when the monitor determines that the OSD topology in the Ceph cluster has changed, the monitor may calculate the OSD group corresponding to each PG through the CRUSH (Controlled Replication Under Scalable Hashing) algorithm. The calculated OSD groups represent the mapping between each PG and its OSD group in the Ceph cluster after data recovery completes.
Then, for each PG, the monitor may obtain the calculated OSD group corresponding to the PG and the OSD group currently corresponding to the PG.
Each PG in the Ceph cluster is configured with two sets: an up set and an acting set. The up set records the OSD group currently corresponding to the PG. The acting set records the OSD group calculated for the PG by the CRUSH algorithm.
When obtaining these groups, the monitor may obtain the OSD group currently corresponding to the PG from the up set of the PG, and obtain the OSD group calculated for the PG by the CRUSH algorithm from the acting set of the PG.
After obtaining the calculated OSD group corresponding to the PG and the OSD group currently corresponding to the PG, the monitor may detect whether the two OSD groups are consistent.
If the calculated OSD group corresponding to the PG is inconsistent with the OSD group currently corresponding to the PG, the PG is determined to be a target PG whose data is to be recovered;
If the calculated OSD group corresponding to the PG is consistent with the OSD group currently corresponding to the PG, the PG is determined not to be a PG whose data is to be recovered.
For example, still taking PG1 as an example, assume that the up set of PG1 records the OSD group [1,2,3] and the acting set of PG1 records [4,2,3].
The monitor obtains the OSD group [1,2,3] currently corresponding to PG1 from the up set, and obtains the OSD group [4,2,3] calculated for PG1 by the CRUSH algorithm from the acting set. Since the OSD group [1,2,3] currently corresponding to PG1 is inconsistent with the OSD group [4,2,3] calculated for PG1 by the CRUSH algorithm, the monitor determines that PG1 is a target PG to be recovered.
It should be noted that the target PG described here may be one PG or multiple PGs. The number of target PGs is not specifically limited here.
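The selection of target PGs in step 201 can be sketched as below, under stated assumptions: the two dictionaries stand in for the per-PG current and calculated OSD groups, the function name `find_target_pgs` is hypothetical, and "inconsistent" is taken to mean that the two lists differ (including in order), which the original text does not spell out.

```python
from typing import Dict, List

OsdGroup = List[int]


def find_target_pgs(
    current_groups: Dict[str, OsdGroup],
    calculated_groups: Dict[str, OsdGroup],
) -> List[str]:
    """Return the PGs whose calculated OSD group differs from the current one."""
    targets = []
    for pg_id, calculated in calculated_groups.items():
        # Assumption: any difference between the two lists counts as inconsistent.
        if current_groups.get(pg_id) != calculated:
            targets.append(pg_id)
    return targets


current = {"PG1": [1, 2, 3], "PG2": [2, 3, 5]}
calculated = {"PG1": [4, 2, 3], "PG2": [2, 3, 5]}
print(find_target_pgs(current, calculated))  # ['PG1']
```

With the PG1 example from the text, only PG1 is selected, because PG2 maps to the same OSD group before and after the calculation.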
Step 202: Detect whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum number of copies, and detect the load state of the Ceph cluster.
Step 202 is described in detail below in two respects: its trigger mechanism and its specific implementation.
1. Trigger mechanism of step 202
After determining the target PG whose data is to be recovered, the monitor may start a preset timer (denoted here as the first timer).
The monitor may detect whether the first timer has expired.
If the first timer has not expired, the monitor continues to wait for the first timer to expire.
If the first timer has expired, step 202 is triggered. That is, if the first timer has expired, the monitor detects whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum number of copies, and detects the busy/idle state of the Ceph cluster.
2. Specific implementation of step 202:
1) The monitor may detect whether the number of normal OSD copies corresponding to the determined target PG is greater than or equal to the preset minimum number of copies.
In implementation, the monitor may first determine the number of normal OSD copies corresponding to the target PG.
Specifically, from the OSD group currently corresponding to the target PG and the calculated OSD group corresponding to the target PG, the monitor may select the OSDs that appear in both groups as the normal OSDs corresponding to the target PG, and then count the normal OSDs corresponding to the target PG.
For example, still taking PG1 as the target PG, assume that the OSD group currently corresponding to PG1 is [1,2,3] and the calculated OSD group corresponding to PG1 is [4,2,3].
The OSDs that appear in both OSD groups are OSD2 and OSD3, so OSD2 and OSD3 are the normal OSD copies corresponding to PG1. The number of normal OSD copies is 2.
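Counting the normal OSD copies as described above amounts to intersecting the two OSD groups. The following is a minimal sketch; the function name `normal_osd_copies` is hypothetical.

```python
from typing import List


def normal_osd_copies(current_group: List[int], calculated_group: List[int]) -> List[int]:
    """Return the OSDs that appear in both the current and the calculated group."""
    calculated = set(calculated_group)
    # Keep the order of the current group so the result is deterministic.
    return [osd for osd in current_group if osd in calculated]


normal = normal_osd_copies([1, 2, 3], [4, 2, 3])
print(normal)       # [2, 3]
print(len(normal))  # 2
```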
Then, the monitor may detect whether the number of normal OSD copies corresponding to the determined target PG is greater than or equal to the preset minimum number of copies.
It should be noted that the minimum number of copies is set by the user according to the smallest number of OSD copies that the user can tolerate.
For example, in an existing Ceph cluster, to guarantee data reliability, the user may set a number of copies (denoted N) for Ceph according to factors such as storage space and write latency. Usually N ≥ 3.
In the present application, in addition to the existing number of copies N, a minimum number of copies (denoted M) is also introduced. The user may set M according to the value of N.
For example, the user may set M within the interval [1, N], that is, 1 ≤ M ≤ N. The setting of the minimum number of copies is described here only by way of example and is not specifically limited.
The purpose of setting the minimum number of copies, and of detecting whether the number of normal OSD copies corresponding to the target PG is greater than or equal to it, is to guarantee that, even while recovery of the data in the target PG is delayed, enough normal OSD copies remain to process read and write services directed at the target PG.
Specifically, assume for example that OSD1 fails, the OSD group currently corresponding to the target PG is [1,2,3], and the calculated OSD group corresponding to the target PG is [4,2,3]. The data in the target PG then needs to be recovered to OSD4.
While the data in the target PG is being recovered to OSD4, the Ceph cluster may still receive business IOs from clients directed at the target PG. Because OSD1 has failed and data recovery is not complete, these business IOs are processed by the remaining OSD copies (that is, OSD2 or OSD3).
For example, if the business IO is a read IO, then because OSD1 has failed and the data on OSD1 has not yet been recovered to OSD4, the data must be read from an OSD copy corresponding to the target PG, for example from OSD2 or OSD3.
If all OSD copies corresponding to the target PG are abnormal, or the number of OSD copies corresponding to the target PG is less than the threshold preset by the user, the business IO cannot be processed.
For these reasons, the present application detects whether the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum number of copies. This guarantees that, even while recovery of the data in the target PG is delayed, enough normal OSD copies remain to process read and write services directed at the target PG.
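The constraint 1 ≤ M ≤ N and the comparison against M can be sketched as follows. Both function names are hypothetical, and raising `ValueError` for an out-of-range M is an assumption made for illustration; the original text only states the allowed interval.

```python
def validate_min_copies(min_copies: int, copies: int) -> None:
    """Check that the minimum number of copies M satisfies 1 <= M <= N."""
    if not 1 <= min_copies <= copies:
        # Assumption: an out-of-range setting is rejected outright.
        raise ValueError(f"min_copies must be in [1, {copies}], got {min_copies}")


def has_enough_normal_copies(normal_count: int, min_copies: int) -> bool:
    """Return True if delaying recovery still leaves enough normal OSD copies."""
    return normal_count >= min_copies


validate_min_copies(min_copies=2, copies=3)   # accepted: 1 <= 2 <= 3
print(has_enough_normal_copies(2, 2))         # True: delay is permissible
print(has_enough_normal_copies(1, 2))         # False: recover without delay
```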
2) Detecting the busy/idle state of the Ceph cluster
Mode one:
Step 2021: The monitor may detect whether the current value of the cluster load parameter of the Ceph cluster is greater than the first preset value.
The cluster load parameter reflects the load of the cluster as a whole. For example, the cluster load parameter may be the ratio of the current number of business IOs of the Ceph cluster to the current number of all IOs of the Ceph cluster.
The first preset value may be set by the user according to the actual situation and is not specifically limited here.
In implementation, the monitor may count the current number A of all IOs in the Ceph cluster and the current number B of business IOs.
Then, the monitor may calculate the ratio of B to A to obtain C, where C = B/A. C is the current value of the cluster load parameter of the Ceph cluster.
Step 2022: If the current value of the cluster load parameter of the Ceph cluster is greater than the first preset value, determine that the Ceph cluster is in the busy state.
Step 2023: If the current value of the cluster load parameter of the Ceph cluster is less than or equal to the first preset value, further detect the node load parameter of each OSD in the Ceph cluster.
The node load parameter of each OSD characterizes the load of that OSD. The larger the current value of an OSD's node load parameter, the more IO the OSD carries and the busier the OSD is.
The node load parameter may include the hard disk utilization of the OSD or the IOPS (Input/Output Operations Per Second, the number of read/write operations per second) of the OSD. When the node load parameter is the hard disk utilization of the OSD, the second preset value is a preset value related to hard disk utilization. When the node load parameter is the IOPS of the OSD, the second preset value is a preset value related to IOPS.
Step 2024: If the Ceph cluster contains an OSD whose node load parameter has a current value greater than the second preset value, the monitor may determine that the Ceph cluster is in the busy state.
Step 2025: If the current values of the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, the monitor may determine that the Ceph cluster is in the non-busy state.
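Steps 2021 to 2025 can be sketched as a two-stage check. This is an illustration under assumptions: the function names are hypothetical, node loads are obtained through a caller-supplied `fetch_node_loads` callable so that they are only collected when the cluster-level check does not already decide the result, and treating a cluster with no IO at all as having a ratio of 0 is an added assumption not stated in the original text.

```python
from typing import Callable, Iterable


def cluster_load_ratio(business_io: int, total_io: int) -> float:
    """Compute C = B / A, the share of business IO among all IO."""
    if total_io == 0:
        return 0.0  # Assumption: no IO at all is treated as zero load.
    return business_io / total_io


def is_cluster_busy(
    business_io: int,
    total_io: int,
    fetch_node_loads: Callable[[], Iterable[float]],
    first_preset: float,
    second_preset: float,
) -> bool:
    """Mode one: check the cluster load first, then the per-OSD loads."""
    if cluster_load_ratio(business_io, total_io) > first_preset:
        return True  # Step 2022: busy, no need to query the OSDs.
    # Steps 2023 to 2025: busy if any OSD exceeds the second preset value.
    return any(load > second_preset for load in fetch_node_loads())


# Ratio 0.8 exceeds 0.7, so the per-OSD loads are never fetched.
print(is_cluster_busy(80, 100, lambda: [0.1], 0.7, 0.9))        # True
# Ratio 0.5 is not above 0.7, but one OSD is at 0.95 > 0.9.
print(is_cluster_busy(50, 100, lambda: [0.5, 0.95], 0.7, 0.9))  # True
# Ratio and all OSD loads are within their preset values.
print(is_cluster_busy(50, 100, lambda: [0.5, 0.9], 0.7, 0.9))   # False
```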
Mode two:
The monitor detects the node load parameter of each OSD in the Ceph cluster. If the Ceph cluster contains an OSD whose node load parameter has a current value greater than the second preset value, the monitor may determine that the Ceph cluster is in the busy state. If the current values of the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, the monitor may determine that the Ceph cluster is in the non-busy state.
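Mode two reduces to a single pass over the per-OSD loads. A minimal sketch follows, assuming the loads have already been collected into a dictionary keyed by OSD id; the function name `is_cluster_busy_by_nodes` is hypothetical.

```python
from typing import Dict


def is_cluster_busy_by_nodes(node_loads: Dict[int, float], second_preset: float) -> bool:
    """Mode two: busy if any OSD's load exceeds the second preset value."""
    return any(load > second_preset for load in node_loads.values())


print(is_cluster_busy_by_nodes({1: 0.40, 2: 0.95, 3: 0.30}, 0.9))  # True
print(is_cluster_busy_by_nodes({1: 0.40, 2: 0.90, 3: 0.30}, 0.9))  # False
```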
The advantages of mode one are as follows:
On the one hand, the present application determines the busy/idle state of the Ceph cluster according to both the cluster load parameter of the Ceph cluster and the node load parameter of each OSD in the Ceph cluster. The state of the Ceph cluster is thus reflected from two perspectives, the cluster as a whole and each individual node, which gives a more complete picture of the cluster's busy/idle state.
On the other hand, compared with mode two, an existing Ceph cluster already detects and records its overall load parameter, so the monitor can read the cluster load parameter of the Ceph cluster directly. To obtain the node load parameter of each OSD, however, the monitor must query each OSD node. Reading the cluster load parameter of the Ceph cluster is therefore more convenient than obtaining the node load parameter of each OSD.
Therefore, with mode one, once the cluster load parameter of the Ceph cluster is found to be greater than the first preset value, the Ceph cluster can be determined to be in the busy state without obtaining the node load parameter of each OSD, which greatly reduces the time needed to determine the busy/idle state of the Ceph cluster.
It should be noted that the present application does not specifically limit the order in which "detecting whether the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum number of copies" and "detecting the busy/idle state of the Ceph cluster" are performed.
Step 203: if the quantity of the corresponding normal OSD copy of the target PG be more than or equal to preset minimum number of copies,And busy condition is in the Ceph cluster, then delay restores the data in the target PG.
In the embodiment of the present application, if monitor will not be immediately by target PG after currently Ceph cluster is in busy conditionMiddle data restore OSD corresponding to calculated target PG, but periodically detect the corresponding normal OSD pair of the target PGWhether this quantity is more than or equal to preset minimum number of copies and the detection Ceph cluster busy-idle condition, until target PGCorresponding counter part number is more than or equal to minimum number of copies, and when the state of current Ceph cluster be in non-busy condition, ability is rightData in target PG are restored.
When realizing, if the quantity of the corresponding normal OSD copy of the target PG be more than or equal to preset minimum number of copies,And busy condition is in the Ceph cluster, then the step of " detection first timer whether time-out " in return step 202,If first timer is overtime, step 203 is continued to execute, if first timer has not timed out, waits the first timer overtime.Until the quantity of the corresponding normal OSD copy of the target PG is less than the minimum number of copies or the Ceph cluster is inNon- busy condition starts to restore the data in the target PG.
For example, if the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum number of copies and the Ceph cluster is in the busy state, the monitor checks whether the first timer has timed out.
If the first timer has not timed out, the monitor waits for it to time out. If the first timer has timed out, the monitor detects whether the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum number of copies, and detects the current state of the Ceph cluster. If the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum number of copies and the Ceph cluster is currently in the busy state, the monitor checks again whether the first timer has timed out.
If the first timer has not timed out, the monitor waits for it to time out. If the first timer has timed out, the monitor again detects whether the number of normal OSD copies corresponding to the target PG is greater than or equal to the preset minimum number of copies, and detects the current state of the Ceph cluster, and so on. Once the number of normal copies corresponding to the target PG is less than the minimum number of copies, or the Ceph cluster is detected to be in the non-busy state, restoration of the data in the target PG starts.
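The repeated check driven by the first timer can be sketched as a simple polling loop. This is only an illustration under stated assumptions: `probe` is a hypothetical callback returning the current number of normal copies and the busy flag, and the injectable `sleep` stands in for waiting on the first timer.

```python
import time
from typing import Callable, Tuple


def should_delay(normal_copies: int, min_copies: int, cluster_busy: bool) -> bool:
    """Step 203 condition: delay only if enough copies remain AND the cluster is busy."""
    return normal_copies >= min_copies and cluster_busy


def wait_until_recoverable(
    probe: Callable[[], Tuple[int, bool]],
    min_copies: int,
    first_timer_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-check after every first-timer expiry; return how many checks were made."""
    checks = 0
    while True:
        sleep(first_timer_s)  # wait for the first timer to time out
        normal_copies, busy = probe()  # hypothetical snapshot of the target PG and cluster
        checks += 1
        if not should_delay(normal_copies, min_copies, busy):
            return checks  # restoration of the target PG may start now
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a real monitor would presumably use its own timer facility.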
Step 204: if the number of normal OSD copies corresponding to the target PG is less than the minimum number of copies, or the Ceph cluster is in the non-busy state, restore the data to be restored in the target PG.
In implementation, a preset second timer is started when restoration of the data to be restored in the target PG begins.
When the second timer times out, the monitor detects whether all data to be restored in the target PG has been restored.
If all data to be restored in the target PG has been restored, the data recovery process ends.
If not all data to be restored in the target PG has been restored, the monitor stops restoring the unrestored data in the target PG, stops the second timer, and returns to the step of "detecting whether the first timer has timed out" in step 202.
If the first timer has timed out, the monitor detects whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum number of copies and detects the load state of the Ceph cluster. If the number of normal OSD copies corresponding to the target PG is less than the minimum number of copies, or the Ceph cluster is in the non-busy state, the monitor starts restoring the unrestored data in the target PG and starts the second timer.
When the second timer times out, the monitor detects whether all data to be restored in the target PG has been restored. If so, the process ends. If not, the monitor stops restoring the currently unrestored data in the target PG, stops the second timer, and returns to the step of "detecting whether the first timer has timed out". This repeats until all data to be restored in the target PG has been restored.
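The effect of the second timer, which bounds each run of restoration before the conditions are re-checked, can be sketched as below. The sketch is a simplification under explicit assumptions: `slice_budget` (the number of items restored before the second timer times out) and `tick_states` (one flag per first-timer expiry saying whether restoration is allowed) are hypothetical stand-ins for real timers and real detection.

```python
from collections import deque
from typing import Deque, Iterable, List


def recover_in_slices(
    pending: Deque[str],
    tick_states: Iterable[bool],
    slice_budget: int,
) -> List[int]:
    """Return the number of items restored at each first-timer expiry."""
    restored_per_tick: List[int] = []
    for recoverable in tick_states:  # one entry per first-timer expiry
        if not pending:
            break  # all data restored: the process ends
        if not recoverable:
            restored_per_tick.append(0)  # delayed: enough copies and cluster busy
            continue
        # Restore until the (modelled) second timer times out, then stop
        # and go back to waiting for the first timer.
        count = min(slice_budget, len(pending))
        for _ in range(count):
            pending.popleft()
        restored_per_tick.append(count)
    return restored_per_tick
```

Restoring in bounded slices means a cluster that becomes busy in the middle of a long restoration is noticed at the next check rather than only after the whole PG has been restored.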
As can be seen from the above description, in one aspect, when the OSD topology in the Ceph cluster changes, the application does not immediately restore the data in the target PG to be restored. Instead, it first determines the current busy/idle state of the Ceph cluster and, when the Ceph cluster is determined to be in the busy state, delays restoring the data in the target PG. This effectively prevents data restoration from increasing the load on the OSD machines and thereby affecting the Ceph cluster's processing of client services.
In another aspect, the application also compares the number of normal OSD copies corresponding to the target PG with the preset minimum number of copies. This ensures that, even if restoration of the data in the target PG is delayed, enough normal OSD copies remain to process read and write services directed at the target PG.
In a third aspect, the data recovery method provided by the application is compatible with existing Ceph clusters, for example with the CRUSH algorithm of the Ceph cluster, so the data recovery method provided by the application has good compatibility.
Referring to Fig. 3, Fig. 3 is a flowchart of another data recovery method shown in an exemplary embodiment of the application. The method can be applied to a monitor in a Ceph cluster.
Step 301: when the OSD topology in the Ceph cluster changes, the monitor determines the target PG whose data is to be restored.
Step 302: the monitor starts the first timer.
Step 303: the monitor detects whether the first timer has timed out.
If the first timer has not timed out, step 304 is executed.
If the first timer has timed out, step 305 is executed.
Step 304: the monitor waits for the first timer to time out.
Step 305: the monitor detects whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum number of copies.
If the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum number of copies, step 306 is executed.
If the number of normal OSD copies currently corresponding to the target PG is less than the preset minimum number of copies, step 308 is executed.
Step 306: the monitor detects whether the current value of the cluster load parameter of the Ceph cluster is greater than the first preset value.
If the current value of the cluster load parameter of the Ceph cluster is greater than the first preset value, the method returns to step 303.
If the current value of the cluster load parameter of the Ceph cluster is less than or equal to the first preset value, step 307 is executed.
Step 307: the monitor detects whether the current value of the node load parameter of each OSD in the Ceph cluster is greater than the second preset value.
If the current values of the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, step 308 is executed.
If there is an OSD in the Ceph cluster whose node load parameter has a current value greater than the second preset value, the method returns to step 303.
Step 308: the monitor starts restoring the data of the target PG and starts the second timer.
Step 309: after the second timer times out, the monitor detects whether all data to be restored in the target PG has been restored.
If all data to be restored in the target PG has been restored, step 311 is executed.
If there is unrestored data in the target PG, step 310 is executed.
Step 310: the monitor stops restoring the currently unrestored data in the target PG and stops the second timer.
After step 310 is executed, the method returns to step 303.
Step 311: the monitor ends the restoration of the data in the target PG.
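The branching among steps 303 to 311 can be summarized as a transition function. The sketch below is illustrative only: the `Snapshot` class, its field names, and its default values are hypothetical, and the transition from step 304 to step 305 is an assumption, since the description only says that the monitor waits for the first timer in step 304.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical view of what the monitor observes at one moment."""
    first_timer_expired: bool = True
    normal_copies: int = 3
    min_copies: int = 2
    cluster_load: float = 0.0
    first_preset: float = 0.7
    max_node_load: float = 0.0  # highest node load parameter among all OSDs
    second_preset: float = 0.85
    all_restored: bool = False


def next_step(step: int, s: Snapshot) -> int:
    """Return the step that follows `step` given the observed state."""
    if step == 303:
        return 305 if s.first_timer_expired else 304
    if step == 304:
        return 305  # assumption: once the wait ends, the copy check follows
    if step == 305:
        return 306 if s.normal_copies >= s.min_copies else 308
    if step == 306:
        return 303 if s.cluster_load > s.first_preset else 307
    if step == 307:
        return 303 if s.max_node_load > s.second_preset else 308
    if step == 308:
        return 309  # restoration started, second timer running
    if step == 309:
        return 311 if s.all_restored else 310
    if step == 310:
        return 303
    if step == 311:
        return 311  # terminal step
    raise ValueError(f"unknown step: {step}")
```

Note that step 305 jumps straight to step 308 when too few normal copies remain, so a PG at risk is restored regardless of cluster load.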
The embodiments of the application also provide a data recovery device corresponding to the above data recovery method.
Referring to Fig. 4, Fig. 4 is a block diagram of a data recovery device shown in an exemplary embodiment of the application. The data recovery device can be applied to a monitor and may include the following units.
A determination unit 401, configured to determine the target placement group (PG) whose data is to be restored;
A detection unit 402, configured to detect whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum number of copies, and to detect the load state of the Ceph cluster;
A delay unit 403, configured to delay restoring the data to be restored in the target PG if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and the load state of the Ceph cluster is the busy state.
Optionally, the device further includes:
A recovery unit 404, configured to restore the data to be restored in the target PG if the number of normal OSD copies corresponding to the target PG is less than the minimum number of copies, or the Ceph cluster is in the non-busy state.
Optionally, the detection unit 402 is configured to detect whether the current value of the cluster load parameter, which reflects the current load of the Ceph cluster, is greater than the first preset value; if so, determine that the Ceph cluster is in the busy state; if not, further detect the current value of the node load parameter, which reflects the current load of each OSD in the Ceph cluster; if there is an OSD in the Ceph cluster whose node load parameter has a current value greater than the second preset value, determine that the Ceph cluster is in the busy state; and if the current values of the node load parameters of all OSDs in the Ceph cluster are less than or equal to the second preset value, determine that the Ceph cluster is in the non-busy state.
Optionally, the device further includes:
A start unit 405, configured to start a preset timer;
The detection unit 402 is specifically configured to detect whether the timer has timed out and, if the timer has timed out, to detect whether the number of normal OSD copies currently corresponding to the target PG is greater than or equal to the preset minimum number of copies and to detect the load state of the Ceph cluster;
The delay unit 403 is specifically configured to return to the step of detecting whether the timer has timed out if the number of normal OSD copies corresponding to the target PG is greater than or equal to the minimum number of copies and the Ceph cluster is in the busy state.
Optionally, when restoring the data to be restored in the target PG, the recovery unit 404 is specifically configured to start a preset second timer when restoration of the data to be restored in the target PG begins; when the second timer times out, detect whether all data to be restored in the target PG has been restored; and, if not, stop restoring the unrestored data in the target PG, stop the second timer, and return to the step of detecting whether the timer has timed out.
Optionally, the cluster load parameter includes: the ratio of the current number of service IOs of the Ceph cluster to the current number of all IOs of the Ceph cluster;
The node load parameter includes: hard disk utilization and the number of read/write operations per second (IOPS).
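As a minimal sketch of the two load parameters, the following functions compute the cluster load ratio and test one OSD against node-level limits. The function names, the handling of a zero total, and the rule that an OSD counts as overloaded when either metric exceeds its limit are assumptions made for illustration; the description does not specify how hard disk utilization and IOPS are combined.

```python
def cluster_load_ratio(business_io: int, total_io: int) -> float:
    """Ratio of current service IOs to all current IOs in the cluster."""
    if total_io <= 0:
        return 0.0  # assumption: a cluster with no IO at all is treated as unloaded
    return business_io / total_io


def node_overloaded(
    disk_utilization: float,
    iops: float,
    max_utilization: float,
    max_iops: float,
) -> bool:
    """Assumed rule: an OSD is overloaded if either metric exceeds its limit."""
    return disk_utilization > max_utilization or iops > max_iops
```

For example, 3 service IOs out of 4 total IOs give a cluster load ratio of 0.75, which would then be compared with the first preset value.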
Optionally, the device further includes:
A computing unit 406, configured to calculate the OSD group corresponding to each PG in the Ceph cluster;
The determination unit 401 is specifically configured to, for each PG, determine that the PG is a target PG whose data is to be restored if the calculated OSD group corresponding to the PG is inconsistent with the OSD group currently corresponding to the PG.
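The comparison performed by the determination unit 401 can be sketched as follows. The function name, the use of string PG identifiers, and the representation of an OSD group as an ordered tuple of OSD numbers are assumptions for illustration; in particular, treating a different ordering as "inconsistent" is a choice of this sketch, not something the description states.

```python
from typing import Dict, List, Tuple

OsdGroup = Tuple[int, ...]


def find_target_pgs(
    calculated: Dict[str, OsdGroup],
    current: Dict[str, OsdGroup],
) -> List[str]:
    """Return the PGs whose calculated OSD group differs from the current one."""
    return sorted(
        pg_id
        for pg_id, osd_group in calculated.items()
        if current.get(pg_id) != osd_group  # a missing PG also counts as inconsistent
    )
```

A PG whose calculated group is (1, 2, 4) while its current group is (1, 2, 3) would be reported as a target PG, whereas a PG with identical groups would not.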
Correspondingly, the application also provides a hardware structure diagram corresponding to the device shown in Fig. 4.
The monitor described herein may be a physical monitor consisting of one physical server, or a virtual monitor virtualized from multiple physical servers. When the monitor is a physical monitor, its hardware structure diagram may be as shown in Fig. 5.
Referring to Fig. 5, Fig. 5 is a hardware structure diagram of a monitor shown in an exemplary embodiment of the application.
The monitor includes a communication interface 501, a processor 502, a machine-readable storage medium 503, and a bus 504. The communication interface 501, the processor 502, and the machine-readable storage medium 503 communicate with one another through the bus 504. By reading and executing the machine-executable instructions in the machine-readable storage medium 503 that correspond to the data recovery control logic, the processor 502 can perform the data recovery method described above.
The machine-readable storage medium 503 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be volatile memory, non-volatile memory, or a similar storage medium. Specifically, the machine-readable storage medium 503 may be RAM (Random Access Memory), flash memory, a storage drive (such as a hard disk drive), a solid-state drive, any type of storage disc (such as a CD or DVD), a similar storage medium, or a combination thereof.
For the implementation of the functions and effects of each unit in the above device, refer to the implementation of the corresponding steps in the above method; details are not repeated here.
Since the device embodiments basically correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant details. The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the application. Those of ordinary skill in the art can understand and implement the solution without creative effort.
The above are merely preferred embodiments of the application and are not intended to limit the application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the application shall fall within the scope of protection of the application.