Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the defects of the prior art, a data balancing method, device, terminal and storage medium for a distributed storage system, so as to solve the prior-art problem that, because the data storage amounts of the placement groups differ, some disks are prone to becoming unavailable, which wastes resources.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
In a first aspect, the present invention provides a data balancing method for a distributed storage system, the method comprising:
acquiring data usage information of each placement group in the distributed storage system, and determining a data usage valley period based on the data usage information, wherein the data usage valley period reflects the period in which the data usage amount of the distributed storage system is lowest;
acquiring the data storage amount corresponding to each placement group, and determining a data receiving placement group and a data release placement group based on the data storage amounts;
and migrating, based on the data usage valley period, a preset data amount from the data release placement group to the data receiving placement group.
In one implementation, acquiring the data usage information of each placement group in the distributed storage system and determining the data usage valley period based on the data usage information includes:
taking one day as a cycle, recording the data usage information of all the placement groups within the cycle, and determining the data usage amount of all the placement groups at each moment based on the data usage information;
and comparing the data usage amount at each moment with a preset usage threshold amount to determine the data usage valley period.
In one implementation, comparing the data usage amount at each moment with the preset usage threshold amount to determine the data usage valley period includes:
comparing the data usage amount at each moment with the usage threshold amount, and screening out a plurality of target moments at which the data usage amount is smaller than the usage threshold amount;
splicing adjacent moments among the target moments in chronological order to obtain a plurality of candidate time periods;
and acquiring the data usage amount corresponding to each candidate time period, and taking the candidate time period with the smallest data usage amount as the data usage valley period.
In one implementation, acquiring the data storage amount corresponding to each placement group and determining the data receiving placement group and the data release placement group based on the data storage amounts includes:
acquiring a preset first storage amount threshold and a preset second storage amount threshold, wherein the first storage amount threshold is larger than the second storage amount threshold;
comparing the data storage amount corresponding to each placement group with the first storage amount threshold and the second storage amount threshold;
determining the placement groups whose data storage amount is larger than the first storage amount threshold to obtain first candidate placement groups;
determining the placement groups whose data storage amount is smaller than the second storage amount threshold to obtain second candidate placement groups;
and determining the data receiving placement group and the data release placement group based on the first candidate placement groups and the second candidate placement groups.
In one implementation, determining the data receiving placement group and the data release placement group based on the first candidate placement groups and the second candidate placement groups includes:
sorting the first candidate placement groups from highest to lowest data storage amount to obtain the placement group with the highest data storage amount, and taking that placement group as the data release placement group;
and sorting the second candidate placement groups from lowest to highest data storage amount to obtain the placement group with the lowest data storage amount, and taking that placement group as the data receiving placement group.
In one implementation, migrating the preset data amount from the data release placement group to the data receiving placement group based on the data usage valley period includes:
taking the intersection of the data usage valley period and a preset configuration period to obtain a data migration period;
and migrating, within the data migration period, the preset data amount from the data release placement group to the data receiving placement group.
In one implementation, migrating the preset data amount from the data release placement group to the data receiving placement group within the data migration period includes:
determining a data amount difference between the data release placement group and the data receiving placement group;
taking half of the data amount difference as the preset data amount;
and migrating, within the data migration period, the preset data amount from the data release placement group to the data receiving placement group.
In a second aspect, an embodiment of the present invention further provides a data balancing apparatus for a distributed storage system, the apparatus comprising:
a data usage analysis module, configured to acquire data usage information of each placement group in the distributed storage system and determine a data usage valley period based on the data usage information, wherein the data usage valley period reflects the period in which the data usage amount of the distributed storage system is lowest;
a placement group determination module, configured to acquire the data storage amount corresponding to each placement group and determine a data receiving placement group and a data release placement group based on the data storage amounts;
and a data migration module, configured to migrate, based on the data usage valley period, a preset data amount from the data release placement group to the data receiving placement group.
In a third aspect, an embodiment of the present invention further provides a terminal, where the terminal includes a memory, a processor, and a data balancing program of a distributed storage system stored in the memory and capable of running on the processor, and when the processor executes the data balancing program of the distributed storage system, the processor implements the steps of the data balancing method of the distributed storage system in any one of the foregoing schemes.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where a data balancing program of a distributed storage system is stored on the computer readable storage medium, where the data balancing program of the distributed storage system is executed by a processor, to implement the steps of the data balancing method of the distributed storage system according to any one of the foregoing solutions.
Beneficial effects: compared with the prior art, the invention provides a data balancing method for a distributed storage system. The method first acquires data usage information of each placement group in the distributed storage system and determines a data usage valley period based on the data usage information, wherein the data usage valley period reflects the period in which the data usage amount of the distributed storage system is lowest. It then acquires the data storage amount corresponding to each placement group and determines a data receiving placement group and a data release placement group based on the data storage amounts. Finally, based on the data usage valley period, it migrates a preset data amount from the data release placement group to the data receiving placement group. The invention can thus analyze the data usage of each placement group and migrate data from placement groups with larger data storage amounts to placement groups with smaller data storage amounts, so that the data storage amounts of the placement groups do not differ greatly and data balance among the placement groups is achieved.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and more specific, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment provides a data balancing method for a distributed storage system. The method adjusts the data storage amount of each placement group so that the data storage amounts of the placement groups do not differ greatly, thereby achieving data balance. In a specific application, the embodiment first acquires data usage information of each placement group in the distributed storage system and determines a data usage valley period based on the data usage information, wherein the data usage valley period reflects the period in which the data usage amount of the distributed storage system is lowest. It then acquires the data storage amount corresponding to each placement group and determines a data receiving placement group and a data release placement group based on the data storage amounts. Finally, based on the data usage valley period, it migrates a preset data amount from the data release placement group to the data receiving placement group. The embodiment can thus analyze the data usage of each placement group and migrate data from placement groups with larger data storage amounts to placement groups with smaller data storage amounts, so that the data storage amounts of the placement groups do not differ greatly and data balance among the placement groups is achieved.
The data balancing method of this embodiment can be applied to terminals, including intelligent product terminals such as computers, mobile phones and smart televisions. Specifically, as shown in fig. 1, the data balancing method of the distributed storage system in this embodiment includes the following steps:
Step S100, acquiring data usage information of each placement group in the distributed storage system, and determining a data usage valley period based on the data usage information, wherein the data usage valley period reflects the period in which the data usage amount of the distributed storage system is lowest.
The terminal first acquires the data usage information of each placement group in the distributed storage system. The data usage information reflects how each placement group uses data, for example the data usage amount of each placement group at each time point. From this information the embodiment can determine the period in which the data usage amount of the distributed storage system is lowest, namely the data usage valley period. By analyzing the data usage valley period, the embodiment can determine the idle period of the placement groups and use that idle period for data migration, so that the migration does not affect the client or its read-write speed.
In one implementation, determining the data usage valley period includes the following steps:
Step S101, taking one day as a cycle, recording the data usage information of all the placement groups within the cycle, and determining the data usage amount of all the placement groups at each moment based on the data usage information;
Step S102, comparing the data usage amount at each moment with a preset usage threshold amount to determine the data usage valley period.
In a specific application, the embodiment records the data usage information of each placement group at each moment of a day. Because the data usage information reflects the data usage amount and the data storage amount at the corresponding moment, the data usage valley period can be determined by comparing the data usage amount at each moment with the preset usage threshold amount. The usage threshold amount is used to judge whether the data usage amount of the placement groups at a given moment is high or low: if the data usage amount at a moment is greater than the usage threshold amount, the data usage at that moment is high and the moment does not belong to a valley period. Conversely, if the data usage amount at a moment is smaller than the usage threshold amount, the data usage at that moment is low and the moment belongs to a valley period.
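The aggregation and comparison described above can be sketched as follows. This is a minimal illustration only: the sample layout `(pg_id, minute_of_day, used)`, the function names and the choice to treat a usage amount equal to the threshold as "not a valley moment" are all assumptions, since the text does not specify them.

```python
from collections import defaultdict


def total_usage_per_moment(samples):
    """Sum the usage of all placement groups at each sampled moment.

    `samples` is an iterable of (pg_id, minute_of_day, used) tuples.
    This layout is hypothetical; the text does not define a record format.
    """
    totals = defaultdict(float)
    for _pg_id, minute, used in samples:
        totals[minute] += used
    return dict(totals)


def is_valley_moment(total_usage, usage_threshold):
    """A moment counts as a valley moment when usage is below the threshold.

    Equality is treated as "not a valley moment" (an assumption).
    """
    return total_usage < usage_threshold


samples = [
    ("pg.1", 0, 3.0), ("pg.2", 0, 4.0),   # 00:00
    ("pg.1", 5, 1.0), ("pg.2", 5, 0.5),   # 00:05
]
totals = total_usage_per_moment(samples)
valley_moments = [m for m, u in sorted(totals.items()) if is_valley_moment(u, 5.0)]
```

With a usage threshold amount of 5.0, only the 00:05 sample falls below the threshold in this example.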
Specifically, the embodiment may record the data usage amount at each moment, compare the data usage amount at each moment with the usage threshold amount, and screen out a plurality of target moments at which the data usage amount is smaller than the usage threshold amount. The moments in this embodiment are time points that may be set at 5-minute intervals, so the target moments are time points such as 14:20, 14:25 and 14:30. Next, the embodiment splices adjacent moments among the target moments in chronological order to obtain a plurality of candidate time periods. Because the embodiment analyzes the data usage amount at individual moments, the data usage amount may exceed the usage threshold amount at some moments and fall below it at others, so the target moments do not necessarily form one continuous run; splicing adjacent target moments therefore yields several candidate time periods, whose durations may be the same or different. In practice, when a target moment is isolated and has no adjacent target moment, that moment alone may serve as a candidate time period whose duration is the sampling interval, for example 5 minutes. Next, the embodiment obtains the data usage amount corresponding to each candidate time period, that is, it sums the data usage amounts at the moments within each candidate time period. The embodiment then takes the candidate time period with the smallest data usage amount as the data usage valley period. The data usage amount is lowest in the data usage valley period, which is therefore the idle period of the placement groups.
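The screening, splicing and selection just described can be sketched as below. The sketch assumes that moments are expressed as minutes since midnight, that the input already holds the total usage of all placement groups per moment, and that a period is returned as a half-open `(start, end)` pair; the function name and these representations are illustrative, not part of the embodiment.

```python
def find_valley_period(usage_by_minute, usage_threshold, step=5):
    """Return (start, end) of the data usage valley period, or None.

    usage_by_minute: dict mapping minute-of-day -> total usage (assumed input).
    step: sampling interval in minutes (5 in the example of the text).
    """
    # 1. Screen out target moments whose usage is below the threshold.
    targets = sorted(m for m, u in usage_by_minute.items() if u < usage_threshold)
    if not targets:
        return None

    # 2. Splice adjacent target moments into candidate periods.
    #    An isolated moment forms a candidate period on its own.
    candidates = [[targets[0]]]
    for minute in targets[1:]:
        if minute - candidates[-1][-1] == step:
            candidates[-1].append(minute)
        else:
            candidates.append([minute])

    # 3. Sum the usage inside each candidate and keep the smallest total.
    best = min(candidates, key=lambda c: sum(usage_by_minute[m] for m in c))

    # Each moment stands for one sampling interval, so the period ends
    # one step after its last moment.
    return best[0], best[-1] + step


# 14:00 .. 14:25 sampled every 5 minutes (minutes since midnight).
usage = {840: 50, 845: 10, 850: 12, 855: 80, 860: 30, 865: 5}
period = find_valley_period(usage, usage_threshold=40)
```

Here the candidates are 14:05 to 14:15 (total 22) and 14:20 to 14:30 (total 35), so the first is selected. Because the rule compares summed usage, a short candidate period can win over a longer one; the sketch follows the rule as stated in the text.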
Step S200, acquiring the data storage amount corresponding to each placement group, and determining a data receiving placement group and a data release placement group based on the data storage amounts.
Next, the embodiment acquires the data storage amount of each placement group. The data storage amount may be the amount of data remaining after the data has been used for a period of time, or the total amount of data before any of it has been used. Based on the data storage amounts, the terminal can determine which placement groups store more data and which store less. Because the object of this embodiment is to balance data among the placement groups so that their data storage amounts do not differ greatly, a placement group determined to have a large data storage amount is taken as a data release placement group, which releases data, and a placement group determined to have a small data storage amount is taken as a data receiving placement group, which receives data, thereby achieving data balance.
In one implementation, determining the data receiving placement group and the data release placement group includes the following steps:
Step S201, acquiring a preset first storage amount threshold and a preset second storage amount threshold, wherein the first storage amount threshold is larger than the second storage amount threshold;
Step S202, comparing the data storage amount corresponding to each placement group with the first storage amount threshold and the second storage amount threshold;
Step S203, determining the placement groups whose data storage amount is larger than the first storage amount threshold to obtain first candidate placement groups;
Step S204, determining the placement groups whose data storage amount is smaller than the second storage amount threshold to obtain second candidate placement groups;
Step S205, determining the data receiving placement group and the data release placement group based on the first candidate placement groups and the second candidate placement groups.
In a specific application, the embodiment acquires a preset first storage amount threshold and a preset second storage amount threshold, wherein the first storage amount threshold is larger than the second storage amount threshold. The first storage amount threshold is used to screen out the placement groups with large data storage amounts, and the second storage amount threshold is used to screen out the placement groups with small data storage amounts. The embodiment compares the data storage amount corresponding to each placement group with the first storage amount threshold and the second storage amount threshold. The placement groups whose data storage amount is larger than the first storage amount threshold form the first candidate placement groups, and the placement groups whose data storage amount is smaller than the second storage amount threshold form the second candidate placement groups. The first candidate placement groups are thus the placement groups with large data storage amounts, and the second candidate placement groups are the placement groups with small data storage amounts. The embodiment may then sort the first candidate placement groups from highest to lowest data storage amount, obtain the placement group with the highest data storage amount, and take it as the data release placement group. It then sorts the second candidate placement groups from lowest to highest data storage amount, obtains the placement group with the lowest data storage amount, and takes it as the data receiving placement group.
Therefore, the embodiment obtains the placement group with the largest data storage amount and the placement group with the smallest data storage amount. Because the data storage amounts of these two placement groups differ considerably, the embodiment takes the former as the data release placement group and the latter as the data receiving placement group, so that data can be moved from the fullest placement group to the emptiest one. The placement groups whose data storage amount lies between the second storage amount threshold and the first storage amount threshold are already relatively balanced and do not differ greatly, so data balance is achieved.
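Steps S201 to S205 can be sketched as follows. The sketch assumes the data storage amounts are available as a mapping from a placement group identifier to a number; the identifiers, units, function name and the behavior when either candidate set is empty (returning `None`) are assumptions not stated in the text.

```python
def select_release_and_receive(storage, first_threshold, second_threshold):
    """Pick the data release and data receiving placement groups.

    storage: dict mapping placement group id -> data storage amount.
    first_threshold must be larger than second_threshold (step S201).
    Returns (release_pg, receive_pg), or None when either candidate
    set is empty (an assumption; the text does not cover this case).
    """
    if first_threshold <= second_threshold:
        raise ValueError("first threshold must exceed second threshold")

    # Steps S203/S204: build the two candidate sets.
    first_candidates = {pg: s for pg, s in storage.items() if s > first_threshold}
    second_candidates = {pg: s for pg, s in storage.items() if s < second_threshold}
    if not first_candidates or not second_candidates:
        return None

    # Step S205: the fullest candidate releases, the emptiest receives.
    release_pg = max(first_candidates, key=first_candidates.get)
    receive_pg = min(second_candidates, key=second_candidates.get)
    return release_pg, receive_pg


storage = {"pg.1": 90, "pg.2": 50, "pg.3": 10, "pg.4": 85, "pg.5": 20}
pair = select_release_and_receive(storage, first_threshold=80, second_threshold=30)
```

Taking the maximum and minimum directly is equivalent to sorting each candidate set and taking its first element, as the text describes.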
In another implementation, after determining the first candidate placement groups and the second candidate placement groups, the embodiment may take every placement group among the first candidate placement groups as a data release placement group and every placement group among the second candidate placement groups as a data receiving placement group. When data migration is performed in the subsequent step, the placement group with the highest data storage amount among the first candidate placement groups and the placement group with the lowest data storage amount among the second candidate placement groups form a data migration group, and data is migrated between them; the placement group with the second-highest data storage amount among the first candidate placement groups and the placement group with the second-lowest data storage amount among the second candidate placement groups form another data migration group, and so on. In this way a plurality of data migration groups are formed and data is migrated within each of them, so as to achieve data balance.
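The rank-by-rank pairing of this alternative implementation can be sketched as below. The function name and input mapping are illustrative, and the handling of unequal candidate counts (surplus placement groups are simply left unpaired) is an assumption, since the text does not say what happens in that case.

```python
def build_migration_groups(storage, first_threshold, second_threshold):
    """Pair the k-th fullest first candidate with the k-th emptiest second candidate.

    storage: dict mapping placement group id -> data storage amount.
    Returns a list of (release_pg, receive_pg) pairs.
    """
    # First candidates, fullest first.
    first = sorted(
        (pg for pg, s in storage.items() if s > first_threshold),
        key=storage.get,
        reverse=True,
    )
    # Second candidates, emptiest first.
    second = sorted(
        (pg for pg, s in storage.items() if s < second_threshold),
        key=storage.get,
    )
    # zip stops at the shorter list, so surplus groups stay unpaired
    # (assumed behavior).
    return list(zip(first, second))


storage = {"pg.1": 90, "pg.2": 50, "pg.3": 10, "pg.4": 85, "pg.5": 20}
groups = build_migration_groups(storage, first_threshold=80, second_threshold=30)
```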
Step S300, migrating, based on the data usage valley period, the preset data amount from the data release placement group to the data receiving placement group.
After determining the data release placement group and the data receiving placement group, the embodiment migrates the preset data amount from the data release placement group to the data receiving placement group within the data usage valley period, so as to achieve data balance.
In one implementation, performing the data migration includes the following steps:
Step S301, taking the intersection of the data usage valley period and a preset configuration period to obtain a data migration period;
Step S302, migrating, within the data migration period, the preset data amount from the data release placement group to the data receiving placement group.
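The intersection in step S301 can be sketched as follows, assuming each period is a half-open `(start, end)` pair in minutes since midnight and that neither period wraps past midnight. These representations and the function name are assumptions made for illustration.

```python
def intersect_periods(valley_period, configuration_period):
    """Return the overlap of two (start, end) periods, or None if they are disjoint.

    Periods are half-open intervals in minutes since midnight and are
    assumed not to wrap around midnight.
    """
    start = max(valley_period[0], configuration_period[0])
    end = min(valley_period[1], configuration_period[1])
    return (start, end) if start < end else None


# Valley period 02:00-05:00, configured idle period 03:00-06:00.
migration_period = intersect_periods((120, 300), (180, 360))
```

In this example the data migration period is 03:00 to 05:00. When the two periods do not overlap, the sketch returns `None`, and a caller would skip migration for that cycle.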
Specifically, the embodiment may first acquire a configuration period, which is preset based on the historical usage time of a disk (OSD) and represents an idle period of the disk. The embodiment takes the intersection of the data usage valley period and the preset configuration period to obtain the data migration period. The data migration period is therefore a period in which data usage is low and the disk is idle, so that migrating data within it does not affect the client or its read-write speed. When performing the migration, the embodiment determines a data amount difference between the data release placement group and the data receiving placement group, takes half of the data amount difference as the preset data amount, and finally migrates the preset data amount from the data release placement group to the data receiving placement group within the data migration period.
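The half-difference rule can be illustrated with a short sketch. The function names and the in-memory mapping are hypothetical, and the sketch only updates bookkeeping numbers; it does not move any data in a real storage system.

```python
def preset_data_amount(release_amount, receive_amount):
    """Half of the data amount difference between the two placement groups."""
    difference = release_amount - receive_amount
    if difference <= 0:
        # Nothing to migrate when the release group is not fuller
        # (an assumed guard; the text does not discuss this case).
        return 0
    return difference / 2


def apply_migration(storage, release_pg, receive_pg):
    """Return a new mapping with the preset data amount moved between the groups."""
    amount = preset_data_amount(storage[release_pg], storage[receive_pg])
    updated = dict(storage)
    updated[release_pg] -= amount
    updated[receive_pg] += amount
    return updated


before = {"pg.1": 90, "pg.3": 10}
after = apply_migration(before, "pg.1", "pg.3")
```

With storage amounts of 90 and 10, the difference is 80 and the preset data amount is 40, so both placement groups end at 50. Moving half of the difference is what makes the two groups equal after a single migration.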
In summary, the embodiment first acquires data usage information of each placement group in the distributed storage system and determines a data usage valley period based on the data usage information, wherein the data usage valley period reflects the period in which the data usage amount of the distributed storage system is lowest. It then acquires the data storage amount corresponding to each placement group and determines a data receiving placement group and a data release placement group based on the data storage amounts. Finally, based on the data usage valley period, it migrates a preset data amount from the data release placement group to the data receiving placement group. The embodiment can thus analyze the data usage of each placement group and migrate data from placement groups with larger data storage amounts to placement groups with smaller data storage amounts, so that the data storage amounts of the placement groups do not differ greatly and data balance among the placement groups is achieved.
Based on the above embodiment, the present invention further provides a data balancing apparatus for a distributed storage system. As shown in fig. 2, the apparatus includes a data usage analysis module 10, a placement group determination module 20, and a data migration module 30. Specifically, the data usage analysis module 10 is configured to acquire data usage information of each placement group in the distributed storage system and determine a data usage valley period based on the data usage information, wherein the data usage valley period reflects the period in which the data usage amount of the distributed storage system is lowest. The placement group determination module 20 is configured to acquire the data storage amount corresponding to each placement group and determine a data receiving placement group and a data release placement group based on the data storage amounts. The data migration module 30 is configured to migrate, based on the data usage valley period, a preset data amount from the data release placement group to the data receiving placement group.
In one implementation, the data usage analysis module 10 includes:
a usage amount acquisition unit, configured to take one day as a cycle, record the data usage information of all the placement groups within the cycle, and determine the data usage amount of all the placement groups at each moment based on the data usage information;
and a usage amount analysis unit, configured to compare the data usage amount at each moment with a preset usage threshold amount to determine the data usage valley period.
In one implementation, the usage amount analysis unit includes:
a target moment determining subunit, configured to compare the data usage amount at each moment with the usage threshold amount and screen out a plurality of target moments at which the data usage amount is smaller than the usage threshold amount;
a candidate time period determining subunit, configured to splice adjacent moments among the target moments in chronological order to obtain a plurality of candidate time periods;
and a valley period determining subunit, configured to acquire the data usage amount corresponding to each candidate time period and take the candidate time period with the smallest data usage amount as the data usage valley period.
In one implementation, the placement group determination module 20 includes:
a storage amount threshold acquisition unit, configured to acquire a preset first storage amount threshold and a preset second storage amount threshold, wherein the first storage amount threshold is larger than the second storage amount threshold;
a storage amount comparison unit, configured to compare the data storage amount corresponding to each placement group with the first storage amount threshold and the second storage amount threshold;
a first candidate placement group determining unit, configured to determine the placement groups whose data storage amount is larger than the first storage amount threshold to obtain first candidate placement groups;
a second candidate placement group determining unit, configured to determine the placement groups whose data storage amount is smaller than the second storage amount threshold to obtain second candidate placement groups;
and a placement group screening unit, configured to determine the data receiving placement group and the data release placement group based on the first candidate placement groups and the second candidate placement groups.
In one implementation, the placement group screening unit includes:
a data release placement group determining subunit, configured to sort the first candidate placement groups from highest to lowest data storage amount to obtain the placement group with the highest data storage amount, and take that placement group as the data release placement group;
and a data receiving placement group determining subunit, configured to sort the second candidate placement groups from lowest to highest data storage amount to obtain the placement group with the lowest data storage amount, and take that placement group as the data receiving placement group.
In one implementation, the data migration module 30 includes:
a time period analysis unit, configured to take the intersection of the data usage valley period and a preset configuration period to obtain a data migration period;
and a data migration execution unit, configured to migrate, within the data migration period, the preset data amount from the data release placement group to the data receiving placement group.
In one implementation, the data migration execution unit includes:
a difference determining subunit, configured to determine a data amount difference between the data release placement group and the data receiving placement group;
a preset data amount determining subunit, configured to take half of the data amount difference as the preset data amount;
and a data migration subunit, configured to migrate, within the data migration period, the preset data amount from the data release placement group to the data receiving placement group.
The working principle of each module in the data balancing device of the distributed storage system in this embodiment is the same as that of each step in the above method embodiment, and will not be described here again.
Based on the above embodiment, the present invention also provides a terminal, and a schematic block diagram of the terminal may be shown in fig. 3. The terminal may include one or more processors 100 (only one shown in fig. 3), a memory 101, and a computer program 102 stored in the memory 101 and executable on the one or more processors 100, such as a data balancing program of a distributed storage system. The one or more processors 100, when executing computer program 102, may implement the various steps in a data balancing method embodiment of a distributed storage system. Alternatively, the functions of the modules/units in the data balancing apparatus embodiment of the distributed storage system may be implemented by one or more processors 100 when executing computer program 102, which is not limited herein.
In one embodiment, the processor 100 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
In one embodiment, the memory 101 may be an internal storage unit of the electronic device, such as a hard disk or a memory of the electronic device. The memory 101 may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card (flash card) or the like, which are provided on the electronic device. Further, the memory 101 may also include both an internal storage unit and an external storage device of the electronic device. The memory 101 is used to store computer programs and other programs and data required by the terminal. The memory 101 may also be used to temporarily store data that has been output or is to be output.
It will be appreciated by those skilled in the art that the functional block diagram shown in fig. 3 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the terminal to which the present inventive arrangements may be applied, as a specific terminal may include more or less components than those shown, or may be combined with some components, or may have a different arrangement of components.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program, which may be stored on a non-transitory computer readable storage medium and which, when executed, may carry out the steps of the method embodiments described above. Any reference to memory, storage, an operational database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.