Detailed Description
The present invention is described in detail below with reference to the accompanying drawings. Fig. 1 is a schematic flow chart of a cloud-computing-based full flash data storage optimization method according to an embodiment of the present invention; the method is described in detail below.
Step S110, collecting multidimensional performance data sets of all nodes in the full flash storage cluster in real time, wherein the multidimensional performance data sets comprise node storage space fragmentation rate, input and output request queue depth and network link bandwidth utilization rate.
In detail, for a data center of a certain enterprise, a full flash storage cluster may be deployed to store massive business data, such as financial data, customer information, sales records, and the like of the enterprise. The full flash cluster contains a plurality of nodes, each of which is responsible for storing and processing a portion of data.
In this scenario, the node storage space fragmentation rate is collected as follows. As the enterprise continuously performs data storage, deletion, and modification operations, the storage space in each node gradually becomes fragmented. For example, when the financial department frequently updates financial statement data, many small blocks of free space may be formed in the storage space instead of one continuous large block of free space; the node storage space fragmentation rate is obtained by calculating the proportion of these fragmented spaces.
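As a minimal sketch of this calculation (the extent sizes, the 64-block contiguity cutoff, and the function name are illustrative assumptions, not specifics of the embodiment), the fragmentation rate can be taken as the share of free space held in small, non-contiguous extents:

```python
def fragmentation_rate(free_extents, min_contiguous=64):
    """Fraction of free space held in extents smaller than `min_contiguous` blocks.

    `free_extents` lists the size (in blocks) of each free region on the node;
    the 64-block cutoff is an assumed tunable, not a value from the embodiment.
    """
    total_free = sum(free_extents)
    if total_free == 0:
        return 0.0
    fragmented = sum(size for size in free_extents if size < min_contiguous)
    return fragmented / total_free
```

For example, a node whose free space consists of two 10-block fragments plus one 100-block region would report a fragmentation rate of 20/120, roughly 16.7%.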
For the input/output request queue depth, assume the sales departments of the enterprise compile and report sales data at the end of each month, during which a large number of read/write requests are sent to the full flash storage cluster. Each node receives these requests and places them in its input/output request queue in order of arrival. The number of requests in the queue, i.e., the input/output request queue depth, is monitored in real time. For example, during peak sales data processing, the input/output request queue depth of a node may reach hundreds or even thousands of requests.
The collection of network link bandwidth utilization is related to the transmission of data between nodes. In detail, data may be shared between different departments of an enterprise, and when a customer service department obtains customer information from the storage cluster to provide services to customers, the data needs to be transmitted between nodes through network links. In this process, the ratio of the actual transmission data amount of the network link to the total bandwidth of the link in unit time can be monitored; this is the bandwidth utilization of the network link. For example, if the total bandwidth of the network link is 1000 Mbps and the actual transmission data amount at a certain time is 500 Mbps, the bandwidth utilization of the network link at that time is 50%.
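This ratio can be sketched as follows (the function name and byte-based sampling interface are assumptions; the 1000 Mbps / 500 Mbps figures mirror the example above):

```python
def bandwidth_utilization(bytes_transferred, window_seconds, link_capacity_mbps):
    """Ratio of actual transmission rate to total link bandwidth over a window."""
    # Convert the byte count in the window to an average rate in Mbit/s.
    rate_mbps = bytes_transferred * 8 / 1_000_000 / window_seconds
    return rate_mbps / link_capacity_mbps
```

With 62,500,000 bytes transferred in one second on a 1000 Mbps link (i.e., 500 Mbps), the function returns 0.5, matching the 50% figure above.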
Step S120, a storage operation request stream sent by a user terminal is received, and a to-be-stored data block identification set and a corresponding operation mode label in the storage operation request stream are analyzed.
Continuing with the above-described enterprise data center as an example, the user terminal may be a computer, a server, or another device used by employees of the various departments within the enterprise. Assume that the marketing department of the enterprise wants to store a new market research report into the full flash storage cluster. Marketing department personnel send a stream of storage operation requests to the storage cluster through specific software on the user terminal.
The storage operation request stream contains a set of identifiers of data blocks to be stored; for example, a market research report may be stored as a plurality of data blocks, each with a unique identifier. The request stream also carries a corresponding operation mode tag, which in this case is "write", indicating a store (write) operation. After the storage operation request stream is received, it is parsed, and the set of to-be-stored data block identifiers and the "write" operation mode label are accurately extracted so that the request can be processed according to this information.
Step S130, traversing a historical access record library based on the set of to-be-stored data block identifiers, and extracting the historical access timing characteristics of the data blocks to be stored and the physical storage location proximity of associated data blocks.
Still based on the enterprise data center scenario, consider the store operation for the market research report. Matching may be performed in the historical access record library according to each data block identifier in the set of to-be-stored data block identifiers. It is assumed that the historical access record library records all past accesses to the enterprise's data.
For the extraction of the historical access timing characteristics, take one of the data blocks as an example: if the past access records show that it is frequently accessed at 9 to 10 am every workday, these accesses constitute an access timestamp sequence. Based on this sequence, a periodic access interval distribution parameter can be calculated; for example, an access peak is found on average every 7 days, and this is the periodic access interval distribution parameter. Meanwhile, if there is a sudden surge of accesses during a particular event (such as a new product release by the company), the timestamp set of that burst access event is recorded.
For the physical storage location proximity of the associated data blocks, the identifiers of data blocks that have been accessed together with the data block to be stored more than a synergy threshold number of times (say 10) within a preset time window (e.g., the past month) are selected from the subset of associated data block identifiers. For example, some market analysis data blocks related to the market research report have been accessed together with it 15 times in the past month. The physical storage coordinate set of these associated data block identifiers in the full flash storage cluster is then acquired. Suppose a data block of the market research report is stored at logical unit address 100 on node A, and associated market analysis data blocks are stored at logical unit address 105 on node A and logical unit address 200 on node B. Based on the topological connection relation between the storage node identifiers (node A and node B are connected through a high-speed network) and the distance differences between the logical unit addresses (addresses 100 and 105 are close to each other and relatively far from 200), a physical storage location proximity matrix of the associated data blocks is generated.
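One way this matrix could be assembled is sketched below, under stated assumptions: the 256-address normalization window, the [0, 1] proximity scale, and the topology link weights are all illustrative choices, not values from the embodiment.

```python
def proximity(coord_a, coord_b, topology, max_lua_gap=256):
    """Proximity in [0, 1] between two (node_id, logical_unit_address) coordinates.

    Same-node closeness decays with the logical-unit-address gap; cross-node
    closeness is additionally scaled by an assumed topology link weight.
    """
    node_a, lua_a = coord_a
    node_b, lua_b = coord_b
    lua_score = max(0.0, 1.0 - abs(lua_a - lua_b) / max_lua_gap)
    if node_a == node_b:
        return lua_score
    # Link weight models the topological connection (e.g., high-speed network).
    link_weight = topology.get((node_a, node_b), topology.get((node_b, node_a), 0.0))
    return link_weight * lua_score

def proximity_matrix(coords, topology):
    """Pairwise physical-storage-location proximity matrix for associated blocks."""
    return [[proximity(a, b, topology) for b in coords] for a in coords]
```

With the example coordinates above — (A, 100), (A, 105), (B, 200) and an assumed A-B link weight of 0.8 — the (A, 100)/(A, 105) pair scores about 0.98, well above the (A, 100)/(B, 200) pair.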
Step S140, constructing a dynamic access heat prediction model of the data block to be stored in a preset time period according to the multidimensional performance data set and the historical access timing characteristics.
Still taking the above scenario as an example, a periodic access peak interval (e.g., 9 to 10 am every workday), a burst access event timestamp (e.g., the access peak during the new product release), and an access interval distribution statistic (e.g., an access peak every 7 days on average) are extracted from the historical access timing characteristics.
Assuming the operation mode label corresponding to the market research report is "write", some of the data contained in the report may later be read frequently for adjusting market strategy. The read/write operation proportion corresponding to the operation mode label is identified, and the hot spot data prediction mark is activated when the read operation proportion exceeds a preset threshold (assume 60%). For example, after analysis it is found that the market share analysis portion of the market research report has an expected read proportion of 80%, exceeding the 60% preset threshold, so the hot spot data prediction mark is activated.
And generating a time dimension access probability density function by combining the periodic access peak interval and the sudden access event time stamp. For example, if the access probability at 9 to 10 am is 0.3, the access probability corresponding to the sudden access event during the release of the new product is 0.2, and the time-dimension access probability density function is constructed based on these data.
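A simplified, hour-granular sketch of such a function follows; the 0.3 peak probability and 0.2 burst probability mirror the numbers above, while the 0.05 baseline, the additive burst combination, and the function name are assumptions.

```python
def access_pdf(hours, peak_hours, burst_hours,
               base_p=0.05, peak_p=0.3, burst_p=0.2):
    """Per-hour access probability combining periodic peaks and burst events.

    `peak_hours` marks the periodic access peak interval; `burst_hours` marks
    burst access event windows. Probabilities are capped at 1.0.
    """
    pdf = {}
    for h in hours:
        p = peak_p if h in peak_hours else base_p
        if h in burst_hours:
            p += burst_p
        pdf[h] = min(p, 1.0)
    return pdf
```

For instance, marking hour 9 as the periodic peak and hour 14 as a burst window yields 0.3 at 9 am and 0.25 (baseline plus burst) at 2 pm.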
And correcting the weight parameter of the time dimension access probability density function based on the access interval distribution statistic value and the hot spot data predictive marker. The access interval distribution statistics are assumed to indicate that the access probability gradually decreases between two access peaks, and the weight parameters of the time-dimension access probability density function are adjusted according to the situation.
The corrected time dimension access probability density function is then coupled with the network link bandwidth utilization in the multidimensional performance data set. For example, a bandwidth fluctuation data set of network link bandwidth utilization over a plurality of historical time windows is extracted from the multidimensional performance data set; assume a bandwidth utilization peak interval of 80% from 9 am to 10 am on weekdays and an average transmission rate of 500 Mbps. The bandwidth utilization peak interval is aligned and mapped onto the time axis of the time dimension access probability density function to generate a time-synchronized bandwidth utilization distribution sequence. Based on the average transmission rate, a bandwidth weight factor of the time dimension access probability density function at each time unit is calculated. The access probability value in the corresponding time unit is then weighted and corrected according to the bandwidth weight factor, generating a bandwidth-aware access probability density distribution function. This function is normalized in the time dimension, and the heat level parameter of the dynamic access heat prediction model is generated based on the normalized maximum probability value and the slope change of the probability distribution curve; the heat level parameter characterizes the access heat of the market research report data blocks in a future preset time period.
Step S150, generating a cross-node sliced storage topological graph and a cache preloading strategy matrix based on the dynamic access heat prediction model and the physical storage location proximity.
In detail, for the storage of the market research report data blocks, a slicing redundancy threshold and a minimum number of copies of the data block to be stored are determined according to the heat level parameter output by the dynamic access heat prediction model. Assuming a higher heat level, the slicing redundancy threshold is determined to be 3 (indicating that a data block can be stored in up to 3 slices), and the minimum number of copies is 2 (at least 2 copies per slice).
The storage space fragmentation rate and input/output request queue depth of all nodes in the full flash storage cluster are traversed, and a candidate node subset meeting the slicing capacity constraint is screened out. For example, node A may be selected into the candidate node subset if its storage space fragmentation rate is low, its input/output request queue depth is within an acceptable range, and it has sufficient space to store slices of the market research report data blocks.
Based on the physical storage location proximity, a storage location association score is calculated for each node in the candidate node subset. Assume node A is closer to the associated data block storage locations, so its storage location association score is higher. A storage path weight table among the nodes is then constructed according to the storage location association scores and the network link bandwidth utilization in the multidimensional performance data set. For example, if the node A to node B network link bandwidth utilization is high and the storage location association score is high, then the storage path weight value between them is high.
Based on the storage path weight table and the minimum number of copies, a sliced storage topological graph including redundant path cross-connections is generated. For example, according to the inter-node storage path weight values in the storage path weight table, a candidate storage path set whose weight values exceed a preset weight threshold (assumed to be 0.6) is screened out. Based on the minimum number of copies, it is determined that the market research report data block has 2 sliced copies, and an initial storage path is allocated to each sliced copy. For the first sliced copy, the storage path weight values in the candidate storage path set are traversed; the node A to node B path is found to have the highest weight value, so it is taken as the default storage path of that sliced copy. The inter-node connection state of the default storage path is then detected; if only one connection path from node A to node B exists, there is a single-point failure risk, so a standby storage path with the next-highest weight value (assume the node A to node C path) is selected from the candidate storage path set as a cross redundant path. The cross redundant path and the default storage path are bidirectionally connected to generate a sliced storage topological graph containing redundant path cross-connections. Finally, whether the path redundancy of each sliced copy in the sliced storage topological graph meets the redundancy constraint corresponding to the minimum number of copies is verified; if not, the standby storage path is reselected and the cross-connection relation of the sliced storage topological graph is updated.
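The path-selection part of this step could be sketched as follows; the round-robin assignment across copies, the tuple-keyed weight table, and the function name are assumptions layered on the 0.6 threshold and default-plus-backup scheme described above.

```python
def plan_shard_paths(path_weights, num_copies, threshold=0.6):
    """Assign each sliced copy a default storage path plus a cross redundant backup.

    `path_weights` maps an inter-node path (a tuple of node ids) to its storage
    path weight value. Paths at or below the threshold are discarded; the
    highest-weight survivor becomes the default route and the next-highest
    serves as the backup. Round-robin assignment is an assumed simplification.
    """
    ranked = sorted(((w, p) for p, w in path_weights.items() if w > threshold),
                    reverse=True)
    if len(ranked) < 2:
        raise ValueError("redundancy unsatisfiable: need >= 2 candidate paths")
    paths = [p for _, p in ranked]
    return [(paths[i % len(paths)], paths[(i + 1) % len(paths)])
            for i in range(num_copies)]
```

With the example weights — A-B at 0.9, A-C at 0.8, A-D at 0.5 — the A-D path is screened out, the first copy takes A-B with A-C as its cross redundant path, and the second copy takes the reverse pairing.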
A cache preloading strategy matrix is generated based on the dynamic access heat prediction model and the physical storage location proximity. A sequence of target data blocks spatially associated with the market research report data blocks is identified based on the physical storage location proximity, such as the market analysis data blocks mentioned earlier. Based on the dynamic access heat prediction model, the concurrent access probability distribution of the target data block sequence within a preset time period is predicted; assume that 60% of the market analysis data blocks will be accessed simultaneously with the market research report data blocks within the next week. The preloading priority coefficient of each target data block is calculated according to the concurrent access probability distribution and the node storage space fragmentation rate in the multidimensional performance data set; the lower the node storage space fragmentation rate, the higher the preloading priority coefficient may be. Based on the preloading priority coefficient, differentiated cache retention periods and compression grade parameters are allocated to the target data block sequence. For example, for market analysis data blocks with high preloading priority coefficients, the current cache space occupancy and the historical cache replacement frequency of the edge nodes in the full flash storage cluster are obtained. When the concurrent access probability is higher than a first preset threshold (assume 50%), a fixed retention period (e.g., 3 days) is allocated to the corresponding data block and its cache space is locked. When the concurrent access probability is lower than a second preset threshold (assume 30%), the retention period decay rate is dynamically adjusted based on the historical cache replacement frequency.
And performing variable compression rate processing on the low-priority data blocks according to the compression grade parameters, and recording metadata verification information of the compressed data blocks.
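The priority and retention rules above can be sketched as follows; the multiplicative priority formula, the 0.2/day base decay, and the replacement-frequency scaling are assumptions, while the 50%/30% thresholds and 3-day lock come from the example.

```python
def preload_priority(concurrent_access_prob, node_fragmentation_rate):
    """Higher concurrent-access probability and lower node fragmentation give a
    higher preloading priority coefficient (multiplicative form is assumed)."""
    return concurrent_access_prob * (1.0 - node_fragmentation_rate)

def retention_policy(concurrent_access_prob, replace_freq=0.0,
                     hi_threshold=0.5, lo_threshold=0.3,
                     fixed_days=3, base_decay=0.2):
    """Cache retention rule per the thresholds above (50% / 30%, 3-day lock)."""
    if concurrent_access_prob > hi_threshold:
        # High concurrency: fixed retention period with locked cache space.
        return {"mode": "locked", "days": fixed_days}
    if concurrent_access_prob < lo_threshold:
        # Low concurrency: decay faster on nodes whose cache is replaced
        # more often (assumed linear scaling).
        return {"mode": "decaying", "decay_rate": base_decay * (1.0 + replace_freq)}
    return {"mode": "default"}
```

For instance, a 60% concurrent access probability on a node with 10% fragmentation yields a priority of 0.54 and a 3-day locked retention period.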
Based on the above steps, after the multidimensional performance data sets of all nodes in the full flash storage cluster are collected in real time, a storage operation request stream from a user terminal is received, and the set of to-be-stored data block identifiers and the operation mode label are parsed from it. The historical access record library is traversed based on the set of to-be-stored data block identifiers, and the historical access timing characteristics and the physical storage location proximity of associated data blocks are extracted, deeply mining the historical access characteristics of the data blocks and their spatial relations with surrounding data blocks. Furthermore, a dynamic access heat prediction model of the data blocks to be stored within a preset time period is constructed according to the multidimensional performance data set and the historical access timing characteristics. This model fuses real-time performance data with historical access patterns, can dynamically and accurately predict the access heat of data blocks in a future period, better adapts to the variability and complexity of data access in a cloud computing environment, and effectively improves the ability to anticipate data access trends. Finally, a cross-node sliced storage topological graph and a cache preloading strategy matrix are generated based on the dynamic access heat prediction model and the physical storage location proximity, realizing optimal allocation and efficient utilization of storage resources.
The cross-node fragmented storage topological graph considers the access heat and the physical storage position relation of the data blocks, reasonably plans the storage distribution of the data among different nodes, effectively balances the load pressure of each node, reduces the fragmentation degree of the storage space and improves the read-write performance of the whole storage cluster. Meanwhile, the cache preloading strategy matrix loads the data blocks which are likely to be frequently accessed into the cache in advance according to the predicted access heat, so that the waiting time of data access is greatly reduced, and the response speed and the user experience of the system are remarkably improved.
For example, in one possible implementation, step S130 includes:
Step S131, according to each data block identifier in the set of data block identifiers to be stored, matching a corresponding history access record in the history access record library, where the history access record includes an access timestamp set and an associated data block identifier subset.
In the enterprise data center scenario, during the storage of the market research report data blocks, the corresponding historical access records are first matched in the historical access record library according to each data block identifier of the market research report. Since the historical access record library records in detail the past access conditions of all of the enterprise's data, the records corresponding to the market research report data blocks can be accurately found. These historical access records contain an access timestamp set and an associated data block identifier subset. For example, for a particular data block in the market research report, the access timestamp set shows the specific time of each past access; after the enterprise launched a large marketing campaign, it may include a series of time points such as 10:15 on March 1, 2023 and 14:30 on March 5, 2023. The associated data block identifier subset contains the identifiers of the other data blocks that tend to be involved when this data block is accessed.
Step S132, extracting an access time stamp sequence in the history access record, and calculating a periodic access interval distribution parameter and a time stamp set of the sudden access event based on the access time stamp sequence.
For example, the access timestamp sequence of the above data block of the market research report includes 10:15 on March 1, 2023, 14:30 on March 5, 2023, and so on. A periodic access interval distribution parameter is calculated based on this sequence. The calculation process is as follows: the time interval between adjacent access timestamps is computed; for example, the interval from 10:15 on March 1, 2023 to 14:30 on March 5, 2023 is 4 days, 4 hours and 15 minutes, and this is done for all pairs of adjacent access timestamps. These intervals are then analyzed and the frequency of occurrence of each interval is counted; if the interval of 4 days, 4 hours and 15 minutes occurs most frequently, say 3 times, it is taken as an important periodic access interval reference. By counting and analyzing all the time intervals in this way, the periodic access interval distribution parameter is obtained, which reflects the periodic pattern of the data block being accessed over different time periods. In this process, the timestamp set of burst access events can also be determined; for example, when the enterprise performs its annual financial review, a large amount of data in the market research report needs to be consulted, so there is a large volume of concentrated access from April 1, 2023 to April 5, 2023, and the timestamps of this period form the timestamp set of the burst access event.
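Taking the mode of the adjacent-timestamp gaps, as described above, could be sketched like this (treating the most frequent gap as the dominant period is an assumed simplification of the full distribution parameter):

```python
from collections import Counter
from datetime import datetime, timedelta

def periodic_interval(timestamps):
    """Most frequent gap between consecutive access timestamps.

    Sorts the access timestamp sequence, computes all adjacent gaps, and
    returns the gap that occurs most often, as a timedelta.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return Counter(gaps).most_common(1)[0][0]
```

With accesses at 10:15 on March 1, 2023 and repeated 4-day-4-hour-15-minute gaps thereafter, the function recovers that interval as the dominant period.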
Step S133, screening out, from the associated data block identifier subset, the set of associated data block identifiers that have been accessed together with the data block to be stored more than a synergy threshold number of times within a preset time window.
Assume the preset time window is the past 6 months and the synergy threshold is set to 15. The co-access count corresponding to each identifier in the associated data block identifier subset is checked; for example, a market analysis data block has been accessed together with the market research report data block 20 times in the past 6 months, exceeding the synergy threshold of 15, so it is screened out, and the screened data block identifiers form the associated data block identifier set.
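This screening is a simple threshold filter; a sketch follows (the dict-of-counts input shape and function name are assumptions, the strict "more than the threshold" comparison follows the step's wording):

```python
def filter_associated(co_access_counts, synergy_threshold=15):
    """Keep identifiers whose co-access count within the window exceeds the threshold."""
    return {blk for blk, count in co_access_counts.items()
            if count > synergy_threshold}
```

A block co-accessed 20 times passes the 15-time threshold; one co-accessed exactly 15 times does not.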
Step S134, obtaining a physical storage coordinate set of the associated data block identifier set in the full flash storage cluster, where the physical storage coordinate set includes a storage node identifier and a logical unit address.
For example, for the filtered market analysis data block, its storage node stored in the full flash storage cluster is identified as node a, and the logical unit address is 105. There may be other associated data blocks stored at the node B, logical unit address 200, etc., which storage node identities and logical unit addresses constitute a set of physical storage coordinates.
Step S135, generating a physical storage position proximity matrix of the associated data block based on the topological connection relation between storage node identifiers in the physical storage coordinate set and the distance difference value between the logic unit addresses.
For example, regarding the topological connection between storage node identifiers, if node A and node B are connected by a high-speed network with low network delay, the topological connection between them is tighter. Regarding the distance difference between logical unit addresses, the distance between logical unit address 105 and logical unit address 110 on node A is relatively small, while the distance to logical unit address 200 on node B is relatively large. The relation between each pair of associated data blocks is quantitatively evaluated by comprehensively considering these factors, thereby constructing the physical storage location proximity matrix of the associated data blocks. This matrix accurately reflects the proximity of the associated data blocks in physical storage location and provides an important basis for subsequent data storage and management strategies.
In one possible implementation, step S140 includes:
step S141, extracting periodic access peak intervals, sudden access event time stamps and access interval distribution statistic values from the historical access time sequence characteristics.
In detail, regarding the periodic access peak interval: reviewing the historical access records, the data block is found to have a large access volume between 9 and 10 am on every workday, and this period is the periodic access peak interval. Regarding burst access event timestamps: for example, while the enterprise adjusted its quarterly sales strategy, the market research report data blocks received a large number of burst accesses from June 15, 2023 to June 20, 2023, and these time points are the burst access event timestamps. For the access interval distribution statistic, the time interval between successive accesses is counted, e.g., the first and second accesses are 3 days apart, the second and third accesses are 5 days apart, and so on; the access interval distribution statistic is obtained by statistically analyzing a large amount of such interval data.
Step S142, the read-write operation proportion corresponding to the operation mode label is identified, and when the read operation proportion in the read-write operation proportion exceeds a preset threshold value, the hot spot data prediction mark is activated.
In detail, the market research report data block carries an operation mode tag; assume analysis shows that 70% of its operations are read operations. The preset threshold is set to 60%; since the 70% read proportion exceeds the 60% threshold, the hot spot data prediction mark is activated.
Step S143, generating a time dimension access probability density function in combination with the periodic access peak interval and the burst access event time stamp.
To illustrate with a simple calculation, assume the access probability in the periodic access peak interval (9 to 10 am on workdays) is initially set to 0.3, and the access probability in the June 15 to June 20, 2023 window corresponding to the burst access event timestamps is set to 0.2. For other time periods, corresponding probability values are set according to factors such as historical access frequency, thereby constructing a preliminary time dimension access probability density function.
And step S144, correcting the weight parameter of the time dimension access probability density function based on the access interval distribution statistical value and the hot spot data prediction mark.
Assume the access interval distribution statistic indicates that the access probability gradually decreases between access peaks, e.g., by 0.05 per day in the period after a periodic access peak interval. Because the hot spot data prediction mark is activated, indicating that the data block is likely hot spot data, the weight parameter needs to be adjusted so that the post-peak decay of the access probability slows; for example, the original decay rate of 0.05 per day is adjusted to 0.03 per day, thereby correcting the weight parameter of the time dimension access probability density function.
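The correction above reduces to damping the decay rate when the flag is set; a sketch follows, where the 0.6 damping factor is an assumption chosen only to reproduce the worked numbers (0.05/day becoming 0.03/day):

```python
def corrected_decay(base_decay_per_day, hotspot_flag, damping=0.6):
    """Slow the post-peak probability decay when the hot-spot prediction flag
    is active; the 0.6 damping factor is an illustrative assumption."""
    return base_decay_per_day * damping if hotspot_flag else base_decay_per_day
```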
And step S145, performing coupling calculation on the corrected time dimension access probability density function and the network link bandwidth utilization rate in the multi-dimensional performance data set to generate the dynamic access heat prediction model, and outputting the heat level parameter of the data block to be stored by the dynamic access heat prediction model.
For example, step S145 includes:
step S1451, extracting a bandwidth fluctuation data set of the network link bandwidth utilization within a plurality of historical time windows from the multi-dimensional performance data set, wherein the bandwidth fluctuation data set comprises a bandwidth utilization peak interval and an average transmission rate.
Step S1452, performing alignment mapping on the bandwidth utilization peak interval in the bandwidth fluctuation dataset and the time axis of the time dimension access probability density function, and generating a time-synchronized bandwidth utilization distribution sequence.
Step S1453, calculating a bandwidth weight factor of the time dimension access probability density function at each time unit based on the average transmission rate in the bandwidth utilization distribution sequence.
Step S1454, performing weighted correction on the access probability value of the time dimension access probability density function in the corresponding time unit according to the bandwidth weight factor, so as to generate a bandwidth-aware access probability density distribution function.
Step S1455, performing a time dimension normalization process on the bandwidth-aware access probability density distribution function, and generating a heat level parameter of the dynamic access heat prediction model based on the maximum probability value of the normalized bandwidth-aware access probability density distribution function and the slope change of the probability distribution curve.
For example, the past 10 workdays are selected as the historical time windows, and the network link bandwidth utilization of each workday is counted. The bandwidth utilization peak interval may occur from 10 to 11 am, with a peak of 80% and an average transmission rate of 500 Mbps. The bandwidth utilization peak interval in the bandwidth fluctuation data set is aligned and mapped onto the time axis of the time dimension access probability density function to generate a time-synchronized bandwidth utilization distribution sequence. Assume the time unit of the time dimension access probability density function is one hour, the access probability from 9 to 10 am is 0.3, and the bandwidth utilization peak from 10 to 11 am is 80%. The bandwidth weight factor of the time dimension access probability density function at each time unit is calculated as follows: first, the degree to which the relation between the average transmission rate and the bandwidth utilization peak influences the access probability is determined. If the average transmission rate is higher and the bandwidth utilization peak is higher, the data transmission demand is large and network resources are fully utilized during this period, so the bandwidth weight factor of this time unit should be larger. For example, in a simple calculation, if the average transmission rate is 500 Mbps, the bandwidth utilization peak is 80%, and the base weight is set to 1, then the bandwidth weight factor calculated from these two values may be 1.2 (this is just an example; the actual calculation would follow more complex logic).
The access probability value of the time dimension access probability density function in the corresponding time unit is then weighted and corrected by the bandwidth weight factor: the original access probability from 9 am to 10 am is 0.3, and after the weighted correction it becomes 0.3 × 1.2 = 0.36, thereby generating the bandwidth-aware access probability density distribution function.
Further, the normalization process adjusts the access probability values of all time units to a specific range, for example between 0 and 1. The calculation proceeds as follows: first, the maximum probability value in the bandwidth-aware access probability density distribution function is found, assumed to be 0.4. The access probability value of each time unit is then divided by this maximum; for example, if the original access probability of a certain time unit is 0.2, the normalized value becomes 0.2 / 0.4 = 0.5. The heat level parameter of the dynamic access heat prediction model is generated based on the maximum probability value of the normalized bandwidth-aware access probability density distribution function and the slope change of the probability distribution curve. If the maximum probability value is close to 1 and the slope of the probability distribution curve is steep, indicating that the data block has high access heat and significant heat change in certain time periods, the heat level parameter may be set to a higher level, such as level 3 (assuming the heat level is classified into levels 1 to 5); if the maximum probability value is small and the slope change is gentle, the heat level parameter may be set to a lower level, such as level 1. The heat level parameter accurately reflects the access heat of the market research report data block within the preset time period and provides an important basis for the subsequent data storage management strategy.
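The weighting, normalization and level-mapping steps above can be sketched in Python as follows. The weight formula and the level-scoring weights are illustrative assumptions (the text itself notes the real calculation follows more complex logic); only the worked numbers (500 Mbps, 80% peak, factor 1.2, normalization by the maximum) come from the example.

```python
def bandwidth_weight(avg_rate_mbps, peak_util, base=1.0):
    """Bandwidth weight factor for one time unit (illustrative formula:
    500 Mbps at 80% peak utilization yields 1.2 as in the example)."""
    return base + 0.2 * (avg_rate_mbps / 500.0) * (peak_util / 0.8)


def bandwidth_aware_pdf(access_probs, weights):
    """Weight each time unit's access probability by its bandwidth factor."""
    return [p * w for p, w in zip(access_probs, weights)]


def heat_level(bw_probs, levels=5):
    """Derive a heat level (1..levels) from the normalized peak and the
    steepest slope change of the probability curve (scoring is assumed)."""
    peak = max(bw_probs)
    norm = [p / peak for p in bw_probs]                 # values now in [0, 1]
    slopes = [abs(norm[i + 1] - norm[i]) for i in range(len(norm) - 1)]
    steepness = max(slopes, default=0.0)
    score = 0.6 * peak + 0.4 * steepness                # illustrative blend
    return max(1, min(levels, round(score * levels)))
```

A flat, low-probability curve thus lands near level 1, while a high, sharply peaked curve lands near the top of the scale.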
In a possible embodiment, after step S143, the method further includes:
Step S210, dividing a plurality of data access areas according to the node topology structure of the full flash storage cluster, and configuring an independent access frequency monitor for each area.
The node topology of a full flash storage cluster may be a complex network structure consisting of multiple storage nodes. For example, nodes storing enterprise financial data are divided into one area, nodes storing sales data into another, and nodes storing the data blocks related to market research reports into a specific area. Each such area is equipped with an independent access frequency monitor that can accurately count the accesses to that area.
And step S220, collecting actual access times distribution data of each area in a historical time window through the access frequency monitor.
Taking the area where the market research report data block is located as an example, the historical time window is set to the past month, during which the access frequency monitor records the actual access counts of the area for each day or each specific time period. For example, in the first few days of the month, when each department of the enterprise is still drawing up its working plan, the market research report data block is accessed rarely, perhaps only 10 times per day; during the mid-month adjustment period of the marketing plan, the access count rises to 30 times per day; and near the end of the month, during the monthly summary, it changes again to about 20 times per day. In this way, the actual access times distribution data of the area in the historical time window is obtained.
And step S230, carrying out fitting verification on the actual access times distribution data and the time dimension access probability density function, and adjusting the curve form of the probability density function.
To illustrate with a simple calculation, assume the access probability set by the time dimension access probability density function for the beginning of the month is 0.1, while the actual access times distribution data show that accesses at the beginning of the month were few. The difference between the actual number of accesses and the number predicted from the probability density function is calculated: for example, 15 accesses per day were predicted for the beginning of the month, whereas only 10 actually occurred, giving a difference of 5. This difference calculation is performed for each time period within the entire historical time window. A large difference in a given time period indicates that the probability density function predicted that period inaccurately. For example, for the mid-month marketing-plan adjustment period, the access count corresponding to the predicted access probability is 25 while 30 accesses actually occurred, again a difference of 5. The curve morphology of the probability density function is then adjusted based on these differences: if the actual access count of a time period exceeds the predicted count, the access probability of that period is raised appropriately, so the curve is adjusted upward in that period; otherwise, it is adjusted downward.
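The fitting verification and curve adjustment can be sketched as follows. The learning rate `lr` and the clamping at zero are assumptions introduced for the sketch; the per-period difference (actual minus predicted count) and the up/down adjustment direction follow the text.

```python
def fit_and_adjust(pdf, actual_counts, total_accesses, lr=0.5):
    """Nudge each period's access probability toward the observed counts.

    pdf: predicted access probability per period; actual_counts: observed
    accesses per period; total_accesses: expected accesses over the window.
    """
    adjusted = []
    for prob, actual in zip(pdf, actual_counts):
        predicted = prob * total_accesses   # e.g. 0.1 * 150 = 15 accesses/day
        diff = actual - predicted           # positive: under-predicted period
        adjusted.append(max(0.0, prob + lr * diff / total_accesses))
    return adjusted
```

Periods where the model under-predicted are raised, over-predicted periods are lowered, and accurately predicted periods are left unchanged.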
Step S240, updating the heat level parameter calculation rule of the dynamic access heat prediction model based on the adjusted probability density function.
Before the probability density function is adjusted, the heat level parameter calculation rule may be based on certain characteristic values of the original function, such as the maximum probability value and the slope of the probability distribution curve. For example, the original rule might state that the heat level parameter is level 3 when the maximum probability value is greater than 0.3 and the slope is greater than a certain value. After the probability density function is adjusted, these characteristic values change: new values are determined from the adjusted function, for instance the maximum probability value becomes 0.35 and the slope changes accordingly. The heat level parameter calculation rule is updated based on these new characteristic values; for example, the new rule may state that the heat level parameter is level 3 when the maximum probability value is greater than 0.35 and the slope satisfies the new condition. In this way, the dynamic access heat prediction model more accurately reflects the access heat of the market research report data block under different conditions and provides a more accurate basis for operations such as data storage, management and prefetching.
In one possible implementation, step S150 includes:
And step S151, determining a fragmentation redundancy threshold and the minimum copy number of the data block to be stored according to the heat level parameter output by the dynamic access heat prediction model.
It is assumed that the heat level parameter of the market research report data block is higher, indicating that it may have a higher access frequency in the future. Based on this, the slice redundancy threshold is determined to be 3, which means that the data block can be divided into at most 3 slices for storage, so as to improve the availability and reliability of the data. At the same time, the minimum number of copies is determined to be 2, i.e. there are at least 2 copies per slice. The purpose of this is to ensure data accessibility in the face of node failure, data corruption, etc.
Step S152, traversing the storage space fragmentation rates and the input/output request queue depths of all nodes in the full flash storage cluster, and screening candidate node subsets meeting the fragmentation capacity constraint.
For example, there are multiple nodes in the full flash storage cluster, such as node A, node B and node C. At the current time, node A's storage space fragmentation rate is 20% and its input/output request queue depth is 50 requests; node B's fragmentation rate is 30% and its queue depth is 80 requests. Assuming the shard capacity constraint requires a storage space fragmentation rate of no more than 30% and an input/output request queue depth of no more than 100 requests, both node A and node B satisfy the constraint and are selected into the candidate node subset. If node C's storage space fragmentation rate is 40% or its input/output request queue depth is 150 requests, it does not satisfy the shard capacity constraint and is not selected into the candidate node subset.
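The screening in step S152 reduces to a simple filter over the per-node metrics; this sketch uses the constraint values from the example (30% fragmentation, 100 requests).

```python
def filter_candidates(nodes, max_frag=0.30, max_queue=100):
    """nodes: {name: (fragmentation_rate, io_queue_depth)}; keep the nodes
    that satisfy the shard capacity constraint on both metrics."""
    return [name for name, (frag, queue) in nodes.items()
            if frag <= max_frag and queue <= max_queue]
```

With the example values, node C fails on both metrics and is excluded from the candidate subset.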
Step S153, calculating storage location association scores of all nodes in the candidate node subset based on the physical storage location proximity.
Taking the previously mentioned association of the market analysis data block with the market research report data block as an example, assuming that the market analysis data block is stored in the node a and a certain associated data block of the market research report data block is stored in a position adjacent to the logical unit address of the node a, the association score of the node a and the storage position of the market research report data block is higher. If another node B is farther from the storage location of the associated data block of the market research report data block, its storage location association score is relatively low. By taking these factors into account in combination, an accurate storage location association score is calculated for each node in the subset of candidate nodes.
And step S154, constructing a storage path weight table among nodes according to the storage position association scores and the network link bandwidth utilization rate in the multidimensional performance data set.
For example, the storage location association between node a and node B is scored higher while the network link bandwidth utilization between them is 70%, indicating that there is a higher efficiency in transferring data between the two nodes. Based on these factors, a higher weight value is given to the storage path between node a and node B. If the storage location association score between node a and node C is low, the network link bandwidth utilization is 50%, and the storage path weight value between them is relatively low. By such evaluation of the relationships between all nodes in the candidate node subset, a complete stored path weight table between nodes is constructed.
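Step S154 combines the two signals into a single per-path weight. The linear blend and the 50/50 weighting below are assumptions for illustration; the source only states that a higher association score and higher bandwidth utilization both raise the weight.

```python
def path_weight(assoc_score, bw_util, alpha=0.5):
    """Blend the storage location association score (0..1) and the network
    link bandwidth utilization (0..1) into one path weight (assumed linear)."""
    return alpha * assoc_score + (1 - alpha) * bw_util


def build_weight_table(pairs):
    """pairs: {(src, dst): (assoc_score, bw_util)} -> {(src, dst): weight}"""
    return {edge: path_weight(score, bw) for edge, (score, bw) in pairs.items()}
```

A path like A-B with a high association score and 70% utilization thus outranks A-C with a low score and 50% utilization.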
Step S155, based on the stored path weight table and the minimum copy number, generating a sliced stored topology graph containing redundant path cross-connections.
For example, step S155 includes:
step S1551, screening candidate storage path sets with the storage path weight value exceeding a preset weight threshold according to the inter-node storage path weight values in the storage path weight table.
And step S1552, determining the number of the partitioned copies of the data block to be stored based on the minimum number of copies, and distributing an initial storage path for each partitioned copy.
And step S1553, traversing the storage path weight values in the candidate storage path set, and selecting a main storage path with the highest weight value as a default storage path of the fragmented copy.
Step S1554, detecting the inter-node connection state of the default storage path, and if there is a single node connection path, selecting a standby storage path with a next highest weight value from the candidate storage path set as a cross redundant path.
And step S1555, performing bidirectional connection on the cross redundant paths and the default storage paths to generate a segmented storage topological graph containing the cross connection of the redundant paths.
Assuming the preset weight threshold is 0.6, the storage path weight value from node A to node B is 0.7 and that from node B to node C is 0.8, so the paths from node A to node B and from node B to node C are selected into the candidate storage path set. Based on the minimum number of copies, the number of sharded copies of the data block to be stored is determined to be 2, and an initial storage path is allocated for each sharded copy; for example, the first sharded copy is allocated the storage path from node A to node B. The storage path weight values in the candidate storage path set are then traversed and, among the paths available to this copy, the path from node A to node B has the highest weight value, so it is used as the default storage path of the sharded copy. The inter-node connection state of the default storage path is detected; if only one connection path from node A to node B is found, there is a single-point failure risk. In that case, the backup storage path with the next highest weight value (assuming the path from node A to node C is next highest) is selected from the candidate storage path set as the cross redundant path. The cross redundant path and the default storage path are connected bidirectionally to generate a sharded storage topology graph containing redundant path cross connections. For example, in the sharded storage topology, the first sharded copy has the primary storage path from node A to node B and the cross redundant path from node A to node C, both connected bidirectionally, to ensure that the data is still accessible through node C in the event of a failure of node A or node B.
And step S1556, verifying whether the path redundancy of each partitioned copy in the partitioned storage topological graph meets the redundancy constraint condition corresponding to the minimum copy number, and if not, reselecting a standby storage path and updating the cross connection relation of the partitioned storage topological graph.
In detail, since the minimum number of copies is 2, the path redundancy of each sharded copy should be at least 1, i.e., there is at least one backup storage path in addition to the primary storage path. The path condition of each sharded copy in the sharded storage topology graph is checked; if the path redundancy of some sharded copy is not satisfied, for example it has only a primary storage path and no backup storage path, a backup storage path is reselected and the cross connection relationship of the sharded storage topology graph is updated. Assuming a sharded copy originally has only the primary storage path from node A to node B and thus fails the redundancy constraint, a path from node A to node C is reselected from the candidate storage path set as the backup storage path, and the sharded storage topology graph is updated so that the A-to-B and A-to-C paths establish a correct cross connection relationship. This ensures the reliability and availability of the whole sharded storage topology graph, meets the enterprise data center's requirements for storing the market research report data block, and improves the safety and access efficiency of the data.
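Steps S1551 through S1556 can be condensed into one path-planning routine per sharded copy. This is a simplified sketch: the threshold filter, the heaviest-path default, the next-heaviest cross redundant path, and the redundancy check mirror the steps, while the flat dictionary representation of the topology is an assumption.

```python
def plan_copy_paths(weight_table, threshold=0.6, min_copies=2):
    """Filter candidate paths above the weight threshold (S1551), take the
    heaviest as the default path (S1553), the next-heaviest as the cross
    redundant path (S1554), and verify redundancy (S1556: at least
    min_copies - 1 backup paths)."""
    candidates = {edge: w for edge, w in weight_table.items() if w > threshold}
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    primary = ranked[0] if ranked else None
    backups = ranked[1:min_copies]              # cross redundant path(s)
    return {"primary": primary,
            "backups": backups,
            "redundancy_ok": len(backups) >= min_copies - 1}
```

When `redundancy_ok` is false, a backup path would be reselected and the cross connections updated, as step S1556 prescribes.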
In one possible embodiment, the method further comprises:
step S310, detecting whether a single point failure risk path exists in the partitioned storage topology graph.
For example, for a sharded copy of the market research report data block whose primary storage path runs from node A to node B, if this is the only storage path and no other backup path is connected to it, it is a single-point-failure risk path: should node A or node B fail, for instance if node A's power supply fails or node B's storage medium is damaged, the sharded copy becomes inaccessible, affecting the availability of the entire data block.
Step S320, if there is a single point failure risk path, dynamically inserting a backup storage node to form a ring redundant path based on the real-time performance data of the candidate node subset.
Assume the candidate node subset contains node A, node B, node C, and so on. The real-time performance data of these nodes are checked: node C, for example, has a low real-time storage space fragmentation rate, an input/output request queue depth within a reasonable range, and high network link bandwidth utilization, indicating a good performance state. Node C is therefore dynamically inserted into the path that has a single-point-failure risk: the previous path from node A to node B is extended into a ring redundant path from node A through node C to node B. In this way, even if node A or node B fails, data can still be transmitted through node C, improving data reliability. During this process, the connection relationships and data flow directions between the nodes must be considered carefully to ensure the rationality and effectiveness of the ring redundant path.
And step S330, optimizing the copy synchronization priority in the partitioned storage topological graph according to the inter-node delay data of the annular redundant path.
For example, delay data between the nodes on the ring redundant path are measured: the delay from node A to node C is 5 milliseconds, from node C to node B is 3 milliseconds, and from node B back to node A is 4 milliseconds. The priority of replica synchronization is determined from these delay data. If a sharded copy is updated on node A, then because the delay from node C to node B is relatively small, the update can first be synchronized to node C, then from node C to node B, and finally from node B back to node A, ensuring the consistency and timeliness of the data. This process must accurately calculate and balance the influence of the delays of different paths on data synchronization, so that efficient synchronization is achieved while data reliability is improved.
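Steps S310 to S330 can be sketched together: detect single-path copies, splice a backup node into the risky path to form the ring, and order synchronization hops by measured delay. Ordering hops by ascending delay is one simple policy assumed here; the source only states that lower-delay links are preferred.

```python
def single_point_paths(topology):
    """topology: {copy_id: [paths]}; return copies with only one path."""
    return [c for c, paths in topology.items() if len(paths) == 1]


def insert_backup_node(path, backup):
    """Extend a risky direct path (src, dst) into a ring src -> backup -> dst."""
    src, dst = path
    return (src, backup, dst)


def sync_order(link_delays):
    """Order replica-synchronization hops on the ring by ascending delay
    (one simple prioritization policy, assumed for this sketch)."""
    return sorted(link_delays, key=link_delays.get)
```

With the measured delays from the example, the C-to-B hop (3 ms) is synchronized first and the A-to-C hop (5 ms) last.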
And step S340, performing association mapping on the optimized fragment storage topological graph and the cache preloading strategy matrix to generate a joint storage strategy configuration file.
The cache preloading policy matrix contains the cache policy information of the data blocks related to the market research report data blocks, such as preloading priority coefficient, cache retention period, compression level parameter and the like. And carrying out association mapping on the optimized fragment storage topological graph and the cache policy information. For example, for a sharded copy stored on node a in the sharded storage topology graph, if the corresponding associated data block has a higher preloading priority coefficient in the cache preloading policy matrix, then the node a where the sharded copy is located is marked in the joint storage policy configuration file to need to perform the cache preloading operation more preferentially. Through the association mapping, a comprehensive joint storage strategy configuration file is generated, and the joint storage strategy configuration file guides the whole data storage system how to store, cache, synchronize data and the like on the market research report data blocks and related data blocks thereof.
In one possible embodiment, the method further comprises:
step S410, a storage life cycle tracking log is created for the data block to be stored, so as to record the creation time stamp, migration event and cache state change history of the sharded copy through the storage life cycle tracking log.
In detail, when a sharded copy of the market research report data block is created at a certain time, the storage lifecycle tracking log records the creation timestamp, e.g., 10:15 am on July 1, 2023. During the storage of the data block, if a migration event occurs to the sharded copy, due to a performance problem on some node or a load-balancing requirement, such as a migration from node A to node C, the event is recorded in detail in the log, including the source node, the target node, and the migration time. The change history of the cache state is recorded at the same time: if the cache of some sharded copy changes from the initial uncached state to the cached state, or the retention period of the cache changes, this information is recorded as well.
Step S420, analyzing the abnormal event sequence in the storage life cycle tracking log, and identifying the potential cause of the storage performance bottleneck.
For example, the storage lifecycle tracking log may show that, during a certain period, the sharded copies of the market research report data blocks frequently underwent migration events while the cache state was also unstable. Analyzing this abnormal event sequence may reveal that the storage space fragmentation rate of some node (such as node A) is too high, reducing the storage and access efficiency of the data block and thereby causing the frequent migrations and unstable cache state. Alternatively, the analysis may find that the network link bandwidth utilization dropped suddenly at some point, affecting the transfer and caching of data; either finding is a potential cause of a storage performance bottleneck.
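A minimal form of this log analysis is counting migration events per data block and flagging blocks that migrate too often. The event-record shape and the threshold of 3 are assumptions for this sketch; the source only specifies that frequent migrations in the log signal a potential bottleneck.

```python
from collections import Counter


def suspect_blocks(log_events, migration_threshold=3):
    """Scan lifecycle-log events and flag data blocks whose migration count
    reaches the threshold -- a hint of node fragmentation or bandwidth
    trouble worth investigating further."""
    migrations = Counter(e["block"] for e in log_events
                         if e["event"] == "migration")
    return sorted(b for b, n in migrations.items()
                  if n >= migration_threshold)
```

Flagged blocks would then be correlated with node fragmentation rates and bandwidth data to isolate the root cause.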
Step S430, adjusting the parameter updating frequency of the dynamic access heat prediction model and the calculation logic of the fragment redundancy threshold according to the potential reasons.
If a problem is found that is caused by too high a storage space fragmentation rate of the node, the parameter update frequency of the dynamic access heat prediction model can be adjusted. For example, the original parameter update frequency is once a day, and the change of the fragmentation rate of the storage space can have a large influence on the data access heat, so that the parameter update frequency is increased to be once an hour, so as to reflect the actual access condition of the data more timely. For the calculation logic of the segment redundancy threshold, if the instability of the network link bandwidth utilization is found to have a larger influence on the data availability, the consideration weight on the network link bandwidth utilization is increased when the segment redundancy threshold is calculated. For example, the slice redundancy threshold is determined to be 3 only according to the heat level parameter, and the slice redundancy threshold may be adjusted to be 4 according to the fluctuation condition of the network link bandwidth utilization rate due to the influence of network factors, so as to improve the reliability of data.
Step S440, the adjusted parameter updating frequency and the calculation logic are synchronized to all the management nodes of the all-flash storage cluster in real time.
Each management node in the full flash storage cluster needs to obtain the adjusted information to ensure that the whole storage system performs data storage and management according to a new strategy. For example, the management node a, the management node B and the like need to receive and update the parameters, so that when the market research report data block and other data blocks are subjected to storage operation, decision can be made according to the new dynamic access heat prediction model parameter updating frequency and the calculation logic of the fragmentation redundancy threshold value, and the high efficiency, the reliability and the availability of the data of the whole storage system are ensured.
In one possible implementation, step S150 further includes:
Step S156, identifying a target data block sequence spatially associated with the data block to be stored according to the physical storage location proximity.
In detail, in a full flash storage cluster, market research report data blocks are stored at specific node and logical unit addresses. Other data blocks stored adjacent to the market research report data block are determined by analyzing the physical storage location proximity, such as looking at the relationship of storage node identification and logical unit address. It is assumed that the market research report data block is stored at the logical unit address 100 of the node a, and may be auxiliary data blocks related to the market research report, such as raw data collection records, preliminary analysis results and the like, which are in the same node and have similar logical unit addresses (e.g., 101-105), and these data blocks form a target data block sequence spatially related to the market research report data block.
Step S157, predicting a concurrent access probability distribution of the target data block sequence in the preset time period based on the dynamic access popularity prediction model.
The dynamic access heat prediction model outputs information such as heat level parameters and the like for the market research report data block, and predicts concurrent access probability distribution of the target data block sequence based on the information. For example, according to the high access heat of the market research report data block from 9 to 10 am on the working day, it is presumed that the original data acquisition record data block spatially associated with the market research report data block has a high concurrent access probability in the time period, and the probability may reach 0.4. And for the primary analysis result data block, the concurrent access probability may be slightly lower, which is 0.3. And for other time periods, the concurrent access probability of each time period is calculated according to the access heat trend of the market research report data block and the association tightness degree of each target data block and the market research report data block, so that the concurrent access probability distribution of the target data block sequence in the preset time period is obtained.
Step S158, calculating a preloading priority coefficient of each target data block according to the concurrent access probability distribution and the node storage space fragmentation rate in the multi-dimensional performance data set.
Taking the raw data acquisition record data block as an example, its concurrent access probability is 0.4 and the storage space fragmentation rate of node A is assumed to be 20%. The calculation proceeds as follows: a low storage space fragmentation rate indicates that the node has more space available for caching, which has a positive impact on the preloading priority. A base score can be set from the concurrent access probability, e.g., a probability of 0.4 gives 40 points, and then adjusted according to the storage space fragmentation rate. Since the fragmentation rate of 20% is at a low level, a bonus of, say, 10 points is granted, so the preloading priority coefficient of the raw data acquisition record data block is 50. For the preliminary analysis result data block, the concurrent access probability is 0.3 and the fragmentation rate of its node is assumed to be 30%; a rate of 30% is relatively high and has a certain negative impact on the preloading priority. By the same calculation, the base score is 30 points, and because of the high fragmentation rate a deduction of perhaps 5 points is applied, giving a preloading priority coefficient of 25. In this way, a preloading priority coefficient is calculated for each target data block.
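The scoring above can be written out directly; the 20% cutoff and the +10/-5 adjustments are the example's illustrative values, not a prescribed formula.

```python
def preload_priority(concurrent_prob, frag_rate):
    """Preloading priority coefficient: base score from the concurrent access
    probability (x100), adjusted by the node's storage space fragmentation
    rate (bonus/penalty values taken from the worked example)."""
    base = concurrent_prob * 100             # 0.4 -> 40 points
    bonus = 10 if frag_rate <= 0.20 else -5  # low fragmentation aids caching
    return base + bonus
```

This reproduces the example's coefficients: 50 for the raw data acquisition record block and 25 for the preliminary analysis result block.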
Step S159, based on the preloading priority coefficient, allocating differentiated buffer reservation period and compression level parameters for the target data block sequence, integrating the buffer reservation period and compression level parameters according to node dimensions, and generating a multi-layer structure of the buffer preloading policy matrix.
In a full flash storage cluster there are multiple nodes, and for each node the cache retention periods and compression level parameters of the target data blocks stored on it are integrated. For example, at node A the raw data acquisition record data block has a cache retention period of 3 hours and a low compression level, while the preliminary analysis result data block has a dynamically adjusted cache retention period (assumed to be 1 hour at the current time) and a high compression level. This information is organized by node dimension to form a cache preloading policy matrix with a multi-layer structure. The matrix clearly guides the cache preloading operations of the full flash storage cluster for the target data block sequence on different nodes, including when to cache, how long to cache, and what compression rate to adopt, thereby improving data access efficiency and storage resource utilization.
Wherein, step S159 includes:
Step S1591, obtaining the current buffer space occupancy rate and the history buffer replacement frequency of the edge node in the full flash storage cluster.
And step S1592, when the concurrent access probability distribution is higher than a first preset threshold, allocating a fixed reservation period for the corresponding data block and locking a cache space.
And step S1593, dynamically adjusting the retention period attenuation rate based on the history buffer replacement frequency when the concurrent access probability distribution is lower than a second preset threshold.
And step S1594, performing variable compression rate processing on the low-priority data block according to the compression grade parameter, and recording metadata verification information of the compressed data block.
Assume the current cache space occupancy of the edge node is 60% and the historical cache replacement frequency is one replacement every 2 hours. For the raw data acquisition record data block with the higher preloading priority coefficient, when its concurrent access probability distribution is above the first preset threshold (assumed to be 0.35), a fixed retention period is allocated and the cache space is locked: for example, a fixed cache retention period of 3 hours, during which the data block is not replaced even if cache space is tight. For the preliminary analysis result data block, whose concurrent access probability distribution is below the second preset threshold (say 0.25), the retention period decay rate is dynamically adjusted based on the historical cache replacement frequency. Since the historical replacement frequency is once every 2 hours, when cache space is tight its cache retention period decays at a faster rate, for example shortened by 10 minutes every 30 minutes, so that cache space is freed as soon as possible for higher-priority data blocks that need it more. Variable compression rate processing is performed on the low-priority data blocks according to the compression level parameter: a low-priority block such as the preliminary analysis result is compressed at a higher compression rate according to its (assumed higher) compression level parameter. During compression, metadata verification information of the compressed data block, such as a checksum, is recorded so that data integrity can be verified when the block is accessed later.
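The threshold-based branching of steps S1592 to S1594 can be captured as a small policy function. The thresholds and the returned fields mirror the worked example (0.35 / 0.25, 3-hour lock, 10-minute decay per half hour); the dictionary shape is an assumption of this sketch.

```python
def cache_policy(concurrent_prob, hi=0.35, lo=0.25,
                 fixed_period_min=180, decay_min_per_30min=10):
    """Map a block's concurrent access probability to a cache action:
    lock with a fixed retention period above `hi`, decaying retention plus
    high compression below `lo`, default behavior in between."""
    if concurrent_prob > hi:
        return {"action": "lock", "retention_min": fixed_period_min}
    if concurrent_prob < lo:
        return {"action": "decay",
                "decay_min_per_30min": decay_min_per_30min,
                "compress": "high"}
    return {"action": "default"}
```

Blocks between the two thresholds keep the default caching behavior, which the source leaves unspecified.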
In one possible embodiment, the method further comprises:
and step S510, deploying a dynamic load balancing controller in the full-flash memory cluster, and collecting the performance data fluctuation trend of each node in real time.
For example, for each node storing market research report data blocks and their associated data blocks, the dynamic load balancing controller continuously tracks performance status, recording how each performance metric of nodes such as node A and node B changes over time. Taking node A as an example, the controller monitors the change of its storage space fragmentation rate across different periods, such as 15% in the morning, 18% at noon, and 20% in the afternoon. It also monitors fluctuations in the input/output request queue depth: for example, 30 requests in the early morning when business has just started, rising to 80 requests at the morning business peak as the market department frequently accesses the market research report data block and its related data blocks, and then gradually falling back to 50 requests in the afternoon. By recording data at these different time points, the fluctuation trend of each node's performance data is analyzed.
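The sampling described above can be sketched as a small time series per node; the sample tuples and the trend helper below are hypothetical, using the 15%/18%/20% and 30/80/50 figures from the example.

```python
# Illustrative samples for node A: (hour_of_day, fragmentation_pct, queue_depth),
# matching the morning/noon/afternoon figures in the example above.
node_a_samples = [(9, 15, 30), (11, 18, 80), (15, 20, 50)]

def per_hour_trend(samples, field):
    """Average per-hour change of one metric between the first and last
    samples (field 1 = fragmentation rate, field 2 = queue depth)."""
    hours = samples[-1][0] - samples[0][0]
    return (samples[-1][field] - samples[0][field]) / hours
```

A first-to-last average deliberately smooths out the midday peak; a real controller would also track short-window rates, as the overload check in step S520 does.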
And step S520, identifying potential overload nodes according to the performance data fluctuation trend, and triggering a data block copy migration early warning mechanism.
For example, it is observed that node A's input/output request queue depth is increasing quickly, from 30 requests to 80 requests over the past hour. Its growth rate (the number of requests added divided by the time interval) is therefore 50 requests per hour, which is assumed to reach the preset queue depth early warning line (e.g., 50 requests per hour). Meanwhile, the change gradient of node A's storage space fragmentation rate is also large, rising from 15% to 20% and reaching the fragmentation threshold (assumed to correspond to this 15%-20% change range). At this point, node A is identified as a potentially overloaded node, and the data block copy migration early warning mechanism is triggered.
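The overload check can be sketched as a pair of threshold comparisons. The function name and the default thresholds (50 requests per hour, a 5-percentage-point fragmentation change) are assumptions taken from the worked example, not fixed parameters of the method.

```python
def is_potentially_overloaded(depth_start, depth_end, hours,
                              frag_start, frag_end,
                              queue_warning=50.0, frag_gradient=5.0):
    """Flag a node when its queue-depth growth rate reaches the warning
    line AND its fragmentation-rate change reaches the gradient threshold
    (threshold values are illustrative)."""
    growth_rate = (depth_end - depth_start) / hours   # requests per hour
    frag_change = frag_end - frag_start               # percentage points
    return growth_rate >= queue_warning and frag_change >= frag_gradient
```

On node A's figures this returns True: (80 - 30) / 1 h = 50 requests per hour reaches the warning line, and the 5-point fragmentation change reaches the assumed gradient threshold.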
Step S530, selecting a migration target node and calculating an optimal delay parameter of the migration path based on the redundant path cross connection relationship in the partitioned storage topology.
In the partitioned storage topology, node A has redundant path cross-connection relationships with other nodes (e.g., node B, node C, etc.). The real-time performance data of these nodes is checked, including storage space fragmentation rate, input/output request queue depth, network link bandwidth utilization, and so on. Assuming that node B has a low storage space fragmentation rate, a small input/output request queue depth, and a low network link bandwidth utilization (i.e., ample spare bandwidth), node B is selected as the migration target node. The optimal delay parameter of the migration path from node A to node B is then calculated. The various links between node A and node B are considered, such as those passing through intermediate nodes, network devices, etc. Delay data for each link is measured, e.g., a 3 ms delay from node A to an intermediate node and a 2 ms delay from the intermediate node to node B, for a total delay of 5 ms. The possible influence on delay of factors such as network congestion and data transmission rate during migration is also considered, and the optimal delay parameter of the migration path from node A to node B is determined by comprehensively analyzing these data.
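The path selection above can be sketched as a minimum-total-delay choice over the candidate redundant paths. The path names, hop delays, and the optional congestion penalty are hypothetical stand-ins for the measured values the description mentions.

```python
# Candidate migration paths from node A to node B, each a list of per-hop
# delays in milliseconds (hypothetical measurements; the first path matches
# the 3 ms + 2 ms = 5 ms example above).
candidate_paths = {
    "A->X->B": [3, 2],
    "A->Y->Z->B": [2, 2, 3],
}

def best_path(paths, congestion_penalty_ms=None):
    """Pick the path with the smallest total delay, optionally adding a
    per-path congestion penalty (all names and values illustrative)."""
    congestion_penalty_ms = congestion_penalty_ms or {}
    totals = {name: sum(hops) + congestion_penalty_ms.get(name, 0)
              for name, hops in paths.items()}
    name = min(totals, key=totals.get)
    return name, totals[name]
```

Here the 5 ms path through the single intermediate node would be chosen; a congestion penalty on that path could tip the choice to the longer route.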
Step S540, during the migration operation, maintaining the access availability of the original data block and synchronously updating the federated storage policy configuration file.
When the fragmented copy of the market research report data block on node A begins migrating to node B, it must be ensured that, during migration, the market department or any other department requiring the data block can still access the data normally. This may require technical means such as temporary data copies or multi-path data access. Meanwhile, during migration, the federated storage policy configuration file is synchronously updated. This configuration file contains information such as the partitioned storage topology map and the cache preloading policy matrix of the market research report data block. Because the storage location of the data block changes, the relevant information in the federated storage policy configuration file needs to be updated. For example, the original storage information about the data block on node A is modified to the storage information on node B, including updating the storage path of the fragmented copy on node B, its cache policy, and other related information, so that the policy of the whole storage system matches the actual storage condition of the data.
Wherein, step S520 includes:
Step S521, monitoring the growth rate of the input/output request queue depth and the change gradient of the storage space fragmentation rate of the potentially overloaded node.
Step S522, when the growth rate exceeds the queue depth early warning line and the change gradient reaches the fragmentation threshold, generating a migration task queue.
Step S523, sorting the migration execution sequence and distributing the migration bandwidth resources according to the data block priority labels in the migration task queues.
For example, as described above, for node A, the significant increase in its input/output request queue depth and the significant rise in its storage space fragmentation rate are recorded in detail. When the growth rate exceeds the queue depth early warning line and the change gradient reaches the fragmentation threshold, a migration task queue is generated. Assume that the market research report data block has multiple fragmented copies on node A, each labeled with a data block priority label according to factors such as its importance. When the migration task queue is generated, it is ordered according to these priority labels: for example, a fragmented copy associated with the core data of the market research report is marked high priority, one associated with auxiliary data is marked medium priority, and other copies with weaker associations are marked low priority. In priority order, the high-priority copies are placed at the front of the migration task queue, the medium-priority copies in the middle, and the low-priority copies at the end. Meanwhile, migration bandwidth resources are allocated to the copies in the migration task queue according to the current network bandwidth resources of the full flash storage cluster. If the total bandwidth is 1000 Mbps, then depending on the priority and size of the data blocks, 500 Mbps of bandwidth may be allocated to the high-priority copies, 300 Mbps to the medium-priority copies, and 200 Mbps to the low-priority copies.
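The ordering and bandwidth split above can be sketched as follows; the fixed 50/30/20 shares reproduce the 500/300/200 Mbps example and are illustrative, not a required allocation policy.

```python
def plan_migration(tasks, total_bandwidth_mbps=1000, shares=None):
    """Order copies high -> medium -> low and split bandwidth by fixed
    per-priority shares (values illustrative, matching the example above)."""
    shares = shares or {"high": 0.5, "medium": 0.3, "low": 0.2}
    order = {"high": 0, "medium": 1, "low": 2}
    queue = sorted(tasks, key=lambda t: order[t["priority"]])
    for t in queue:
        t["bandwidth_mbps"] = total_bandwidth_mbps * shares[t["priority"]]
    return queue
```

Because Python's sort is stable, copies with equal priority keep their original relative order; a real planner might further weight the split by data block size.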
Step S524, after the migration is completed, verifying the data consistency of the target node and updating the node state identification of the fragment storage topological graph.
After the data block copy has been completely migrated from node A to node B, the data on node B is verified for consistency. This may include comparing checksums of the data blocks and checking that the size, content, etc. of the data blocks are consistent with those on node A. If the data is consistent, the migration is successful. The node state identification of the sharded storage topology map is then updated: the state identification of the data block copy on node A is modified to "migrated out," and the state identification on node B is modified to "received and stored." Through these operations, the sharded storage topology map accurately reflects the storage location and state of the data block copies, providing an accurate basis for subsequent storage management operations.
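A minimal sketch of the post-migration check, using SHA-256 as an assumed checksum algorithm and a plain dict as a stand-in for the topology map's node state identifications:

```python
import hashlib

def verify_and_update(src_bytes, dst_bytes, topology, block_id):
    """Compare size and checksum of the source and migrated copies; on a
    match, mark the block migrated out on node A and received on node B
    (topology is an illustrative dict stand-in for the topology map)."""
    ok = (len(src_bytes) == len(dst_bytes) and
          hashlib.sha256(src_bytes).hexdigest() ==
          hashlib.sha256(dst_bytes).hexdigest())
    if ok:
        topology[("A", block_id)] = "migrated_out"
        topology[("B", block_id)] = "received"
    return ok
```

On a mismatch the topology is left untouched, so a failed migration never marks the target copy as received.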
In one possible embodiment, the method further comprises:
Step S610, implementing a data integrity verification cycle in the full flash storage cluster, periodically scanning checksum information of the partitioned copies, and initiating a copy repair request based on a redundant path cross connection relationship in the partitioned storage topological graph when detecting that the checksum is abnormal.
In this embodiment, taking the market research report data block as an example, the full flash storage cluster may scan the partitioned copies of the market research report data block stored on each node for checksum information at predetermined time intervals, for example, every hour or every day. The checksum is a value representing the characteristics of the content of the data block calculated by a specific algorithm. For each sliced copy of the market research report data block, its checksum may be recalculated and then compared to previously stored checksum information.
Assuming that in one scan, the checksum of a sliced copy of the market research report data block stored on node a is found to not match the previously stored value, this indicates that the data of that sliced copy may be corrupted or erroneous. At this time, the redundant path cross-connect relationship in the sharded storage topology may be referred to. For example, the sharded storage topology shows that the sharded copy has a primary storage path on node A while being connected to node B by a redundant path and that node B has a redundant copy of the sharded copy stored thereon. Based on this, a copy repair request may be initiated to node B, requiring that node B provide the correct fragmented copy data to repair the corrupted copy on node a.
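The periodic scan and repair request can be sketched as below. The dict structures, SHA-256 choice, and the single redundant-path partner per node are illustrative simplifications of the topology relationships described above.

```python
import hashlib

def scan_copies(copies, stored_checksums, redundant_paths):
    """Recompute each fragmented copy's checksum; where it mismatches the
    stored value, emit a repair request naming the redundant-path partner
    node as the source (all structures are illustrative stand-ins)."""
    requests = []
    for (node, block_id), data in copies.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest != stored_checksums[(node, block_id)]:
            partner = redundant_paths[node]   # e.g. node A's partner is B
            requests.append({"repair_on": node, "block": block_id,
                             "source": partner})
    return requests
```

Running this every hour or every day, as the description suggests, yields an empty request list when all copies are intact.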
Step S620, performing recompression and cache state refresh operations on the repaired data block according to the compression level parameter in the cache preloading policy matrix, updating the metadata mapping relation in the federated storage policy configuration file, and feeding back the repair result to the user terminal.
In the cache preloading policy matrix, each data block has a corresponding compression level parameter. Assume that the compression level parameter of the fragmented copy of the market research report data block is a medium compression level. After repair is completed, the repaired data block is recompressed at that medium compression level. During compression, redundant data in the data block is processed by the compression algorithm to reduce the storage space the block occupies. Meanwhile, because the content of the data block has changed (after repair and recompression), a cache state refresh operation is required. If the data block was previously cached with a certain cache retention period and cache policy, all cache-related states need to be updated to the new situation. For example, the cache retention period may need to be recalculated, or the cache priority adjusted.
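A sketch of the recompression and cache refresh step, using zlib as an assumed compression algorithm; the low/medium/high level mapping, the CRC32 checksum, and the 180-minute reset retention are all hypothetical example choices.

```python
import zlib

# Hypothetical mapping from the matrix's compression level parameter to a
# zlib level (the parameter names and levels are illustrative).
LEVELS = {"low": 3, "medium": 6, "high": 9}

def recompress_and_refresh(data, level_param, cache_entry):
    """Recompress a repaired block at its configured level, record a fresh
    checksum, and reset the cache entry's retention clock."""
    compressed = zlib.compress(data, LEVELS[level_param])
    cache_entry["checksum"] = zlib.crc32(compressed)
    cache_entry["retention_min"] = 180   # recalculated; value illustrative
    return compressed
```

The recorded checksum covers the compressed bytes, so the next integrity scan can verify the block without decompressing it first.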
The federated storage policy configuration file contains various information about the storage of the market research report data block, in which the metadata mapping relation records the associations between the data block and information such as its storage location, cache policy, and compression level. This information changes as the data block undergoes repair, recompression, and cache state refreshing. For example, the location of the data block on a storage node may change due to a repair operation, or a change in compression level may change its entry in the cache preloading policy. Therefore, the metadata mapping relation in the federated storage policy configuration file needs to be updated to ensure that the information in the file matches the storage and management state of the actual data blocks. Finally, the repair result is fed back to the user terminal. Users of the market research report data block, such as market department staff of the enterprise, receive notification of the repair result through the user terminal. If the repair succeeded, the notification may state that the fragmented copy of the market research report data block on node A has been successfully repaired and that data integrity has been restored so the block can be used normally; if the repair failed, the notification may state that repair of the fragmented copy on node A failed and that an administrator should be contacted for further inspection. In this way, the relevant personnel of the enterprise can learn the status of the data block in time and take further action when required.
FIG. 2 illustrates a schematic diagram of exemplary hardware and software components of a cloud computing based data all-flash memory optimization system 100 that may implement the concepts of the present application, provided by some embodiments of the present application. For example, the processor 120 may be used on the cloud computing based data all-flash memory optimization system 100 and to perform the functions of the present application.
The cloud computing-based data full-flash memory optimization system 100 may be a general-purpose server or a special-purpose server, either of which may be used to implement the cloud computing-based data full-flash memory optimization method of the present application. Although only one server is shown for convenience, the functionality described herein may be implemented in a distributed fashion across multiple similar platforms to balance the processing load.
For example, the cloud computing-based data all-flash memory optimization system 100 can include a network port 110 connected to a network, one or more processors 120 for executing program instructions, a communication bus 130, and various forms of storage media 140, such as magnetic disk, ROM, or RAM, or any combination thereof. By way of example, the cloud computing-based data all-flash memory optimization system 100 can also include program instructions stored in ROM, RAM, or other types of non-transitory storage media, or any combination thereof. The method of the present application may be implemented in accordance with these program instructions. The cloud computing based data full flash memory optimization system 100 also includes an Input/Output (I/O) interface 150 between the computer and other Input/Output devices.
For ease of illustration, only one processor is depicted in the cloud computing-based data full-flash memory optimization system 100. It should be noted, however, that the cloud computing-based data full-flash memory optimization system 100 of the present application may also include multiple processors, and thus the steps described in the present application as performed by one processor may also be performed jointly or separately by multiple processors. For example, if the processor of the cloud computing-based data full-flash memory optimization system 100 performs steps A and B, it should be understood that steps A and B may be performed together by two different processors or performed separately within one processor. For example, a first processor performs step A and a second processor performs step B, or the first processor and the second processor together perform steps A and B.
In addition, an embodiment of the invention further provides a readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the cloud computing-based data full flash memory optimization method is implemented.
It should be noted that in order to simplify the presentation of the disclosure and thereby aid in understanding one or more embodiments of the invention, various features are sometimes grouped together in a single embodiment, figure, or description thereof.