Disclosure of Invention
In order to solve the above problems, the present invention proposes a predictive storage optimization method and system for a distributed storage system, which optimize storage operations, ensure that resources are used as effectively as possible, and, by intelligently determining the best node for data placement, significantly improve I/O performance and reduce latency.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In a first aspect, the present invention provides a predictive storage optimization method for a distributed storage system, including:
predicting, by a pre-constructed prediction module, the access pattern within a future set period and the state of each node in the distributed storage system;
according to the predicted access pattern, evaluating the access rate of the data blocks and the proximity between the data blocks;
according to the predicted node state, evaluating access delay and resource utilization rate of each node;
presetting priorities for the access rate, the proximity, the access delay and the resource utilization rate, and, on the premise of considering node load balance and the preset priorities, determining the optimal node for placing each data block according to the access rate and proximity of the data block and the access delay and resource utilization rate of each node, so that each data block is moved to the corresponding node for storage.
As an alternative embodiment, the access-pattern-related data include read/write operations, file sizes, access rates, and access times; node states include I/O rate, latency, error rate, and bandwidth utilization.
As an alternative embodiment, the process of evaluating the access rate of the data blocks and the proximity between the data blocks includes: determining the access rate of different data blocks based on their output probabilities, thereby determining the data block with the highest access rate; and determining the proximity of data blocks accessed together based on a common access rate, shared characteristics, or a custom relationship.
As an alternative embodiment, the process of evaluating the access delay and resource utilization of each node includes: determining the access delay and resource utilization of each node based on the predicted node states, thereby identifying the node with the lowest access delay and the nodes with sufficient available resources.
As an alternative embodiment, an asynchronous data movement mechanism based on a set movement priority is used to move the data blocks to the corresponding nodes for storage.
As an alternative embodiment, the asynchronous data movement mechanism based on a set movement priority includes: allocating priorities to different data movement tasks; separating the movement process from the main execution flow to prevent critical operations from being blocked; implementing data movement operations in a non-blocking manner so that read/write operations do not stall or slow down due to ongoing data transfers; and performing data movement using an asynchronous I/O mechanism or a dedicated thread/process.
As an alternative embodiment, the asynchronous data movement mechanism based on a set movement priority further includes: batching a plurality of data movement operations and processing them in parallel.
As an alternative implementation, the predictive storage optimization method is integrated into the distributed storage system, and correct integration with the distributed storage system is ensured by adopting a data consistency mechanism and developing an error handling and recovery mechanism.
In a second aspect, the present invention provides a predictive storage optimization system for a distributed storage system, comprising:
the prediction module is configured to predict, using a pre-constructed prediction model, the access pattern within a future set period and the state of each node in the distributed storage system;
the data block evaluation module is configured to evaluate the access rate of the data blocks and the proximity between the data blocks according to the predicted access pattern;
the node evaluation module is configured to evaluate access delay and resource utilization rate of each node according to the predicted node state;
the storage optimization module is configured to preset priorities for the access rate, the proximity, the access delay and the resource utilization rate, and, on the premise of considering node load balance and the preset priorities, determine the optimal node for placing each data block according to the access rate and proximity of the data block and the access delay and resource utilization rate of each node, so that each data block is moved to the corresponding node for storage.
In a third aspect, the invention provides an electronic device comprising a memory, a processor, and computer instructions stored on the memory and runnable on the processor, which, when executed by the processor, perform the method of the first aspect.
In a fourth aspect, the present invention provides a computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a predictive storage optimization method and a predictive storage optimization system for a distributed storage system, which optimize storage operation, ensure that resources are used as effectively as possible, improve efficiency, and obviously improve I/O performance, reduce delay and improve overall system performance by intelligently judging the optimal node for data placement.
The invention provides a predictive storage optimization method and a predictive storage optimization system for a distributed storage system, and by optimizing layering and resource allocation, an organization can more effectively use a storage infrastructure thereof and reduce the expensive hardware upgrading requirement, thereby saving the operation cost; the method can adapt to the continuously changing data quantity and workload without manual intervention, and the storage system is easier to expand.
The invention can solve the problem of prediction caching; the AI model can predict which data will be needed in the near future, allowing the system to cache it ahead of time, reducing access time and improving read performance.
The invention can solve the problem of intelligent layering; by predicting data access patterns, the AI model can automate data movement between different storage tiers, placing frequently accessed data on fast, expensive storage, and less frequently accessed data on slower, cheaper storage.
The invention can solve the problem of load balancing; AI model predictive analysis can evenly distribute workload across all nodes in a storage system, preventing any single node from becoming a bottleneck, and ensuring optimal utilization of resources.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Detailed Description
The invention is further described below with reference to the drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular forms also are intended to include the plural forms, and furthermore, it is to be understood that the terms "comprises" and "comprising" and any variations thereof are intended to cover non-exclusive inclusions, e.g., processes, methods, systems, products or devices that comprise a series of steps or units, are not necessarily limited to those steps or units that are expressly listed, but may include other steps or units that are not expressly listed or inherent to such processes, methods, products or devices.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Example 1
The embodiment provides a predictive storage optimization method for a distributed storage system, as shown in fig. 1, including:
predicting, by a pre-constructed prediction module, the access pattern within a future set period and the state of each node in the distributed storage system;
according to the predicted access pattern, evaluating the access rate of the data blocks and the proximity between the data blocks;
according to the predicted node state, evaluating access delay and resource utilization rate of each node;
presetting priorities for the access rate, the proximity, the access delay and the resource utilization rate, and, on the premise of considering node load balance and the preset priorities, determining the optimal node for placing each data block according to the access rate and proximity of the data block and the access delay and resource utilization rate of each node, so that each data block is moved to the corresponding node for storage.
In this embodiment, access-pattern-related data including read/write operations, file sizes, access rates, and access times are obtained from the distributed storage system;
node status telemetry data characterizing workload characteristics, including I/O rate, latency, error rate, and bandwidth utilization, are obtained from the storage hardware;
the data are then cleaned: missing, incorrect, or irrelevant parts are handled and outliers are removed; the numerical data are then scaled by normalization and standardization to ensure that no feature dominates the model merely because of its magnitude.
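As an illustration of this preprocessing step, the following sketch cleans and scales a telemetry table with pandas and scikit-learn; the column name latency_ms and the 3-sigma outlier rule are assumptions made for the example, not part of the disclosure.

```python
# Minimal preprocessing sketch (assumed column names, not from the disclosure).
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess_telemetry(df: pd.DataFrame) -> pd.DataFrame:
    # Handle missing or clearly invalid records.
    df = df.dropna()
    df = df[df["latency_ms"] >= 0]

    # Remove outliers beyond 3 standard deviations on each numeric feature.
    numeric = df.select_dtypes(include="number")
    z = (numeric - numeric.mean()) / numeric.std()
    df = df[(z.abs() <= 3).all(axis=1)]

    # Scale numeric features so that no feature dominates by magnitude:
    # min-max normalization to [0, 1] (StandardScaler would standardize instead).
    scaler = MinMaxScaler()
    df[numeric.columns] = scaler.fit_transform(df[numeric.columns])
    return df
```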
In this embodiment, an AI prediction model is trained based on the above data to predict access patterns and workload characteristics;
specifically:
(1) Initial candidate models: selecting a set of candidate models appropriate for the nature of the data and the type of problem, such as linear regression, decision trees, random forests, Support Vector Machines (SVMs), neural networks, or ensemble methods (e.g., gradient boosting or bagging);
wherein simple data with few features use linear models such as linear regression or logistic regression; nonlinear relationships or complex data use decision trees, random forests, or gradient boosting; deep learning and neural networks suit unstructured data, large data sets, or complex relationships; and an ensemble approach, combining the predictions of multiple models, may improve performance.
Complexity and nature of the data: evaluate the data set size (small, medium, or large); determine the data type (structured, unstructured, or semi-structured); identify the feature types (numeric, categorical, text, image, time series, etc.); analyze the data distribution (whether outliers, imbalanced classes, or missing values exist); and consider the relationships between features (linear or non-linear).
(2) Reference evaluation criteria: defining performance metrics according to the problem; for regression, consider the Mean Square Error (MSE) and R-squared; for classification, consider accuracy, precision, recall, F1 score, and ROC-AUC; additionally consider computing resources, interpretability, and model complexity.
(3) Training and validation: the data are divided into a training set, a validation set, and a test set (e.g., using k-fold cross-validation), and each candidate model is trained on the training set.
(4) Hyper-parameter tuning: the hyper-parameters are adjusted using techniques such as grid search, random search, or Bayesian optimization to optimize model performance on the validation set.
(5) Model comparison and selection: the performance of each model is evaluated on the validation set using the predefined metrics, and the model with the best performance is selected (a sketch of this selection procedure follows this list).
(6) Model evaluation: the selected model is validated on a test set (data not seen during training) to evaluate its generalization performance, using performance metrics such as accuracy, precision, recall, F1 score, or storage-specific metrics such as prediction latency or error rate.
(7) Model complexity adjustment: if the model is too complex (over-fitting) or too simple (under-fitting), its complexity is adjusted accordingly, for example in neural networks by modifying the number of layers or neurons or by using a different architecture (e.g., CNN, RNN); regularization and dropout: regularization techniques or dropout layers are applied in the neural network to prevent overfitting.
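The candidate comparison and tuning of steps (3)-(5) could, for instance, be sketched with scikit-learn as follows; the candidate set, the parameter grids, and the regression target (e.g., next-period access rate per block) are illustrative assumptions.

```python
# Sketch of candidate-model comparison with k-fold CV and grid search (scikit-learn).
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

CANDIDATES = {
    "linear": (LinearRegression(), {}),
    "random_forest": (RandomForestRegressor(random_state=0),
                      {"n_estimators": [100, 300], "max_depth": [None, 10]}),
    "gradient_boosting": (GradientBoostingRegressor(random_state=0),
                          {"learning_rate": [0.05, 0.1], "n_estimators": [100, 300]}),
}

def select_model(X_train, y_train, cv=5):
    best_name, best_model, best_score = None, None, float("-inf")
    for name, (model, grid) in CANDIDATES.items():
        if grid:  # hyper-parameter tuning via grid search over k-fold CV
            search = GridSearchCV(model, grid, cv=cv,
                                  scoring="neg_mean_squared_error")
            search.fit(X_train, y_train)
            score, fitted = search.best_score_, search.best_estimator_
        else:     # no grid: plain k-fold cross-validation
            score = cross_val_score(model, X_train, y_train, cv=cv,
                                    scoring="neg_mean_squared_error").mean()
            fitted = model.fit(X_train, y_train)
        if score > best_score:
            best_name, best_model, best_score = name, fitted, score
    return best_name, best_model
```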
In this embodiment, the pre-built prediction module is used to predict the access pattern within a future set period and the state of each node in the distributed storage system, so as to realize low-latency prediction;
then, according to the predicted access pattern and node states, the access rate of the data blocks, the proximity between the data blocks, the access delay of each node, the resource utilization rate, and the like are evaluated;
for the data block with the highest predicted access rate in the future set period, a balanced node with the lowest access delay and the greatest resource availability is determined, while taking into account the proximity of the other data blocks accessed together with it.
Specifically:
(1) The access rate of the different data blocks is determined based on their output probabilities, thereby determining the data block with the highest access rate.
(2) Proximity, which reflects a preference to place related or frequently co-accessed data blocks closer to each other, is determined by analyzing historical access patterns (indicating which data blocks are frequently accessed together); for example, the proximity is calculated based on factors such as common access rate, shared characteristics, or user-defined relationships.
(3) The access delay of each node is determined according to the predicted node state, thereby determining the node with the lowest access delay.
(4) Nodes with sufficient available resources are determined by monitoring the resource utilization of each node, such as CPU, memory, and disk space; these metrics are evaluated against a predefined threshold or historical average to determine whether a node has enough available resources to handle additional data;
node selection for optimal balance: a node is chosen that balances low latency (for most clients accessing the data) with sufficient resource availability.
(5) According to the importance of each criterion, priorities are assigned in a user-defined manner to the access rate, the proximity, the access delay, and the resource utilization rate; for example, low latency may be more important for certain types of data than for others;
in case of conflict, for example when choosing between low access latency and proximity, the decision follows the defined priorities; if two high-priority data blocks contend for the same node, they are prioritized based on access rate or on their potential impact on overall system performance;
then, to ensure that the decision does not overload a specific node, load balancing is achieved by considering the distribution of the other data blocks;
finally, an optimization algorithm (such as a greedy algorithm or a mathematical optimization technique) is adopted to determine the optimal node for placing each data block, taking all constraints into account.
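A minimal sketch of such a greedy placement is given below; the weight values, the Node/Block data structures, and the capacity-based load-balance check are illustrative assumptions rather than the claimed algorithm itself.

```python
# Greedy placement sketch: score = preset priorities applied to proximity,
# predicted latency and predicted utilization, with a simple per-node capacity
# cap for load balance (assumes total capacity is sufficient).
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    predicted_latency_ms: float
    predicted_utilization: float                      # 0.0 .. 1.0
    capacity: int                                     # max blocks this node may receive
    assigned: int = 0

@dataclass
class Block:
    block_id: str
    access_rate: float                                # predicted access rate
    neighbors: dict = field(default_factory=dict)     # block_id -> proximity score

WEIGHTS = {"latency": 0.4, "utilization": 0.3, "proximity": 0.3}   # preset priorities

def score(block: Block, node: Node, placement: dict) -> float:
    # Reward co-locating neighbors already placed on this node; penalize
    # predicted latency and utilization.
    proximity = sum(p for nbr, p in block.neighbors.items()
                    if placement.get(nbr) == node.node_id)
    return (WEIGHTS["proximity"] * proximity
            - WEIGHTS["latency"] * node.predicted_latency_ms
            - WEIGHTS["utilization"] * node.predicted_utilization)

def place(blocks: list, nodes: list) -> dict:
    placement: dict = {}
    # Greedy: handle the hottest blocks first so they get the best nodes.
    for block in sorted(blocks, key=lambda b: b.access_rate, reverse=True):
        candidates = [n for n in nodes if n.assigned < n.capacity]   # load balance
        best = max(candidates, key=lambda n: score(block, n, placement))
        placement[block.block_id] = best.node_id
        best.assigned += 1
    return placement
```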
when the data movement is triggered, if the data block needs to be moved, a data movement process is started through the data movement engine.
In this embodiment, the system is designed to scale horizontally, handling growth in data volume without significant performance degradation; the data processing load is distributed evenly across the whole system; the system is continuously monitored and tuned for the lowest latency, which involves optimizing data serialization/parallelization and stream partitioning and ensuring efficient resource utilization; checkpoints are implemented to preserve state upon failure, enabling fast recovery; a robust error handling and retry mechanism is used to manage data stream interruptions or anomalies; the performance of the real-time analysis engine is continuously monitored; and feedback from system performance is used to adjust the AI model and the decision algorithm, improving accuracy and efficiency over time.
In this embodiment, a data movement engine is created that moves data between different storage tiers or nodes according to the decisions made by the analysis engine and can perform such operations with minimal impact on system performance; the data movement engine is tightly integrated with the architecture of the storage system to facilitate rapid movement of data.
In this embodiment, an asynchronous data movement mechanism based on a set movement priority is adopted to move the data blocks to the corresponding nodes for storage; specifically:
asynchronous data movement: data movement is allowed to occur independently of other processes, so that read/write operations continue while data transfers run in the background; asynchronous data movement enhances system responsiveness and efficiency by separating the movement process from the main execution flow to avoid blocking critical operations.
Non-blocking operation: implementing data movement operations in a non-blocking manner ensures that critical read/write operations are not stopped or delayed by ongoing data transfers; data movement is performed using asynchronous I/O mechanisms or dedicated threads/processes to prevent interference with critical operations.
Batch processing and optimization: multiple data movement operations are batched together; bundled data transfers reduce overhead by minimizing the number of individual transfer requests and optimizing network utilization, and batching similar or related data moves helps streamline the flow and reduce the overall impact on system performance.
Data movement priority: priorities are established by assigning importance or urgency levels to different data movement tasks, ensuring that critical data movements (e.g., those related to high-priority or frequently accessed data) take precedence over less important transfers; prioritizing critical data movements helps maintain system responsiveness and ensures that necessary data is available when needed.
Optimization techniques: data movement efficiency is further improved by techniques such as parallel processing or pipelining; parallel transfer tasks allow multiple transfers to occur simultaneously, maximizing throughput.
Compression or deduplication mechanisms are implemented to reduce the amount of data transmitted, optimize bandwidth usage, and minimize transmission time.
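One possible realization of this asynchronous, priority-ordered, bounded-parallel movement is sketched below with Python's asyncio; copy_block stands in for the storage system's own transfer API, and the retry policy is an assumption.

```python
# Asynchronous, priority-ordered data movement sketch (illustrative only).
import asyncio

async def copy_block(block_id: str, src: str, dst: str) -> None:
    # Placeholder for the storage system's real non-blocking transfer call.
    await asyncio.sleep(0)

async def mover(queue: asyncio.PriorityQueue, sem: asyncio.Semaphore) -> None:
    while True:
        priority, attempt, block_id, src, dst = await queue.get()
        async with sem:                      # bound the number of parallel transfers
            try:
                await copy_block(block_id, src, dst)
            except Exception:
                if attempt < 3:              # retry later, at a lower priority
                    queue.put_nowait((priority + 1, attempt + 1, block_id, src, dst))
            finally:
                queue.task_done()

async def run_moves(tasks, parallel: int = 4) -> None:
    """tasks: iterable of (priority, block_id, src, dst); lower number = more urgent."""
    queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
    for priority, block_id, src, dst in tasks:
        queue.put_nowait((priority, 0, block_id, src, dst))
    sem = asyncio.Semaphore(parallel)
    workers = [asyncio.create_task(mover(queue, sem)) for _ in range(parallel)]
    await queue.join()                       # wait until every queued move completes
    for w in workers:
        w.cancel()
```

Batching of similar moves and compression of payloads, as described above, would sit inside copy_block or just before enqueueing.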
The present embodiment integrates the data movement engine with the AI analysis engine so that it receives and acts upon data placement decisions, using triggers based on specific conditions (e.g., access patterns) to initiate data movement; to minimize the impact on performance, the system load is continuously monitored, data movement is scheduled during periods of low usage, rate limiting is implemented to control the amount of resources used for data movement, and dedicated resources (e.g., network bandwidth) are allocated to data movement tasks.
In this embodiment, continuous learning and adaptation are performed on the prediction model; a feedback loop is implemented to periodically retrain the AI model with new data, ensuring that the predictions remain accurate as patterns evolve.
The method specifically comprises the following steps:
implementation of the feedback loop:
(1) Metrics related to storage system performance are collected and analyzed, including data access time, throughput, error rate, and system utilization.
(2) The validity of data placement decisions and their impact on system performance are tracked.
(3) The predictions of the AI model for data access patterns and workload characteristics are recorded.
(4) These predictions are compared to the actual access patterns and workload behavior to determine the accuracy of the model.
(5) It is determined where the AI model's predictions are inaccurate and why.
(6) Data trends are analyzed to learn about changing patterns or new behavior in system usage.
Optimization improvement of AI model:
(1) The training dataset of the model is updated periodically with the latest data, in combination with the new patterns and behaviors observed in the system.
(2) The model is periodically retrained using the updated dataset to improve its predictive accuracy.
(3) The relevance and impact of the different features used in the model are evaluated to determine whether new features should be added or irrelevant ones removed.
(4) New features are engineered to better capture the nuances of ever-changing data patterns.
(5) The algorithm of the model is adjusted according to the insights of performance monitoring and error analysis.
(6) The hyper-parameters of the model are optimized using techniques such as grid search, random search, or bayesian optimization to improve performance.
(7) The updated model is continuously validated using the latest data to ensure that its predictions are accurate and reliable.
(8) The impact of model changes on system performance is evaluated, ensuring that updates improve or maintain system efficiency.
(9) Ensuring that the updated model is seamlessly integrated into the operating environment without interrupting ongoing operation.
(10) Version control of the model deployment is maintained, allowing rollback to previous versions if necessary.
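A minimal sketch of such a feedback check is shown below, assuming a scikit-learn-style regression model; the error threshold and the retrain() hook are hypothetical and would be tuned per deployment.

```python
# Feedback-loop sketch: compare predictions with observed access rates and
# trigger retraining when prediction error drifts past a threshold.
from sklearn.metrics import mean_squared_error

RETRAIN_MSE_THRESHOLD = 0.05        # assumed acceptable prediction error

def evaluate_and_maybe_retrain(model, X_recent, y_predicted, y_observed, retrain):
    mse = mean_squared_error(y_observed, y_predicted)
    if mse > RETRAIN_MSE_THRESHOLD:
        # Predictions have drifted from actual behavior: refresh the training
        # set with the latest telemetry and retrain (keep the old version for rollback).
        model = retrain(model, X_recent, y_observed)
    return model, mse
```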
In this embodiment, the above process is integrated into the operating environment of the storage system to ensure that it interfaces correctly with existing management tools and processes; comprehensive tests are performed to ensure that the system operates as expected under various load conditions and can recover from potential faults.
Specifically:
(1) Comprehensively analyzing the architecture of the existing storage system, and determining key components such as storage nodes, network infrastructure, data management software and the like;
determining potential integration points at which the data movement engine needs to interact with existing components to achieve seamless operation;
the compatibility of the data movement engine with the hardware, software, and network components is evaluated by examining their specifications, interfaces, and communication protocols.
(2) According to the evaluation result, the integration points and key components that need immediate attention are prioritized.
(3) The integration policies and mechanisms required to establish connections and interactions between the data movement engine and existing storage system components are defined.
(4) Existing APIs and protocols are used, or new ones are developed, for the data movement engine to communicate with other components of the storage system; middleware is used or developed to facilitate communication and coordination between the data movement engine and the storage system; mechanisms are implemented to maintain data consistency throughout the system during and after data movement; and error handling and recovery mechanisms are developed to address potential problems in the data movement process.
Data consistency assessment: rules or criteria are defined to ensure data consistency before, during, and after data movement operations; validation, versioning, or timestamp-based checks are implemented to verify the integrity of the data during transmission.
Error handling and recovery mechanism: an error detection mechanism is used to identify data transmission errors, network faults, or system faults during the movement operation (a sketch of one possible recovery flow is given after this list).
(5) A rollback mechanism or checkpoint is implemented to revert changes when data movement fails, thereby ensuring that the system returns to a consistent state;
maintaining atomicity, consistency, isolation, and durability (ACID properties) during data movement using transaction protocols or atomic operations;
a logging and reporting system is created to capture and analyze errors to diagnose problems and improve future operation.
The specific process comprises the following steps:
when an error occurs during data movement, the system triggers an error detection mechanism to identify the problem. Depending on the type of error (network failure, data corruption, etc.), an appropriate recovery mechanism is implemented. Rollback outstanding or failed transactions to maintain data consistency, or initiate retries for failed operations. Detailed error logs and reports are generated for post hoc analysis and system improvement. The results of compatibility assessment, critical component identification, and integration points will guide the formulation of policies to seamlessly integrate the data movement engine with existing storage systems. The evaluation results are used to prioritize tasks, define integration protocols, and determine error handling mechanisms to ensure smooth operation and maintain data integrity throughout the integration process.
(6) Finally, extensive tests are performed, including simulation of various operating conditions, to verify the integration and to test the impact of data movement on system performance under different loads and scenarios; real-time monitoring is implemented to track the efficiency and impact of the data movement engine, and a feedback loop is established to continuously improve the integration based on operational data and performance metrics.
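The following sketch illustrates one possible shape of the recovery flow referenced above for a single block move; the read_block/write_block/delete_block hooks and the SHA-256 checksum check are assumptions standing in for the storage system's own primitives.

```python
# Error handling around a single move: checksum verification, bounded retries,
# and rollback to the source copy on failure (hooks are hypothetical).
import hashlib
import logging

log = logging.getLogger("data_mover")

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def move_with_recovery(read_block, write_block, delete_block,
                       block_id, src_node, dst_node, max_retries=3):
    data = read_block(src_node, block_id)
    expected = checksum(data)
    for attempt in range(1, max_retries + 1):
        try:
            write_block(dst_node, block_id, data)
            # Verify integrity on the destination before removing the source copy.
            if checksum(read_block(dst_node, block_id)) == expected:
                delete_block(src_node, block_id)          # commit the move
                return True
            raise IOError("checksum mismatch after transfer")
        except Exception as exc:
            log.warning("move of %s failed (attempt %d/%d): %s",
                        block_id, attempt, max_retries, exc)
            try:
                delete_block(dst_node, block_id)          # roll back any partial copy
            except Exception:
                pass                                      # nothing was written yet
    log.error("move of %s abandoned; source copy retained", block_id)
    return False
```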
In this embodiment, a gradual deployment strategy is implemented, starting from a small portion of the storage environment, so that the impact can be monitored and the configuration adjusted as needed; robust monitoring and alerting are implemented to track the performance of the real-time optimization system and to detect any anomalies or problems.
System deployment steps:
(1) Comprehensive testing was performed in a simulated production environment.
(2) IT staff and stakeholders are trained on new system functions and operations.
(3) An incremental deployment strategy is adopted, starting with a pilot deployment in a limited area of the storage environment, to monitor system behavior and impact.
(4) The deployment is gradually expanded in the whole storage environment, and the stability and performance of each stage are ensured.
(5) The system is closely monitored during and immediately after deployment to see if there are any immediate problems.
(6) Any problems and solutions in the deployment process are recorded for future reference.
System monitoring steps:
(1) Monitoring tools and metrics are established, and system monitoring tools are used to track performance metrics such as response time, throughput, error rate, and resource utilization (a threshold-check sketch follows this list).
(2) Custom metrics such as prediction accuracy and data movement efficiency are developed for artificial intelligence driven optimization.
(3) An alarm system is implemented to notify an administrator of critical issues or deviations from normal operation.
(4) A dashboard was developed for real-time monitoring of system metrics and alarms.
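As an illustration, the threshold check behind such alarms might look like the sketch below; the metric names, threshold values, and the alert callback are assumptions made for the example.

```python
# Monitoring sketch: compare current metrics against thresholds and raise alerts.
THRESHOLDS = {
    "p99_latency_ms": 50.0,
    "error_rate": 0.01,
    "utilization": 0.85,
    "prediction_accuracy": 0.90,   # custom metric for the AI-driven optimizer
}

def check_metrics(metrics: dict, alert) -> list:
    violations = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            continue
        # Accuracy must stay above its limit; the other metrics must stay below.
        bad = value < limit if name == "prediction_accuracy" else value > limit
        if bad:
            violations.append(name)
            alert(f"{name}={value:.3f} violates threshold {limit}")
    return violations
```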
System anomaly detection steps:
(1) An anomaly detection mechanism is formulated, using statistical methods to identify deviations from the normal operating pattern (a sketch follows this list).
(2) A trained machine learning model is implemented to identify anomalies in system performance or data access patterns.
(3) The anomaly detection mechanism is integrated with an overall system monitoring tool.
(4) An automatic alarm is set for the detected abnormality, prompting immediate investigation.
(5) The system is audited regularly to find hidden problems or inefficiencies.
(6) The anomaly detection model and system are updated periodically according to the audit findings.
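A minimal statistical detector of the kind described in step (1) is sketched below using a sliding-window z-score; the window size and threshold are illustrative, and a trained model such as scikit-learn's IsolationForest could serve the machine-learning variant of step (2).

```python
# Sliding-window z-score anomaly detector for a single performance metric.
from collections import deque
import statistics

class ZScoreDetector:
    def __init__(self, window: int = 200, threshold: float = 3.0):
        self.values = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Return True if the new observation looks anomalous."""
        anomalous = False
        if len(self.values) >= 30:                      # need a minimal baseline
            mean = statistics.fmean(self.values)
            stdev = statistics.pstdev(self.values) or 1e-9
            anomalous = abs(value - mean) / stdev > self.threshold
        self.values.append(value)
        return anomalous
```

Such a detector would feed the alerting path above, prompting immediate investigation when it fires.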
Example 2
The embodiment provides a predictive storage optimization system of a distributed storage system, which comprises:
the prediction module is configured to predict, using a pre-constructed prediction model, the access pattern within a future set period and the state of each node in the distributed storage system;
the data block evaluation module is configured to evaluate the access rate of the data blocks and the proximity between the data blocks according to the predicted access pattern;
the node evaluation module is configured to evaluate access delay and resource utilization rate of each node according to the predicted node state;
the storage optimization module is configured to preset priorities for the access rate, the proximity, the access delay and the resource utilization rate, and, on the premise of considering node load balance and the preset priorities, determine the optimal node for placing each data block according to the access rate and proximity of the data block and the access delay and resource utilization rate of each node, so that each data block is moved to the corresponding node for storage.
It should be noted that the above modules correspond to the steps described in embodiment 1, and the above modules are the same as examples and application scenarios implemented by the corresponding steps, but are not limited to those disclosed in embodiment 1. It should be noted that the modules described above may be implemented as part of a system in a computer system, such as a set of computer-executable instructions.
In further embodiments, there is also provided:
an electronic device comprising a memory, a processor, and computer instructions stored on the memory and runnable on the processor, which, when executed by the processor, perform the method described in embodiment 1. For brevity, the description is omitted here.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory may include read only memory and random access memory and provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method described in embodiment 1.
The method in embodiment 1 may be embodied directly as being executed by a hardware processor, or as being executed by a combination of hardware and software modules in the processor. The software modules may be located in a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or another storage medium well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided herein.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
While the foregoing description of the embodiments of the present invention has been presented in conjunction with the drawings, it should be understood that it is not intended to limit the scope of the invention, but rather, it is intended to cover all modifications or variations within the scope of the invention as defined by the claims of the present invention.