Disclosure of Invention
The invention aims to provide a heterogeneous computing power fusion and dynamic optimization distribution method and system which carry out real-time data analysis on the monitoring indexes of each node, thereby facilitating more refined computing power resource management. By analyzing the data, problems such as unbalanced resource use and excessive node load can be found, and accurate mapping between the data is realized by defining the key fields in the computing power task allocation data and the computing power node connection data. This helps ensure that tasks are properly assigned to nodes with the proper resources and status. After the data mapping is completed, network layer fusion is carried out, which helps realize more efficient task execution and resource utilization in computing power distribution and solves the problems in the prior art.
In order to achieve the above purpose, the present invention provides the following technical solutions:
The heterogeneous computing power fusion and dynamic optimization distribution method comprises the following steps:
S1, central computing capacity confirmation, namely collecting hardware configuration information in an acquisition interface, processing the collected hardware configuration information and extracting key parameters, performing a benchmark test according to the extracted key parameters, and obtaining real-time computing capacity data after the benchmark test;
S2, load balancing strategy formulation, namely analyzing load conditions according to the real-time computing capacity data and the standard computing capacity data, and performing load balancing strategy formulation according to the load conditions, wherein the real-time computing capacity strategy data is obtained after the load balancing strategy formulation is completed;
S3, node task allocation, namely performing task demand analysis on the received task request, performing task allocation of the nodes on the analyzed task by utilizing real-time computing capacity strategy data, and obtaining computing power task allocation data after the task allocation of the nodes is completed;
S4, node information cooperative processing, namely establishing a node communication mechanism for each node in the computing power task allocation data, and obtaining computing power node connection data after the establishment of the communication mechanism is completed;
and S5, task network fusion management, namely performing data mapping on the computing power task allocation data and the computing power node connection data, and performing network layer fusion after the data mapping is completed, so as to obtain standard computing power task allocation data after the network layer fusion.
Preferably, for the step S1, collecting the hardware configuration information in the acquisition interface, processing the collected hardware configuration information, extracting key parameters, and performing a benchmark test according to the extracted key parameters, including:
The hardware configuration information in the acquisition interface comprises processor information, memory information, storage equipment information, network interface information, a computing unit and resource use information;
After the hardware configuration information is collected, extracting key parameters of the hardware configuration information;
The key parameters of the processor information comprise the core number, thread number, main frequency, maximum boost frequency and cache size; the key parameters of the memory information comprise total capacity, type, speed and timing; the key parameters of the storage device information comprise type, capacity, read-write speed and interface type; the key parameters of the network interface information comprise type, speed, MAC address and IP address; the key parameters of the computing unit comprise model number, CUDA core number, video memory size, video memory type and computing capability; and the key parameters of the resource use information comprise CPU utilization, memory utilization, disk frequency, disk speed and network traffic;
extracting key parameters of the hardware configuration information and then performing a benchmark test;
And confirming the test result against the benchmark test result, and generating real-time computing capacity data from the confirmed test result.
Preferably, for the load situation analysis according to the real-time computing capability data and the standard computing capability data in the step S2, the load balancing policy formulation according to the load situation includes:
retrieving standard computing capacity data from a database, wherein the standard computing capacity data is a standard expected performance index or a statistical average value of historical performance data;
performing data comparison on all performance indexes with the same attribute in the real-time computing capacity data and the standard computing capacity data, and calculating a difference value of each performance index according to a data comparison result;
carrying out load condition analysis according to the difference value;
the load condition is divided into normal load, light load and heavy load, wherein the load is normal when the percentage of the difference value is in the range of 80%-100%, light when the percentage of the difference value is less than 80%, and heavy when the percentage of the difference value is greater than 100%;
constructing a load analysis matrix according to the analyzed load condition, and marking load balancing operations according to the constructed load analysis matrix;
formulating a load balancing strategy according to the load balancing operations marked in the load analysis matrix, wherein the load balancing strategy comprises task migration, traffic regulation and resource redistribution;
And after the load balancing strategy is formulated, generating the real-time computing capacity strategy data.
Preferably, task demand analysis is performed on the task request received in the step S3, and the task to be analyzed is subjected to task allocation of the node by using the real-time computing capability policy data, including:
after the computing power network receives the task request of the upper layer application, analyzing the task requirement of the task request;
The parsed task demands include task type, resource demand, execution time, priority, and data dependency;
matching calculation is carried out on the task demand data by using a task matching calculation method, and a matching plan of the task demand data is obtained after the matching calculation is completed;
sorting matching data in the real-time computing capacity policy data, wherein the matching data comprises node IDs, available resources, current loads and node states;
Distributing the matching plan to the node with the minimum load in the matching data by using a greedy algorithm;
When the nodes in the matching data are overloaded, the dynamic load balancing is utilized to redistribute part of tasks to the nodes with lower loads;
And obtaining the computing power task allocation data after the matching plan is allocated to the matching data.
Preferably, the establishing of the node communication mechanism is performed for each node in the computing power task allocation data in S4, including:
identifying the node ID of each node in the calculation task allocation data;
selecting a communication protocol according to the node ID, wherein the communication protocol comprises TCP, UDP and MPI;
Confirming parameters of the selected communication protocol after the communication protocol is selected, wherein the parameters comprise port numbers, communication modes and data transmission formats;
after the parameter confirmation of the communication protocol is completed, establishing communication connection between nodes, wherein the communication connection comprises heartbeat mechanism connection or bidirectional connection;
the heartbeat mechanism connection sends heartbeat packets to each node at fixed intervals, and the survival state of each node is confirmed according to the heartbeat packet parameters;
and after the communication connection confirmation is completed, obtaining the computing power node connection data of the task-receiving nodes in the computing power network.
Preferably, setting a heartbeat packet sending time interval corresponding to the heartbeat mechanism includes:
Extracting the total number of nodes;
Extracting the data transmission time length of a data packet of a unit data volume corresponding to each node, wherein the value range of the unit data volume is 10MB-100MB;
extracting the data transmission rate corresponding to each node;
Under the current data transmission environment corresponding to the node, acquiring the data transmission time length of the data packet of the unit data quantity of the node and the data transmission rate corresponding to each node, and acquiring the heartbeat packet transmission time interval coefficient corresponding to each node;
The heartbeat packet sending time interval coefficient corresponding to each node is obtained through the following formula:
wherein F represents a heartbeat packet transmission time interval coefficient corresponding to each node, n represents the number of data transmission times corresponding to each node, Bi represents the data transmission rate corresponding to the ith data transmission corresponding to each node, Be represents the data transmission rate of the data packet of the unit data amount corresponding to each node, Ti represents the data transmission time length of the ith data transmission corresponding to each node, Te represents the data transmission time length of the data packet of the unit data amount corresponding to the node, xi represents the number of the unit data amount contained in the ith data transmission corresponding to each node, and Ts represents the theoretical data transmission time length of the data packet of the unit data amount corresponding to the maximum data transmission rate;
acquiring heartbeat packet transmission time intervals corresponding to all nodes according to the heartbeat packet transmission time interval coefficients;
the heartbeat packet sending time interval is obtained through the following formula:
wherein G represents the heartbeat packet sending time interval, G0 represents a preset initial heartbeat packet sending time interval, m represents the total number of nodes, Fi represents the heartbeat packet sending time interval coefficient corresponding to the ith node, Fz represents the median of the heartbeat packet sending time interval coefficients of the m nodes, Fcmax represents the maximum difference between the heartbeat packet sending time interval coefficients of every two nodes having a data transmission interactive connection among the m nodes, Fb represents the standard deviation of the heartbeat packet sending time interval coefficients of the m nodes, and Fcb represents the standard deviation of the differences between the heartbeat packet sending time interval coefficients of every two nodes having a data transmission interactive connection;
Sending heartbeat packets to each node according to the heartbeat packet sending time interval;
And monitoring the node survival rate and the node revival rate corresponding to each heartbeat packet transmission in real time, and adjusting the heartbeat packet transmission time interval according to the node survival rate and the node revival rate.
Preferably, the node survival rate and the node revival rate corresponding to each heartbeat packet sending are monitored in real time, and the heartbeat packet sending time interval is adjusted according to the node survival rate and the node revival rate, including:
Extracting node survival rate and node revival rate which are correspondingly obtained by sending heartbeat packets each time;
comparing the node survival rate with a preset node survival rate threshold;
When the node survival rate is lower than a preset node survival rate threshold value, the node survival rate and the node revival rate are utilized to adjust the heartbeat packet sending time interval, and the adjusted heartbeat packet sending time interval is obtained;
the adjusted heartbeat packet sending time interval is obtained through the following formula:
wherein Gt represents the adjusted heartbeat packet sending time interval, G0 represents a preset initial heartbeat packet sending time interval, k represents the number of heartbeat packet transmissions before the node survival rate falls below the preset node survival rate threshold, Pc represents the node survival rate at the time it falls below the preset node survival rate threshold, Pf represents the node revival rate at the time the node survival rate falls below the preset node survival rate threshold, Pci represents the node survival rate corresponding to the ith heartbeat packet transmission, Pfi represents the node revival rate corresponding to the ith heartbeat packet transmission, and Pcy represents the preset node survival rate threshold;
and sending the heartbeat packet to each node according to the adjusted heartbeat packet sending time interval.
Preferably, in S5, the data mapping is performed on the computing task allocation data and the computing node connection data, and the network layer fusion is performed after the data mapping is completed, including:
Confirming key fields in the computing power task allocation data and the computing power node connection data, wherein the key fields of the computing power task allocation data comprise task IDs, allocated node IDs, task types, required resources, priorities and data dependencies;
Invoking a mapping rule from a database, and carrying out data mapping on the computing task allocation data and the computing node connection data according to the mapping rule;
the data mapping flow comprises extracting task information from the computing power task allocation data according to the task ID and the allocated node ID, finding out corresponding node data from the computing power node connection data, and comparing;
When the comparison result meets the mapping rule, mapping the task and the node, and obtaining task mapping data after the mapping is completed;
and when the comparison result does not meet the mapping rule, retrying the mapping with a standby node or re-evaluating the task allocation strategy until the comparison result meets the mapping rule.
Preferably, in S5, the data mapping is performed on the computing task allocation data and the computing node connection data, and network layer fusion is performed after the data mapping is completed, which further includes:
The task mapping data are subjected to network layer fusion; before the network layer fusion is carried out, the task mapping data are arranged into a fusion data set, which comprises extracting the relevant information of each task from the mapped task mapping data and extracting the latest state and resource utilization condition of each node from the computing power node connection data;
Updating node data according to task execution states in the fusion data set, wherein the node data comprises the task number, the current load and available resources allocated to each node, and when the load of the node exceeds a preset range, the node data is dynamically adjusted;
and carrying out data set integration on the data after node data updating or dynamic adjustment, and obtaining standard calculation task allocation data after the data set integration.
A heterogeneous computing force fusion and dynamic optimization distribution system comprising:
A task execution monitoring unit for:
when the standard calculation task allocation data is used for carrying out task implementation, the standard calculation task allocation data is monitored in real time;
Before real-time monitoring is carried out on the standard calculation task allocation data, monitoring indexes are confirmed, wherein the monitoring indexes comprise task states, execution progress, resource use conditions, node loads and abnormal events;
In the process of carrying out tasks by standard calculation task allocation data, monitoring the monitoring index of each node in real time;
real-time data analysis of standard calculation task allocation data is carried out according to the monitoring index of each node;
performing an early warning prompt according to the analyzed real-time data condition, wherein the early warning prompt is triggered when the real-time data exceeds a preset threshold range, and the real-time data exceeding the preset range is marked as early warning data;
an adjustment report is generated from the early warning data according to the degree of abnormality;
and finally, transmitting the early warning data and the adjustment report to a display terminal for visual display.
Compared with the prior art, the invention has the following beneficial effects:
1. According to the heterogeneous computing power fusion and dynamic optimization distribution method and system provided by the invention, the selected communication protocol has a reliable transmission mechanism and error recovery capability, which ensures the integrity and accuracy of data during transmission; security measures such as encryption and authentication further enhance the safety of data transmission and protect data privacy and security. By adjusting the parameters and configuration of the communication protocol, the method can conveniently adapt to computing power networks of different scales and types. Dynamic load balancing redistributes part of the tasks to nodes with lower loads; this dynamic adjustment mechanism helps maintain the stability and efficiency of the computing power network and avoids performance bottlenecks caused by single-point overload.
2. According to the heterogeneous computing power fusion and dynamic optimization distribution method and system, accurate mapping between data is achieved through defining key fields in computing power task distribution data and computing power node connection data. This helps ensure that tasks are properly assigned to nodes with the proper resources and status. After the data mapping is completed, network layer fusion is performed, which helps to achieve more efficient task execution and resource utilization in computing power distribution.
3. According to the heterogeneous computing power fusion and dynamic optimization distribution method and system, real-time data analysis is carried out on the monitoring index of each node, so that finer computing power resource management is facilitated. By analyzing the data, the problems of unbalanced resource use, excessive node load and the like can be found, so that powerful support is provided for optimizing calculation power distribution and improving resource utilization rate.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order to solve the problem that in the prior art, no targeted load policy adjustment is performed according to the specific situation of the actual request task, thereby resulting in a decrease in the load capacity of data, referring to fig. 1 and 2, the present embodiment provides the following technical solutions:
The heterogeneous computing power fusion and dynamic optimization distribution method comprises the following steps:
S1, central computing capacity confirmation, namely collecting hardware configuration information in an acquisition interface, processing the collected hardware configuration information and extracting key parameters, performing a benchmark test according to the extracted key parameters, and obtaining real-time computing capacity data after the benchmark test;
By collecting hardware configuration information in detail and performing benchmark tests, potential performance problems or hardware faults can be found and resolved in time;
S2, load balancing strategy formulation, namely analyzing load conditions according to the real-time computing capacity data and the standard computing capacity data, and performing load balancing strategy formulation according to the load conditions, wherein the real-time computing capacity strategy data is obtained after the load balancing strategy formulation is completed;
The computing power resources can be reasonably allocated and scheduled through the formulation and execution of the load balancing strategy, so that the waste and bottleneck of the resources are avoided;
S3, node task allocation, namely performing task demand analysis on the received task request, performing task allocation of the nodes on the analyzed task by utilizing real-time computing capacity strategy data, and obtaining computing power task allocation data after the task allocation of the nodes is completed;
the dynamic adjustment mechanism helps maintain the stability and efficiency of the computing power network and avoids performance bottlenecks caused by single-point overload;
S4, node information cooperative processing, namely establishing a node communication mechanism for each node in the computing power task allocation data, and obtaining computing power node connection data after the establishment of the communication mechanism is completed;
By adjusting the parameters and configuration of the communication protocol, the method can conveniently adapt to computing power networks of different scales and types, and computing power resources can be intelligently scheduled according to the performance and load condition of each node;
S5, task network fusion management, namely performing data mapping on the computing power task allocation data and the computing power node connection data, and performing network layer fusion after the data mapping is completed, wherein standard computing power task allocation data is obtained after the network layer fusion;
the adjustability of the mapping rules enables the system to be customized according to actual requirements, and therefore flexibility and accuracy of task allocation are improved.
Aiming at the hardware configuration information in the acquisition interface in the step S1, the collected hardware configuration information is processed and key parameters are extracted, and the benchmark test is carried out according to the extracted key parameters, comprising the following steps:
The hardware configuration information in the acquisition interface comprises processor information, memory information, storage equipment information, network interface information, a computing unit and resource use information;
After the hardware configuration information is collected, extracting key parameters of the hardware configuration information;
The key parameters of the processor information comprise the core number, thread number, main frequency, maximum boost frequency and cache size; the key parameters of the memory information comprise total capacity, type, speed and timing; the key parameters of the storage device information comprise type, capacity, read-write speed and interface type; the key parameters of the network interface information comprise type, speed, MAC address and IP address; the key parameters of the computing unit comprise model number, CUDA core number, video memory size, video memory type and computing capability; and the key parameters of the resource use information comprise CPU utilization, memory utilization, disk frequency, disk speed and network traffic;
extracting key parameters of the hardware configuration information and then performing a benchmark test;
And confirming the test result against the benchmark test result, and generating real-time computing capacity data from the confirmed test result.
Specifically, the hardware configuration information is collected in detail through the acquisition interface and key parameters are extracted, so that the specific configuration and capability of each computing node can be accurately known. This allows more refined resource allocation during computing power distribution and ensures that tasks are distributed to the most appropriate computing nodes, thereby improving the utilization rate of computing resources and the efficiency of the whole computing power network. The benchmark test performed on the key parameters of the hardware configuration information evaluates the actual performance of each computing node, and the benchmark test results serve as an important basis for computing power allocation: computing power resources can be dynamically adjusted and optimized according to the requirements of the tasks, meeting AI application requirements of different scales and types. Through the detailed collection of hardware configuration information and the benchmark test, potential performance problems or hardware faults can be found and resolved in time, ensuring that tasks are completed successfully and avoiding computing power interruption or task failure caused by hardware problems; this also promotes the continued development of computing power network technology and improves the overall computing power level.
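For illustration only, the following Python sketch shows this step under simplifying assumptions: psutil is used for hardware and resource statistics, a dense matrix multiplication stands in for the benchmark test, and the field names and GFLOPS estimate are hypothetical rather than the claimed implementation.

```python
import time
import numpy as np
import psutil  # assumed available for hardware/resource statistics


def collect_key_parameters():
    """Collect a subset of the key parameters described in S1 (illustrative only)."""
    mem = psutil.virtual_memory()
    return {
        "processor": {"cores": psutil.cpu_count(logical=False),
                      "threads": psutil.cpu_count(logical=True)},
        "memory": {"total_gb": round(mem.total / 2**30, 1)},
        "resource_use": {"cpu_percent": psutil.cpu_percent(interval=0.5),
                         "mem_percent": mem.percent},
    }


def run_benchmark(size=1024):
    """Time a dense matrix multiplication as a simple stand-in benchmark."""
    a = np.random.rand(size, size)
    b = np.random.rand(size, size)
    start = time.perf_counter()
    _ = a @ b
    return time.perf_counter() - start


def real_time_computing_capacity():
    """Combine key parameters and the benchmark result into capacity data (hypothetical schema)."""
    params = collect_key_parameters()
    elapsed = run_benchmark()
    params["benchmark"] = {"matmul_1024_seconds": round(elapsed, 4),
                           # rough GFLOPS estimate: 2*n^3 floating point operations
                           "gflops_estimate": round(2 * 1024**3 / elapsed / 1e9, 2)}
    return params


if __name__ == "__main__":
    print(real_time_computing_capacity())
```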
Aiming at the load situation analysis according to the real-time computing capacity data and the standard computing capacity data in the step S2, the load balancing strategy formulation according to the load situation comprises the following steps:
retrieving standard computing capacity data from a database, wherein the standard computing capacity data is a standard expected performance index or a statistical average value of historical performance data;
performing data comparison on all performance indexes with the same attribute in the real-time computing capacity data and the standard computing capacity data, and calculating a difference value of each performance index according to a data comparison result;
carrying out load condition analysis according to the difference value;
the load condition is divided into normal load, light load and heavy load, wherein the load is normal when the percentage of the difference value is in the range of 80%-100%, light when the percentage of the difference value is less than 80%, and heavy when the percentage of the difference value is greater than 100%;
constructing a load analysis matrix according to the analyzed load condition, and marking load balancing operations according to the constructed load analysis matrix;
formulating a load balancing strategy according to the load balancing operations marked in the load analysis matrix, wherein the load balancing strategy comprises task migration, traffic regulation and resource redistribution;
And after the load balancing strategy is formulated, generating the real-time computing capacity strategy data.
Specifically, the current load condition can be rapidly identified through the comparison of the real-time computing capacity data and the standard computing capacity data, so that the adjustment can be timely made. The establishment and execution of the load balancing strategy are based on accurate data analysis and judgment, so that efficient utilization of resources is ensured, statistical average values or standard expected performance indexes are used as standard computing capacity data, and a reliable reference is provided for comparison of the real-time computing capacity data. By calculating the percentage of the difference value, the load conditions (normal load, light load and heavy load) can be accurately divided, accurate information is provided for the subsequent load balancing strategy formulation, and various load balancing strategies including task migration, flow regulation and resource redistribution are formulated according to different load conditions, so that the method can adapt to various complex network environments and service demands. The flexibility of the load balancing strategy is also embodied in that the strategy can be continuously adjusted and optimized according to the generation of the real-time computing capacity strategy data so as to adapt to the dynamic change of the network environment, and the computing power resources can be reasonably allocated and scheduled through the formulation and execution of the load balancing strategy, so that the waste and bottleneck of the resources are avoided. And under heavy load, the pressure can be relieved through resource redistribution, the normal operation of the service is ensured, and the intelligent decision support is provided for the subsequent load balancing strategy formulation by constructing a load analysis matrix and carrying out load balancing operation marking.
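As a minimal sketch of the classification rule described above (80%-100% of the standard value is normal load, below 80% is light, above 100% is heavy), the code below compares hypothetical per-node indicators against standard values, builds a simple load analysis matrix and marks a balancing operation per row; the node names, metric names and the operation mapping are illustrative assumptions.

```python
def classify_load(real_time: float, standard: float) -> str:
    """Classify load by the percentage of the real-time value against the standard value."""
    pct = real_time / standard * 100
    if pct < 80:
        return "light"
    if pct <= 100:
        return "normal"
    return "heavy"


def build_load_analysis_matrix(real_time_data, standard_data):
    """One row per (node, metric): [node, metric, percentage, load class, marked operation]."""
    matrix = []
    for node, metrics in real_time_data.items():
        for metric, value in metrics.items():
            load = classify_load(value, standard_data[metric])
            # illustrative mapping from load class to a balancing operation
            operation = {"heavy": "task_migration",
                         "light": "resource_redistribution",
                         "normal": "none"}[load]
            matrix.append([node, metric,
                           round(value / standard_data[metric] * 100, 1),
                           load, operation])
    return matrix


# hypothetical data: standard values vs. real-time measurements
standard = {"cpu_percent": 70.0, "mem_percent": 75.0}
real_time = {"node-1": {"cpu_percent": 92.0, "mem_percent": 60.0},
             "node-2": {"cpu_percent": 40.0, "mem_percent": 55.0}}
for row in build_load_analysis_matrix(real_time, standard):
    print(row)
```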
Performing task demand analysis on the task request received in the step S3, performing task allocation on nodes on the analyzed task by utilizing the real-time computing capacity policy data, and comprising the following steps:
after the computing power network receives the task request of the upper layer application, analyzing the task requirement of the task request;
The parsed task demands include task type, resource demand, execution time, priority, and data dependency;
matching calculation is carried out on the task demand data by using a task matching calculation method, and a matching plan of the task demand data is obtained after the matching calculation is completed;
sorting matching data in the real-time computing capacity policy data, wherein the matching data comprises node IDs, available resources, current loads and node states;
Distributing the matching plan to the node with the minimum load in the matching data by using a greedy algorithm;
When the nodes in the matching data are overloaded, the dynamic load balancing is utilized to redistribute part of tasks to the nodes with lower loads;
And obtaining the calculation task allocation data after the matching plan is allocated to the matching data.
Specifically, the detailed task demand analysis is performed on the received task request, and the detailed task demand analysis includes key information such as task type, resource demand, execution time, priority, data dependency and the like. The comprehensive analysis is helpful to ensure more accurate and efficient subsequent task allocation, the scheme can quickly generate a matching plan according to task demand data through a task matching calculation method, then, the matching plan is allocated to a node with the smallest load by using a greedy algorithm, the utilization of computational power resources is facilitated to be optimized, waiting time and calculation cost are reduced, and when overload conditions occur to the node in the matching data, the scheme can redistribute partial tasks to the node with lower load by using a dynamic load balancing technology. The dynamic adjustment mechanism is beneficial to maintaining the stability and the high efficiency of the computational power network, avoiding the performance bottleneck caused by single-point overload, and remarkably improving the utilization rate of computational power resources through accurate task matching and distribution and dynamic load balancing adjustment. The method is beneficial to reducing the computational effort cost, improving the overall calculation efficiency, and dynamically adjusting the distribution of computational effort resources according to the change of task demands, thereby enhancing the flexibility and the expandability of the system. This enables the power network to better accommodate AI application requirements of different scales and types.
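The sketch below illustrates the greedy assignment and the overload-triggered reallocation described above; the single abstract resource unit, the 80% overload threshold and the simulated runtime load spike are simplifying assumptions, not values prescribed by the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    capacity: float               # total abstract resource units
    load: float = 0.0             # units currently in use
    tasks: list = field(default_factory=list)


def greedy_assign(tasks, nodes):
    """Assign each task (highest priority first) to the feasible node with the smallest load."""
    for task in sorted(tasks, key=lambda t: -t["priority"]):
        feasible = [n for n in nodes if n.capacity - n.load >= task["demand"]]
        if not feasible:
            continue                      # no node can currently host this task
        target = min(feasible, key=lambda n: n.load)
        target.tasks.append(task["task_id"])
        target.load += task["demand"]


def rebalance(nodes, tasks_by_id, overload=0.8):
    """Move tasks off nodes above the overload threshold onto nodes that can
    absorb them without themselves exceeding the threshold."""
    for node in nodes:
        for task_id in list(node.tasks):
            if node.load / node.capacity <= overload:
                break
            demand = tasks_by_id[task_id]["demand"]
            candidates = [n for n in nodes if n is not node
                          and (n.load + demand) / n.capacity <= overload]
            if not candidates:
                continue
            target = min(candidates, key=lambda n: n.load / n.capacity)
            node.tasks.remove(task_id)
            node.load -= demand
            target.tasks.append(task_id)
            target.load += demand


tasks = [{"task_id": "t1", "demand": 4, "priority": 3},
         {"task_id": "t2", "demand": 3, "priority": 2},
         {"task_id": "t3", "demand": 3, "priority": 1}]
nodes = [Node("node-1", capacity=10), Node("node-2", capacity=20)]
greedy_assign(tasks, nodes)
nodes[0].load += 5   # simulate extra runtime load pushing node-1 over the threshold
rebalance(nodes, {t["task_id"]: t for t in tasks})
print([(n.node_id, n.tasks, n.load) for n in nodes])
```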
In order to solve the problem that in the prior art the received task is not fused with the current network layer by a targeted node, so that the effect of computing power data fusion is poor, referring to fig. 1 and 2, the present embodiment provides the following technical scheme:
Establishing a node communication mechanism for each node in the computing task allocation data in the S4, wherein the node communication mechanism comprises the following steps:
identifying the node ID of each node in the calculation task allocation data;
selecting a communication protocol according to the node ID, wherein the communication protocol comprises TCP, UDP and MPI;
Confirming parameters of the selected communication protocol after the communication protocol is selected, wherein the parameters comprise port numbers, communication modes and data transmission formats;
after the parameter confirmation of the communication protocol is completed, establishing communication connection between nodes, wherein the communication connection comprises heartbeat mechanism connection or bidirectional connection;
the heartbeat mechanism connection sends heartbeat packets to each node at fixed intervals, and the survival state of each node is confirmed according to the heartbeat packet parameters;
and after the communication connection confirmation is completed, obtaining the computing power node connection data of the task-receiving nodes in the computing power network.
Specifically, by identifying the node ID of each node in the computing power task allocation data, the system can accurately identify and manage each node and ensure that computing resources are used efficiently. Selecting the communication protocol (such as TCP, UDP or MPI) according to the node ID and confirming the parameters of the communication protocol (such as port number, communication mode and data transmission format) allows the system to flexibly adapt to different application scenarios and computing requirements. Establishing a heartbeat mechanism connection enables the system to send heartbeat packets to each node at regular intervals and to confirm the survival state of each node in time, ensuring the reliability and stability of computing power task allocation; bidirectional connections enable real-time transmission of tasks and data between nodes, improving the response speed and flexibility of computing power task allocation. The heartbeat mechanism connection also allows node faults to be discovered and handled in time, ensuring the stability and availability of the computing power network.
When a node fails, the system can quickly adjust the computing power distribution strategy and schedule the task to other available nodes, avoiding task interruption and data loss. Selecting a communication protocol with a reliable transmission mechanism and error recovery capability ensures the integrity and accuracy of data during transmission, and security measures such as encryption and authentication further enhance the safety of data transmission and protect user privacy and data security. By adjusting the parameters and configuration of the communication protocol, the method can conveniently adapt to computing power networks of different scales and types and intelligently schedule computing power resources according to the performance and load condition of each node, ensuring efficient and balanced processing of tasks.
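A minimal TCP heartbeat sketch follows, assuming each node runs a responder on a known port and answers b"PING" with b"PONG"; the node addresses, port, message format and timeout are illustrative choices rather than parameters fixed by the patent.

```python
import socket

NODES = {"node-1": ("10.0.0.11", 5001),   # hypothetical node ID -> (host, port)
         "node-2": ("10.0.0.12", 5001)}


def send_heartbeat(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if the node answers the heartbeat within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING")
            return sock.recv(16) == b"PONG"
    except OSError:
        return False


def check_node_survival(nodes=NODES):
    """Send one heartbeat round and return the survival state of every node."""
    return {node_id: send_heartbeat(host, port)
            for node_id, (host, port) in nodes.items()}


if __name__ == "__main__":
    print(check_node_survival())   # e.g. {'node-1': True, 'node-2': False}
```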
Specifically, setting a heartbeat packet sending time interval corresponding to the heartbeat mechanism includes:
Extracting the total number of nodes;
Extracting the data transmission time length of a data packet of a unit data volume corresponding to each node, wherein the value range of the unit data volume is 10MB-100MB;
extracting the data transmission rate corresponding to each node;
Under the current data transmission environment corresponding to the node, acquiring the data transmission time length of the data packet of the unit data quantity of the node and the data transmission rate corresponding to each node, and acquiring the heartbeat packet transmission time interval coefficient corresponding to each node;
The heartbeat packet sending time interval coefficient corresponding to each node is obtained through the following formula:
wherein F represents a heartbeat packet transmission time interval coefficient corresponding to each node, n represents the number of data transmission times corresponding to each node, Bi represents the data transmission rate corresponding to the ith data transmission corresponding to each node, Be represents the data transmission rate of the data packet of the unit data amount corresponding to each node, Ti represents the data transmission time length of the ith data transmission corresponding to each node, Te represents the data transmission time length of the data packet of the unit data amount corresponding to the node, xi represents the number of the unit data amount contained in the ith data transmission corresponding to each node, and Ts represents the theoretical data transmission time length of the data packet of the unit data amount corresponding to the maximum data transmission rate;
acquiring heartbeat packet transmission time intervals corresponding to all nodes according to the heartbeat packet transmission time interval coefficients;
the heartbeat packet sending time interval is obtained through the following formula:
wherein G represents the heartbeat packet sending time interval, G0 represents a preset initial heartbeat packet sending time interval, m represents the total number of nodes, Fi represents the heartbeat packet sending time interval coefficient corresponding to the ith node, Fz represents the median of the heartbeat packet sending time interval coefficients of the m nodes, Fcmax represents the maximum difference between the heartbeat packet sending time interval coefficients of every two nodes having a data transmission interactive connection among the m nodes, Fb represents the standard deviation of the heartbeat packet sending time interval coefficients of the m nodes, and Fcb represents the standard deviation of the differences between the heartbeat packet sending time interval coefficients of every two nodes having a data transmission interactive connection;
Sending heartbeat packets to each node according to the heartbeat packet sending time interval;
And monitoring the node survival rate and the node revival rate corresponding to each heartbeat packet transmission in real time, and adjusting the heartbeat packet transmission time interval according to the node survival rate and the node revival rate.
The technical effect of the technical scheme is that the transmission time interval of the heartbeat packet is dynamically adjusted by comprehensively considering the factors such as the number of nodes, the data transmission rate, the data transmission time length and the like, so that the network can be ensured to be in a connection state, unnecessary heartbeat packet transmission is avoided, and the network communication efficiency is improved. The primary purpose of the heartbeat mechanism is to monitor the survival status of the node. According to the scheme, the node survival rate and the node revival rate are monitored in real time, and the heartbeat packet sending time interval is dynamically adjusted according to the information, so that node faults can be found and processed in time, and the stability and reliability of a network are enhanced. Frequent transmission of heartbeat packets may occupy network resources. According to the scheme, the sending time interval of the heartbeat packet is dynamically adjusted, so that the sending frequency of the heartbeat packet can be reduced on the premise of ensuring the network stability, and the utilization of network resources is optimized. The proposal considers the data transmission rate and the data transmission time length of different nodes, and can dynamically adjust the heartbeat packet sending time interval according to the actual condition of each node. This enables the mechanism to adapt to different network environments, improving the flexibility and adaptability of network communications. By optimizing the sending strategy of the heartbeat packet, network delay and bandwidth occupation can be reduced, thereby improving user experience. Such optimization is particularly important in application scenarios where real-time requirements are high.
In summary, the technical scheme realizes optimization in the aspects of network communication efficiency, stability, resource utilization, network adaptability, user experience and the like by dynamically adjusting the heartbeat packet sending time interval. This helps to improve the overall performance of network communications, meeting the ever-increasing demands of network communications.
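The interval formulas themselves are not reproduced in the text above, so the following sketch only illustrates the quantities the description names (per-node coefficients, their median and standard deviation, and the maximum and standard deviation of pairwise differences along data-transmission connections) being computed and fed into a clearly marked placeholder combination; the coefficient values, the initial interval G0 and the final combining function are assumptions, not the patent's formulas.

```python
import statistics

# hypothetical per-node heartbeat interval coefficients F_i (the patent derives these
# from per-transmission rates and durations; that formula is not reproduced here)
coefficients = {"node-1": 1.10, "node-2": 0.95, "node-3": 1.30}
# pairs of nodes that actually exchange data with each other
connections = [("node-1", "node-2"), ("node-2", "node-3")]

G0 = 5.0  # preset initial heartbeat sending interval, in seconds (assumed)

values = list(coefficients.values())
Fz = statistics.median(values)           # median of the coefficients
Fb = statistics.pstdev(values)           # standard deviation of the coefficients
pair_diffs = [abs(coefficients[a] - coefficients[b]) for a, b in connections]
Fcmax = max(pair_diffs)                  # max pairwise difference on connections
Fcb = statistics.pstdev(pair_diffs)      # std. deviation of those differences


def heartbeat_interval(G0, Fz, Fcmax, Fb, Fcb):
    """Placeholder combination of the quantities above -- NOT the patent's formula;
    it only shows that a larger spread between node coefficients shortens the interval."""
    spread = Fb + Fcb + Fcmax
    return G0 * Fz / (1.0 + spread)


print(round(heartbeat_interval(G0, Fz, Fcmax, Fb, Fcb), 2))
```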
Specifically, the method for monitoring the node survival rate and the node revival rate corresponding to each heartbeat packet sending in real time, and adjusting the heartbeat packet sending time interval according to the node survival rate and the node revival rate includes:
Extracting node survival rate and node revival rate which are correspondingly obtained by sending heartbeat packets each time;
comparing the node survival rate with a preset node survival rate threshold;
When the node survival rate is lower than a preset node survival rate threshold value, the node survival rate and the node revival rate are utilized to adjust the heartbeat packet sending time interval, and the adjusted heartbeat packet sending time interval is obtained;
the adjusted heartbeat packet sending time interval is obtained through the following formula:
wherein Gt represents the adjusted heartbeat packet sending time interval, G0 represents a preset initial heartbeat packet sending time interval, k represents the number of heartbeat packet transmissions before the node survival rate falls below the preset node survival rate threshold, Pc represents the node survival rate at the time it falls below the preset node survival rate threshold, Pf represents the node revival rate at the time the node survival rate falls below the preset node survival rate threshold, Pci represents the node survival rate corresponding to the ith heartbeat packet transmission, Pfi represents the node revival rate corresponding to the ith heartbeat packet transmission, and Pcy represents the preset node survival rate threshold;
and sending the heartbeat packet to each node according to the adjusted heartbeat packet sending time interval.
The technical effect of the technical scheme is that the abnormal conditions in the network, such as node faults or unstable network, can be timely found through monitoring the node survival rate and the node reviving rate in real time. When the node survival rate is lower than a preset threshold value, the node state can be monitored more frequently by adjusting the heartbeat packet sending time interval, so that measures are taken in time to recover the network connectivity, and the reliability of network communication is improved. Under the conditions of stable network and higher node survival rate, the network resource can be saved by reducing the sending frequency of the heartbeat packet. When the node survival rate is reduced, the node fault can be detected more quickly by increasing the sending frequency of the heartbeat packet, and the resource waste on invalid communication is avoided. Such dynamic adjustment mechanisms help optimize the utilization of network resources. According to the scheme, the heartbeat packet sending time interval is dynamically adjusted according to the actual survival rate and the revival rate of the nodes, so that the self-adaptability of the system is embodied. This adaptation allows the system to automatically adjust policies to changes in the network environment, thereby providing more flexibility in coping with various network conditions. By timely detecting and processing node faults, the scheme is beneficial to reducing network interruption and delay and improving user experience. Such optimization is particularly important in application scenarios with high real-time requirements, such as online games, video conferences, etc. By automatically adjusting the heartbeat packet sending time interval, the scheme can reduce the frequency of manual intervention and reduce the cost of network maintenance. Meanwhile, the network problems can be found and processed in time, so that downtime and loss caused by network faults can be reduced.
In summary, by dynamically adjusting the heartbeat packet sending time interval, the technical scheme improves the reliability of network communication, optimizes the resource utilization, enhances the self-adaptability of the system, improves the user experience and reduces the maintenance cost. The technical effects enable the scheme to have wide application prospects in the field of network communication.
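Since the adjustment formula is likewise not reproduced in the text, the sketch below only demonstrates the described behaviour: once the latest node survival rate falls below the threshold, the interval is shortened so that nodes are monitored more frequently. The threshold value, the history layout and the scaling rule are stand-in assumptions.

```python
def adjust_heartbeat_interval(G0, history, Pcy=0.9):
    """Shorten the heartbeat interval once the node survival rate drops below the
    threshold Pcy. 'history' is a list of (survival_rate, revival_rate) pairs, one
    per heartbeat round; the scaling rule is a placeholder, not the patent's formula."""
    if not history:
        return G0
    Pc, Pf = history[-1]                 # latest survival / revival rates
    if Pc >= Pcy:
        return G0                        # network healthy: keep the initial interval
    # lower survival and lower revival rates both argue for more frequent heartbeats
    factor = max(0.1, (Pc / Pcy) * (0.5 + 0.5 * Pf))
    return G0 * factor


# hypothetical monitoring history: the last round fell below the 0.9 threshold
rounds = [(0.98, 1.0), (0.95, 0.8), (0.82, 0.4)]
print(adjust_heartbeat_interval(5.0, rounds))   # shorter than the initial 5.0 s
```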
Aiming at the data mapping of the power calculation task allocation data and the power calculation node connection data in the S5, the network layer fusion is carried out after the data mapping is completed, and the method comprises the following steps:
Confirming key fields in the computing power task allocation data and the computing power node connection data, wherein the key fields of the computing power task allocation data comprise task IDs, allocated node IDs, task types, required resources, priorities and data dependencies;
Invoking a mapping rule from a database, and carrying out data mapping on the computing task allocation data and the computing node connection data according to the mapping rule;
the data mapping flow comprises extracting task information from the computing power task allocation data according to the task ID and the allocated node ID, finding out corresponding node data from the computing power node connection data, and comparing;
When the comparison result meets the mapping rule, mapping the task and the node, and obtaining task mapping data after the mapping is completed;
and when the comparison result does not meet the mapping rule, retrying the mapping with a standby node or re-evaluating the task allocation strategy until the comparison result meets the mapping rule.
The task mapping data are subjected to network layer fusion; before the network layer fusion is carried out, the task mapping data are arranged into a fusion data set, which comprises extracting the relevant information of each task from the mapped task mapping data and extracting the latest state and resource utilization condition of each node from the computing power node connection data;
Updating node data according to task execution states in the fusion data set, wherein the node data comprises the task number, the current load and available resources allocated to each node, and when the load of the node exceeds a preset range, the node data is dynamically adjusted;
and carrying out data set integration on the data after node data updating or dynamic adjustment, and obtaining standard calculation task allocation data after the data set integration.
Specifically, by defining key fields in the computing task allocation data and the computing node connection data, accurate mapping between the data is realized. This helps ensure that tasks are properly assigned to nodes with the proper resources and status. After the data mapping is completed, network layer fusion is carried out, which is helpful for realizing more efficient task execution and resource utilization in calculation power distribution, and the adjustability of the mapping rule enables the system to be customized according to actual requirements, thereby improving the flexibility and accuracy of task distribution. When the comparison result does not meet the mapping rule, the system can re-try the mapping task or re-evaluate the task allocation strategy from the standby node, so that the adaptability and the robustness of the system are enhanced, the fusion data set contains the task information and the latest data of the node state, and the system is helped to grasp the network state in real time, so that a more intelligent decision is made. When the node load exceeds the preset range, the system dynamically adjusts, so that overload and performance degradation are prevented, the resource utilization rate is improved, and standard calculation task allocation data can be generated by integrating the task mapping data and the node data, so that more efficient calculation allocation is realized. The optimized calculation force distribution can reduce resource waste and improve task execution efficiency, thereby meeting the calculation force demands of more users.
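The sketch below walks through the mapping and fusion flow under a simple assumed mapping rule (the assigned node must be alive and its available resources must cover the task); the field names mirror the key fields listed above, while the standby-node fallback and the load threshold for dynamic adjustment are illustrative assumptions.

```python
OVERLOAD_THRESHOLD = 0.8   # assumed preset range for dynamic adjustment


def mapping_rule(task, node):
    """Assumed rule: the node is alive and its available resources cover the task."""
    return (node["state"] == "alive"
            and node["available_resources"] >= task["required_resources"])


def map_tasks(task_allocation_data, node_connection_data, standby_nodes):
    """Map each task to its assigned node; fall back to a standby node if the rule fails."""
    task_mapping = []
    for task in task_allocation_data:
        node = node_connection_data[task["assigned_node_id"]]
        if not mapping_rule(task, node):
            # retry against standby nodes before re-evaluating the allocation strategy
            node = next((node_connection_data[n] for n in standby_nodes
                         if mapping_rule(task, node_connection_data[n])), None)
        if node is not None:
            task_mapping.append({"task_id": task["task_id"], "node_id": node["node_id"]})
    return task_mapping


def fuse(task_mapping, node_connection_data):
    """Build the fused view: task count, current load and available resources per node."""
    fused = {nid: {"tasks": 0, "load": n["current_load"],
                   "available": n["available_resources"]}
             for nid, n in node_connection_data.items()}
    for entry in task_mapping:
        fused[entry["node_id"]]["tasks"] += 1
    for info in fused.values():
        info["needs_adjustment"] = info["load"] > OVERLOAD_THRESHOLD
    return fused


tasks = [{"task_id": "t1", "assigned_node_id": "node-1", "required_resources": 4}]
nodes = {"node-1": {"node_id": "node-1", "state": "alive",
                    "available_resources": 6, "current_load": 0.55},
         "node-2": {"node_id": "node-2", "state": "alive",
                    "available_resources": 8, "current_load": 0.30}}
mapping = map_tasks(tasks, nodes, standby_nodes=["node-2"])
print(fuse(mapping, nodes))
```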
In order to solve the problem that in the prior art, when a receiving task performs task implementation, real-time monitoring is not performed on an implementation process, so that abnormal data in the implementation process cannot be quickly pre-warned, referring to fig. 1 and 2, the embodiment provides the following technical scheme:
A heterogeneous computing force fusion and dynamic optimization distribution system comprising:
A task execution monitoring unit for:
when the standard calculation task allocation data is used for carrying out task implementation, the standard calculation task allocation data is monitored in real time;
Before real-time monitoring is carried out on the standard calculation task allocation data, monitoring indexes are confirmed, wherein the monitoring indexes comprise task states, execution progress, resource use conditions, node loads and abnormal events;
In the process of carrying out tasks by standard calculation task allocation data, monitoring the monitoring index of each node in real time;
real-time data analysis of standard calculation task allocation data is carried out according to the monitoring index of each node;
performing an early warning prompt according to the analyzed real-time data condition, wherein the early warning prompt is triggered when the real-time data exceeds a preset threshold range, and the real-time data exceeding the preset range is marked as early warning data;
an adjustment report is generated from the early warning data according to the degree of abnormality;
and finally, transmitting the early warning data and the adjustment report to a display terminal for visual display.
Specifically, by monitoring the standard calculation task distribution data in real time, the system can acquire key information such as task state, execution progress, resource use condition and the like in real time, so that effective management and scheduling of calculation resources are realized. The early warning mechanism is beneficial to identifying and coping with possible risks in advance, preventing the problem from being enlarged, generating early warning data and adjustment reports, providing timely decision support for management staff, helping the management staff to take measures quickly, reducing the risks, and enabling the early warning data and the adjustment reports to be visually displayed through the display terminal, so that the management staff can intuitively know the running state of the system and the existing problems. The visual display mode not only improves the readability of information, but also reduces the understanding and operation difficulty, so that a manager can make decisions faster, and can conduct real-time data analysis on the monitoring index of each node, which is beneficial to realizing finer calculation power resource management. By analyzing the data, the problems of unbalanced resource use, excessive node load and the like can be found, so that powerful support is provided for optimizing calculation power distribution and improving resource utilization rate, and a real-time monitoring and early warning mechanism is helpful for timely finding and repairing faults and anomalies in the system, so that the stability and reliability of the system are improved. The mechanism can also reduce the loss caused by system breakdown or data loss and the like, and ensure the continuity of service and the integrity of data.
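A brief sketch of the task execution monitoring unit's early-warning check follows; the indicator names echo the monitoring indexes listed above, while the preset ranges, the abnormality grading and the report layout passed to the display terminal are hypothetical.

```python
# assumed preset ranges for each monitoring indicator (lower bound, upper bound)
THRESHOLDS = {"node_load": (0.0, 0.85), "memory_percent": (0.0, 90.0),
              "task_progress": (0.0, 1.0)}


def check_indicators(node_id, indicators):
    """Return early-warning records for every indicator outside its preset range."""
    warnings = []
    for name, value in indicators.items():
        low, high = THRESHOLDS.get(name, (float("-inf"), float("inf")))
        if not low <= value <= high:
            excess = value - high if value > high else low - value
            severity = "critical" if excess > 0.2 * (high - low or 1) else "warning"
            warnings.append({"node": node_id, "indicator": name,
                             "value": value, "severity": severity})
    return warnings


def build_adjustment_report(warnings):
    """Group the early-warning data by severity, as input for the display terminal."""
    report = {"critical": [], "warning": []}
    for w in warnings:
        report[w["severity"]].append(w)
    return report


snapshot = {"node-1": {"node_load": 0.95, "memory_percent": 72.0, "task_progress": 0.4}}
all_warnings = [w for nid, ind in snapshot.items() for w in check_indicators(nid, ind)]
print(build_adjustment_report(all_warnings))   # would be sent to the display terminal
```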
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made hereto without departing from the spirit and principles of the present invention.