Disclosure of Invention
The invention aims to provide an internet-based data service system for solving the defects in the background technology.
In order to achieve the above purpose, the invention provides the following technical scheme that the data service system based on the Internet comprises a data acquisition module, a network resource management module, a priority perception task scheduling module, a cloud processing and storage module, a comprehensive analysis module and an optimization module;
The data acquisition module is used for collecting original data from various distributed sensors in real time;
The network resource management module is used for monitoring and optimizing the data transmission path and carrying out real-time path adjustment by calculating the dynamic congestion degree;
the priority perception task scheduling module is used for dynamically adjusting the response priority of the task according to the importance of the task and evaluating the real-time performance of task execution by calculating service response priority deviation;
the cloud processing and storage module comprises a distributed storage system and a high-performance computing platform and is used for processing and storing the uploaded dynamic congestion degree data and response priority offset data;
The comprehensive analysis module is used for comprehensively analyzing the dynamic congestion degree and the service response priority deviation to determine the influence severity of the network congestion on the task response delay;
And the optimization module is used for: if the severity is high, immediately switching the transmission of high-priority tasks to a low-DCL path, preferentially inserting high-priority tasks into the task scheduling queue, and deferring the processing of low-priority tasks; if the severity is low, reducing the data transmission rate of low-priority tasks and raising the position of high-priority tasks in the queue.
Preferably, in the network resource management module, n paths are set and the network performance of each path is described by k indexes, forming a data matrix X, where X is an n×k matrix representing the k network performance indexes of the n paths and X_nk represents the kth index value of the nth path. Each column of data is normalized to zero mean and unit variance, Z_ij = (X_ij - μ_j)/σ_j, where μ_j is the mean of the jth index, σ_j is the standard deviation of the jth index, and Z is the normalized data matrix. The covariance matrix Σ = (1/(n-1)) ZᵀZ describes the correlation between different indexes; Σ is a k×k matrix whose element Σ_ij represents the covariance of the ith index and the jth index. Eigenvalue decomposition Σ v_i = λ_i v_i is carried out on the covariance matrix, where λ_i is the ith eigenvalue, representing the explained-variance proportion of the corresponding principal component, and v_i is the ith eigenvector, representing the direction of the ith principal component. With the eigenvalues arranged from large to small, the eigenvector v_1 corresponding to the largest eigenvalue is selected as the first principal component, and the dynamic congestion degree of path i is calculated as DCL_i = ∑_{j=1}^{k} v_1j · Z_ij, where DCL is the dynamic congestion level and v_1j is the jth component of the eigenvector corresponding to the first principal component.
Preferably, in the network resource management module, a DCL threshold is set and paths are classified according to the calculation result: a low congestion path has DCL ≤ 0.5, a medium congestion path has 0.5 < DCL ≤ 0.8, and a high congestion path has DCL > 0.8. Real-time DCL data of all available paths in the system is acquired, the DCL values are compared, and the path with the lowest DCL is preferentially selected as the substitute. High-priority tasks must be allocated to a low congestion path; if no low congestion path exists, a standby network is enabled. Medium/low priority tasks are allocated to a medium congestion path or a suboptimal path, current limiting is applied to high congestion paths, and bandwidth resources are gradually released.
Preferably, in the priority-aware task scheduling module, after the service response priority offset condition is analyzed, a response priority offset index is generated and the real-time performance of task execution is evaluated; the response priority offset index is obtained as follows:
The actual response time is set as T_actual,i, representing the actual completion time of task i, and the expected response time is set as T_expected,i, representing the target time by which the task should complete within the specified time. The service response offset value, calculated from the task response times, is SRPD_i = T_actual,i - T_expected,i. The allocation rule of weight W for priority P is: P = 3 (high priority), w = 0.6; P = 2 (medium priority), w = 0.3; P = 1 (low priority), w = 0.1. The weight sum of the tasks is W_total = ∑_{i=1}^{n} w_i, where w_i is the priority weight of task i and n is the total number of tasks. The weighted offset values of all tasks are summed and divided by the weight sum to obtain the response priority offset index: SRPDI = (∑_{i=1}^{n} w_i · SRPD_i) / W_total, where SRPDI is the response priority offset index.
Preferably, in the comprehensive analysis module, comprehensive analysis is performed by combining the dynamic congestion degree and the service response priority offset, and the influence severity of the network congestion on the task response delay is determined, which specifically includes:
The dynamic congestion degree and the response priority offset index are converted into a comprehensive feature vector, which serves as the input of a machine learning model. The prediction target of the model is, for each group of comprehensive feature vectors, the label of the severity value of network congestion's influence on task response delay; the training target is to minimize the sum of prediction errors over these severity labels. The machine learning model is trained until the sum of prediction errors converges, at which point training stops, and the severity value of network congestion's influence on task response delay is determined from the model output. The machine learning model is a polynomial regression model.
Preferably, in the optimization module, the obtained severity value of network congestion's influence on task response delay is compared with a severity reference threshold set according to historical data. If the severity value is greater than or equal to the threshold, the influence of network congestion on task response delay is severe: an early-warning signal is generated, the transmission of high-priority tasks is immediately switched to a low-DCL path, and high-priority tasks are preferentially inserted into the task scheduling queue. If the severity value is smaller than the threshold, the influence is mild: no early-warning signal is generated, the data transmission rate of low-priority tasks is reduced, and the position of high-priority tasks in the queue is raised.
Preferably, the transmission of the high-priority task is switched to the low DCL path, and the high-priority task is preferentially inserted into the task scheduling queue, specifically:
The transmission path of the high-priority task is switched to the path with the lowest dynamic congestion degree DCL to minimize transmission delay, according to the formula P_opt = argmin_{p∈P} DCL(p), where P_opt is the optimal transmission path, P is the set of all available paths, and DCL(p) is the dynamic congestion degree of path p. The DCL(p) values of all paths in the current path set P are collected in real time, the path P_opt with the lowest dynamic congestion degree is selected, and the data flow of the high-priority task T_high is transferred onto P_opt.
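By way of illustration only (not part of the claimed system; the path-to-DCL mapping and function names are assumptions), the argmin selection and flow transfer described above can be sketched as:

```python
def select_optimal_path(path_dcls):
    """P_opt = argmin over p in P of DCL(p): return the path whose
    dynamic congestion level is lowest. path_dcls maps path name -> DCL."""
    return min(path_dcls, key=path_dcls.get)

def reroute_high_priority(flows, task, path_dcls):
    """Transfer the data flow of the high-priority task onto the
    selected optimal path P_opt (flows maps task -> assigned path)."""
    flows[task] = select_optimal_path(path_dcls)
    return flows
```

For example, with DCL values {p1: 0.7, p2: 0.2, p3: 0.5}, the high-priority flow would be moved onto p2.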
A high-priority task T_high is inserted into the priority position of the task scheduling queue Q to ensure its real-time execution, according to the formula Q = insert(Q, T_high, priority), where Q is the task scheduling queue, T_high is the high-priority task to be inserted, and priority is the priority attribute used for ordering adjustment. The queue ordering rule is Sort(Q) = argmax_{T∈Q} priority(T), where priority(T) is the priority value of task T; the higher the value, the higher the task's priority. A priority weight is set for each task T, with w_high > w_low indicating that the weight of a high-priority task is greater than that of a low-priority task, where w_high and w_low are the weights of high-priority and low-priority tasks respectively. Tasks are inserted into priority positions in the queue according to their priority, and the task queue Q is ordered from high to low by priority value.
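As an illustrative sketch (the class and method names are assumptions, not the patented implementation), the priority-ordered queue with highest-priority-first extraction can be realized with a max-heap, here emulated via Python's heapq with negated keys; an insertion counter keeps first-in-first-out order among equal priorities:

```python
import heapq

class PriorityTaskQueue:
    """Task scheduling queue ordered so that pop() always returns the
    task with the highest priority value (Sort(Q) = argmax priority(T))."""
    def __init__(self):
        self._heap = []
        self._counter = 0   # tie-breaker: FIFO among equal priorities

    def insert(self, task, priority):
        # Negate the priority so the min-heap behaves as a max-heap.
        heapq.heappush(self._heap, (-priority, self._counter, task))
        self._counter += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]
```

Inserting tasks with priorities 1, 3, and 2 and popping three times yields them in the order 3, 2, 1.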
Preferably, the data transmission rate of the low-priority task is reduced, and the ordering position of the high-priority task in the queue is improved, specifically:
The data transmission rate R_low of low-priority tasks is adjusted to guarantee the real-time performance of high-priority tasks. The new transmission rate of a low-priority task is R'_low = α · R_low, where R'_low is the adjusted low-priority data transmission rate, R_low is the low-priority data transmission rate before adjustment, and α is a rate reduction coefficient with value range 0 < α < 1. The transmission rate of low-priority tasks is dynamically adjusted according to this formula. By reordering the task scheduling queue Q, the high-priority task T_high is moved toward the front and is preferentially allocated computing resources and execution time; the scheduling queue reordering rule is Sort(Q) = argmax_{T∈Q} priority(T). A task priority function defines the priority weights w_high, w_mid, w_low for each task, the task queue Q is sorted by priority value so that high-priority tasks come first, and a new task scheduling queue is generated.
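The rate adjustment R'_low = α · R_low and the weight-based reordering may be sketched as follows (an illustrative example; the weight values 0.6/0.3/0.1 follow the priority-weight rule stated earlier, while the function names are assumptions):

```python
def throttle_low_priority(rate, alpha):
    """R'_low = alpha * R_low, with the rate reduction coefficient
    constrained to 0 < alpha < 1."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    return alpha * rate

def reorder_queue(queue, weights):
    """Sort (task, priority_level) pairs by priority weight, highest
    first, so high-priority tasks move to the front of the queue."""
    return sorted(queue, key=lambda t: weights[t[1]], reverse=True)
```

For instance, with α = 0.5 a low-priority flow at 100 units drops to 50, and a queue [(a, low), (b, high), (c, mid)] reorders to [(b, high), (c, mid), (a, low)].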
In the technical scheme, the invention has the technical effects and advantages that:
1. The invention realizes real-time acquisition of multi-source data through the data acquisition module. The network resource management module performs real-time path adjustment based on the dynamic congestion level (DCL); the priority-aware task scheduling module dynamically adjusts task response priority according to task importance and evaluates real-time performance; the cloud processing and storage module adopts distributed storage and high-performance computing to ensure efficient processing and reliable storage of data; the comprehensive analysis module combines the DCL and the response priority offset index to analyze the severity of network congestion's influence on task delay; and the optimization module dynamically adjusts transmission paths and the task scheduling strategy according to the analysis result, improving the stability and real-time performance of the system under high-concurrency scenarios.
2. The invention obviously reduces the transmission delay and the data loss risk caused by network flow overload, ensures the timely response of the high-priority task, and optimizes the utilization rate of system resources and the fairness of task scheduling. The system can dynamically adapt to the change of network environment, effectively cope with peak load, and promote overall performance through multi-level data optimization strategies, thereby providing reliable and extensible technical support for the fields of smart cities, industrial Internet and the like, and having extremely high practical value and popularization prospect.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
An embodiment, referring to fig. 1, the data service system based on the internet in this embodiment includes a data acquisition module, a network resource management module, a priority aware task scheduling module, a cloud processing and storage module, a comprehensive analysis module and an optimization module;
The data acquisition module is used for collecting original data from various distributed sensors in real time;
The network resource management module is used for monitoring and optimizing the data transmission path and carrying out real-time path adjustment by calculating the dynamic congestion degree;
the priority perception task scheduling module is used for dynamically adjusting the response priority of the task according to the importance of the task and evaluating the real-time performance of task execution by calculating service response priority deviation;
the cloud processing and storage module comprises a distributed storage system and a high-performance computing platform and is used for processing and storing the uploaded dynamic congestion degree data and response priority offset data;
The comprehensive analysis module is used for comprehensively analyzing the dynamic congestion degree and the service response priority deviation to determine the influence severity of the network congestion on the task response delay;
And the optimization module is used for: if the severity is high, immediately switching the transmission of high-priority tasks to a low-DCL path, preferentially inserting high-priority tasks into the task scheduling queue, and deferring the processing of low-priority tasks; if the severity is low, reducing the data transmission rate of low-priority tasks and raising the position of high-priority tasks in the queue.
In the data acquisition module, the sensor interface unit communicates with the distributed sensors, supporting different types of hardware interfaces and communication protocols. The supported protocols include lightweight protocols MQTT and CoAP (suitable for low-power devices); conventional network protocols HTTP/HTTPS (for general internet devices); and industrial internet protocols Modbus, ZigBee, and LoRa (suitable for industrial environments and long-distance, low-power communication). The adapter has a built-in protocol converter to ensure that data from different devices and protocols can be parsed by the system in a standardized format.
The data filtering unit performs preliminary screening on the collected original data, removes redundant data or obvious error data, and improves data quality. The method comprises data integrity check (such as checksum verification). Rule-based filtering (e.g., removing invalid data packets, noisy data). Preliminary validity detection (e.g., numerical range checking).
And the time synchronization unit ensures that the data collected from the distributed sensor has consistent time stamps and supports time sequence analysis and comparison of the data. The technology is realized by synchronizing with a sensor based on NTP (network time protocol). And a local time stamp is added during data transmission, so that the time consistency of the uploaded data is ensured.
The data coding unit performs standardized processing on the sensor data in different formats, and is convenient for subsequent transmission and analysis. JSON, protobuf (lightweight, cross-platform support). Custom binary format (suitable for efficient communication scenarios).
The data security unit encrypts the data acquisition and transmission process, and prevents the data from being leaked or tampered in the transmission process. The technology is realized by using TLS protocol for data link encryption. Data signing and verification (e.g., HMAC-based integrity verification).
The module supports simultaneous connection of multiple types of sensors (such as traffic flow sensors, air quality monitoring equipment, and energy consumption meters), and the data streams from the sensors are combined to generate a unified real-time data stream. A millisecond-level data acquisition period satisfies high-real-time scenarios (such as traffic light optimization), and the acquisition frequency can be adjusted as needed to suit different scenarios (such as high-frequency traffic data and low-frequency environmental data).
After data are collected, partial non-core data are locally filtered and compressed, so that the data volume uploaded to the cloud is reduced, and the transmission efficiency is improved. Examples are extracting metadata of a traffic video stream (e.g., vehicle count) without uploading a complete video.
The data acquisition module establishes communication with the distributed sensors through the sensor interface unit and receives the real-time data packets sent by the sensors. The data filtering unit checks the data packets and filters out noise and invalid data, ensuring that uploaded data is complete, accurate, and meaningful. A uniform time stamp is added to the data to ensure time-series consistency of the multi-source data; the data is encoded into a standard format (e.g., JSON) for subsequent transmission and processing; the data is encrypted and integrity verification information is appended; and the encrypted data is uploaded to the network layer.
The synchronous connection and data acquisition of millions of sensors are realized through an asynchronous I/O technology (such as libuv and Netty), and the performance of the system under a high concurrency scene is ensured. When the sensor connection is interrupted, the module supports automatic reconnection and resumes data acquisition. When the network is unstable, data is temporarily cached and uploaded after the network is recovered, so that data loss is avoided.
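As a simplified illustration of the acquire-stamp-encode-upload pipeline with buffering on network failure (not the production implementation; the coroutine signature, callback names, and back-off interval are assumptions), asynchronous per-sensor acquisition might look like:

```python
import asyncio
import json
import time

async def acquire(sensor_id, read, upload, buffer):
    """Acquire one reading: read from the sensor (with one automatic
    reconnect attempt), attach a local timestamp, encode to JSON, and
    upload; on upload failure the record is cached locally so it can
    be re-sent once the network recovers, avoiding data loss."""
    try:
        value = await read(sensor_id)
    except ConnectionError:
        await asyncio.sleep(0.01)          # back off, then reconnect once
        value = await read(sensor_id)
    record = json.dumps({"sensor": sensor_id,
                         "value": value,
                         "ts": time.time()})  # local timestamp
    try:
        await upload(record)
    except ConnectionError:
        buffer.append(record)              # hold until the network recovers
```

In practice one such coroutine would be scheduled per sensor connection on an event loop, which is what gives asynchronous I/O its high-concurrency headroom.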
And the network resource management module is used for monitoring and optimizing the data transmission path and carrying out real-time path adjustment by calculating the dynamic congestion degree.
The calculation of the dynamic congestion degree can quantify the load condition of the network path in real time by comprehensively analyzing network performance indexes (such as flow utilization rate, time delay, packet loss rate and the like). Through the unified measurement value, the network resource management module can rapidly identify a high-load path, and avoid transmission delay and data loss caused by path congestion, thereby improving the transmission efficiency and response speed of the system. In addition, the dynamic congestion degree provides data support for the optimization of multipath transmission, so that the system can intelligently select the optimal path from a plurality of available paths, and the risks of network interruption and service quality degradation are effectively reduced.
The calculation of the dynamic congestion degree is particularly critical in a large-scale concurrency scene, and provides a quantitative basis for real-time flow regulation and resource allocation decisions. For example, in an intelligent traffic system, by dynamically adjusting the transmission path of high priority accident data by means of dynamic congestion, it is possible to ensure that critical data is transmitted preferentially on a low congestion path, reducing delay. Meanwhile, the dynamic congestion degree can be used for monitoring the long-term running state of the network, and by analyzing the congestion trend, an operator is helped to identify potential bottlenecks, the network architecture is optimized, and the service quality and reliability of the whole network are improved.
N paths are set, and the network performance of each path is described by k indexes to form a data matrix X, where X is an n×k matrix representing the k network performance indexes of the n paths, and X_nk represents the kth index value of the nth path. Common indexes include x1, the link traffic utilization; x2, the link delay ratio; and x3, the packet loss rate P.
To eliminate the effects of different index dimensions, each column of data is normalized to zero mean and unit variance, Z_ij = (X_ij - μ_j)/σ_j, where μ_j is the mean of the jth index, σ_j is the standard deviation of the jth index, and Z is the normalized data matrix. The covariance matrix Σ = (1/(n-1)) ZᵀZ describes the correlation between different indexes; Σ is a k×k matrix whose element Σ_ij represents the covariance of the ith index and the jth index. Eigenvalue decomposition Σ v_i = λ_i v_i is carried out on the covariance matrix, where λ_i is the ith eigenvalue, representing the explained-variance proportion of the corresponding principal component, and v_i is the ith eigenvector, representing the direction of the ith principal component. With the eigenvalues arranged from large to small, the eigenvector v_1 corresponding to the largest eigenvalue is selected as the first principal component, and the dynamic congestion degree of path i is calculated as DCL_i = ∑_{j=1}^{k} v_1j · Z_ij, where DCL is the dynamic congestion level and v_1j is the jth component of the eigenvector corresponding to the first principal component.
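As an illustrative sketch (not part of the claimed system), the standardization, eigenvalue decomposition, and projection onto the first principal component can be implemented with NumPy as follows; the sign orientation of v_1 and the final min-max rescaling are added assumptions so that larger scores mean more congestion and the DCL thresholds of 0.5 and 0.8 remain applicable:

```python
import numpy as np

def dynamic_congestion_level(X):
    """Per-path dynamic congestion level (DCL) via PCA.
    X: n x k matrix; row i holds the k performance indexes of path i
    (e.g. link utilization, delay ratio, packet loss rate).
    Returns n DCL scores scaled into [0, 1]."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0              # guard against constant columns
    Z = (X - mu) / sigma                 # zero mean, unit variance
    cov = np.cov(Z, rowvar=False)        # k x k covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    v1 = eigvecs[:, np.argmax(eigvals)]  # first principal component
    if v1.sum() < 0:                     # orient so higher indexes -> higher DCL
        v1 = -v1
    scores = Z @ v1                      # DCL_i = sum_j v1_j * Z_ij
    lo, hi = scores.min(), scores.max()  # min-max rescale (assumption)
    return (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)
```

With three paths whose utilization, delay, and loss all worsen monotonically, the worst path scores 1.0 and the best 0.0.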
A DCL threshold is set, and paths are classified according to the calculation result: a low congestion path has DCL ≤ 0.5, a medium congestion path has 0.5 < DCL ≤ 0.8, and a high congestion path has DCL > 0.8;
Real-time DCL data of all available paths in the system is acquired. The DCL values are compared, and the path with the lowest DCL is preferentially selected as the alternative. High-priority tasks must be assigned to low congestion paths; if there is no low congestion path, a standby network (e.g., a dedicated communication channel) may be enabled. Medium/low priority tasks are assigned to medium congestion paths or sub-optimal paths. Current limiting is applied to the high congestion paths, gradually releasing bandwidth resources.
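The classification and assignment policy above can be sketched as follows (an illustrative example assuming the DCL thresholds 0.5 and 0.8 and a path-to-DCL mapping; function names are hypothetical):

```python
def classify_path(dcl):
    """Map a DCL value to a congestion class per the stated thresholds."""
    if dcl <= 0.5:
        return "low"
    if dcl <= 0.8:
        return "medium"
    return "high"

def assign_path(task_priority, paths):
    """paths: dict of path name -> DCL. High-priority tasks require a
    low-congestion path, else the standby network; medium/low tasks
    take the least-congested non-high path available."""
    ranked = sorted(paths.items(), key=lambda kv: kv[1])  # lowest DCL first
    if task_priority == "high":
        for name, dcl in ranked:
            if classify_path(dcl) == "low":
                return name
        return "standby-network"       # no low-congestion path exists
    for name, dcl in ranked:
        if classify_path(dcl) != "high":
            return name                # medium-congestion or sub-optimal
    return ranked[0][0]                # everything congested: least bad
```

For example, given paths {A: 0.9, B: 0.6, C: 0.3}, a high-priority task lands on C; if only medium/high paths remain, it falls back to the standby network.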
The routing table is modified using a network controller (such as an SDN controller) to switch task flows from the high congestion path to the low congestion path, i.e., the forwarding table entry of the target path is modified so that traffic is forwarded along the new path. High-priority data packets are processed preferentially to ensure the real-time performance of delay-sensitive data. The remaining data flows are split: high-priority tasks are assigned to the backup path or a low congestion path, while low-priority tasks employ a multipath transmission strategy that spreads data across multiple paths. A load balancing algorithm (such as weighted round-robin or least connections) dynamically adjusts the data flow allocation proportion of each path.
Dedicated bandwidth is reserved for tasks of different priorities: more bandwidth is reserved for high-priority tasks to avoid preemption by low-priority traffic, while low-priority tasks are limited in maximum bandwidth occupation to reduce their impact on path resources. A flow control algorithm (such as token bucket or leaky bucket) controls the transmission rate on the high congestion path to gradually relieve the congestion state. After the adjustment is completed, the DCL data of the path is continuously monitored to verify the congestion relief effect. If the desired effect is not achieved after path switching, the other alternative paths are evaluated again, and the path switching frequency and traffic distribution strategy are dynamically adjusted. Adjustment records are stored in the system log for subsequent training of the machine learning model, improving the path optimization strategy.
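A token bucket, one of the flow control algorithms named above, can be sketched as follows (an illustrative implementation; the class interface and the explicit clock parameter are assumptions made for testability):

```python
class TokenBucket:
    """Token-bucket limiter for a high-congestion path: tokens refill
    at `rate` per second up to `capacity`; a packet costing `size`
    tokens is admitted only if enough tokens have accumulated."""
    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start full
        self.last = now

    def allow(self, size, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```

With rate 10 and capacity 10, two back-to-back 8-token packets cannot both pass at t = 0, but after one second of refill the second packet is admitted.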
And the priority perception task scheduling module is used for dynamically adjusting the response priority of the task according to the importance of the task and evaluating the real-time performance of task execution through calculating service response priority deviation.
According to importance and timeliness, tasks are divided into three classes: high-priority tasks that require real-time response (e.g., emergency handling, disaster early warning); medium-priority tasks for regular monitoring data (e.g., traffic flow monitoring); and low-priority tasks that are periodic or batch (e.g., environmental data upload). Task priority is adjusted in real time according to system load and task characteristics.
The deviation between the actual response time and the expected response time is quantified: a small deviation indicates that task scheduling is normal and the response meets requirements, while a large deviation indicates that task processing may be affected by network or resource bottlenecks and needs optimization. System resources (bandwidth, computing resources, etc.) are dynamically allocated according to task priority, preferentially guaranteeing the execution of high-priority tasks; low-priority tasks undergo throttling, deferred processing, or batch scheduling.
Static priority scheduling: the task priority is determined when the task is submitted and remains unchanged during scheduling. Dynamic priority scheduling: the priority is adjusted in real time according to task response, deadline, or system load. Preemptive scheduling: a high-priority task can interrupt the execution of a low-priority task. Non-preemptive scheduling: an executing task is not interrupted, but high-priority tasks are preferentially allocated resources. The offset is calculated as SRPD = T_actual - T_expected, where T_actual is the actual response time of the task and T_expected is the expected response time of the task.
The resources are dynamically adjusted according to the priority allocation proportion, for example, 60% of resources with high priority, 30% of resources with medium priority and 10% of resources with low priority.
After analyzing the service response priority deviation condition, generating a response priority deviation index, and evaluating the real-time performance of task execution, wherein the response priority deviation index acquiring method comprises the following steps:
The actual response time is set as T_actual,i, representing the actual completion time of task i, and the expected response time is set as T_expected,i, representing the target time by which the task should complete within the prescribed time. The service response offset value is calculated from the task response times as SRPD_i = T_actual,i - T_expected,i. The higher the priority, the larger the task weight, reflecting the task's degree of influence on overall system performance; weights can be set manually for a specific scenario or adjusted dynamically. The allocation rule of weight W for priority P is: P = 3 (high priority), w = 0.6; P = 2 (medium priority), w = 0.3; P = 1 (low priority), w = 0.1. The weight sum of the tasks is W_total = ∑_{i=1}^{n} w_i, where w_i is the priority weight of task i, reflecting its importance, and n is the total number of tasks. The weighted offset values of all tasks are summed and divided by the weight sum to obtain the response priority offset index: SRPDI = (∑_{i=1}^{n} w_i · SRPD_i) / W_total, where SRPDI is the response priority offset index.
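The SRPDI computation can be written out directly (an illustrative sketch using the stated weight rule of 0.6/0.3/0.1; the function name and task representation are assumptions):

```python
WEIGHTS = {3: 0.6, 2: 0.3, 1: 0.1}   # priority P -> weight w

def srpdi(tasks):
    """tasks: list of (priority P, T_actual, T_expected) tuples.
    SRPD_i = T_actual,i - T_expected,i; the index is the weighted mean
    SRPDI = sum(w_i * SRPD_i) / sum(w_i)."""
    w_total = sum(WEIGHTS[p] for p, _, _ in tasks)
    weighted = sum(WEIGHTS[p] * (t_actual - t_expected)
                   for p, t_actual, t_expected in tasks)
    return weighted / w_total
```

For example, a high-priority task 2 units late, a medium-priority task on time, and a low-priority task 10 units late give SRPDI = (0.6·2 + 0.3·0 + 0.1·10) / 1.0 = 2.2.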
When the response priority offset index (SRPDI) is larger, the actual response times of tasks in the system generally deviate more from the expected times, and the response delay of high-priority tasks in particular increases significantly. This typically means poor real-time task execution, possibly due to network congestion, unreasonable resource allocation, or an imperfect task scheduling mechanism. A larger offset index may also reduce overall system efficiency; especially in scenarios where critical tasks account for a high proportion of the workload, the real-time problem can directly affect the stability and service quality of task execution.
When the response priority shift index (SRPDI) is smaller, the actual response time of the task in the system is closer to the expected time, and particularly, the high-priority task is processed in time. This indicates that the real-time performance of task execution is better, and the system scheduling mechanism can effectively cope with multi-task concurrence and resource competition. Meanwhile, the smaller deviation index reflects the rationality of the resource allocation strategy, the delay of the high-priority task is less, the delay of the low-priority task is also in a controllable range, and the high-efficiency and stability of the system operation are maintained.
The cloud processing and storage module comprises a distributed storage system and a high-performance computing platform and is used for processing and storing the uploaded dynamic congestion degree data and response priority offset data.
The distributed storage system stores dynamic congestion degree data, response priority offset data, and other related metadata, ensuring high availability, scalability, and durability of data. Multiple copies of each piece of data are stored on different nodes to prevent single points of failure. Distributed file systems such as HDFS (Hadoop Distributed File System) and Ceph manage mass data; key-value stores such as Apache Cassandra support fast storage and querying of dynamic index data (e.g., DCL and SRPD). Hot and cold data are stored in tiers: DCL and SRPD data required for real-time analysis reside in high-performance storage, while historical data is kept on low-cost storage devices.
The high-performance computing platform processes the uploaded data in real time, computes new dynamic indexes (such as aggregate DCL and SRPDI), and performs large-scale batch analysis (e.g., task delay trend prediction).
Streaming computing frameworks such as Apache Flink and Apache Kafka Streams are used to compute in real time the global average of the dynamic congestion level or the delay distribution of high-priority tasks. Parallel computing engines such as Apache Spark are used for batch analysis of large-scale task response offset data. Containerized deployment with Kubernetes enables dynamic scaling to meet concurrent processing requirements.
DCL and SRPDI data streams uploaded from the network resource management module and the priority-aware task scheduling module are received through message queues (e.g., Apache Kafka). Duplicate, erroneous or abnormal data is removed to ensure data quality. The global DCL distribution is computed and high-congestion paths are identified. The SRPDI is computed to evaluate the real-time performance of the system's overall tasks. The results are cached in a high-performance in-memory database (such as Redis) for other modules to call in real time.
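The global DCL aggregation and high-congestion path identification described above can be sketched as a small batch function; in production this logic would live inside a Flink or Kafka Streams job, but the aggregation itself reduces to per-path averaging against a congestion threshold. The threshold value and dictionary layout are illustrative assumptions.

```python
from statistics import mean

def analyze_dcl(samples, threshold):
    """samples: {path_id: [DCL readings]} drained from the message queue.
    Returns (global average DCL, set of high-congestion path ids)."""
    per_path = {p: mean(vals) for p, vals in samples.items() if vals}
    global_avg = mean(per_path.values()) if per_path else 0.0
    congested = {p for p, v in per_path.items() if v >= threshold}
    return global_avg, congested
```

The returned pair corresponds to the two results the text caches in Redis: the global DCL distribution summary and the list of paths flagged as congested.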
Combined with historical data, the long-term delay trend of high-priority tasks is analyzed, providing a basis for optimizing the resource allocation strategy. For index aggregation, the average DCL and SRPDI of each path are computed to evaluate overall system performance. The latest DCL and SRPDI data are written to a distributed database (e.g., Apache Cassandra or MongoDB). Historical data is transferred to low-cost storage (e.g., Amazon S3 or Google Cloud Storage) for long-term analysis and model training.
And the comprehensive analysis module is used for comprehensively analyzing the dynamic congestion degree and the service response priority deviation to determine the influence severity of the network congestion on the task response delay.
The dynamic congestion degree and the response priority deviation index are converted into comprehensive feature vectors, which serve as the input of a machine learning model. The prediction target of the model is the severity value label, attached to each group of comprehensive feature vectors, of the influence of network congestion on task response delay. The training target is to minimize the sum of prediction errors of these severity value labels; the model is trained until the sum of prediction errors converges, at which point training stops. The severity value of the influence of network congestion on task response delay is then determined from the model output. The machine learning model is a polynomial regression model.
The severity value of the influence of network congestion on task response delay is acquired as follows: from the comprehensive feature vector training data of the trained machine learning model, the corresponding function expression LQ = F(DCL, SRPDI) is obtained, where F is the output function of the model, DCL is the dynamic congestion degree, SRPDI is the response priority deviation index, and LQ is the severity value of the influence of network congestion on task response delay.
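Since the text names the model as polynomial regression, the trained output function F(DCL, SRPDI) can be pictured as a low-degree polynomial surface in the two indexes. The degree-2 form and every coefficient below are hypothetical stand-ins for values that would come out of the training described above; only the function shape is illustrated.

```python
# Hypothetical trained degree-2 polynomial: LQ = F(DCL, SRPDI).
# Coefficients are placeholders for values fitted during training.
COEF = {"bias": 0.1, "d": 0.5, "s": 0.8, "d2": 0.3, "s2": 0.4, "ds": 0.6}

def predict_lq(dcl, srpdi, c=COEF):
    """Evaluate the polynomial severity surface at (DCL, SRPDI)."""
    return (c["bias"] + c["d"] * dcl + c["s"] * srpdi
            + c["d2"] * dcl ** 2 + c["s2"] * srpdi ** 2
            + c["ds"] * dcl * srpdi)
```

Every term is monotonically non-decreasing in both indexes for non-negative inputs, so worsening congestion or growing response deviation can only raise the predicted severity, consistent with the module's purpose.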
And the optimizing module is used for immediately switching the transmission of the high-priority task to the low DCL path if the severity is high, preferentially inserting the high-priority task into the task scheduling queue, and carrying out post-processing on the low-priority task, and reducing the data transmission rate of the low-priority task and improving the sequencing position of the high-priority task in the queue if the severity is low.
The obtained severity value of the influence of network congestion on task response delay is compared with an influence severity reference threshold set according to historical data. If the severity value is greater than or equal to the reference threshold, the influence of network congestion on task response delay is high: an early warning signal is generated, the transmission of high-priority tasks is immediately switched to a low-DCL path, and the high-priority tasks are preferentially inserted into the task scheduling queue. If the severity value is smaller than the reference threshold, the influence is low: no early warning signal is generated, the data transmission rate of low-priority tasks is reduced, and the ordering position of high-priority tasks in the queue is raised.
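The two-branch decision just described is a plain threshold comparison; a minimal sketch follows, with the action names chosen here purely as illustrative labels for the measures listed in the text.

```python
def decide(lq, threshold):
    """Compare predicted severity LQ with the reference threshold and
    select the matching set of optimization actions (labels illustrative)."""
    if lq >= threshold:
        return {"warning": True,
                "actions": ["switch_high_priority_to_low_dcl_path",
                            "insert_high_priority_at_queue_front",
                            "defer_low_priority_tasks"]}
    return {"warning": False,
            "actions": ["throttle_low_priority_rate",
                        "promote_high_priority_in_queue"]}
```

Note the boundary case: a severity exactly equal to the threshold triggers the high-severity branch, matching the "greater than or equal to" wording above.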
Switching the transmission of the high-priority task to the low DCL path, and preferentially inserting the high-priority task into the task scheduling queue, specifically:
The transmission path of the high-priority task is switched to the path with the lowest dynamic congestion degree DCL to minimize transmission delay, according to the formula Popt = argmin_{P∈P} DCL(P), where Popt is the optimal transmission path, P is the set of all available paths, and DCL(P) is the dynamic congestion degree of path P.
The DCL(P) values of all paths in the current path set P are collected, the path Popt with the lowest dynamic congestion degree is selected according to this formula, and the data flow of the high-priority task Thigh is transferred to path Popt.
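The argmin selection of Popt over the collected DCL(P) values is a one-liner; the sketch below assumes the collected measurements arrive as a mapping from path identifier to DCL value.

```python
def select_path(dcl_by_path):
    """Popt = argmin over available paths P of DCL(P).
    dcl_by_path: {path_id: measured DCL value}; must be non-empty."""
    return min(dcl_by_path, key=dcl_by_path.get)
```

In practice the resulting Popt would then be handed to whatever routing mechanism carries the Thigh data flow; that hand-off is outside the scope of this sketch.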
The high-priority task Thigh is inserted into a priority position of the task scheduling queue Q to ensure its real-time execution, according to the formula Q = Insert(Q, Thigh, priority), where Q is the task scheduling queue, Thigh is the high-priority task to be inserted, and priority is the priority attribute used for ordering adjustment. The queue ordering rule is Sort(Q) = argmax_{T∈Q} priority(T), where priority(T) is the priority value of task T; the higher the value, the higher the task's priority. A priority weight priority(T) is defined and set for each task T, with whigh > wlow, meaning the weight whigh of a high-priority task is greater than the weight wlow of a low-priority task. Tasks are inserted into priority positions in the queue according to their priority, and the task queue Q is ordered from high to low priority value.
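The Insert-then-Sort behaviour above can be sketched with a weighted list; the concrete weight values whigh > wmid > wlow below are arbitrary placeholders, since the text fixes only their ordering, not their magnitudes.

```python
# Placeholder weights; the text requires only w_high > w_low.
W_HIGH, W_MID, W_LOW = 3, 2, 1

def insert_task(queue, task):
    """queue: list of (task_name, priority_weight) pairs.
    Insert task, then reorder so higher-weight tasks come first,
    implementing Q = Insert(Q, Thigh, priority) plus Sort(Q)."""
    queue.append(task)
    queue.sort(key=lambda t: t[1], reverse=True)
    return queue
```

Because Python's sort is stable, tasks of equal weight keep their arrival order, so a newly inserted Thigh lands ahead of all lower-weight tasks but behind earlier tasks of the same weight.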
Reducing the data transmission rate of low-priority tasks and raising the ordering position of high-priority tasks in the queue specifically comprises the following steps:
The data transmission rate Rlow of the low-priority task is adjusted to reserve more bandwidth resources for the high-priority task and guarantee its real-time performance. The new transmission rate of the low-priority task is Rlow_new = α · Rlow, where Rlow_new is the adjusted low-priority task data transmission rate, Rlow is the low-priority task data transmission rate before adjustment, and α is a rate reduction coefficient with value range 0 < α < 1 (α = 0.5 halves the transmission rate). The transmission rate of the low-priority task is adjusted dynamically according to this formula. By reordering the task scheduling queue Q and moving the high-priority task Thigh to a forward position, computing resources and execution time are allocated preferentially; the scheduling queue reordering rule is Sort(Q) = argmax_{T∈Q} priority(T). A task priority definition function assigns each task one of the priority weights whigh, wmid or wlow; the task queue Q is sorted by priority value so that high-priority tasks come first, generating a new task scheduling queue.
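The rate adjustment Rlow_new = α · Rlow is a direct scaling with a guarded coefficient range; a minimal sketch, with the default α = 0.5 taken from the example in the text:

```python
def throttle_low_priority(rate_low, alpha=0.5):
    """Rlow_new = alpha * Rlow, with 0 < alpha < 1 enforced.
    alpha = 0.5 halves the low-priority transmission rate."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("rate reduction coefficient alpha must be in (0, 1)")
    return alpha * rate_low
```

The bandwidth freed by throttling (1 - α) · Rlow becomes available to high-priority traffic; choosing α dynamically, e.g. smaller under heavier congestion, is left to the optimization module's policy.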
In this embodiment, the data acquisition module collects original data in real time from various distributed sensors and provides basic data support for task execution. The network resource management module monitors data transmission paths, calculates the dynamic congestion level (DCL), and adjusts transmission paths in real time to optimize network resource allocation. The priority-aware task scheduling module dynamically adjusts response priority according to the importance of tasks and evaluates their real-time performance by calculating the service response priority deviation index (SRPDI). The cloud processing and storage module combines a distributed storage system with a high-performance computing platform to process and store the uploaded dynamic congestion degree data and response priority offset data, guaranteeing the reliability and efficiency of the data. The comprehensive analysis module comprehensively analyzes the dynamic congestion degree and the service response priority deviation to determine the severity of the influence of network congestion on task response delay. Based on the analysis results, if the severity is high, the optimization module immediately switches high-priority tasks to low-DCL paths, inserts them at the front of the scheduling queue, and defers low-priority tasks; if the severity is low, it reduces the transmission rate of low-priority tasks and raises the queue position of high-priority tasks, thereby improving the dynamic resource allocation efficiency of the system.
All of the above formulas are dimensionless formulas operating on numerical values; they were obtained by software simulation over a large amount of collected data so as to reflect the latest real situation, and the preset parameters in the formulas are set by those skilled in the art according to the actual situation.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer program are loaded or executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (e.g., infrared, microwave) manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application.