CN119577648A - A multi-factor fusion method for identifying outlier data in the Internet of Things


Info

Publication number
CN119577648A
Authority
CN
China
Prior art keywords
data
factor
prediction
value
time
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411644141.7A
Other languages
Chinese (zh)
Inventor
吴大鹏
魏海超
王汝言
张鸿
张若英
邹虹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202411644141.7A
Publication of CN119577648A
Legal status: Pending

Abstract


The invention relates to a multi-factor fused outlier data identification method for the Internet of Things, belonging to the technical field of the Internet of Things, and comprising the steps of determining an identified target factor, fusing a plurality of influencing factors associated therewith, acquiring corresponding historical data and performing preprocessing; performing correlation analysis on specific indicators, selecting indicator data having a significant influence on the target factor, and forming a final effective indicator combination; constructing a prediction model combining an LSTM with a feature attention mechanism, using data screened by correlation analysis to perform prediction, outputting a prediction value for a period of time in the future, and obtaining an actual prediction value by denormalization; adopting a prediction model, integrating an outlier data identification method for comprehensive analysis of multiple factors, using an isolation forest algorithm to build a model, taking prediction residuals and influencing factors as input features, and calculating an abnormal score for each data point; and dynamically setting a threshold value based on the distribution of the abnormal score by using a sliding window technology, and determining data points exceeding the threshold value as outlier data.

Description

Multi-factor fusion type internet of things outlier data identification method
Technical Field
The invention belongs to the technical field of the Internet of things, and relates to a multi-factor fusion method for identifying outlier data of the Internet of things.
Background
With the rapid development of Internet of Things technology, large numbers of sensors and intelligent devices have come into wide use across many fields and industries. These devices collect data through the Internet of Things and generate large volumes of monitoring data, which contain both normal data and outlier data. Outlier data usually indicates equipment faults, environmental pollution or similar problems; if it is not found and handled in time, equipment damage or environmental safety hazards can result. Identifying outlier data has therefore become an extremely important task on an Internet of Things platform.
Existing methods for identifying outlier data in the Internet of Things still fall short in real-time performance and accuracy. They generally rely on the historical data of a single factor for processing and analysis and lack comprehensive consideration of the various influencing factors. Outlier data is also typically judged against a fixed threshold alone, and a single threshold setting is limited: if it is set too high, detection sensitivity drops and the response to sudden environmental changes is delayed; if it is set too low, false alarms become frequent, which adds unnecessary operational burden and hinders effective monitoring of environmental change.
Disclosure of Invention
In view of the above, the present invention aims to provide a multi-factor fusion method for identifying outlier data of internet of things, which fuses the influence factors associated with main factors for comprehensive analysis, and dynamically sets a threshold by adopting a sliding window technology based on the distribution of anomaly scores. The method remarkably improves the rationality, accuracy and real-time performance of the outlier data identification, and provides a firmer technical support for the outlier data identification of the Internet of things.
In order to achieve the above purpose, the present invention provides the following technical solutions:
A multi-factor fusion method for identifying outlier data of the Internet of Things comprises the following steps:
S1, data fusion and preprocessing: determining the target factor to be identified, fusing a plurality of influencing factors associated with it, acquiring the corresponding historical data and preprocessing the data;
S2, data correlation analysis: performing correlation analysis on the specific indexes in the multi-factor data preprocessed in step S1, and selecting the index data that has a significant influence on the target factor to form the final effective index combination;
S3, data prediction: constructing a prediction model that combines an LSTM with a feature attention mechanism, performing prediction with the data screened by the correlation analysis, outputting predicted values for a future period, and obtaining the actual predicted values through inverse normalization;
S4, outlier data identification: adopting an outlier data identification method that is based on the prediction model and fuses multi-factor comprehensive analysis, modeling with the isolation forest algorithm, taking the prediction residuals and the influencing factors as input features, calculating the anomaly score of each data point, dynamically setting a threshold with a sliding window technique based on the distribution of the anomaly scores, and judging data points that exceed the threshold to be outlier data.
Further, the step S1 specifically includes the following steps:
S11, data fusion: in identifying outlier data of the Internet of Things, first determining the target factor to be identified and fusing a plurality of influencing factors associated with it;
S12, processing of missing values and abnormal points: processing the data by linear interpolation, so that when a spike outlier or a missing value appears at some moment, the value at that moment is estimated with a linear interpolation function;
S13, normalization: normalizing the data by the extremum normalization method to eliminate the influence of dimension, and applying inverse normalization to the result output by the prediction model to obtain the actual predicted value.
Further, in step S2, in view of the nonlinear relations among different indexes, the correlation between each pair of indexes is analyzed with the Spearman rank correlation coefficient. Specifically, each pair of index data to be analyzed is converted to ranks and its Spearman rank correlation coefficient is calculated, which reflects the strength of the monotonic relation between the indexes; according to the result of the correlation analysis, influencing-factor indexes highly correlated with the main factor index are selected and indexes only weakly correlated with the main factor are eliminated, thereby optimizing the selection of input features for the subsequent model.
Further, the step S3 specifically includes the following steps:
S31, dividing the preprocessed data into a training set, a test set and a validation set;
S32, constructing a prediction model that combines an LSTM with a feature attention mechanism, defining an input layer, a hidden layer and an output layer, constructing windows over the data set and feeding them to the prediction model as input for multivariate time series prediction; the time series $Y = (y_1, \ldots, y_{T-1}, y_T) \in \mathbb{R}^{T}$ is set as the prediction target, and the index data of every factor over the $T$ historical time steps is set as the time series matrix $X = (x_1, x_2, \ldots, x_N)^{\top} \in \mathbb{R}^{T \times N}$ of related feature variables, wherein $N$ represents the dimension of the parameters and covers every index parameter of every factor, and each entry of $X$ represents the value of the $n$-th variable at time $t$;
Combined with the feature attention mechanism, important variables are weighted in the encoding stage to obtain the importance weight $c_N$ of each hidden state for the predicted output, which represents how important the current input features are to the output, as given by the formula below; in the encoding stage, the context vector updated by the feature attention mechanism is fused with the previous history information to produce the output; using these weight coefficients, the updated input variables at each time form the matrix $(c_1 x_1, c_2 x_2, \ldots, c_N x_N)^{\top} \in \mathbb{R}^{T \times N}$:
$$c_N = f_{\text{attention}}(x)$$
The feature attention mechanism is used to update the weights of the input vector and of each hidden layer state, so that the hidden layer state of the time series encoding at each moment contains the association between the predicted target parameter and the other feature parameters, thereby obtaining the predicted value of the historical data at the next moment;
S33, configuring the network parameters and training the prediction model, with training terminating when the loss function $L(\theta)$ converges, where $\theta$ represents the set of network parameters; the loss is computed from the actual value and the predicted value of each target parameter $i$ at each time $t$, by summing and averaging the root mean square errors between the predicted and actual values of the target parameters, and it measures the overall prediction accuracy of the model;
S34, predicting real-time data with the trained prediction model, wherein the prediction model takes the data of the $n$ time steps preceding the time $t+1$ to be predicted as its input sequence and predicts the index parameters at time $t+1$.
Further, the step S4 includes the steps of:
S41, comparing the predicted value output by the prediction model with the actual value, and calculating the prediction residual;
S42, constructing an isolation forest outlier data identification model that takes the multidimensional prediction residuals and the influencing factors as input features, so that changes in the influencing factors are taken into account when the anomaly score of a sample point is calculated; in the anomaly score calculation, $s(x)$ represents the anomaly score of data point $x$, $h_t(x)$ represents the path length of $x$ in the $t$-th tree, and $c(n)$ is a constant that normalizes the path length and adjusts for differences between input features;
Anomaly scores over a period of time are collected with a sliding window technique, the mean and standard deviation within the window are calculated, and the threshold is adjusted dynamically;
At time $t$ the sliding window has length $W$ and the anomaly scores in the window are $\{s_{t-W+1}, s_{t-W+2}, \ldots, s_t\}$; after the mean $\mu_t$ and the standard deviation $\sigma_t$ within the sliding window are calculated, the dynamic threshold is:
$$U_t = \mu_t + k \cdot \sigma_t$$
wherein $U_t$ is the threshold set for the current window and $k$ is a constant that controls the sensitivity to outlier data;
When the data is updated in real time, the mean and standard deviation of the window are updated in a rolling manner: the anomaly score $s_{t+1}$ is introduced at time $t+1$ and the earliest score $s_{t-W+1}$ is removed from the window;
Through this recursion, the mean and the standard deviation are adjusted dynamically every time the window is updated;
S43, setting the hyperparameters of the isolation forest model, including the number of isolation trees, the sample size for each tree and the length of the sliding window, training the model while adjusting these hyperparameters step by step, and comparing the performance under different settings;
S44, identifying outlier data in the prediction residuals with the trained isolation forest model: the calculated anomaly score is compared with the threshold, and if the anomaly score is greater than the threshold the data point is judged to be outlier data.
The invention has the beneficial effects that:
1. Linear interpolation and extremum normalization handle missing values and abnormal values effectively, eliminate inconsistencies and dimensional differences in the data, and ensure dimensional consistency among different indexes. This processing step markedly improves the quality and reliability of the data and provides a more accurate basis for subsequent analysis and modeling.
2. Analyzing the correlations among the indexes of the influencing factors with the Spearman rank correlation coefficient effectively identifies the indexes that are strongly correlated with the main factor indexes and removes the indexes that are unrelated or only weakly related to the target factor. This reduces interference from irrelevant features, optimizes the selection of input features, and improves the performance and efficiency of the subsequent models.
3. By combining the feature attention mechanism, the LSTM model adaptively adjusts the weights according to changes in each influencing factor, so that the model dynamically attends to the key features of the input sequence at each moment, which improves the accuracy and reliability of prediction. Optimizing the feature selection and dynamic adjustment of the model effectively improves the precision of the prediction results.
4. An outlier data identification model is built with the isolation forest algorithm, using a method that is based on the prediction model and fuses multi-factor comprehensive analysis. The multidimensional prediction residuals and the influencing factors serve as input features; standardization ensures a consistent scale among the different features, and the relations among the input features are then taken into account implicitly. These relations are reflected in the construction of the decision trees and thus affect the final anomaly score. Finally, based on the distribution of the anomaly scores, a sliding window technique sets the threshold dynamically, and data points exceeding the threshold are judged to be outlier data. This effectively improves the accuracy, effectiveness and reliability of outlier data identification.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
To make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of the composition of the multi-factor fusion method for identifying outlier data of the Internet of Things according to the invention;
FIG. 2 is a schematic flow chart of the prediction model of the multi-factor fusion method for identifying outlier data of the Internet of Things;
FIG. 3 is a schematic flow chart of the anomaly detection model of the multi-factor fusion method for identifying outlier data of the Internet of Things.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may also be practiced or applied in other embodiments, and the details in this description may be modified or varied without departing from the spirit and scope of the present invention. The following embodiments and the features in the embodiments may be combined with each other where no conflict arises.
It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present invention schematically. The drawings show only the components related to the present invention and are not drawn according to the number, shape and size of the components in an actual implementation; the form, number and proportion of the components may be changed arbitrarily in an actual implementation, and the layout of the components may be more complicated.
In the following description, numerous details are set forth to provide a more thorough explanation of the embodiments of the present invention. It will be apparent to one skilled in the art, however, that embodiments of the present invention may be practiced without these specific details. In other embodiments, well-known structures and devices are shown in block diagram form rather than in detail, to avoid obscuring the embodiments of the present invention.
As shown in fig. 1, the invention provides a multi-factor fusion internet of things outlier data identification method, which comprises the following specific steps:
The first step, data fusion and preprocessing comprises the following steps:
Step 1, data fusion: in identifying outlier data of the Internet of Things, first determine the target factor to be identified and fuse the plurality of influencing factors associated with it. The corresponding historical data is then fused to provide a comprehensive data basis for the subsequent analysis. For example, in water environment monitoring, the water quality factor can be chosen as the target factor to be identified; influencing factors associated with it, such as weather factors, affect the water quality index data to some extent, and combining these factors allows the water quality index data to be analyzed more comprehensively.
Step 2, processing of missing values and abnormal points: during data acquisition in the Internet of Things, data breakpoints and noise often arise from factors such as the network, the weather and maintenance work. The data is therefore processed by linear interpolation. Let the data at two times $t_i$ and $t_j$ be $x_i$ and $x_j$ respectively; when a spike outlier or a missing value appears at some time $t$, the value at that time is estimated by the linear interpolation function $L(t)$:
$$L(t) = x_i + \frac{x_j - x_i}{t_j - t_i}\,(t - t_i)$$
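The interpolation in this step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the helper names `linear_interpolate` and `fill_gaps` are hypothetical, and spike outliers are assumed to have been marked as `None` beforehand by a check that the description does not specify.

```python
def linear_interpolate(t_i, x_i, t_j, x_j, t):
    """Estimate the value at time t from the known samples (t_i, x_i) and (t_j, x_j)."""
    if t_j == t_i:
        raise ValueError("t_i and t_j must differ")
    return x_i + (x_j - x_i) * (t - t_i) / (t_j - t_i)


def fill_gaps(times, values):
    """Replace None entries lying between two valid samples by interpolated values.

    Assumption: missing values and flagged spikes are both encoded as None.
    Leading or trailing gaps have only one valid neighbour and stay None.
    """
    filled = list(values)
    known = [k for k, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        for k in range(left + 1, right):
            filled[k] = linear_interpolate(
                times[left], values[left], times[right], values[right], times[k]
            )
    return filled


# Two consecutive readings are missing between t=0 and t=3.
series = fill_gaps([0, 1, 2, 3, 4], [10.0, None, None, 16.0, 18.0])
```

Gaps at the very start or end of a series are left untouched here because linear interpolation needs a valid sample on both sides; how such edge gaps are handled is not stated in the description.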
Step 3, normalization: after linear interpolation, the data is normalized by the extremum normalization method to eliminate the influence of dimension, as shown below, where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values in the data and $\varepsilon$ is a constant, chosen according to the range and precision of the data, that keeps the result from becoming infinite:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min} + \varepsilon}$$
Inverse normalization is applied to the result output by the prediction model to obtain the actual predicted value, where $\hat{x}'$ represents the normalized predicted value of an index:
$$\hat{x} = \hat{x}'\,(x_{\max} - x_{\min} + \varepsilon) + x_{\min}$$
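A small Python sketch of the normalization and its inverse is given below. The function names are hypothetical, and placing the constant `eps` in the denominator is an assumption that matches its stated purpose of preventing an infinite result when the maximum and minimum coincide.

```python
def fit_min_max(values):
    """Return the minimum and maximum of the historical data for one index."""
    return min(values), max(values)


def normalize(x, x_min, x_max, eps=1e-8):
    """Extremum (min-max) normalization; eps guards against x_max == x_min.

    Assumption: eps is added to the denominator, and 1e-8 is an illustrative value.
    """
    return (x - x_min) / (x_max - x_min + eps)


def denormalize(x_norm, x_min, x_max, eps=1e-8):
    """Map a normalized (predicted) value back to the original scale."""
    return x_norm * (x_max - x_min + eps) + x_min


x_min, x_max = fit_min_max([2.0, 4.0, 10.0])
scaled = [normalize(v, x_min, x_max) for v in [2.0, 4.0, 10.0]]
restored = [denormalize(s, x_min, x_max) for s in scaled]
```

The same `x_min`, `x_max` and `eps` must be reused for the inverse step, otherwise the predicted values are not restored to the scale of the original index.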
Second, data correlation analysis: to explore how the different indexes of all the influencing factors affect the main factor indexes, and in view of the nonlinear relations among different indexes, the correlation between each pair of indexes is analyzed with the Spearman rank correlation coefficient.
Specifically, each pair of index data to be analyzed is converted to ranks and its Spearman rank correlation coefficient is calculated, which reflects the strength of the monotonic relation between the indexes. A coefficient close to 1 indicates a strong positive correlation between the two indexes, a coefficient close to -1 indicates a strong negative correlation, and a coefficient close to 0 indicates no obvious monotonic relation. According to the result of the correlation analysis, influencing-factor indexes highly correlated with the main factor index are selected and indexes only weakly correlated with the main factor are eliminated, thereby optimizing the selection of input features for the subsequent model. The highly correlated indexes serve as key input features in the subsequent modeling and provide a basis for updating the input feature weights in the prediction model through the feature attention mechanism, which increases the model's attention to the key factors.
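The rank conversion and screening described above can be illustrated with the following Python sketch. It computes the Spearman coefficient as the Pearson correlation of average ranks. The cut-off `min_abs_rho=0.6`, the index names `rainfall` and `wind`, and the sample values are assumptions for illustration only; the description does not state a numeric threshold.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the rank-converted data."""
    return pearson(average_ranks(a), average_ranks(b))


def select_features(target, candidates, min_abs_rho=0.6):
    """Keep candidate indexes whose |rho| with the target reaches the cut-off."""
    selected = {}
    for name, series in candidates.items():
        rho = spearman(target, series)
        if abs(rho) >= min_abs_rho:
            selected[name] = rho
    return selected


water_quality = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical target index
candidates = {
    "rainfall": [1.0, 4.0, 9.0, 16.0, 25.0],        # monotonic but nonlinear
    "wind": [3.0, 1.0, 4.0, 1.0, 5.0],              # weak monotonic relation
}
selected = select_features(water_quality, candidates)
```

Because the coefficient depends only on ranks, the nonlinear but monotonic `rainfall` series reaches a coefficient of 1, which is why a rank-based measure suits the nonlinear relations mentioned above.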
Third, with reference to FIG. 2, an LSTM combined with an attention mechanism is constructed for prediction, comprising the following steps:
Step 1, dividing the preprocessed data into a training set, a test set and a validation set in the ratio 8:1:1.
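One way to realize the 8:1:1 division is sketched below in Python. The function name is hypothetical, and two choices are assumptions not stated in the description: the split is chronological (no shuffling, so that later data never leaks into training), and the validation part precedes the test part.

```python
def split_8_1_1(samples):
    """Split an ordered sequence into training, validation and test parts (8:1:1).

    Assumption: samples are in time order and are split without shuffling.
    """
    n = len(samples)
    train_end = n * 8 // 10
    val_end = train_end + n // 10
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]


train, validation, test = split_8_1_1(list(range(20)))
```

Integer arithmetic is used for the boundaries so that the sizes do not depend on floating-point rounding; any remainder falls into the last part.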
Step 2, constructing a prediction model that combines an LSTM with a feature attention mechanism, defining an input layer, a hidden layer and an output layer, constructing windows over the data set and feeding them to the prediction model as input for multivariate time series prediction. The time series $Y = (y_1, \ldots, y_{T-1}, y_T) \in \mathbb{R}^{T}$ of any one index is set as the prediction target, and the index data of every factor over the $T$ historical time steps is set as the time series matrix $X = (x_1, x_2, \ldots, x_N)^{\top} \in \mathbb{R}^{T \times N}$ of related feature variables, where $N$ represents the dimension of the parameters and covers every index parameter of every factor, and each entry of $X$ represents the value of the $n$-th variable at time $t$.
Combined with the feature attention mechanism, important variables are weighted in the encoding stage to obtain the importance weight $c_N$ of each hidden state for the predicted output, which represents how important the current input features are to the output, as given by the formula below. In the encoding stage, the context vector updated by the feature attention mechanism is fused with the previous history information to produce the output. Using these weight coefficients, the updated input variables at each time form the matrix $(c_1 x_1, c_2 x_2, \ldots, c_N x_N)^{\top} \in \mathbb{R}^{T \times N}$.
$$c_N = f_{\text{attention}}(x)$$
The feature attention mechanism is used to update the weights of the input vector and of each hidden layer state, so that the hidden layer state of the time series encoding at each moment contains the association between the predicted target parameter and the other feature parameters, thereby obtaining the predicted value of the historical data at the next moment.
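The effect of the feature weights on the input matrix can be illustrated with the Python sketch below. It only shows the re-weighting step, in which each feature column is scaled by its weight. The attention function itself is learned inside the model and is not specified here, so a softmax over fixed, hypothetical relevance scores is used as a stand-in; this choice is an assumption, not the mechanism of the invention.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into positive weights that sum to 1."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_feature_weights(window, weights):
    """Scale each of the N feature columns of a T x N window by its weight c_n."""
    return [[c * value for c, value in zip(weights, row)] for row in window]


# Hypothetical relevance scores for N = 3 features (stand-in for f_attention).
weights = softmax([2.0, 0.5, 0.5])

# A window with T = 2 time steps and N = 3 features.
window = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
weighted = apply_feature_weights(window, weights)
```

In a trained model the scores would be produced from the current input and hidden state at every time step, so the weights would change over time rather than stay fixed as in this sketch.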
Step 3, configuring network parameters such as the learning rate and the number of training iterations, and training the prediction model. Training terminates when the loss function $L(\theta)$ converges, where $\theta$ represents the set of network parameters and the loss is computed from the actual value and the predicted value of each target parameter $i$ at each time $t$.
Summing and averaging the root mean square errors between the predicted and actual values of the target parameters measures the overall prediction accuracy of the model effectively and allows the model parameters to be optimized during training, which reduces overfitting.
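A loss of the kind described, the root mean square error of each target parameter averaged over all target parameters, might be computed as in the Python sketch below. The exact form of $L(\theta)$ is not recoverable from the text, so this is one plausible reading; the function names and the parameter names `ph` and `turbidity` are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_rmse_loss(actual_by_param, predicted_by_param):
    """Average the per-parameter RMSE over all target parameters.

    Assumption: every target parameter contributes with equal weight.
    """
    errors = [
        rmse(actual_by_param[name], predicted_by_param[name])
        for name in actual_by_param
    ]
    return sum(errors) / len(errors)


actual = {"ph": [1.0, 2.0, 3.0], "turbidity": [0.0, 0.0]}
predicted = {"ph": [1.0, 2.0, 3.0], "turbidity": [2.0, 2.0]}
loss = mean_rmse_loss(actual, predicted)
```

Here the first parameter is predicted exactly (RMSE 0) and the second is off by 2 at every step (RMSE 2), so the averaged loss is 1.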
Step 4, predicting real-time data with the trained prediction model: the prediction model takes the data of the $n$ time steps preceding the time $t+1$ to be predicted as its input sequence and predicts the index parameters at time $t+1$.
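The pairing of an input sequence of $n$ preceding time steps with the value to be predicted can be sketched in Python as follows. The function name `make_windows` and the toy data are hypothetical; the sketch only builds the input and target pairs and does not include the prediction model itself.

```python
def make_windows(rows, target_index, n):
    """Build (input window, next-step target) pairs from a multivariate series.

    rows: list of per-time-step feature lists, in time order.
    target_index: column of the index to be predicted.
    n: number of preceding time steps used as the input sequence.
    """
    inputs, targets = [], []
    for end in range(n, len(rows)):
        inputs.append(rows[end - n:end])         # the n steps before time `end`
        targets.append(rows[end][target_index])  # the value to predict at `end`
    return inputs, targets


# Toy series with two features per time step.
rows = [[t, 10 * t] for t in range(6)]
inputs, targets = make_windows(rows, target_index=1, n=3)
```

At inference time only the most recent window, `rows[-n:]`, is needed to predict the next step.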
Fourth, with reference to FIG. 3, an isolation forest is constructed for anomaly detection, comprising the following steps:
Step 1, comparing the predicted value output by the prediction model with the actual value and calculating the prediction residual. The multidimensional prediction residuals and the influencing factors are then standardized with the Z-score method, which ensures a consistent scale among the different features and facilitates the subsequent model training.
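The residual calculation and Z-score standardization can be sketched in Python as below. The function names are hypothetical, and using the population standard deviation (rather than the sample standard deviation) is an assumption, since the description does not say which is meant.

```python
import statistics


def residuals(actual, predicted):
    """Prediction residual: actual value minus predicted value at each time step."""
    return [a - p for a, p in zip(actual, predicted)]


def z_score(values):
    """Standardize a feature to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0] * len(values)  # a constant feature carries no information
    return [(v - mean) / std for v in values]


res = residuals([3.0, 5.0], [2.5, 6.0])
standardized = z_score([1.0, 2.0, 3.0, 4.0, 5.0])
```

Each residual dimension and each influencing-factor index would be standardized separately before being combined into the feature vector for the isolation forest.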
Step 2, constructing an isolation forest outlier data identification model that takes the multidimensional prediction residuals and the influencing factors as input features. During training the model implicitly takes the relations between the input features into account; these relations show up in the resulting decision trees and are therefore reflected in the anomaly scores. Specifically, changes in the influencing factors are taken into account when the anomaly score of a sample point is calculated. In the anomaly score calculation, $s(x)$ represents the anomaly score of data point $x$, $h_t(x)$ represents the path length of $x$ in the $t$-th tree, and $c(n)$ is a constant that normalizes the path length and adjusts for differences between input features.
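For orientation, the Python sketch below computes the anomaly score of the standard isolation forest algorithm, $s(x) = 2^{-\bar{h}(x)/c(n)}$, where $\bar{h}(x)$ is the mean of the per-tree path lengths $h_t(x)$. Whether the updated score of this method has exactly this form cannot be confirmed from the text, so the sketch should be read as the textbook baseline only; the path lengths passed in are illustrative values, not the output of a trained forest.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant used in the harmonic estimate


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def anomaly_score(path_lengths, n):
    """Standard isolation forest score from the path lengths h_t(x) of all trees.

    Scores near 1 indicate anomalies; scores well below 0.5 indicate normal points.
    """
    if n < 2:
        raise ValueError("the sub-sample size n must be at least 2")
    mean_path = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_path / c(n))


# Illustrative path lengths for a sub-sample size of 256.
isolated_quickly = anomaly_score([2, 3, 2], 256)
isolated_slowly = anomaly_score([12, 11, 13], 256)
```

A point that is separated after only a few splits receives a higher score than one that needs many splits, which is the property the threshold comparison relies on.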
A sliding window technique is used to collect the anomaly scores over a period of time, calculate the mean and standard deviation within the window, and adjust the threshold dynamically, so that the threshold follows changes in the anomaly score distribution and the model adapts better to data fluctuations. At time $t$ the sliding window has length $W$ and the anomaly scores in the window are $\{s_{t-W+1}, s_{t-W+2}, \ldots, s_t\}$. After the mean $\mu_t$ and the standard deviation $\sigma_t$ within the sliding window are calculated, the dynamic threshold is given by the formula below, where $U_t$ is the threshold set for the current window and $k$ is a constant, usually 2 or 3, that controls the sensitivity to outlier data.
$$U_t = \mu_t + k \cdot \sigma_t$$
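The dynamic threshold can be sketched in Python as follows. The window ends at the current score, as in the definition above; the function names, the use of the population standard deviation, and the choice not to flag anything until a full window is available are assumptions for illustration.

```python
import statistics


def dynamic_threshold(window_scores, k=2.0):
    """U_t = mu_t + k * sigma_t over the scores currently in the window."""
    mu = statistics.fmean(window_scores)
    sigma = statistics.pstdev(window_scores)  # assumption: population std
    return mu + k * sigma


def flag_outliers(scores, window, k=2.0):
    """Flag score s_t when it exceeds the threshold of the window ending at t."""
    flags = []
    for t in range(len(scores)):
        if t + 1 < window:
            flags.append(False)  # assumption: no decision before the window is full
            continue
        current = scores[t - window + 1:t + 1]
        flags.append(scores[t] > dynamic_threshold(current, k))
    return flags


scores = [0.40, 0.41, 0.39, 0.40, 0.42, 0.40, 0.41, 0.80]
flags = flag_outliers(scores, window=8, k=2.0)
```

Because the window contains the score being tested, a single point can deviate from the window mean by at most $(W-1)/\sqrt{W}$ population standard deviations, so $W$ must exceed about 11 for $k = 3$ to be reachable; the example therefore uses $k = 2$ with $W = 8$.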
When the data is updated in real time, the mean and standard deviation of the window can be updated in a rolling manner to reduce the amount of calculation: the anomaly score $s_{t+1}$ is introduced at time $t+1$ and the earliest score $s_{t-W+1}$ is removed from the window.
Through this recursion, the mean and the standard deviation are adjusted dynamically each time the window is updated, which lowers the cost of recalculation while ensuring that the threshold changes over time to follow changes in the anomaly score distribution.
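One common way to realize such a rolling update is to keep the running sum and the running sum of squares of the window, so that each new score costs a constant amount of work. The Python sketch below follows that approach; it is an assumption that the recursion of this method is equivalent, and the class name `RollingStats` is hypothetical.

```python
import math
from collections import deque


class RollingStats:
    """Mean and standard deviation of a fixed-length window, updated in O(1)."""

    def __init__(self, initial_scores):
        if not initial_scores:
            raise ValueError("the initial window must not be empty")
        self.window = deque(initial_scores)
        self.size = len(self.window)
        self.total = sum(self.window)
        self.total_sq = sum(s * s for s in self.window)

    def update(self, new_score):
        """Add s_{t+1} and drop the earliest score s_{t-W+1}."""
        oldest = self.window.popleft()
        self.window.append(new_score)
        self.total += new_score - oldest
        self.total_sq += new_score * new_score - oldest * oldest

    @property
    def mean(self):
        return self.total / self.size

    @property
    def std(self):
        # Population variance from the running sums; clamp tiny negative rounding errors.
        variance = self.total_sq / self.size - self.mean ** 2
        return math.sqrt(max(variance, 0.0))

    def threshold(self, k=3.0):
        """Dynamic threshold mu + k * sigma for the current window."""
        return self.mean + k * self.std


stats = RollingStats([0.4, 0.5, 0.6])
stats.update(0.9)  # window is now 0.5, 0.6, 0.9
```

The sum-of-squares form can lose precision when the scores are large relative to their spread; anomaly scores lie between 0 and 1, so this is unlikely to matter here, but a periodic full recalculation is a simple safeguard.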
Step 3, setting the hyperparameters of the isolation forest model, including the number of isolation trees, the sample size for each tree and the length of the sliding window. The model is trained while these hyperparameters are adjusted step by step, and the performance under different settings is compared. Model performance is evaluated with metrics such as the ROC curve and the AUC value to analyze the accuracy and stability of the model, and the best-performing combination of model parameters is selected.
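Comparing hyperparameter settings by AUC can be sketched in Python as below. The AUC is computed directly as the probability that a labelled outlier receives a higher anomaly score than a normal point, which equals the area under the ROC curve. The setting names, labels and scores are made-up illustrations; labelled evaluation data is assumed to be available.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count one half).

    labels: 1 for a known outlier, 0 for a normal point.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute the AUC")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def best_setting(labels, scores_by_setting):
    """Return the name of the hyperparameter setting with the highest AUC."""
    return max(scores_by_setting, key=lambda name: roc_auc(labels, scores_by_setting[name]))


labels = [0, 0, 1, 1]
scores_by_setting = {
    "trees=50,samples=128": [0.1, 0.4, 0.35, 0.8],   # hypothetical setting names
    "trees=100,samples=256": [0.1, 0.2, 0.7, 0.9],
}
chosen = best_setting(labels, scores_by_setting)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a small validation set; for larger sets a rank-based computation gives the same value more cheaply.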
Step 4, identifying outlier data in the prediction residuals with the trained isolation forest model. The calculated anomaly score is compared with the threshold, and if the anomaly score is greater than the threshold the data point is judged to be outlier data.
In the foregoing embodiments, references in the specification to "this embodiment" indicate that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least some, but not necessarily all, embodiments. Multiple occurrences of "this embodiment" do not necessarily all refer to the same embodiment.
In the above embodiments, while the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications and variations of these embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory structures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed. The embodiments of the invention are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the methods of the present embodiments.
The embodiment also provides an electronic terminal, which comprises a processor and a memory;
The memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, so that the terminal executes any one of the methods in the present embodiment.
Those of ordinary skill in the art will appreciate that, for the computer-readable storage medium of the present embodiment, all or part of the steps of the above method embodiments may be carried out by hardware associated with a computer program. The computer program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above, and the storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks or optical disks.
The electronic terminal provided in this embodiment includes a processor, a memory, a transceiver, and a communication interface, where the memory and the communication interface are connected to the processor and the transceiver and complete communication with each other, the memory is used to store a computer program, the communication interface is used to perform communication, and the processor and the transceiver are used to run the computer program, so that the electronic terminal performs each step of the above method.
In this embodiment, the memory may include a random access memory (Random Access Memory, abbreviated as RAM), and may further include a non-volatile memory (non-volatile memory), such as at least one disk memory.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The invention is operational with numerous general purpose or special purpose computing system environments or configurations. Such as a personal computer, a server computer, a hand-held or portable device, a tablet device, a multiprocessor system, a microprocessor-based system, a set top box, a programmable consumer electronics, a network PC, a minicomputer, a mainframe computer, a distributed computing environment that includes any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims (5)

3. The multi-factor fusion method for identifying outlier data in the Internet of Things according to claim 1, characterized in that, in step S2, in view of the nonlinear relationships among different indexes, the Spearman rank correlation coefficient is used to analyze the correlation between each pair of indexes, specifically comprising: performing rank conversion on the index data to be analyzed and calculating the Spearman rank correlation coefficient for each pair of indexes, which reflects the strength of the monotonic relationship between them; and, according to the result of the correlation analysis, selecting the influence factor indexes that are highly correlated with the main factor index and rejecting the indexes that are weakly correlated with the main factor, thereby optimizing the selection of input features for the subsequent model.
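The rank-based screening in this claim can be illustrated with a minimal sketch. It converts each series to ranks (tied values share their mean rank), computes the rank correlation coefficient (Spearman's rho) as the Pearson correlation of those ranks, and keeps only the factors whose absolute coefficient reaches a threshold. The function names, the sample sensor series, and the 0.6 threshold are assumptions made for illustration; the claim does not state a cut-off value or a particular implementation.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / (sx * sy)


def select_factors(target, candidates, threshold=0.6):
    """Keep candidate factors with |rho| >= threshold (threshold is hypothetical)."""
    scores = {name: spearman(target, series) for name, series in candidates.items()}
    return {name: rho for name, rho in scores.items() if abs(rho) >= threshold}


# Hypothetical sensor readings: the main factor and three candidate influence factors.
temperature = [20.1, 21.3, 22.8, 24.0, 25.5, 27.1]
candidates = {
    "humidity": [80, 76, 70, 66, 60, 52],  # monotonically decreasing
    "light": [1, 4, 9, 16, 25, 36],        # nonlinear but monotonically increasing
    "noise": [3, 1, 4, 1, 5, 2],           # no clear monotonic relation
}
selected = select_factors(temperature, candidates)
print(sorted(selected))
```

Because the coefficient depends only on ranks, the quadratic "light" series scores as strongly as a linear one would, which is why a rank-based measure suits indexes with nonlinear but monotonic relationships.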
CN202411644141.7A | 2024-11-18 | 2024-11-18 | A multi-factor fusion method for identifying outlier data in the Internet of Things | Pending | CN119577648A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202411644141.7A / CN119577648A (en) | 2024-11-18 | 2024-11-18 | A multi-factor fusion method for identifying outlier data in the Internet of Things

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202411644141.7A / CN119577648A (en) | 2024-11-18 | 2024-11-18 | A multi-factor fusion method for identifying outlier data in the Internet of Things

Publications (1)

Publication Number | Publication Date
CN119577648A (en) | 2025-03-07

Family

ID=94803029

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202411644141.7A (Pending, CN119577648A (en)) | A multi-factor fusion method for identifying outlier data in the Internet of Things | 2024-11-18 | 2024-11-18

Country Status (1)

Country | Link
CN (1) | CN119577648A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN120296571A (en) * | 2025-06-12 | 2025-07-11 | 交通运输部公路科学研究所 | Road slope stability prediction method, device, storage medium and electronic equipment
CN120550951A (en) * | 2025-07-30 | 2025-08-29 | 山东嘉华油脂有限公司 | Centrifuge control method and system for soybean protein production

Similar Documents

Publication | Title
CN119577648A (en) | A multi-factor fusion method for identifying outlier data in the Internet of Things
CN111723929A (en) | Numerical prediction product correction method, device and system based on neural network
CN108919059A (en) | A kind of electric network failure diagnosis method, apparatus, equipment and readable storage medium storing program for executing
CN113487223B (en) | Risk assessment method and system based on information fusion
CN113935535A (en) | A principal component analysis method for medium and long-term forecasting models
CN113190429B (en) | Server performance prediction method, device and terminal device
CN107992991A (en) | Annual electricity sales amount Forecasting Methodology based on external environmental factor and Co-integration Theory
CN119397163B (en) | Fixed pollution source data collection calibration method, system, medium and program product
CN119044778A (en) | Lithium battery life prediction method based on DCAE-converter
CN117851953A (en) | Water use abnormality detection method, device, electronic apparatus, and storage medium
CN119691759A (en) | Information security protection method and system applied to electronic information platform
CN114861800B (en) | Model training method, probability determining device, model training equipment, model training medium and model training product
CN120218735A (en) | A construction engineering safety monitoring method and system based on machine learning
CN119516713A (en) | Adaptive slope disaster early warning method and related device based on multi-source monitoring data
CN119721793A (en) | Method, system, storage medium and device for product assembly quality assessment
CN113988709A (en) | Medium-voltage distribution line fault rate analysis method and device, terminal equipment and medium
CN117521460A (en) | A Bayesian finite element model correction method considering the uncertainty of environmental disturbances
CN111966966B (en) | A method and system for analyzing the feasible region of parameters of sensor measurement error model
CN115169496A (en) | BVAR model-based construction period bank slope deformation prediction method
CN120561474B (en) | AI-based meteorological station data quality control method integrated with numerical forecast
CN112184301A (en) | Data prediction method, device, equipment and computer readable storage medium
CN117909927B (en) | Precipitation quantitative estimation method and device based on multisource data fusion model
CN118115141B (en) | Zero-sample multi-source migration prediction method for storage reliability of electronic equipment
CN119444421B (en) | A training method and device for data processing model
CN119669918B (en) | Intelligent tea quality traceability system based on Internet of things technology

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
