Disclosure of Invention
The application aims to provide a vehicle fault prediction method and a vehicle fault prediction system that improve the accuracy of vehicle fault prediction and solve the technical problem of predicting the fault probability of vehicle components.
In a first aspect, an embodiment of the present application provides a prediction method for a vehicle fault, where the method includes:
S1: extracting vehicle data comprising platform early warning fault information, vehicle index information, vehicle archive information and vehicle maintenance record information, and performing missing value processing, threshold setting processing and statistical feature quantity processing on the extracted data.
S2: counting the number of times and the frequency with which vehicle fault information occurs in a fixed time interval, analyzing the correlation between the platform early warning fault information and the vehicle index information, and analyzing the association between different types of platform early warning fault information, so as to acquire a plurality of feature data for vehicle fault prediction, wherein the vehicle fault information comprises actually occurred fault information and platform early warning fault information.
S3: converting the categorical feature data into data identifiers, predicting the probability of fault occurrence through a first SVM classifier, and further predicting, through a second SVM classifier, the probability of fault occurrence for the corresponding fault type of the vehicle.
In the method, vehicle information data of different dimensions are extracted and subjected to missing value, abnormal value and threshold setting processing; the vehicle fault information is counted; the correlation between the fault information and the vehicle index information and the association among different types of platform early warning fault information are analyzed; a plurality of vehicle fault feature data are extracted; and a first SVM classifier and a second SVM classifier are trained to construct prediction models of the vehicle fault probability and the vehicle component fault probability.
In some embodiments, the platform warning fault information includes barometric pressure sensor warning information, barometric pressure warning information, engine oil pressure warning information, water temperature warning information, engine oil warning information, and brake shoe wear warning information; the vehicle index information comprises vehicle speed, rotating speed and air inlet temperature; the vehicle archive information comprises vehicle purchase date, brand and vehicle type; the vehicle service record information includes a service time, a fault type, a service type, a fault description, and a service item. The platform early warning fault information and the vehicle index information can assist the crew in checking the health state of the vehicle, and the vehicle archive information and the vehicle maintenance record information are introduced to improve the accuracy of the real fault prediction of the vehicle.
In some embodiments, step S1 specifically includes: processing missing values and abnormal values in the vehicle data by record deletion, zero value replacement or mean value replacement; performing threshold setting processing on the processed platform early warning fault information; and calculating, from the processed vehicle index information, four statistical feature quantities, namely the mean $p_1$, the standard deviation $p_2$, the average rate of change $p_3$ and the range $p_4$. The specific calculation formula is as follows:
$p_4 = x_{\max} - x_{\min}$
wherein $n$ represents the number of data in a group of vehicle index data within a fixed time interval, $x_i$ represents the $i$-th vehicle index datum of the group within the fixed time interval, $x_{\max}$ represents the maximum data value of the group within the fixed time interval, and $x_{\min}$ represents the minimum data value of the group within the fixed time interval. The ECU of each vehicle component continuously monitors its parameters and sends early warning information when a parameter exceeds its set value; however, the early warning information may be a false alarm or may clear automatically within a short time. The extracted data are therefore preprocessed so that accurate feature data can be obtained in the subsequent steps.
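As a non-limiting illustration, the four statistical feature quantities may be computed as in the following Python sketch. Only the range $p_4$ follows a formula stated in the text; the sketch assumes the conventional mean, a population standard deviation (divisor $n$) and, for the average rate of change, the mean of successive differences. The function name `statistical_features` and the sample values are hypothetical.

```python
import math


def statistical_features(x):
    """Return (p1, p2, p3, p4) for one group of vehicle index samples."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two samples are needed")
    p1 = sum(x) / n  # mean
    # Assumption: population standard deviation (divide by n, not n - 1).
    p2 = math.sqrt(sum((v - p1) ** 2 for v in x) / n)
    # Assumption: average rate of change = mean of successive differences.
    p3 = sum(x[i + 1] - x[i] for i in range(n - 1)) / (n - 1)
    p4 = max(x) - min(x)  # range, p4 = x_max - x_min as stated in the text
    return p1, p2, p3, p4


# Hypothetical vehicle speed samples collected in one fixed time interval.
speeds = [60.0, 62.0, 65.0, 61.0, 72.0]
p1, p2, p3, p4 = statistical_features(speeds)
```

If a sample standard deviation or an absolute rate of change is intended instead, only the two lines marked as assumptions need to change.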
In some embodiments, the correlation in step S2 is calculated using the Pearson correlation coefficient, with the following formula:
$\rho_{XY} = \dfrac{Cov(X, Y)}{\sqrt{D(X)}\,\sqrt{D(Y)}}$
wherein $X$ represents the vehicle index data, $Y$ represents the platform early warning fault information, $\rho_{XY}$ represents the correlation coefficient of the variables $X$ and $Y$, $Cov(X, Y)$ represents the covariance of $X$ and $Y$, and $D(X)$ and $D(Y)$ represent the variances of $X$ and $Y$ respectively. The occurrence of platform early warning fault information is represented by the value 1 and its non-occurrence by the value 0. The closer $\rho_{XY}$ is to 1, the greater the correlation between the platform early warning fault information and the vehicle index data, and vice versa. Using this correlation, the vehicle index data that are strongly correlated with the platform early warning fault information are selected as feature data.
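The Pearson calculation described above may be sketched in Python as follows, with the platform early warning fault information coded as 1 (occurred) and 0 (did not occur). The function name `pearson` and the water temperature readings are hypothetical and serve only as an example.

```python
import math


def pearson(x, y):
    """Pearson correlation rho_XY = Cov(X, Y) / (sqrt(D(X)) * sqrt(D(Y)))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    dx = sum((a - mx) ** 2 for a in x) / n  # variance D(X)
    dy = sum((b - my) ** 2 for b in y) / n  # variance D(Y)
    if dx == 0 or dy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (math.sqrt(dx) * math.sqrt(dy))


# Hypothetical water temperature readings and the matching warning flags.
water_temp = [88.0, 90.0, 104.0, 91.0, 108.0, 89.0]
warning = [0, 0, 1, 0, 1, 0]
rho = pearson(water_temp, warning)
```

With these sample values the coefficient is close to 1, so under the selection rule in the text this index would be kept as feature data.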
In some specific embodiments, the association in step S2 is obtained by using a weighted association rule algorithm to calculate an association data set and an association value between platform early warning fault information A and B, with the following specific steps:
S21: scanning the transaction database $D$ of platform early warning fault information, setting a minimum support $min\_sup$, generating the 1-item candidate set $C_1$ using an ordinary association rule, and comparing the support $sup(A \cup B)$ of platform early warning fault information A and B with the minimum support $min\_sup$ to generate the 1-item frequent set $L_1$.
S22: based on the 1-item frequent set $L_1$, generating the 2-item candidate set $C_2$, calculating the weighted support $wsup(A \cup B)$ of platform early warning fault information A and B in combination with their weight average $W_{(A \cup B)}$, and generating the weighted frequent set $L_{W2}$ with $min\_sup$ as the minimum support.
S23: setting a minimum association degree and, on the basis of the weighted frequent set $L_{W2}$, generating the association data set and the association value between platform early warning fault information A and B using the minimum association degree.
In step S21, the support $sup(A \cup B)$ of platform early warning fault information A and B is calculated as:
$sup(A \cup B) = \dfrac{sup\_count(A \cup B)}{count(D)}$
In step S22, the weighted support $wsup(A \cup B)$ of platform early warning fault information A and B is calculated as:
$wsup(A \cup B) = W_{(A \cup B)} \times sup(A \cup B)$
In step S23, the association degree of platform early warning fault information A and B is calculated as:
[formula missing from the source text]
wherein $D$ represents the transaction database of platform early warning fault information, $count(D)$ is the number of all transactions, $A \Rightarrow B$ represents the association relation between platform early warning fault information A and B, $sup\_count(A \cup B)$ represents the number of transactions in which platform early warning fault information A and B occur simultaneously within a certain time range, $W_{(A \cup B)}$ represents the weight average of platform early warning fault information A and B, $min\_sup$ represents the minimum support, $sup$ represents the support, and $wsup$ represents the weighted support.
In the method, a certain association exists among different types of platform early warning fault information. This association can be mined by an association rule algorithm; combined with the weight values derived from the time intervals between different types of platform early warning fault information, the above operations yield the association data set and the association values among different types of platform early warning fault information within a fixed time interval.
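Steps S21 and S22 may be sketched in Python as follows. This is a minimal sketch under stated assumptions: each transaction is taken to be the set of warning types seen in one time window, the support is the co-occurrence count divided by the number of transactions, and the weighted support is assumed to be the product of the pair's weight average and its support. The function names, the warning labels and the weight values are hypothetical; the association degree of step S23 is not computed here.

```python
from itertools import combinations


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def weighted_frequent_pairs(transactions, pair_weights, min_sup):
    """Return {(A, B): wsup} for pairs whose weighted support >= min_sup."""
    # S21: 1-item candidates C1, filtered by min_sup into the frequent set L1.
    all_items = sorted(set().union(*transactions))
    l1 = [a for a in all_items if support(transactions, [a]) >= min_sup]
    # S22: 2-item candidates C2 built from L1, filtered by weighted support.
    l_w2 = {}
    for a, b in combinations(l1, 2):
        # Assumption: a pair without a known weight average gets weight 0.
        w = pair_weights.get(frozenset((a, b)), 0.0)
        wsup = w * support(transactions, [a, b])  # assumed wsup = W * sup
        if wsup >= min_sup:
            l_w2[(a, b)] = wsup
    return l_w2


# Hypothetical transaction database D: warnings seen in each time window.
transactions = [
    frozenset({"oil_pressure", "water_temp"}),
    frozenset({"oil_pressure", "water_temp"}),
    frozenset({"air_pressure"}),
    frozenset({"oil_pressure", "water_temp", "air_pressure"}),
    frozenset({"air_pressure", "brake_wear"}),
]
# Hypothetical weight averages W_(A U B) for two warning pairs.
pair_weights = {
    frozenset(("oil_pressure", "water_temp")): 0.9,
    frozenset(("air_pressure", "oil_pressure")): 0.5,
}
l_w2 = weighted_frequent_pairs(transactions, pair_weights, min_sup=0.3)
```

In this example only the oil pressure and water temperature pair survives both filters, because the other pairs co-occur in too few transactions.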
In some embodiments, the weight in the weighted association rule algorithm is determined by the time interval between occurrences of different types of platform early warning fault information within a certain time range, the weight decreasing as the time interval increases. The specific calculation formulas are as follows:
$W_{t_i} = \dfrac{\max(T) - t_i}{\max(T) - \min(T)}$
$W_n = \dfrac{W_{t_1} + W_{t_2} + \dots + W_{t_n}}{n}$
wherein $W_{t_i}$ represents the weight value of two different types of platform early warning fault information in the $i$-th time interval, $t_i$ represents the minimum time interval between the two different types of platform early warning fault information in the $i$-th time interval, $\min(T)$ represents the shortest interval time between two different types of platform early warning fault information within a certain time range, $\max(T)$ represents the longest interval time between two different types of platform early warning fault information within a certain time range, and $W_n$ represents the weight average of $n$ pairs of two different types of platform early warning fault information within a certain time range (for example, the weight average of platform early warning fault information A and B is denoted $W_{(A \cup B)}$). A vehicle fault is inherently a low-probability event. After the weight values of two different types of platform early warning fault information in each fixed time period are calculated, their average over the $n$ pairs within a certain time range is taken as the weight average, so that the weight reflects the real data.
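The two weight formulas may be sketched in Python as follows. The sketch assumes that max(T) and min(T) are taken over the same list of observed intervals that supplies the individual $t_i$; the function names and the interval values (in seconds) are hypothetical.

```python
def interval_weights(intervals):
    """W_ti = (max(T) - t_i) / (max(T) - min(T)) for each interval t_i."""
    t_max, t_min = max(intervals), min(intervals)
    if t_max == t_min:
        raise ValueError("weights are undefined when all intervals are equal")
    return [(t_max - t) / (t_max - t_min) for t in intervals]


def weight_average(intervals):
    """W_n = (W_t1 + W_t2 + ... + W_tn) / n."""
    weights = interval_weights(intervals)
    return sum(weights) / len(weights)


# Hypothetical minimum intervals between warnings A and B in four windows.
gaps = [5.0, 20.0, 35.0, 65.0]
weights = interval_weights(gaps)
w_ab = weight_average(gaps)
```

The shortest interval receives weight 1 and the longest receives weight 0, which matches the statement that the weight decreases as the interval grows.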
In some specific embodiments, the first SVM classifier and the second SVM classifier in step S3 use a particle swarm optimization algorithm to determine the penalty parameter and the kernel function parameter, and the objective function of the first SVM classifier and of the second SVM classifier is defined as:
$\min \; \dfrac{1}{2}\lVert \omega \rVert^2 + C \sum_{i=1}^{n} \zeta_i$
$\text{s.t. } \lvert y_i - \omega \cdot k - b \rvert < \zeta_i$
wherein $\omega$ represents the adjustable weight vector, that is, the adjustable weight of each vector in the hyperplane; $\zeta_i \geq 0$ represents the slack variable; $C$ represents the penalty parameter; $b$ represents the offset, that is, the displacement of the hyperplane relative to the origin; and $k = K(x_i, x_j)$ represents the kernel function, with $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, n$. In the first SVM classifier $y_i$ takes one of the two values 0 or 1; in the second SVM classifier $y_i$ takes the vehicle fault category as its value, set as $y_i = 1, 2, \dots$. The penalty parameter and the kernel function parameter are generally determined empirically, and the accuracy of the two parameters can be effectively improved by adopting the particle swarm optimization algorithm.
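The parameter search may be sketched in Python as a basic particle swarm optimization over the penalty parameter and the kernel function parameter. This is an illustrative sketch only: the fitness function `stand_in_error` is a hypothetical placeholder for the validation error of a trained SVM, the search is assumed to run over the base-10 logarithms of the two parameters, and the inertia and acceleration coefficients are common textbook values rather than values taken from the application.

```python
import random


def pso(fitness, bounds, n_particles=20, n_iter=60,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `fitness` inside `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [fitness(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Keep every particle inside the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def stand_in_error(p):
    """Hypothetical stand-in for SVM validation error at (log C, log gamma)."""
    log_c, log_gamma = p
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


bounds = [(-3.0, 3.0), (-5.0, 1.0)]  # assumed log10 ranges for C and gamma
(best_log_c, best_log_gamma), best_err = pso(stand_in_error, bounds)
```

In practice the placeholder would be replaced by a function that trains the classifier with the candidate parameters and returns its cross-validated error.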
In some specific embodiments, the vehicle fault information data comprises positive sample data, in which a fault actually occurred, and negative sample data, in which no fault actually occurred. Before the training stage, the k nearest neighbors of each positive sample are obtained using the Euclidean distance, and new samples are constructed according to a sampling rate, so as to obtain balanced vehicle fault information data. A vehicle fault is a low-probability event, so in practice the number of positive samples is far smaller than the number of negative samples; to improve the accuracy of the SVM classifier models, the positive and negative sample data in the training samples need to be balanced.
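The balancing step described above resembles SMOTE-style oversampling and may be sketched in Python as follows. The sketch assumes that each new sample is placed at a random point on the line segment between a positive sample and one of its k nearest positive neighbors; the function name `oversample_positives`, the parameter names and the sample values are hypothetical.

```python
import math
import random


def oversample_positives(positives, k=2, rate=2, seed=0):
    """Create `rate` synthetic samples per positive sample."""
    if len(positives) < 2:
        raise ValueError("at least two positive samples are needed")
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(positives):
        others = [p for j, p in enumerate(positives) if j != i]
        # k nearest positive neighbours by Euclidean distance.
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(rate):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical two-dimensional feature vectors of actual-fault samples.
positives = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
synthetic = oversample_positives(positives, k=2, rate=2)
```

The sampling rate would normally be chosen so that the positive and negative classes end up with comparable sizes.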
In a second aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any of the above embodiments.
In a third aspect, an embodiment of the present application provides a prediction system for a vehicle fault, including:
A vehicle data extraction unit: configured to extract vehicle data comprising platform early warning fault information, vehicle index information, vehicle archive information and vehicle maintenance record information, and to perform missing value processing, threshold setting processing and statistical feature quantity processing on the extracted data.
A feature data extraction unit: configured to count the number of times and the frequency with which vehicle fault information occurs in a fixed time interval, to analyze the correlation between the platform early warning fault information and the vehicle index information, to analyze the association between different types of platform early warning fault information, and to acquire a plurality of feature data for vehicle fault prediction, wherein the vehicle fault information comprises actually occurred fault information and platform early warning fault information.
First and second SVM classifier units: configured to convert the categorical feature data into data identifiers, to predict the probability of fault occurrence through a first SVM classifier, and to further predict, through a second SVM classifier, the probability of fault occurrence for the corresponding fault type of the vehicle.
The embodiments of the application provide a prediction method and a prediction system for vehicle faults. Vehicle data comprising platform early warning fault information, vehicle index information, vehicle archive information and vehicle maintenance record information are extracted and subjected to missing value processing, threshold setting processing and statistical feature quantity processing; the number of times and the frequency with which vehicle fault information occurs in a fixed time interval are counted, the correlation between the platform early warning fault information and the vehicle index information and the association among different types of platform early warning fault information are analyzed, and a plurality of feature data for vehicle fault prediction are acquired; the categorical feature data are converted into data identifiers, the probability of fault occurrence is predicted through a first SVM classifier, and the probability of fault occurrence for the corresponding fault type of the vehicle is further predicted through a second SVM classifier. The scheme facilitates real-time prediction and supervision of the health state and the fault occurrence probability of the vehicle.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows a flow chart of a prediction method for vehicle failure according to an embodiment of the present application. As shown in FIG. 1, the method includes extracting vehicle data, vehicle fault feature extraction and machine learning training.
In step S1: vehicle data comprising platform early warning fault information, vehicle index information, vehicle archive information and vehicle maintenance record information are extracted, and missing value processing, threshold setting processing and statistical feature quantity processing are performed on the extracted data.
The vehicle data in this step comprise four types of vehicle data of different dimensions, namely platform early warning fault information, vehicle index information, vehicle archive information and vehicle maintenance records. The platform early warning fault information and the vehicle index information are monitored by an Electronic Control Unit (ECU) and can, to a certain extent, assist a crew member in checking the state of the vehicle. However, the indexes of each vehicle component have to be called up and checked manually, so current monitoring equipment cannot automatically monitor and check whether the vehicle indexes are normal; in addition, the vehicle fault information may contain false alarms or early warnings that clear automatically within a short time. For these reasons, basic vehicle archive information and vehicle maintenance information are introduced in the vehicle data extraction stage. Missing values and abnormal values in the extracted vehicle data are processed by record deletion, zero value replacement, mean value replacement and the like, so as to improve the accuracy of predicting real vehicle faults. A threshold is then set for the processed platform early warning fault information; for example, if 10 seconds is set as the filtering threshold, only platform early warning fault information whose repair time exceeds 10 seconds is retained.
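The preprocessing described above may be sketched in Python as follows. The sketch shows mean value replacement for missing index values and the 10-second filtering threshold for early warnings; the record fields `raised_at` and `cleared_at` (in seconds), the function names and the sample values are hypothetical, and the repair time is assumed to be the difference between the two timestamps.

```python
def fill_missing_with_mean(values):
    """Replace missing entries (None) with the mean of the present entries."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no values available to compute a mean")
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def filter_warnings(warnings, min_duration_s=10.0):
    """Keep only warnings whose repair time exceeds the filtering threshold."""
    return [w for w in warnings
            if w["cleared_at"] - w["raised_at"] > min_duration_s]


# Hypothetical engine speed samples with one missing value.
rpm = [1500.0, None, 1700.0, 1600.0]
rpm_filled = fill_missing_with_mean(rpm)

# Hypothetical early warnings with raise and clear times in seconds.
warnings = [
    {"type": "water_temp", "raised_at": 0.0, "cleared_at": 4.0},
    {"type": "oil_pressure", "raised_at": 10.0, "cleared_at": 95.0},
    {"type": "air_pressure", "raised_at": 20.0, "cleared_at": 30.0},
]
kept = filter_warnings(warnings)
```

Record deletion and zero value replacement follow the same pattern, dropping the entry or substituting 0.0 instead of the mean.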
The vehicle types that have failed within a certain period of time, the number of vehicle faults and the frequency of vehicle faults are counted as the actual vehicle fault information. The processed vehicle index information is collected uniformly n times within a fixed time interval, and four statistical feature quantities, namely the mean $p_1$, the standard deviation $p_2$, the average rate of change $p_3$ and the range $p_4$, are calculated for each vehicle index to facilitate the correlation calculation in step S2.
In step S2: the number of times and the frequency with which vehicle fault information occurs in the fixed time interval are counted, the correlation between the platform early warning fault information and the vehicle index information and the association between different types of platform early warning fault information are analyzed, and a plurality of feature data for vehicle fault prediction are acquired, wherein the vehicle fault information comprises actually occurred fault information and platform early warning fault information.
In this step, the correlation between the platform early warning fault information and the vehicle index information is calculated using the Pearson correlation coefficient, and the association among different types of platform early warning fault information is analyzed through a weighted association rule algorithm, so as to obtain a plurality of feature data for vehicle fault prediction.
In step S3: the categorical feature data are converted into data identifiers, the probability of fault occurrence is predicted through a first SVM classifier, and the probability of fault occurrence for the corresponding fault type of the vehicle is further predicted through a second SVM classifier.
Before the SVM classifier models are trained, a feature vector is constructed: the plurality of feature data obtained in step S2 are further processed, and the categorical data in the feature vector are converted into data identifiers (for example, the 16 categories of maintenance records in the vehicle maintenance record information, such as relay and three-way connector records, are converted in sequence by category and represented by the integers 0 to 15). A prediction model of the vehicle fault probability is obtained by training the first SVM classifier, and a prediction model of the vehicle component fault probability is further obtained by training the second SVM classifier. Based on the constructed first and second SVM classifier models, the actual vehicle fault probability and the vehicle component fault probability can be predicted.
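The conversion of categorical data into data identifiers may be sketched in Python as follows. The sketch assigns consecutive integers starting from 0 in order of first appearance, so 16 categories would map to 0 to 15; the function names, the field name `service_item` and the category labels are hypothetical.

```python
def build_category_index(categories):
    """Map each distinct category to an integer in order of first appearance."""
    index = {}
    for c in categories:
        if c not in index:
            index[c] = len(index)
    return index


def encode(records, field, index):
    """Replace the categorical `field` of every record by its identifier."""
    return [index[r[field]] for r in records]


# Hypothetical maintenance records with a categorical service item.
records = [
    {"service_item": "relay"},
    {"service_item": "three-way connector"},
    {"service_item": "relay"},
    {"service_item": "brake shoe"},
]
index = build_category_index(r["service_item"] for r in records)
encoded = encode(records, "service_item", index)
```

The same index has to be reused at prediction time so that a given category always maps to the same identifier.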
In some preferred embodiments, the platform early warning fault information includes multiple kinds of early warning information, such as air pressure sensor early warning information, air pressure early warning information, engine oil pressure early warning information, water temperature early warning information, engine oil early warning information, brake shoe abrasion early warning information and the like; the vehicle index information comprises vehicle speed, rotating speed, air inlet temperature and the like; the vehicle profile information comprises vehicle purchase date, brand, vehicle type and the like; the vehicle service record information includes service time, failure type, service type, failure description, service item, and the like. The platform early warning fault information and the vehicle index information can assist the crew in checking the health state of the vehicle, and the vehicle archive information and the vehicle maintenance record information are introduced to improve the accuracy of the real fault prediction of the vehicle.
In some embodiments, step S1 specifically includes: processing missing values and abnormal values in the vehicle data by record deletion, zero value replacement or mean value replacement; performing threshold setting processing on the processed platform early warning fault information; and calculating, from the processed vehicle index information, four statistical feature quantities, namely the mean $p_1$, the standard deviation $p_2$, the average rate of change $p_3$ and the range $p_4$. The specific calculation formula is as follows:
$p_4 = x_{\max} - x_{\min}$
wherein $n$ represents the number of data in a group of vehicle index data within a fixed time interval, $x_i$ represents the $i$-th vehicle index datum of the group within the fixed time interval, $x_{\max}$ represents the maximum data value of the group within the fixed time interval, and $x_{\min}$ represents the minimum data value of the group within the fixed time interval. The ECU of each vehicle component continuously monitors its parameters and sends early warning information when a parameter exceeds its set value; however, the early warning information may be a false alarm or may clear automatically within a short time. The extracted data are therefore preprocessed so that accurate feature data can be obtained in the subsequent steps.
In some preferred embodiments, the correlation in step S2 is calculated using the Pearson correlation coefficient, and the specific calculation formula is as follows:
$\rho_{XY} = \dfrac{Cov(X, Y)}{\sqrt{D(X)}\,\sqrt{D(Y)}}$
wherein $X$ represents the vehicle index data, $Y$ represents the platform early warning fault information, $\rho_{XY}$ represents the correlation coefficient of the variables $X$ and $Y$, $Cov(X, Y)$ represents the covariance of $X$ and $Y$, and $D(X)$ and $D(Y)$ represent the variances of $X$ and $Y$ respectively. The occurrence of a platform early warning fault is represented by the value 1 and its non-occurrence by the value 0. The closer $\rho_{XY}$ is to 1, the greater the correlation between the platform early warning fault information and the vehicle index data, and vice versa. Using this correlation, the vehicle index data that are highly correlated with the platform early warning fault information are selected as feature data.
In some preferred embodiments, the association in step S2 is obtained by using a weighted association rule algorithm to output the association data set and the association value between platform early warning fault information A and B, with the following specific steps:
Step S21: scanning the transaction database $D$ of platform early warning fault information, setting a minimum support $min\_sup$, generating the 1-item candidate set $C_1$ using an ordinary association rule, and comparing the support $sup(A \cup B)$ of platform early warning fault information A and B with the minimum support $min\_sup$ to generate the 1-item frequent set $L_1$, wherein the support $sup(A \cup B)$ of platform early warning fault information A and B is calculated as:
$sup(A \cup B) = \dfrac{sup\_count(A \cup B)}{count(D)}$
Step S22: based on the 1-item frequent set $L_1$, generating the 2-item candidate set $C_2$, calculating the weighted support $wsup(A \cup B)$ of platform early warning fault information A and B in combination with their weight average $W_{(A \cup B)}$, and generating the weighted frequent set $L_{W2}$ with $min\_sup$ as the minimum support, wherein the weighted support $wsup(A \cup B)$ of platform early warning fault information A and B is calculated as:
$wsup(A \cup B) = W_{(A \cup B)} \times sup(A \cup B)$
Step S23: setting a minimum association degree and, on the basis of the weighted frequent set $L_{W2}$, generating the association data set and the association value between platform early warning fault information A and B using the minimum association degree, wherein the association degree of platform early warning fault information A and B is calculated as:
[formula missing from the source text]
wherein $D$ represents the transaction database of platform early warning fault information, $count(D)$ is the number of all transactions, $A \Rightarrow B$ represents the association relation between platform early warning fault information A and B, A and B represent any two different types of platform early warning fault information within a fixed time interval, $sup\_count(A \cup B)$ represents the number of transactions in which platform early warning fault information A and B occur simultaneously within a certain time range, $W_{(A \cup B)}$ represents the weight average of platform early warning fault information A and B, $min\_sup$ represents the minimum support, $sup$ represents the support, and $wsup$ represents the weighted support.
In the method, a certain association exists among different types of platform early warning fault information. This association can be mined by an association rule algorithm; combined with the weight values derived from the time intervals between different types of platform early warning fault information, the above operations yield the association data set and the association values among different types of vehicle fault information. It is noted that A and B represent any two different types of platform early warning fault information, and that steps S21 to S23 yield the association data sets and association values of any two different types of platform early warning fault information within a certain time range.
In some preferred embodiments, the weight in the weighted association rule algorithm is determined by the time interval between occurrences of different types of platform early warning fault information within a certain time range, the weight decreasing as the time interval increases. The specific calculation formulas are as follows:
$W_{t_i} = \dfrac{\max(T) - t_i}{\max(T) - \min(T)}$
$W_n = \dfrac{W_{t_1} + W_{t_2} + \dots + W_{t_n}}{n}$
wherein $W_{t_i}$ represents the weight value of two different types of platform early warning fault information in the $i$-th time interval, $t_i$ represents the minimum time interval between the two different types of platform early warning fault information in the $i$-th time interval, $\min(T)$ represents the shortest interval time between two different types of platform early warning fault information within a certain time range, $\max(T)$ represents the longest interval time between two different types of platform early warning fault information within a certain time range, and $W_n$ represents the weight average of $n$ pairs of two different types of platform early warning fault information within a certain time range (for example, the weight average of platform early warning fault information A and B is denoted $W_{(A \cup B)}$). A vehicle fault is inherently a low-probability event. After the weight values of two different types of vehicle faults in a fixed time period are calculated, the average of the weight values over a plurality of fixed time periods is used as the weight average of the two different types of vehicle faults within a certain time range, so that the weight reflects the real data.
In some preferred embodiments, the first SVM classifier and the second SVM classifier in step S3 determine the penalty parameter and the kernel function parameter using a particle swarm optimization algorithm, wherein the objective function of the first SVM classifier and of the second SVM classifier is defined as:
$\min \; \dfrac{1}{2}\lVert \omega \rVert^2 + C \sum_{i=1}^{n} \zeta_i$
$\text{s.t. } \lvert y_i - \omega \cdot k - b \rvert < \zeta_i$
wherein $\omega$ represents the adjustable weight vector, that is, the adjustable weight of each vector in the hyperplane; $\zeta_i \geq 0$ represents the slack variable; $C$ represents the penalty parameter; $b$ represents the offset, that is, the displacement of the hyperplane from the origin; and $k = K(x_i, x_j)$ represents the kernel function, with $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, n$. In the first SVM classifier $y_i$ takes one of the two values 0 or 1; in the second SVM classifier $y_i$ takes the vehicle fault category as its value, set as $y_i = 1, 2, \dots$. The penalty parameter and the kernel function parameter are generally determined empirically, and the accuracy of the two parameters can be effectively improved by adopting the particle swarm optimization algorithm.
In some specific embodiments, the vehicle fault information data includes positive sample data, in which a fault actually occurred, and negative sample data, in which no fault actually occurred. Before the training stage, the k nearest neighbors of each positive sample are obtained using the Euclidean distance, and new samples are constructed according to a sampling rate, so as to obtain balanced vehicle fault information data. A vehicle fault is a low-probability event, so in practice the number of positive samples is far smaller than the number of negative samples; to improve the accuracy of the SVM classifier models, the positive and negative sample data in the training samples need to be balanced.
With continued reference to FIG. 2, a flow diagram of a prediction method for vehicle failure is shown, in accordance with a particular embodiment of the present application. The method comprises the steps of vehicle data extraction, data processing, fault feature extraction, machine learning training, true fault probability prediction and the like.
Step 201-204: Vehicle data extraction. Vehicle data comprising platform early-warning fault information, vehicle index information, vehicle archive information and vehicle maintenance record information is extracted. The platform early-warning fault information includes various warnings such as air pressure sensor warnings, air pressure warnings, engine oil pressure warnings, water temperature warnings, engine oil warnings and brake shoe wear warnings; the vehicle index information includes vehicle speed, rotating speed, intake air temperature and the like; the vehicle archive information includes vehicle purchase date, brand, vehicle type and the like; the vehicle maintenance record information includes maintenance time, fault type, maintenance type, fault description and maintenance items.
Step 205: Data processing. Missing values and abnormal values in the extracted vehicle data are handled, for example by record deletion, zero-value replacement or mean-value replacement. A threshold is then applied to the processed platform early-warning fault information; for example, with 10 seconds set as the filtering threshold, only platform early-warning fault information whose repair time exceeds 10 seconds is retained. Statistical feature processing is performed on the processed vehicle index information to calculate four statistical feature values: the average value $p_1$, the standard deviation $p_2$, the average rate of change $p_3$ and the range $p_4$.
Step 206: Fault feature extraction. For vehicles that have failed within a certain period of time, the vehicle fault information is counted, including vehicle type, number of faults and fault frequency; the correlation between platform early-warning fault information and vehicle index information is calculated with the Pearson correlation coefficient; and the association between different types of platform early-warning fault information is obtained with a weighted association rule algorithm.
Step 207: Machine learning training. The first SVM classifier is used to construct a prediction model of the vehicle fault probability, and the second SVM classifier is then used to construct a prediction model of the vehicle component fault probability.
Step 208: True fault probability prediction. Using the prediction models of the first SVM classifier and the second SVM classifier constructed in step 207, the probability that a vehicle actually has a fault and the probability that a given vehicle component has a fault can be predicted.
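The data processing of step 205 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the application: the record layout, the field name `repair_seconds` and the function names are hypothetical, and mean-value replacement is only one of the replacement modes the text lists.

```python
from statistics import mean

def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        return []  # nothing to impute from, so the record is dropped
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def filter_short_warnings(warnings, min_seconds=10.0):
    """Keep only warnings whose repair time exceeds the filtering threshold."""
    return [w for w in warnings if w["repair_seconds"] > min_seconds]

# Hypothetical sample data: one missing speed reading, one short-lived warning.
speeds = [42.0, None, 48.0, 45.0]
warnings = [
    {"type": "oil_pressure", "repair_seconds": 3.0},
    {"type": "water_temperature", "repair_seconds": 95.0},
]

clean_speeds = fill_missing_with_mean(speeds)
kept_warnings = filter_short_warnings(warnings)
```

With the 10-second threshold from the text, the 3-second oil pressure warning is treated as a transient and discarded, while the 95-second water temperature warning is retained.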
In some preferred embodiments, the statistical feature processing of the vehicle index information in step 205 calculates four statistical feature values, namely the average value $p_1$, the standard deviation $p_2$, the average rate of change $p_3$ and the range $p_4$, as follows:
$$p_4 = x_{max} - x_{min}$$
where $n$ represents the number of data points in a group of vehicle index data within a fixed time interval, $x_i$ represents the $i$-th vehicle index data value in the group ($i = 1, 2, 3, \ldots, n$), $x_{max}$ represents the maximum data value of the group within the fixed time interval, and $x_{min}$ represents the minimum data value of the group within the fixed time interval. The ECU continuously monitors each vehicle component and automatically sends early-warning information when a measured value exceeds the set parameter. However, early-warning information may be a false positive or may be repaired automatically within a short time, so the extracted data needs to be preprocessed so that accurate feature data can be obtained subsequently. (For example, vehicle index information is collected uniformly $n$ times within a 30-minute period, where $n$ is the $n$ in the formula, which is equivalent to collecting vehicle index information once every $30/n$ minutes. The vehicle index information includes vehicle speed, engine speed, intake air temperature, engine oil pressure and the like. For vehicle speed, $x_i$ indicates the vehicle speed of the $i$-th collection within the 30 minutes, and $x_{max}$ and $x_{min}$ represent the maximum and minimum vehicle speed within the 30 minutes.)
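The four statistical feature values can be sketched as follows. Only the range $p_4 = x_{max} - x_{min}$ is taken directly from the text. The other three definitions are assumptions made for illustration: $p_1$ is the arithmetic mean, $p_2$ is the population standard deviation (dividing by $n$), and $p_3$ is the mean of the consecutive differences $x_{i+1} - x_i$. The function name is hypothetical.

```python
import math

def statistical_features(x):
    """Return (p1, p2, p3, p4) for one group of vehicle index data."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples")
    p1 = sum(x) / n                                        # average value
    p2 = math.sqrt(sum((v - p1) ** 2 for v in x) / n)      # assumed: population std
    p3 = sum(x[i + 1] - x[i] for i in range(n - 1)) / (n - 1)  # assumed: mean step change
    p4 = max(x) - min(x)                                   # range, as given in the text
    return p1, p2, p3, p4

# Hypothetical vehicle speeds collected 4 times within one 30-minute interval.
speeds = [40.0, 44.0, 50.0, 46.0]
p1, p2, p3, p4 = statistical_features(speeds)
```

If the application intends a sample standard deviation or an absolute rate of change, only the two lines marked as assumed would need to change.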
In some preferred embodiments, the correlation between the platform early-warning fault information and the vehicle index information in step 206 is calculated using the Pearson correlation coefficient, with the following formula:
$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{D(X)}\,\sqrt{D(Y)}}$$
where $X$ represents the vehicle index data, $Y$ represents the vehicle platform early-warning fault information, $\rho_{XY}$ represents the correlation coefficient of the variables $X$ and $Y$, $\mathrm{Cov}(X, Y)$ represents the covariance of $X$ and $Y$, and $D(X)$ and $D(Y)$ represent the variances of $X$ and $Y$ respectively. The occurrence of a platform early-warning fault is represented by the value 1 and its non-occurrence by the value 0. The closer $\rho_{XY}$ is to 1, the greater the correlation between the vehicle platform early-warning fault information and the vehicle index data; otherwise, the smaller the correlation. Using this correlation, the vehicle index data that has a large influence on the platform early-warning fault information is selected as feature data.
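The Pearson coefficient between one vehicle index and a 0/1 warning indicator can be computed as sketched below. The data values and the function name are hypothetical. Note that the sample shows a strongly negative coefficient (low oil pressure accompanies the warning); whether such an index would be ranked by the magnitude of $\rho_{XY}$ is an assumption the text does not settle.

```python
import math

def pearson(x, y):
    """Pearson correlation: Cov(X, Y) / (sqrt(D(X)) * sqrt(D(Y)))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    dx = sum((a - mx) ** 2 for a in x) / n
    dy = sum((b - my) ** 2 for b in y) / n
    if dx == 0 or dy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(dx * dy)

# Hypothetical readings: engine oil pressure and whether a warning occurred (1) or not (0).
oil_pressure = [3.1, 3.0, 2.2, 2.0, 3.2, 1.9]
warning = [0, 0, 1, 1, 0, 1]
rho = pearson(oil_pressure, warning)
```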
In some specific embodiments, the association between different types of platform early-warning fault information in step 206 is obtained with a weighted association rule algorithm. Taking the association data set and association value between platform early-warning fault information A and B as an example, the specific operation steps are as follows:
step S21: setting a minimum support degree min sup in a database D of early warning fault information of a scanning platform, and generating a 1-item candidate set C by using a common association rule
1Comparing the support degree of the early warning fault information A and B of the platform
And minimum support degree min sup generates 1-item frequency set L
1Support degree of platform early warning fault information A and B
The calculation formula is as follows:
step S22: based on 1-item frequent set L
1Generating a 2-item candidate set C
2Combining with the weight average value W of the platform early warning fault information A and B
(A∪B)Calculating the weighted support degree of early warning fault information A and B of the platform
Generating a weighted frequent set L by taking min sup as a minimum support degree
W2Weighted support of platform early warning fault information A and B
The specific calculation formula is as follows:
step S23: setting a minimum degree of association based on the weighted frequent set L
W2On the basis of the above, a correlation data set and a correlation value between the platform early warning fault information A and the platform early warning fault information B are generated by using the minimum correlation degree, and the correlation degree of the platform early warning fault information A and the platform early warning fault information B
The calculation formula is as follows:
wherein, the transaction database for representing the early warning failure information of the platform, count (D) is the number of all the things,
the method comprises the steps of representing the incidence relation of platform early warning fault information A and platform early warning fault information B, representing A and B as any two different types of platform early warning fault information in a fixed time interval, representing the number of transactions of the platform early warning fault information A and the platform early warning fault information B occurring at the same time in a certain time range by sup-count (Au B), and representing W
(A∪B)And (3) representing the weight average value of the early warning fault information A and B of the platform, min sup represents the minimum support degree, sup represents the support degree, and w sup represents the weighting support degree.
In this method, different types of platform early-warning fault information are associated to some degree, and this association can be mined with an association rule algorithm. By combining the weight values converted from the time intervals between different types of platform early-warning fault information, the above operations yield an association data set and association values. Note that A and B represent any two different types of platform early-warning information; through steps S21-S23, the association data sets and association values of any two different types of platform early-warning fault information within a certain time range are obtained.
In some preferred embodiments, $D$ is the transaction database of platform early-warning fault information, $I = \{i_1, i_2, \ldots, i_m\}$ is the 2-item candidate set $C_2$ among different types of platform early-warning fault information within a certain time range, and $W = \{W_1, W_2, \ldots, W_m\}$ is the weight set. The values of the weight set can be scaled so that their average is 1, that is, so that the weights sum to $m$, which simplifies calculation and makes the values more intuitive.
In some preferred embodiments, assume that within 20 days there are 200 vehicle platform early-warning fault information transactions and that the minimum support degree is 0.025. Taking the association between platform early-warning fault information A and B as an example: A appears in 15 transactions (support degree 15/200 = 0.075) and B appears in 20 transactions (support degree 20/200 = 0.1), so both satisfy the minimum support degree. A and B appear together in 10 transactions, giving a support degree of 10/200 = 0.05. Multiplying this support degree by the weight average value of A and B, assumed to be 1.2, gives a weighted support degree of 0.06, which also satisfies the minimum support degree. The association degree of A and B is then 0.06/0.075 = 0.8, expressed as $F_{(A \cup B)} = \{(A, B): 0.8\}$.
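The worked example above can be reproduced with a short Python sketch. It assumes, as the ratio 0.06/0.075 in the example suggests, that the association degree is the weighted support of A and B divided by the unweighted support of A; the function name and argument names are hypothetical.

```python
def weighted_association(count_a, count_b, count_ab, total, weight_ab, min_sup):
    """Return the association degree of A and B, or None if a support check fails."""
    sup_a = count_a / total
    sup_b = count_b / total
    if sup_a < min_sup or sup_b < min_sup:
        return None  # A or B is not in the 1-item frequent set
    wsup_ab = weight_ab * count_ab / total  # weighted support of {A, B}
    if wsup_ab < min_sup:
        return None  # {A, B} is not in the weighted frequent set
    return wsup_ab / sup_a  # assumed: denominator is the support of A

# Numbers from the example: 200 transactions, A in 15, B in 20, both in 10.
degree = weighted_association(
    count_a=15, count_b=20, count_ab=10, total=200, weight_ab=1.2, min_sup=0.025
)
```

A pair that fails either support check returns `None`, which corresponds to the pair being dropped before the association step.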
In some preferred embodiments, different pieces of vehicle fault information may be correlated to some degree; for example, when platform early-warning fault information A occurs, platform early-warning fault information B often occurs as well. Such association information can be mined with an association rule algorithm. One difficulty in analyzing the association between different types of platform early-warning fault information is that most different types of vehicle fault information do not occur at exactly the same moment but within a short time of each other, and the shorter the time between two different types of platform early-warning fault information, the greater the corresponding association. The time intervals between occurrences of all platform early-warning fault information of each vehicle within a certain time range (such as 20 days) are calculated in seconds, and each time interval value is converted into a weight: the longer the interval, the smaller the weight, and conversely the shorter the interval, the larger the weight. Taking the first occurrence time of platform early-warning fault information as the starting point, time is divided into intervals of 2 hours (the length can be chosen according to actual conditions). Fixed time intervals in which fewer than 2 different types of platform early-warning fault information occur are filtered out, and the platform early-warning fault information within each remaining time interval is recorded as one transaction. The weight values are calculated with the following formulas:
$$W_{t_i} = \frac{\max(T) - t_i}{\max(T) - \min(T)}$$
$$W_n = \frac{W_{t_1} + W_{t_2} + \cdots + W_{t_n}}{n}$$
where $W_{t_i}$ represents the weight value of two different types of platform early-warning fault information in the $i$-th time interval (the $i$-th 2-hour interval within 20 days); $t_i$ represents the minimum time interval between the two different types of platform early-warning fault information in the $i$-th time interval; $\min(T)$ represents the shortest interval between two different types of platform early-warning fault information within a certain time range (within 20 days); $\max(T)$ represents the longest interval between two different types of platform early-warning fault information within that time range; and $W_n$ represents the average of the $n$ weight values of the two different types of platform early-warning fault information within that time range (for example, the weight average value of vehicle platform early-warning fault information A and B is denoted $W_{(A \cup B)}$). Because a vehicle fault is inherently a small-probability event, after the weight values of two different types of vehicle faults are calculated for each fixed time period, the average of the weight values over the fixed time periods is used as the weight value of the two different types of vehicle faults, which better reflects the actual data. Note that when the vehicle faults within a fixed time interval (within 2 hours) are all of the same type, the weight value of that fixed time interval is set to 0.
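The transaction windowing and the weight formulas above can be sketched together. This is an illustration under assumptions: events are hypothetical `(timestamp_in_seconds, type)` tuples, $\max(T)$ and $\min(T)$ are taken over the gaps of the one pair being evaluated, and a pair whose gaps are all equal is given weight 1.0 to avoid division by zero. None of these choices is confirmed by the text.

```python
from itertools import product

WINDOW_SECONDS = 2 * 3600  # 2-hour fixed time interval

def build_transactions(events, window=WINDOW_SECONDS):
    """Group events into fixed windows from the first event; keep windows with >= 2 types."""
    if not events:
        return []
    events = sorted(events)
    start = events[0][0]
    buckets = {}
    for ts, kind in events:
        buckets.setdefault(int((ts - start) // window), []).append((ts, kind))
    return [b for _, b in sorted(buckets.items()) if len({k for _, k in b}) >= 2]

def min_gap(transaction, a, b):
    """Smallest time gap t_i between a type-a and a type-b event in one transaction."""
    ta = [ts for ts, k in transaction if k == a]
    tb = [ts for ts, k in transaction if k == b]
    if not ta or not tb:
        return None
    return min(abs(x - y) for x, y in product(ta, tb))

def pair_weight(transactions, a, b):
    """Average of W_ti = (max(T) - t_i) / (max(T) - min(T)) over all transactions."""
    gaps = [g for g in (min_gap(t, a, b) for t in transactions) if g is not None]
    if not gaps:
        return 0.0
    lo, hi = min(gaps), max(gaps)
    if hi == lo:
        return 1.0  # assumed handling of the degenerate case
    weights = [(hi - g) / (hi - lo) for g in gaps]
    return sum(weights) / len(weights)

# Hypothetical events; the third window holds only type A and is filtered out.
events = [(0, "A"), (600, "B"), (7200, "A"), (9000, "B"),
          (14400, "A"), (14500, "A"), (21600, "A"), (25200, "B")]
transactions = build_transactions(events)
w_ab = pair_weight(transactions, "A", "B")
```

Here the three remaining windows have gaps of 600, 1800 and 3600 seconds, giving weights 1.0, 0.6 and 0.0 before averaging.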
FIG. 3 is a flowchart illustrating SVM classifier training steps according to an embodiment of the present application. The training specifically comprises characteristic parameter extraction, training set and test set selection, data normalization processing, SVM classifier construction, classification requirement judgment, SVM parameter optimization, test data input and classification accuracy judgment:
Step 301: Start.
Step 302: Characteristic parameter extraction. The number and frequency of occurrences of vehicle fault information within a fixed time interval are counted, the correlation between platform early-warning fault information and vehicle index information and the association between different types of platform early-warning fault information are analyzed, and a plurality of feature data for vehicle fault prediction are acquired and converted into data identifiers.
Step 303: Training set and test set selection. Data samples in which a fault actually occurred are defined as positive samples, and data samples in which an early-warning fault did not actually occur are defined as negative samples. The fault detection process has two stages, training and testing, and the data set is divided into two sets in a corresponding ratio (for example, an 80%/20% division, with 20% used as test data).
Step 304: Data normalization processing. Training samples are used in the training stage. Because the number of positive samples is far smaller than the number of negative samples, the positive and negative samples in the training samples need to be balanced before training.
Step 305: SVM classifier. An objective function is defined according to the structural risk minimization principle, and a penalty parameter and a kernel function parameter are selected.
Step 306: Whether the classification requirements are met. The vehicle fault prediction model constructed in step 305 is tested to determine whether it meets the classification requirements; if so, step 308 is performed, and if not, step 307 is performed.
Step 307: SVM parameter optimization. The optimal penalty parameter and kernel function parameter are selected using a particle swarm optimization algorithm, and execution continues with step 306.
Step 308: Test data input. In the testing stage, the trained model is used to identify whether a vehicle in the test sample has a fault and to predict the probability that the vehicle has a fault, or to identify the specific fault type of a vehicle in the test sample and to predict the probability of that specific fault type.
Step 309-310: Classification accuracy. The classification accuracy of the trained SVM classifier model is judged by comparing the actual distribution of positive and negative samples in the test samples with the output of the trained model, so that in actual operation the model can be used to classify whether a vehicle has a fault and to predict the probability of a vehicle component fault.
Step 310: End.
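The 80%/20% division of step 303 can be sketched as below. Splitting each class separately (a stratified split) is an assumption added here so that the rare positive samples appear in both sets; the text only specifies the ratio. The function name and the sample counts are hypothetical.

```python
import random

def stratified_split(labels, test_ratio=0.2, seed=0):
    """Return (train_indices, test_indices), splitting each class at the same ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_ratio)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test

# Hypothetical imbalanced data set: 10 positive (fault) and 90 negative samples.
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
```

Balancing of the positive and negative samples (step 304) would then be applied to the training indices only, so that synthetic samples never leak into the test set.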
In some preferred embodiments, in the data normalization processing of step 304, the positive-class and negative-class samples in the training samples are first balanced in order to enhance the classification performance of the SVM classifier. SMOTE resampling is applied to the minority positive-class samples: the K nearest neighbors of each minority positive-class sample are determined, the samples are selected one at a time and each is combined with its K nearest neighbors to synthesize new samples, and this is repeated until there are enough minority positive-class samples. The newly synthesized samples are added to the corresponding training data sample set. Specifically:
In step 304-1, let each sample in the minority positive class be denoted $X$. Using the Euclidean distance as the standard, the distances from $X$ to the other minority positive-class samples are calculated to obtain the K nearest neighbors of $X$.
In step 304-2, a sampling magnification $N$ is set according to the imbalance ratio of the training samples, meaning that each minority sample $X$ is expanded to $N$ times its original number; for each minority-class sample $X$, several samples are randomly selected from the K nearest neighbors of $X$.
Also in step 304-2, let a selected neighbor be $X_n$. For each randomly selected neighbor $X_n$, a new sample is constructed together with the original sample $X$, using the following formula:
where $X_{new}$ denotes the newly synthesized positive-class sample, the formula also uses the center value inside the positive-class samples, and rand(0, 1) is a randomly generated value in the range 0 to 1.
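The resampling steps above can be sketched with the widely used SMOTE interpolation, in which a synthetic sample is placed at a random point on the segment between a minority sample $X$ and one of its K nearest neighbors $X_n$. This is an assumption: the formula of the application refers to a center value inside the positive-class samples and may differ from this standard form. The function name, the parameter `n_new` (synthetic samples generated per original sample) and the data are hypothetical.

```python
import math
import random

def smote(minority, k=3, n_new=2, seed=0):
    """Generate n_new synthetic samples per minority sample by neighbor interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        others = [m for j, m in enumerate(minority) if j != i]
        # K nearest neighbors of x by Euclidean distance
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        for _ in range(n_new):
            xn = rng.choice(neighbours)
            gap = rng.random()  # rand(0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(x, xn)])
    return synthetic

# Hypothetical positive-class feature vectors.
positives = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
new_samples = smote(positives, k=2, n_new=3)
```

Every synthetic point lies between two real positive samples, so the generated data stays inside the region already occupied by the minority class.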
In some preferred embodiments, for the first SVM classifier and the second SVM classifier in step 305, a particle swarm optimization algorithm is used to determine the penalty parameter and the kernel function parameter, and the objective function in the first SVM classifier and in the second SVM classifier is defined as:
$$\min \ \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} \zeta_i$$
$$\text{s.t.} \ |y_i - \omega \cdot k - b| < \zeta_i$$
where $\omega$ represents an adjustable weight vector, namely the adjustable weight of each vector in the hyperplane; $\zeta \geq 0$ represents a slack variable; $C$ represents the penalty parameter; $b$ represents an offset, namely the offset of the hyperplane from the origin; and $k = K(x_i, x_j)$ denotes the kernel function parameter, with $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, n$. In the first SVM classifier, $y_i$ takes one of the two values 0 or 1; in the second SVM classifier, $y_i$ takes values representing the vehicle fault categories, set as $y_i = 1, 2, \ldots$ The penalty parameter and the kernel function parameter are generally determined by experience, and using a particle swarm optimization algorithm can effectively improve the accuracy of these two parameters.
In some preferred embodiments, the SVM parameter optimization in step 307 uses a particle swarm optimization algorithm and operates as follows:
In step 307-1, the value ranges of the penalty parameter $C$ and the kernel function parameter $k$ are preliminarily determined. In actual operation, different SVM algorithm parameters are set, training is repeated multiple times, and the parameters with high prediction accuracy are selected. The penalty parameter $C$ and the kernel function parameter $k$ are usually given empirically, but it is often difficult to obtain optimal values this way, so Particle Swarm Optimization (PSO) is used for optimization.
In step 307-2, the parameters of the PSO algorithm are initially determined, together with a screening interval and a step size, and the preset parameters are automatically combined and traversed. During iteration, premature convergence of PSO is addressed by varying the inertia weight: a larger inertia weight is set when the algorithm starts, which favors global search; as the number of iterations increases, the inertia weight is gradually reduced, which favors local search and greatly improves the performance of PSO. The parameter combination with the highest accuracy is selected to form the SVM model. When accuracies are close, the combination with the smallest penalty parameter $C$ is chosen to avoid overfitting.
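The decreasing inertia weight described in step 307-2 can be sketched with a small particle swarm optimizer. Several things here are assumptions for illustration: the inertia weight falls linearly from 0.9 to 0.4, the acceleration coefficients are 1.5, and `toy_error` is a hypothetical stand-in for the validation error of an SVM trained with penalty parameter `c` and kernel parameter `k`. A real run would replace `toy_error` with actual training and evaluation.

```python
import random

def pso(objective, bounds, n_particles=20, n_iter=60,
        w_start=0.9, w_end=0.4, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over box bounds with a linearly decreasing inertia weight."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    for it in range(n_iter):
        # Large inertia early (global search), small inertia late (local search).
        w = w_start - (w_start - w_end) * it / max(1, n_iter - 1)
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

def toy_error(params):
    """Hypothetical error surface with its minimum at c = 10, k = 0.5."""
    c, k = params
    return (c - 10.0) ** 2 / 100.0 + (k - 0.5) ** 2

search_bounds = [(0.1, 100.0), (0.01, 10.0)]  # assumed ranges for C and k
best_params, best_error = pso(toy_error, search_bounds)
```

The tie-breaking rule from the text, preferring the smallest $C$ when accuracies are close, is not implemented in this sketch and would be applied when comparing candidate results.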
With continued reference to FIG. 4, a flowchart of SVM classifier simulation training according to an embodiment of the present application is shown. The simulation training includes training on no-fault feature data and training on fault feature data.
Model 401: No-fault feature data. For the platform early-warning fault information, the first SVM classifier (fault/no fault) is trained.
Model 402: Fault feature data. For the actually occurred fault information and the platform early-warning fault information, the first SVM classifier (fault/no fault) is trained first, and then the second SVM classifier (vehicle fault category) is trained.
With continued reference to FIG. 5, a flow diagram of SVM classifier simulation prediction according to a specific embodiment of the present application is shown. The method comprises the following steps:
Step 501: Vehicle feature data. Vehicle data of different dimensions is extracted and processed to obtain fault feature data.
Step 502: First SVM classifier (fault/no fault). The fault feature data obtained in step 501 is converted into digital identifiers, and fault prediction is performed by the first SVM classifier trained as shown in FIG. 3.
Step 503: Output the prediction result (no fault: 40%). When the predicted probability that no fault occurs is higher than 50%, steps 501-502 (vehicle feature data extraction and the first SVM classifier) continue to be executed.
Step 504: Output the prediction result (fault: 60%). When the predicted probability that a fault occurs is higher than 50%, step 505, the second SVM classifier (vehicle fault category), is executed.
Step 505: Second SVM classifier (vehicle fault category). The fault feature data obtained in step 501 is converted into digital identifiers, and the second SVM classifier trained as shown in FIG. 3 further predicts the probability of each vehicle fault type.
Step 506: Output the prediction result. For example, rear door solenoid valve air leakage: 50%, drying bottle air leakage: 45%, HCU failure: 13%, meaning that the probability of a rear door solenoid valve air leakage fault is 50%, the probability of a drying bottle air leakage fault is 45%, and the probability of an HCU fault is 13%.
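The two-stage prediction of steps 501-506 can be sketched as a simple cascade. The two models are passed in as plain callables; the stub functions below are hypothetical stand-ins that return the example probabilities from the text, not trained SVM classifiers.

```python
def predict_vehicle(features, fault_model, type_model, threshold=0.5):
    """Run the fault/no-fault model, then the fault-type model only if a fault is likely."""
    p_fault = fault_model(features)
    if p_fault <= threshold:
        # No fault predicted: the second classifier is not consulted.
        return {"fault_probability": p_fault, "fault_types": {}}
    return {"fault_probability": p_fault, "fault_types": type_model(features)}

def stub_fault_model(features):
    """Hypothetical first classifier: probability that the vehicle has a fault."""
    return 0.6

def stub_type_model(features):
    """Hypothetical second classifier: probability per vehicle fault category."""
    return {
        "rear door solenoid valve air leakage": 0.50,
        "drying bottle air leakage": 0.45,
        "HCU failure": 0.13,
    }

result = predict_vehicle([0.2, 1.0], stub_fault_model, stub_type_model)
no_fault = predict_vehicle([0.2, 1.0], lambda f: 0.4, stub_type_model)
```

Keeping the second classifier behind the 50% threshold means fault-type probabilities are only reported for vehicles the first classifier already considers likely to be faulty.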
In addition, the application also provides a prediction system for vehicle faults. As shown in FIG. 6, the system includes a vehicle data extraction unit 601, a feature data extraction unit 602, and first and second SVM classifier units 603. After the vehicle data extraction unit 601 extracts and preprocesses vehicle data of different dimensions, the feature data extraction unit 602 counts, on the data extracted by the vehicle data extraction unit 601, the number and frequency of occurrences of vehicle fault information within a fixed time interval, and analyzes the correlation between platform early-warning fault information and vehicle index information and the association between different types of platform early-warning fault information. The feature data extracted by the feature data extraction unit 602 is then used by the first and second SVM classifier units 603 to predict the probability of a vehicle fault and the probability of a vehicle component fault.
In a specific embodiment, the vehicle data extraction unit 601 is configured to extract vehicle data including platform early-warning fault information, vehicle index information, vehicle archive information and vehicle maintenance record information, and to perform missing value processing, threshold setting processing and statistical feature processing on it.
The feature data extraction unit 602 is configured to count the number and frequency of occurrences of vehicle fault information within a fixed time interval and, in response, to analyze the correlation between platform early-warning fault information and vehicle index information and the association between different types of platform early-warning fault information, and to acquire a plurality of feature data for vehicle fault prediction, wherein the vehicle fault information comprises actually occurred fault information and platform early-warning fault information.
The first and second SVM classifier units 603 are configured to convert the category feature data into data identifiers, predict the probability of fault occurrence through the first SVM classifier, and further predict, through the second SVM classifier, the probability of occurrence for each fault type of the vehicle.
Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in FIG. 7 is only an example and should not limit the functions or the scope of use of the embodiments of the present application.
As shown in FIG. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the system 700. The CPU 701, the ROM 702 and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse and the like; an output section 707 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem or the like. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 may also be connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 710 as necessary, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. When executed by the Central Processing Unit (CPU) 701, the computer program performs the above-described functions defined in the method of the present application.
It should be noted that the computer readable medium of the present application can be a computer readable signal medium or a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium other than a computer readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a vehicle data extraction unit, a feature data extraction unit, and first and second SVM classifier units. The names of these modules do not constitute a limitation on the modules themselves in some cases, and for example, the vehicle data extraction unit may also be described as "a unit that performs missing value processing, threshold value setting processing, and statistical feature amount processing based on extraction of vehicle data including platform warning failure information, vehicle index information, vehicle profile information, and vehicle maintenance record information".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement a vehicle data extraction unit, a feature data extraction unit, and first and second SVM classifier units. The vehicle data extraction unit is configured to perform missing value processing, threshold setting processing, and characteristic quantity counting processing on extracted vehicle data, wherein the vehicle data comprises platform early warning fault information, vehicle index information, vehicle archive information, and vehicle maintenance record information. The feature data extraction unit is configured to, in response to counting the number and frequency of occurrences of vehicle fault information within a fixed time interval, analyze the correlation between the platform early warning fault information and the vehicle index information, analyze the correlation between different types of platform early warning fault information, and acquire a plurality of characteristic data for vehicle fault prediction, wherein the vehicle fault information comprises actually occurred fault information and platform early warning fault information. The first and second SVM classifier units are configured to convert the category characteristic data into data identifiers, predict the fault occurrence probability through a first SVM classifier, and further predict, through a second SVM classifier, the fault occurrence probability of the vehicle for the corresponding fault type.
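The data preparation performed by the vehicle data extraction unit and the feature data extraction unit can be illustrated with a minimal Python sketch. It is not the implementation of the present application. The record layout `(vehicle_id, timestamp, fault_code, index_value)`, the function names, and the interpretation of threshold setting processing as clamping an index value to a fixed range are all assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime, timedelta


def clean_records(records, low, high, default):
    """Fill missing index values and clamp them to [low, high].

    Assumption: each record is (vehicle_id, timestamp, fault_code, index_value),
    and "threshold setting" is interpreted here as clamping to a fixed range.
    """
    cleaned = []
    for vehicle_id, ts, fault_code, value in records:
        if value is None:  # missing value processing
            value = default
        value = min(max(value, low), high)  # threshold (clamping) processing
        cleaned.append((vehicle_id, ts, fault_code, value))
    return cleaned


def fault_counts_in_window(records, window_end, window_days):
    """Count each fault code per vehicle inside a fixed time interval.

    Returns {(vehicle_id, fault_code): {"count": n, "per_day": n / window_days}}.
    """
    window_start = window_end - timedelta(days=window_days)
    counts = Counter()
    for vehicle_id, ts, fault_code, _ in records:
        if window_start < ts <= window_end:
            counts[(vehicle_id, fault_code)] += 1
    return {
        key: {"count": n, "per_day": n / window_days}
        for key, n in counts.items()
    }


def encode_categories(values):
    """Map each distinct category to an integer identifier (first-seen order)."""
    ids = {}
    for v in values:
        ids.setdefault(v, len(ids))
    return ids
```

The resulting counts, frequencies, and integer identifiers would form part of the feature vectors passed to the first and second SVM classifiers. One possible choice for those classifiers, not specified by the present application, is scikit-learn's `SVC` with `probability=True`.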
The above description is only a preferred embodiment of the application and an illustration of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention disclosed herein is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the features described above or their equivalents without departing from the inventive concept disclosed above, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present application.