S203, for the recorded message IDs, normalizing the occurrence times of the message events, pairing the events of different message IDs by close time, calculating the correlation coefficient of each byte pair, extracting the byte pairs whose correlation coefficient has an absolute value greater than a preset value, and marking them as correlation groups; the close time includes the same time or a time within a preset range.
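The time pairing in S203 can be illustrated with a minimal Python sketch. The function names `normalize_times` and `pair_by_nearest_time`, the `(time, value)` tuple layout, the shared-origin normalization, and the nearest-neighbour matching rule are all assumptions made for illustration; the method itself only requires that events be paired when their times are equal or within a preset range.

```python
from bisect import bisect_left


def normalize_times(series, origin):
    """Express timestamps relative to a shared origin (an assumed normalization)."""
    return [(t - origin, value) for t, value in series]


def pair_by_nearest_time(series_a, series_b, max_gap):
    """Pair each sample of series_a with the closest-in-time sample of series_b.

    Both series are lists of (time, value) tuples sorted by time. A pair is
    kept only if the time gap is at most max_gap (the preset range).
    Returns two aligned lists of values.
    """
    times_b = [t for t, _ in series_b]
    xs, ys = [], []
    for t, value_a in series_a:
        i = bisect_left(times_b, t)
        # Only the neighbours just before and just after t can be closest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times_b)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(times_b[k] - t))
        if abs(times_b[j] - t) <= max_gap:
            xs.append(value_a)
            ys.append(series_b[j][1])
    return xs, ys


# Hypothetical byte values of two message IDs, timestamps in milliseconds.
a = normalize_times([(1000, 10), (1010, 12), (1020, 15)], origin=1000)
b = normalize_times([(1001, 20), (1012, 24), (1050, 30)], origin=1000)
xs, ys = pair_by_nearest_time(a, b, max_gap=5)
print(xs, ys)  # [10, 12] [20, 24]
```

The third sample of the first series is dropped because its nearest neighbour is 8 ms away, outside the assumed 5 ms range.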

Specifically, for a selected message byte pair, a visualization system can be used to draw a line graph and a time-series scatter diagram of the actual message byte values over the same time period, and the result is verified and rechecked against the graphs. If the value trends of the pair are consistent or exactly opposite, and the correlation remains stable over time, the pair has a strong correlation and is confirmed and marked. If pairwise correlation exists among multiple correlation groups, for example the correlations AB, BC and AC all exist, then A, B and C are merged into one correlation group. Alternatively, if the groups only partially intersect, such as AB and AC, they can also be merged into one group, but a special mark is needed: in later training, only B and C can be selected as input items, and A as the output item.
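The merging rule above can be sketched as follows. The function name `merge_correlation_groups`, the dictionary layout, and the generalization of the "special mark" (only members correlated with every other member may serve as the output item) are assumptions for illustration; the text only gives the AB/BC/AC and AB/AC cases.

```python
from itertools import combinations


def merge_correlation_groups(pairs):
    """Merge correlated pairs that share members into larger groups.

    pairs: iterable of (item, item) tuples, each a marked correlation pair.
    Returns a list of dicts with the sorted members, a flag telling whether
    every pair inside the group is correlated, and the members allowed as
    output item (those correlated with all other members).
    """
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    groups, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Collect everything reachable from start (one connected group).
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members

        fully = all(b in adjacency[a] for a, b in combinations(members, 2))
        outputs = sorted(n for n in members if adjacency[n] >= members - {n})
        groups.append({
            "members": sorted(members),
            "fully_correlated": fully,      # False means "special mark"
            "allowed_outputs": outputs,
        })
    return groups


print(merge_correlation_groups([("A", "B"), ("B", "C"), ("A", "C")]))
print(merge_correlation_groups([("A", "B"), ("A", "C")]))
```

For AB, BC and AC the group is fully correlated and any member may be the output item; for AB and AC only A qualifies, which matches the restriction that B and C are inputs and A is the output.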

Referring to fig. 3, the calculated correlation coefficients of the 6th byte and the 8th byte of message ID XXFEYYEE with the 3rd byte of message ID XXF003YY are 0.95 and 0.96, respectively. A time-series scatter diagram of the message values over a certain sampling period is drawn using the visualization system, where fig. 3(a) shows the value changes of message XXFEYYEE and fig. 3(b) shows the value changes of message XXF003YY. The figure shows that the three trends are highly consistent, which verifies the calculation results and shows that pairwise correlation exists among the corresponding bytes of the two messages. These bytes can form one correlation group, in which any two bytes can be selected as input items and the remaining one as the output item.

Referring to fig. 4, the calculated correlation coefficients of the 2nd and 3rd bytes of message ID XXYYF030 with the 6th byte of message ID XXFEYY02 are 0.7 and 0.6, respectively. A time-series scatter diagram of the message values over a certain sampling period is drawn using the visualization system, where fig. 4(a) shows the value changes of message XXYYF030 and fig. 4(b) shows the value changes of message XXFEYY02. The three trends are basically consistent. It is judged that the 2nd and 3rd bytes of XXYYF030, when combined by a certain calculation rule, match the change of the 6th byte of XXFEYY02 more closely; therefore only the 2nd and 3rd bytes of XXYYF030 can be selected as input items, and the 6th byte of XXFEYY02 as the output item.

Further, the preset value of the correlation coefficient is 0.5.

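The correlation coefficient is calculated from the covariance and the standard deviations. For two variables x and y, the correlation coefficient is calculated as follows:

$$r_{xy} = \frac{S_{xy}}{S_x S_y}$$

wherein $r_{xy}$ represents the sample correlation coefficient, $S_{xy}$ represents the sample covariance, $S_x$ represents the sample standard deviation of x, and $S_y$ represents the sample standard deviation of y. The covariance $S_{xy}$ and the standard deviations $S_x$ and $S_y$ are calculated as follows:

$$S_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$$

$$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad S_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2}$$

where n is the number of paired samples, and $\bar{x}$ and $\bar{y}$ are the sample means of x and y.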

In this method, x represents the value of the k-th byte (k is generally 1 to 8) of the message with message ID A, and y represents the value of the m-th byte of the message with message ID B. For example, x represents the value of the 6th byte of message ID XXFEYYEE, and y represents the value of the 3rd byte of message ID XXF003YY.
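As a worked illustration, the correlation coefficient built from the sample covariance and sample standard deviations can be computed in a few lines of Python. The function names `sample_correlation` and `is_correlated`, the n - 1 denominator, and the choice to return 0.0 for a byte that never changes are assumptions of this sketch; the default of 0.5 is the preset value stated above.

```python
from math import sqrt


def sample_correlation(xs, ys):
    """Correlation coefficient r_xy = S_xy / (S_x * S_y) of two byte series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two aligned series with at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations (n - 1 denominator).
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        return 0.0  # assumed convention: a constant byte carries no correlation
    return s_xy / (s_x * s_y)


def is_correlated(xs, ys, preset=0.5):
    """True if |r_xy| exceeds the preset value, so the pair is marked."""
    return abs(sample_correlation(xs, ys)) > preset


# Hypothetical byte values after time pairing.
x = [10, 20, 30, 40]
y = [80, 60, 40, 20]   # exactly opposite trend
print(is_correlated(x, y))  # True
```

Because the absolute value is compared with the preset value, a pair whose trends are exactly opposite is marked just like a pair whose trends agree.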

S204, training the message data in each correlation grouping by using an LSTM neural network according to the time sequence, and establishing a prediction model of each correlation grouping.

Specifically, for a group containing a pair, either member is arbitrarily selected as the input item and the other as the output item. If the group contains more than two objects, one of them is selected as the output item and the others as input items. The selection of input and output items may be adjusted according to the training effect. For example, if pairwise correlation exists among A_1 (the 1st byte of the message with ID A), B_2 (the 2nd byte of B) and C_5 (the 5th byte of C), any two of them, for example A_1 and B_2, can be selected as input items, and C_5 as the output item.
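The arrangement of input and output items into training samples can be sketched as below. The sliding-window layout, the function name `make_windows`, and the choice of the output value at the last step of each window as the training target are assumptions of this sketch; the LSTM network itself is left out, since any sequence-model library could consume samples of this shape.

```python
def make_windows(inputs, output, window):
    """Build (sample, target) pairs for sequence training.

    inputs: dict mapping an input item name (e.g. "A_1") to its time-aligned
            list of byte values.
    output: time-aligned list of byte values of the output item (e.g. C_5).
    window: number of consecutive time steps per sample.
    Each sample has shape [window][number of input items].
    """
    names = sorted(inputs)
    length = len(output)
    if any(len(inputs[name]) != length for name in names):
        raise ValueError("all series must be aligned to the same length")
    samples, targets = [], []
    for end in range(window, length + 1):
        start = end - window
        samples.append([[inputs[name][t] for name in names]
                        for t in range(start, end)])
        targets.append(output[end - 1])  # assumed target: value at window end
    return samples, targets


# Hypothetical aligned byte values for the group {A_1, B_2, C_5}.
samples, targets = make_windows(
    {"A_1": [1, 2, 3, 4], "B_2": [10, 20, 30, 40]},
    output=[5, 6, 7, 8],
    window=3,
)
print(samples[0], targets[0])  # [[1, 10], [2, 20], [3, 30]] 7
```

Swapping which item is passed as `output` is enough to try a different input/output selection when the training effect is unsatisfactory.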

Further, after the prediction model is established, several segments of CAN bus messages collected from normal driving records are selected to test the prediction model. The standard deviation between the predicted value and the original value of the byte corresponding to the message ID is calculated, and a suitable detection threshold is set according to this standard deviation and the normal data range of the corresponding message. Specifically, the detection threshold may be set to 2 times the standard deviation; in practical application, it may be adjusted according to the training data and the fluctuation range of the normal message values themselves, so as to avoid false alarms.
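The threshold setting can be illustrated with a short sketch. Here "standard deviation between the predicted value and the original value" is read as the root-mean-square deviation of the prediction errors, which is an assumption of this example, as are the function name `detection_threshold` and the `factor` parameter (defaulting to the factor of 2 mentioned above).

```python
from math import sqrt


def detection_threshold(predicted, actual, factor=2.0):
    """Threshold = factor * RMS deviation between predicted and actual values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two aligned, non-empty series")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    deviation = sqrt(squared / len(predicted))
    return factor * deviation


# Hypothetical predictions and recorded values from normal driving data.
print(detection_threshold([10, 12, 14, 16], [11, 11, 15, 15]))  # 2.0
```

A larger `factor` lowers the false-alarm rate at the cost of missing smaller injected deviations, which is the adjustment described in the text.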

Further, based on the correlation groups, the prediction model is used to calculate and predict in real time the byte value corresponding to the output item of the model. If the deviation between the predicted value of the output item and the value of the actually received message exceeds the detection threshold obtained in training, the group of messages is considered abnormal, and the system may be under a malicious, illegal injection attack. Continuing the example in S204, during real-time detection the message sequences corresponding to A_1 and B_2 within a short time range are input, the predicted value of C_5 for the corresponding time period is output, and the error between the predicted value and the actually received value is calculated; if the error is greater than the detection threshold, a detected abnormal behavior is reported.
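The real-time check can be sketched as follows. The function name `check_window` and the stand-in predictor are hypothetical; in practice `predict` would wrap the trained prediction model of the correlation group.

```python
def check_window(predict, input_window, received_value, threshold):
    """Compare the predicted output byte with the received one.

    predict: callable mapping a window of input items to a predicted value.
    Returns (is_abnormal, error).
    """
    predicted = predict(input_window)
    error = abs(predicted - received_value)
    return error > threshold, error


def toy_predict(window):
    """Stand-in for a trained model: sums the latest A_1 and B_2 values.

    Purely illustrative; it is not the prediction model of the method.
    """
    latest = window[-1]
    return latest[0] + latest[1]


window = [[1, 2], [3, 4]]  # hypothetical recent values of A_1 and B_2
print(check_window(toy_predict, window, received_value=8, threshold=2))   # (False, 1)
print(check_window(toy_predict, window, received_value=20, threshold=2))  # (True, 13)
```

The second call models an injected C_5 value that no longer agrees with A_1 and B_2, so the error exceeds the threshold and the group is flagged.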

The invention provides a vehicle-mounted network anomaly detection method based on correlation analysis, which is used for detecting abnormal messages on the vehicle CAN bus or other buses. Original message byte data are extracted directly and subjected to correlation analysis to obtain message combinations with strong correlation; regression analysis is then performed on the grouped message data to establish correlation models of the various normal messages. The variables of each grouped model are positively or negatively correlated and are a digital expression of the corresponding states of the vehicle sensors during driving, so the models can be used to detect, in real time, the data inconsistency caused by malicious data injection attacks.

The specific implementation of each module of the vehicle-mounted network anomaly detection system based on correlation analysis is consistent with the vehicle-mounted network anomaly detection method based on correlation analysis described above, and is not repeated in this embodiment.

The above description is only an embodiment of the present invention, but the design concept of the present invention is not limited thereto; any insubstantial modification made to the present invention by using this design concept shall be regarded as an act infringing the protection scope of the present invention.

Claims

1. A vehicle-mounted network anomaly detection method based on correlation analysis is characterized by comprising the following steps:

2. The correlation analysis-based vehicle-mounted network abnormality detection method according to claim 1, wherein the prediction model establishment method comprises:

3. The vehicle-mounted network anomaly detection method based on correlation analysis according to claim 2, characterized by calculating Hamming distance, analyzing Hamming distance data, and eliminating message ID with no change in message content and bytes with no change in message content in the message ID; recording the message ID and the corresponding byte sequence with the changed message content specifically comprises the following steps:

4. The correlation analysis-based vehicle-mounted network abnormality detection method according to claim 2, wherein the preset value of the correlation coefficient is 0.5.

5. The correlation analysis-based vehicle-mounted network abnormality detection method according to claim 1, wherein the detection threshold acquisition setting method comprises:

6. A vehicle network anomaly detection system based on correlation analysis is characterized by comprising:

7. The correlation analysis-based vehicle-mounted network abnormality detection system according to claim 6, wherein the establishment method of the prediction model includes:

8. The correlation analysis-based vehicle-mounted network anomaly detection system according to claim 7, wherein a Hamming distance is calculated, Hamming distance data is analyzed, and message IDs with no change in message content and bytes with no change in message content in the message IDs are removed; recording the message ID and the corresponding byte sequence with the changed message content specifically comprises the following steps:

9. The correlation analysis-based vehicle-mounted network abnormality detection system according to claim 7, wherein the preset value of the correlation coefficient is 0.5.

10. The correlation analysis-based vehicle-mounted network abnormality detection system according to claim 6, wherein the acquisition and setting method of the detection threshold comprises: