DoH server detection and identification method based on access mode
Technical field:
The invention relates to the technical field of DoH flow detection, in particular to a DoH server identification method based on WEB access mode driving.
Background art:
DoH (DNS over HTTPS) is a technique for carrying DNS over the secure HTTPS protocol; its main purpose is to enhance user security and privacy. Because the HTTPS connection is encrypted, third parties can no longer tamper with or monitor the name resolution process. However, since the DoH protocol encrypts DNS traffic, it also presents a challenge to network administrators, because suspicious network traffic generated by malware and malicious tools can no longer be detected.
To help network administrators maintain network security, researchers have proposed various methods for detecting and identifying DoH traffic. Existing DoH traffic detection and analysis methods mainly address two aspects: feature extraction and feature classification. Dmitrii Vekshin et al. studied which information can be obtained from HTTPS extension data using machine learning and evaluated five popular machine learning methods; they identified the best-performing DoH classifier, which achieves higher accuracy for DoH identification but is relatively more time-consuming. Jonas Bushart et al. proposed a method for analyzing padded, encrypted DNS traffic that combines traffic size and timing information to infer the websites visited by users; however, its results may differ in practice because the evaluation simulates one particular user behavior. For example, it assumes that the client waits until the website is fully loaded, that there is no background DNS traffic, that the evaluation starts with an empty browser cache and DNS cache, and that only valid TLDs are included. In practice, the states of the user's DNS cache and browser cache may differ, which may result in fewer DNS requests being sent. Drew Hjelm studied command-and-control (C&C) attack messages sent over DoH, discussed their impact on network security, and provided some analysis tools; however, detection relies mainly on the IP address of the server, so its effectiveness may decrease when DoH servers are deployed widely around the world.
The invention relies on the principle that a client using the DoH protocol accesses one or more physical servers of the selected DoH service provider before accessing various WEB resources, and that these servers always respond in the same way when establishing a TLS connection with the same client. The DoH server can therefore be identified by fingerprinting the TLS negotiation between the client and the DoH server. The DoH server fingerprint database established by the invention likewise provides help and reference for identifying DoH servers.
The patent 'DoH flow analysis method based on data frame extraction' extracts message-level encrypted traffic features from DoH traffic, constructs and trains a data frame extraction classifier to identify the TLS messages that carry data frames, reassembles those TLS messages into TLS flows, extracts flow-level encrypted traffic features, and constructs and trains a DoH traffic fingerprint classifier to identify the specific web pages corresponding to the DoH traffic. This method requires some knowledge of the data frame format and the TLS protocol, and also requires a large data set and considerable time to train the data frame extraction classifier and the related models.
The patent 'a method for detecting DoH flow in HTTPS flow' builds an IP address library corresponding to public DoH domain names, uses it to identify public DoH traffic, and then identifies DoH traffic to non-public addresses. This method relies on strong features of network data packets, so the packets must be deeply inspected, which causes certain performance problems.
Considering the emphases, advantages, and disadvantages of the above methods, the inventor proposes a DoH server identification method driven by access mode. For abnormal network flows detected using information entropy, the method queries the corresponding fingerprints in a DoH server fingerprint database and, combined with the mode in which the user accesses WEB resources, efficiently identifies DoH server fingerprints, thereby judging whether a server is a DoH server.
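As an illustration of the information-entropy detection referred to above, the following minimal sketch computes the Shannon entropy of a traffic attribute over an observation window and flags the window when the entropy deviates from a baseline. The text does not state which attribute is measured or how the decision is made; the use of destination addresses, the function names, and the baseline and tolerance parameters are all assumptions made for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Return the Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_anomalous(values, baseline, tolerance):
    """Flag a window whose entropy differs from the baseline by more than tolerance.

    Both baseline and tolerance are hypothetical tuning parameters.
    """
    return abs(shannon_entropy(values) - baseline) > tolerance


# Hypothetical windows of destination addresses (assumed attribute).
normal_window = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
skewed_window = ["10.0.0.1"] * 4
```

In this sketch a window dominated by a single destination has much lower entropy than a window spread evenly over several destinations, which is the kind of deviation an entropy-based monitor would report for further fingerprint analysis.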
Summary of the invention:
The invention aims to overcome the defects of the prior art and provides a DoH server identification method based on a WEB access mode, which mainly solves the problem of identifying and classifying DoH servers.
Technical problem: after abnormal traffic has been located, it is necessary to further confirm whether it is DoH traffic. Because a client using the DoH protocol first accesses one or more physical servers of the selected DoH service provider before accessing various WEB resources, and these servers always respond in the same manner when the same client establishes TLS connections, the TLS negotiation between the client and the DoH server can be detected by fingerprinting. Moreover, the number of DoH servers on the Internet is far smaller than the number of conventional HTTP servers, so the invention establishes and uses a DoH server fingerprint database to provide a reference for identifying DoH servers. This is the object of the present invention.
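The role of the DoH server fingerprint database can be sketched as a small lookup structure. This is only an in-memory stand-in under stated assumptions: the text does not describe the storage layout, so the class name, the stored provider label, and the timestamp used to find entries due for periodic maintenance are all hypothetical.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FingerprintDB:
    """Hypothetical in-memory stand-in for the DoH server fingerprint database."""

    # Maps fingerprint -> (provider label, time of last update).
    entries: dict = field(default_factory=dict)

    def add(self, fingerprint, provider, now=None):
        """Store or refresh the fingerprint of a known DoH server."""
        stamp = time.time() if now is None else now
        self.entries[fingerprint] = (provider, stamp)

    def lookup(self, fingerprint):
        """Return the provider label for a known fingerprint, or None if absent."""
        entry = self.entries.get(fingerprint)
        return entry[0] if entry else None

    def stale(self, max_age, now=None):
        """List fingerprints not refreshed within max_age seconds."""
        now = time.time() if now is None else now
        return [fp for fp, (_, ts) in self.entries.items() if now - ts > max_age]
```

Because the number of DoH servers is small compared with conventional HTTP servers, even a simple keyed lookup of this kind stays compact; a production deployment would presumably use persistent storage instead.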
The technical scheme adopted by the invention is a method for identifying a DoH server based on a WEB access mode, comprising the following steps:
Step 1, after abnormal traffic is located by an abnormal traffic monitoring method such as one based on information entropy, a fingerprint database of DoH servers is established to provide a reference for identifying DoH servers;
Step 2, after the TCP three-way handshake, the client performs a TLS handshake to start a TLS session; the information exchanged in the handshake stage is used to generate a fingerprint for any server, which is then looked up in the server fingerprint database to verify the server's identity;
Step 2-1, the client sends a Client Hello message to the server, which includes the TLS versions supported by the client, the list of supported encryption algorithms, a random number generated by the client, and so on;
Step 2-2, after receiving the request, the server sends a Server Hello message to the client and then sends its certificate; if a key exchange algorithm such as Diffie-Hellman key exchange is used, the server also sends a Server Key Exchange message; finally, the server sends a Server Hello Done message to indicate that it has finished sending its handshake messages;
Step 2-3, after receiving the certificate, the client verifies whether it is trusted; once the certificate is confirmed as trusted, the client sends a Client Key Exchange message to the server, followed by a Change Cipher Spec message informing the server that subsequent messages will be sent encrypted;
Step 2-4, the server decrypts the message with the computed session key and verifies the hash value and the MAC value; after verification, it sends a Change Cipher Spec message to the client, informing the client that subsequent messages will be sent encrypted;
Step 3, the fingerprint of the server in the abnormal traffic is calculated from the available TLS handshake messages and used to identify DoH servers. Before the method is applied to server identification, the fingerprints of common DoH servers are calculated and stored in a database that is maintained periodically;
Step 3-1, the pcap file is parsed, and for each TLS session the values of all fields of the TLS handshake messages that are used for fingerprint generation are read;
Step 3-2, the values are concatenated, with one delimiter separating different fields and another delimiter separating different values within the same field;
Step 3-3, the concatenated text is hashed with a fuzzy hash algorithm to obtain the TLS fingerprint of the server;
Step 4, the access-mode-based DoH server identification algorithm is applied to the abnormal network flows detected using information entropy, and the fingerprint i of the server in each network flow is calculated. If the fingerprint cannot be found in the DoH server fingerprint database and the flow cannot be judged to be DoH from other DoH traffic characteristics, the server of the abnormal network flow is judged to be a conventional WEB server; fingerprints that are found in the database are recorded as successful matches and counted. From these statistics, the average access interval time, i.e., the average number of detection periods, is calculated using the formula ln Ai/Si, and servers are classified as DoH servers or conventional WEB servers according to it.
Step 4-1, measurements in the experimental network environment show that the average domain name resolution time using the DoH protocol varies between 50 ms and 300 ms and that the web page loading time varies between 3 s and 6 s; with reference to these two measurements, suitable values of the detection period T and the detection time window W are set for the identification algorithm. These values are the inputs of the algorithm, and its output is the set of fingerprints judged to belong to DoH servers.
Step 4-2, the counters Ai and Si of each fingerprint i are initialized to 0, where Ai represents the number of detection periods in which fingerprint i appears and Si represents the total number of occurrences of fingerprint i within the time window. For each detection period of length T within the time window W, the per-period counters Ai' and Si' are initialized to 0.
Step 4-3, each network flow in the captured traffic Traf_Cap[t, t+T] is traversed, the fingerprint i of its server is calculated, the counters Ai' and Si' are updated, and t is increased by T.
Step 4-4, for each fingerprint i in the candidate fingerprint set, the average access interval time, i.e., the average number of detection periods, is calculated using ln Ai/Si;
Step 4-5, t_mode is determined, which is the threshold for judging whether the server of a network flow is a DoH server; fingerprint i is classified as a DoH server if the frequency of fingerprint i (its average number of detection periods) is less than the threshold t_mode, and as a conventional WEB server if it is not less than the threshold t_mode.
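Steps 4-2 to 4-5 can be sketched as follows. This is a minimal illustration rather than the inventor's implementation. The expression "lnAi/Si" is ambiguous as written; the sketch assumes the reading ln(Ai)/Si, and the other reading, ln(Ai/Si), would need a different threshold. The function name, the (timestamp, fingerprint) flow representation, and the example threshold are hypothetical, and the check against "other DoH traffic characteristics" for unmatched fingerprints is omitted.

```python
import math
from collections import defaultdict


def classify_fingerprints(flows, known_fingerprints, start, W, T, t_mode):
    """Classify server fingerprints seen in the window [start, start + W).

    flows: list of (timestamp, fingerprint) pairs, one per network flow.
    known_fingerprints: fingerprints present in the DoH fingerprint database.
    Returns a dict mapping each fingerprint to "DoH" or "WEB".
    """
    A = defaultdict(int)  # Ai: number of detection periods containing fingerprint i
    S = defaultdict(int)  # Si: total occurrences of fingerprint i in the window
    unmatched = set()

    t = start
    while t < start + W:
        S_period = defaultdict(int)  # Si': occurrences in this detection period
        for ts, fp in flows:  # traverse Traf_Cap[t, t+T]
            if not (t <= ts < t + T):
                continue
            if fp in known_fingerprints:
                S_period[fp] += 1
            else:
                unmatched.add(fp)
        for fp, count in S_period.items():
            A[fp] += 1  # Ai' is 1 because the fingerprint appeared in this period
            S[fp] += count
        t += T

    # Fingerprints not found in the database are treated as conventional WEB servers.
    result = {fp: "WEB" for fp in unmatched}
    for fp in S:
        score = math.log(A[fp]) / S[fp]  # ASSUMED reading of "lnAi/Si"
        result[fp] = "DoH" if score < t_mode else "WEB"
    return result
```

Under the assumed reading, a fingerprint that appears many times in most detection periods receives a small score and is classified as a DoH server, whereas a fingerprint seen only occasionally receives a larger score. Note that a fingerprint seen in exactly one period scores 0 under this reading; whether that is intended cannot be determined from the text.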
The method has the advantage that, for abnormal network flows detected by information entropy, the fingerprint i of the server in each network flow is calculated and, after a successful match, the server is classified as a DoH server or a conventional WEB server according to the average number of detection periods computed from the match statistics by the formula, so that DoH servers are classified with high efficiency.
Compared with existing technologies for verifying server identity, such as SSL/TLS certificates, DNSSEC, and SSH key fingerprints, generating the server fingerprint from TLS handshake information offers advantages such as high security, uniqueness, tamper resistance, and portability. In summary, generating server fingerprints from TLS handshake information is a more accurate, reliable, and portable way to verify server identity than the other methods.
The TLS fingerprint of the server is obtained by hashing the concatenated text with a fuzzy hash algorithm, which offers advantages such as better privacy protection, uniqueness, integrity verification capability, high efficiency, and robustness, and is a reliable way to protect connection security.
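The concatenate-then-hash construction of the fingerprint can be sketched as below. Two assumptions are made loudly here. First, the delimiter characters are not legible in the text, so a comma between fields and a hyphen between values are used purely as placeholders. Second, the text calls for a fuzzy hash algorithm but does not name one; the sketch substitutes SHA-256 from the standard library, which is an ordinary cryptographic hash and not a fuzzy hash, simply to keep the example self-contained.

```python
import hashlib

FIELD_SEP = ","  # assumed delimiter between different fields
VALUE_SEP = "-"  # assumed delimiter between values of the same field


def build_fingerprint_text(fields):
    """Concatenate handshake field values into one string.

    fields: list of lists of integers, one inner list per handshake field.
    """
    return FIELD_SEP.join(
        VALUE_SEP.join(str(value) for value in values) for values in fields
    )


def fingerprint(fields):
    """Hash the concatenated text.

    Placeholder only: SHA-256 stands in for the unnamed fuzzy hash algorithm.
    """
    text = build_fingerprint_text(fields)
    return hashlib.sha256(text.encode("ascii")).hexdigest()
```

For example, the hypothetical field values `[[771], [49199], [65281, 0, 11]]` are concatenated to the text `771,49199,65281-0-11` before hashing. With a true fuzzy hash, similar inputs would yield similar digests, which this placeholder does not reproduce.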
The method identifies DoH servers more accurately and avoids misclassifying conventional WEB servers as DoH servers, thereby improving identification accuracy. Conventional techniques mainly judge whether a server is a DoH server from its IP address and port number, which carries a risk of misjudgment because many DoH servers use the same IP address and port number as a conventional WEB server.
In contrast, fingerprint-based methods identify DoH servers better because the fingerprint of a DoH server is typically unique. By analyzing the frequency of fingerprints in the network flows, an appropriate threshold t_mode can be determined and each fingerprint classified as a DoH server or a conventional WEB server. The method has higher accuracy and better helps network security teams identify and monitor DoH traffic.
1. The method is efficient: the server fingerprint is calculated for abnormal network flows detected by information entropy, and the server can be classified quickly and accurately as a DoH server or a conventional WEB server from the statistics of successful matches.
2. Unlike traditional methods based on IP address and port number, the method only requires an information entropy detection algorithm to be deployed in the network; no additional hardware is needed, which reduces implementation cost.
3. The method adapts to different types of DoH servers and conventional WEB servers because it classifies based on fingerprints, which are typically unique.
4. The method can easily be extended to large-scale networks and can process a large number of network flows quickly, because information entropy detection and fingerprint calculation can be performed in parallel.
5. The method helps network security teams monitor and identify DoH servers better, so as to prevent malicious communication over DoH and improve network security.
Description of the drawings:
FIG. 1 is a general flow chart for DoH server identification based on WEB access patterns;
FIG. 2 is a detailed flow chart of DoH server identification based on the WEB access mode.
Detailed description:
The present invention is further illustrated below with reference to the drawings and a specific embodiment, which are to be understood as merely illustrative of the invention and not as limiting its scope.
Embodiment: based on the located abnormal traffic, the DoH server identification method based on a WEB access mode comprises the following specific steps:
Step 1, after abnormal traffic is located by an abnormal traffic monitoring method such as one based on information entropy, a fingerprint database of DoH servers is established to provide a reference for identifying DoH servers;
Step 2, after the TCP three-way handshake, the client performs a TLS handshake to start a TLS session; the information exchanged in the handshake stage is used to generate a fingerprint for any server, which is then looked up in the server fingerprint database to verify the server's identity;
Step 2-1, the client sends a Client Hello message to the server, which includes the TLS versions supported by the client, the list of supported encryption algorithms, a random number generated by the client, and so on;
Step 2-2, after receiving the request, the server sends a Server Hello message to the client and then sends its certificate; if a key exchange algorithm such as Diffie-Hellman key exchange is used, the server also sends a Server Key Exchange message; finally, the server sends a Server Hello Done message to indicate that it has finished sending its handshake messages;
Step 2-3, after receiving the certificate, the client verifies whether it is trusted; once the certificate is confirmed as trusted, the client sends a Client Key Exchange message to the server, followed by a Change Cipher Spec message informing the server that subsequent messages will be sent encrypted;
Step 2-4, the server decrypts the message with the computed session key and verifies the hash value and the MAC value; after verification, it sends a Change Cipher Spec message to the client, informing the client that subsequent messages will be sent encrypted;
Step 3, the fingerprint of the server in the abnormal traffic is calculated from the available TLS handshake messages and used to identify DoH servers. Before the method is applied to server identification, the fingerprints of common DoH servers are calculated and stored in a database that is maintained periodically;
Step 3-1, the pcap file is parsed, and for each TLS session the values of all fields of the TLS handshake messages that are used for fingerprint generation are read;
Step 3-2, the values are concatenated, with one delimiter separating different fields and another delimiter separating different values within the same field;
Step 3-3, the concatenated text is hashed with a fuzzy hash algorithm to obtain the TLS fingerprint of the server;
Step 4, the access-mode-based DoH server identification algorithm is applied to the abnormal network flows detected using information entropy, and the fingerprint i of the server in each network flow is calculated. If the fingerprint cannot be found in the DoH server fingerprint database and the flow cannot be judged to be DoH from other DoH traffic characteristics, the server of the abnormal network flow is judged to be a conventional WEB server; fingerprints that are found in the database are recorded as successful matches and counted. From these statistics, the average access interval time, i.e., the average number of detection periods, is calculated using the formula ln Ai/Si, and servers are classified as DoH servers or conventional WEB servers according to it.
Step 4-1, measurements in the experimental network environment show that the average domain name resolution time using the DoH protocol varies between 50 ms and 300 ms and that the web page loading time varies between 3 s and 6 s; with reference to these two measurements, suitable values of the detection period T and the detection time window W are set for the identification algorithm. These values are the inputs of the algorithm, and its output is the set of fingerprints judged to belong to DoH servers.
Step 4-2, the counters Ai and Si of each fingerprint i are initialized to 0, where Ai represents the number of detection periods in which fingerprint i appears and Si represents the total number of occurrences of fingerprint i within the time window. For each detection period of length T within the time window W, the per-period counters Ai' and Si' are initialized to 0.
Step 4-3, each network flow in the captured traffic Traf_Cap[t, t+T] is traversed, the fingerprint i of its server is calculated, the counters Ai' and Si' are updated, and t is increased by T.
Step 4-4, for each fingerprint i in the candidate fingerprint set, the average access interval time, i.e., the average number of detection periods, is calculated using ln Ai/Si;
Step 4-5, t_mode is determined, which is the threshold for judging whether the server of a network flow is a DoH server; fingerprint i is classified as a DoH server if its average number of detection periods is smaller than the threshold t_mode, and as a conventional WEB server if its average number of detection periods is not smaller than the threshold t_mode.
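The field-reading part of step 3-1 can be illustrated with a minimal parser for a single Server Hello handshake message. The embodiment does not list which handshake fields feed the fingerprint; taking the server version, the selected cipher suite, and the extension types is an assumption made for this sketch. Reading the pcap file and reassembling TLS records are outside its scope, so the function expects the raw handshake message bytes, and its name is hypothetical.

```python
def parse_server_hello(msg: bytes):
    """Extract fingerprint fields from a raw Server Hello handshake message.

    Returns [[version], [cipher_suite], [extension_type, ...]].
    The choice of these three fields is an assumption, not taken from the text.
    """
    if len(msg) < 4 or msg[0] != 0x02:  # handshake type 2 is Server Hello
        raise ValueError("not a Server Hello handshake message")
    body_len = int.from_bytes(msg[1:4], "big")
    body = msg[4:4 + body_len]

    version = int.from_bytes(body[0:2], "big")
    pos = 2 + 32                        # skip version and 32-byte random
    session_id_len = body[pos]
    pos += 1 + session_id_len           # skip session id
    cipher_suite = int.from_bytes(body[pos:pos + 2], "big")
    pos += 2 + 1                        # skip cipher suite and compression method

    extensions = []
    if pos + 2 <= len(body):            # the extensions block is optional
        ext_total = int.from_bytes(body[pos:pos + 2], "big")
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type = int.from_bytes(body[pos:pos + 2], "big")
            ext_len = int.from_bytes(body[pos + 2:pos + 4], "big")
            extensions.append(ext_type)
            pos += 4 + ext_len          # skip extension header and data
    return [[version], [cipher_suite], extensions]


# Hand-built example message: TLS 1.2, cipher suite 0xC02F, two extensions.
_ext = bytes.fromhex("ff01000100") + bytes.fromhex("000b00020100")
_body = (bytes.fromhex("0303") + bytes(32) + b"\x00" + bytes.fromhex("c02f")
         + b"\x00" + len(_ext).to_bytes(2, "big") + _ext)
example_msg = b"\x02" + len(_body).to_bytes(3, "big") + _body
```

The lists returned by this parser have the shape needed for the concatenation in step 3-2, with one inner list per field.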
It should be noted that the above-mentioned embodiments are not intended to limit the scope of the present invention, and equivalent changes or substitutions made on the basis of the above-mentioned technical solutions fall within the scope of the present invention as defined in the claims.