Disclosure of Invention
The application aims to provide a DNS log encoding and encapsulation method. On the basis of encapsulating DNS logs in UDP, the method designs a coding format that supports fields of various lengths, replaces TOP domain names and service IPs through mapping dictionaries to compress the space occupied by TOP domain name strings and service IPs, and improves the utilization of the IP and UDP packet headers, so as to meet the delay-free transmission requirement while saving transmission bandwidth.
The technical solution for realizing the purpose of the application is as follows:
a method of DNS log encoding encapsulation, the method comprising the steps of:
establishing a domain name and a service IP mapping dictionary, wherein the domain name mapping dictionary is formed by one-to-one correspondence of each domain name and a set domain name number, and the service IP mapping dictionary is formed by one-to-one correspondence of each service IP and a set service IP number;
looking up the domain name and the service IP of any DNS log in the domain name and service IP mapping dictionaries, and replacing the original string representation in the DNS log with the matched domain name number or service IP number, thereby completing replacement compression of the DNS log;
and carrying out UDP encapsulation on a plurality of replaced and compressed DNS logs, generating at least one UDP packet and sending the UDP packet to the receiving end.
Further, the domain name number is set as follows: the domain names are sorted in descending order of the number of requests per unit time and numbered in that order, and each domain name number occupies 2 bytes.
Further, the setting method of the service IP mapping number is as follows: a number is set for each service IP address, and the service IP mapping number is 2 bytes.
Further, when there is no matching domain name/service IP number in the domain name/service IP mapping dictionary, the original string representation is retained.
Further, after receiving a UDP packet, the receiving end unpacks it, looks up each domain name number and service IP number found in the compressed DNS logs in the domain name and service IP mapping dictionaries, and replaces each number with the matched string representation, thereby recovering the original string representation and completing restoration of the compressed DNS logs.
Further, the DNS log includes an IP type, a user IP, a domain name, a resolution time, a record resolution address, a resolution result code, a DNS record type, a CNAME domain name, and a service IP.
Further, for a field of the DNS log that may hold more than one result, the domain name/service IP numbers corresponding to the plurality of results within the same field are separated by a first separator, and different fields are separated by a second separator.
Further, the UDP encapsulation process flow is:
setting a UDP packet, wherein the head part of the UDP packet comprises a source port and a destination port, and the message data of the UDP packet comprises at least one compressed DNS log, wherein: the source port corresponds to the user IP of the DNS log, and the destination port corresponds to the service IP of the DNS log;
and calculating, from the length of the message data of the UDP packet, the number of DNS logs that can be accommodated, and storing that number of DNS logs to complete the encapsulation.
A DNS log encoding packaging apparatus, the apparatus comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method of DNS log encoding packaging when executing a computer program.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of a method of DNS log encoding encapsulation.
Compared with the prior art, the application has the following notable advantages: it adopts a coding format that combines fixed-length and non-fixed-length coding, which avoids spending space on invalid request data; it replaces TOP domain names and service IPs through mapping dictionaries, which greatly reduces the space occupied by domain name strings and service IPs; and it encapsulates a plurality of DNS logs into one UDP packet, which improves the utilization of the IP and UDP packet headers and reduces the number of transmitted data messages. The method thus meets the delay-free transmission requirement while saving transmission bandwidth.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, a DNS log encoding and encapsulation method includes the following steps:
establishing a domain name and a service IP mapping dictionary, wherein the domain name mapping dictionary is formed by one-to-one correspondence of each domain name and a set domain name number, and the service IP mapping dictionary is formed by one-to-one correspondence of each service IP and a set service IP number;
looking up the domain name and the service IP of any DNS log in the domain name and service IP mapping dictionaries, and replacing the original string representation in the DNS log with the matched domain name number or service IP number, thereby completing replacement compression of the DNS log;
and carrying out UDP encapsulation on a plurality of replaced and compressed DNS logs, generating at least one UDP packet and sending the UDP packet to the receiving end.
Specifically, the domain name number is set as follows: the domain names are sorted in descending order of the number of requests per unit time and numbered in that order, and each domain name number occupies 2 bytes.
Specifically, the setting method of the service IP mapping number is: a number is set for each service IP address, and the service IP mapping number is 2 bytes.
Specifically, when there is no matching domain name/service IP number in the domain name/service IP mapping dictionary, the original string representation is retained.
Specifically, after receiving a UDP packet, the receiving end unpacks it, looks up each domain name number and service IP number found in the compressed DNS logs in the domain name and service IP mapping dictionaries, and replaces each number with the matched string representation, thereby recovering the original string representation and completing restoration of the compressed DNS logs.
Specifically, the DNS log includes an IP type, a user IP, a domain name, a resolution time, a record resolution address, a resolution result code, a DNS record type, a CNAME domain name, and a service IP.
Specifically, for a field of the DNS log that may hold more than one result, the domain name/service IP numbers corresponding to the plurality of results within the same field are separated by a first separator, and different fields are separated by a second separator.
Specifically, the UDP encapsulation process flow is:
setting a UDP packet, wherein the head part of the UDP packet comprises a source port and a destination port, and the message data of the UDP packet comprises at least one compressed DNS log, wherein: the source port corresponds to the user IP of the DNS log, and the destination port corresponds to the service IP of the DNS log;
and calculating, from the length of the message data of the UDP packet, the number of DNS logs that can be accommodated, and storing that number of DNS logs to complete the encapsulation.
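The replacement and restoration steps described above can be illustrated with a minimal Python sketch. It works on Python values rather than wire bytes, and the dictionary contents, the field names `domain` and `service_ip`, and the function names are hypothetical, chosen only for illustration. An unmatched string is kept as-is, as the method requires.

```python
# Hypothetical dictionaries; real ones would be built from traffic statistics.
DOMAIN_TO_NUM = {"www.163.com": 1, "www.163.com.cloudcdn.net": 2}
SERVICE_IP_TO_NUM = {"117.180.180.180": 1}

# Reverse dictionaries used by the receiving end.
NUM_TO_DOMAIN = {num: name for name, num in DOMAIN_TO_NUM.items()}
NUM_TO_SERVICE_IP = {num: ip for ip, num in SERVICE_IP_TO_NUM.items()}


def compress(log: dict) -> dict:
    """Replace the domain name and service IP with numbers when a mapping exists."""
    out = dict(log)
    # dict.get(key, key) keeps the original string when there is no match.
    out["domain"] = DOMAIN_TO_NUM.get(log["domain"], log["domain"])
    out["service_ip"] = SERVICE_IP_TO_NUM.get(log["service_ip"], log["service_ip"])
    return out


def restore(log: dict) -> dict:
    """Replace numbers with the matched strings; strings pass through unchanged."""
    out = dict(log)
    if isinstance(log["domain"], int):
        out["domain"] = NUM_TO_DOMAIN[log["domain"]]
    if isinstance(log["service_ip"], int):
        out["service_ip"] = NUM_TO_SERVICE_IP[log["service_ip"]]
    return out
```

In this sketch a compressed field is recognized by its Python type; how a number is distinguished from a retained string on the wire is not specified in the text and is left open here.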
A DNS log encoding packaging apparatus, the apparatus comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method of DNS log encoding packaging when executing a computer program.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of a method of DNS log encoding encapsulation.
Specific examples are as follows:
DNS request data packets are collected over any time period by a DNS (Domain Name System) acquisition server and are processed and parsed, so that basic parameter information, including the IP type, user IP, requested domain name, resolution time, A record resolution address, resolution result code, requested DNS record type, CNAME domain name and service IP, is extracted as the DNS log to be transmitted.
The DNS log format is customizable. This embodiment uses a ten-tuple DNS log format as an example; DNS logs in other formats can likewise be encoded and encapsulated by the method of the application.
The specific log format is shown in table 1:
0|111.123.222.12|www.163.com|20170308055153|111.26.137.143;11.26.137.144|0|1|www.163.com.cloudcdn.net;www.163.com.cloudglb.com;c01.i05.cmbhl.lv3.cloudglb.com||117.180.180.180
The above is one specific DNS log entry. Each field is recorded as a character string, fields are separated by the separator "|", and the entry occupies 135 bytes in total. After encoding with the method of the application, the occupied space is as follows:
1. Creating a mapping dictionary
(1) Establishing the TOP 50,000 domain name mapping dictionary
The TOP 50,000 domain names are obtained by ranking the domain names by number of requests in a given time period, and each is assigned a sequence number 1, 2, 3, … in rank order. Any TOP 50,000 domain name appearing in a log is replaced by its sequence number. Two bytes can represent up to 65,536 distinct values, which is enough to cover every number in the dictionary. By contrast, the conventional approach of writing the domain name out as a character string needs from over ten to several tens of bytes. Domain names outside the TOP 50,000 are not replaced by a number and are still represented by the original character string. If TOP 50,000 domain names account for a large share of users' domain name requests, a large amount of space is saved.
The value of n in the TOP n domain names can be chosen according to the actual situation. If n is too large, more bytes are needed to represent each number; if n is too small, many domain names cannot be mapped and still occupy their full string length. In this embodiment, the TOP 50,000 domain names account for more than 90% of all resolutions in the log, and each sequence number needs only 2 bytes, so the compression effect is significant.
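The ranking and numbering step can be sketched in Python as follows. The function name `build_domain_dictionary` and its parameters are hypothetical; the sketch assumes the input is simply an iterable of requested domain names observed in the sampling period.

```python
from collections import Counter


def build_domain_dictionary(requested_domains, top_n=50_000):
    """Map the top_n most requested domain names to numbers 1, 2, 3, ..."""
    if top_n > 0xFFFF:
        # Numbers start at 1, so a 2-byte number can go up to 65535.
        raise ValueError("top_n must fit in a 2-byte number")
    counts = Counter(requested_domains)
    # most_common() returns (domain, count) pairs in descending order of count.
    ranked = counts.most_common(top_n)
    return {domain: rank for rank, (domain, _count) in enumerate(ranked, start=1)}
```

For example, `build_domain_dictionary(["a.com", "b.com", "a.com"], top_n=2)` assigns number 1 to `a.com` and number 2 to `b.com`. Domain names that do not make the cut are absent from the dictionary and keep their string form.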
(2) Establishing a service IP mapping dictionary
The service IP is the IP address of the operator DNS server providing the service, and may be an IPv4 address or an IPv6 address. A provincial operator typically has from over ten to several tens of DNS server IP addresses. In the mapping dictionary, each of these IP addresses is represented by a one-byte number, which can distinguish up to 256 addresses, in place of the full service IP. If the service IP is an IPv4 address, it is compressed from 4 bytes to 1 byte, a compression rate of 75%; if the service IP is an IPv6 address, it is compressed from 16 bytes to 1 byte, a compression rate of 93.75%. Unlike domain names, service IPs are fully replaced, so the replacement rate reaches 100%.
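The compression rates quoted above follow directly from the binary address lengths, as the following Python sketch shows. The function names are hypothetical, and numbering the addresses from 0 in first-seen order is an assumption; the text only requires that each service IP get a distinct one-byte number.

```python
import ipaddress


def build_service_ip_dictionary(service_ips):
    """Assign each distinct service IP a number that fits in one byte."""
    unique = list(dict.fromkeys(service_ips))  # drop duplicates, keep order
    if len(unique) > 256:
        raise ValueError("a one-byte number can distinguish at most 256 addresses")
    return {ip: number for number, ip in enumerate(unique)}


def compression_rate(ip_text, number_bytes=1):
    """Fraction of space saved by replacing a binary IP address with its number."""
    raw_bytes = len(ipaddress.ip_address(ip_text).packed)  # 4 for IPv4, 16 for IPv6
    return 1 - number_bytes / raw_bytes
```

With these definitions, `compression_rate("117.180.180.180")` evaluates to 0.75 and `compression_rate("2001:db8::1")` to 0.9375, matching the 75% and 93.75% figures.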
2. Fixed length coding
Fixed-length coding stores fixed-length fields in a pre-planned fixed-length space. When writing, each field is stored in its planned space and any unused space is zero-filled. When reading, the fields are read in order according to the same plan. This saves the space that the tag and length would occupy in TLV (Tag-Length-Value) coding. However, the method suits only fields whose length is fixed. If the length varies widely, for example the resolution result of an A record may be one IP or several IPs and so ranges from a few bytes to several tens of bytes, storing it at a fixed upper-limit size would waste considerable space.
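A minimal Python sketch of fixed-length coding, using the standard `struct` module, is given below. The exact bit allocation is not reproduced in this text, so the layout here is an assumption made only for illustration: 1 byte for the IP type, 4 bytes for the resolution time stored as a Unix timestamp (interpreted as UTC), and 1 byte for the resolution result code, 6 bytes in all. The function names are hypothetical.

```python
import struct
from datetime import datetime, timezone

# Assumed layout (big-endian, no padding): uint8 + uint32 + uint8 = 6 bytes.
COMBINED = struct.Struct(">BIB")


def pack_combined(ip_type, resolve_time, result_code):
    """Store three fixed-length fields in their planned 6-byte space."""
    moment = datetime.strptime(resolve_time, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    # Small values are zero-filled to the full width of their slot by struct.
    return COMBINED.pack(ip_type, int(moment.timestamp()), result_code)


def unpack_combined(data):
    """Read the fields back in the planned order; no tag or length is needed."""
    ip_type, seconds, result_code = COMBINED.unpack(data)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return ip_type, moment.strftime("%Y%m%d%H%M%S"), result_code
```

Because both sides agree on the layout in advance, the reader needs no per-field tag or length, which is the saving over TLV coding that the paragraph above describes.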
The fields coded with fixed length and the space allocated to each are shown in table 2:
The three fields IP type, resolution time and resolution result code are stored together and occupy 48 bits (6 bytes) in total. The fixed-field storage plan is as follows: if the user IP is of IPv4 type, the fixed-length part occupies 16 bytes in total; if the user IP is of IPv6 type, it occupies 28 bytes in total, as shown in table 3:
3. Non-fixed length coding
Fields such as the A record resolution address, the AAAA record resolution address and the CNAME domain name are stored as non-fixed-length codes plus separators. Multiple results within the same field are separated by "|", and different fields are separated by spaces.
If the CNAME domain names are replaced using the TOP 50,000 domain name coding table, each CNAME domain name occupies 2 bytes, so n CNAME domain names occupy 2n bytes; n-1 "|" separators are needed between them, so the CNAME domain name field occupies 3n-1 bytes in total. Similarly, an A record resolution result with m addresses occupies at least 5m-1 bytes, and an AAAA record resolution result with x addresses occupies at least 17x-1 bytes, as shown in table 4:
Applying the compression method of the application, and assuming that the domain name and all CNAME domain names are in the TOP 50,000 domain name coding table, the log
"0|111.123.222.12|www.163.com|20170308055153|111.26.137.143;11.26.137.144|0|1|www.163.com.cloudcdn.net;www.163.com.cloudglb.com;c01.i05.cmbhl.lv3.cloudglb.com||117.180.180.180"
is encoded as follows: the fixed-length encoded portion occupies 16 bytes, the CNAME domain name portion occupies 8 bytes, the A record resolution result portion occupies 9 bytes, there is no AAAA record resolution result portion, and the separator portion between fields occupies 1 byte, so the DNS log occupies 34 bytes in total. Compared with the original 135 bytes in character string form, this saves roughly three quarters of the space (about 74.8%).
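The field sizes in this worked example can be checked with a short Python sketch. The function names and the CNAME numbers 2, 3 and 4 are hypothetical. The sketch only verifies lengths: a raw binary value could itself contain the separator byte, and how that case is handled is not stated in the text.

```python
import ipaddress
import struct

SEPARATOR = b"|"        # separates multiple results within one field
FIXED_PART_BYTES = 16   # fixed-length portion for an IPv4 user, from the text
ORIGINAL_BYTES = 135    # size of the string-form log, from the text


def encode_cname_numbers(numbers):
    """n two-byte numbers joined by n-1 separators: 3n-1 bytes."""
    return SEPARATOR.join(struct.pack(">H", number) for number in numbers)


def encode_addresses(addresses):
    """m IPv4 addresses take 5m-1 bytes; x IPv6 addresses take 17x-1 bytes."""
    return SEPARATOR.join(ipaddress.ip_address(a).packed for a in addresses)


cname_field = encode_cname_numbers([2, 3, 4])                        # 3 CNAMEs
a_field = encode_addresses(["111.26.137.143", "11.26.137.144"])      # 2 A records
total = FIXED_PART_BYTES + len(cname_field) + len(a_field) + 1       # 1 field separator
saving = 1 - total / ORIGINAL_BYTES
```

Here `len(cname_field)` is 8, `len(a_field)` is 9, and `total` is 34, in agreement with the figures above; `saving` comes to about 0.748.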
4. Multi-log encapsulation
Since the maximum Ethernet data frame is 1518 bytes, removing the 14-byte frame header and the 4-byte CRC check trailer leaves a maximum transmission unit of 1500 bytes for the upper-layer protocol. To keep the IP layer from fragmenting the packet, the data portion of a UDP packet should be at most 1500 bytes - IP header (20 bytes) - UDP header (8 bytes) = 1472 bytes. When DNS logs are encapsulated in UDP conventionally, the data portion of a UDP message typically stores only one DNS log. With the DNS log coding compression of the application, 5 DNS logs can be stored in the data portion of one UDP message; these 5 DNS logs share the same UDP header and IP header, which saves the space that the additional UDP and IP headers would otherwise occupy in the transmitted data.
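The payload budget and the packing of several logs into one datagram can be sketched in Python as follows. The function name `batch_logs` is hypothetical, and so is the framing: the text does not say how consecutive logs are delimited inside the payload, so this sketch assumes a 2-byte big-endian length prefix in front of each log.

```python
ETHERNET_MTU = 1500
IPV4_HEADER_BYTES = 20
UDP_HEADER_BYTES = 8
MAX_PAYLOAD = ETHERNET_MTU - IPV4_HEADER_BYTES - UDP_HEADER_BYTES  # 1472 bytes


def batch_logs(encoded_logs, max_payload=MAX_PAYLOAD):
    """Greedily group encoded logs into UDP payloads no larger than max_payload."""
    payloads = []
    current = bytearray()
    for log in encoded_logs:
        # Assumed framing: 2-byte length prefix, then the encoded log.
        record = len(log).to_bytes(2, "big") + log
        if len(record) > max_payload:
            raise ValueError("a single log does not fit in one datagram")
        if len(current) + len(record) > max_payload:
            payloads.append(bytes(current))
            current = bytearray()
        current += record
    if current:
        payloads.append(bytes(current))
    return payloads
```

Each returned payload could then be passed to `socket.sendto` on a `SOCK_DGRAM` socket, so that all logs in it share one IP header and one UDP header.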
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program, which may be stored on a non-transitory computer readable storage medium and which, when executed, may carry out the steps of the above-described method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), and Direct Rambus Dynamic RAM (DRDRAM), among others.
The above examples illustrate only a few embodiments of the application. They are described specifically and in detail, but are not therefore to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.