Disclosure of Invention
The invention aims to collect, as comprehensively as possible, the main domain names registered under a target unit's name, starting from the unit's basic information. It provides a technique and method for discovering, expanding, and verifying the main domain names of a target unit, based on seed main domain names of the target unit obtained from third-party websites.
The invention provides a main domain name acquisition and verification method, which comprises the following steps:
step 1: establish main domain name seeds of the target unit through third-party websites;
step 2: establish a secondary domain name seed set by acquiring the secondary domain names (subdomains) of the main domain name seeds; the main domain name seed set and the secondary domain name seed set together form the domain name seed set;
step 3: acquire the whois information of the domain name seeds and construct a whois information white list containing the registration information;
step 4: acquire the DNS records of the domain name seeds and construct a DNS information white list; the domain name in each CNAME record is added to the main domain name seeds as a newly discovered main domain name, and the source host and contact mailbox in each SOA record are added to the white list;
step 5: access the websites of the domain name seeds to obtain their HTML documents, collect the web page links on each accessible website, extract the domain names from these links, and derive the set of main domain names to be verified, thereby expanding the main domain names; if this set is not empty, execute step 6, otherwise execute step 9;
step 6: acquire the whois information of each domain name in the set of main domain names to be verified from step 5, compare its registration information with the whois information white list established in step 3, and verify whether the domain name belongs to the target unit; if verification succeeds, the domain name is a main domain name of the target unit, it is added to the main domain name seed set, and step 9 is executed; otherwise, execute step 7;
step 7: obtain the DNS information of each domain name in the set of main domain names to be verified from step 5, and verify whether it belongs to the target unit by checking whether its CNAME record points to a domain name in the target unit's domain name seeds, whether the host names in its NS and MX records end with a main domain name in the target unit's main domain name seeds, and whether its SOA record matches the DNS information white list established in step 4; if verification succeeds, the domain name is a main domain name of the target unit, it is added to the main domain name seed set, and step 9 is executed; otherwise, execute step 8;
step 8: acquire the filing information of the main domain name to be verified from step 5 and verify whether it is a main domain name of the target unit; if verification succeeds, execute step 9; if it fails, store the main domain name, the domain name, and the link in the database for future reference, then execute step 9;
step 9: construct a domain name prefix dictionary P, then execute step 10;
step 10: construct a domain name suffix dictionary L, then execute step 11;
step 11: construct the main domain names to be detected by combining entries of the prefix dictionary P with entries of the suffix dictionary L; the resulting new main domain names form the set DS1 of main domain names to be detected;
step 12: perform DNS queries on the seed domain names to obtain the corresponding IP set, request the certificate served on port 443 of each IP, and obtain the set DS2 of main domain names to be tested from the domain names covered by the certificates; execute step 13;
step 13: form the set of main domain names to be tested DS = DS1 + DS2, verify the domain names in it, and execute step 14;
step 14: acquire the whois information of each domain name in the set to be verified from step 13, compare its registration information with the whois information white list established in step 3, and verify whether the domain name belongs to the target unit; if verification succeeds, the domain name is a main domain name of the target unit, it is added to the main domain name seed set, and step 16 is executed for the newly added domain name; otherwise, execute step 15;
step 15: obtain the DNS information of each domain name in the set to be verified from step 13; if its CNAME record is a domain name in the main domain name seeds, or the host names in its NS and MX records end with a domain name in the main domain name seeds, or its SOA record matches the DNS information white list established in step 4, the domain name is considered a main domain name of the target unit and is added to the main domain name seed set; execute step 16;
step 16: if new main domain names have been added to the main domain name seed set, execute step 2; otherwise, the expansion process ends.
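The loop formed by steps 2 to 16 is a fixed-point iteration: it repeats until a round confirms no new main domain name. A minimal sketch, assuming the candidate generation of steps 2 to 12 is wrapped in a hypothetical `discover` callback and the whois/DNS checks of steps 14 and 15 in a hypothetical `verify` callback. Feeding back only the newly confirmed domains and capping the number of rounds are design assumptions of this sketch, not part of the method as stated.

```python
from typing import Callable, Iterable, Set


def expand_main_domains(
    seeds: Iterable[str],
    discover: Callable[[Set[str]], Set[str]],
    verify: Callable[[str], bool],
    max_rounds: int = 10,
) -> Set[str]:
    """Repeat discovery and verification until no new main domain is confirmed.

    `discover` and `verify` are hypothetical hooks standing in for steps 2-12
    (candidate generation) and steps 14-15 (whois/DNS verification).
    """
    confirmed: Set[str] = set(seeds)
    frontier: Set[str] = set(confirmed)
    for _ in range(max_rounds):  # safety cap; an assumption, not in the text
        candidates = discover(frontier) - confirmed
        new = {d for d in candidates if verify(d)}
        if not new:  # step 16: nothing new, so the expansion ends
            break
        confirmed |= new
        frontier = new  # step 16 -> step 2, restarted from the new seeds
    return confirmed
```

With toy callbacks (a dictionary of "links" and a set of domains that pass verification), the loop runs until a round yields nothing new:

```python
links = {"icbc.com.cn": {"icbc.asia", "other.example"}}
found = expand_main_domains(
    {"icbc.com.cn"},
    lambda f: set().union(*(links.get(d, set()) for d in f)),
    lambda d: d == "icbc.asia",
)
```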
Preferably, the domain name prefix dictionary P = P1 + P2 comprises:
i. a prefix dictionary P1 consisting of prefixes extracted from the main domain name seeds;
ii. a prefix dictionary P2 consisting of the full and short names of the target unit written in Chinese pinyin, traditional Chinese, and English.
Preferably, the domain name suffix dictionary L = L1 + L2 + L3 comprises:
a. a top-level domain dictionary L1 constructed from the top-level domains published by IANA;
b. a domain name suffix dictionary L2 constructed from the seed domain name set;
c. a domain name suffix dictionary L3 constructed from second-level domains.
Preferably, the registration information includes the registrant, registration telephone, registration mailbox, and registrant organization name.
The beneficial effects of the invention are as follows. First, main domain name seeds are acquired through third-party websites. Second, starting from these seeds, the main domain names are expanded by extracting links from pages of the target unit's accessible websites, by combining domain name prefixes with general domain name suffixes, and by similar means. Third, the ownership of each expanded main domain name by the target unit is accurately determined through whois information comparison, DNS information analysis, and related verification. The invention thus discovers, expands, and verifies the target unit's main domain names through multiple channels, which supports research into the target unit's domain name assets; by automating the process with computing technology, it improves the efficiency and feasibility of obtaining the target unit's main domain names and saves human effort.
Detailed Description
The present invention is further described below with reference to the drawings and examples so that those skilled in the art can easily practice the present invention.
As shown in FIG. 1, after the system starts, the name of the target unit is input, and the main domain name seeds of the target unit are established through third-party websites such as ICP filing records, the webmaster portal Chinaz, and search engines. The secondary domain name module then acquires the secondary domain names of the main domain name seeds to establish a secondary domain name seed set, and together these form the domain name seed set. A whois information white list, containing registrants, registration telephones, registration mailboxes, and registrant organization names, is constructed from the whois information of the domain name seeds, and a DNS white list is constructed from their DNS information. Next, the websites corresponding to the domain names are accessed, their links are collected, the domain names in the links are extracted, and the main domain names to be verified are derived. Whois information is used to judge whether each such domain name belongs to the target unit; if so, it is added to the newly added main domain name seed set. If not, its relationship to the target unit is verified using filing information, which comprises ICP filings and public security filings. If filing verification fails, the unconfirmed domain name and its related links are stored for later review and the prefix dictionary P is then constructed; if it succeeds, the prefix dictionary P is constructed directly.
The prefix dictionary P = P1 + P2 comprises the following:
i. prefixes extracted from the main domain name seeds form the prefix dictionary P1;
ii. the full and short names of the target unit written in Chinese pinyin, traditional Chinese, and English are collected as the domain name prefix dictionary P2.
After the prefix dictionary P is constructed, a suffix dictionary L = L1 + L2 + L3 is constructed, comprising:
a. a top-level domain dictionary L1 built from the top-level domains published by IANA (the Internet Assigned Numbers Authority) at https://www.iana.org/domains/root/db/;
b. a domain name suffix dictionary L2 built from the seed domain name set;
c. a domain name suffix dictionary L3 built from second-level domains, for example .com.cn, .net.cn, .gov.cn, .org.cn, etc.
New main domain names are created from the domain name prefix dictionary P and the domain name suffix dictionary L, and together they form the set DS1 of main domain names to be tested. In parallel, an IP set is obtained through domain name resolution, a request is sent to port 443 of each IP to obtain the set CDS (Certificate Domain Set) of domain names served by the returned certificates, and the corresponding set DS2 of main domain names is derived from the CDS. The set to be tested is DS = DS1 + DS2, and each domain name in it is verified. During verification, whois information is first used to judge whether the domain name belongs to the target unit; if so, it is added to the newly added main domain name seed set. If not, DNS information is used: if DNS verification passes, the domain name is added to the newly added main domain name seed set, and if it fails, the process proceeds to check whether the newly added main domain name seed set is empty. If that set is empty, the system ends; otherwise, the newly added main domain names are merged into the main domain name seeds, a secondary domain name set is again constructed through the secondary domain name module, and the process repeats.
As shown in FIG. 2, the system architecture of the present invention includes a database 1, a domain name expansion sub-module 2, a domain name verification sub-module 3, a DNS information acquisition sub-module 4, a whois information acquisition sub-module 5, and a secondary domain name acquisition subsystem 6. The database 1 contains the main domain name seeds, a DNS record library, a whois registration information white list, a whois registration information black list, a target unit information dictionary, a domain name prefix dictionary, a domain name suffix dictionary, a top-level domain dictionary, a second-level domain dictionary, and a domain name black list. The domain name expansion sub-module 2 comprises a domain name acquisition sub-module 21 based on Internet basic resource relationships, a domain name acquisition sub-module 22 based on web page links, and a main domain name construction sub-module 23; the domain name verification sub-module 3 comprises whois information verification 31 and DNS information verification 32.
The domain name acquisition sub-module 21 based on Internet basic resource relationships obtains the IP set of a main domain name, requests the certificate on port 443 of each IP, and takes the set of domain names served by the certificates as a set of main domain names to be tested. The domain name acquisition sub-module 22 obtains the HTML documents of domain name websites, extracts web page links, extracts main domain names from them, and thereby constructs a set of main domain names to be tested. The main domain name construction sub-module 23 constructs a set of main domain names to be tested by combining prefixes with suffixes.
The whois information verification 31 comprises registrant verification, registration telephone verification, registration mailbox verification, and registrant organization verification. The DNS information verification 32 comprises CNAME record verification, NS record verification, MX record verification, and verification of the source host and mailbox in the SOA record: CNAME verification checks directly whether the CNAME target is a main domain name; NS and MX verification checks whether the suffix of the returned host name is a main domain name; verification of the source host in the SOA record is effectively an NS check; and the contact mailbox in the SOA record is verified against the historical white list.
Example:
Taking Industrial and Commercial Bank of China (ICBC) as an example:
Step 1: query Industrial and Commercial Bank of China Limited on an ICP filing website to obtain the main domain name seeds, perform a whois reverse lookup on the obtained seeds via Chinaz to expand them, and store the invalid information returned among the reverse lookup options in a blacklist.
Step 2: obtain the secondary domain names of the main domain name seeds through the secondary domain name acquisition system; the main domain name seed set and the secondary domain name seed set together form the domain name seed set.
Step 3: acquire the whois information white list of the domain name seeds. The white list is constructed from:
1. the registration information exposed in the whois information of the seed domain names, including the registrant, registration mailbox, registration telephone, and registrant organization name;
2. the full name, short name, English name, pinyin, and traditional Chinese name of the target unit.
Therefore, the whois white list built for ICBC's registrants contains entries such as Industrial and Commercial Bank of China and its Chinese-name variants, and the registration mailbox entries contain the string icbc.
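Step 3 can be sketched as a normalization pass over parsed whois records. This sketch assumes each record has already been parsed into a dictionary with the hypothetical keys `registrant`, `registrant_org`, `registrant_email`, and `registrant_phone`; real whois output differs per registry and needs a registry-specific parser. Pooling registrant and organization names into one set is a simplification of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class WhoisWhitelist:
    """Normalized white-list entries built in step 3."""
    names: Set[str] = field(default_factory=set)  # registrants and organizations
    emails: Set[str] = field(default_factory=set)
    phones: Set[str] = field(default_factory=set)


def _digits(phone: str) -> str:
    # Keep digits only so "+86.10..." and "86-10..." compare equal.
    return "".join(ch for ch in phone if ch.isdigit())


def build_whois_whitelist(
    seed_records: Iterable[Dict[str, str]], unit_names: Iterable[str]
) -> WhoisWhitelist:
    wl = WhoisWhitelist()
    for rec in seed_records:
        # Hypothetical field names; adapt to the whois parser actually used.
        for key in ("registrant", "registrant_org"):
            if rec.get(key):
                wl.names.add(rec[key].strip().lower())
        if rec.get("registrant_email"):
            wl.emails.add(rec["registrant_email"].strip().lower())
        if rec.get("registrant_phone"):
            wl.phones.add(_digits(rec["registrant_phone"]))
    # Name variants of the target unit (full, short, English, pinyin, traditional).
    wl.names |= {n.strip().lower() for n in unit_names if n.strip()}
    return wl
```

Lower-casing and digit-only phone normalization are choices made here so that later comparisons tolerate formatting differences between registries.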
Step 4: acquire the DNS record white list of the domain name seeds, which includes information such as the SOA (Start of Authority) and CNAME (Canonical Name) records of the domain names. The mailbox and the DNS server in each SOA record are put into the white list, and each CNAME record is put into the domain name seed set.
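The SOA record stores its contact mailbox as a domain name (RNAME), so it must be converted before it can be compared. A minimal sketch, assuming the SOA RDATA is available as zone-file text (for example from a resolver library's text output); the record values in the usage below are illustrative, not ICBC's actual records.

```python
from typing import Iterable, Set, Tuple


def rname_to_mailbox(rname: str) -> str:
    """Convert an SOA RNAME such as hostmaster.example.com. to hostmaster@example.com.

    Per RFC 1035 the first label is the mailbox local part; a dot inside that
    label is written as an escaped backslash-dot sequence.
    """
    rname = rname.rstrip(".")
    i = 0
    while i < len(rname):
        if rname[i] == "\\":  # skip the escaped character
            i += 2
            continue
        if rname[i] == ".":
            local = rname[:i].replace("\\.", ".")
            return f"{local}@{rname[i + 1:]}".lower()
        i += 1
    raise ValueError(f"RNAME has no domain part: {rname!r}")


def parse_soa(rdata: str) -> Tuple[str, str]:
    """Return (source host, contact mailbox) from SOA RDATA in zone-file text form."""
    fields = rdata.split()
    if len(fields) < 2:
        raise ValueError(f"not an SOA RDATA string: {rdata!r}")
    return fields[0].rstrip(".").lower(), rname_to_mailbox(fields[1])


def build_dns_whitelist(soa_rdatas: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Step 4: collect the SOA source hosts and mailboxes of the seed domains."""
    hosts: Set[str] = set()
    mailboxes: Set[str] = set()
    for rdata in soa_rdatas:
        host, mailbox = parse_soa(rdata)
        hosts.add(host)
        mailboxes.add(mailbox)
    return hosts, mailboxes
```

For example, `parse_soa("ns1.icbc.com.cn. hostmaster.icbc.com.cn. 2024010101 3600 900 604800 300")` yields the source host `ns1.icbc.com.cn` and the mailbox `hostmaster@icbc.com.cn`.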
Step 5: based on ICBC's domain name seeds, use the Selenium automated testing tool to fetch the HTML text of the websites corresponding to ICBC's known domain names. Extract all web page links from the HTML, obtain the domain names from the links, and derive their main domain names. If a new main domain name is obtained, step 6 is performed; otherwise, step 9 is performed.
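Once the HTML has been fetched (with Selenium it could come from `driver.page_source`), link extraction and reduction to main domains can be done with the standard library. This sketch assumes a small, hand-supplied set of public suffixes; a real deployment would more likely use the full Public Suffix List. The function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Set
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect href and src attribute values from every tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def main_domain(host: str, suffixes: Set[str]) -> Optional[str]:
    """Return the label before the longest known suffix plus that suffix."""
    labels = host.lower().rstrip(".").split(".")
    # Start from the longest candidate suffix so "com.cn" wins over "cn".
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[i - 1:])
    return None


def main_domains_from_html(html: str, base_url: str, suffixes: Set[str]) -> Set[str]:
    parser = LinkCollector()
    parser.feed(html)
    found: Set[str] = set()
    for link in parser.links:
        # Resolve relative links; mailto: and javascript: links have no host.
        host = urlsplit(urljoin(base_url, link)).hostname
        if host:
            md = main_domain(host, suffixes)
            if md:
                found.add(md)
    return found
```

Relative links resolve to the page's own host, so they reduce to an already known main domain and are filtered out later when comparing against the seed set.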
Step 6: obtain the whois information of each domain name to be verified, and verify whether the main domain name is registered and whether it is a main domain name of ICBC, as follows:
1. Verify whether the main domain name is registered by checking whether its whois information contains registration time information. If it does, the domain name is registered.
2. Whether the domain name is a main domain name of Industrial and Commercial Bank of China Limited can be verified by any of the following three methods:
1) Verification by the registration mailbox in the whois information. If the suffix of the registration mailbox is a domain name of ICBC, the mailbox is a mail server asset that ICBC has in use, which shows that the domain name is a main domain name of ICBC. Verification succeeds, the domain name is marked as a main domain name of ICBC, and it is added to ICBC's main domain name seed set;
2) Verification by the registration telephone in the whois information. If the telephone is in the telephone white list, verification succeeds, the domain name is marked as a main domain name of ICBC, and it is added to ICBC's main domain name seed set. The registration telephone white list has the following sources:
i. registration telephones from the whois information of the main domain name seed set;
ii. contact telephone numbers extracted from the web page content of the seed domain name set.
3) Verification by the registrant and registrant organization fields in the whois record. If verification succeeds, the domain name is marked as a main domain name of ICBC and added to ICBC's main domain name seed set. The specific process is as follows:
i. Registrant verification: check whether the registrant field contains the full name, short name, traditional Chinese name, Chinese pinyin, or English name of ICBC, or any registrant name obtained from whois queries of ICBC's seed domain names;
ii. Registrant organization verification: check whether the registrant organization field contains the full name, short name, traditional Chinese name, Chinese pinyin, or English name of ICBC.
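The registration check and the three whois methods of step 6 can be combined into one predicate. A minimal sketch, assuming the same hypothetical parsed-record keys as in step 3 plus a hypothetical `creation_date` key for the registration time, and assuming the white-list names are already lower-cased. Any one matching method is treated as sufficient, as the text states.

```python
from typing import AbstractSet, Dict


def is_registered(rec: Dict[str, str]) -> bool:
    # Step 6.1: a registration time in the whois record means it is registered.
    return bool(rec.get("creation_date"))


def email_matches(email: str, main_domains: AbstractSet[str]) -> bool:
    # Method 1): the mailbox domain is (a subdomain of) a known main domain.
    if "@" not in email:
        return False
    domain = email.rsplit("@", 1)[1].lower()
    return any(domain == d or domain.endswith("." + d) for d in main_domains)


def whois_belongs_to_unit(
    rec: Dict[str, str],
    main_domains: AbstractSet[str],
    phone_whitelist: AbstractSet[str],
    name_whitelist: AbstractSet[str],
) -> bool:
    if not is_registered(rec):
        return False
    if email_matches(rec.get("registrant_email", ""), main_domains):
        return True
    # Method 2): digit-only phone compared with the telephone white list.
    phone = "".join(ch for ch in rec.get("registrant_phone", "") if ch.isdigit())
    if phone and phone in phone_whitelist:
        return True
    # Method 3): registrant or organization field contains a white-listed name.
    for key in ("registrant", "registrant_org"):
        value = rec.get(key, "").lower()
        if value and any(name in value for name in name_whitelist):
            return True
    return False
```

Substring matching follows the text's "contains" wording; a short name such as icbc will also match longer strings that include it, so the white list should be reviewed for overly generic entries.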
Step 7: acquire the DNS records of each domain name to be verified and verify it in the following four respects:
1. Acquire the CNAME record of the domain name to be detected; if the CNAME record is a domain name of the target unit, the domain name is considered a main domain name of Industrial and Commercial Bank of China Limited;
2. Acquire the NS records of the domain name to be detected; if the host name of an authoritative DNS server ends with a main domain name in ICBC's main domain name seed set, the domain name is considered a main domain name of ICBC;
3. Acquire the MX records of the domain name to be detected; if the host name of a mail server ends with a main domain name in ICBC's main domain name seed set, the domain name is considered a main domain name of ICBC;
4. Acquire the SOA record of the domain name to be detected and check whether its mailbox and DNS server are in the white list; if so, the host and administrator belong to ICBC, and the domain name is considered a main domain name of ICBC.
If any one of the four conditions is met, verification succeeds, the domain name is added to the main domain name seeds, and step 9 is executed; otherwise, step 8 is executed to verify through the filing information.
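The four DNS conditions of step 7 can be sketched as a single predicate over already-resolved records. This sketch assumes the caller supplies host names only (MX preference values stripped) and a pre-parsed SOA pair such as the one produced in step 4; treating a match on either the SOA source host or the SOA mailbox as sufficient is an assumption, since the text does not say whether both must match.

```python
from typing import AbstractSet, Dict, List, Optional, Tuple


def is_under(host: str, domains: AbstractSet[str]) -> bool:
    """True if host equals, or is a subdomain of, one of the given domains."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


def dns_belongs_to_unit(
    records: Dict[str, List[str]],
    seed_domains: AbstractSet[str],
    main_domains: AbstractSet[str],
    soa: Optional[Tuple[str, str]] = None,
    soa_hosts: AbstractSet[str] = frozenset(),
    soa_mailboxes: AbstractSet[str] = frozenset(),
) -> bool:
    # 1. The CNAME target is one of the unit's domain name seeds.
    if any(c.lower().rstrip(".") in seed_domains for c in records.get("CNAME", [])):
        return True
    # 2. and 3. An NS or MX host name ends with one of the unit's main domains.
    for rtype in ("NS", "MX"):
        if any(is_under(h, main_domains) for h in records.get(rtype, [])):
            return True
    # 4. The SOA source host or contact mailbox is in the step-4 white list.
    if soa is not None:
        host, mailbox = soa
        if host.lower().rstrip(".") in soa_hosts or mailbox.lower() in soa_mailboxes:
            return True
    return False
```

The `"." + d` test in `is_under` keeps a look-alike host such as mx.evilicbc.com.cn from matching icbc.com.cn.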
Step 8: acquire the filing information (ICP filing and public security filing) of the main domain name to be verified in step 5, and verify whether it is a main domain name of the target unit. If verification succeeds, execute step 9; if it fails, store the main domain name, the domain name, and the link for future reference, and execute step 9.
Step 9: construct a prefix dictionary P, comprising the following:
i. extract the prefixes of the main domain name seeds as the prefix dictionary P1, e.g., add icbc from icbc.com.cn to the prefix dictionary;
ii. collect the pinyin, traditional Chinese, and English forms of the name of Industrial and Commercial Bank of China Limited as the domain name prefix dictionary P2, such as ICBC, Industrial and Commercial Bank, INDUSTRIAL AND COMMERCIAL BANK OF CHINA, Industrial and Commercial Bank of China, Ltd., and Industrial and Commercial Bank of China Co., Ltd.;
iii. the prefix dictionary P = P1 + P2.
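Name variants such as those in P2 contain spaces and punctuation, so they must be turned into valid domain labels before they can be combined with suffixes. A minimal sketch, assuming a hypothetical normalization that lower-cases a name and keeps only ASCII letters and digits; non-ASCII variants such as traditional Chinese names would need IDNA (punycode) encoding, which this sketch omits and which therefore drops them.

```python
import re
from typing import AbstractSet, Iterable, Set


def prefixes_from_seeds(main_domains: Iterable[str], suffixes: AbstractSet[str]) -> Set[str]:
    """P1: the label in front of the longest known suffix, e.g. icbc from icbc.com.cn."""
    out: Set[str] = set()
    for d in main_domains:
        labels = d.lower().rstrip(".").split(".")
        for i in range(1, len(labels)):
            if ".".join(labels[i:]) in suffixes:
                out.add(labels[i - 1])
                break
    return out


def prefixes_from_names(names: Iterable[str]) -> Set[str]:
    """P2: hypothetical normalization of name variants into ASCII labels."""
    out: Set[str] = set()
    for name in names:
        label = re.sub(r"[^a-z0-9]", "", name.lower())
        if 0 < len(label) <= 63:  # DNS label length limit (RFC 1035)
            out.add(label)
    return out
```

P is then the union `prefixes_from_seeds(...) | prefixes_from_names(...)`.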
Step 10: construct a suffix dictionary L, comprising the following:
a. top-level domains serve as a suffix dictionary: a top-level domain dictionary L1 is constructed from the top-level domains published by IANA (the Internet Assigned Numbers Authority) at https://www.iana.org/domains/root/db/, which comprise 1581 suffixes in total;
b. construct a domain name suffix dictionary L2 from the seed domain name set, e.g., the suffix .com.cn of icbc.com.cn;
c. construct a domain name suffix dictionary L3 from second-level domains, e.g., .com.cn, .net.cn, .gov.cn, .org.cn;
d. then L = L1 + L2 + L3.
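The three suffix sources can be merged as sets. As an assumption, this sketch reads L1 from IANA's plain-text list (https://data.iana.org/TLD/tlds-alpha-by-domain.txt: one upper-case TLD per line, comment lines starting with `#`) rather than scraping the HTML page named above; that file lists only TLDs currently delegated in the root zone, so its count may differ from the figure given in the text. The function names are hypothetical.

```python
from typing import Iterable, Set


def parse_iana_tld_list(text: str) -> Set[str]:
    """L1: one TLD per line; '#' lines are comments. Returned lower-case, no dot."""
    return {
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def suffixes_from_seeds(main_domains: Iterable[str]) -> Set[str]:
    """L2: drop the first label of each main domain, e.g. com.cn from icbc.com.cn."""
    return {d.lower().rstrip(".").split(".", 1)[1] for d in main_domains if "." in d}


def build_suffix_dictionary(
    tld_text: str, main_domains: Iterable[str], second_level: Iterable[str]
) -> Set[str]:
    l1 = parse_iana_tld_list(tld_text)
    l2 = suffixes_from_seeds(main_domains)
    l3 = {s.lower().strip(".") for s in second_level}  # e.g. ".com.cn" -> "com.cn"
    return l1 | l2 | l3  # L = L1 + L2 + L3 as a set union
```

Storing suffixes without a leading dot keeps the later prefix-plus-suffix join uniform.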
Step 11: construct the main domain names. New main domain names are created from the domain name prefix dictionary P and the domain name suffix dictionary L. For example, icbc.com.cn is known to be a domain name of ICBC; the prefix icbc is extracted from it and combined with any suffix, such as asia, to create the new main domain name icbc.asia. The new main domain names form the set DS1 of main domain names to be detected;
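Step 11 is a Cartesian product of P and L. A minimal sketch (the function name is hypothetical) that also drops combinations already known to be main domains:

```python
from itertools import product
from typing import AbstractSet, Set


def build_candidates(
    prefixes: AbstractSet[str], suffixes: AbstractSet[str], known: AbstractSet[str]
) -> Set[str]:
    """DS1: every prefix.suffix combination not already a known main domain."""
    return {f"{p}.{s}" for p, s in product(prefixes, suffixes)} - set(known)
```

The candidate set grows as |P| × |L|; with well over a thousand top-level suffixes, even a few dozen prefixes produce tens of thousands of names, each of which then goes through the whois and DNS checks of steps 14 and 15.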
Step 12: perform DNS queries on ICBC's seed domain names to obtain the corresponding IP set, request the certificate served on port 443 of each IP, and obtain the set DS2 of main domain names to be detected from the domain names covered by the certificates; step 13 is performed.
Step 13: judge whether each domain name in the set of main domain names to be tested DS = DS1 + DS2 belongs to the target unit through whois verification and DNS verification, in a process similar to steps 6 and 7; add it to the main domain name seed set if verification succeeds, and discard it if verification fails. Step 14 is performed.
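Step 12 can be sketched with Python's `ssl` module. Because the connection is made to a bare IP, this sketch disables host name checking but keeps certificate chain verification, which is what lets `getpeercert()` return a decoded dictionary (with verification off it returns an empty one). Assumptions: certificates from private CAs will fail verification here, and without SNI the server returns its default certificate, so some virtual hosts are missed. The helper names are hypothetical.

```python
import socket
import ssl
from typing import Dict, Set


def cert_dns_names(cert: Dict) -> Set[str]:
    """DNS names from a decoded certificate's subjectAltName, wildcards stripped."""
    names: Set[str] = set()
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS":
            value = value.lower()
            names.add(value[2:] if value.startswith("*.") else value)
    return names


def fetch_cert(ip: str, port: int = 443, timeout: float = 5.0) -> Dict:
    """Fetch and decode the certificate served on ip:port."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # connecting by IP; the chain is still verified
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock) as tls:
            return tls.getpeercert()
```

The resulting names (the CDS) are reduced to main domains in the same way as the link-extracted names of step 5 to form DS2.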
Step 14: acquire the whois information of each domain name in the set to be verified from step 13, compare its registration information with the whois information white list established in step 3, and verify whether the domain name belongs to the target unit; if verification succeeds, the domain name is a main domain name of the target unit, it is added to the main domain name seed set, and step 16 is executed for the newly added domain name; otherwise, execute step 15;
Step 15: obtain the DNS information of each domain name in the set to be verified from step 13; if its CNAME record is a domain name in the main domain name seeds, or the host names in its NS and MX records end with a domain name in the main domain name seeds, or its SOA record matches the DNS information white list established in step 4, the domain name is considered a main domain name of the target unit and is added to the main domain name seed set; execute step 16;
Step 16: if new main domain names have been added to the main domain name seed set, execute step 2; otherwise, the expansion process ends.
The above description is only for the purpose of illustrating preferred embodiments of the present invention and is not to be construed as limiting the present invention, and it is apparent to those skilled in the art that various modifications and variations can be made in the present invention. All changes, equivalents, modifications and the like which come within the scope of the invention as defined by the appended claims are intended to be embraced therein.