CN114036364B - Method, apparatus, device, medium, and system for identifying crawlers


Info

Publication number
CN114036364B
Authority
CN
China
Prior art keywords
crawler
request information
identification
target
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111316197.6A
Other languages
Chinese (zh)
Other versions
CN114036364A (en
Inventor
何永玄
薛志方
谭瑞兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111316197.6A
Publication of CN114036364A
Application granted
Publication of CN114036364B
Status: Active
Anticipated expiration

Abstract

The present disclosure provides a method, apparatus, device, medium, and product for identifying crawlers, and relates to the field of computer technology, in particular to the field of information security. The specific implementation is: obtaining request information that requests access to page data; determining, according to a preset crawler identification order, a target anti-crawler operation for the request information from a preset set of anti-crawler operations; performing crawler identification on the request information based on the target anti-crawler operation to obtain an identification result; and, in response to determining that the identification result indicates that the request information is a crawler, determining the identification result as the target crawler identification result. This implementation can improve the data security of the web version of an applet.

Description

Method, apparatus, device, medium, and system for identifying crawlers

Technical Field

The present disclosure relates to the field of computer technology, and in particular to the field of information security.

Background

A crawler is a program or script that automatically fetches information from the World Wide Web according to certain rules. The web version of an applet usually exposes some public data for users to browse, and crawler attacks can cause this public data to be used maliciously.

However, no corresponding anti-crawler measures have been configured for the web version of an applet, which leaves the public data in the web version of the applet exposed to security risks.

Summary of the Invention

The present disclosure provides a method, apparatus, device, medium, and product for identifying crawlers.

According to one aspect of the present disclosure, a method for identifying crawlers is provided, including: obtaining request information that requests access to page data; determining, according to a preset crawler identification order, a target anti-crawler operation for the request information from a preset set of anti-crawler operations; performing crawler identification on the request information based on the target anti-crawler operation to obtain an identification result; and, in response to determining that the identification result indicates that the request information is a crawler, determining the identification result as the target crawler identification result.

According to another aspect of the present disclosure, an apparatus for identifying crawlers is provided, including: an information acquisition unit configured to obtain request information that requests access to page data; an operation determination unit configured to determine, according to a preset crawler identification order, a target anti-crawler operation for the request information from a preset set of anti-crawler operations; a crawler identification unit configured to perform crawler identification on the request information based on the target anti-crawler operation to obtain an identification result; and a result determination unit configured to, in response to determining that the identification result indicates that the request information is a crawler, determine the identification result as the target crawler identification result.

According to another aspect of the present disclosure, an electronic device is provided, including: one or more processors; and a memory for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the methods for identifying crawlers described above.

According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, where the computer instructions cause a computer to perform any of the methods for identifying crawlers described above.

According to another aspect of the present disclosure, a computer program system is provided, including a computer program that, when executed by a processor, implements any of the methods for identifying crawlers described above.

The technology of the present disclosure provides a method for identifying crawlers that can improve the data security of the web version of an applet.

It should be understood that the content described in this section is not intended to identify key or critical features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.

Description of Drawings

The accompanying drawings are provided for a better understanding of the present solution and do not limit the present disclosure. In the drawings:

FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure may be applied;

FIG. 2 is a flowchart of one embodiment of the method for identifying crawlers according to the present disclosure;

FIG. 3 is a schematic diagram of an application scenario of the method for identifying crawlers according to the present disclosure;

FIG. 4 is a flowchart of another embodiment of the method for identifying crawlers according to the present disclosure;

FIG. 5 is a schematic structural diagram of one embodiment of the apparatus for identifying crawlers according to the present disclosure;

FIG. 6 is a block diagram of an electronic device used to implement the method for identifying crawlers according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.

It should be noted that, where no conflict arises, the embodiments of the present disclosure and the features of those embodiments may be combined with one another. The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, a web applet proxy server 105, a network 106, and a developer server 107. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the web applet proxy server 105, and the network 106 is the medium that provides a communication link between the web applet proxy server 105 and the developer server 107. The networks 104 and 106 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.

A user may use the terminal devices 101, 102, 103 to interact with the web applet proxy server 105 through the network 104 in order to receive or send messages and the like. An applet client may be installed on the terminal devices 101, 102, 103; by running the applet client, the user can obtain the services that the web applet proxy server 105 and the developer server 107 provide for that client.

The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices, including but not limited to mobile phones, computers, and tablets. When they are software, they may be installed on the electronic devices listed above and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.

The web applet proxy server 105 may be a server that provides various applet proxy services. For example, the web applet proxy server 105 may obtain the request information sent by the terminal devices 101, 102, 103 on behalf of the applet client, send the request information to the developer server 107 through the network 106, receive the service content that the developer server 107 returns for the request information, and return that service content to the terminal devices 101, 102, 103.

In addition, after the web applet proxy server 105 obtains the request information sent by the terminal devices 101, 102, 103, and before it sends the request information to the developer server 107 through the network 106, it may, to improve data security, determine a target anti-crawler operation for the request information from a preset set of anti-crawler operations according to a preset crawler identification order, and perform crawler identification on the request information based on the target anti-crawler operation to obtain an identification result. If the identification result indicates that the request information is a crawler, the identification result is determined as the target crawler identification result. Optionally, when the target crawler identification result indicates that the request information is a crawler, the web applet proxy server 105 may intercept the request information, or it may send a prompt message to the developer server 107 so that the developer server 107 handles the request information identified as a crawler accordingly.
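The placement of the check, after the proxy receives a request and before it forwards it, can be sketched as below. This is a minimal illustration, not the patented implementation: `handle_request`, `is_crawler`, `forward`, and `notify` are hypothetical names, and forwarding the request after notifying the developer server is an assumption, since the text does not say what happens to a flagged request.

```python
from typing import Callable, Mapping, Optional

Request = Mapping[str, str]


def handle_request(
    request: Request,
    is_crawler: Callable[[Request], bool],
    forward: Callable[[Request], str],
    notify: Optional[Callable[[Request], None]] = None,
) -> Optional[str]:
    """Gate a request at the proxy before it reaches the developer server.

    Returns the developer server's response, or None when the request
    is intercepted at the proxy.
    """
    if is_crawler(request):
        if notify is None:
            # Option 1 in the text: intercept the request at the proxy.
            return None
        # Option 2 in the text: send a prompt message to the developer server.
        # Assumption: the request is still forwarded so that server can decide.
        notify(request)
    return forward(request)
```

With no `notify` callback the proxy drops crawler traffic itself; with one, the handling decision moves to the developer server.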

It should be noted that the web applet proxy server 105 and the developer server 107 may be hardware or software. When they are hardware, each may be implemented as a distributed server cluster composed of multiple servers or as a single server. When they are software, each may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.

The developer server 107 may be a server that provides various services. For example, the developer server 107 may receive the request information sent by the web applet proxy server 105 over the network 106 and respond to it.

It should be noted that the method for identifying crawlers provided by the embodiments of the present disclosure is generally executed by the web applet proxy server 105, and the apparatus for identifying crawlers is generally deployed in the web applet proxy server 105.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be used, depending on implementation needs.

With continued reference to FIG. 2, a flow 200 of one embodiment of the method for identifying crawlers according to the present disclosure is shown. The method of this embodiment includes the following steps:

Step 201: obtain request information that requests access to page data.

In this embodiment, the executing entity (for example, an electronic device such as the web applet proxy server 105 shown in FIG. 1) may obtain request information that requests access to page data and verify it to determine whether the request was issued by a normal user or by a crawler, thereby intercepting crawlers and keeping the page data secure. The page data here may be page data of a web applet or page data of another application; this embodiment imposes no limitation. A web applet refers to the H5 version of an applet, where H5 is a collection of technologies for building interactive web pages.

Step 202: according to a preset crawler identification order, determine a target anti-crawler operation for the request information from a preset set of anti-crawler operations.

In this embodiment, the preset crawler identification order is the order in which the anti-crawler operations in the preset set are applied. For example, if the preset set contains anti-crawler operations A, B, C, and D, the preset crawler identification order may be to perform crawler identification first with operation D, then with operation C, then with operation B, and finally with operation A.

Each anti-crawler operation in the preset set may be an operation designed to counter crawlers of a different level. The executing entity may establish in advance a correspondence between each anti-crawler operation and its crawler scenario, and then sort the anti-crawler operations by the level of their crawler scenarios to obtain the preset crawler identification order. The level of a crawler scenario may be determined from the complexity of its scene features: the higher the complexity, the higher the level. For example, the anti-crawler operations may be sorted by scenario level from low to high to obtain the preset crawler identification order. Selecting the target anti-crawler operation in this order strengthens crawler protection level by level and provides higher security.
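The derivation of the identification order from scenario levels can be sketched as follows. The text does not say how complexity is measured, so counting scene features is an assumption made only for this illustration; `CrawlerScenario`, `identification_order`, and the feature names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class CrawlerScenario:
    name: str
    features: FrozenSet[str]

    @property
    def level(self) -> int:
        # Assumption: complexity (and so level) is the number of scene features.
        return len(self.features)


def identification_order(op_to_scenario: Dict[str, CrawlerScenario]) -> List[str]:
    """Sort anti-crawler operations by scenario level, lowest level first."""
    return sorted(op_to_scenario, key=lambda op: op_to_scenario[op].level)


# Hypothetical mapping in which operation D counters the simplest scenario.
OPERATIONS = {
    "A": CrawlerScenario("distributed", frozenset({"f1", "f2", "f3", "f4"})),
    "B": CrawlerScenario("headless browser", frozenset({"f1", "f2", "f3"})),
    "C": CrawlerScenario("scripted client", frozenset({"f1", "f2"})),
    "D": CrawlerScenario("bare request", frozenset({"f1"})),
}
```

Under these made-up levels, `identification_order(OPERATIONS)` yields D, C, B, A, the same order as the example in the text.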

The executing entity may then determine the target anti-crawler operation from the preset set according to the preset crawler identification order and use it to perform crawler identification on the request information. Optionally, the executing entity may take the first anti-crawler operation in the identification order as the target anti-crawler operation. If the identification result of that operation indicates that the request information is a crawler, no further target anti-crawler operation is determined. If the identification result indicates that the request information comes from a user, the executing entity takes the second anti-crawler operation in the order as the new target anti-crawler operation. The executing entity may repeat this process of determining a target anti-crawler operation and performing crawler identification until the request information is judged to be a crawler, or until every anti-crawler operation in the set has been used and the request information is judged to come from a user.
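The loop described above amounts to a short-circuiting cascade. The sketch below illustrates it under stated assumptions: the two checks (`missing_user_agent`, `scripted_client`) and the header names are hypothetical stand-ins, since the text does not specify concrete anti-crawler operations.

```python
from typing import Callable, Iterable, List, Mapping, Tuple

Request = Mapping[str, str]
AntiCrawlerOp = Callable[[Request], bool]  # returns True when the request looks like a crawler


def identify(
    request: Request, ordered_ops: Iterable[Tuple[str, AntiCrawlerOp]]
) -> Tuple[str, List[str]]:
    """Apply operations in the preset order and stop at the first crawler verdict."""
    tried: List[str] = []
    for name, op in ordered_ops:
        tried.append(name)
        if op(request):
            return "crawler", tried  # no further operation is selected
    return "user", tried  # every operation was tried and none flagged the request


# Hypothetical checks, cheapest first.
def missing_user_agent(request: Request) -> bool:
    return not request.get("user-agent")


def scripted_client(request: Request) -> bool:
    return "python-requests" in request.get("user-agent", "").lower()


ORDER = [
    ("missing_user_agent", missing_user_agent),
    ("scripted_client", scripted_client),
]
```

Returning the list of operations that were tried makes the early exit visible: a request without a user agent is rejected after one check, while a browser request passes through all of them.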

In some optional implementations of this embodiment, determining the target anti-crawler operation for the request information from the preset set according to the preset crawler identification order may include: analyzing the request parameters of the request information to determine the scene features corresponding to the request information; and determining the target anti-crawler operation for the request information from the set based on the similarity between those scene features and the features of the crawler scenarios, together with the preset crawler identification order. Here the preset crawler identification order may be an order of anti-crawler operations determined from the features of the crawler scenarios. The executing entity may find the crawler scenario whose features are most similar to the scene features of the request information and take the anti-crawler operation that the preset order associates with that scenario as the target anti-crawler operation. This optional implementation selects the target anti-crawler operation that best matches the characteristics of the request information, and identifying crawlers with that operation can improve identification accuracy.
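A minimal sketch of this similarity-based selection follows. The text does not name a similarity measure or list any scene features, so the Jaccard index, the feature names, and the functions `scene_features` and `select_target_operation` are all assumptions chosen for illustration.

```python
from typing import FrozenSet, Mapping, Sequence, Tuple


def scene_features(params: Mapping[str, str]) -> FrozenSet[str]:
    """Derive hypothetical scene features from request parameters."""
    features = set()
    if not params.get("cookie"):
        features.add("no_cookie")
    if not params.get("referer"):
        features.add("no_referer")
    if "headless" in params.get("user-agent", "").lower():
        features.add("headless_agent")
    return frozenset(features)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard index; an assumed stand-in for the unspecified similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_target_operation(
    params: Mapping[str, str],
    scenarios: Sequence[Tuple[FrozenSet[str], str]],
) -> str:
    """Return the operation of the most similar crawler scenario.

    `scenarios` is listed in the preset identification order; max() keeps
    the first maximum, so ties resolve to the earlier operation.
    """
    request_features = scene_features(params)
    best = max(scenarios, key=lambda s: jaccard(request_features, s[0]))
    return best[1]


SCENARIOS = [
    (frozenset({"no_cookie"}), "op_D"),
    (frozenset({"no_cookie", "no_referer"}), "op_C"),
    (frozenset({"no_cookie", "no_referer", "headless_agent"}), "op_B"),
]
```

Resolving ties toward the earlier entry keeps the preset order relevant even when several scenarios match a request equally well.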

Step 203: based on the target anti-crawler operation, perform crawler identification on the request information to obtain an identification result.

In this embodiment, after determining the target anti-crawler operation, the executing entity may perform crawler identification on the request information according to that operation and obtain an identification result. The identification result may indicate either that the request information comes from a user or that it is a crawler. Different anti-crawler operations correspond to different crawler identification techniques, and applying different techniques to the request information can further strengthen data security.

Step 204: in response to determining that the identification result indicates that the request information is a crawler, determine the identification result as the target crawler identification result.

In this embodiment, if the identification result indicates that the request information is a crawler, the identification result is taken directly as the final target crawler identification result. When the target crawler identification result indicates that the request information is a crawler, the executing entity may intercept the request information to prevent abnormal access by the crawler. If the identification result indicates that the request information comes from a user, the executing entity may repeat steps 202 to 204 until it obtains a target crawler identification result indicating that the request information is a crawler, or until it has traversed all anti-crawler operations in the set and obtains a target crawler identification result indicating that the request information comes from a user.

With continued reference to FIG. 3, a schematic diagram of an application scenario of the method for identifying crawlers according to the present disclosure is shown. In the application scenario of FIG. 3, the executing entity may perform step 301: when a user or a crawler requests access to a web page (such as an applet web page), it obtains the request information, determines a target anti-crawler operation 302 from the preset set of anti-crawler operations according to the preset crawler identification order, and uses the target anti-crawler operation 302 to perform crawler identification on the request information, obtaining an identification result 303. If the identification result 303 indicates that the request information is a crawler, the identification result 303 is determined as the target crawler identification result 304. If the identification result 303 indicates that the request information comes from a user, a new target anti-crawler operation 302 is determined from the preset set according to the preset crawler identification order, and this repeats until a target crawler identification result indicating a crawler is obtained, or until every anti-crawler operation in the preset set has been traversed and a target crawler identification result indicating a user is obtained.

The method for identifying crawlers provided by the above embodiment of the present disclosure presets a crawler identification order and a set of anti-crawler operations. By determining the target anti-crawler operation from the set according to that order and performing crawler identification on the request information based on it, the method protects page data (such as the page data of the web version of an applet) and thereby improves the data security of the web version of the applet.

With continued reference to FIG. 4, a flow 400 of another embodiment of the method for identifying crawlers according to the present disclosure is shown. As shown in FIG. 4, the method of this embodiment may include the following steps:

Step 401: obtain request information that requests access to page data, where the request information is used to request access to the page data of a web applet.

In this embodiment, the request information is used to access the page data of a web applet. A web applet refers to the H5 version of an applet, where H5 is a collection of technologies for building interactive web pages. For a detailed description of step 401, refer to the detailed description of step 201, which is not repeated here.

In some optional implementations of this embodiment, the following steps may also be performed: determining the encrypted network address corresponding to the request information; determining the first encryption index and the second encryption index in the encrypted network address; decrypting the encrypted network address based on the first encryption index and the second encryption index to obtain a decrypted network address; and performing network access based on the decrypted network address.

In this implementation, to prevent crawler attacks on third-party services, URL (Uniform Resource Locator) encryption is used to encrypt the URLs of those services. When performing network access, the executing entity may first decrypt the encrypted URL in the request information to obtain the decrypted network address and then perform network access based on it.

The encrypted network address in the request information sent by the user is determined by the following steps: obtaining two randomly generated numbers to obtain the first encryption index and the second encryption index; dividing the initial network address into a first network sub-address and a second network sub-address based on the first and second encryption indices; shifting each character in the first network sub-address by the offset corresponding to the first encryption index and the offset corresponding to that character's position in the first network sub-address, to obtain the shifted first network sub-address; shifting each character in the second network sub-address by the offset corresponding to the second encryption index and the offset corresponding to that character's position in the second network sub-address, to obtain the shifted second network sub-address; and concatenating the shifted first and second network sub-addresses to obtain the encrypted network address.

For example, suppose the initial network address is https://api.tusij.com/v2/get-category[Figure]token=&source=baidu_app and the two randomly generated numbers are 22 and 1. From the alphabetical order of the letters and the random numbers, the first encryption index is w (a shifted backward by 22) and the second encryption index is B (A shifted backward by 1). The two indexes are then combined into "/wB", which is inserted into the initial network address and divides it into the first network sub-address and the second network sub-address. The initial network address with "/wB" inserted is "https://api.tusij.com/wB/v2/get-category[Figure]token=&source=baidu_app"; the first network sub-address is "https://api.tusij.com" and the second network sub-address is "/v2/get-category[Figure]token=&source=baidu_app". Each character in the first network sub-address is shifted according to the offset corresponding to the first encryption index (22) and the offset corresponding to the character's position in the first network sub-address (for example, a is at position 8, counting from 0), which gives the shifted first network sub-address "dqros://euo.bdctv.qdc". Each character in the second network sub-address is shifted according to the offset corresponding to the second encryption index (1) and the offset corresponding to the character's position in the second network sub-address, which gives the shifted second network sub-address "/x2/lka-lkeqtcgo[Figure]lhezj=&rovtfi=hhqme_mcd". Concatenating the two shifted sub-addresses gives the final encrypted network address "dqros://euo.bdctv.qdc/wB/x2/lka-lkeqtcgo[Figure]lhezj=&rovtfi=hhqme_mcd".
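The encryption steps above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `shift_text` and `encrypt_url` are hypothetical, the split point (the first "/" after the host) is inferred from the example, and the handling of uppercase letters is an assumption because the example contains only lowercase letters. Each letter is shifted by the index offset plus its zero-based position, modulo 26, and non-letter characters are left unchanged, which reproduces the sub-addresses given in the example.

```python
def shift_text(text: str, base_offset: int) -> str:
    """Shift each letter by base_offset plus its zero-based position (mod 26)."""
    out = []
    for pos, ch in enumerate(text):
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + base_offset + pos) % 26 + ord("a")))
        elif "A" <= ch <= "Z":  # assumption: uppercase letters are shifted the same way
            out.append(chr((ord(ch) - ord("A") + base_offset + pos) % 26 + ord("A")))
        else:
            out.append(ch)  # digits and punctuation stay as they are
    return "".join(out)


def encrypt_url(url: str, first_offset: int, second_offset: int) -> str:
    """Split the URL after the host, shift both halves, and insert the index marker.

    Both offsets are expected to be in the range 0..25.
    """
    scheme_end = url.index("://") + 3
    path_start = url.find("/", scheme_end)
    if path_start == -1:  # URL without a path
        path_start = len(url)
    first, second = url[:path_start], url[path_start:]
    # The first index is a lowercase letter, the second an uppercase letter, e.g. "/wB".
    marker = "/" + chr(ord("a") + first_offset) + chr(ord("A") + second_offset)
    return shift_text(first, first_offset) + marker + shift_text(second, second_offset)


if __name__ == "__main__":
    print(encrypt_url("https://api.tusij.com/v2/get-category", 22, 1))
```

The query string of the example is omitted here because one of its characters appears only as an image in the source.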

Further, to decrypt the URL, the execution body may determine the encrypted network address corresponding to the request information and analyze it to determine the first encryption index and the second encryption index it contains. Using the first encryption index and the second encryption index, the execution body applies the corresponding offset processing to the encrypted network address to obtain the decrypted network address, and can then access the third-party service through the decrypted network address.
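A matching decryption sketch is shown below, under the same assumptions as the example in the text: the two indexes appear as a lowercase and an uppercase letter in the first path segment (such as "/wB"), and every letter was shifted by the index offset plus its zero-based position. The names `unshift_text` and `decrypt_url` are hypothetical.

```python
def unshift_text(text: str, base_offset: int) -> str:
    """Reverse a shift of base_offset plus zero-based position (mod 26)."""
    out = []
    for pos, ch in enumerate(text):
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") - base_offset - pos) % 26 + ord("a")))
        elif "A" <= ch <= "Z":  # assumption: uppercase letters were shifted the same way
            out.append(chr((ord(ch) - ord("A") - base_offset - pos) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def decrypt_url(encrypted: str) -> str:
    """Read the two index letters, remove the marker, and undo both shifts."""
    scheme_end = encrypted.index("://") + 3  # "://" survives because only letters are shifted
    marker_start = encrypted.index("/", scheme_end)
    first_offset = ord(encrypted[marker_start + 1]) - ord("a")
    second_offset = ord(encrypted[marker_start + 2]) - ord("A")
    first = encrypted[:marker_start]
    second = encrypted[marker_start + 3:]
    return unshift_text(first, first_offset) + unshift_text(second, second_offset)


if __name__ == "__main__":
    print(decrypt_url("dqros://euo.bdctv.qdc/wB/x2/lka-lkeqtcgo"))
```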

Step 402, according to a preset crawler identification order, determine a target anti-crawler operation for the request information from a preset anti-crawler operation set, where the anti-crawler operations in the preset anti-crawler operation set include at least one of the following: a terminal feature identification operation, a token identification operation, a human-machine feature identification operation, a data analysis identification operation, and a signature identification operation.

In this embodiment, the terminal feature identification operation may be an identification operation for crawler scenarios that forge terminal features. Terminal features are specific parameters in the request information, which may include but are not limited to the UA (User Agent, the terminal's environment information), the referer (the address of the previous page), and the header. Specifically, the terminal feature identification operation may analyze the specific parameters in the request information and check whether they are parameters corresponding to a user. If they are not, the request information is determined to be from a crawler; if they are, the request information is determined to be from a user.
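As a rough illustration of a terminal feature check, the sketch below inspects the User-Agent and Referer headers of a request. The source does not specify the rules, so everything here is an assumption: the marker list `SUSPICIOUS_UA_MARKERS`, the host `EXPECTED_REFERER_HOST`, and the function name `looks_like_crawler` are hypothetical placeholders.

```python
from typing import Mapping
from urllib.parse import urlsplit

# Hypothetical values chosen for illustration only.
SUSPICIOUS_UA_MARKERS = ("python-requests", "curl", "scrapy")
EXPECTED_REFERER_HOST = "example.com"


def looks_like_crawler(headers: Mapping[str, str]) -> bool:
    """Return True when the request's terminal features do not look like a user's."""
    user_agent = headers.get("User-Agent", "").lower()
    if not user_agent or any(marker in user_agent for marker in SUSPICIOUS_UA_MARKERS):
        return True
    # A request from a rendered page is assumed to carry a Referer on the expected host.
    referer_host = urlsplit(headers.get("Referer", "")).hostname
    return referer_host != EXPECTED_REFERER_HOST


if __name__ == "__main__":
    print(looks_like_crawler({"User-Agent": "curl/8.0"}))
```

Header values are easy to forge, which is why the text places further identification operations after this one.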

Further, the token identification operation may be an identification operation for crawler scenarios that forge terminal features and replay requests, where request replay means repeatedly retrying the same request. Specifically, to handle this scenario, the request information may carry encrypted token information, and the execution body may verify the token by decrypting the token information. If the token verification passes, the request information is determined to be from a user; if it fails, the request information is determined to be from a crawler.

Further, the human-machine feature identification operation may be an identification operation for crawler scenarios that run automated scripts in a real browser. Specifically, the execution body may obtain the identification result by checking the device identity corresponding to the request information, the Internet Protocol (IP) address corresponding to the request information, the user agent corresponding to the request information, and so on. For example, if the device identity indicates a device associated with a crawler, the identification result is that the request information is from a crawler.

Further, the data analysis identification operation may be an identification operation for crawler scenarios that none of the above scenarios cover, that is, scenarios not identifiable as forged terminal features, forged terminal features with request replay, or automated scripts in a real browser. Specifically, the execution body may perform statistical processing based on historical crawler data, the scenario features of the current request information, and other information, score the request information, and determine from the score whether the request information is from a user or a crawler.

Further, the signature identification operation may be an identification operation for crawler scenarios that bypass the execution body (the web applet proxy server) and attack the developer server directly. Specifically, to handle this scenario, the execution body and the developer server may use the same signature generation algorithm. When the signature in the request information is verified, the developer server can perform the verification in the same way even if the execution body is bypassed, and can return the verification result to the execution body. If the verification result indicates that the signature verification passed, the identification result is that the request information is from a user; if it indicates that the signature verification failed, the identification result is that the request information is from a crawler. In addition, both the NA-side applet and the web applet proxy server send request information to the developer server, so the NA-side applet and the web applet proxy server may use the same signature generation algorithm described above. Here NA stands for Native App: a third-party application written and run with native code on a smartphone's local operating system, such as iOS, Android, or WP, also called a local app; the development languages commonly used are Java, C++, and Objective-C. The developer server can receive signatures from both sources, the NA-side applet and the web applet proxy server, verify them, and return the verification result to the NA-side applet and the web applet proxy server.

In addition, sorting the anti-crawler operations in the above anti-crawler operation set according to the preset crawler identification order may give the following result: terminal feature identification operation, token identification operation, human-machine feature identification operation, data analysis identification operation, and signature identification operation. The execution body may take the target anti-crawler operations one by one in this order, thereby traversing the anti-crawler operation set.

Step 403, when the target anti-crawler operation is the token identification operation, determine the token index information corresponding to the request information; determine a target character based on the token index information; in response to determining that the target character does not match a preset character, determine that the identification result is that the request information is from a crawler.

In this embodiment, if the target anti-crawler operation is the token identification operation, the token information (token) carried in the request information can be verified. Specifically, the execution body may include a web page rendering module (web-xrender) and a web page interface management module (webapi). When a user requests that a page be rendered, the web page rendering module in the execution body may send the corresponding token index information to the user. For example, it may send the user unpadded, base64url-encoded token information (base64url is a method for encoding arbitrary binary data as a text string), and the token information contains the token index information. When the user requests access to page data, the request information carries this token information. The web page interface management module in the execution body may use the token index information to determine the target character to be verified and match the target character against the preset character. If they do not match, the identification result is that the request information is from a crawler; if they match, the identification result is that the request information is from a user.

The preset character may be the token character information issued when the user requests that the page be rendered. If the request information is sent by a user, the token character information in the request information is the same as the preset character. If the request information is sent by a crawler, the token character information in the request information differs from the preset character. With the token identification operation, the request information can be verified against the character preset in the execution body. If the scheme is compromised, verification can be restored by changing the preset character, which makes security updates more convenient. Optionally, the execution body may also obfuscate the code of the content displayed by the web applet front end, to prevent the front-end code from being cracked and further improve the security of the web applet.

In some optional implementations of this embodiment, the following steps may also be performed: determine the target applet identifier and the target timestamp corresponding to the request information; in response to determining that the target applet identifier does not match a preset applet identifier, or that the target timestamp has expired, determine that the identification result is that the request information is from a crawler.

In this implementation, the request information may correspond to a target applet identifier and a target timestamp. Optionally, the target applet identifier, the target timestamp, and the token information may be stored in association with one another. The target applet identifier is the unique identification information of the applet, and the target timestamp describes how long the token information remains valid. The execution body may obtain the preset applet identifier by analyzing the current web page domain name. If the target applet identifier does not match the preset applet identifier, the identification result is that the request information is from a crawler. If they match, the identification result is that the request information is from a user. The execution body may also store a validity period in advance. If the difference between the current time and the target timestamp is greater than the preset validity period, the request information has expired, the identification result can be determined to be a crawler, and the request information is intercepted.

For example, when a user requests that an applet web page be rendered, the web page rendering module issues unpadded, base64url-encoded token information to the requesting browser, and the token information may correspond to a target applet identifier and a target timestamp. The user may then send request information requesting access to page data to the web page interface management module, carrying the token information, the target applet identifier, and the target timestamp in the request information. The execution body decodes the unpadded, base64url-encoded token information, verifies whether the target applet identifier is correct, and verifies whether the target timestamp has expired. If the decoded token information is correct, the target applet identifier is correct, and the target timestamp has not expired, the identification result can be determined to be a user. If the decoded token information is incorrect, the target applet identifier is incorrect, or the target timestamp has expired, the identification result can be determined to be a crawler.

The encrypted token information may be generated by the following steps: generate a random alphanumeric string and convert it into a binary random number; generate the target timestamp based on the current time; generate the target applet identifier based on the applet identifier requested by the user; concatenate the random number, the target timestamp, and the target applet identifier, and compute an unpadded base64url string; determine, according to a preset index, the specified position at which a character is to be inserted into the string; insert the preset character at the specified position to obtain the unpadded, base64url-encoded token information with the character inserted.

Further, the execution body decrypts the token information as follows: determine the token information in the request information and the token index information for the token information, where the token index information may be index information used for reading characters and may be stored in the execution body in advance; based on the token index information, determine the target character at the corresponding position in the token information. The target character is matched against the preset character stored in advance in the execution body. If the characters match, the target applet identifier in the token information is correct, and the target timestamp has not expired, the identification result is that the request information is from a user. If the characters do not match, or the target applet identifier in the token information is incorrect, or the target timestamp has expired, the identification result is that the request information is from a crawler.
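The token generation and verification steps can be sketched as below. This is only an illustration under stated assumptions: the field separator "|", the constants `PRESET_INDEX`, `PRESET_CHAR`, and `VALID_SECONDS`, and the function names are all hypothetical, and the applet identifier is assumed not to contain the separator. The sketch builds an unpadded base64url string from a random value, a timestamp, and an applet identifier, inserts the preset character at the preset index, and later checks that character, the applet identifier, and the timestamp.

```python
import base64
import secrets
import time
from typing import Optional

PRESET_INDEX = 5      # hypothetical position of the inserted character
PRESET_CHAR = "x"     # hypothetical preset character
VALID_SECONDS = 300   # hypothetical validity period


def make_token(applet_id: str, now: Optional[int] = None) -> str:
    """Build an unpadded base64url token and insert the preset character."""
    now = int(time.time()) if now is None else now
    nonce = secrets.token_hex(8)  # random part of the token
    payload = f"{nonce}|{now}|{applet_id}".encode()
    body = base64.urlsafe_b64encode(payload).rstrip(b"=").decode()
    return body[:PRESET_INDEX] + PRESET_CHAR + body[PRESET_INDEX:]


def verify_token(token: str, expected_applet_id: str, now: Optional[int] = None) -> bool:
    """Return True when the token looks like it came from a user."""
    now = int(time.time()) if now is None else now
    # Step 1: the character at the preset index must match the preset character.
    if len(token) <= PRESET_INDEX or token[PRESET_INDEX] != PRESET_CHAR:
        return False
    # Step 2: remove the inserted character, restore padding, and decode.
    body = token[:PRESET_INDEX] + token[PRESET_INDEX + 1:]
    padded = body + "=" * (-len(body) % 4)
    try:
        _nonce, stamp, applet_id = base64.urlsafe_b64decode(padded).decode().split("|")
        issued_at = int(stamp)
    except ValueError:  # covers malformed base64, bad UTF-8, and a wrong field count
        return False
    # Step 3: the applet identifier must match and the timestamp must not be expired.
    return applet_id == expected_applet_id and 0 <= now - issued_at <= VALID_SECONDS


if __name__ == "__main__":
    token = make_token("demo-applet")
    print(verify_token(token, "demo-applet"))
```

Keeping the index and character on the server side means they can be rotated without changing the token format, which corresponds to the point in the text about modifying the preset character.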

Step 404, when the target anti-crawler operation is the data analysis identification operation, obtain crawler analysis data; based on the crawler analysis data, perform crawler identification on the request information to obtain the identification result corresponding to the request information.

In this embodiment, the execution body may obtain crawler analysis data in advance. The crawler analysis data may be data obtained by analyzing historical crawler data, the features of the current request information, and the features of different crawler scenarios. Based on the crawler analysis data, crawler identification can be performed on the request information to obtain an identification result, which may indicate either that the request information is from a crawler or that it is from a user. Optionally, the execution body may also generate a corresponding level score based on the crawler analysis data and the request information. For example, the higher the probability that the request information is from a crawler, the higher the corresponding level score may be.

Step 405, when the target anti-crawler operation is the signature synchronization identification operation, determine the signature information in the request information; obtain the identification result based on the signature information and preset signature information.

In this embodiment, the execution body may share a signature generation algorithm with the developer server. When the request information is verified, the signature information in the request information can be compared with preset signature information generated by the signature generation algorithm shared by the execution body and the developer server. If the signatures are the same, the identification result is that the request information is from a user; if they differ, the identification result is that the request information is from a crawler. The developer server can also identify the request information using the same signature comparison.

The shared signature generation algorithm generates a signature by the following steps: obtain the applet key; compute the md5 value of the applet key (md5 is a widely used cryptographic hash function); decode a specified part of the URL (for example, a number of characters counted from the end toward the start) to obtain a first decoded value; decode the key-value pairs in the query information and sort the decoded key-value pairs to obtain a second decoded value; concatenate the md5 value, the first decoded value, the second decoded value, and a timestamp to obtain a concatenated string; apply md5 to the concatenated string to generate the signature.

Further, when verifying the signature information in the request information, the execution body may first perform md5 decryption on the signature information to obtain decrypted signature information. It then compares the key sub-part of the decrypted signature information with the md5 value of the applet key, compares the URL decoded value in the decrypted signature information with the first decoded value, compares the sorted key-value pairs in the decrypted signature information with the second decoded value, and compares the timestamp with the current time. If the key sub-part is the same as the md5 value of the applet key, the URL decoded value is the same as the first decoded value, the sorted key-value pairs are the same as the second decoded value, and the timestamp has not expired, the signatures are determined to be the same and the identification result is that the request information is from a user. If these conditions are not met, the signatures are determined to be different and the identification result is that the request information is from a crawler.
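A minimal sketch of the shared signature scheme follows. Because md5 is a one-way hash and cannot be reversed, this sketch verifies a signature by recomputing it from the same inputs and comparing the two digests, which is a substitute for the decryption step described in the text. The function names, the `tail_len` parameter (how many trailing path characters form the first decoded value), the "key=value" join format, and the validity period are assumptions made for illustration.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, unquote, urlsplit


def make_signature(applet_key: str, url: str, timestamp: int, tail_len: int = 16) -> str:
    """Build the signature from the key digest, URL tail, sorted query pairs, and timestamp."""
    key_digest = hashlib.md5(applet_key.encode()).hexdigest()
    parts = urlsplit(url)
    # First decoded value: the last tail_len characters of the percent-decoded path.
    first_value = unquote(parts.path)[-tail_len:]
    # Second decoded value: percent-decoded query pairs, sorted so ordering does not matter.
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    second_value = "&".join(f"{key}={value}" for key, value in pairs)
    joined = key_digest + first_value + second_value + str(timestamp)
    return hashlib.md5(joined.encode()).hexdigest()


def verify_signature(signature: str, applet_key: str, url: str, timestamp: int,
                     now: int, valid_seconds: int = 300) -> bool:
    """Return True when the signature matches and the timestamp has not expired."""
    if not 0 <= now - timestamp <= valid_seconds:
        return False
    expected = make_signature(applet_key, url, timestamp)
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(signature.encode(), expected.encode())


if __name__ == "__main__":
    url = "https://dev.example/api/list?b=2&a=1"
    sig = make_signature("demo-key", url, 1000)
    print(verify_signature(sig, "demo-key", url, 1000, now=1010))
```

Since both the proxy server and the developer server can run the same two functions, either side can check a request on its own, which matches the bypass scenario the signature operation is meant to cover.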

In some optional implementations of this embodiment, the following steps may also be performed: based on the identification result, determine crawler score information corresponding to the request information; output the crawler score information.

In this implementation, the execution body may generate the crawler score information corresponding to the request information based on the identification result and level score of the data analysis identification operation and on the identification result of the human-machine feature identification operation, and output the crawler score information to the developer server so that the developer server can apply other anti-crawler measures accordingly. The identification result here may be the identification result of at least one of the anti-crawler operations in the preset anti-crawler operation set. The crawler score information describes the probability that the request information is from a crawler.

Step 406, in response to determining that the identification result indicates that the request information is from a crawler, determine the identification result as the target crawler identification result.

In this embodiment, for a detailed description of step 406, refer to the detailed description of step 204, which is not repeated here.

Step 407, in response to determining that the identification result indicates that the request information is not from a crawler and that the preset anti-crawler operation set has not been fully traversed, determine a new target anti-crawler operation for the request information from the preset anti-crawler operation set according to the preset crawler identification order.

In this embodiment, if the identification result indicates that the request information is not from a crawler (that is, it is from a user) and the preset anti-crawler operation set has not been fully traversed, meaning that unused anti-crawler operations remain in the set, a new target anti-crawler operation is determined from the preset anti-crawler operation set according to the preset crawler identification order, and identification continues. Optionally, the execution body may first store the current identification result. If the identification result carries information such as a crawler level, the identification result may be stored together with that information. The target anti-crawler operation is then determined again.

Step 408, in response to determining that the identification result indicates that the request information is not from a crawler and that the preset anti-crawler operation set has been fully traversed, determine the identification result as the target crawler identification result.

In this embodiment, if the preset anti-crawler operation set has been fully traversed and every identification result indicates that the request information is not from a crawler, the identification result indicating that the request information is not from a crawler is determined as the target crawler identification result.
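Steps 406 to 408 amount to a short-circuit traversal of the ordered operation set, which can be sketched as follows. The type aliases and the function name `identify` are hypothetical, and each check is assumed to be a function that returns True when it judges the request to be from a crawler.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple

Request = Mapping[str, str]
Check = Callable[[Request], bool]  # returns True when the request looks like a crawler


def identify(request: Request,
             checks: Sequence[Tuple[str, Check]]) -> Tuple[bool, Optional[str]]:
    """Run the checks in their preset order and stop at the first crawler verdict.

    Returns (True, name of the operation that flagged the request) for a crawler,
    or (False, None) once every operation has been tried without a crawler verdict.
    """
    for name, check in checks:
        if check(request):
            return True, name   # step 406: crawler identified, stop here
        # step 407: not a crawler so far, move on to the next operation
    return False, None          # step 408: set fully traversed, not a crawler


if __name__ == "__main__":
    ordered_checks = [
        ("terminal_feature", lambda req: "User-Agent" not in req),
        ("token", lambda req: req.get("token") != "expected"),
    ]
    print(identify({"User-Agent": "Mozilla/5.0", "token": "forged"}, ordered_checks))
```

Placing the cheaper checks first means most crawler requests are rejected before the more expensive operations run.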

With the method for identifying crawlers provided by the above embodiment of the present disclosure, when the identification result indicates that the request information is not from a crawler and the anti-crawler operation set has not been fully traversed, the target anti-crawler operation can be determined again until a crawler is identified or the anti-crawler operation set has been fully traversed and the request is identified as not from a crawler. Protection against crawlers is thus strengthened level by level until every anti-crawler operation has been used, which improves the accuracy of crawler identification. In addition, the token identification operation handles crawler scenarios that forge terminal features and replay requests; encryption of third-party URLs handles crawler scenarios that attack third-party services; the data analysis identification operation handles crawler scenarios that the terminal feature identification operation, the token identification operation, the human-machine feature identification operation, and similar operations cannot identify; and the signature synchronization identification operation handles crawler scenarios that attack the developer server. Different crawler scenarios therefore receive targeted protection. Crawler score information can also be generated for further processing by the developer, which makes crawler handling more flexible.

With further reference to FIG. 5, as an implementation of the methods shown in the figures above, the present disclosure provides an embodiment of an apparatus for identifying crawlers. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied in a web applet proxy server.

As shown in FIG. 5, the apparatus 500 for identifying crawlers in this embodiment includes an information acquisition unit 501, an operation determination unit 502, a crawler identification unit 503, and a result determination unit 504.

The information acquisition unit 501 is configured to acquire request information requesting access to page data.

The operation determination unit 502 is configured to determine, according to a preset crawler identification order, a target anti-crawler operation for the request information from a preset anti-crawler operation set.

The crawler identification unit 503 is configured to perform crawler identification on the request information based on the target anti-crawler operation to obtain an identification result.

The result determination unit 504 is configured to, in response to determining that the identification result indicates that the request information is from a crawler, determine the identification result as the target crawler identification result.

In some optional implementations of this embodiment, the operation determination unit 502 is further configured to: in response to determining that the identification result indicates that the request information is not from a crawler and that the preset anti-crawler operation set has not been fully traversed, determine a new target anti-crawler operation for the request information from the preset anti-crawler operation set according to the preset crawler identification order.

In some optional implementations of this embodiment, the result determination unit 504 is further configured to: in response to determining that the identification result indicates that the request information is not from a crawler and that the preset anti-crawler operation set has been fully traversed, determine the identification result as the target crawler identification result.

In some optional implementations of this embodiment, the target anti-crawler operation includes at least a token identification operation, and the crawler identification unit 503 is further configured to: determine token index information corresponding to the request information; determine a target character based on the token index information; and, in response to determining that the target character does not match a preset character, determine that the identification result is that the request information is a crawler.

In some optional implementations of this embodiment, the crawler identification unit 503 is further configured to: determine a target applet identifier and a target timestamp corresponding to the request information; and, in response to determining that the target applet identifier does not match a preset applet identifier, or that the target timestamp has expired, determine that the identification result is that the request information is a crawler.
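The token, applet identifier, and timestamp checks can be sketched together as follows. This section does not say how the token index information yields the target character, so the sketch assumes the index is a position within the token string. The constants `PRESET_CHAR`, `PRESET_APPLET_ID`, and `MAX_AGE_SECONDS` and the function name are hypothetical.

```python
import time
from typing import Optional

PRESET_CHAR = "k"                  # hypothetical preset character
PRESET_APPLET_ID = "demo-applet"   # hypothetical preset applet identifier
MAX_AGE_SECONDS = 300              # hypothetical validity window for the timestamp


def token_flags_crawler(
    token: str,
    token_index: int,
    applet_id: str,
    timestamp: float,
    now: Optional[float] = None,
) -> bool:
    """Return True when the request should be identified as a crawler."""
    now = time.time() if now is None else now
    # Assumption: the index points at one character of the token.
    if not 0 <= token_index < len(token):
        return True
    if token[token_index] != PRESET_CHAR:
        return True  # target character does not match the preset character
    if applet_id != PRESET_APPLET_ID:
        return True  # applet identifier mismatch
    return now - timestamp > MAX_AGE_SECONDS  # expired timestamp
```

Passing `now` explicitly keeps the expiry check deterministic in tests; in production code the default of the current time would normally be used.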

In some optional implementations of this embodiment, the apparatus further includes a network access unit configured to: determine an encrypted network address corresponding to the request information; determine a first encryption index and a second encryption index in the encrypted network address; decrypt the encrypted network address based on the first encryption index and the second encryption index to obtain a decrypted network address; and perform network access based on the decrypted network address.

In some optional implementations of this embodiment, the target anti-crawler operation includes at least a data analysis identification operation, and the crawler identification unit 503 is further configured to: acquire crawler analysis data; and perform crawler identification on the request information based on the crawler analysis data to obtain the identification result corresponding to the request information.

In some optional implementations of this embodiment, the target anti-crawler operation includes at least a signature synchronization identification operation, and the crawler identification unit 503 is further configured to: determine signature information in the request information; and obtain the identification result based on the signature information and preset signature information.
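A signature comparison of this kind might look like the sketch below. The source states only that the request's signature information is evaluated against preset signature information. Deriving the preset value with HMAC-SHA256 is an assumption made here for illustration, as are the function names.

```python
import hashlib
import hmac


def expected_signature(secret: bytes, payload: bytes) -> str:
    """Assumed scheme: the preset signature is an HMAC-SHA256 hex digest."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def signature_flags_crawler(request_signature: str, preset_signature: str) -> bool:
    """Return True (crawler) when the supplied signature does not match the preset one."""
    # compare_digest runs in constant time, which avoids leaking match length via timing.
    return not hmac.compare_digest(
        request_signature.encode("utf-8"), preset_signature.encode("utf-8")
    )
```

Encoding both values to bytes before comparison means a malformed, non-ASCII signature is simply rejected instead of raising an exception.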

In some optional implementations of this embodiment, the apparatus further includes a score output unit configured to: determine, based on the identification result, crawler score information corresponding to the request information; and output the crawler score information.
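One way to turn an identification result into crawler score information is shown below. The scoring rule is not given in this section, so the mapping is purely an assumption: each operation gets a level matching its position in the preset order, and the score is 20 points per level. The operation names are illustrative labels.

```python
from typing import Optional

# Hypothetical levels: operations later in the preset order target higher-level crawlers.
LEVEL_BY_OPERATION = {
    "terminal_feature": 1,
    "token": 2,
    "human_machine_feature": 3,
    "data_analysis": 4,
    "signature_sync": 5,
}


def crawler_score(is_crawler: bool, decided_by: Optional[str]) -> int:
    """Return 0 for a non-crawler, otherwise 20 points per level (maximum 100)."""
    if not is_crawler:
        return 0
    # Unknown operations fall back to the lowest level (an arbitrary choice).
    return 20 * LEVEL_BY_OPERATION.get(decided_by, 1)
```

A numeric score like this lets downstream systems apply their own thresholds instead of consuming only a yes/no verdict.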

In some optional implementations of this embodiment, the request information is used to request access to page data of a web page applet.

In some optional implementations of this embodiment, the anti-crawler operations in the preset set of anti-crawler operations include at least one of the following: a terminal feature identification operation, a token identification operation, a human-machine feature identification operation, a data analysis identification operation, and a signature identification operation.

It should be understood that units 501 to 504 of the apparatus 500 for identifying crawlers correspond respectively to the steps of the method described with reference to FIG. 2. The operations and features described above for the method for identifying crawlers therefore apply equally to the apparatus 500 and the units it contains, and are not repeated here.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program system.

FIG. 6 shows a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the disclosure described and/or claimed herein.

As shown in FIG. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. The RAM 603 can also store various programs and data required for the operation of the device 600. The computing unit 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Multiple components of the device 600 are connected to the I/O interface 605, including: an input unit 606, such as a keyboard or a mouse; an output unit 607, such as various types of displays and speakers; a storage unit 608, such as a magnetic disk or an optical disc; and a communication unit 609, such as a network card, a modem, or a wireless communication transceiver. The communication unit 609 allows the device 600 to exchange information and data with other devices through a computer network such as the Internet and/or various telecommunication networks.

The computing unit 601 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Examples of the computing unit 601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 601 performs the methods and processing described above, such as the method for identifying crawlers. For example, in some embodiments, the method for identifying crawlers may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the method for identifying crawlers described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the method for identifying crawlers in any other suitable manner (for example, by means of firmware).

Various implementations of the systems and techniques described above can be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor, and may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions and operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described herein can be implemented on a computer that has a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user. For example, the feedback provided to the user can be any form of sensory feedback (for example, visual, auditory, or tactile feedback), and input from the user can be received in any form (including acoustic, voice, or tactile input).

The systems and techniques described herein can be implemented in a computing system that includes a back-end component (for example, a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.

A computer system can include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed herein can be achieved; no limitation is imposed here.

The specific embodiments above do not limit the scope of protection of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within its scope of protection.

Claims (21)

1. A method for identifying crawlers, comprising:
acquiring request information that requests access to page data, wherein the page data is page data corresponding to a web page applet, and the request information is used to request access to the page data of the web page applet;
determining, according to a preset crawler identification order, a target anti-crawler operation for the request information from a preset set of anti-crawler operations, wherein sorting the anti-crawler operations in the preset set according to the preset crawler identification order yields, in sequence: a terminal feature identification operation, a token identification operation, a human-machine feature identification operation, a data analysis identification operation, and a signature synchronization identification operation, and wherein the preset crawler identification order is determined by the level of the crawler scenario, from low to high;
performing crawler identification on the request information based on the target anti-crawler operation to obtain an identification result; and
in response to determining that the identification result indicates that the request information is a crawler, determining the identification result as a target crawler identification result.

2. The method of claim 1, further comprising:
in response to determining that the identification result indicates that the request information is not a crawler and that the preset set of anti-crawler operations has not been fully traversed, re-determining, according to the preset crawler identification order, the target anti-crawler operation for the request information from the preset set of anti-crawler operations.

3. The method of claim 1, further comprising:
in response to determining that the identification result indicates that the request information is not a crawler and that the preset set of anti-crawler operations has been fully traversed, determining the identification result as the target crawler identification result.

4. The method of claim 1, wherein performing crawler identification on the request information based on the target anti-crawler operation to obtain an identification result comprises:
in response to determining that the target anti-crawler operation is the token identification operation, determining token index information corresponding to the request information;
determining a target character based on the token index information; and
in response to determining that the target character does not match a preset character, determining that the identification result is that the request information is a crawler.

5. The method of claim 4, further comprising:
determining a target applet identifier and a target timestamp corresponding to the request information; and
in response to determining that the target applet identifier does not match a preset applet identifier, or that the target timestamp has expired, determining that the identification result is that the request information is a crawler.

6. The method of claim 1, further comprising:
determining an encrypted network address corresponding to the request information;
determining a first encryption index and a second encryption index in the encrypted network address;
decrypting the encrypted network address based on the first encryption index and the second encryption index to obtain a decrypted network address; and
performing network access based on the decrypted network address.

7. The method of claim 1, wherein performing crawler identification on the request information based on the target anti-crawler operation to obtain an identification result comprises:
in response to determining that the target anti-crawler operation is the data analysis identification operation, acquiring crawler analysis data; and
performing crawler identification on the request information based on the crawler analysis data to obtain the identification result corresponding to the request information.

8. The method of claim 1, wherein performing crawler identification on the request information based on the target anti-crawler operation to obtain an identification result comprises:
in response to determining that the target anti-crawler operation is the signature synchronization identification operation, determining signature information in the request information; and
obtaining the identification result based on the signature information and preset signature information.

9. The method of claim 1, further comprising:
determining, based on the identification result, crawler score information corresponding to the request information; and
outputting the crawler score information.

10. An apparatus for identifying crawlers, comprising:
an information acquisition unit configured to acquire request information that requests access to page data, wherein the page data is page data corresponding to a web page applet, and the request information is used to request access to the page data of the web page applet;
an operation determination unit configured to determine, according to a preset crawler identification order, a target anti-crawler operation for the request information from a preset set of anti-crawler operations, wherein sorting the anti-crawler operations in the preset set according to the preset crawler identification order yields, in sequence: a terminal feature identification operation, a token identification operation, a human-machine feature identification operation, a data analysis identification operation, and a signature synchronization identification operation, and wherein the preset crawler identification order is determined by the level of the crawler scenario, from low to high;
a crawler identification unit configured to perform crawler identification on the request information based on the target anti-crawler operation to obtain an identification result; and
a result determination unit configured to, in response to determining that the identification result indicates that the request information is a crawler, determine the identification result as a target crawler identification result.

11. The apparatus of claim 10, wherein the operation determination unit is further configured to:
in response to determining that the identification result indicates that the request information is not a crawler and that the preset set of anti-crawler operations has not been fully traversed, re-determine, according to the preset crawler identification order, the target anti-crawler operation for the request information from the preset set of anti-crawler operations.

12. The apparatus of claim 10, wherein the result determination unit is further configured to:
in response to determining that the identification result indicates that the request information is not a crawler and that the preset set of anti-crawler operations has been fully traversed, determine the identification result as the target crawler identification result.

13. The apparatus of claim 10, wherein the crawler identification unit is further configured to:
in response to determining that the target anti-crawler operation is the token identification operation, determine token index information corresponding to the request information;
determine a target character based on the token index information; and
in response to determining that the target character does not match a preset character, determine that the identification result is that the request information is a crawler.

14. The apparatus of claim 13, wherein the crawler identification unit is further configured to:
determine a target applet identifier and a target timestamp corresponding to the request information; and
in response to determining that the target applet identifier does not match a preset applet identifier, or that the target timestamp has expired, determine that the identification result is that the request information is a crawler.

15. The apparatus of claim 10, further comprising:
a network access unit configured to: determine an encrypted network address corresponding to the request information; determine a first encryption index and a second encryption index in the encrypted network address; decrypt the encrypted network address based on the first encryption index and the second encryption index to obtain a decrypted network address; and perform network access based on the decrypted network address.

16. The apparatus of claim 10, wherein the crawler identification unit is further configured to:
in response to determining that the target anti-crawler operation is the data analysis identification operation, acquire crawler analysis data; and
perform crawler identification on the request information based on the crawler analysis data to obtain the identification result corresponding to the request information.

17. The apparatus of claim 10, wherein the crawler identification unit is further configured to:
in response to determining that the target anti-crawler operation is the signature synchronization identification operation, determine signature information in the request information; and
obtain the identification result based on the signature information and preset signature information.

18. The apparatus of claim 10, further comprising:
a score output unit configured to determine, based on the identification result, crawler score information corresponding to the request information, and to output the crawler score information.

19. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor, wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 9.

20. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method of any one of claims 1 to 9.

21. A computer program system, comprising a computer program that, when executed by a processor, implements the method of any one of claims 1 to 9.
CN202111316197.6A, priority date 2021-11-08, filing date 2021-11-08: Method, apparatus, device, medium, and system for identifying crawlers (Active, CN114036364B (en))

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111316197.6A | 2021-11-08 | 2021-11-08 | Method, apparatus, device, medium, and system for identifying crawlers

Publications (2)

Publication Number | Publication Date
CN114036364A (en) | 2022-02-11
CN114036364B (en) | 2022-10-21

Family ID: 80136842

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202111316197.6A | Active | CN114036364B (en) | 2021-11-08 | 2021-11-08 | Method, apparatus, device, medium, and system for identifying crawlers

Country Status (1)

Country | Link
CN (1) | CN114036364B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN114640538A (en)*2022-04-012022-06-17北京明略昭辉科技有限公司Crawler program detection method and device, readable medium and electronic equipment
CN115098757A (en)*2022-06-272022-09-23平安银行股份有限公司 A network crawler identification method, device, system and equipment
CN115329291A (en)*2022-08-082022-11-11广州鑫景信息科技服务有限公司Anti-crawler method, system, computer equipment and storage medium
CN116015938A (en)*2022-12-302023-04-25数字广东网络建设有限公司Anti-crawler method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN106790105A (en)*2016-12-262017-05-31携程旅游网络技术(上海)有限公司Reptile identification hold-up interception method and system based on business datum
CN110858229A (en)*2018-08-232020-03-03阿里巴巴集团控股有限公司Data processing method, device, access control system and storage medium
CN111611462A (en)*2020-04-092020-09-01北京歌华有线电视网络股份有限公司 A kind of APP data acquisition method and system
CN112417240A (en)*2020-02-212021-02-26上海哔哩哔哩科技有限公司Website link detection method and device and computer equipment

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
GB2391346A (en)*2002-07-312004-02-04Hewlett Packard CoOn-line recognition of robots
CN103164446A (en)*2011-12-142013-06-19阿里巴巴集团控股有限公司Webpage request information response method and webpage request information response device
CN103279516B (en)*2013-05-272016-09-14百度在线网络技术(北京)有限公司Web spider identification method
CN105812366B (en)*2016-03-142019-09-24携程计算机技术(上海)有限公司Server, anti-crawler system and anti-crawler verification method
CN107092660A (en)*2017-03-282017-08-25成都优易数据有限公司A kind of Website server reptile recognition methods and device
CN108777687B (en)*2018-06-052020-04-14掌阅科技股份有限公司Crawler intercepting method based on user behavior portrait, electronic equipment and storage medium
CN112073412A (en)*2020-09-082020-12-11北京天融信网络安全技术有限公司Anti-crawler method, device, processor and computer readable medium
CN112688919A (en)*2020-12-112021-04-20杭州安恒信息技术股份有限公司APP interface-based crawler-resisting method, device and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Anonymous crawler detection based on website access behavior (基于网站访问行为的匿名爬虫检测); 邹建鑫 et al.; 《基于网站访问行为的匿名爬虫检测》; 2017-12-31; Vol. 27, No. 12; pp. 103-107, 114 *

Also Published As

Publication number | Publication date
CN114036364A (en) | 2022-02-11

Similar Documents

Publication | Publication Date | Title
CN114036364B (en) | | Method, apparatus, device, medium, and system for identifying crawlers
KR102429406B1 (en) | | Check user interactions on the content platform
AU2021204543B2 (en) | | Digital signature method, signature information verification method, related apparatus and electronic device
CN109241484B (en) | | Method and equipment for sending webpage data based on encryption technology
CN104796257A (en) | | Flexible data authentication
CN108848058A (en) | | Smart contract processing method and blockchain system
CN109743161B (en) | | Information encryption method, electronic device and computer readable medium
CN114363088B (en) | | Method and device for requesting data
CN114500054A (en) | | Service access method, service access device, electronic device, and storage medium
CN119961890B (en) | | Model fingerprint embedding and model copyright authentication method, device and medium
CN115238310A (en) | | Data encryption and decryption method, device, equipment and storage medium
CN115580489B (en) | | Data transmission method, device, equipment and storage medium
CN114884714B (en) | | Task processing method, device, equipment and storage medium
CN113794706A (en) | | Data processing method, apparatus, electronic device and readable storage medium
CN112565156B (en) | | Information registration method, device and system
US10013539B1 (en) | | Rapid device identification among multiple users
CN113609156B (en) | | Data query and write method and device, electronic equipment and readable storage medium
CN115694902A (en) | | Flash-sale request method and flash-sale verification method, device, system and medium
CN115484080A (en) | | Data processing method, device and equipment for mini program, and storage medium
CN110990822B (en) | | Verification code generation and verification method, system, electronic device and storage medium
CN110740112B (en) | | Authentication method, apparatus and computer readable storage medium
CN114117388A (en) | | Device registration method, device registration device, electronic device, and storage medium
CN115879122A (en) | | Open platform management method, device, equipment and storage medium
CN120470632B (en) | | A method, system, device and storage medium for verifying sensitive data in a database
CN111294326B (en) | | Method, apparatus, device and medium for confirming system data security

Legal Events

Date | Code | Title | Description
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |
