CN111786937B


Info

Publication number: CN111786937B
Application number: CN202010045431.5A
Authority: CN
Inventors: 李川; 程钰; 胡玉麒; 李明程
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2020-01-16
Filing date: 2020-01-16
Publication date: 2022-09-06
Anticipated expiration: 2040-01-16
Also published as: CN111786937A

Abstract

Description

Method, apparatus, electronic device and readable medium for identifying malicious requests

Technical Field

Embodiments of the present disclosure relate to the field of computer technology, specifically to internet data processing, and more particularly to a method and an apparatus for identifying malicious requests.

Background

The internet links platforms and users. A user sends a request to a platform over the internet and receives the platform's response the same way. However, as internet content and modes of interaction have diversified, malicious requests that acquire content and resources through abnormal means have emerged. For example, some "black and gray market" operators issue malicious requests in bulk. These requests not only affect the processing of normal requests but also occupy a large amount of network resources, increasing the load on the platform's servers.

Malicious requests are commonly identified with rule-based methods. Rules over statistical indicators are derived from the high-frequency aggregation behavior of requests along dimensions such as IP address, mobile phone number, and device number, or from combined aggregation behavior across several dimensions. For example, one rule of a risk control system is to intercept requests from a mobile number segment when the number of requests from that segment exceeds N within a time period T.
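The threshold rule described above can be sketched as a sliding-window counter. This is only an illustrative sketch: the names `make_rate_rule` and `should_intercept`, the use of seconds for T, and the assumption that timestamps arrive in non-decreasing order are not taken from the disclosure.

```python
from collections import defaultdict, deque


def make_rate_rule(max_requests, window_seconds):
    """Build a checker that flags a key once it exceeds max_requests (N)
    within the last window_seconds (T). Timestamps are assumed to arrive
    in non-decreasing order."""
    recent = defaultdict(deque)  # key (e.g. a number segment) -> timestamps in window

    def should_intercept(key, timestamp):
        times = recent[key]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while times and timestamp - times[0] >= window_seconds:
            times.popleft()
        return len(times) > max_requests

    return should_intercept


# Hypothetical number segment "1380013", N = 3, T = 60 seconds.
rule = make_rate_rule(max_requests=3, window_seconds=60)
decisions = [rule("1380013", t) for t in (0, 10, 20, 30, 100)]
```

A rule of this kind only fires when many requests share the same key, which is why spreading requests across many keys weakens it.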

More and more internet techniques are now used to generate malicious requests in batches, such as IP proxies, mobile phone verification code platforms, and device information tampering. These techniques alter the information carried by a request, reduce the aggregation of requests in each dimension, and thereby weaken the identification capability of existing malicious request identification methods.

Disclosure of Invention

Embodiments of the present disclosure propose methods and apparatuses, electronic devices, and computer-readable media for identifying malicious requests.

In some embodiments, determining the association request of the request to be identified based on the request data of the request to be identified and the acquired request data of historical requests includes: creating a node representing the request to be identified in a request structure graph constructed from the request data of the historical requests, wherein nodes in the request structure graph represent requests and an edge represents an association relationship between the requests corresponding to the two nodes it connects; updating the request structure graph based on the association relationship between the request data of the request to be identified and the request data of the historical requests represented by the nodes in the request structure graph; and taking the requests corresponding to the associated nodes of the request to be identified in the updated request structure graph as the association requests of the request to be identified.

In some embodiments, updating the request structure graph based on the association relationship between the request data of the request to be identified and the request data of the historical requests represented by the nodes in the request structure graph includes: selecting, as first matching nodes, the nodes in the request structure graph for which the request data of the represented historical request and the request data of the request to be identified satisfy a preset matching condition; and creating, in the request structure graph, an edge connecting the node corresponding to the request to be identified to each first matching node.

In some embodiments, before the requests corresponding to the associated nodes of the request to be identified in the updated request structure graph are taken as the association requests of the request to be identified, determining the association request of the request to be identified based on the request data of the request to be identified and the acquired request data of historical requests further includes: determining pairs of requests to be identified whose request data satisfy the preset matching condition; and creating, in the request structure graph, an edge connecting each determined pair of requests to be identified, so as to update the request structure graph.

In some embodiments, taking the requests corresponding to the associated nodes of the request to be identified in the updated request structure graph as the association requests of the request to be identified includes: determining, as an associated node of the request to be identified, at least one node in the request structure graph that is connected to the node corresponding to the request to be identified through no more than a preset number of edges.
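Selecting nodes reachable through no more than a preset number of edges can be sketched as a bounded breadth-first search. The adjacency-dictionary representation and the names `associated_nodes` and `max_hops` are assumptions made for illustration.

```python
from collections import deque


def associated_nodes(adjacency, start, max_hops):
    """Return the nodes reachable from start through at most max_hops edges.
    The start node itself is excluded from the result."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {node for node in distance if node != start}


# Hypothetical graph: "new" is the request to be identified.
graph = {
    "new": ["r1", "r2"],
    "r1": ["new", "r3"],
    "r2": ["new"],
    "r3": ["r1", "r4"],
    "r4": ["r3"],
}
one_hop = associated_nodes(graph, "new", 1)
two_hop = associated_nodes(graph, "new", 2)
```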

In some embodiments, determining the request features of the request to be identified based on the request data of the request to be identified and the request data of its association requests includes: using a pre-trained graph embedding neural network to feature-encode the request data corresponding to the associated nodes of the request to be identified, obtaining the feature encodings of those associated nodes; and determining the request features of the request to be identified based on the feature encodings of its associated nodes.

In some embodiments, determining the request features of the request to be identified based on the feature encodings of its associated nodes includes: selecting the maximum value among the feature encodings of the associated nodes of the request to be identified as the request feature of the request to be identified.
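One way to read "selecting the maximum value among the feature encodings" is as an element-wise maximum (max pooling) over the neighbors' encoding vectors. That reading is an assumption, as are the function name `max_pool` and the sample vectors below.

```python
def max_pool(encodings):
    """Element-wise maximum over a list of equal-length feature vectors."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    return [max(values) for values in zip(*encodings)]


# Hypothetical encodings of three associated nodes.
neighbor_encodings = [
    [0.1, 0.9, -0.3],
    [0.4, 0.2, 0.0],
    [0.3, 0.5, -0.7],
]
request_feature = max_pool(neighbor_encodings)
```

Max pooling is independent of the order and number of neighbors, so the resulting feature has a fixed length regardless of how many associated nodes a request has.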

In some embodiments, the method further includes: acquiring malicious request identification results of historical requests; constructing sample request data from the malicious request identification results and the request data of the historical requests; and training a graph embedding neural network and a classifier based on the sample request data. Training the graph embedding neural network and the classifier based on the sample request data includes: acquiring a request structure graph constructed from the request data of the historical requests; for a target node in the request structure graph, determining the associated nodes of the target node, feature-encoding the associated nodes of the target node using the graph embedding neural network to be trained, and determining the request features corresponding to the target node based on the feature encodings of its associated nodes; using the classifier to be trained to produce, from the request features corresponding to the target node, a prediction of whether the historical request corresponding to the target node is a malicious request; and determining, from the acquired malicious request identification results of the historical requests, the prediction error of the classifier to be trained for the historical request corresponding to the target node, and iteratively adjusting the parameters of the graph embedding neural network to be trained and of the classifier based on the prediction error.

In some embodiments, the method further includes: intercepting the request to be identified in response to determining that it is a malicious request.

In some embodiments, the determining unit is configured to determine the association request of the request to be identified, based on the request data of the request to be identified and the acquired request data of historical requests, as follows: creating a node representing the request to be identified in a request structure graph constructed from the request data of the historical requests, wherein nodes in the request structure graph represent requests and an edge represents an association relationship between the requests corresponding to the two nodes it connects; updating the request structure graph based on the association relationship between the request data of the request to be identified and the request data of the historical requests represented by the nodes in the request structure graph; and taking the requests corresponding to the associated nodes of the request to be identified in the updated request structure graph as the association requests of the request to be identified.

In some embodiments, the determining unit is configured to update the request structure graph as follows: selecting, as first matching nodes, the nodes in the request structure graph for which the request data of the represented historical request and the request data of the request to be identified satisfy a preset matching condition; and creating, in the request structure graph, an edge connecting the node corresponding to the request to be identified to each first matching node.

In some embodiments, the determining unit is further configured to update the request structure graph as follows: determining pairs of requests to be identified whose request data satisfy the preset matching condition; and creating, in the request structure graph, an edge connecting each determined pair of requests to be identified, so as to update the request structure graph.

In some embodiments, the determining unit is configured to determine the association request of the request to be identified as follows: determining, as an associated node of the request to be identified, at least one node in the request structure graph that is connected to the node corresponding to the request to be identified through no more than a preset number of edges.

In some embodiments, the determining unit is configured to determine the request features of the request to be identified as follows: using a pre-trained graph embedding neural network to feature-encode the request data corresponding to the associated nodes of the request to be identified, obtaining the feature encodings of those associated nodes; and determining the request features of the request to be identified based on the feature encodings of its associated nodes.

In some embodiments, the determining unit is configured to determine the request features of the request to be identified as follows: selecting the maximum value among the feature encodings of the associated nodes of the request to be identified as the request feature of the request to be identified.

In some embodiments, the apparatus further includes a training unit configured to: acquire malicious request identification results of historical requests; construct sample request data from the malicious request identification results and the request data of the historical requests; and train a graph embedding neural network and a classifier based on the sample request data. The training unit is configured to train the graph embedding neural network and the classifier as follows: acquiring a request structure graph constructed from the request data of the historical requests; for a target node in the request structure graph, determining the associated nodes of the target node, feature-encoding the associated nodes of the target node using the graph embedding neural network to be trained, and determining the request features corresponding to the target node based on the feature encodings of its associated nodes; using the classifier to be trained to produce, from the request features corresponding to the target node, a prediction of whether the historical request corresponding to the target node is a malicious request; and determining, from the acquired malicious request identification results of the historical requests, the prediction error of the classifier to be trained for the historical request corresponding to the target node, and iteratively adjusting the parameters of the graph embedding neural network to be trained and of the classifier based on the prediction error.

In some embodiments, the apparatus further includes an interception unit configured to intercept the request to be identified in response to determining that it is a malicious request.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; storage means for storing one or more programs which, when executed by one or more processors, cause the one or more processors to carry out the method for identifying malicious requests as provided in the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for identifying malicious requests provided by the first aspect.

Drawings

Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which embodiments of the present disclosure may be applied;

FIG. 2 is a flow diagram of one embodiment of a method for identifying malicious requests, according to the present disclosure;

FIG. 3 is a flow diagram of another embodiment of a method for identifying malicious requests in accordance with the present disclosure;

FIG. 4 is a schematic diagram of a request structure graph in the method for identifying malicious requests of the present disclosure;

FIG. 5 is a block diagram illustrating an embodiment of an apparatus for identifying malicious requests according to the present disclosure;

FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present disclosure.

Detailed Description

The present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It is to be understood that the specific embodiments described herein merely illustrate the relevant invention and do not limit it. It should also be noted that, for convenience of description, only the parts relevant to the invention are shown in the drawings.

It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

FIG. 1 illustrates an example system architecture 100 to which the method for identifying malicious requests or the apparatus for identifying malicious requests of the present disclosure may be applied.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.

The terminal devices 101, 102, 103 interact with the server 105 via the network 104 to receive or send messages and the like. The terminal devices 101, 102, 103 may be user-side devices on which various internet applications may be installed, for example shopping platform applications, social applications, and audio and video playback applications.

The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, and desktop computers. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.

The server 105 may be a server running various services, for example a server providing background support for applications running on the terminal devices 101, 102, 103. The server 105 may receive a request sent by the terminal devices 101, 102, 103, process the request to generate response data, and feed the response data back to the terminal devices 101, 102, 103.

In the application scenario of the present disclosure, after receiving a request sent by the terminal devices 101, 102, 103, the server 105 may determine whether the request is a malicious request according to the features of the request. The server 105 may also perform an interception operation according to the determination result, such as sending security verification information to the requesting terminal device 101, 102, 103.

It should be noted that the method for identifying malicious requests provided by the embodiments of the present disclosure is generally performed by the server 105, and accordingly, the apparatus for identifying malicious requests is generally disposed in the server 105.

In some scenarios, the server 105 may retrieve the request to be identified from a database, memory, or another device. In that case, the example system architecture 100 may omit the terminal devices 101, 102, 103 and the network 104.

The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. No specific limitation is imposed here.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method for identifying malicious requests in accordance with the present disclosure is shown. The method for identifying malicious requests includes the following steps:

Step 201, obtaining request data of a request to be identified.

In this embodiment, the execution subject of the method for identifying malicious requests may obtain request data of at least one request to be identified. The request to be identified may be a request received from a user side in real time, or a request received and recorded during a historical time period.

As an example, a user may send a resource obtaining request to an internet platform, and a background server of the internet platform may obtain the resource obtaining request sent by the user as a request to be identified. Or, the internet platform may use each received resource obtaining request as the request to be identified.

The request data may include an identification of the request (e.g., a request ID), the IP address of the request, and the request time. In some scenarios, the request data may also include an identification of the requesting party, such as the mobile phone number of the requesting user or a user account identifier. The execution subject may obtain the request data by parsing the request to be identified, or may obtain already parsed request data of the request to be identified from a database or another device.

Step 202, determining an association request of the request to be identified based on the request data of the request to be identified and the acquired request data of historical requests, and determining the request features of the request to be identified based on the request data of the request to be identified and the request data of its association request.

The historical request may be a request over a historical period of time. Request data of the history request may be stored in the storage medium of the execution body. The execution subject may obtain the request data of the history request by reading the memory. The above-mentioned history time period may be set in advance, and may be a time period before the time when the request to be identified is received.

Request data for the historical requests may include an identification of the historical requests, an IP address of the historical requests, a request time of the historical requests, and so forth.

In this embodiment, association analysis may be performed on the request data of the request to be identified and the request data of the historical requests to determine the association request of the request to be identified. Specifically, the degree of association between requests may be determined according to the similarity or degree of correlation between their request data. For example, the similarity between the request data of two requests may be calculated from at least one of the request identification, the IP address, the request time, and the like, and requests whose similarity exceeds a preset threshold may be determined to be association requests of each other.
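The disclosure does not fix a particular similarity measure. As a minimal sketch, assuming similarity is the fraction of a few hand-picked checks on which two requests agree, the thresholding step could look as follows. The field names `ip` and `time`, the one-hour window, and the threshold of 0.5 are all hypothetical.

```python
def similarity(a, b, time_window=3600):
    """Fraction of the checks below on which two requests agree
    (an assumed measure, chosen only for illustration)."""
    checks = [
        a["ip"] == b["ip"],                                 # identical IP
        a["ip"].split(".")[:3] == b["ip"].split(".")[:3],   # same first three segments
        abs(a["time"] - b["time"]) <= time_window,          # close in time
    ]
    return sum(checks) / len(checks)


def association_requests(pending, history, threshold=0.5):
    """Return the ids of historical requests whose similarity exceeds the threshold."""
    return [h["id"] for h in history if similarity(pending, h) > threshold]


pending = {"id": "new", "ip": "10.0.0.5", "time": 1000}
history = [
    {"id": "h1", "ip": "10.0.0.5", "time": 1200},
    {"id": "h2", "ip": "10.0.0.9", "time": 2000},
    {"id": "h3", "ip": "192.168.1.1", "time": 900},
]
related = association_requests(pending, history)
```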

Alternatively, features may be extracted from the request data of the request to be identified and from the request data of each historical request, and whether the request to be identified and a historical request are association requests of each other may be determined according to the similarity or degree of correlation between their request data features.

In some embodiments, the request data of a plurality of requests to be identified may be acquired simultaneously, and then correlation analysis may be performed on different requests to be identified based on the request data of the requests to be identified, so as to determine the mutually correlated requests to be identified. That is, for one to-be-identified request, the association request may be determined from the history request, or the association request may be determined from other to-be-identified requests.

After the association request of the request to be identified is determined, the request data of the request to be identified and the request data of its association request may be combined to determine the request features of the request to be identified. Specifically, features may be extracted from the request to be identified and from its association requests, and the extracted features may be fused, for example by concatenation or weighted summation, to serve as the request features of the request to be identified.
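The two fusion options mentioned above, concatenation and weighted summation, can be sketched on plain lists. The function names, the sample vectors, and the weights are illustrative assumptions; the disclosure does not specify how weights are chosen.

```python
def concatenate(own, neighbor_features):
    """Fuse by concatenation: the request's own features followed by each neighbor's."""
    fused = list(own)
    for features in neighbor_features:
        fused.extend(features)
    return fused


def weighted_sum(vectors, weights):
    """Fuse equal-length vectors by a weighted sum, one weight per vector."""
    if len(vectors) != len(weights):
        raise ValueError("one weight per vector is required")
    size = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(size)]


own = [1.0, 0.0]
neighbors = [[0.0, 2.0], [2.0, 2.0]]
by_concat = concatenate(own, neighbors)
by_sum = weighted_sum([own] + neighbors, [0.5, 0.25, 0.25])
```

Concatenation keeps every feature but produces a vector whose length depends on the number of association requests, while a weighted sum always yields a fixed-length vector.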

Step 203, identifying whether the request to be identified is a malicious request by using a trained classifier, based on the request features of the request to be identified.

The request features of the request to be identified obtained in step 202 may be input to a classifier that has been trained on the malicious/non-malicious request classification task, and the classifier may output a determination of whether the request to be identified is a malicious request.

The classifier may be a classifier from an existing deep learning model for identifying malicious requests. It may compute the probability that the request is a malicious or a non-malicious request by applying a nonlinear transformation to the features of the request to be identified, and then determine whether the request to be identified is a malicious request according to whether the probability exceeds a preset confidence threshold.
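As a minimal sketch of the decision step, assume the nonlinear transformation is a sigmoid applied to a linear score. This stands in for whatever trained classifier is actually used; the weights, bias, and threshold below are placeholder values, not trained parameters.

```python
import math


def malicious_probability(features, weights, bias):
    """Sigmoid of a linear score: an assumed stand-in for the trained classifier."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def is_malicious(features, weights, bias, confidence_threshold=0.5):
    """Flag the request when the probability exceeds the confidence threshold."""
    return malicious_probability(features, weights, bias) > confidence_threshold


# Placeholder parameters for illustration only.
weights, bias = [2.0, 1.0], -1.0
flagged = is_malicious([1.0, 1.0], weights, bias)
passed = is_malicious([0.0, 0.0], weights, bias)
```

Raising `confidence_threshold` trades fewer false interceptions for more missed malicious requests.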

It should be noted that a malicious request in the present disclosure refers to a request that obtains internet resources through improper means, and may include, but is not limited to, requests generated by altering request information with various internet techniques, such as IP proxies, mobile phone verification code platforms, and device information tampering.

According to the method for identifying malicious requests in the embodiments of the present disclosure, the association request of the request to be identified is determined from the association between the request data of the request to be identified and the request data of historical requests, and the request features of the request to be identified are determined from the request data of the request to be identified and of its association request. The features of association requests are thereby aggregated, which improves the ability to identify malicious requests that show only weak aggregation.

With continued reference to FIG. 3, a flow diagram of another embodiment of the method for identifying malicious requests of the present disclosure is shown. As shown in FIG. 3, a flow 300 of the method for identifying malicious requests of the present embodiment includes the following steps:

Step 301, obtaining request data of the request to be identified.

The execution subject may, in response to receiving a request, treat the received request as a request to be identified and obtain its request data, including a request identifier, the IP address of the request, the request time, and the like. In some scenarios, the request data may also include an identification of the requesting party, such as the mobile phone number of the requesting user or a user account identifier.

The execution subject may also extract, from the acquired requests, a request that has not yet been determined to be malicious or not, and obtain the request data of the request to be identified by parsing it.

Step 302, creating a node representing the request to be identified in a request structure graph constructed from the acquired request data of historical requests.

In this embodiment, the execution subject of the method for identifying malicious requests may construct a request structure graph based on the request data of historical requests. The request structure graph includes nodes and edges connecting the nodes. The nodes in the request structure graph represent requests, and an edge represents an association relationship between the requests corresponding to the two nodes it connects.

Specifically, when the request structure graph is constructed, nodes representing the acquired historical requests may be created first, and edges between the nodes may then be created according to whether the request data of the corresponding historical requests satisfy a preset association relationship. For example, when it is determined that the request data of the historical request represented by node 1 and the request data of the historical request represented by node 2 satisfy the preset association relationship, an edge connecting node 1 and node 2 is created in the request structure graph.

The preset association relationship may be a relationship between at least one item of predefined request data. Optionally, the constructed edges may be respectively marked according to different preset association relationships. As an example, the following association relationship may be used to establish the edge relationship between nodes:

if the request IPs of the two requests are the same and the two requests are initiated within the same preset time period (e.g., within the same hour), an edge e1 is added between the two requests;

if the first three segments of the request IPs of the two requests are the same and the two requests are initiated within the same preset time period (e.g., within the same 10 minutes), an edge e2 is added between the two requests;

if the mobile phone numbers of the requesters of the two requests are the same and the two requests are initiated within the same preset time period (e.g., within the same hour), an edge e3 is added between the two requests;

if the first seven digits of the mobile phone numbers of the requesters of the two requests are the same and the two requests are initiated within the same preset time period (e.g., within the same 10 minutes), an edge e4 is added between the two requests.

FIG. 4 is a schematic diagram of a request structure graph in the method for identifying malicious requests according to the present disclosure. As shown in FIG. 4, the request nodes representing request 1, request 2, request 3, …, and request n are connected by edges e1, e2, e3, and e4, which indicate the type of association between the connected request nodes.

It should be noted that the preset association relationship is not limited to the association relationship listed in the above example, and the association relationship condition that needs to be satisfied when an edge is created between two nodes may be set according to actual needs.
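The four example edge rules e1 to e4 can be sketched as follows. This sketch assumes that "the same preset time period" means the two timestamps fall into the same fixed-length bucket, that times are given in seconds, and that each request is a dictionary with hypothetical fields `id`, `ip`, `phone`, and `time`.

```python
from itertools import combinations


def same_period(t1, t2, seconds):
    """True when both timestamps fall into the same fixed-length period."""
    return t1 // seconds == t2 // seconds


def edge_types(a, b):
    """Return the labels of the example rules e1..e4 that the pair satisfies."""
    hour = same_period(a["time"], b["time"], 3600)
    ten_minutes = same_period(a["time"], b["time"], 600)
    same_prefix = a["ip"].split(".")[:3] == b["ip"].split(".")[:3]
    edges = []
    if a["ip"] == b["ip"] and hour:
        edges.append("e1")
    if same_prefix and ten_minutes:
        edges.append("e2")
    if a["phone"] == b["phone"] and hour:
        edges.append("e3")
    if a["phone"][:7] == b["phone"][:7] and ten_minutes:
        edges.append("e4")
    return edges


def build_edges(requests):
    """Map each associated pair of request ids to its list of edge labels."""
    edges = {}
    for a, b in combinations(requests, 2):
        labels = edge_types(a, b)
        if labels:
            edges[(a["id"], b["id"])] = labels
    return edges


requests = [
    {"id": "a", "ip": "10.0.0.5", "phone": "13800138000", "time": 100},
    {"id": "b", "ip": "10.0.0.9", "phone": "13800139999", "time": 500},
    {"id": "c", "ip": "10.0.0.5", "phone": "13800138000", "time": 1900},
]
edges = build_edges(requests)
```

Comparing every pair is quadratic in the number of requests; a practical system would likely index requests by IP, IP prefix, and phone number first, though the disclosure does not say how.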

Optionally, the request data of a historical request may further include feature data obtained by performing feature extraction on the historical request. When the request structure graph is constructed, if the feature data of two requests satisfy a preset association relationship, an edge may also be added between the two corresponding nodes.

Optionally, the request data of a historical request may further include a tag representing a historical identification result of whether the historical request is a malicious request. The historical identification result may be obtained by a rule-based method or by a trained malicious request identification model. When the request structure graph is constructed, malicious requests, non-malicious requests, and historical requests not known to be malicious or not can be distinguished by configuring attributes of the nodes (such as color).

Optionally, the nodes in the request structure graph may be configured with corresponding risk tag values to indicate whether the corresponding historical request is a malicious request. For example, the risk tag value of a non-malicious request is 0, the risk tag value of a historical request not known to be malicious or not is -1, and the risk tag value of a malicious request is 1.

In practice, if the existing malicious request identification method is adopted to determine that the probability value of a certain historical request as a malicious request is lower than a preset threshold, the historical request is not determined as the malicious request. In this embodiment, the identification result of the existing malicious request identification method may be modified according to the following formula (1):

wherein y represents the corrected malicious identification result label, y = 1 indicates that the corrected identification result is a malicious request, and y = 0 indicates that the corrected identification result is a non-malicious request; y' represents the malicious identification result label before correction, y' = 1 indicates that the request is a malicious request, and y' = 0 indicates that the request is a non-malicious request; n is the number of requests, or a statistical value of the request frequency, in a single request data dimension, such as the number of requests from the same request IP, the number of requests from the same mobile phone number, or the number of requests whose request IPs share the same first three segments; and thr is a preset threshold.
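The correction step can be illustrated with a short sketch. Formula (1) itself is not reproduced in this text, so the sketch assumes one plausible reading of the symbol definitions: a request stays malicious if the existing method already flagged it, and is additionally relabeled as malicious when the single-dimension count n exceeds thr. The comparison direction and the function name `correct_label` are assumptions, not part of the disclosure.

```python
def correct_label(y_prev: int, n: int, thr: int) -> int:
    """Return the corrected label y (1 = malicious, 0 = non-malicious).

    Assumed reading of formula (1): y = 1 if y' = 1 or n > thr, otherwise 0.
    y_prev is the label y' produced by the existing identification method,
    n is the request count in one data dimension, thr is the preset threshold.
    """
    if y_prev == 1 or n > thr:
        return 1
    return 0
```

Under this reading, a request that the existing method scored below its probability threshold can still be labeled malicious when, for example, its request IP has issued more than thr requests.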

Step 303, updating the request structure diagram based on the association relationship between the request data of the request to be identified and the request data of the history request characterized by the nodes in the request structure diagram.

In this embodiment, after the to-be-identified request is acquired, a newly-built node representing the to-be-identified request may be added to the request structure diagram. The association relationship between the request data of the request to be identified and the request data of each history request represented by each node in the request structure chart can be analyzed, and if the request data of a certain history request and the request data of the request to be identified meet the preset association relationship, an edge is created between the node representing the history request and a newly-built node representing the request to be identified. Further, the created edge may also be marked according to the type of the association relationship between the two (e.g., the type that the request time is close and the request IPs are the same, the type that the request IPs are similar and the request time is close, etc.).

In some embodiments, the request structure diagram may be updated as follows: from the nodes of the request structure diagram, screening out the nodes whose represented history request data and the request data of the request to be identified satisfy a preset matching condition, taking these nodes as first matching nodes, and creating, in the request structure diagram, edges connecting the node corresponding to the request to be identified with the first matching nodes.

For each request to be identified, it may be determined whether the request data of the history request represented by each node in the pre-constructed request structure diagram and the request data of the request to be identified satisfy a preset matching condition, where the preset matching condition may be, for example: the request IPs are the same and the request interval does not exceed 1 hour, the first 3 segments of the request IPs are the same and the request time interval does not exceed 10 minutes, the mobile phone numbers of the requesting parties are the same and the request interval does not exceed 1 hour, the first 7 digits of the mobile phone numbers of the requesting parties are the same and the request interval does not exceed 10 minutes, and the like. The preset matching condition may be that the matching degree calculated based on the request data reaches a preset matching degree threshold.

If the request data of the history request and the request data of the request to be identified meet the preset matching condition, the node corresponding to the history request can be used as a first matching node of the request to be identified, and an edge connecting the first matching node and the node corresponding to the request to be identified is added in the request structure chart.

By connecting the node corresponding to the request to be identified with the existing node in the request structure diagram based on the matching relationship between the request data of the request to be identified and the request data of the historical request, the request structure diagram can be quickly updated, and the historical request associated with the request to be identified can be determined.
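The update described above can be sketched as follows. This is a minimal illustration that uses the four example matching conditions listed earlier (same IP within 1 hour, same first three IP segments within 10 minutes, same phone number within 1 hour, same first 7 phone digits within 10 minutes). The `Request` fields, the edge-type names, and the adjacency-dictionary layout are hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    req_id: str
    ip: str       # dotted IPv4 string, e.g. "10.0.0.1" (assumed format)
    phone: str    # requester's mobile number as a digit string
    ts: float     # request time in seconds


def match_type(a: Request, b: Request):
    """Return the name of the first matching condition a and b satisfy, or None."""
    gap = abs(a.ts - b.ts)
    if a.ip == b.ip and gap <= 3600:
        return "same_ip"
    if a.ip.split(".")[:3] == b.ip.split(".")[:3] and gap <= 600:
        return "same_ip_prefix"
    if a.phone == b.phone and gap <= 3600:
        return "same_phone"
    if a.phone[:7] == b.phone[:7] and gap <= 600:
        return "same_phone_prefix"
    return None


def add_request(graph: dict, nodes: dict, new: Request) -> None:
    """Create a node for `new` and connect it to every existing matching node.

    graph maps node id -> {neighbour id: edge type}; nodes maps id -> Request.
    """
    graph[new.req_id] = {}
    for other in nodes.values():
        kind = match_type(new, other)
        if kind is not None:
            # Undirected edge, marked with the association type.
            graph[new.req_id][other.req_id] = kind
            graph[other.req_id][new.req_id] = kind
    nodes[new.req_id] = new
```

Each matched node plays the role of a first matching node, and the stored edge type corresponds to the optional marking of edges by association type.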

In a further embodiment, after the request structure diagram is updated based on the request data of the history requests, the request structure diagram is further updated as follows: determining pairs of requests to be identified whose request data satisfy the preset matching condition, and creating, in the request structure diagram, an edge connecting the two requests of each determined pair.

Specifically, if the number of the acquired requests to be identified is greater than 1, the association relationship between different requests to be identified may also be analyzed, and if it is determined that the request data of two requests to be identified satisfies the preset matching condition, nodes corresponding to the two requests to be identified may be connected by edges in the request structure diagram.

In this way, the updated request structure diagram not only includes the association between the request to be identified and the historical request, but also includes the association between different requests to be identified, which is helpful to further enhance the aggregation between the associated requests.

Step 304, taking the request corresponding to the association node associated with the request to be identified in the updated request structure diagram as the association request of the request to be identified.

In this embodiment, an associated node associated with a node corresponding to the request to be identified may be found according to the updated request structure diagram, and a request corresponding to the associated node is used as an associated request of the request to be identified. Here, the association relationship between two nodes in the request structure diagram may be determined according to whether the two nodes are directly or indirectly connected or according to a distance between the two nodes, and specifically, it may be assumed that lengths of all edges are the same, or a corresponding distance weight is configured according to an association relationship type corresponding to different edges and a distance between the two nodes is calculated according to the distance weight. For example, two nodes connected by one edge are associated nodes, or two nodes with a distance smaller than a preset distance threshold are associated nodes.
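The distance-based variant can be sketched with Dijkstra's algorithm, assuming each association type has been given a positive distance weight. The function name, the weight values in the usage note, and the adjacency-dictionary layout are illustrative assumptions.

```python
import heapq


def nodes_within_distance(graph, start, weights, max_dist):
    """Return {node: distance} for nodes closer to `start` than `max_dist`.

    graph:   {node: {neighbour: edge_type}}
    weights: {edge_type: positive distance weight}
    Uses Dijkstra's algorithm and discards paths that reach max_dist.
    """
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, kind in graph.get(u, {}).items():
            nd = d + weights[kind]
            if nd < max_dist and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    dist.pop(start)
    return dist
```

Setting every weight to 1 recovers the case in which all edges are assumed to have the same length.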

In some optional implementation manners of this embodiment, at least one node in the request structure diagram, to which a node corresponding to the request to be identified is connected through edges of which the number is not greater than a preset number, may be determined as an associated node associated with the request to be identified, and then it is determined that the request corresponding to the associated node associated with the request to be identified is an associated request of the request to be identified.

As an example, if the preset number is 2, a node directly connected by an edge to the node corresponding to the request to be identified may be determined from the request structure diagram as a first-layer node, and a node connected by an edge to a first-layer node may be determined as a second-layer node. The first-layer nodes and the second-layer nodes are then the associated nodes of the node corresponding to the request to be identified.
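A breadth-first search is one straightforward way to collect the nodes reachable within the preset number of edges. The sketch below is illustrative; the function name and graph layout are assumptions.

```python
from collections import deque


def associated_nodes(graph, start, max_hops):
    """Return {node: hop count} for nodes reachable from `start` in 1..max_hops edges.

    graph maps each node to an iterable of its neighbours.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if hops[u] == max_hops:
            continue  # do not expand beyond the preset number of edges
        for v in graph.get(u, ()):
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    del hops[start]
    return hops
```

With `max_hops=2`, nodes with hop count 1 are the first-layer nodes and nodes with hop count 2 are the second-layer nodes.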

Step 305, determining the request characteristics of the request to be identified based on the request data of the request to be identified and the request data of the association request of the request to be identified.

The request data of the request to be identified and the request data of the association request of the request to be identified can be subjected to feature extraction, and the extracted features are fused to obtain the request features of the request to be identified. In the feature fusion, the feature of the request data of each association request may be weighted according to the distance between the corresponding nodes.

In some optional implementation manners of this embodiment, a pre-trained graph embedding neural network may be used to perform feature encoding on the request data corresponding to the associated nodes associated with the request to be identified, so as to obtain the feature codes of the associated nodes of the request to be identified, and the request feature of the request to be identified is then determined based on the feature codes of these associated nodes.

The graph-embedded neural network may propagate the characteristics of each node to neighboring nodes according to the edge relationships in the request graph structure. The characteristics propagated by the neighbor nodes can be aggregated at each node, and the updating of the characteristic information of the node is completed. Through the information propagation and feature aggregation processes of k times (k is a positive integer) of edge relation cascade connection, the graph embedding neural network can complete the aggregation coding of the features of each node.

Here, if node A and node B are connected by one edge in the request structure diagram, node B is a first-level cascade node of node A. If node B and node C are connected by one edge in the request structure diagram and no edge exists between node A and node C, node C is a second-level cascade node of node A, and so on. The graph embedding neural network can thus aggregate, for each node, the features of all nodes cascaded within k levels.

The graph embedding neural network applies a nonlinear transformation to the feature codes of the k-th layer nodes (namely the nodes of the k-level cascade) according to formula (2), whose terms are defined as follows:

h^k ∈ R^(N×d^k) represents the features of the k-th level nodes, N being the number of k-th level associated nodes among all the associated nodes and d^k being the feature dimension of the k-th layer; for the input layer, the feature h is extracted based on the existing malicious identification model.

W^k ∈ R^(d^k×d^(k+1)) represents the parameters that map the k-th layer features from dimension d^k to dimension d^(k+1).

activation is the nonlinear activation function of the k-th layer; ReLU may be used as the activation function.

The propagation term represents the feature information being transmitted by a node to its associated nodes.

The aggregation term represents aggregating the information transmitted by all the k-level cascaded nodes to obtain the feature information of the node.
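The propagate-then-aggregate behaviour can be illustrated with a small pure-Python sketch. The aggregation rule is an assumption: the sketch averages a node's own feature with those of its neighbours, whereas the disclosure's formula (2) may use a different normalization. The function names and the weight matrices are likewise hypothetical.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def propagate(features, neighbors, weight):
    """One round of propagation and aggregation.

    features:  {node: list of d_in floats}
    neighbors: {node: iterable of adjacent nodes}
    weight:    d_in x d_out matrix given as a list of rows
    """
    d_in, d_out = len(weight), len(weight[0])
    updated = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in neighbors.get(node, ())]
        # Aggregate: average the node's feature with its neighbours' features
        # (assumed rule; not confirmed by the source).
        mean = [sum(vec[i] for vec in group) / len(group) for i in range(d_in)]
        # Transform: linear map from d_in to d_out followed by ReLU.
        updated[node] = [
            relu(sum(mean[i] * weight[i][j] for i in range(d_in)))
            for j in range(d_out)
        ]
    return updated


def encode(features, neighbors, weights):
    """Apply one round per weight matrix, so k rounds reach nodes k edges away."""
    for w in weights:
        features = propagate(features, neighbors, w)
    return features
```

After k rounds, each node's code depends on the features of every node within k levels of cascade, which is the property the text relies on.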

Alternatively, when determining the request feature of the request to be identified by using the graph-embedded neural network, the maximum value among feature codes of each associated node of the request to be identified may be selected as the feature of the request to be identified. That is, the characteristics of the request to be identified are determined using the following equation (3):

wherein the maximum is taken over the set consisting of all the associated nodes of the node i corresponding to the request to be identified, together with node i itself.

The larger the value obtained by the nonlinear transformation of the features of the request data corresponding to a node, the greater the probability that the request corresponding to the node is a malicious request. Taking the maximum value of the nonlinearly transformed feature codes over all associated nodes can further improve the aggregation of features of request data in the same dimension.
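The maximum selection can be sketched as follows, assuming the maximum is taken element-wise over the feature codes of node i and its associated nodes. The element-wise interpretation and the function name are assumptions.

```python
def max_pool_feature(codes, node, associated):
    """Element-wise maximum over the codes of `node` and its associated nodes.

    codes maps each node to a feature code (list of floats of equal length).
    """
    members = [node, *associated]
    dim = len(codes[node])
    return [max(codes[m][i] for m in members) for i in range(dim)]
```

For example, with codes `[0.1, 0.9]`, `[0.7, 0.2]` and `[0.3, 0.4]`, the pooled request feature is `[0.7, 0.9]`.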

Step 306, based on the request characteristics of the request to be identified, identifying whether the request to be identified is a malicious request by using the trained classifier.

In this embodiment, the request features of the request to be identified obtained in step 305 may be input into a trained classifier for classification.

The trained classifier may be the output layer of the graph embedding model. Assuming that the node feature after the final aggregation layer is h ∈ R^d, the output layer calculates the probability that the request to be identified is a malicious request as follows:

o = sigmoid(W_o h) (4)

wherein W_o ∈ R^d denotes the parameters of the output layer.

The identification result of whether the request to be identified is a malicious request can then be determined from this probability.
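Formula (4) can be written out directly. In the sketch below, the decision threshold of 0.5 is an assumption, since the text only says the result is determined according to the probability; the function names are illustrative.

```python
import math


def malicious_probability(w_o, h):
    """Compute o = sigmoid(W_o h) for equal-length vectors w_o and h."""
    z = sum(w * x for w, x in zip(w_o, h))
    # Numerically stable sigmoid for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def is_malicious(w_o, h, threshold=0.5):
    """Assumed decision rule: malicious when the probability reaches `threshold`."""
    return malicious_probability(w_o, h) >= threshold
```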

In this embodiment, the association relationships among different requests are described by constructing a request graph structure, and the features of the nodes are then extracted by the graph embedding neural network, so that information can be propagated between associated nodes. This improves the expressive power of the request data features in scenarios with weak association and weak aggregation, and thereby improves the ability to identify weakly aggregated malicious requests.

In some optional implementations of the foregoing embodiment, the flow of the method for identifying a malicious request may further include: acquiring a malicious request identification result of a history request; constructing sample request data according to the malicious request identification result of the history request and the request data of the history request; and training the graph embedding neural network and the classifier based on the sample request data.

The malicious request identification result of the history request may be obtained by an existing malicious request identification model or according to a predefined rule. It should be noted that some of the collected history requests may have no malicious request identification result tag, so this embodiment may also train the graph embedding neural network and the classifier by a semi-supervised method.

In particular, the method of training the graph embedding neural network and the classifier based on the sample request data may include performing a training operation over a plurality of iterations. The training operation comprises: first, acquiring a request structure diagram constructed based on the request data of history requests; then, for a target node in the request structure diagram, determining the associated nodes of the target node, encoding the features of the associated nodes of the target node with the graph embedding neural network to be trained, and determining the request feature corresponding to the target node based on the feature codes of its associated nodes; then, based on the request feature corresponding to the target node, using the classifier to be trained to produce a prediction result of whether the history request corresponding to the target node is a malicious request; and finally, determining, according to the acquired malicious request identification result of the history request, the prediction error of the classifier to be trained for the history request corresponding to the target node, and iteratively adjusting the parameters of the graph embedding neural network to be trained and of the classifier based on the prediction error.

In the training operation, for the target node in the request structure diagram, the method described in steps 303 and 304 may be used to determine the associated nodes, and the method described in step 305 may then be used to determine the request feature of the target node. The request is then classified by the classifier to be trained based on the request feature of the target node to obtain a current malicious request identification result. A loss function can be constructed that represents the difference between the identification results produced for the sample request data by the graph embedding neural network and classifier currently being trained and the pre-acquired identification results of the sample request data. For example, the loss function loss may be constructed using binary cross entropy:

loss = binary_cross_entropy(Y[mask], O[mask]) (5)

wherein mask denotes the sample request data participating in the error calculation, and Y[mask] and O[mask] respectively denote the pre-acquired identification results of that sample request data and the identification results produced for it by the graph embedding neural network and classifier currently being trained. Only history requests for which a malicious request identification result has been acquired in advance are used to compute the loss.

Then, the parameters of the graph embedding neural network to be trained and of the classifier to be trained are adjusted by error backpropagation.

Although history requests for which no malicious request identification result has been acquired in advance do not participate in calculating the loss function, they do participate in constructing the request graph structure and in the transfer of features between requests. On the one hand, this reduces the dependence of the training process on labeling information for positive and negative samples; on the other hand, it improves the expressive power of the graph embedding neural network and the classifier.
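The masked loss of formula (5) can be sketched in plain Python. The sketch reuses the risk tag convention described earlier (1 malicious, 0 non-malicious, -1 unknown) to build the mask; averaging over the masked samples and clamping probabilities with `eps` are implementation assumptions.

```python
import math


def masked_bce(labels, probs, eps=1e-12):
    """Mean binary cross-entropy over labelled nodes only.

    labels: 1 = malicious, 0 = non-malicious, -1 = unknown (masked out)
    probs:  predicted probability of being malicious for each node
    """
    mask = [i for i, y in enumerate(labels) if y in (0, 1)]
    if not mask:
        raise ValueError("no labelled samples to compute the loss on")
    total = 0.0
    for i in mask:
        # Clamp to avoid log(0) for saturated predictions.
        p = min(max(probs[i], eps), 1.0 - eps)
        total -= labels[i] * math.log(p) + (1 - labels[i]) * math.log(1.0 - p)
    return total / len(mask)
```

Changing the prediction for an unlabelled node leaves the loss unchanged, although that node's features still influence its neighbours through the graph.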

In some optional implementations of the embodiments described above in connection with fig. 2 and 3, the method for identifying malicious requests may further include: and in response to the fact that the request to be identified is determined to be a malicious request, intercepting the request to be identified. The specific interception processing mode may be to add a verification process, such as sending a short message verification code, voice verification, face recognition, or may further improve security by combining the above methods. Therefore, malicious requests can be effectively intercepted, and the safety protection performance is improved for the malicious requests with weak aggregation.

Referring to fig. 5, as an implementation of the method for identifying a malicious request, the present disclosure provides an embodiment of an apparatus for identifying a malicious request, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2 and fig. 3, and the apparatus may be specifically applied to various electronic devices.

As shown in Fig. 5, the apparatus 500 for identifying a malicious request according to the present embodiment includes an obtaining unit 501, a determining unit 502, and an identifying unit 503. The obtaining unit 501 is configured to obtain request data of a request to be identified; the determining unit 502 is configured to determine an association request of the request to be identified based on the request data of the request to be identified and the request data of the acquired history request, and determine a request characteristic of the request to be identified based on the request data of the request to be identified and the request data of the association request of the request to be identified; the identifying unit 503 is configured to identify whether the request to be identified is a malicious request by using a trained classifier based on the request characteristics of the request to be identified.

In some embodiments, the determining unit 502 is configured to determine the association request of the request to be identified, based on the request data of the request to be identified and the acquired request data of the history request, as follows: creating a node representing the request to be identified in a request structure diagram constructed based on request data of history requests, wherein the nodes in the request structure diagram represent requests and the edges in the request structure diagram represent the association relationship between the requests corresponding to the two connected nodes; updating the request structure diagram based on the association relationship between the request data of the request to be identified and the request data of the history requests represented by the nodes in the request structure diagram; and taking the request corresponding to an associated node associated with the request to be identified in the updated request structure diagram as the association request of the request to be identified.

In some embodiments, the determining unit 502 is configured to update the request structure diagram as follows: from the nodes of the request structure diagram, screening out the nodes whose represented history request data and the request data of the request to be identified satisfy a preset matching condition, and taking these nodes as first matching nodes; and creating, in the request structure diagram, an edge connecting the node corresponding to the request to be identified with each first matching node.

In some embodiments, the determining unit 502 is further configured to update the request structure diagram as follows: determining pairs of requests to be identified whose request data satisfy the preset matching condition; and creating, in the request structure diagram, an edge connecting the two requests of each determined pair so as to update the request structure diagram.

In some embodiments, the determining unit 502 is configured to determine the association request of the request to be identified as follows: determining at least one node in the request structure diagram that is connected to the node corresponding to the request to be identified through no more than a preset number of edges as an associated node associated with the request to be identified.

In some embodiments, the determining unit 502 is configured to determine the request characteristics of the request to be identified as follows: using a pre-trained graph embedding neural network to perform feature encoding on the request data corresponding to the associated nodes associated with the request to be identified, so as to obtain the feature codes of the associated nodes of the request to be identified; and determining the request characteristics of the request to be identified based on the feature codes of the associated nodes of the request to be identified.

In some embodiments, the determining unit 502 is configured to determine the request characteristics of the request to be identified as follows: selecting the maximum value among the feature codes of the associated nodes of the request to be identified as the request feature of the request to be identified.

In some embodiments, the apparatus further comprises a training unit configured to: acquire a malicious request identification result of a history request; construct sample request data according to the malicious request identification result of the history request and the request data of the history request; and train the graph embedding neural network and the classifier based on the sample request data. The training unit is configured to train the graph embedding neural network and the classifier as follows: acquiring a request structure diagram constructed based on request data of history requests; for a target node in the request structure diagram, determining the associated nodes of the target node, encoding the features of the associated nodes of the target node with the graph embedding neural network to be trained, and determining the request feature corresponding to the target node based on the feature codes of its associated nodes; based on the request feature corresponding to the target node, using the classifier to be trained to produce a prediction result of whether the history request corresponding to the target node is a malicious request; and determining, according to the acquired malicious request identification result of the history request, the prediction error of the classifier to be trained for the history request corresponding to the target node, and iteratively adjusting the parameters of the graph embedding neural network to be trained and of the classifier based on the prediction error.

In some embodiments, the above apparatus further comprises an interception unit configured to intercept the request to be identified in response to determining that the request to be identified is a malicious request.

The units in the apparatus 500 described above correspond to the steps in the method described with reference to Fig. 2 and Fig. 3. Thus, the operations, features and technical effects described above for the method for identifying a malicious request also apply to the apparatus 500 and the units included therein, and are not described herein again.

Referring now to FIG. 6, a block diagram of an electronic device (e.g., the server shown in FIG. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in Fig. 6, the electronic device 600 may include a processing device (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage device 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, or the like; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; a storage device 608 including, for example, a hard disk; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While Fig. 6 illustrates an electronic device 600 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in Fig. 6 may represent one device or may represent multiple devices as desired.

In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring request data of a request to be identified; determining an association request of the request to be identified based on the request data of the request to be identified and the acquired request data of the historical request, and determining the request characteristic of the request to be identified based on the request data of the request to be identified and the request data of the association request of the request to be identified; and based on the request characteristics of the request to be identified, adopting a trained classifier to identify whether the request to be identified is a malicious request.

Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, a determination unit, and an identification unit. In some cases the names of these units do not constitute a limitation on the units themselves; for example, the acquisition unit may also be described as a "unit that acquires request data of a request to be identified".

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims

1. A method for identifying malicious requests, comprising:

acquiring request data of a request to be identified, wherein the request data comprises a request identifier;

determining an association request of the request to be identified based on the request data of the request to be identified and the acquired request data of the historical requests, comprising: determining the association request of the request to be identified based on the similarity between the request data characteristic of the request to be identified and the request data characteristic of the acquired historical requests; and determining the request characteristics of the request to be identified based on the request data of the request to be identified and the request data of the association request of the request to be identified;

and identifying whether the request to be identified is a malicious request or not by adopting a trained classifier based on the request characteristics of the request to be identified.

2. The method of claim 1, wherein the determining the association request of the request to be identified based on the request data of the request to be identified and the request data of the obtained historical requests comprises:

creating a node for representing the request to be identified in a request structure diagram constructed based on request data of the historical requests, wherein the node in the request structure diagram represents the request, and an edge in the request structure diagram represents the association relationship between the requests corresponding to the two connected nodes;

updating the request structure diagram based on the association relationship between the request data of the request to be identified and the request data of the historical requests characterized by the nodes in the request structure diagram;

and taking the request corresponding to the association node associated with the request to be identified in the updated request structure diagram as the association request of the request to be identified.

3. The method of claim 2, wherein the updating the request structure diagram based on the association between the request data of the request to be identified and the request data of the historical requests characterized by the nodes in the request structure diagram comprises:

screening out, from the nodes of the request structure diagram, nodes for which the request data of the represented historical request and the request data of the request to be identified meet preset matching conditions, as first matching nodes;

and creating, in the request structure diagram, an edge connecting the node corresponding to the request to be identified and the first matching node.

4. The method of claim 3, wherein, before taking the request corresponding to the association node associated with the request to be identified in the updated request structure diagram as the association request of the request to be identified, the determining the association request of the request to be identified based on the request data of the request to be identified and the acquired request data of the historical requests further comprises:

determining pairs of requests to be identified whose corresponding request data meet the preset matching conditions;

and creating, in the request structure diagram, an edge connecting each determined pair of requests to be identified, so as to update the request structure diagram.

5. The method according to claim 3 or 4, wherein the taking the request corresponding to the association node associated with the request to be identified in the updated request structure diagram as the association request of the request to be identified comprises:

and determining at least one node in the request structure diagram that is connected to the node corresponding to the request to be identified through no more than a preset number of edges as an association node associated with the request to be identified.

6. The method of claim 5, wherein the determining the request characteristics of the request to be identified based on the request data of the request to be identified and the request data of the association request of the request to be identified comprises:

performing, by using a pre-trained graph-embedded neural network, feature coding on the request data corresponding to the associated node associated with the request to be identified, to obtain the feature code of the associated node of the request to be identified;

and determining the request characteristics of the request to be identified based on the characteristic codes of the associated nodes of the request to be identified.

7. The method of claim 6, wherein the determining the request characteristics of the request to be identified based on the characteristic codes of the associated nodes of the request to be identified comprises:

and selecting the maximum value in the feature codes of all the associated nodes of the request to be identified as the request feature of the request to be identified.

8. The method of claim 6 or 7, wherein the method further comprises:

acquiring a malicious request identification result of the historical request; and

constructing sample request data according to a malicious request identification result of the historical request and the request data of the historical request;

training the graph-embedded neural network and the classifier based on the sample request data;

wherein said training said graph-embedded neural network and said classifier based on said sample request data comprises:

acquiring a request structure chart constructed based on request data of the historical request;

for a target node in the request structure diagram, determining an associated node of the target node, coding the characteristics of the associated node of the target node based on the graph-embedded neural network to be trained, and determining a request characteristic corresponding to the target node based on the feature code of the associated node of the target node;

determining, by using the classifier to be trained and based on the request characteristic corresponding to the target node, a prediction result of whether the historical request corresponding to the target node is a malicious request;

and determining, according to the obtained malicious request identification result of the historical request, a prediction error of the classifier to be trained as to whether the historical request corresponding to the target node is a malicious request, and iteratively adjusting parameters of the graph-embedded neural network to be trained and the classifier to be trained based on the prediction error.

9. The method of claim 1, wherein the method further comprises:

and in response to the fact that the request to be identified is determined to be a malicious request, intercepting the request to be identified.

10. An apparatus for identifying malicious requests, comprising:

an acquisition unit configured to acquire request data of a request to be identified, wherein the request data comprises an identifier of the request;

a determining unit configured to determine an association request of the request to be identified based on the request data of the request to be identified and the request data of the acquired history request, including: determining an association request of the request to be identified based on the similarity between the request data characteristic of the request to be identified and the request data characteristic of the acquired historical request, and determining the request characteristic of the request to be identified based on the request data of the request to be identified and the request data of the association request of the request to be identified;

and an identification unit configured to identify, by using a trained classifier and based on the request characteristics of the request to be identified, whether the request to be identified is a malicious request.

11. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.

12. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-9.
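For illustration only, the graph steps recited in claims 2 to 7 (creating a node, connecting it to matching nodes, collecting nodes within a preset number of edges, and taking a maximum over feature codes) might be sketched in Python as follows. The `RequestGraph` class, the "any shared field is equal" matching condition, the made-up feature codes, and the element-wise reading of "maximum value" are all assumptions of this sketch. The feature codes would in practice come from a pre-trained graph-embedded neural network, which is not modelled here.

```python
from collections import deque
from typing import Dict, List, Set


class RequestGraph:
    """Nodes represent requests; edges represent association relationships."""

    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, str]] = {}  # node id -> request data
        self.adj: Dict[str, Set[str]] = {}         # node id -> neighbour ids

    @staticmethod
    def _matches(a: Dict[str, str], b: Dict[str, str]) -> bool:
        # Assumed matching condition: some shared field has the same value.
        return any(a[k] == b[k] for k in a.keys() & b.keys())

    def add_request(self, request_id: str, data: Dict[str, str]) -> None:
        """Create a node and connect it to every existing matching node."""
        matches = [n for n, d in self.data.items() if self._matches(data, d)]
        self.data[request_id] = data
        self.adj[request_id] = set()
        for other in matches:
            self.adj[request_id].add(other)
            self.adj[other].add(request_id)

    def associated_nodes(self, request_id: str, max_hops: int) -> List[str]:
        """Breadth-first search for nodes reachable within max_hops edges."""
        seen = {request_id}
        queue = deque([(request_id, 0)])
        found: List[str] = []
        while queue:
            node, dist = queue.popleft()
            if dist == max_hops:
                continue  # do not expand beyond the preset number of edges
            for nxt in sorted(self.adj[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    found.append(nxt)
                    queue.append((nxt, dist + 1))
        return found


def max_pool(codes: List[List[float]]) -> List[float]:
    """Element-wise maximum over the feature codes of the associated nodes."""
    return [max(column) for column in zip(*codes)]


graph = RequestGraph()
graph.add_request("h1", {"ip": "10.0.0.1", "device": "devA"})
graph.add_request("h2", {"ip": "10.0.0.2", "device": "devA"})  # matches h1 (device)
graph.add_request("h3", {"ip": "10.0.0.3", "device": "devC"})  # matches nothing
graph.add_request("r1", {"ip": "10.0.0.2", "device": "devX"})  # matches h2 (ip)

neighbours = graph.associated_nodes("r1", max_hops=2)

# Made-up feature codes standing in for the output of a trained encoder.
codes = {"h1": [0.2, 0.7], "h2": [0.5, 0.1]}
request_feature = max_pool([codes[n] for n in neighbours])
```

With a preset number of two edges, `r1` reaches `h2` directly and `h1` through `h2`, and the pooled feature takes the larger value in each position of their codes.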