Detailed Description
It should be noted that, provided there is no conflict, the embodiments of the present application and the features of the embodiments may be combined with each other. The present application is described in detail below with reference to the drawings and in connection with the embodiments.
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the application herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As described in the background, delay faults of the service components of a cloud platform are difficult to detect accurately in the prior art. To solve this problem, an exemplary embodiment of the present application provides a method and an apparatus for determining a delay abnormality.
According to an embodiment of the present application, a method for determining a delay abnormality is provided.
Fig. 1 is a flowchart of a method for determining a delay abnormality according to an embodiment of the present application. The determining method is applied to a cloud platform that comprises a plurality of service components. As shown in Fig. 1, the determining method comprises the following steps:
Step S101, based on the running log information of each service component, constructing the execution path of each user request across the target service components to obtain target path information, wherein one user request corresponds to one piece of target path information, and one piece of target path information includes at least two target service components;
Step S102, classifying the user requests based on the target path information of each user request to obtain a plurality of request category groups, and calculating, based at least on each piece of target path information in each request category group, the compensation response time of each target service component when the corresponding user request is executed, wherein one request category group includes a plurality of user requests;
Step S103, determining whether a delay fault occurs in the corresponding target service component based at least on the compensation response times of each target service component in all the request category groups.
In the above method for determining a delay abnormality, first, the execution path of each user request across the target service components is constructed based on the running log information of each service component, yielding the target path information of each user request. Next, the user requests are classified based on their target path information to obtain a plurality of request category groups, and the compensation response time of each target service component when executing the corresponding user request is calculated for each request category group. Finally, whether a delay fault occurs in the corresponding target service component is determined based at least on the compensation response times of each target service component in all the request category groups. In the prior art, whether a delay fault occurs in the cloud platform is determined only from the log information of the service components or from the resource utilization of the cloud platform. In the present scheme, by contrast, the determination is made based at least on the compensation response times of each target service component in all the request category groups, so that it can be determined whether a delay fault occurs in the cloud platform and the delay fault can be determined accurately. This solves the prior-art problem that delay faults of the service components of a cloud platform are difficult to detect accurately, and in turn helps ensure high fault handling efficiency and high operational reliability of the cloud platform.
In practice, for a cloud platform (i.e. a cloud computing service platform), a user inputs user requests at different UI service ports, and from the user's point of view the cloud platform simply returns the result. In fact, a user request input through a UI service port is executed in sequence by a plurality of target service components in the cloud platform. When a user request is input, the running log information of each service component in the cloud platform records the timestamps at which execution of the request starts and ends, the ID information of the user request being executed, and the source component (i.e. the upstream component) of the user request. The path along which the target service components execute the user request can therefore be constructed from the running log information of the service components on the cloud platform. In a specific embodiment of the present application, assuming that the ID information of a user request is 00001, its target path information may be constructed as the directed graph shown in Fig. 2. That is, the user request is input from UI service port A (i.e. the UI service port 100) and passes through target service component B (i.e. the first target service component 101), target service component C (i.e. the second target service component 102) and target service component D (i.e. the third target service component 103), and its target path information is A->B, B->C and B->D.
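By way of illustration only, the path construction described above can be sketched as follows. The record layout (`LogRecord` and its field names) and the function `build_paths` are hypothetical and are not taken from the application; the sketch merely assumes that each log record carries the request ID, the executing component, its source component and the two timestamps.

```python
from collections import defaultdict
from typing import NamedTuple


class LogRecord(NamedTuple):
    """Hypothetical shape of one running-log entry of a service component."""
    request_id: str  # ID information of the user request being executed
    component: str   # service component that wrote the record
    source: str      # source (upstream) component that issued the call
    start: float     # timestamp at which execution of the request started
    end: float       # timestamp at which execution of the request ended


def build_paths(records):
    """Return, per request ID, the sorted directed edges (source, component)."""
    edges = defaultdict(set)
    for rec in records:
        edges[rec.request_id].add((rec.source, rec.component))
    return {request_id: sorted(e) for request_id, e in edges.items()}


# Log entries matching the Fig. 2 example (timestamps are made up).
records = [
    LogRecord("00001", "B", "A", 0.0, 10.0),
    LogRecord("00001", "C", "B", 1.0, 4.0),
    LogRecord("00001", "D", "B", 5.0, 9.0),
]
paths = build_paths(records)
print(paths["00001"])  # [('A', 'B'), ('B', 'C'), ('B', 'D')]
```

Each edge corresponds to one call from a source component to the component that wrote the log record, which is the information needed to draw the directed graph of Fig. 2.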
In a specific embodiment of the present application, the user requests are classified based on the target path information of each user request to obtain a plurality of request category groups, so that all user requests in one request category group pass through the same target service components. This in turn ensures that whether a delay fault occurs in the corresponding target service component can be determined accurately based on the compensation response time of each target service component.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
In order to classify the user requests more simply, in one embodiment of the present application, classifying the user requests based on the target path information of each user request to obtain a plurality of request category groups includes: dividing the target path information of each user request to obtain a plurality of path primitives, wherein each path primitive is the path formed by two adjacent target service components on the target path information, and one piece of target path information corresponds to at least one path primitive; and placing the user requests that have the same path primitives in one category to obtain the plurality of request category groups.
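The classification by path primitives can be illustrated with the following minimal sketch. The function name `classify_requests` and the sample request IDs are hypothetical; the sketch assumes that each request's target path information is already available as a collection of (upstream, downstream) pairs and that two requests belong together exactly when their sets of path primitives are equal.

```python
from collections import defaultdict


def classify_requests(paths):
    """Group request IDs by their set of path primitives.

    paths maps a request ID to an iterable of (upstream, downstream)
    pairs; the order in which the primitives are listed does not matter.
    """
    groups = defaultdict(list)
    for request_id, primitives in paths.items():
        groups[frozenset(primitives)].append(request_id)
    return dict(groups)


example_paths = {
    "00001": [("A", "B"), ("B", "C"), ("B", "D")],
    "00002": [("A", "B"), ("B", "C")],
    "00003": [("B", "D"), ("A", "B"), ("B", "C")],
}
groups = classify_requests(example_paths)
for primitives, request_ids in groups.items():
    print(sorted(primitives), request_ids)
```

Requests 00001 and 00003 end up in the same request category group because they consist of the same three path primitives, while request 00002 forms a group of its own.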
In a specific embodiment of the present application, before the target path information corresponding to each user request is divided into path primitives, the target path information may further be graded to obtain graded target path information. The grading may proceed as follows. When a user inputs a user request at the UI service port, the user request passes through each target service component on the execution path, so the first target service component, which is directly connected to the UI service port, obtains the user request directly. The second target service component, which follows the first, executes the user request only upon a call command sent by the first target service component, so the calls to the target service components have a notion of level. The target service components can therefore be graded according to their positions on the execution path of the corresponding user request. Specifically, a user request corresponds to one execution path over the target service components, and the individual unidirectional links on that execution path can be found by following the transmission direction of the user request. The levels of the target service components are assigned independently on each unidirectional link.
Thus, for a unidirectional link starting from a UI service port, the level of each target service component is defined as follows: the UI service port is the level-0 target service component; the target service component directly connected to the UI service port is the primary target service component; a target service component that connects to the UI service port through the primary target service component is a secondary target service component; a target service component that connects to the UI service port through the primary and secondary target service components is a tertiary target service component; and so on.
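The search for unidirectional links and the per-link level assignment can be sketched as follows. This is only an illustration under the assumption that the execution path is an acyclic directed graph rooted at the UI service port; the function names `unidirectional_links` and `levels_on_link` are hypothetical.

```python
def unidirectional_links(edges, root):
    """Enumerate every root-to-leaf chain of an acyclic directed call path."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
    links = []

    def walk(node, chain):
        chain = chain + [node]
        if node not in children:  # leaf: one complete unidirectional link
            links.append(chain)
            return
        for child in children[node]:
            walk(child, chain)

    walk(root, [])
    return links


def levels_on_link(link):
    """Assign level 0 to the UI service port, 1 to the next component, etc."""
    return {component: level for level, component in enumerate(link)}


links = unidirectional_links([("A", "B"), ("B", "C"), ("B", "D")], "A")
print(links)                     # [['A', 'B', 'C'], ['A', 'B', 'D']]
print(levels_on_link(links[0]))  # {'A': 0, 'B': 1, 'C': 2}
```

Because the levels are computed separately for each link, a component shared by several links (B in this example) is graded independently on each of them.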
In practice, user requests correspond one-to-one with target path information, and user requests with different target path information do not necessarily belong to the same category. The target path information corresponding to each user request is therefore divided into path primitives; specifically, it may be divided into the smallest path primitives. Taking the target path information shown in Fig. 2 as an example, it may be divided into three path primitives, namely A->B, B->C and B->D. If the path primitives of another user request have the same composition as those of this user request, the two user requests can be placed in the same request category group. In this way, each user request corresponds to one request category group, and each target service component in a user request corresponds to one level on the transmission path of that user request.
In practice, for some user requests, as shown in Fig. 3, the target path information has only one unidirectional link, A->B, B->C, that is, from the UI service port 100 to the first target service component 101 and then to the second target service component 102. In order to calculate the compensation response time of each target service component more simply in such a case, in a further embodiment of the present application, the target path information further includes timestamp information of each target service component being called, the timestamp information including a start call time and a stop call time, and calculating the compensation response time of each target service component when the corresponding user request is executed based at least on each piece of target path information in each request category group includes: calculating the compensation response time of each target service component when the corresponding user request is executed using $t_j(i) = |Tb_j(i) - Te_j(i)| - |Tb_{j+1}(i) - Te_{j+1}(i)|$, wherein $t_j(i)$ is the compensation response time of the j-th target service component in the target path information corresponding to the i-th user request, $Tb_j(i)$ is the start call time of the j-th target service component in the target path information corresponding to the i-th user request, $Te_j(i)$ is the stop call time of the j-th target service component in the target path information corresponding to the i-th user request, $Tb_{j+1}(i)$ is the start call time of the (j+1)-th target service component in the target path information corresponding to the i-th user request, $Te_{j+1}(i)$ is the stop call time of the (j+1)-th target service component in the target path information corresponding to the i-th user request, i is less than or equal to the total number of user requests in one request category group, and j is less than or equal to the total number of target service components in one piece of target path information.
In the actual application process, the value of i starts from 1 up to the total number of all user requests in a request class group. The value of j starts from 1 up to the total number of target service components in one target path information.
In a specific embodiment of the present application, as shown in fig. 3, on the target path information corresponding to the user request, the compensation response time for the target service component B (the first target service component 101) is the difference between the first response time and the second response time, where the first response time is the absolute value of the difference between the start call time and the stop call time of the target service component B, and the second response time is the absolute value of the difference between the start call time and the stop call time of the target service component C (i.e. the second target service component 102). Of course, the compensation response time to the target service component C may be only the absolute value of the difference between the start call time and the stop call time of the target service component C.
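As a worked illustration of the single-link case just described, the following minimal sketch computes the compensation response times for a chain B->C. The function name `compensation_times` and the timestamps are made up for the example; the last component on the link has no downstream component, so its compensation response time is simply the absolute difference between its own start and stop call times.

```python
def compensation_times(chain_calls):
    """Compensation response time of each component on one unidirectional link.

    chain_calls lists (component, start_call_time, stop_call_time) in call
    order. Each component's own call duration is reduced by the call
    duration of the next component on the link, if there is one.
    """
    durations = [abs(start - stop) for _, start, stop in chain_calls]
    result = {}
    for j, (name, _, _) in enumerate(chain_calls):
        downstream = durations[j + 1] if j + 1 < len(durations) else 0.0
        result[name] = durations[j] - downstream
    return result


# B is called from t=0 to t=10 and, in between, calls C from t=2 to t=6.
calls = [("B", 0.0, 10.0), ("C", 2.0, 6.0)]
times = compensation_times(calls)
print(times)  # {'B': 6.0, 'C': 4.0}
```

Here B's call lasts 10 time units, of which 4 are spent waiting for C, so the compensation response time attributed to B itself is 6.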
In practice, for other user requests, the target path information is as shown in Fig. 2 and has two or more unidirectional links. In order to calculate the compensation response time of each target service component more simply in such a case, in another embodiment of the present application, the target path information further includes timestamp information of each target service component being called, the timestamp information including a start call time and a stop call time, and calculating the compensation response time of each target service component when the corresponding user request is executed based at least on each piece of target path information in each request category group includes: determining the unidirectional links corresponding to the target path information of each user request in each request category group, wherein one piece of target path information corresponds to at least two unidirectional links; and calculating the compensation response time of each target service component when the corresponding user request is executed at least using $t_j^k(i) = |Tb_j^k(i) - Te_j^k(i)| - |Tb_{j+1}^k(i) - Te_{j+1}^k(i)|$, wherein $t_j^k(i)$ is the compensation response time of the j-th target service component on the k-th unidirectional link in the target path information corresponding to the i-th user request, $Tb_j^k(i)$ is the start call time of the j-th target service component on the k-th unidirectional link in the target path information corresponding to the i-th user request, $Te_j^k(i)$ is the stop call time of the j-th target service component on the k-th unidirectional link in the target path information corresponding to the i-th user request, $Tb_{j+1}^k(i)$ is the start call time of the (j+1)-th target service component on the k-th unidirectional link in the target path information corresponding to the i-th user request, $Te_{j+1}^k(i)$ is the stop call time of the (j+1)-th target service component on the k-th unidirectional link in the target path information corresponding to the i-th user request, i is less than or equal to the total number of user requests in one request category group, j is less than or equal to the total number of target service components in one piece of target path information, and k is less than or equal to the total number of unidirectional links of one piece of target path information.
In the actual application process, the value of i starts from 1 up to the total number of all user requests in a request class group. The value of j starts from 1 up to the total number of target service components in one target path information. The value of k starts from 1 up to the total number of the unidirectional links described above for one target path information.
In a specific embodiment of the present application, as shown in Fig. 2, the target path information corresponding to the user request has two unidirectional links, A->B->C and A->B->D, so the above formula can be used to calculate, respectively, the compensation response times of target service component A, target service component B, target service component C and target service component D. Because the calculation is carried out separately along the unidirectional links A->B->C and A->B->D, the compensation response time of target service component B is calculated twice. In this case, the compensation response time of target service component B may be taken as the average of its compensation response times on the two unidirectional links.
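The multi-link case, including the averaging for a component that lies on several unidirectional links, can be sketched as follows. The function name `compensation_times_multi`, the data layout and the timestamps are assumptions made purely for illustration.

```python
from collections import defaultdict


def compensation_times_multi(links, timestamps):
    """Per-component compensation response time over several links.

    links is a list of component chains; timestamps maps each component
    to its (start_call_time, stop_call_time). A component that appears
    on several links gets the average of its per-link values.
    """
    per_component = defaultdict(list)
    for link in links:
        for j, name in enumerate(link):
            start, stop = timestamps[name]
            value = abs(start - stop)
            if j + 1 < len(link):  # subtract the downstream call duration
                next_start, next_stop = timestamps[link[j + 1]]
                value -= abs(next_start - next_stop)
            per_component[name].append(value)
    return {name: sum(v) / len(v) for name, v in per_component.items()}


timestamps = {
    "A": (0.0, 12.0),
    "B": (1.0, 11.0),
    "C": (2.0, 5.0),
    "D": (6.0, 10.0),
}
links = [["A", "B", "C"], ["A", "B", "D"]]
multi = compensation_times_multi(links, timestamps)
print(multi)  # {'A': 2.0, 'B': 6.5, 'C': 3.0, 'D': 4.0}
```

On the link through C the value for B is 10 - 3 = 7, and on the link through D it is 10 - 4 = 6, so the averaged compensation response time of B is 6.5.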
In yet another embodiment of the present application, determining whether a delay fault occurs in the corresponding target service component based at least on the compensation response times of each target service component in all the request category groups includes: normalizing at least all the compensation response times of each target service component in each request category group to obtain the delay fault rate of each target service component in the corresponding request category group; calculating the target delay fault rate of the corresponding target service component based on the delay fault rates of that target service component in the request category groups; and determining whether a delay fault occurs in the corresponding target service component based on the target delay fault rate and the call ratio of each target service component, wherein one service component corresponds to one call ratio, and the call ratio is the ratio of the total number of times the corresponding target service component is called to the total number of times the cloud platform processes user requests. In this embodiment, whether a delay fault occurs in the corresponding target service component can be determined accurately based on the target delay fault rate and the call ratio of the target service component across all request category groups.
In one embodiment of the present application, normalizing at least all the compensation response times of each target service component in each request category group to obtain the delay fault rate of each target service component in the corresponding request category group includes: in one request category group, normalizing all the compensation response times of the z-th target service component to obtain a plurality of normalized probabilities; and calculating the delay fault rate of the corresponding target service component from these normalized probabilities, wherein $H_z$ is the delay fault rate of the z-th target service component in one request category group, $P_z(l)$ is the l-th normalized probability of the z-th target service component, m is the total number of normalized probabilities of the z-th target service component, and z is less than or equal to the total number of target service components in the corresponding request category group. In this embodiment, all the compensation response times of a target service component in one request category group are normalized to obtain a plurality of normalized probabilities, and the delay fault rate of the target service component is then calculated from those normalized probabilities. This avoids the influence of singular sample data and ensures that the resulting delay fault rate is accurate, which in turn ensures that whether the target service component has a delay fault can subsequently be determined accurately.
In practice, the compensation response times of a target service component for user requests of the same category should be concentrated around one or a few fixed values, because the target service component completes user requests of the same category in substantially the same way. The more random and dispersed the distribution of these compensation response times, the more likely it is that the target service component has a delay fault. The delay fault rate of each target service component is calculated on the basis of this reasoning, specifically as follows:
For a target service component, one compensation response time is obtained for each user request, so the compensation response times of the target service component for all the user requests in one request category group can be placed on a one-dimensional space. A maximum value Tmax of the compensation response time is set, and the one-dimensional space is bounded by the minimum compensation response time 0 and the maximum compensation response time Tmax. All the obtained compensation response times of the target service component are normalized, and the normalized compensation response times are divided into levels: 0-0.1 is level 1, 0.1-0.2 is level 2, and so on, so that the range 0-1 is divided into 10 levels. The probability that a compensation response time in the one-dimensional space falls in level l is denoted P(l). The delay fault rate of the target service component is then calculated from these probabilities.
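The normalization and grading steps can be illustrated with the following minimal sketch. The formula for the delay fault rate is not reproduced in the text above, so the sketch assumes, purely for illustration, that it is the Shannon entropy of the level probabilities with the natural logarithm, which matches the stated idea that a more random and dispersed distribution indicates a more likely fault. The function name `delay_fault_rate`, the clamping to the range 0 to Tmax and the handling of values that fall exactly on a level boundary are likewise assumptions.

```python
import math
from collections import Counter


def delay_fault_rate(times, t_max, levels=10):
    """Dispersion of compensation response times over equal-width levels.

    ASSUMPTION: the rate is computed as Shannon entropy (natural log) of
    the level probabilities; the source text does not state the formula.
    Levels are indexed 0..levels-1 here (level 1..10 in the description).
    """
    if not times:
        raise ValueError("at least one compensation response time is required")
    counts = Counter()
    for t in times:
        normalized = min(max(t / t_max, 0.0), 1.0)         # clamp to [0, 1]
        level = min(int(normalized * levels), levels - 1)  # 1.0 -> last level
        counts[level] += 1
    total = len(times)
    probabilities = [count / total for count in counts.values()]
    return -sum(p * math.log(p) for p in probabilities)


concentrated = delay_fault_rate([4.3, 4.5, 4.6, 4.7], t_max=10.0)
scattered = delay_fault_rate([0.5, 2.5, 4.5, 8.5], t_max=10.0)
print(concentrated, scattered)
```

Under this assumption, response times that all fall in one level give a rate of zero, while response times spread evenly over four levels give a rate of ln 4, so the more dispersed sample scores higher.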
In practice, after the delay fault rate of a target service component in one request category group has been obtained, and in order to eliminate the influence of singular data and thereby determine more accurately whether a delay fault occurs in the corresponding target service component, in another embodiment of the present application, calculating the target delay fault rate of the corresponding target service component based on the delay fault rates of the target service component in the request category groups includes: calculating the sum of the delay fault rates of the same target service component in all the request category groups to obtain the fault rate sum of the corresponding target service component; and calculating the quotient of the fault rate sum and the total number of request category groups to obtain the target delay fault rate. That is, the average of the delay fault rates of the same target service component over the request category groups is taken as the target delay fault rate of that target service component. This avoids the influence of an abrupt change in the delay fault rate within a single request category group, and in turn ensures that whether a delay fault occurs in the corresponding target service component can be determined accurately.
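The averaging step can be sketched as follows. The function name `target_delay_fault_rate` and the sample rates are hypothetical. The description divides the fault rate sum by the total number of request category groups; the sketch additionally assumes that a group in which the component does not appear contributes a rate of zero, which the description does not state.

```python
def target_delay_fault_rate(rates_by_group, component):
    """Average a component's delay fault rate over all request category groups.

    rates_by_group holds one dict per request category group, mapping a
    component name to its delay fault rate in that group.
    ASSUMPTION: a group without the component contributes a rate of 0.
    """
    fault_rate_sum = sum(rates.get(component, 0.0) for rates in rates_by_group)
    return fault_rate_sum / len(rates_by_group)


rates_by_group = [
    {"B": 0.25, "C": 0.5},  # request category group 1
    {"B": 0.75},            # request category group 2
]
rate_b = target_delay_fault_rate(rates_by_group, "B")
rate_c = target_delay_fault_rate(rates_by_group, "C")
print(rate_b, rate_c)  # 0.5 0.25
```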
In order to determine whether a corresponding target service component fails more simply, in another embodiment of the present application, determining whether a corresponding target service component fails based on the target delay fault rate and the call ratio of each of the target service components includes: calculating the product of the target delay fault rate and the corresponding calling ratio of each target service component to obtain a plurality of target fault thresholds; determining that a delay fault occurs in the target service component corresponding to the target fault threshold under the condition that the target fault threshold is larger than a preset fault identification threshold; and under the condition that the target fault threshold is smaller than or equal to the preset fault identification threshold, determining that the target service component corresponding to the target fault threshold has no delay fault.
In practice, let the target delay fault rate of the n-th target service component in the cloud platform be $\bar{H}_n$, let the ratio of the total number of times that target service component is called to the total number of times the cloud platform processes user requests be $\varphi$, and let the preset fault identification threshold be $\Phi$. If $\bar{H}_n \cdot \varphi > \Phi$, it is determined that a delay fault occurs in the target service component; if $\bar{H}_n \cdot \varphi \le \Phi$, it is determined that no delay fault occurs in the target service component.
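The threshold comparison can be illustrated with a minimal sketch. The function name `has_delay_fault` and all numeric values are made up for the example; in particular, the threshold value is not taken from the application.

```python
def has_delay_fault(target_rate, call_count, total_requests, threshold):
    """Return True when target rate times call ratio exceeds the threshold."""
    call_ratio = call_count / total_requests
    return target_rate * call_ratio > threshold


# Component called for 80 of 100 requests: 0.5 * 0.8 = 0.4 > 0.3
frequent = has_delay_fault(0.5, 80, 100, threshold=0.3)
# Component called for 20 of 100 requests: 0.5 * 0.2 = 0.1 <= 0.3
rare = has_delay_fault(0.5, 20, 100, threshold=0.3)
# A product exactly equal to the threshold does not count as a fault.
boundary = has_delay_fault(0.5, 50, 100, threshold=0.25)
print(frequent, rare, boundary)  # True False False
```

Weighting the target delay fault rate by the call ratio means that, for the same dispersion of response times, a frequently called component is flagged sooner than a rarely called one.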
The embodiment of the application also provides a device for determining the delay abnormality, and the device for determining the delay abnormality can be used for executing the method for determining the delay abnormality. The following describes a delay abnormality determination device provided in the embodiment of the present application.
Fig. 4 is a schematic structural diagram of a delay abnormality determination apparatus according to an embodiment of the present application. The determining apparatus is applied to a cloud platform, where the cloud platform includes a plurality of service components, as shown in fig. 4, and includes:
A construction unit 10, configured to construct an execution path of a user request on each target service component based on the running log information of each service component, to obtain target path information, where one of the user requests corresponds to one of the target path information, and one of the target path information includes at least two of the target service components;
A calculating unit 20 configured to classify each of the user requests based on the target path information of each of the user requests to obtain a plurality of request category groups, and calculate a compensation response time of the target service component corresponding to each of the user requests when executed based on at least each of the target path information in each of the request category groups, one of the request category groups including the plurality of user requests;
A determining unit 30, configured to determine whether a delay fault occurs in the corresponding target service component based at least on the compensation response times of each target service component in all the request category groups.
In the above apparatus for determining a delay abnormality, the construction unit constructs the execution path of each user request across the target service components based on the running log information of each service component, yielding the target path information of each user request; the calculating unit classifies the user requests based on their target path information to obtain a plurality of request category groups, and calculates, for each request category group, the compensation response time of each target service component when executing the corresponding user request; and the determining unit determines whether a delay fault occurs in the corresponding target service component based at least on the compensation response times of each target service component in all the request category groups. In the prior art, whether a delay fault occurs in the cloud platform is determined only from the log information of the service components or from the resource utilization of the cloud platform. In the present scheme, by contrast, the determination is made based at least on the compensation response times of each target service component in all the request category groups, so that it can be determined whether a delay fault occurs in the cloud platform and the delay fault can be determined accurately. This solves the prior-art problem that delay faults of the service components of a cloud platform are difficult to detect accurately, and in turn helps ensure high fault handling efficiency and high operational reliability of the cloud platform.
In practice, for a cloud platform (i.e. a cloud computing service platform), a user inputs user requests at different UI service ports, and from the user's point of view the cloud platform simply returns the result. In fact, a user request input through a UI service port is executed in sequence by a plurality of target service components in the cloud platform. When a user request is input, the running log information of each service component in the cloud platform records the timestamps at which execution of the request starts and ends, the ID information of the user request being executed, and the source component (i.e. the upstream component) of the user request. The path along which the target service components execute the user request can therefore be constructed from the running log information of the service components on the cloud platform. In a specific embodiment of the present application, assuming that the ID information of a user request is 00001, its target path information may be constructed as the directed graph shown in Fig. 2. That is, the user request is input from UI service port A (i.e. the UI service port 100) and passes through target service component B (i.e. the first target service component 101), target service component C (i.e. the second target service component 102) and target service component D (i.e. the third target service component 103), and its target path information is A->B, B->C and B->D.
In a specific embodiment of the present application, the user requests are classified based on the target path information of each user request to obtain a plurality of request category groups, so that all user requests in one request category group pass through the same target service components. This in turn ensures that whether a delay fault occurs in the corresponding target service component can be determined accurately based on the compensation response time of each target service component.
In order to classify the user requests more simply, in one embodiment of the present application, the calculating unit includes a first dividing module and a second dividing module. The first dividing module is configured to divide the target path information of each user request to obtain a plurality of path primitives, wherein each path primitive is the path formed by two adjacent target service components on the target path information, and one piece of target path information corresponds to at least one path primitive. The second dividing module is configured to place the user requests that have the same path primitives in one category to obtain the plurality of request category groups.
In a specific embodiment of the present application, before dividing the target path information corresponding to each user request to obtain the path primitive, the target path information corresponding to each user request may be further classified to obtain the classified target path information. The process of grading the target path information corresponding to each user request may be: when a user inputs a user request from the UI service port, the user will go through each target service component on the execution path, so the first target service component directly connected to the UI service port can directly obtain the user request. And when the second target service component next to the first target service component executes the user request, the call command sent by the first target service component is required, so that the call of each target service component has a concept of level. Therefore, the target service components of each level can be classified according to the positions of the target service components on the execution paths of the corresponding user requests. The method comprises the following steps: for a user request, it corresponds to an execution path of a target service component, and the execution path can search out various unidirectional links according to the transmission direction of the user request. The level division of the target service components on each unidirectional link is independent of each other. 
Thus, for a unidirectional link starting from a UI service port, the level of each target service component is defined as follows: the UI service port is the level-0 target service component; the target service component directly connected to the UI service port is the primary target service component; the target service component that must be connected to the UI service port through the primary target service component is the secondary target service component; the target service component that must be connected to the UI service port through both the primary and the secondary target service components is the tertiary target service component; and so on.
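The level definition above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `assign_levels`, the representation of a unidirectional link as a list of component names, and the label "UI" for the UI service port are assumptions made for the example and do not come from the application.

```python
def assign_levels(link):
    """Map each component on one unidirectional link to its level.

    `link` is assumed to list the components in call order, starting with
    the UI service port, which therefore receives level 0.
    """
    return {component: level for level, component in enumerate(link)}


# Hypothetical example: two unidirectional links that share a prefix.
# Each link is graded independently of the other.
links = [["UI", "B", "C"], ["UI", "B", "D"]]
levels_per_link = [assign_levels(link) for link in links]
```

Because each link is graded separately, a component that appears on several links receives one level per link, which matches the statement that the level division on each unidirectional link is independent.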
In the practical application process, each user request corresponds one-to-one with its target path information, so user requests with different target path information are not necessarily user requests of the same category. Therefore, the target path information corresponding to each user request is divided to obtain path primitives; specifically, the target path information may be divided into the smallest path primitives. Taking the target path information shown in fig. 2 as an example, it may be divided into three path primitives, namely A->B, B->C and B->D. If the path primitives of another user request have the same composition as those of this user request, the two user requests can be placed in the same request category group, so that each user request corresponds to one request category group, and each target service component in a user request corresponds to one level on the transmission path of that user request.
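The division into path primitives and the grouping of user requests can be sketched as follows. This is a minimal sketch under stated assumptions: the target path information of a request is represented as a list of unidirectional links, each link being a list of component names, and the names `path_primitives`, `group_requests` and the request identifiers are hypothetical.

```python
from collections import defaultdict


def path_primitives(links):
    """Return the set of adjacent (caller, callee) pairs over all links."""
    return frozenset(
        (link[i], link[i + 1])
        for link in links
        for i in range(len(link) - 1)
    )


def group_requests(requests):
    """Group request ids whose target paths yield the same path primitives."""
    groups = defaultdict(list)
    for request_id, links in requests.items():
        groups[path_primitives(links)].append(request_id)
    return dict(groups)


# Hypothetical requests: r1 and r2 follow the fig. 2 shape (A->B->C, A->B->D),
# r3 follows a single link A->B->C.
requests = {
    "r1": [["A", "B", "C"], ["A", "B", "D"]],
    "r2": [["A", "B", "D"], ["A", "B", "C"]],
    "r3": [["A", "B", "C"]],
}
groups = group_requests(requests)
```

Using a `frozenset` of pairs as the group key means that two requests fall into the same request category group whenever their primitives have the same composition, regardless of the order in which the links are listed.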
In the actual application process, the target path information of a user request may, as shown in fig. 3, have only one unidirectional path, A->B and B->C, that is, from the UI service port 100 to the first target service component 101 and then to the second target service component 102. In order to calculate the compensation response time of each target service component more simply in this case, in still another embodiment of the present application, the target path information further includes timestamp information of each target service component being called, where the timestamp information includes a start call time and a stop call time, and the calculating unit further includes a first calculating module configured to calculate the compensation response time of each target service component when each user request is executed, using $t_j(i) = |Tb_j(i) - Te_j(i)| - |Tb_{j+1}(i) - Te_{j+1}(i)|$, where $t_j(i)$ is the compensation response time of the j-th target service component in the target path information corresponding to the i-th user request, $Tb_j(i)$ is the start call time of the j-th target service component, $Te_j(i)$ is the stop call time of the j-th target service component, $Tb_{j+1}(i)$ is the start call time of the (j+1)-th target service component, $Te_{j+1}(i)$ is the stop call time of the (j+1)-th target service component, i is less than or equal to the total number of user requests in one request category group, and j is less than or equal to the total number of target service components on one target path information.
In the actual application process, the value of i starts from 1 up to the total number of all user requests in a request class group. The value of j starts from 1 up to the total number of target service components in one target path information.
In a specific embodiment of the present application, as shown in fig. 3, on the target path information corresponding to the user request, the compensation response time for the target service component B (the first target service component 101) is the difference between the first response time and the second response time, where the first response time is the absolute value of the difference between the start call time and the stop call time of the target service component B, and the second response time is the absolute value of the difference between the start call time and the stop call time of the target service component C (i.e. the second target service component 102). Of course, the compensation response time to the target service component C may be only the absolute value of the difference between the start call time and the stop call time of the target service component C.
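The calculation for a single unidirectional path can be sketched as follows. The function name `compensation_times`, the tuple layout `(start, stop)` and the timestamp values are assumptions made for illustration; only the subtraction rule itself comes from the description above.

```python
def compensation_times(spans):
    """Compensation response time of each component on one unidirectional path.

    `spans` lists (start_call_time, stop_call_time) in call order. Each
    component's own duration is reduced by the duration of the component it
    calls next; the last component keeps its full duration.
    """
    durations = [abs(start - stop) for start, stop in spans]
    return [
        duration - (durations[j + 1] if j + 1 < len(durations) else 0.0)
        for j, duration in enumerate(durations)
    ]


# Hypothetical timestamps for fig. 3: B is called from 0.0 to 10.0,
# and C (called by B) runs from 2.0 to 6.0.
times = compensation_times([(0.0, 10.0), (2.0, 6.0)])
# B: |0 - 10| - |2 - 6| = 6.0, C: |2 - 6| = 4.0
```

Subtracting the callee's duration isolates the time spent in the component itself, so a slow downstream component does not inflate the figure for its caller.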
In the actual application process, the target path information of a user request may, as shown in fig. 2, have two or more unidirectional paths. In order to calculate the compensation response time of each target service component more simply in this case, in another embodiment of the present application, the target path information further includes timestamp information of each target service component being called, where the timestamp information includes a start call time and a stop call time, and the calculating unit further includes a first determining module and a second calculating module. The first determining module is configured to determine the unidirectional links corresponding to the target path information of each user request in each request category group, where one target path information corresponds to at least two unidirectional links. The second calculating module is configured to calculate the compensation response time of each target service component when each user request is executed, using at least $t_j^k(i) = |Tb_j^k(i) - Te_j^k(i)| - |Tb_{j+1}^k(i) - Te_{j+1}^k(i)|$, where $t_j^k(i)$ is the compensation response time of the j-th target service component on the k-th unidirectional link in the target path information corresponding to the i-th user request, $Tb_j^k(i)$ is the start call time of the j-th target service component on the k-th unidirectional link in the target path information corresponding to the i-th user request, $Te_j^k(i)$ is the stop call time of the j-th target service component on the k-th unidirectional link, $Tb_{j+1}^k(i)$ is the start call time of the (j+1)-th target service component on the k-th unidirectional link, $Te_{j+1}^k(i)$ is the stop call time of the (j+1)-th target service component on the k-th unidirectional link, i is less than or equal to the total number of user requests in one request category group, j is less than or equal to the total number of target service components on one target path information, and k is less than or equal to the total number of unidirectional links of one target path information.
In the actual application process, the value of i starts from 1 up to the total number of all user requests in a request class group. The value of j starts from 1 up to the total number of target service components in one target path information. The value of k starts from 1 up to the total number of the unidirectional links described above for one target path information.
In a specific embodiment of the present application, as shown in fig. 2, the target path information corresponding to the user request has two unidirectional paths, A->B->C and A->B->D, so the formula $t_j^k(i) = |Tb_j^k(i) - Te_j^k(i)| - |Tb_{j+1}^k(i) - Te_{j+1}^k(i)|$ can be used to calculate the compensation response times of the target service component A, the target service component B, the target service component C and the target service component D respectively. Because the target path information has two unidirectional paths, the calculation may be performed separately on the path A->B->C and on the path A->B->D. In that case the target service component B is calculated twice, and its compensation response time may be taken as the average of its compensation response times on the two unidirectional paths.
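The per-link calculation and the averaging of a shared component can be sketched as follows. This is an illustration under assumptions: each link is a list of `(name, start, stop)` tuples, the timestamp values are invented, and the sketch averages every component that appears on more than one link (the description only mentions averaging for component B).

```python
from collections import defaultdict
from statistics import mean


def link_compensation_times(link):
    """Compensation response time per component on one unidirectional link.

    `link` lists (name, start_call_time, stop_call_time) in call order.
    """
    durations = [abs(start - stop) for _, start, stop in link]
    result = {}
    for j, (name, _, _) in enumerate(link):
        callee = durations[j + 1] if j + 1 < len(link) else 0.0
        result[name] = durations[j] - callee
    return result


def request_compensation_times(links):
    """Average each component's compensation response time over its links."""
    per_component = defaultdict(list)
    for link in links:
        for name, value in link_compensation_times(link).items():
            per_component[name].append(value)
    return {name: mean(values) for name, values in per_component.items()}


# Hypothetical timestamps for the fig. 2 shape.
a, b = ("A", 0.0, 20.0), ("B", 1.0, 15.0)
links = [[a, b, ("C", 2.0, 6.0)], [a, b, ("D", 7.0, 12.0)]]
result = request_compensation_times(links)
# B is 14 - 4 = 10.0 on A->B->C and 14 - 5 = 9.0 on A->B->D, so 9.5 on average.
```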
In still another embodiment of the present application, the determining unit includes a processing module, a third calculating module, and a second determining module, where the processing module is configured to normalize at least all the compensation response times of the target service components in each request class group to obtain a delay fault rate of each target service component in the corresponding request class group; the third calculation module is configured to calculate a target delay fault rate of the corresponding target service component based on the delay fault rates of the target service components in each request class group; the second determining module is configured to determine whether a delay fault occurs in a corresponding target service component based on the target delay fault rate and a calling ratio of each target service component, where one of the service components corresponds to one of the calling ratios, and the calling ratio is a ratio of a total number of times the corresponding target service component is called to a total number of times the cloud platform processes the user request. In this embodiment, based on the target delay fault rate and the call ratio of the target service component in all request class groups, it is possible to accurately determine whether the corresponding target service component has a delay fault.
In one embodiment of the present application, the processing module includes a normalization sub-module and a first calculation sub-module. The normalization sub-module is configured to normalize all the compensation response times of the z-th target service component in one request category group to obtain a plurality of normalized probabilities. The first calculation sub-module is configured to calculate the delay fault rate of the corresponding target service component from these normalized probabilities, where $H_z$ is the delay fault rate of the z-th target service component in one request category group, $P_z(l)$ is the l-th normalized probability of the z-th target service component, m is the total number of normalized probabilities of the z-th target service component, and z is less than or equal to the total number of target service components in the corresponding request category group. In this embodiment, all the compensation response times of a target service component within a request category group are normalized to obtain a plurality of normalized probabilities, and the delay fault rate of the target service component is then calculated from those probabilities. This reduces the influence of singular sample data, keeps the obtained delay fault rate accurate, and in turn allows a later, accurate determination of whether the target service component has a delay fault.
In practical application, the compensation response times of a target service component for user requests of the same category should be concentrated around one or several fixed values, because the target service component completes user requests of the same category in substantially the same manner. The more random and discrete this distribution is, the more likely it is that the target service component has a delay fault. The delay fault rate of each target service component is calculated on this basis, specifically as follows:
For a target service component, one compensation response time is obtained for each user request, so the compensation response times of the target service component over all the user requests in a request category group can be arranged on a one-dimensional space. A maximum value Tmax of the compensation response time is set, and the one-dimensional space is bounded by the minimum value 0 and the maximum value Tmax. All the obtained compensation response times of the target service component are normalized, and the normalized compensation response times are graded: 0-0.1 is level 1, 0.1-0.2 is level 2, and so on, so that the range 0-1 is divided into 10 levels. The probability that compensation response time level j occurs in the one-dimensional space is denoted P(j). The delay fault rate of the target service component is then calculated from the probabilities P(j).
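The formula for the delay fault rate is not reproduced in this text. Given the stated principle that a more random and discrete distribution indicates a more likely fault, one natural reading is the Shannon entropy of the level probabilities, $H = -\sum_j P(j)\ln P(j)$. The sketch below assumes that reading; the choice of entropy, the natural logarithm, the clamping of out-of-range values and all names are assumptions, not statements of the application's method.

```python
import math
from collections import Counter


def delay_fault_rate(times, t_max, levels=10):
    """Entropy of the graded, normalized compensation response times.

    ASSUMPTION: Shannon entropy (natural log) stands in for the formula
    that is missing from the text. Values are clamped to [0, t_max].
    """
    counts = Counter()
    for t in times:
        normalized = min(max(t / t_max, 0.0), 1.0)
        # Levels 1..10 cover 0-0.1, 0.1-0.2, ...; 1.0 falls in the top level.
        level = min(int(normalized * levels), levels - 1) + 1
        counts[level] += 1
    total = len(times)
    return sum(-(n / total) * math.log(n / total) for n in counts.values())


# Hypothetical samples: concentrated times versus scattered times.
concentrated = delay_fault_rate([10.0, 10.0, 10.0, 10.0], t_max=100.0)
scattered = delay_fault_rate([5.0, 15.0, 25.0, 35.0], t_max=100.0)
```

Under this assumption, times that all fall in one level give a rate of zero, and times spread evenly over four levels give ln 4, so the measure grows as the distribution becomes more dispersed.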
In an actual application process, after obtaining a delay fault rate of a target service component in a request class group, in order to eliminate influence of singular data and further determine more accurately whether a delay fault occurs in a corresponding target service component, in another embodiment of the present application, the third calculation module includes a second calculation sub-module and a third calculation sub-module, where the second calculation sub-module is configured to calculate a sum of the delay fault rates of the same target service component in all the request class groups, and obtain a sum of fault rates of the corresponding target service components; the third calculation sub-module is configured to calculate a quotient of the sum of the failure rates and the total number of the request class groups, and obtain the target delay failure rate. That is, based on the delay fault rate of the same target service component in each request class group, the average value of the delay fault rates of the target service components is calculated to obtain the target delay fault rate of the target service components, so that the abrupt change influence of the delay fault rate in a certain request class group can be avoided, and further, whether the corresponding target service components have delay faults or not can be accurately determined.
In order to determine whether the corresponding target service component fails more simply, in a further embodiment of the present application, the second determining module includes a fourth calculating submodule, a first determining submodule and a second determining submodule, where the fourth calculating submodule is configured to calculate a product of the target delay failure rate of each of the target service components and the corresponding calling ratio to obtain a plurality of target failure thresholds; the first determining submodule is used for determining that the target service component corresponding to the target fault threshold has a delay fault under the condition that the target fault threshold is larger than a preset fault identification threshold; the second determining submodule is configured to determine that the target service component corresponding to the target fault threshold does not have a delay fault when the target fault threshold is less than or equal to the preset fault identification threshold.
In the actual application process, suppose that the target delay fault rate of the n-th target service component in the cloud platform is $\bar{H}_n$, the ratio of the total number of times the target service component is called to the total number of times the cloud platform processes user requests is $\varphi_n$, and the preset fault identification threshold is $\Phi$. If $\bar{H}_n \cdot \varphi_n > \Phi$, it is determined that the target service component has a delay fault; if $\bar{H}_n \cdot \varphi_n \le \Phi$, it is determined that the target service component has no delay fault.
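The averaging over request category groups and the threshold comparison can be sketched as follows. The function names, the argument layout and the numeric values are hypothetical; the sketch assumes that `rates_per_group` holds one delay fault rate per request category group.

```python
def target_delay_fault_rate(rates_per_group):
    """Sum of the per-group delay fault rates divided by the number of groups."""
    return sum(rates_per_group) / len(rates_per_group)


def has_delay_fault(rates_per_group, component_calls, total_requests, threshold):
    """Compare (target delay fault rate x calling ratio) with the threshold.

    A fault is reported only when the product is strictly greater than the
    preset fault identification threshold.
    """
    call_ratio = component_calls / total_requests
    return target_delay_fault_rate(rates_per_group) * call_ratio > threshold


# Hypothetical numbers: rates 1.0, 2.0, 3.0 average to 2.0; the component is
# called in 50 of 100 requests, so the product is 1.0.
faulty = has_delay_fault([1.0, 2.0, 3.0], 50, 100, threshold=0.8)
not_faulty = has_delay_fault([1.0, 2.0, 3.0], 50, 100, threshold=1.0)
```

Weighting by the calling ratio means that a component with scattered response times but very few calls contributes less to the decision than a heavily used one.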
The delay abnormality determination device comprises a processor and a memory, wherein the construction unit, the calculation unit, the determination unit and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and the problem in the prior art that delay abnormalities of the service components of a cloud platform are difficult to detect accurately is solved by adjusting kernel parameters.
The memory may include forms of computer-readable media such as volatile memory, for example random access memory (RAM), and/or nonvolatile memory, for example read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
An embodiment of the present invention provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the above-described method of determining a delay abnormality.
The embodiment of the invention provides a processor, which is used for running a program, wherein the method for determining the delay abnormality is executed when the program runs.
In an exemplary embodiment of the present application, a cloud platform is further provided, where the cloud platform includes a delay anomaly determination device, where the determination device is configured to perform any one of the delay anomaly determination methods described above.
In the above determination method, first, an execution path of each user request over the target service components is constructed based on the operation log information of each service component, to obtain the target path information of each user request; then, the user requests are classified based on their target path information to obtain a plurality of request category groups, and the compensation response time of each target service component when executing the corresponding user request is calculated in each request category group; finally, whether a delay fault occurs in the corresponding target service component is determined based at least on the compensation response times of each target service component in all the request category groups. In the prior art, whether a delay fault occurs in the cloud platform is determined only from the log information of the service components or from the resource utilization of the cloud platform. In this scheme, by contrast, whether a delay fault occurs in the corresponding target service component is determined based at least on the compensation response times of each target service component in all the request category groups, so that not only can it be determined whether a delay fault occurs in the cloud platform, but the delay fault can also be determined accurately. This solves the problem in the prior art that delay abnormalities of the service components of a cloud platform are difficult to detect accurately, and helps ensure higher fault handling efficiency and higher operating reliability of the cloud platform.
An embodiment of the invention provides a device comprising a processor, a memory, and a program stored in the memory and executable on the processor, where the processor implements at least the following steps when executing the program:
Step S101, based on the operation log information of each service component, constructing an execution path of a user request on each target service component to obtain target path information, wherein one user request corresponds to one target path information, and one target path information at least comprises two target service components;
Step S102, classifying each user request based on the target path information of each user request to obtain a plurality of request category groups, and calculating the compensation response time of the corresponding target service component when each user request is executed based on at least each target path information in each request category group, wherein one request category group comprises a plurality of user requests;
Step S103, determining whether a delay fault occurs in the corresponding target service component based at least on the compensation response time of each target service component in all the request category groups.
The device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
The application also provides a computer program product adapted to perform, when executed on a data processing device, a program initialized with at least the following method steps:
Step S101, based on the operation log information of each service component, constructing an execution path of a user request on each target service component to obtain target path information, wherein one user request corresponds to one target path information, and one target path information at least comprises two target service components;
Step S102, classifying each user request based on the target path information of each user request to obtain a plurality of request category groups, and calculating the compensation response time of the corresponding target service component when each user request is executed based on at least each target path information in each request category group, wherein one request category group comprises a plurality of user requests;
Step S103, determining whether a delay fault occurs in the corresponding target service component based at least on the compensation response time of each target service component in all the request category groups.
In the foregoing embodiments of the present invention, each embodiment is described with its own emphasis; for a part that is not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of the units may be a logic function division, and there may be another division manner when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.
From the above description, it can be seen that the above embodiments of the present application achieve the following technical effects:
1) In the method for determining the delay abnormality, first, an execution path of each user request over the target service components is constructed based on the operation log information of each service component, to obtain the target path information of each user request; then, the user requests are classified based on their target path information to obtain a plurality of request category groups, and the compensation response time of each target service component when executing the corresponding user request is calculated in each request category group; finally, whether a delay fault occurs in the corresponding target service component is determined based at least on the compensation response times of each target service component in all the request category groups. In the prior art, whether a delay fault occurs in the cloud platform is determined only from the log information of the service components or from the resource utilization of the cloud platform. In this scheme, by contrast, whether a delay fault occurs in the corresponding target service component is determined based at least on the compensation response times of each target service component in all the request category groups, so that not only can it be determined whether a delay fault occurs in the cloud platform, but the delay fault can also be determined accurately. This solves the problem in the prior art that delay abnormalities of the service components of a cloud platform are difficult to detect accurately, and helps ensure higher fault handling efficiency and higher operating reliability of the cloud platform.
2) In the delay abnormality determination device, the construction unit constructs an execution path of each user request over the target service components based on the operation log information of each service component, to obtain the target path information of each user request; the calculation unit classifies the user requests based on their target path information to obtain a plurality of request category groups, and calculates the compensation response time of each target service component when executing the corresponding user request in each request category group; the determination unit determines whether a delay fault occurs in the corresponding target service component based at least on the compensation response times of each target service component in all the request category groups. In the prior art, whether a delay fault occurs in the cloud platform is determined only from the log information of the service components or from the resource utilization of the cloud platform. In this scheme, by contrast, whether a delay fault occurs in the corresponding target service component is determined based at least on the compensation response times of each target service component in all the request category groups, so that not only can it be determined whether a delay fault occurs in the cloud platform, but the delay fault can also be determined accurately. This solves the problem in the prior art that delay abnormalities of the service components of a cloud platform are difficult to detect accurately, and helps ensure higher fault handling efficiency and higher operating reliability of the cloud platform.
3) The cloud platform comprises a delay abnormality determination device, and the determination device is configured to execute any one of the delay abnormality determination methods described above. In the above determination method, first, an execution path of each user request over the target service components is constructed based on the operation log information of each service component, to obtain the target path information of each user request; then, the user requests are classified based on their target path information to obtain a plurality of request category groups, and the compensation response time of each target service component when executing the corresponding user request is calculated in each request category group; finally, whether a delay fault occurs in the corresponding target service component is determined based at least on the compensation response times of each target service component in all the request category groups. In the prior art, whether a delay fault occurs in the cloud platform is determined only from the log information of the service components or from the resource utilization of the cloud platform. In this scheme, by contrast, whether a delay fault occurs in the corresponding target service component is determined based at least on the compensation response times of each target service component in all the request category groups, so that not only can it be determined whether a delay fault occurs in the cloud platform, but the delay fault can also be determined accurately. This solves the problem in the prior art that delay abnormalities of the service components of a cloud platform are difficult to detect accurately, and helps ensure higher fault handling efficiency and higher operating reliability of the cloud platform.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.