AUTOMATED FAULT DETECTING CONTROL SYSTEM
BACKGROUND
[0001] Many online services are enabled through routing, the process by which network traffic is sent to different servers. Many web services, such as search engines, maintain a large bank of servers, and incoming network traffic is routed to different servers depending on their current computational load. Network traffic is directed away from servers handling a large number of requests toward servers handling a smaller number of requests. In this way, the computational load is roughly even across all servers.

[0002] However, conventional methods of network traffic routing and throttling generally rely on complex and ineffectual rule sets. These rule sets determine how network traffic should be routed and throttled under certain conditions, and may not be effective in all circumstances. For example, one rule set may suggest routing all network traffic from a hub to a first server during a first period of time (e.g., the evening), but then routing network traffic from the hub to two or more different servers during a second period of time (e.g., during the day). This rule set may be based upon historical data that determined that it is more desirable to route data in this manner. However, if an unexpected event such as a failure of a downstream server occurs, then the rule set may not be able to account for the unexpected event, thereby causing data routing problems.
[0003] Embodiments of the invention address these and other problems in further detail below.
SUMMARY
[0004] Embodiments of the invention are directed to automated systems for routing messages in an intelligent manner.

[0005] One embodiment is directed to a method. The method comprises receiving, by a server computer, a plurality of messages from one or more client computers, and determining, by the server computer, at least one candidate processor computer. The method further comprises determining, by the server computer, if the at least one candidate processor computer can process data elements from the plurality of messages according to predetermined parameters using an analytical model built using a graph learning algorithm. If one or more candidate processor computers of the at least one candidate processor computer can process the data elements according to predetermined parameters, then the method includes transmitting, by the server computer, the plurality of messages or other messages derived from the plurality of messages to the one or more candidate processor computers. If one or more candidate processor computers of the at least one candidate processor computer cannot process the data elements according to the predetermined parameters, then the method includes selecting, by the server computer, one or more other processor computers to process the data elements according to the predetermined parameters, and then transmitting, by the server computer, the plurality of messages or other messages derived from the plurality of messages to the one or more other processor computers for processing.
[0006] Another embodiment is directed to a server computer comprising: a processor; and a non-transitory computer readable medium coupled to the processor, the non-transitory computer readable medium comprising code executable by the processor for performing the above method.
[0007] These and other embodiments of the invention are described in further detail below.

TERMS
[0008] Prior to discussing specific embodiments, some terms may be described in detail.
[0009] A “server computer” may include a powerful computer or cluster of computers. For example, a server computer can be a large mainframe, a minicomputer cluster, or a group of servers functioning as a unit. In one example, a server computer may be a database server coupled to a Web server. The server computer may comprise one or more computational apparatuses and may use any of a variety of computing structures, arrangements, and compilations for servicing the requests from one or more client computers.
[0010] A “memory” may include any suitable device or devices that may store electronic data. A suitable memory may comprise a non-transitory computer readable medium that stores instructions that can be executed by a processor to implement a desired method. Examples of memories may comprise one or more memory chips, disk drives, etc. Such memories may operate using any suitable electrical, optical, and/or magnetic mode of operation.
[0011] A “data processor” may include a device that processes data. In some embodiments, a data processor may comprise one or more microprocessors working together to accomplish a desired function. The data processor may include a CPU that comprises at least one high-speed data processor adequate to execute program components for executing user and/or system-generated requests. The CPU may be a microprocessor such as AMD's Athlon, Duron and/or Opteron; IBM and/or Motorola's PowerPC; IBM's and Sony's Cell processor; Intel's Celeron, Itanium, Pentium, Xeon, and/or XScale; and/or the like processor(s).
[0012] A “processor computer” can be a computer that performs a process on received data. A “processor computer” may include a data processor and a computer readable medium. Some examples of processor computers may include computers that can determine if certain types of fraud are associated with a specific message. For example, one type of processor computer may determine a risk of fraud based upon an IP address of a client computer, while another type of processor computer may determine a risk of fraud based upon the way in which a client computer behaves (e.g., the speed of a mouse interaction with a browser). Other types of processor computers may include authorizing entity computers such as issuer computers or secure data access computers, which may authorize or not authorize certain actions based upon the data in messages (e.g., authorization request messages) that they might receive.

[0013] The term “module” may include a software module, which may be part of a larger program.
[0014] An “artificial intelligence model” or “AI model” may include a model that may be used to predict outcomes in order to achieve a target goal. The AI model may be developed using a learning algorithm, in which training data is classified based on known or inferred patterns. An AI model may also be referred to as a “machine learning model.”
[0015] An “analytical model” may be a model used to perform analysis. An analytical model may accept data as an input and produce some form of analysis as an output. For example, an analytical model may be a classifier that accepts data relating to messages (such as where the message was sent from, where the message is sent to, etc.) and produces a classification as to whether those messages are typical or atypical. An analytical model may be based on observations, such as previously captured data, an underlying physical model, or others. An analytical model may be a machine learning model or an artificial intelligence model, such as an artificial neural network.
[0016] An “information space” may include sets of data that may be explored to identify specific data to be used in training a machine learning model. The information space may comprise data relating to events, such as the time and place that the events occurred, the devices involved, and the specific actions performed. An involved device may be identified by an identification number and may further be associated with a user or entity. The user or entity may be associated with profile data regarding the user or entity's behavior and characteristics. The data may further be characterized as comprising input and output variables, which may be recorded and learned from in order to make predictions. For example, an information space may comprise a transactional network of consumers and merchants that may be defined by known characteristics, such as name, location, age, type, etc. The interactions between the consumers and merchants may further be described and defined by variables, such as transaction amount and transaction time, which may later be used to identify patterns in the information space and make predictions about what interactions will happen next.

[0017] The term “node” may refer to a discrete data point representing specified information. Nodes may be connected to one another in a topological graph by edges, which may be assigned a value known as an edge weight in order to describe the connection strength between the two nodes. For example, a first node may be a data point representing a first device in a network, and the first node may be connected in a graph to a second node representing a second device in the network. The connection strength may be defined by an edge weight corresponding to how quickly and easily information may be transmitted between the two nodes. An edge weight may also be used to express a cost or a distance required to move from one state or node to the next.
For example, a first node may be a data point representing a first position of a machine, and the first node may be connected in a graph to a second node for a second position of the machine. The edge weight may be the energy required to move from the first position to the second position.
[0018] The term “feature” may refer to data to be used in training a machine learning model and/or used by a trained model. An input feature may be data that is compiled and expressed in a form that may be accepted and used to train an artificial intelligence model as useful information for making predictions. An input feature may be identified as a collection of one or more input nodes in a graph, such as a path comprising the input nodes. For example, an input feature may be identified as a path (set) comprising input nodes for ‘male,’ ‘25 years old,’ ‘lives in San Francisco,’ and ‘college graduate.’ In another example, an input feature may be identified as a path comprising input nodes for ‘object detected,’ ‘distance = 1 m,’ ‘shape = round,’ ‘diameter = 5 cm,’ and ‘color = orange.’ Features may be organized into a searchable format, such as a graph database or index table. For example, input features may be organized as a list of keys that may be indexed, queried, and assigned values for making predictions. A list of input features gathered for use by an AI model may be referred to as a “feature set.”
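As a hypothetical illustration (the node values and key names below are invented for this sketch, not taken from the specification), an input feature expressed as a path of input nodes might be organized into a queryable feature set as follows:

```python
# Illustrative sketch: each input feature is a path (list) of input nodes,
# and a "feature set" is an indexable collection of such features.
profile_feature = ["male", "25 years old", "lives in San Francisco",
                   "college graduate"]
object_feature = ["object detected", "distance = 1 m", "shape = round",
                  "diameter = 5 cm", "color = orange"]

# A feature set organized as a searchable map of keys to features,
# which can be indexed and queried when making predictions.
feature_set = {
    "profile_path": profile_feature,
    "object_path": object_feature,
}

print(feature_set["object_path"][2])  # shape = round
```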
[0019] The term “message feature” or “message parameter” may refer to a measurable aspect of a message. This may include the length of the message, when the message was sent, when the message was received, the sender of the message, and the receiver of the message, among others. A message feature may be used to train a machine learning model or an artificial intelligence. A message feature may be evaluated by a machine learning model or artificial intelligence in order to come to some classification or decision.

[0020] The term “solver” may refer to a computational component that searches for a solution. For example, one or more solvers may be used to calculate a solution to an optimization problem. Solvers may additionally be referred to as “agents.” A plurality of agents that work together to solve a given problem, such as in the case of ant colony optimization, may be referred to as a “colony.”

[0021] The term “epoch” may refer to a period of time, e.g., in training a machine learning model. During training of learners in a learning algorithm, each epoch may pass after a defined set of steps have been completed. For example, in ant colony optimization, each epoch may pass after all computational agents have found solutions and have calculated the cost of their solutions. In an iterative algorithm, an epoch may include an iteration or multiple iterations of updating a model. An epoch may sometimes be referred to as a “cycle.”
[0022] The term “trial solution” may refer to a solution found at a given cycle of an iterative algorithm that may be evaluated. In ant colony optimization, a trial solution may refer to a solution that is proposed to be a candidate for the optimal path within an information space before being evaluated against predetermined criteria. A trial solution may also be referred to as a “candidate solution,” “intermediate solution,” or “proposed solution.” A set of trial solutions determined by a colony of agents may be referred to as a solution state.
[0023] The term “event” may refer to something that has happened, is happening, or may happen. Events relating to messages, such as the receipt or transmission of a message, may be referred to as “message events.” A message event may be associated with one or more message features. For example, the message event “received a message from New York” is associated with the message feature describing the origin of the message (from New York).

[0024] The term “sequence” may be defined as some collection of things that come one after another. A sequence may be spatial or temporal. For example, a number of street lamps may be considered to be in sequence. Events may also be in sequence; for example, turning down a thermostat and a room cooling are events that may occur in sequence with one another. A “sequential relationship” may refer to a relationship or association between things in sequence. For example, a sequential relationship between street lamps on the street may include, among other things, the distance between the street lamps. A sequential relationship between two events may include the time difference between them, or a correlation factor, a property describing how likely one event may be seen in sequence with another.
[0025] The term “network outcome” may refer to an outcome associated with a network. This may include outcomes associated with computer networks, such as the Internet or a distributed computing system. Examples of network outcomes may include “computer shutdown,” “increase in latency,” “reduction in throughput,” etc. Network outcomes may include measurable characteristics of a network. A network outcome associated with a negative experience or attribute may be referred to as a “network issue.”
[0026] A “data element” in a message may include any suitable data associated with the message. It may include actual content in a message or data relating to the circumstances surrounding the message. Examples of data elements may be transaction amounts, data relating to how data may have been entered into a client computer by a user, an originating IP address for a client computer, a device ID for the client computer, or any other suitable data.
BRIEF DESCRIPTION OF THE DRAWINGS

[0027] FIG. 1 shows a high-level diagram depicting a process for training and using a machine learning model.
[0028] FIG. 2 shows a depiction of a topological graph according to embodiments of the invention.

[0029] FIG. 3 shows a system block diagram of an automated fault detecting control system according to embodiments of the invention.
[0030] FIG. 4 shows a block diagram of a front end processing module according to embodiments of the invention.

[0031] FIG. 5 shows a block diagram of a model builder module according to embodiments of the invention.
[0032] FIG. 6 shows an example of a graph produced by a sequential graph learner according to embodiments of the invention.
[0033] FIG. 7 shows an example of graph reduction using an adaptive Prim’s algorithm according to embodiments of the invention.
[0034] FIG. 8 shows a block diagram of a rules module according to embodiments of the invention.
[0035] FIG. 9 shows a flowchart detailing a method for responding to impending network issues according to embodiments of the invention.

[0036] FIG. 10 shows a flowchart detailing a method for evaluating the recovery process of a processor computer according to embodiments of the invention.
[0037] FIG. 11 shows a sequence diagram detailing the transmission of messages between client computers, a server computer, and processor computers.
DETAILED DESCRIPTION

[0038] Embodiments include methods for routing and throttling messages sent from client computers to processor computers in order to improve the processing of data by those processor computers. Additionally, embodiments may include systems programmed to perform the above methods. In some embodiments of the invention, these systems may be referred to as an Automated Fault Detecting Control System (AFDCS). In some cases, the AFDCS may include a server computer.

[0039] Embodiments of the invention may be accomplished through a multi-stage artificial intelligence and signal processing model (AISPM) implemented through a number of modules and submodules in the AFDCS. Generally, the AISPM may perform two operations. The first is an “analysis,” where the AISPM may forecast “network outcomes” and risk scores associated with those network outcomes. The second is a “determination,” where the AISPM may determine an optimal course of action in order to prevent undesirable network outcomes from occurring, or cause current undesirable network outcomes to return to acceptable levels.
[0040] In some embodiments of the invention, the AFDCS, implemented as a server computer, may receive a plurality of messages from one or more client computers. The AISPM may continuously learn to identify patterns in the messages received from client computers. Further, the AISPM may learn to identify “motifs” (i.e., frequently occurring patterns). The AISPM may also form associations between patterns which sequentially follow one another.
[0041] Additionally, the AISPM may learn to identify different “network outcomes” associated with one or more processor computers. These network outcomes may be thought of as what a processor computer may “experience” during the course of its operation. These network outcomes may comprise a number of metrics, such as the response time or “latency,” the time it takes for a processor computer to process a request, risk scores, and/or the probability of processor computer failure over a predetermined time frame (such as one minute, five minutes, ten minutes, etc.).
[0042] Once the potential outcomes are determined, the AISPM may compare these outcomes against predetermined parameters. These parameters may define what constitutes an acceptable or unacceptable performance in terms of the network outcomes. As an example, a predetermined parameter could include a maximum latency of 400 milliseconds. This indicates to the AISPM that a measured network outcome greater than or equal to 400 milliseconds constitutes unacceptable
performance. Predetermined parameters may include quantitative or qualitative values pertaining to latency, message throughput, message transmission or response delay, other quality of service metrics, etc.

[0043] In some embodiments, the AISPM may also use a number of filtering techniques to remove statistical variances in network outcome measurements. By filtering out noise, the AISPM can produce better estimates of network outcomes.
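The comparison of measured network outcomes against predetermined parameters can be sketched as follows. Beyond the 400 millisecond latency example given above, the parameter names and thresholds below are illustrative assumptions, not values from the specification:

```python
# Hypothetical predetermined parameters: a maximum acceptable latency and
# a minimum acceptable throughput (the throughput figure is invented).
PREDETERMINED_PARAMETERS = {
    "max_latency_ms": 400.0,     # latency at or above this is unacceptable
    "min_throughput_mps": 100.0, # messages/second below this is unacceptable
}

def is_acceptable(outcomes: dict) -> bool:
    """Return True only if every measured network outcome meets its parameter."""
    if outcomes.get("latency_ms", 0.0) >= PREDETERMINED_PARAMETERS["max_latency_ms"]:
        return False
    if outcomes.get("throughput_mps", float("inf")) < PREDETERMINED_PARAMETERS["min_throughput_mps"]:
        return False
    return True

print(is_acceptable({"latency_ms": 120.0, "throughput_mps": 250.0}))  # True
print(is_acceptable({"latency_ms": 400.0, "throughput_mps": 250.0}))  # False
```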
[0044] The AISPM can learn to determine a risk score based on a comparison between the network outcomes and the predetermined parameters. This risk score is an assessment as to how likely one or more network outcomes will reach unacceptable levels, as defined by the predetermined parameters. This risk score informs the development of a “cutoff value,” or a level at which the AISPM believes it should act in order to prevent one or more network outcomes from reaching unacceptable levels.
The AISPM may learn based on experience how to establish a cutoff value and risk scores such that action is only taken when needed.
[0045] The AISPM may learn to associate network outcomes with identified patterns. This may be accomplished via frequentist or Bayesian inference. As an example, the AISPM may notice that a specific pattern of messages or message features is frequently followed by a specific network outcome. For example, a rapid series of messages all originating from the same source (such as during a DDoS attack) may result in a candidate processor server becoming unresponsive. The AISPM may determine that there is either a correlational or causational relationship between the specific pattern of messages and the specific outcome. Further, the AISPM may determine a “signal strength” associated with these patterns and outcomes. This signal strength relates to the likelihood that a specific network outcome will be observed in conjunction with a specific pattern.
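One simple way to estimate such a “signal strength” is a frequentist count of how often an outcome follows a pattern, i.e., an empirical estimate of P(outcome | pattern). The sketch below is an illustrative assumption, not the specification's actual inference procedure, and the pattern and outcome labels are invented:

```python
def signal_strength(history, pattern, outcome):
    """Estimate P(outcome | pattern) by frequentist counting over a
    history of (observed pattern, subsequent outcome) pairs."""
    pattern_count = sum(1 for p, _ in history if p == pattern)
    joint_count = sum(1 for p, o in history if p == pattern and o == outcome)
    return joint_count / pattern_count if pattern_count else 0.0

# Invented observation history: a burst of same-source messages was
# followed by an unresponsive processor in 2 of 3 occurrences.
history = [
    ("burst_same_source", "unresponsive"),
    ("burst_same_source", "unresponsive"),
    ("burst_same_source", "normal"),
    ("steady_traffic", "normal"),
]
print(signal_strength(history, "burst_same_source", "unresponsive"))  # ≈ 0.667
```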
[0046] Combining inferred associations of message patterns and meta-patterns, as well as associations between messages and network outcomes, can allow the AISPM to perform forecasting (i.e., the prediction of patterns or outcomes based on previously observed patterns or outcomes). This effectively allows the AISPM to observe sequences of incoming messages and forecast network outcomes based on those messages. Combined with risk assessment and the cutoff value, the AISPM determines an appropriate time to act in order to prevent undesirable network outcomes from occurring.

[0047] In some embodiments of the invention, when a risk score exceeds the established cutoff value, the AISPM may determine and enact one or more appropriate courses of action in order to prevent undesirable network outcomes. For example, a course of action may comprise routing network traffic from its original processor computer destination to another processor computer, rerouting network traffic to a new network location (such as another AFDCS or a router mediating message flow to a collection of processor computers), throttling network requests, or alerting a network administrator. In some embodiments, throttling network requests can include restricting the number of messages sent to a given processor computer.
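A minimal sketch of this cutoff logic might look like the following. The action names, cutoff value, and the margin used to choose between throttling and rerouting are all invented for illustration; the specification does not prescribe this particular policy:

```python
def choose_action(risk_score: float, cutoff: float) -> str:
    """Select a course of action once the forecast risk score is known.
    Below the cutoff, no intervention is needed; above it, escalate from
    throttling to rerouting as the risk grows (margin of 0.2 is invented)."""
    if risk_score < cutoff:
        return "route_normally"
    if risk_score < cutoff + 0.2:
        return "throttle_requests"   # restrict messages sent to the processor
    return "reroute_traffic"         # send traffic to another processor computer

print(choose_action(0.30, 0.50))  # route_normally
print(choose_action(0.55, 0.50))  # throttle_requests
print(choose_action(0.90, 0.50))  # reroute_traffic
```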
[0048] Using the models and techniques above, the AFDCS may determine how messages should be routed among a collection of processor computers, computers that can process messages or data elements within the messages. This may involve selecting a group of “candidate processor computers,” processor computers that are candidates to receive messages from the AFDCS, and “other processor computers,” computers that may receive messages if the candidate processor computers are unable to process those messages according to predetermined parameters. The AFDCS, which may be implemented as a server computer, may transmit messages to the candidate processor computers or reroute messages to the other processor computers based on the results of the determination.
[0049] Further, the AFDCS may generate subsequent messages based on the messages received from the client computers. These subsequent messages may be composed of any number of data elements or message features extracted from the messages received from client computers. These subsequent messages may be transmitted to different processor computers based on their contents. As an example, a message relating to a financial transaction may be transmitted to a candidate processor computer that is capable of processing financial transaction messages, whereas a message relating to an email may be transmitted to a candidate processor computer (such as an email server) that is capable of processing emails.
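Content-based dispatch of subsequent messages could be sketched as below; the message types and processor destination names are hypothetical placeholders:

```python
# Invented mapping from message content type to a candidate processor
# computer capable of handling that type.
PROCESSORS = {
    "transaction": "payment-processor.example",
    "email": "mail-processor.example",
}

def route(message: dict) -> str:
    """Pick a destination processor computer based on the message type,
    falling back to a default processor for unrecognized contents."""
    return PROCESSORS.get(message["type"], "default-processor.example")

print(route({"type": "transaction", "amount": 25.0}))  # payment-processor.example
print(route({"type": "chat"}))                         # default-processor.example
```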
[0050] In some embodiments of the invention, the AISPM may interact with and build on an existing set of rules that dictates appropriate courses of action associated with a given processor computer. The AISPM may determine the optimal course of action given the constraints imposed by the rules, historical data, and message forecasting. This may be accomplished with an evolutionary learning system, such as an “ant colony optimization algorithm,” which may converge on the optimal course of action over a number of evaluations of the rules, their interrelations, and courses of action.
[0051] Further, in some embodiments of the invention, the AISPM may monitor the recovery of a given processor computer. For example, the AISPM may adjust its course of action in order to alleviate undesirable network outcomes. For example, despite initially throttling messages to a processor computer, the processor computer may still experience undesirable network outcomes such as high latency. The AISPM may use a learning system, such as an artificial neural network, to evaluate the recovery process and establish a new course of action.
[0052] FIG. 1 shows a high-level diagram depicting a process for training and using a machine learning model. Process 100 starts with training data, shown as existing records 110. The training data can comprise various data samples, where each data sample includes input data and known or inferred output data. For an example data sample, the input data can be a current network outcome, and the output data can be a future network outcome correlated with that network outcome.
[0053] After training data is obtained, a learning process can be used to train the model. Learning module 120 is shown receiving existing records 110 and providing model 130 after training has been performed. As data samples include outputs known to correspond to specific inputs, a model can learn the type of inputs that correspond to which outputs, e.g., which network outcomes are correlated with one another. The training can determine errors between a predicted output of the model and the known or inferred output. These errors can be used to optimize parameters (e.g., weights) of the model so as to reduce the errors, thereby increasing accuracy.
[0054] Once model 130 has been trained, it can be used to predict the output for a new request 140 that includes new inputs. For instance, model 130 can determine whether a new network outcome predicts a future network outcome based on their correlation. Model 130 is shown providing a predicted output 150 based on new request 140. Examples of predicted output 150 may include a classification of a threat, a classification for the authentication of a device to access a resource (e.g., a building or a computer), or a classification of a user seeking a recommendation (e.g., a recommended response from a chat bot or a recommended restaurant or answer to another query). In this manner, the wealth of the training data can be used to create artificial intelligence that can be advantageously used for a particular problem.
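The train-then-predict flow of FIG. 1 can be illustrated with a deliberately tiny model: a linear model with one weight and a bias, fit by gradient descent on invented (current latency, future latency) pairs. This is a generic sketch of error-driven parameter updates, not the model the specification describes:

```python
# Invented training records: (current network outcome, future network outcome),
# here expressed as latencies in milliseconds.
records = [(100.0, 210.0), (150.0, 310.0), (200.0, 410.0)]

w, b = 0.0, 0.0   # model parameters (weights) to be optimized
lr = 1e-5         # learning rate
for epoch in range(2000):          # each full pass over the data is an epoch
    for x, y in records:
        err = (w * x + b) - y      # error between predicted and known output
        w -= lr * err * x          # adjust parameters to reduce the error
        b -= lr * err

# Use the trained model on a new request with a new input.
new_request = 175.0
predicted = w * new_request + b
print(round(predicted, 1))
```

The printed prediction lands near the trend of the training pairs, showing how reducing the per-sample error yields a model usable for new inputs.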
[0055] Machine learning may include supervised learning, unsupervised learning, and semi-supervised learning. In supervised learning, each input is related to a corresponding target output that provides feedback about an error of the model so that the training can increase an accuracy of the model in predicting an output. In such a training method, training data that matches sets of inputs to corresponding outputs is provided to the model.
[0056] However, in unsupervised learning, the data used to train a model primarily consists of inputs where the structure or relationship of the inputs to outputs is not readily known. The artificial intelligence model may be tasked with discovering the structure and relationships contained in the data. Although certain embodiments may refer to unsupervised learning, embodiments may be implemented for supervised, unsupervised, and semi-supervised models. Examples of unsupervised learning techniques include clustering, latent variable models, blind signal separation, and principal component analysis.
[0057] In semi-supervised learning, the data used to train a model may include unlabeled data in conjunction with labeled data. The labeled data may be data that may be used to fit the data to a known structure (i.e., supervised learning), while the unlabeled data may be used for further training and greater learning accuracy. Clustering may be performed using the unlabeled data, and each cluster may be labeled according to the known structure provided by the labeled data.
[0058] One form of machine learning is graph-based machine learning, which is based on graph topology. A topological graph G may comprise data for an information space represented by a set of nodes V. Each node may represent specific information for an event (e.g., transaction, recommendation, login attempt, etc.) or may represent specific information for a profile of an entity or object (e.g., user, login device, obstacle, etc.). The nodes may be related to one another by a set of edges, E. An edge may be described as an unordered pair composed of two nodes as a subset of the graph G = (V, E), where G is a graph comprising a set V of vertices (nodes) connected by a set of edges E.
[0059] An edge may be associated with a numerical value, known as a “weight,” that may be assigned to the pairwise connection between the two nodes. The edge weight may be identified as a strength of connectivity between two nodes and/or may be related to a cost or distance, as it often represents a quantity that is required to move from one node to the next. The value of an edge weight may be determined based on an actual measured quantity such as travel time, currency, computational cost, energy, etc.
[0060] For example, an edge weight may represent the length of a route, the capacity of a communication line, or the energy required to move from one state to the next. The connections between nodes, as defined by corresponding edges and weights, may be evaluated in order to determine optimal routes or movements, such as to reach an end goal in the most efficient manner. For example, the edges in a graph for a communication network can be evaluated to find an optimal route for delivering a message to all nodes using the least amount of resources or in the least amount of time (i.e., the traveling salesman problem). Various optimization techniques may be used to find optimal paths in a graph.
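One standard optimization technique for finding a minimum-cost path in a weighted graph is Dijkstra's algorithm. The sketch below uses an invented four-node graph; the specification does not prescribe this particular algorithm:

```python
import heapq

# A weighted topological graph G = (V, E) as an adjacency map.
# Node names and edge weights (costs) are invented for illustration.
graph = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 5},
    "D": {},
}

def shortest_cost(graph, start, goal):
    """Dijkstra's algorithm: total edge weight of the cheapest path
    from start to goal, or infinity if the goal is unreachable."""
    frontier = [(0, start)]   # priority queue of (cost so far, node)
    visited = set()
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == goal:
            return cost
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node].items():
            heapq.heappush(frontier, (cost + weight, neighbor))
    return float("inf")

print(shortest_cost(graph, "A", "D"))  # 4  (A -> C -> B -> D: 1 + 2 + 1)
```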
[0061] FIG. 2 shows a depiction of a topological graph. Graph 200 is composed of nodes labeled by letters (A through H). Each node, A-H, may represent a specific event, entity, or characteristic thereof. For example, node A may represent a user (i.e., user A) in a social network, and nodes B, J, and G may represent users who are friends with user A or may represent posts liked or events attended by user A. The nodes are linked to each other by edges labeled by associated connection weights.
[0062] For example, the weight for the connection between node A and node G is 1 (relatively weak), but the connection between node G and node E is 14 (relatively strong). This may, for example, represent a degree of commonality between two users of a social network. The connections in graph 200 may be evaluated in order to achieve a desired outcome. For example, an AI model may be created for suggesting friends in a social network, and a learning algorithm may use the information of graph 200 as training data for predicting which users likely know each other. For example, the AI model may recognize that B and H are both highly connected to node D, and may suggest user B as a friend to user H.
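The friend-suggestion idea can be sketched as a simple rule: suggest two users to each other when both are strongly connected to a common node. The specific edge weights below are invented (graph 200's figure does not give numeric weights for the B-D and H-D edges in the text):

```python
# Invented edge weights, echoing graph 200's structure: B and H are both
# strongly connected to D, while A's connection to G is weak.
edges = {
    ("B", "D"): 12, ("H", "D"): 11, ("A", "G"): 1, ("G", "E"): 14,
}

def weight(u, v):
    """Look up the weight of the undirected edge between u and v (0 if none)."""
    return edges.get((u, v)) or edges.get((v, u)) or 0

def suggest(u, v, threshold=10):
    """Suggest u and v to each other if they share a neighbor to which
    both are connected with weight >= threshold (threshold is invented)."""
    nodes = {a for pair in edges for a in pair}
    return any(weight(u, n) >= threshold and weight(v, n) >= threshold
               for n in nodes if n not in (u, v))

print(suggest("B", "H"))  # True: both strongly connected to node D
```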
[0063] The structure of the AFDCS according to embodiments of the invention may be better understood with reference to FIG. 3. The AFDCS may be composed of a number of modules, each enacting or contributing some part to the methods described above.
[0064] FIG. 3 shows the AFDCS 310 interfacing with a collection of client computers 302-304, a collection of processor computers 306-308, and an external database 330. The one or more processor computers 306-308 may be candidate processor computers or other processor computers that are capable of processing messages from the one or more client computers 302-304.
[0065] The AFDCS 310 may be in the form of a server computer comprising a data processor, and a computer readable medium. The computer readable medium may comprise code, executable by the data processor to implement a method comprising: receiving, by a server computer, a plurality of messages from one or more client computers; determining, by the server computer, at least one candidate processor computer; determining, by the server computer, if the at least one candidate processor computer can process data elements from the plurality of messages according to predetermined parameters using an analytical model built using a graph learning algorithm; if the server computer determines that one or more candidate processor computers of the at least one candidate processor computer can process the data elements according to predetermined parameters, then transmitting, by the server computer, the plurality of messages or other messages derived from the plurality of messages to the one or more candidate processor computers; and if the server computer determines that one or more candidate processor computers of the at least one candidate processor computer cannot process the data elements according to the predetermined parameters, then selecting, by the server computer, one or more other processor computers to process the data elements according to the predetermined parameters, and then transmitting, by the server computer, the plurality of messages or other messages derived from the plurality of messages to the one or more other processor computers for processing.
[0066] Specifically, the AFDCS may comprise a message queue 312, a processor rules database 314, a front end processing module 316, a throttling and routing module 318, a rules module 320, an event processing module 322, a model builder module 324, a short term history database 326, and a historical database 328.
[0067] The client computers 302-304 may comprise any number of devices capable of formulating and transmitting requests directly or via a network such as the internet or a cellular communications network. These may, for example, include laptop computers, desktop computers, smartphones, smartwatches, smart cars, videogame consoles, etc. These requests may interchangeably be referred to as messages.
These messages may be transmitted for any suitable purpose. For example, a message may be a web search request sent to processor servers 306-308. The search request may indicate that one or more client computers 302-304 want one or more processor computers 306-308 to process the search requests, produce the search results, and transmit the search results back to the respective client computers. In another case, this message may be a payment request, requesting that one or more processor computers enact a payment according to the contents of the message, for example, paying a merchant for goods and services. The messages may comprise a number of data packets adhering to any appropriate standard or protocol. For example, an individual message may comprise multiple Transmission Control Protocol (TCP) packets. These packets may possess a header, a source port field, a destination port field, a sequence number, an acknowledgement number, a checksum, and payload data, among others. The client computers may possess one or more single core or multi-core processors and one or more computer readable media coupled to those processors.

[0068] The processor computers 306-308 may comprise any number of devices capable of receiving messages and processing those messages. As a non-limiting example, a processor computer may comprise a web server enacting a search engine. The processor computers 306-308 may receive messages in the form of search requests, process those search requests and transmit messages back in the form of search results. As another example, the processor computer may be operating an encryption service, receiving encrypted messages from client computers 302-304 via the AFDCS 310, decrypting those messages and transmitting them back to the client computers 302-304.
Alternatively, the processor computers 306-308 may be a payment processor, processing payment requests transmitted from client computers 302-304 and enacting a payment between one or more entities associated with client computers 302-304.
[0069] The message queue 312 is a subsystem that receives and stores messages and responses received from the client computers 302-304 and processor computers 306-308. Additionally, the message queue 312 may release messages and may transmit them to their appropriate destination. The message queue may interface with two other systems, the processor rules database 314 and the front end processing module 316. The processor rules database 314 may dictate how messages are released and the destination of those messages. The front end processing module 316 may observe and extract message features from messages in the queue such that artificial intelligence models may learn from and make decisions based on those message features.
[0070] In some cases, the message queue 312 may not transmit the messages in the message queue 312. Instead, the message queue may generate subsequent messages based on the messages in the message queue and transmit those messages instead.
[0071] The message queue 312 may not necessarily be a first in first out data structure like a traditional queue. Instead, messages may reside on the message queue, which may be implemented as special hardware and/or software such as an allocated range of memory addresses. The messages are released from the message queue 312 according to the rules established in the processor rules database 314. In some cases, a message may be intentionally held in the message queue 312 in order to throttle or reduce the rate at which messages are sent to the processor computers 306-308.
[0072] The processor rules database 314 can be a database containing a number of rules which may govern how messages are released from the message queue 312 and transmitted to the processor computers 306-308. The processor rules database 314 may store and provide these rules to the message queue 312 so they may be enacted. The processor rules database 314 may interface with the throttling and routing module 318.
[0073] The processor rules database 314 may take on any appropriate structure, such as a relational or non-relational database. The rules may be stored in the database in any appropriate form or structure. For example, the rules could be stored in the form of a decision tree, an abstract data structure comprising a number of parent and child nodes, beginning at the root node and ending at a collection of leaf nodes. Each non-leaf node in the tree may correspond to a determination or evaluation and the leaf nodes may correspond to actions. The processor rules database may have processor rule entries associated with each processor computer among processor computers 306-308.
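For example, such a rule decision tree may be sketched in Python as follows. This sketch is purely illustrative and not part of the disclosed embodiments; the feature names ("processor_load", "latency_ms"), thresholds, and actions are hypothetical:

```python
# Illustrative sketch: routing rules stored as a decision tree.
# Non-leaf nodes hold a test on message features; leaf nodes hold an action.

def evaluate_rule_tree(node, features):
    """Walk the tree from the root until a leaf (an action) is reached."""
    while "action" not in node:
        branch = "yes" if node["test"](features) else "no"
        node = node[branch]
    return node["action"]

# Hypothetical tree: throttle an overloaded processor, reroute on high
# latency, otherwise route to the candidate processor computer.
rule_tree = {
    "test": lambda f: f["processor_load"] > 0.8,
    "yes": {"action": "throttle"},
    "no": {
        "test": lambda f: f["latency_ms"] > 400,
        "yes": {"action": "reroute"},
        "no": {"action": "route_to_candidate"},
    },
}

print(evaluate_rule_tree(rule_tree, {"processor_load": 0.9, "latency_ms": 100}))  # throttle
print(evaluate_rule_tree(rule_tree, {"processor_load": 0.2, "latency_ms": 100}))  # route_to_candidate
```

Each non-leaf node applies a determination to the extracted message features, and traversal ends at a leaf whose action the message queue 312 may then enact.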
[0074] The front end processing module 316 may be better understood with reference to FIG. 4. In FIG. 4, the front end processing module 404 is shown interfacing with a number of external modules and databases, such as the external database 402, the short term history database 406, the historical database 412, the event processing module 414, the rules module 416, the model builder module 418 and the message queue 420. The front end processing module 404 may comprise two submodules, a data normalization module 408 and a data orchestration module 410. Each external component shown in FIG. 4 may correspond to similarly named components in FIG. 3, i.e., short term history 406 may correspond to short term history 326 from FIG. 3.
[0075] In general terms, the front end processing module 404 may prepare data so that it may be used by other modules. This may be data retrieved from the external database 402, the short term history database 406, or the historical database 412. This may also be data extracted from messages in the message queue 420. This may be accomplished through the data normalization module 408 and the data orchestration module 410. Once the data is prepared for further use, it may be transmitted to the event processing module 414, the rules module 416, and the model builder module 418, which may perform their respective functions. Additionally, the front end processing module 404 may monitor packets received from the processor computers in order to extract network outcomes associated with those packets.
[0076] The external database 402 may contain message related data which may be used as features to train one or more artificial intelligence models. In addition, the message related data may include data that is used by later modules in order to identify patterns or make decisions. As an example, the data from the external database 402 may include weights, values which are used to amplify or attenuate the effect of different message related features. The weights stored in the external database 402 may cause the models to pay more attention to certain features of incoming messages, and less attention to other features. The data stored in the external database 402 may be in any suitable form and stored in any suitable data structure or database, such as a relational or non-relational database.
[0077] Message features may constitute any information which may be inferred from a message or the data packets comprising that message. These features could include the source port or source address of a data packet, the time the message was received, the time the message was sent, the number of hops that the message encountered, etc. Message features may also constitute "sub port" classifications, such as the URL at which the message originated, the host, or the Multipurpose Internet Mail Extensions (MIME) type. Message features may also include traceroute information, such as the internet nodes which the message was sent through, and may also include the mean hop time. One skilled in the art may conceive of any number of message features or dimensions that may be extracted or inferred from received messages.
[0078] Message features may also comprise statistical measurements, such as velocity features like the response time standard deviation, the minimum communication time, the maximum communication time, the spread, the number of messages sent to a particular processor computer over a given time period, etc.
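As an illustration, such velocity features may be computed over a window of observed response times. The following sketch is hypothetical and not part of the disclosed embodiments; the window contents and feature names are illustrative:

```python
import statistics

def velocity_features(response_times_ms):
    """Compute simple statistical ('velocity') features over a window of
    response times observed for one processor computer."""
    return {
        "count": len(response_times_ms),
        "min": min(response_times_ms),                     # minimum communication time
        "max": max(response_times_ms),                     # maximum communication time
        "spread": max(response_times_ms) - min(response_times_ms),
        "std_dev": statistics.pstdev(response_times_ms),   # response time std deviation
    }

feats = velocity_features([120, 130, 150, 110, 190])
print(feats["spread"])  # 80
```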
[0079] The short term history database 406 may contain records relating to messages received in the immediate or near immediate past. In some instantiations, this may mean messages received in the last ten or fifteen minutes, or over the last twenty-four hours. The short term history database 406 may store sets of features associated with messages received over that time period. The short term history database 406 may also store network outcomes recorded over the same time period. These network outcomes may include a variety of features or metrics which may be used to estimate or infer the state or performance of a number of processor computers. The short term history database 406 may include the identity of one or more processor computers associated with their respective network outcomes, as well as stored copies of received messages.
[0080] Examples of network outcomes may include response time, packet loss (the difference between the number of packets sent to the processor computer and the packets received), bit rate (the number of bits received or processed per unit time), throughput (the maximum message processing rate), availability (the portion of time a system is in a functioning condition), etc. Network outcomes may also include the probability of network or processor computer failure over a time period, such as 1 minute, 5 minutes, or 10 minutes. These are intended to be non-limiting examples; one skilled in the art may conceive of any number of network outcomes which may be inferred or measured.
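For example, several of these network outcomes may be derived from raw counters collected over a measurement window. The following sketch is illustrative only; the counter names and figures are hypothetical:

```python
def network_outcomes(sent, received, bits_received, window_s, uptime_s):
    """Derive a few example network outcomes from raw counts observed
    over a measurement window of window_s seconds."""
    return {
        "packet_loss": (sent - received) / sent,   # fraction of packets lost
        "bit_rate": bits_received / window_s,      # bits received per second
        "availability": uptime_s / window_s,       # fraction of window in service
    }

out = network_outcomes(sent=1000, received=990, bits_received=8_000_000,
                       window_s=60, uptime_s=60)
print(out["packet_loss"])  # 0.01
```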
[0081] The data normalization module 408 may extract and prepare data so that it may be used by other modules. This may include the data orchestration module 410 and the event processing module 414. This may involve extracting message features from messages received from the external database 402, the short term history 406, the historical database 412, or the message queue 420. It may also involve initially processing the data such that it can be used by other modules. This may involve separating out message features into groups. The data normalization module 408 may comprise a series of data normalizers.

[0082] The data normalization module 408 may also perform some data analysis. For example, the data normalization module 408 may evaluate message features. As one non-limiting example, the data normalization module 408 may evaluate the source address associated with a received message. If the source address corresponds to a blacklisted address, the data normalization module 408 may not process the message further, and may transmit an indication to the message queue flagging the message for deletion.
[0083] Data normalization may involve normalizing data such that it lies in a specific range. As an example, rather than using latency in milliseconds, latency could be measured as a ratio of an important latency figure. For example, if a predetermined parameter associated with latency equals 400 milliseconds, each measured network outcome could be divided by 400. Provided that the normalized data is within the range 0 to 1, the network outcome does not exceed the predetermined parameter. This may be useful for fuzzy variable analysis.
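The latency example above may be sketched as follows, with 400 milliseconds as the predetermined parameter (a minimal illustration; the threshold value is taken from the example above):

```python
def normalize_latency(latency_ms, threshold_ms=400.0):
    """Express a measured latency as a ratio of the predetermined parameter.
    A result in [0, 1] means the parameter has not been exceeded."""
    return latency_ms / threshold_ms

print(normalize_latency(200))  # 0.5 -> within the predetermined parameter
print(normalize_latency(600))  # 1.5 -> predetermined parameter exceeded
```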
[0084] The data orchestration module 410 may extend the functions performed by the data normalization module 408, such that the data may be better used to train the rules module 416 and the model builder module 418. This may include "deduping," removing repeated or identical data points from the data set. It may also involve converting data from one form to another, e.g., converting data expressed as a 32 bit unsigned integer to a 64 bit signed integer. There may be a number of other ways in which data may be orchestrated for future processing.
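For example, "deduping" may be performed by discarding any data point that has already been seen while preserving the order of first occurrence (an illustrative sketch; the record contents are hypothetical):

```python
def dedupe(records):
    """Remove repeated or identical data points, keeping first-seen order."""
    seen, out = set(), []
    for record in records:
        if record not in seen:
            seen.add(record)
            out.append(record)
    return out

print(dedupe([("msgA", 120), ("msgB", 90), ("msgA", 120)]))
```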
[0085] The historical database 412 may include historical data related to messages and message features. This may include data similar to the data stored in the short term history database 406, but over a longer timescale. While the historical database 412 may include individual messages and message features, it may also include data which deals with broader trends in message features over time. As an example, the historical database 412 could contain statistical measures of message features rather than the features themselves. A number of message features collected throughout a given time period, such as a day, could be averaged and stored in the historical database 412.

[0086] To summarize, the front end processing module 404 may collect data from an external database 402, a short term history 406, and a historical database 412.
Additionally, the front end processing module may collect data from messages in the message queue, and normalize and orchestrate that data such that it can be used by other modules. After normalizing the data, including deleting repeat records or restructuring the data, the data normalization module 408 may transmit the data to the event processing module 414, which may then evaluate the data in order to determine whether to throttle or reroute one or more messages in the message queue, or alert a network administrator. The data normalization module 408 may also transmit data to the data orchestration module 410, which further processes the data so that it may be used by two learning modules, the rules module 416 and the model builder module 418.
[0087] Returning to FIG. 3, the front end processing module 316 may transmit various data to a number of other modules. These may include the model builder module 324, which may use data from the front end processing module in order to learn to identify patterns in message features which may be used by the event processing module 322 in order to determine if the system should throttle or reroute messages in the message queue 312, or alert an administrator. The model builder module 324 may be better understood with reference to FIG. 5.
[0088] FIG. 5 shows a system block diagram showing the model builder module 502, the event processing module 504 and the data orchestration module 514. These modules are substantially similar to similarly named modules in FIG. 3, i.e., the event processing module 504 corresponds to the event processing module 322 from FIG. 3, etc.
[0089] In general, the model builder module 502 may build the artificial intelligence model used by the system in order to identify when to throttle or reroute messages received by the system or alert a network administrator. The model builder module 502 may comprise at least four submodules, a Kalman filter parameter database 506, a motif database 508, a short term profile neural network database 510, and a model builder 512.

[0090] The model builder 512 may learn from the data provided by the data orchestration module 514 and determine the parameters or defining characteristics for three models: an ensemble of unscented Kalman filters, a "motif" lookup, and a short term profile neural network. These parameters or defining characteristics may be stored in their respective databases, the Kalman filter parameter database 506, the motif database 508, and the short term profile neural network database 510. These parameters may be used by the event processing module 504 in order to evaluate messages in the message queue and determine if the messages should be throttled or rerouted, or a network administrator should be notified in order to prevent undesirable network outcomes.
[0091] The following subsections will discuss the nature of the models used and how they might be generated by the model builder 512.

i. Kalman Filter
[0092] A Kalman filter is a special class of filter. A filter is a device or abstraction which removes undesired components from a signal. The Kalman filter outputs estimates of variables that tend to be more accurate than estimates generated by a single measurement alone. For example, a train on a straight track may use a Kalman filter to determine its position. The train has two sensors, a velocity sensor and a GPS. The velocity sensor can be used to estimate the location of the train using physical models, i.e., Newton's laws of motion. The GPS can be used to measure the position of the train. However, both sensors have noise associated with them. As such, a position estimate based on the velocity sensor may not reflect the actual position of the train, and likewise for the GPS. The Kalman filter can use both sensors' data in order to more accurately determine the train's location.

[0093] Kalman filters, particularly unscented Kalman filters, may be better understood with reference to Wan, Eric A., and Rudolph Van Der Merwe. "The unscented Kalman filter for nonlinear estimation." Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. AS-SPCC. The IEEE 2000. IEEE, 2000.

[0094] The Kalman filter works in two steps, a prediction step and an update step. In the prediction step, the Kalman filter produces estimates of the current position of the train. This is accomplished by using the train's measured velocity, the time since the last measurement, and the previous position of the train. In the update step, the Kalman filter estimates the position of the train using its GPS. The Kalman filter combines these estimates using a weighted average, where more weight is given to measurements with greater certainty.
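The prediction and update steps may be illustrated with a minimal one-dimensional linear Kalman filter for the train example. This sketch is purely illustrative; the noise values and sensor readings are hypothetical, and the embodiments described below use unscented, adaptive, and ensemble variants rather than this basic form:

```python
def kalman_step(x, p, velocity, dt, gps_pos, q=0.1, r=4.0):
    """One predict/update cycle of a 1-D linear Kalman filter:
    x = position estimate, p = estimate variance,
    q = process noise variance, r = GPS measurement noise variance."""
    # Prediction step: advance the position using the measured velocity.
    x_pred = x + velocity * dt
    p_pred = p + q
    # Update step: blend the prediction with the GPS reading. The Kalman
    # gain k gives more weight to whichever source has the smaller variance.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (gps_pos - x_pred)
    p_new = (1 - k) * p_pred
    return x_new, p_new

x, p = 0.0, 1.0
for gps in (10.2, 19.8, 30.1):   # noisy GPS readings taken 1 second apart
    x, p = kalman_step(x, p, velocity=10.0, dt=1.0, gps_pos=gps)
print(round(x, 1), round(p, 3))
```

After three cycles the position estimate settles near the true position (about 30 units down the track), with a variance smaller than either sensor alone would provide.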
[0095] Kalman filters perform well with linear state transition models and observation models. They do not perform as well with highly non-linear models. An unscented Kalman filter uses the "unscented" transform in order to sample a set of "sigma points" from around the mean of a measurement. The sigma points are used to produce new mean and covariance estimates, which are more accurate than estimates used in a traditional Kalman filter.
[0096] An adaptive Kalman filter is a Kalman filter where the state transition model or observation model can be adapted to provide more accurate estimates. How well a Kalman filter performs can be estimated by determining an error value. This error value describes the difference between the predicted state of the system (i.e., where the filter says the train is) and the actual state of the system (where the train actually is).
[0097] These errors may arise when either the state transition model or observation model is inadequate. For example, the state transition model describing the motion of the train may be only a function of the train’s velocity and time. This model doesn’t consider other factors, such as friction and drag, which may cause the train to lag behind its predicted position.
[0098] An adaptive Kalman filter modifies its models to minimize or bound its error. This may be accomplished through covariance matching, wherein the covariance of the filter's "innovation sequence" is matched to the covariance obtained from the filter itself. If the covariance of the innovation sequence is greater than the covariance obtained from the Kalman filter estimates, then the assumed covariance of the noise in the system model Q and the variance of the noise in the measurement model R may be increased in order to reduce the difference between the two covariances.

[0099] In short, an adaptive Kalman filter further increases accuracy by modifying its state transition model and observation model to minimize its error. This may be accomplished by covariance matching.
[0100] An ensemble Kalman filter is a Kalman filter especially adapted to problems with large numbers of variables. As the number of variables increases, it becomes computationally infeasible to maintain the covariance matrix used by a traditional Kalman filter. Rather than using a covariance matrix, the ensemble Kalman filter uses a sample covariance generated from an ensemble, a collection of state vectors which are used to represent the state space of the system.

[0101] Thus, an ensemble adaptive unscented Kalman filter is a Kalman filter which makes use of the unscented transform to provide better estimates of state variables in non-linear systems, adapts its state transition and observation models to improve estimate quality, and uses an ensemble in order to allow the Kalman filter to scale to a large number of variables.

[0102] The ensemble of adaptive unscented Kalman filters may be used in embodiments to forecast network outcomes, such as processor computer failure probabilities. The filter may use a state transition model that describes how network outcomes will change with respect to incoming message features, and an observation model which records network outcomes associated with the processor computers. The state transition model is expected to have some noise because it is not a perfect model of how the processor computers react to messages. Additionally, any number of unknown variables influence the performance of the processor computers. Since these variables cannot be easily accounted for, their contribution to the state transition model signal is treated as noise. Likewise, the observation model cannot account for random fluctuations in network outcomes. These random fluctuations are likewise treated as noise.
[0103] The result is a smoother forecast of a sequence or series of network outcomes than might be obtained from the state transition model or the observation model alone.

[0104] Returning to FIG. 5, the model builder 512 determines a set of parameters describing the ensemble adaptive unscented Kalman filter and stores those parameters in the Kalman filter parameter database 506, such as the ensemble state vectors, state transition model, observation model, covariance of the noise in the system model, variance of the noise in the observation model, etc.
[0105] At a later time, or concurrently, the event processing module 504 may retrieve these parameters from the Kalman filter parameter database 506 in order to generate a Kalman filter which it may use to forecast network events, such as processor server failure or a spike in latency. These forecasts may be used by the event processing module 504 in order to identify signatures of upcoming system failures.

ii. Sequential Graph Learner
[0106] Another model generated by the model builder 512 may be a sequential graph learner. A sequential graph learner is a model that breaks down sequences of events into a branching graph that describes those sequences of events. As an example, a node in the graph, representing a specific event, may have a directed edge pointing toward one or more nodes representing other related events.
[0107] As an analogy, consider a simple dictionary comprising three words, "apple," "adapt," and "applaud." Among these words, only 7 unique letters are used (A, P, L, E, D, T, and U). Each letter in each word may be thought of as an event, and the word itself as a sequence of events. A sequential graph may possess a node for each letter in each sequence, that is, a node for A, P, L, E, D, T, and U. The edges between the nodes describe how the letters follow each other in sequence: in "apple," the letter P follows the letter A; in "adapt," the letter D follows the letter A, and the letter P follows the letter A; in "applaud," the letter P follows the letter A. Thus the "A" node may be connected by a directed edge pointing to the P and D nodes. Likewise, the P node will point toward the L node and the T node, the L node will point towards the E node and the A node, the E node won't point toward any node, the D node will point toward the A node, the T node won't point toward any node, and the U node will point toward the D node.

[0108] The weight on any given edge may be proportional to the frequency of that sequence. For example, D only follows A in "adapt," whereas P follows A in every other word in the dictionary. A weight of 0.67 or 67% could be assigned to the edge pointing from A to P, and a weight of 0.33 or 33% could be assigned to the edge pointing from A to D. This reflects the fact that in two out of three A sequences in the dictionary, P follows A, and in one out of three A sequences in the dictionary, D follows A. Likewise, the weight of the edge may be derived from the entropy associated with the sequence of events, or the number of events branching off from each event.
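The letter graph above may be sketched as a simple transition-frequency graph. This illustrative sketch counts every individual letter transition in each word (rather than the per-word tally used in the text), so the exact weights differ slightly from the 0.67/0.33 figures above:

```python
from collections import defaultdict

def build_sequence_graph(sequences):
    """Build a directed graph over single events. Each edge weight is the
    fraction of times the source event was followed by the target event."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):   # every adjacent pair of events
            counts[a][b] += 1
    graph = {}
    for node, successors in counts.items():
        total = sum(successors.values())
        graph[node] = {b: n / total for b, n in successors.items()}
    return graph

g = build_sequence_graph(["apple", "adapt", "applaud"])
print(sorted(g["a"].items()))  # [('d', 0.2), ('p', 0.6), ('u', 0.2)]
```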
[0109] A more robust graph learner may construct a graph wherein each node may not represent a single event, but rather a collection of events. In the case above, this could mean an“AP” node, in addition to the A and P nodes.
[0110] The sequential graph learner may be better understood with reference to FIG. 6, which shows a graph 600, two input event sequences 602, four graph "layers" 604, 606, 608, and 610, and a collection of event nodes, some of which comprise multiple events.
[0111] Since both event sequences [A, B, C, D] and [A, B, C, E] begin with event A, the first layer 604 of the graph is a node representing that event. On the next layer 606, there are two nodes, one corresponding to the composite event AB, and one corresponding to the event B. Composite events are useful to describe events which typically happen in sequence. As an example, an event such as "answering the door" will often follow "ringing the doorbell." As each of these events follow event A in both input event sequences, A has a directed edge pointing to each of these events.
[0112] The third layer 608 of the sequential graph has three nodes, the first corresponding to event ABC, the second corresponding to event BC, and the third corresponding to event C. Each node in the second layer 606 points to nodes in the third layer 608 based on the input event sequences.
[0113] The composite event ABC exists in both sequences and follows its sub- event AB, thus the AB node has a directed edge pointing toward the ABC node.
Likewise, the AB node has a directed edge pointing toward the C node. The B node has a directed edge pointing toward the BC node and the C node, as each event could follow event B. Notably, the B node does not point to the ABC node, as the B node does not contain event A. Likewise, the AB node does not point to the BC node, as the BC node does not contain event A.
[0114] Layer four 610 is much larger than the preceding layers because the event sequences diverge. One event sequence ends with event D, the other event sequence ends with event E. This means that each node on the third layer may point toward four distinct nodes. ABC, for example, points toward ABCD, ABCE, D, and E. BC points towards BCD, BCE, D, and E, and C points towards CD, CE, D, and E. This sequence graph completely describes how events follow each other given the two input event sequences, both for individual events and composite events comprising two or more events. The model builder 512 may create this graph structure as described above, working through each event sequence and adding event nodes and edges until the entire graph is completed. The model builder 512 may also assign a weight to each edge, which may correspond to the likelihood that one event follows another event sequentially, or an entropy level.
[0115] A motif may refer to a sequence of nodes connected by directed edges in the event sequence diagram. With reference to FIG. 6, (A, B, C, and D) may be considered to be a motif. Likewise (A, AB, ABC, and ABCD) may also be considered to be a motif.
[0116] An event may refer to message features or a transition between message features extracted from messages in the queue. For example, the system may receive a message from Norway, followed immediately by a message from New York. The sequential graph learner may construct the graph such that a "receive a message from Norway" node has a directed edge pointing at a "receive a message from New York" node. Because it is unlikely that there is any causation or correlation between these two events, the weight of the edge between the Norway and New York nodes may be assigned a low value. In many cases, an event may comprise a message feature that normally wouldn't be considered an "event," such as the standard deviation of response time or other message features.

[0117] The model builder may identify motifs in the graph based on the strength of the association between two events in sequence. For example, a given node in the event sequence graph may have an edge pointing toward 100 different nodes, as a result of a number of different event sequences in which that first event was sequentially followed by those 100 different event nodes. If there were no correlation between the first event and a subsequent event, it follows that the weights of the edges pointing at the 100 different nodes would be random, small (near 0.01), and roughly equal. If an edge has a weight much greater than the expected weight (for example, 0.5), it follows that there is a likely correlation between the first event and the particular subsequent event connected to the first event by that edge. This may be an indication of a motif.
[0118] The model builder 512 may traverse the graph and evaluate the relative weight and relative cardinality of each node and its connecting nodes and compare them to a predetermined threshold in order to determine if that sequence of nodes constitutes a motif. For example, the weighting threshold could be 0.2, indicating that any connection between nodes with weight 0.2 or greater is a motif. These sequences of events with strong associations with one another can be stored in a database (such as the motif database 508) as motifs.
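The thresholding step may be sketched as follows, using the 0.2 weighting threshold from the example above (the graph contents are hypothetical; in practice chains of qualifying edges would be assembled into motifs and stored in the motif database):

```python
def find_motif_edges(graph, threshold=0.2):
    """Collect every directed edge whose weight meets the threshold;
    chains of such strongly associated edges are candidate motifs."""
    return [(a, b, w)
            for a, successors in graph.items()
            for b, w in successors.items()
            if w >= threshold]

# Hypothetical weighted event graph.
graph = {"A": {"B": 0.9, "X": 0.05}, "B": {"C": 0.8}}
print(find_motif_edges(graph))  # [('A', 'B', 0.9), ('B', 'C', 0.8)]
```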
[0119] Once a motif has been identified, it can then be matched to one or more network outcomes, which may also be stored in a database in association with their respective motifs. These may be network outcomes that are observed after a motif is observed. For example, after the motif "ABCD" is observed from a message or sequence of messages in the message queue, the network outcome "latency increased by 50 ms" may be observed. This network outcome could be associated with the motif. Much like the strength of the sequential relationship between events could be evaluated by the graph learner, the signal strength between a motif and its corresponding network outcomes may be evaluated. For example, during thousands of instances of receiving the motif "ABCD," the latency may have increased by 50 ms exactly once. This suggests that there isn't a strong connection between the motif and a 50 ms increase in latency. Additionally, the motif may be associated with risk scores or probabilities.
These risk scores or probabilities may include the probability of a specific network outcome, such as "the probability that packet loss increases to 20%" or "the probability of processor computer failure over 10 minutes."
[0120] The model builder 512 may work through the motif database and organize it by the strength of association between motifs and network outcomes. The model builder can look for short motifs with strong sequences and strong signal to noise ratios. The sequential graph learner may also prune motifs without strong associations from the database.
[0121] During real-time scoring, the event processing module 504 may identify a motif from messages in the message queue. It may then query the database in order to find network outcomes associated with that motif. The event processing module then knows network outcomes that may occur as a result of the messages in the message queue. These network outcomes may be used in order to determine a risk level. For example, if an identified motif has a strong association with an undesirable network outcome (e.g., processor computer failure), that indicates a high risk, whereas if an identified motif has a weak association with a benign network outcome (e.g., a negligible increase in latency), that may correspond to a low risk. The network outcomes may also be compared against predetermined parameters (such as parameters relating to response time, packet loss, latency, etc.) as part of the analysis.

iii. Short-term Profile Neural Network

[0122] The model builder 512 may also train a short term profile neural network.
This neural network may take in a number of message features, network outcomes, risk scores, or probabilities. It may use these data to produce a number of subsequent risk scores or probabilities. These risk scores may be based on the likelihood of a network outcome occurring and the seriousness of that network outcome. For example, an outcome such as processor computer failure having a high likelihood of occurring corresponds to a high risk level, and an outcome such as a mild increase in latency having a low likelihood of occurring corresponds to a low risk level. [0123] The concept and use of artificial neural networks for forecasting may be better understood with reference to Park, Dong C., et al. "Electric load forecasting using an artificial neural network." IEEE Transactions on Power Systems 6.2 (1991): 442-449.
[0124] A neural network may be a computer system or computer abstraction comprising an input layer, a number of hidden layers, and an output layer. Each layer may comprise a number of neurons. These neurons may be connected by "synapses" to other neurons on subsequent layers. A single neuron may have a number of neurons connected to it from the previous layer. Each connected neuron may send that single neuron a signal. The single neuron may perform some computation using each of those signals and produce an output based on those computations. For example, a neuron may have a state variable between zero and one, where zero indicates that the neuron is inactive, and one indicates the neuron is active. The single neuron may receive, via the synapses, the state variables of each of the neurons that "feed into it" and may perform a linear combination using those state variables.

[0125] To continue the example, a neuron E may be connected to neurons A, B, C, and D. The state of E, SE, may be a linear combination of the states of A, B, C, and D, i.e., SE = WA·SA + WB·SB + WC·SC + WD·SD, where WA is a learned weight associated with neuron A, and likewise for WB, WC, and WD.
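The linear combination in paragraph [0125] can be illustrated with a short sketch. The particular weight and state values below are made up for illustration; a real network would learn the weights during training.

```python
# Minimal sketch of the single-neuron computation SE = WA*SA + WB*SB + WC*SC + WD*SD.
def neuron_state(states, weights):
    """Linear combination of input neuron states and their learned weights."""
    return sum(w * s for w, s in zip(weights, states))

states = [1.0, 0.0, 0.5, 0.25]   # states of neurons A, B, C, D (0 = inactive, 1 = active)
weights = [0.4, 0.3, 0.2, 0.1]   # learned weights WA .. WD (illustrative)
s_e = neuron_state(states, weights)  # 0.4*1.0 + 0.3*0.0 + 0.2*0.5 + 0.1*0.25 = 0.525
```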
[0126] The learning process of a neural network as described above could involve determining the weights W over a learning phase such that the output of the neural network (the output risk score) is closest to the expected output of the neural network given the training data. As an example, one set of network features may be found to correlate with a particular network outcome which has a particular risk score, a second set may be found to correlate with a different network outcome with a different risk score, and so on for a number of sets of network features and risk scores which comprise the training data set. The neural network may be trained so that the difference between the risk scores it estimates from input message features and the expected risk scores is minimized, i.e., the risk scores it predicts are reasonably close to the risk scores associated with historical message features.

[0127] The weight parameters, number of neurons, synaptic connections, and other parameters which describe the neural network may be stored in the short-term profile neural network database 510. The event processing module 504 may use these parameters to construct an instance of the neural network, and input into the neural network message features extracted from messages in the message queue in real time. The neural network may then produce a risk score, which may be used by the event processing module 504 to determine if messages in the message queue need to be throttled or rerouted, or if an administrator needs to be contacted or notified in order to prevent an undesirable network outcome.
[0128] Additionally, the short term profile neural network may continually rebuild itself using residuals from prior runs. Residuals refer to the difference between an observed value and a predicted value. In essence, the risk score predicted by the neural network may be compared against the "actual" risk score produced by evaluating measured network outcomes as they occur. These residuals can be used to further train the neural network, allowing it to become more accurate and better attuned to the information space as time progresses. This allows the neural network to adapt to a changing feature or information space.
[0129] These residuals may be processed using an auto-regressive moving average model in order to filter out noise. The auto-regressive moving average model states that a time series X (in this case the residuals as a function of time) can be described as the sum of an autoregressive component, a moving average component, and white noise error terms. By passing the residuals through an auto-regressive moving average filter, the residual signal is smoothed, filtering out the white noise error terms.
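One simple way to realize such a filter combines an autoregressive term (a fraction of the previous smoothed value) with a moving-average term over recent raw residuals. This sketch is illustrative only, with assumed coefficients, and is not a fitted ARMA model.

```python
def smooth_residuals(residuals, phi=0.5, window=3):
    """Smooth a residual series by mixing an autoregressive term
    (phi * previous smoothed value) with a moving average of the last
    `window` raw residuals. White-noise spikes are damped because each
    output blends past smoothed values with a local average instead of
    tracking any single noisy observation."""
    smoothed = []
    for t, r in enumerate(residuals):
        ma = sum(residuals[max(0, t - window + 1):t + 1]) / min(t + 1, window)
        prev = smoothed[-1] if smoothed else r
        smoothed.append(phi * prev + (1 - phi) * ma)
    return smoothed

noisy = [0.1, 0.12, 0.9, 0.11, 0.1]  # one white-noise spike at index 2
clean = smooth_residuals(noisy)      # the spike at index 2 is damped
```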
[0130] To conclude, the model builder module 502 may comprise a Kalman filter parameters database 506, a motif database 508, a short-term profile neural network database 510, and a model builder 512. The model builder 512 may receive training data from the data orchestration module 514 and generate three models via the three learning processes described above. These models may include an ensemble adaptive unscented Kalman filter, a sequential graph learner, and a short-term profile neural network. The parameters which describe these models may be stored in each of their respective databases. The event processing module 504 may retrieve these parameters from the databases in order to evaluate message features extracted from messages in real time. It may generate a risk score or risk scores in order to determine when the system needs to throttle or reroute messages in the message queue, or alert an administrator in order to prevent an undesirable network outcome.
[0131] Returning to FIG. 3, the event processing module 322 may receive data from the front end processing module 316 and evaluate that data using the models provided by the model builder module 324. Based on this evaluation, the event processing module 322 may generate a signal and transmit it to the rules module 320. This signal indicates to the rules module 320 whether or not it needs to throttle messages, reroute messages, or alert an administrator about impending network issues. This data may comprise message features extracted from messages in the message queue 312 by the front end processing module 316. It may also comprise data and/or message features extracted from the short term history 326, the historical database 328, and the external database 330.
[0132] In effect, the event processing module 322 may use incoming network features in conjunction with the sequential graph learner in order to identify motifs or patterns of events in the message features. These patterns may be matched to network outcomes, risk scores, and probabilities. These network outcomes, risk scores, or probabilities, along with the message features, may be passed to the short-term profile neural network in order to produce risk scores or probabilities associated with network outcomes. These risk scores and probabilities may be passed to the ensemble adaptive Kalman filter, which may filter out noise from the assorted data and summarize the output of the short term profile neural network. This summarization may be compared against a risk threshold in order to determine if the system should throttle, reroute, or alert an administrator.
[0133] The event processing module 322 may optionally further use an ensemble adaptive Prim's algorithm in order to determine a risk threshold. When the risk score produced by the event processing module 322 exceeds the risk threshold determined with the ensemble adaptive Prim's algorithm, the event processing module 322 may transmit a signal to the rules module 320 which indicates to the rules module 320 that it should either throttle network traffic, reroute messages in the message queue 312, or alert a network administrator.
[0134] Prim's algorithm produces a minimum spanning tree from a graph or subgraph: a tree connecting all nodes using the set of edges with the minimum total weight. The algorithm picks an arbitrary starting node and adds it to the tree structure. It then scans all edges connected to that node and finds the edge with the least weight. This edge and the connecting node are added to the tree. Next, Prim's algorithm scans all edges attached to the tree, that is, all edges attached to the first or second node that connect to nodes not in the tree. Once it finds the edge with minimum weight, that edge and its connecting node are added to the tree. This is repeated until all nodes in the graph or subgraph are in the minimum spanning tree. Prim's algorithm may be better understood with reference to Greenberg, Harvey J. "Greedy algorithms for minimum spanning tree." University of Colorado at Denver (1998).
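The steps above can be sketched as a short program. The graph representation (a dictionary mapping each node to its weighted edges) and the example weights are assumptions for illustration; this is a standard heap-based formulation of Prim's algorithm, not the patent's implementation.

```python
import heapq

def prim_mst(graph, start):
    """Prim's algorithm: start from an arbitrary node and repeatedly add the
    minimum-weight edge that connects a node not yet in the tree.
    `graph` maps node -> list of (weight, neighbor) pairs."""
    in_tree = {start}
    tree_edges = []                      # (weight, node) pairs added to the tree
    frontier = list(graph[start])        # edges leaving the tree, ordered by weight
    heapq.heapify(frontier)
    while frontier:
        weight, node = heapq.heappop(frontier)
        if node in in_tree:
            continue                     # edge leads back into the tree; skip it
        in_tree.add(node)
        tree_edges.append((weight, node))
        for edge in graph[node]:
            heapq.heappush(frontier, edge)
    return tree_edges

# Illustrative undirected graph (each edge listed from both endpoints).
graph = {
    "A": [(1, "B"), (4, "D"), (3, "G")],
    "B": [(1, "A"), (2, "D")],
    "D": [(4, "A"), (2, "B"), (5, "G")],
    "G": [(3, "A"), (5, "D")],
}
mst = prim_mst(graph, "A")
total_weight = sum(w for w, _ in mst)    # minimum total edge weight
```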
[0135] In a graph comprising a number of subgraphs, a minimum spanning tree can be used to simplify the graph by combining the nodes in a subgraph into a single node, or by removing additional edges. Graph analysis algorithms can be time intensive, and reducing the number of nodes or edges can decrease the amount of time they take to complete. By using Prim's algorithm, a smaller graph can be produced that approximates the structure of the larger graph.
[0136] Two examples of simplifying a graph using Prim's algorithm are shown in FIG. 7. Graph 700A shows a graph with a subgraph consisting of nodes A, B, D, G, and J. The minimum spanning tree between these nodes is shown with the dashed lines. Graph 700B shows the result of removing all edges in the subgraph that were not part of the minimum spanning tree: the edge between G and J and the edge between B and D.
[0137] Graph 702A shows a subgraph comprising nodes C, E, F, H and I. The minimum spanning tree is shown using dashed lines. Graph 702B shows the tree collapsed into a single node. This could be accomplished using a modified Prim’s algorithm that makes use of thresholding. The modified Prim’s algorithm could proceed as normal, but never add an edge with weight over the threshold value. This enables the tree to grow through a graph until it can no longer grow, at which point the tree is collapsed into a single node. This is useful in applications where a graph exhibits several isolated subgraphs. The weight between nodes in the subgraphs may be very low compared to the weight of the edges connecting the subgraphs. In calculations where high weight is important, the subgraphs can effectively be reduced into single nodes, vastly simplifying the graph.
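The modified, thresholded Prim's algorithm described above might be sketched as follows. The graph, the weights, and the threshold value are illustrative assumptions; the sketch returns the set of nodes the tree reached, i.e., the isolated subgraph that would then be collapsed into a single node.

```python
import heapq

def prim_with_threshold(graph, start, threshold):
    """Modified Prim's algorithm: grow the tree as usual, but never add an
    edge whose weight exceeds `threshold`. The returned node set is the
    subgraph that would be collapsed into a single node.
    `graph` maps node -> list of (weight, neighbor) pairs."""
    in_tree = {start}
    frontier = list(graph[start])
    heapq.heapify(frontier)
    while frontier:
        weight, node = heapq.heappop(frontier)
        if node in in_tree or weight > threshold:
            continue                     # skip visited nodes and over-threshold edges
        in_tree.add(node)
        for edge in graph[node]:
            heapq.heappush(frontier, edge)
    return in_tree

# Low-weight cluster {C, E, F} joined to H only by high-weight edges.
graph = {
    "C": [(1, "E"), (2, "F"), (10, "H")],
    "E": [(1, "C"), (1, "F")],
    "F": [(2, "C"), (1, "E"), (9, "H")],
    "H": [(10, "C"), (9, "F")],
}
cluster = prim_with_threshold(graph, "C", threshold=5)  # grows until it can no longer grow
```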
[0138] An ensemble of adaptive Prim's algorithms is an arrangement in which multiple Prim's algorithms run concurrently and adapt to the dataset. For example, 100 nodes could be sampled from the graph. The average weight of the edges connected to each of these nodes could be determined in order to establish a threshold. For example, if the average weight is 10, the threshold could be set to twice that value, or 20. This is adaptive because as the graph changes over time (such as variations in average edge weight as more nodes are added to the graph), the threshold will change to reflect the changes in the graph. Prim's algorithm could be applied at each of the 100 nodes, growing a minimum spanning tree at each node until no nodes can be added with weight less than the threshold, 20. At this point, each minimum spanning tree could be collapsed into a single node. This greatly reduces the size of the graph, while still retaining its general structure.
[0139] Using this technique, a graph, such as the motif graph produced by the model builder module 324 may be reduced into a smaller graph that can be used to quickly determine general properties about the graph. These general properties can be used to infer expected network outcome risk and infer a risk threshold.
[0140] As stated above, nodes in the motif graph may include events or sequences of events which are associated with network outcomes. Nodes that are connected with one another may be seen in tandem or in sequence. For example, a node could represent the event "received a message from an IP address in New York," and another nearby node could represent the event "received a message from an IP address in Vermont." If these nodes were combined into a single node, that single node would roughly reflect the event "received a message from an IP address in the northeast United States," much like a graph of animals could collapse a "cat" node and a "dog" node into a "pet" node.
[0141] With the reduced graph, an average or representative risk level can be determined much more quickly than with the full motif graph. An abnormal network event or undesirable network outcome could be defined as a network outcome with risk score one standard deviation greater than the representative or average risk score.
Thus, when incoming message factors, data, or event sequences are evaluated using the models, the event processing module 322 could compare the produced risk score to this threshold or cut-off value, and transmit a signal to the rules module 320 if the risk score exceeds the threshold value, the signal indicating that the rules module 320 should throttle messages, reroute messages, or alert a network administrator. The adaptive ensemble Prim's algorithm can be used to generate a set of rules that may be used by the rules module 320 in deciding whether to throttle messages, reroute messages, or alert a network administrator.

[0142] The AFDCS may use this cut-off value as part of the decision to transmit messages to a group of candidate processor computers or a group of other processor computers. More specifically, the AFDCS may compare the cut-off value to a "trigger value" derived from current or forecasted network outcomes. If the trigger value does not exceed the cut-off value, the AFDCS may presume that network outcomes are normal or near normal, and may not determine whether the candidate processor computers can process data elements in messages received from client computers.
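The cut-off described above, one standard deviation above the average or representative risk score, can be computed in a few lines. The risk score values are illustrative assumptions.

```python
from statistics import mean, stdev

def risk_threshold(risk_scores):
    """Cut-off value as defined above: one (sample) standard deviation
    above the average risk score from the reduced graph."""
    return mean(risk_scores) + stdev(risk_scores)

scores = [0.2, 0.3, 0.25, 0.35]     # illustrative representative risk scores
cutoff = risk_threshold(scores)     # mean 0.275 + stdev ~0.0645 -> ~0.34
is_abnormal = 0.9 > cutoff          # a produced risk score of 0.9 would be flagged
```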
[0143] The rules module 320 may be better understood with reference to FIG. 8, which shows the rules module 802 and three submodules, the rules engine 806, the behavior tree 808, and the evolutionary learner 810, in addition to three connecting modules, the throttling and routing module 804, the front end processing module 812, and the event processing module 814.
[0144] The rules module 802 may generate and apply rules which govern whether the system throttles messages, reroutes messages, or alerts a network administrator. The rules module 802 may act when it receives a signal or indication from the event processing module 814, in which case it may evaluate the rules stored in its behavior tree 808, given the risk score provided by the event processing module 814.
[0145] The evolutionary learner 810 may receive data from the front end processing module 812 and use it to determine an optimal behavior tree. This may be accomplished using an evolutionary learning algorithm, such as ant colony optimization.
[0146] An ant colony optimization algorithm is a probabilistic technique used to solve graph based computational problems. For example, ant colony optimization can be used to find the shortest path between two points. Ant colony optimization may also be used to make an optimal determination, e.g., the best (shortest, longest, highest scoring, etc.) determination that can be made given what is known. This is
accomplished by using a number of computational agents, known as "ants."
[0147] In the example above, the ants each start at the starting node. Each ant selects a connected edge and traverses it to the connected node. The ants select different connecting edges based on a priori information. For example, if the ants are presented with two initial connecting edges, one with weight one and the other with weight two, the ants may preferentially pick the edge with weight one, e.g., the ants select the edge with weight one with probability 66%, and select the edge with weight two with probability 34%. The ants repeat the process at each node they visit, wandering semi-randomly until they arrive at the target node.

[0148] This process is repeated a number of times. However, in each subsequent run, the probabilities associated with each edge in the graph change. This emulates pheromones in real ant movement strategies. As an ant wanders, it leaves a trail of pheromones. When the ant finds a food source, it collects the food and travels along its own pheromone trail back to the nest. All ants preferentially follow pheromones over wandering around. As multiple ants travel back and forth along the same path, they deposit more and more pheromones, which in turn further strengthens the pheromone trail. In this way, once an ant scouts a food source, the nest can quickly relocate that food source and collect all the food.

[0149] Similarly, the computational agents or ants in the ant colony optimization lay down pheromones along the edges they travel. These pheromones increase the relative probability of future ants selecting those edges. Over time, each trail may "evaporate" some amount, decreasing its probability relative to the other edges.
Additionally, the strength of each trail, or conversely, the amount it evaporates, may be proportional to the effectiveness of the path between the start node and the end node. As an example, if the ants are searching for the shortest distance between two nodes, a path with total length 10 is more attractive than a path with total length 20. As a result, the path with total length 10 has more pheromones deposited along it in a single run than the path with length 20. This increases the relative desirability of the path of length 10 over the path of length 20. Over large numbers of trials, the ants will eventually converge on an optimal or near-optimal solution.
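The shortest-path example in paragraphs [0147]-[0149] can be sketched as a toy ant colony optimization. The graph, the pheromone-update constants, and the deposit rule (amount proportional to 1/path-length) are illustrative assumptions rather than the patent's specific formulation.

```python
import random

def ant_colony_shortest_path(graph, start, end, n_ants=50, n_rounds=30,
                             evaporation=0.5, seed=0):
    """Toy ant colony optimization for a shortest path.
    `graph` maps node -> {neighbor: edge_length}. Pheromone on each edge
    biases edge selection; shorter completed paths deposit more pheromone."""
    rng = random.Random(seed)
    pheromone = {(u, v): 1.0 for u in graph for v in graph[u]}
    best_path, best_len = None, float("inf")
    for _ in range(n_rounds):
        completed = []
        for _ in range(n_ants):
            node, path, visited = start, [start], {start}
            while node != end:
                choices = [v for v in graph[node] if v not in visited]
                if not choices:
                    break  # dead end; abandon this ant
                # Prefer edges with strong pheromone and short length.
                weights = [pheromone[(node, v)] / graph[node][v] for v in choices]
                node = rng.choices(choices, weights=weights)[0]
                path.append(node)
                visited.add(node)
            if node == end:
                completed.append(path)
        for key in pheromone:
            pheromone[key] *= evaporation          # trails evaporate each round
        for path in completed:
            length = sum(graph[u][v] for u, v in zip(path, path[1:]))
            if length < best_len:
                best_path, best_len = path, length
            for u, v in zip(path, path[1:]):
                pheromone[(u, v)] += 1.0 / length  # shorter paths deposit more
    return best_path, best_len

# Illustrative directed graph: A->B->D has length 2, A->C->D has length 6.
graph = {"A": {"B": 1, "C": 5}, "B": {"D": 1}, "C": {"D": 1}, "D": {}}
best_path, best_len = ant_colony_shortest_path(graph, "A", "D")
```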
[0150] With embodiments of the invention, an ant colony optimization algorithm may be used to determine the optimal course of action given all data and rules. As an example, the rules which dictate the course of action taken (i.e., throttle, reroute, alert) may be expressed as a graph or tree-like structure, such as a decision tree.
[0151] A decision tree is a branching structure of nodes and edges which is used to determine what decision should be made. A tree is typically made up of parent nodes and child nodes. Each parent node points along an edge toward each of its child nodes. The originating node, that is, the node with no parents, is called the "root node." A node with no children is called a leaf node. In a decision tree, each non-leaf node may be an evaluation, and each leaf node may be an action.
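Evaluating such a tree, where non-leaf nodes are evaluations and leaf nodes are actions, might look like the following sketch. The node names, tests, thresholds, and action labels are hypothetical, chosen only to illustrate the traversal.

```python
# Hypothetical decision tree: non-leaf nodes carry a test and two branches,
# leaf nodes carry an action. All names and thresholds are illustrative.
TREE = {
    "root":       {"test": lambda msg: msg["risk_score"] > 0.5,
                   "yes": "high_risk", "no": "low_risk"},
    "high_risk":  {"test": lambda msg: msg["ip_in_range"],
                   "yes": "reroute", "no": "alert_admin"},
    "low_risk":   {"action": "continue normal operation"},
    "reroute":    {"action": "reroute messages"},
    "alert_admin": {"action": "alert a network administrator"},
}

def evaluate(tree, msg, node="root"):
    """Walk from the root node: evaluate each non-leaf node's test and follow
    the matching branch until a leaf node (an action) is reached."""
    current = tree[node]
    if "action" in current:                       # leaf node
        return current["action"]
    branch = "yes" if current["test"](msg) else "no"
    return evaluate(tree, msg, current[branch])

decision = evaluate(TREE, {"risk_score": 0.8, "ip_in_range": True})
```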
[0152] Although less common, decision graphs are also possible, which may work in substantially the same way. An evaluation node may have a directed edge pointing to another evaluation node or a decision node.
[0153] Each edge in such a decision graph could correspond to a weight related to the efficacy of that path. For example, a start node might evaluate "whether the message originated from the eastern or western hemisphere," and may have two branching edges corresponding to the eastern and western hemispheres, respectively. The graph may be traversed much like the decision tree until eventually a decision node is reached, which may indicate something like "throttle messages by 15%." However, this does not take into account the efficacy of that decision. Perhaps throttling messages by 15% is ineffective in that case, and does not adequately prevent an undesirable network outcome. In that case, the weight along the edges of the path from the start node to the decision node may reflect that. As an example, the weight along an "effective" edge may be greater than the weight along an "ineffective" edge.
[0154] This allows the ant colony to determine the optimal course of action. Each ant could select edges in the decision graph probabilistically, with greater probability assigned to edges with a higher initial weight, and greater probability assigned to edges with a strong pheromone trail. As a result, the ants could travel from the starting node to any number of decision nodes, eventually producing the best (i.e., the most effective at preventing undesirable network outcomes) paths to those decision nodes. The optimal determination produced by the ant colony optimization algorithm may indicate if at least one candidate processor computer can process data elements according to predetermined parameters.
[0155] This collection of paths may be converted into a decision tree, which may be stored by the evolutionary learner 810 in the behavior tree database 808. The rules engine may evaluate the decision tree using the information provided to it by the event processing module 814, i.e., risk scores, message features, etc.
[0156] For example, if the first node in the decision tree evaluates "whether or not a message was within a certain IP address range," the rules engine 806 may check the IP range of the message and proceed to the next node depending on whether or not the message was in the range. It continues until it arrives at a leaf node, which will contain a decision of whether or not to throttle, reroute, or alert a network administrator. This decision, along with any additional data received from the front end processing module 812, is provided to the throttling and routing module 804.
[0157] Returning to FIG. 3, the throttling and routing module 318 may use a neural network in order to update the set of processor rules stored in the processor rules database 314. This neural network may be trained based on historical network outcomes. It may receive current network outcomes and the decision produced by the rules module 320 and determine, if necessary, the throttling level needed using a double exponential smoothing algorithm.
[0158] These current network outcomes may be received as responses, or inferred from responses transmitted from processor computers, including candidate processor computers and other processor computers. For example, the AFDCS could transmit a message to a processor computer and time how long it takes for the processor computer to respond. From this, the AFDCS could determine the current network outcome associated with response time. This network outcome could be applied as the input to the neural network, which may result in a "risk score," a score associated with the risk of a bad network outcome. This risk score can be used by the AFDCS to determine whether to transmit messages to candidate processor computers or other processor computers.
[0159] Exponential smoothing is a technique for smoothing time series data. It is effective in removing high-frequency noise from time series data. The smoothed series is calculated based on the current value of the time series and all prior values.
However, as time progresses, exponentially less weight is given to each prior value in the time series. As a result, an exponentially smoothed series is usually a better reflection of the time series than a moving average, but still effectively smooths the series.
[0160] Double exponential smoothing is an extension of exponential smoothing. Exponential smoothing may not work well when there is a trend in the data, i.e., if the values of the time series tend to increase or decrease over time. Double exponential smoothing includes a trend or slope component in addition to the data component, which is also exponentially smoothed. The combination of the smoothed trend and the smoothed data component better reflects the data if there is a trend. Further, double exponential smoothing allows for prediction of future smoothed time series values based on the trend component.
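Double exponential smoothing as described above (often attributed to Holt) maintains a smoothed level and a smoothed trend, and can forecast future values from the trend component. The smoothing constants and response-time values below are illustrative assumptions.

```python
def double_exponential_smoothing(series, alpha=0.5, beta=0.5):
    """Double (Holt's) exponential smoothing: exponentially smooth both the
    data component (level) and the trend component. Returns the smoothed
    series plus the final level and trend; `level + k * trend` forecasts
    k steps ahead."""
    level, trend = series[0], series[1] - series[0]
    smoothed = [level]
    for value in series[1:]:
        prev_level = level
        level = alpha * value + (1 - alpha) * (level + trend)   # smoothed data
        trend = beta * (level - prev_level) + (1 - beta) * trend  # smoothed trend
        smoothed.append(level)
    return smoothed, level, trend

# Illustrative response times (ms) with an upward trend plus noise.
response_times = [100, 102, 101, 105, 107, 110, 112]
smoothed, level, trend = double_exponential_smoothing(response_times)
forecast_next = level + trend   # predicted next response time
```

A positive final `trend` corresponds to the "trend of increasing response time" discussed below, which the throttling and routing module may use to raise the throttle rate.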
[0161] The throttling and routing module 318 may use exponential smoothing to determine a throttle value. As time passes, the throttling and routing module 318 may periodically receive network outcomes and other data from the rules module 320. These network outcomes may constitute a time series. Network outcomes, particularly response time, may be smoothed to produce a smoothed network outcome time series along with a smoothed trend, removing random fluctuations in the network outcomes. [0162] Further, by using double exponential smoothing, the throttling and routing module 318 may predict whether or not those network outcomes, such as response time, will increase or decrease as time passes. This enables the throttling and routing module to adjust the throttle level as needed.
[0163] For example, the throttling and routing module 318, based on the data from the rules module 320, may have assigned an initial throttle level to messages transmitted to a particular processor computer, such as 20%, meaning the processor computer receives 20% fewer messages.
[0164] However, the smoothed response time series may show a trend of increasing response time, meaning that despite the 20% throttle rate, the response time is expected to increase. The throttling and routing module 318 may determine based on this trend that the throttle rate should be increased, and may assign a throttle rate of 30% to messages sent to that particular processor computer. If the throttling and routing module 318 continually increases the throttling rate, and it seems to have no effect on the response time, it may decide to reroute messages to other processor computers, rather than further increase the throttle rate.
[0165] If, on the other hand, the smoothed response time series shows a trend of decreasing response time, the throttling and routing module 318 may reduce the throttle rate, allowing more messages through. Eventually, if a cessation criterion is met, the throttling and routing module 318 may reduce the throttle rate to zero. [0166] In some cases, a value or values from the smoothed network outcome time series or smoothed response time series may be referred to as a performance score.
[0167] In summary, a combination of a neural network trained on historical network outcomes and a double exponential smoothing algorithm allows the throttling and routing module 318 to determine a throttle rate, increase or decrease the throttle rate, or reroute messages to an alternative processor computer. These decisions are translated into a set of processor rules, associated with each processor computer among processor computers 306-308, and stored within the processor rules database 314.
[0168] This analytical model enables the AFDCS to determine a throttle rate according to predetermined parameters and transmit messages to candidate processor computers or other processor computers at the throttle rate. For example, if the throttling and routing module determines a throttle rate of 20%, messages may be transmitted 20% less frequently than normal message transmission, or at 20% of the normal rate of message transmission.
[0169] The short term history database 326 includes messages, message features, and network outcomes associated with the immediate or near immediate past. This could, for example, include periods of time such as within the last second, within the last 10 seconds, within the last minute, last 10 minutes, last 15 minutes, last 30 minutes, last hour, last day, etc. The short term history database 326 may have data retrieved from it by the front end processing module 316, which may process that data and forward it to the rules module 320, the event processing module 322, and/or the model builder module 324.
[0170] The historical database 328 may include messages, message features, and network outcomes associated with the past. It may also include averages or other measures of central tendency, such as an average value associated with one or more message features or network outcomes. The historical database may include data collected over periods of hours, days, months, or years. Like the short term history 326, the historical database 328 may have data pulled from it by the front end processing module 316, which may process the data and forward it to the rules module 320, the event processing module 322, and/or the model builder module 324.
[0171] The external database 330 may include messages, message features, network outcomes, or other data. This data may have been generated by another system such as another automated fault detecting control system 310. This data may be used in order to influence the operation of the automated fault detecting control system 310, in order to achieve better or more efficient performance, or such that it learns and develops machine learning models such as the ensemble adaptive Kalman filter, sequential graph learner, short-term profile neural network, or throttling and routing neural network more efficiently. The external database may have data pulled from it by the front end processing module 316, which may process the data and forward it to the rules module 320, the event processing module 322, or the model builder module 324.
[0172] A method according to some embodiments may be understood with reference to the system description above and FIGs. 9 and 10. FIG. 9 displays an overview of a method used to evaluate message features and network outcomes and throttle or reroute messages, or alert an administrator. FIG. 10 displays a method employed by the throttling and routing module to adjust the throttle rate or reroute messages to other processor computers.
[0173] Referring to FIG. 9, in step 902, the system may receive one or more messages or requests from one or more client devices or computers. These messages or requests enter the message queue in the automated fault detecting control system. The automated fault detecting control system may be embodied by a server computer in some embodiments of the invention. Each message or request may be associated with one or more candidate processor computers that is/are initially intended to process the message or request.
[0174] The method may also include determining, by the server computer, at least one candidate processor computer. The server computer may determine if the at least one candidate processor computer can process data elements from the plurality of messages according to predetermined parameters using an analytical model built using a graph learning algorithm.
[0175] If the server computer determines that one or more candidate processor computers of the at least one candidate processor computer can process the data elements according to predetermined parameters, then the server computer transmits the plurality of messages or other messages derived from the plurality of messages to the one or more candidate processor computers. If the server computer determines that one or more candidate processor computers of the at least one candidate processor computer cannot process the data elements according to the predetermined parameters, then the server computer selects one or more other processor computers to process the data elements according to the predetermined parameters, and then transmits the plurality of messages or other messages derived from the plurality of messages to the one or more other processor computers for processing. The data elements in the messages in the plurality of messages may include any suitable data in those messages. Examples of data elements may be transaction amounts, data relating to how data may have been entered into a client computer by a user, an originating IP address for a client computer, a device ID for the client computer, or any other suitable data.
[0176] In some embodiments, for each message, the server computer generates two or more subsequent messages with the data elements for transmission to two or more other processor computers, and transmits the two or more subsequent messages to the one or more other processor computers. For example, a server computer such as the AFDCS may receive a single transaction message which may contain a transaction amount, a merchant name, a credit card number, an ID for the client computer that generated the message, data relating to how data was entered into the client computer, and an IP address for the client computer. Two or more subsequent messages may be derived from the initial message received by the server computer and may be sent to two or more processor computers, respectively. For example, one processor computer may be specifically adapted to determine if the transaction is authorized by evaluating the transaction amount. Another processor computer may be specifically adapted to determine if the IP address is coming from an authentic source. Yet another processor computer may be specifically adapted to determine if the data entered into the client computer is consistent with fraudulent behavior (e.g., the speed and number of mouse clicks). Three subsequent messages may be sent to these three processor computers, where the three subsequent messages are derived from a single message that was received by the server computer. [0177] For example, at step 904, the front end processing module of the automated fault detecting control system (which may be a server computer) may evaluate a received message and extract relevant message features or data elements from that message. It may also receive network outcomes, message features, or any additional data from the short-term history database, the historical database, or the external database. Further, it may normalize or orchestrate all data before providing it to the event processing module.
[0178] At step 906, the event processing module may generate a risk score and/or anticipated network outcome for a particular candidate processor computer based on the data received from the front end processing module. This may involve the use of one or more machine learning models, such as an ensemble adaptive unscented Kalman filter, a sequential graph learner, and/or a short-term profile neural network. It may involve identifying motifs from the message features and determining network outcomes or risk scores associated with those motifs, using the network outcomes, risk scores, and message features as an input to the short term profile neural network, producing risk scores and probabilities, and removing noise and combining the risk scores and probabilities using the ensemble adaptive unscented Kalman filter. Any
combination of these techniques may form the analytical model that can be used to determine if a candidate processor computer can process one or more data elements in the message from the client computer.
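The combination step may be illustrated as follows. The embodiments above describe an ensemble adaptive unscented Kalman filter; the sketch below substitutes a much simpler inverse-variance weighted average as a stand-in, purely to illustrate noise-aware combination of per-model risk scores. The score and variance values are hypothetical.

```python
def combine_risk_scores(scores, variances):
    """Combine per-model risk scores, down-weighting noisier estimates.

    Inverse-variance weighting: a simplified stand-in for the ensemble
    adaptive unscented Kalman filter combination described in the text.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total

# Hypothetical outputs of three models (motif lookup, neural network,
# forecast), each with an illustrative noise estimate (variance).
combined = combine_risk_scores([0.82, 0.74, 0.90], [0.04, 0.01, 0.09])
```

The lowest-variance score dominates the combined estimate, which remains bounded by the individual model scores.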
[0179] At step 908, the event processing module may determine whether or not there is an impending network issue. In some embodiments, this may be based on an ensemble Prim's algorithm. An impending network issue may comprise a predicted undesirable network outcome, such as an increase in latency or processor computer failure. This may be accomplished by evaluating the risk score and comparing it to a cutoff or threshold value established using the ensemble of adaptive Prim's algorithms. If the risk score exceeds the cutoff value, the event processing module may transmit all appropriate data, including message features and network outcome to the rules module, and the process flow may proceed to step 910. Otherwise, if the risk score does not exceed the cutoff value, and no impending network issue is expected, process flow may proceed to step 912. [0180] At step 912 the system may continue normal operation, given that there is no impending network issue to address. This may comprise transmitting the current message to a pre-determined processor computer, and then returning to step 902 to receive and evaluate another message. [0181] At step 910 the rules module may evaluate whether a network issue is imminent. This may substantially involve evaluating a decision tree, and determining whether or not to throttle messages, reroute messages, or alert a network administrator. In effect, the rules module may determine to reroute messages or alert a network administrator if a network issue is imminent, and may throttle messages otherwise. If the rules module determines that a network issue is imminent, process flow may proceed to step 914; otherwise, process flow may proceed to step 916. In either case, all necessary data, including network outcomes and message features may be transmitted to the throttling and routing module.
[0182] At step 914, the rules module may decide whether or not to reroute network traffic or alert an administrator. In this case, the rules module may pass its decision to the throttling and routing module which may update the associated processor rules in the processor rules database. The message queue may use these rules in order to switch the destination of messages sent to an associated processor, or may generate and transmit a message to an administrator. [0183] At step 916, the throttling and routing module may evaluate whether or not messages sent to a processor computer or processor computers are currently being throttled. If messages are currently being throttled, the process flow may proceed to step 922, otherwise process flow may proceed to step 918.
[0184] At step 918, the throttling and routing module may evaluate the received data and determine any next steps, e.g., whether or not to throttle or reroute messages in the message queue.
[0185] At step 920, the throttling and routing module may evaluate whether or not to throttle messages based on its evaluation of the rules and decision produced by the rules module and all associated data. If the throttling and routing module decides to throttle messages, process flow may proceed to step 922, otherwise process flow may proceed to step 924.
[0186] At step 924, the throttling and routing module, using the rules and decisions produced by the rules module may either decide to continue as usual (i.e., route the message to its predetermined destination, and receive another message at step 902), or reroute messages to an alternative processor computer.
[0187] At step 922, the throttling and routing module determines the throttle level. This process is discussed below with reference to FIG. 10.
[0188] At step 926, after determining the appropriate throttle level, the throttling and routing module may update the processor rules in the processor rules database, such that the message queue can throttle or reroute messages in order to prevent undesirable network outcomes.
[0189] A process performed by the throttling and routing module can be understood with reference to FIG. 10. [0190] At step 1002, the throttling and routing module may receive message features, risk scores, short-term history, and any decisions from the rules module.
[0191] At step 1004, the throttling and routing module may determine whether or not a particular processor computer or processor computers associated with the received data is currently being throttled. If so, process flow proceeds to step 1006, otherwise process flow proceeds to step 1008.
[0192] At step 1006, the throttling and routing module may determine whether or not network outcomes have improved. This may involve evaluating the results of the double exponential smoothing algorithm and determining whether or not the smoothed response time series and its associated trend are reducing or increasing in value. If the network outcomes have not improved, process flow may proceed to step 1014.
Otherwise, if network outcomes have improved, process flow may proceed to step 1008. [0193] At step 1008, the throttling and routing module may determine whether or not network outcomes are within performance boundaries. This may involve evaluating the results of the double exponential smoothing algorithm and determining whether the current or projected value of the smoothed response time series is less than a particular performance boundary. As an example, if the performance boundary is 300
milliseconds, and the current or projected value of the smoothed response time series is less than 300 milliseconds, the throttling and routing module may determine that the response time network outcome is within the performance boundary, in which case process flow may proceed to step 1010, otherwise process flow may proceed to step 1016.
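The performance boundary check described above may be sketched using Holt's linear (double exponential) smoothing. The smoothing constants, sample response times, and the 300 millisecond boundary are illustrative assumptions; the embodiments above do not specify particular constants.

```python
def double_exponential_smoothing(series, alpha=0.5, beta=0.3):
    """Holt's linear (double exponential) smoothing.

    Returns the final smoothed level and trend of the series.
    """
    level = series[0]
    trend = series[1] - series[0]
    for x in series[1:]:
        last_level = level
        # Smooth the level, then update the trend from the level change.
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
    return level, trend

response_times_ms = [310, 305, 298, 290, 284]   # observed response times
level, trend = double_exponential_smoothing(response_times_ms)
projected = level + trend                        # one-step-ahead projection
within_boundary = projected < 300                # 300 ms performance boundary
```

A negative trend indicates that the smoothed response time series is reducing in value, i.e., that network outcomes are improving.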
[0194] At step 1010, the throttling and routing module may determine whether or not the network outcomes, such as response time, have met the cessation criteria.
This criteria, boundary, or predetermined parameter may define an acceptable network outcome. A processor computer achieving this network outcome may not need to be throttled. For example, the cessation criteria may be a response time less than 150 milliseconds. In this case, if the current or projected value of the smoothed response time series is less than 150 milliseconds, the cessation criteria has been met. If the cessation criteria has been met, process flow may proceed to step 1018, otherwise process flow may proceed to step 1020.
[0195] The difference between a "cessation criteria" and a performance boundary may be understood with reference to the following examples and description.
[0196] A cessation criteria generally refers to the criteria under which throttling may be stopped. As an example, a given network outcome may have an expected or mean value and a standard deviation. The average response time for a particular candidate processor computer may be 100ms, with a standard deviation of 20ms. The cessation criteria for that network outcome and that particular candidate processor computer may be 120ms or less, indicating that once the response time is within one standard deviation of the expected response time, throttling may stop.
[0197] A performance boundary generally refers to a level of performance associated with an improvement in network outcomes. Ideally, network outcomes would improve instantaneously. In reality, network outcomes improve over a period of time. A performance boundary may be used in order to adjust the throttle rate such that network outcomes improve at an acceptable rate. As an example, at some initial time, the response time associated with a given candidate processor computer may be 1000ms. The throttling and routing module may evaluate the response time associated with the candidate processor computer every five minutes. A first performance boundary may be set at 800ms. If the response time reduces to 800ms or less within the five minute increment, the network outcome (response time) is within the
performance boundary. If the response time does not reduce within the five minute increment, the network outcome is not within the performance boundary, and the throttling may be increased as a result. Essentially, a performance boundary relates to acceptable performance improvements while messages are being throttled and a cessation criteria relates to a performance level at which throttling may be stopped.
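The distinction between the cessation criteria and the performance boundary may be sketched with a hypothetical helper function. The three outcome labels and the use of one standard deviation above the mean response time as the cessation criteria mirror the examples above but are otherwise assumptions.

```python
def throttling_decision(smoothed_ms, mean_ms, std_ms, boundary_ms):
    """Return 'stop', 'hold', or 'increase' throttling (illustrative)."""
    cessation = mean_ms + std_ms        # e.g., 100 + 20 = 120 ms
    if smoothed_ms <= cessation:
        return "stop"       # cessation criteria met: throttling may end
    if smoothed_ms <= boundary_ms:
        return "hold"       # improving within the performance boundary
    return "increase"       # not improving fast enough: throttle more
```

For example, with a 100 ms mean, a 20 ms standard deviation, and an 800 ms performance boundary, a smoothed response time of 110 ms meets the cessation criteria, 500 ms is within the performance boundary, and 900 ms is not.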
[0198] At step 1012, the throttling and routing module may determine the initial throttle methods based on received data. This may involve comparing current network outcomes to historical network outcomes to determine a deviation from normal or expected operation. For example, if historically, a particular processor computer achieves a roughly static smoothed response time series centered around 200 milliseconds, the initial throttle method may be determined based on the deviation from that value. As an example, a higher throttle level may be used if the current value of the smoothed response time series is 1000 milliseconds, and a lower throttle level may be used if the current value of the smoothed response time series is 400 milliseconds.
From step 1012, process flow may proceed to step 1020.
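The selection of an initial throttle level from the deviation between current and historical network outcomes may be sketched as follows, using the 200 millisecond baseline from the example above. The ratio thresholds and level names are hypothetical.

```python
def initial_throttle_level(current_ms, baseline_ms=200):
    """Map deviation from the historical baseline to a throttle level."""
    ratio = current_ms / baseline_ms
    if ratio >= 4:       # e.g., 1000 ms vs. a 200 ms baseline
        return "high"
    if ratio >= 2:       # e.g., 400 ms vs. a 200 ms baseline
        return "low"
    return "none"
```

A current smoothed response time of 1000 milliseconds thus yields a higher throttle level than one of 400 milliseconds.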
[0199] At step 1014, the throttling and routing module may choose the next best method in order to address one or more unacceptable network outcomes. This may involve increasing the throttle level or rerouting to another processor computer, based on the rules provided by the rules module, and the throttling and routing neural network output or the smoothed response time series produced by the double exponential smoothing algorithm. From step 1014, process flow may proceed to step 1020.
[0200] At step 1016, the throttling and routing module may increase the throttling rate by one cycle. If the throttling rate is represented as a percentage, this may involve increasing the throttling rate by a predetermined percentage, e.g., 5%. If the throttling rate is represented as a number of cycles in which messages remain in the message queue before being transmitted to a particular processor computer or processor computers, this number of cycles could increase. For example, if messages are currently being held in the queue for three cycles before being transmitted to the processor computer, they may be held in the queue for four cycles instead. Still further, the throttle rate could be increased by one processor cycle. From step 1016, process flow may proceed to step 1020.
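The throttle increase described above may be sketched with a hypothetical helper that supports both representations of the throttling rate. The 5% step size follows the example above; the tuple representation is an assumption.

```python
def increase_throttle(throttle):
    """Increase a throttle expressed as a percentage or as hold cycles."""
    kind, value = throttle
    if kind == "percent":
        return ("percent", min(value + 5, 100))  # 5% step, capped at 100%
    if kind == "cycles":
        return ("cycles", value + 1)             # hold one more cycle
    raise ValueError(kind)
```

For example, messages held for three cycles would be held for four cycles after one increase.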
[0201] At step 1018, the throttling and routing module, upon determining that network outcomes have met the cessation criteria, may set the throttle to zero cycles. From step 1018, process flow may proceed to step 1020.
[0202] At step 1020, the throttling and routing module may update the processor rules in accordance with its determination in steps 1002-1018. This may involve setting an initial throttle level, indicating how messages should be routed, increasing the throttle level by a number of cycles, increasing the throttle level by a single cycle, or decreasing the throttle level to zero.
[0203] To summarize, FIG. 9 illustrates a method by which the automated fault detecting control system evaluates incoming messages using a number of modules, AI models, and signal processing models in order to determine whether to throttle messages, reroute messages, or alert an administrator. FIG. 10 illustrates a method by which the throttling and routing module establishes a throttling rate, modifies the throttling rate, or reroutes messages and updates the processor rules database.
[0204] FIG. 11 shows a method for routing messages according to some embodiments of the invention.
[0205] FIG. 11 shows one or more client computers 1102, a server computer 1104, one or more candidate processor computers 1106, and other processor computers 1108.
[0206] The client computers 1102 may be substantially similar to client computer A 302 to client computer Z 304 from FIG. 3. They may comprise a processor and a computer readable medium, and may send messages to, and receive messages from a server computer 1104. The client computers 1102 may comprise any number of devices, such as web servers, laptop computers, desktop computers, PDAs, smart phones, smart watches, videogame consoles, tablets, and the like.
[0207] The server computer 1104 may be substantially similar to the automated fault detecting control system 310 from FIG. 3. The server computer 1104 may comprise a processor and a computer readable medium, and may possess code or instructions on the computer readable medium to perform any one of the functions described above, such as receiving messages from one or more client computers 1102, storing messages in a message queue, extracting message features from those messages, and using a multi-stage machine learning model to predict network outcomes and throttle or reroute messages based on those features.
[0208] The candidate processor computers 1106 may be substantially similar to some of processor computers A 306 through Z 308. The candidate processor computers 1106 may be able to process messages according to a number of criteria. For example, a candidate processor computer 1106 may be able to analyze messages and determine a threat score associated with those messages. This threat score could relate to how likely it is that the message contains fraudulent information or was generated for a fraudulent purpose. As an example, the candidate processor computer 1106 could evaluate an email message and determine the likelihood that the email is part of a phishing attack. As another example, the candidate processor computer 1106 could evaluate a transactional message (such as a message sent as part of a financial transaction) and determine the likelihood that the message is part of a legitimate or fraudulent transaction.
[0209] Different candidate processor computers 1106 may perform different message processing functions. As explained above, a candidate processor computer 1106 may evaluate transaction messages for fraud. Another candidate processor computer 1106 may enact financial transactions between entities based on received messages, for example, a computer operated by an issuer bank. Another candidate processor computer 1106 may evaluate or infer behavioral data based on received messages. For example, a candidate processor computer 1106 may receive messages in the form of a collection of mouse clicks and infer behavioral information based on those mouse clicks.
[0210] In some cases, the messages received from the candidate processor computers 1106 may not directly correspond to the messages transmitted from the client computers 1102. For example, the server computer 1104 could receive a number of messages from the client computers 1102, and generate a number of subsequent messages based on the received messages. These subsequent messages may comprise collections of information received as part of the messages received from the client computers 1102. For example, a message received from a client computer 1102 relating to a transaction may contain information such as location, time, mouse movements and click events, previously visited sites, personally identifying information, etc. The server computer may separate these messages into a number of subsequent messages, each containing collections of the above data. For example, a first subsequent message could comprise location, time data, and previously visited sites. A second subsequent message may contain information such as mouse movements, click events, and personally identifying information. The first subsequent message and the second subsequent message may be routed to the same or different candidate processor computers 1106 based on the nature of each candidate processor computer 1106 in question. Some candidate processor computers 1106 may be better suited to process messages containing certain information and as a result, subsequent messages of that variety may be transmitted to those candidate processor computers 1106 for processing.
[0211] The other processor computers 1108 may be substantially similar to candidate processor computers 1106 or the processor computers 306-308 from FIG. 3. These other processor computers 1108 may comprise a processor and a computer readable medium coupled to that processor. They may possess code or instructions enabling the other processor computers 1108 to process messages or subsequent messages in a similar manner to the candidate processor computers 1106. As examples, the other processor computers 1108 may process messages to determine fraud scores, risk scores, enact financial transactions, or infer behavioral data. [0212] The other processor computers 1108 may comprise "backup" processor computers. In this case, the candidate processor computers 1106 may comprise preferred processor computers, so called because the server computer 1104 preferentially uses the candidate processor computers 1106 to process messages. The server computer 1104 may transmit messages to the other processor computers 1108 when the server computer 1104 determines that the candidate processor computers 1106 cannot process the messages according to predetermined parameters.
[0213] Predetermined parameters may include any number of measurable or inferable aspects of message processing. As an example, a predetermined parameter could relate to a network outcome, such as response time. The server computer 1104 could transmit messages to the candidate processor computers 1106 provided that the candidate processor computers 1106 can respond to those messages in under 300ms.
If the candidate processor computers 1106 cannot respond to those messages in that time frame, the server computer may select a number of other processor computers 1108 and reroute messages to those processor computers.
[0214] The server computer 1104 may determine whether the candidate processor computer 1106 can process messages according to predetermined parameters using any number of the methods and techniques discussed above. For example, the server computer could comprise an AFDCS and use an artificial intelligence and signal processing model such as the AISPM in order to determine whether a candidate processor computer 1106 can process messages according to predetermined parameters (e.g., receiving messages, extracting message features, performing data orchestration, applying those features as inputs to an ensemble machine learning model, generating a score and providing it to a rules module, wherein the rules module determines whether to throttle or reroute messages, etc.). The act of rerouting messages may be substantially similar to transmitting messages to a number of other processor computers 1108 if the candidate processor computers are unable to process those messages according to predetermined parameters.
[0215] At step S1110, the server computer 1104 receives a plurality of messages from one or more client computers 1102. These messages may comprise a number of different forms. For example, a message could comprise an email message, and may contain information such as a sending address, a receiving address, a subject, a message body, and a number of attachments. As another example, a message could be a client-server application message, and comprise a number of TCP or UDP packets. As another example, a message could comprise a transaction message, and may comprise transactional information, such as an account number, a merchant ID, a transaction amount, and any other contextual transaction information, such as the location of the transaction or the time the transaction took place.
[0216] At step S1112, the server computer 1104 may determine at least one candidate processor computer 1106. This determination may involve selecting one or more candidate processor computers 1106 from a list of candidate processor computers 1106 stored in memory or another database, either internal or external. The server computer 1104 may communicate with these candidate processor computers 1106 as part of processing messages received from the client computers 1102, or processing subsequent messages generated by the server computer 1104. The server computer 1104 may determine the candidate processor computers 1106 based on a logical or analytical model. For example, the server computer 1104 may determine the candidate processor computers based on their past performance in processing messages. This past performance may be related to network outcomes or predetermined parameters.
As an example, the server computer 1104 may determine candidate processor computers 1106 that processed data elements in historical messages with the lowest response time, or with the lowest packet loss.
[0217] At step S1114, the server computer 1104 determines if the at least one candidate processor computer can process data elements from the plurality of messages according to predetermined parameters using an analytical model built using a graph learning algorithm. This analytical model built using a graph learning algorithm may be substantially similar to the AISPM analytical model described above. It may be implemented using a number of hardware or software modules, such as a front end processing module, a model builder module, an event processing module, a rules module, and a throttling and routing module, as shown in FIG. 3. It may make use of a graph learner algorithm such as a sequential graph learner that is able to extract data elements (e.g., message features as described above, or other data explicitly provided with the message). The analytical model may take into account data from a short term history database, a historical database, and/or an external database either as part of training the analytical model or the analysis procedure.
[0218] The predetermined parameters may relate or correspond to network outcomes among other parameters. These may include latency, message throughput, queuing delay, packet loss, and/or quality of service metrics. For example, a
predetermined parameter may relate to a predetermined value related to a given network outcome, such as a total response time of 300ms. The server computer may use an analytical model to determine whether or not the at least one candidate processor computer 1106 can process data elements from the messages in less than 300ms. As another example, the predetermined parameter may relate to a proportional network outcome. The predetermined parameter could be "within two standard deviations of the expected response time." As another example, the predetermined parameter could be a parameter determined by either a model builder module or rules module as part of a training process, generated from historical data, or provided from an external database. A predetermined parameter could directly relate to the capability of the one or more candidate processor computers 1106 to process messages. For example, a given candidate processor computer 1106 may not be able to process messages containing location data such as GPS coordinates. In this case, a
predetermined parameter might relate to a binary determination, such as "can the at least one candidate processor computer 1106 process GPS coordinates?" If the candidate processor computer cannot process GPS coordinates, they cannot process data elements in the messages (the GPS coordinates) according to predetermined parameters, and the server computer 1104 may make a determination as such.
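A determination combining a binary capability check with a response-time parameter may be sketched as follows. The function name, the set-based capability representation, and the 300ms default mirror the examples above but are otherwise hypothetical.

```python
def can_process(candidate_capabilities, required_elements, predicted_ms,
                max_ms=300):
    """Binary check combining capability and a response-time parameter."""
    # Can the candidate handle every required data element (e.g., GPS)?
    handles_all = all(e in candidate_capabilities for e in required_elements)
    # Can it respond within the predetermined response-time parameter?
    fast_enough = predicted_ms < max_ms
    return handles_all and fast_enough
```

A candidate that cannot process GPS coordinates, or that cannot respond within 300ms, fails the check, and the server computer may then select other processor computers.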
[0219] A data element may generally refer to data either contained within or inferred from a message. For example, a data element may be a message feature as described above. Data elements may include information such as the geographic location from which the message was sent, the IP address of the device sending the message, a trace route, the number of hops between the message’s source and its destination, any information contained in the message (such as a merchant identifier or account number in the case of a transactional message), etc. Historical data elements may be used by the server computer as part of training an analytical model or models, such as the analytical model built using a graph learning algorithm.
[0220] Step S1114 or other steps listed above or below may additionally comprise determining a throttle rate according to predetermined parameters using an analytical model built using a neural network. This may be accomplished using one of the techniques or methods described above, with reference to the AFDCS 310 of FIG. 3 or its component modules. It may involve a method such as the method of FIG. 10, where an initial throttle rate is determined based on data received by the rules module or the throttling and routing module. Further, when the messages are sent to the candidate processor computers 1106 or other processor computers 1108 (see steps S1116 and S1120 below), those messages may be sent at the throttle rate. [0221] Step S1114 may be accomplished through the use of the AFDCS and
AISPM described above. The server computer may store the received messages in a message queue, such as the message queue 312 from FIG. 3. It may use a front end processing module, such as front end processing module 316 from FIG. 3 or front end processing module 404 from FIG. 4. The front end processing module may prepare the data for use by other modules by extracting message features (data elements) from messages in the message queue, performing data normalization or orchestration, and collecting data from a short term history database, a historical database, and an external database. This data may be provided to a model builder module (see FIG. 5), a rules module (see FIG. 8) and an event processing module (see FIG. 5). The model builder module may generate Kalman filter parameters (see section i above), a motif database based on a sequential graph (see section ii and FIG. 6) and a short term profile neural network (see section iii). The parameters for these three models may be stored in respective databases (see 506-510 from FIG. 5) and used by the event processing module as part of the determination of step S1114. [0222] The event processing module (see FIG. 3) may use the stored parameters to forecast network outcomes using the Kalman filter and motif database (see sections i-ii), and generate a risk score using the short term profile neural network (see section iii). Further, the server computer may use an adaptive Prim's algorithm to determine a threshold or cut-off value (see FIG. 7) that may be used to trigger the determination of step S1114. The event processing module may pass all appropriate data and risk scores to a rules module (see FIG. 8). The rules module may generate and apply rules that govern whether the server computer throttles or reroutes messages (i.e., how the server computer determines if candidate processor computers 1106 can process messages according to predetermined parameters).
The rules module may make use of an evolutionary learner, such as ant colony optimization, to produce an optimal determination. The rules module may pass all appropriate rules and data to the throttling and routing module. The throttling and routing module may determine, based on those rules and data, whether or not to throttle or reroute messages (i.e., determine if the candidate processor computers 1106 can process messages according to predetermined parameters). The throttling and routing module may make use of either of the processes described with reference to FIGs. 9-10 as part of determining if the candidate processor computers 1106 can process data elements from the plurality of messages.
[0223] A throttle rate may be expressed as a percentage of the normal rate. If messages are usually sent at some rate (e.g., ten messages a second), a throttle rate of 50% may correspond to half that rate (i.e., five messages per second). A throttle rate may also be expressed as a raw or nominal quantity of messages. For example, a "no" throttle rate may be defined as ten messages per second, a "low" throttle rate could correspond to eight messages per second, a "medium" throttle rate could correspond to five messages per second, and a "high" throttle rate could correspond to three
messages per second. Alternatively, messages may be held by the server computer 1104 for a specific number of processing or sending cycles before being sent to the candidate processor computers 1106 or other processor computers 1108. For example, the server computer could receive some number of messages every 10ms (a single cycle), hold those messages in a structure such as the message queue for 20ms (two cycles) and then transmit them to either the candidate processor computers 1106 or other processor computers 1108. In this case, throttling may involve increasing the number of "cycles" that a message is held in the message queue before transmission. For example, a message could be held for two additional cycles (20ms, for a total hold time of 40ms) as part of a throttling process.
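The two representations of a throttle rate described above may be sketched as follows; the nominal rate of ten messages per second and the 10ms cycle length follow the examples above, while the function names are hypothetical.

```python
NOMINAL_RATE = 10  # messages per second (illustrative)

def effective_rate(percent_throttle):
    """Messages per second remaining under a percentage throttle."""
    return NOMINAL_RATE * (100 - percent_throttle) / 100

def hold_time_ms(cycles, cycle_ms=10):
    """Total queue hold time under a cycle-based throttle."""
    return cycles * cycle_ms
```

A 50% throttle thus yields five messages per second, and holding a message for four cycles yields a 40ms hold time.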
[0224] At step S1116, if the server computer 1104 determines that one or more processor computers of the at least one candidate processor computer 1106 (i.e., some set or subset) can process the data elements according to the predetermined
parameters, the server computer 1104 transmits the plurality of messages or other messages derived from the plurality of messages to the one or more candidate processor computers 1106. This transmission may involve any suitable means or transmission protocol. For example, the server computer 1104 may transmit the plurality of messages or other messages derived from the plurality of messages to the one or more candidate processor computers 1106 via a network such as the Internet or a cellular communications network. These messages may be transmitted via a wired interconnection (e.g., USB, FireWire, I2C, etc.) or a wireless interconnection (such as Bluetooth, Bluetooth Low Energy, Zigbee, Wi-Fi, etc.).
[0225] Other messages derived from the plurality of messages may include any number of messages processed or generated by the server computer 1104 before being transmitted to the candidate processor computers 1106. For example, this processing may involve extracting data elements or message features from the messages and formulating new messages (i.e., the other messages) based on those data elements. For example, a given candidate processor computer may be effective at processing data elements to determine fraud scores given GPS data and network data (such as number of hops, source IP address, traceroute, etc.), but may not be able to process information such as mouse tracking data. As such, the server computer 1104 may take a message containing GPS data, network data, and mouse tracking data received from the one or more client computers 1102 and generate two other messages, one containing the GPS data and network data, and the other containing the mouse tracking data. The server computer 1104 may send the first of these other messages to a candidate processor computer capable of processing the GPS data and network data according to predetermined parameters, and send the second other message to a different candidate processor computer capable of processing the mouse tracking data according to predetermined parameters. [0226] These other messages may also be referred to as subsequent messages.
In some cases, the process detailed in FIG. 11 additionally comprises the server computer generating two or more subsequent messages with data elements for transmission to two or more other processor computers and transmitting those messages to the other processor computers. [0227] At step S1118, if the server computer 1104 determines that the one or more candidate processor computers of the at least one candidate processor computer (e.g., some set or subset of all candidate processor computers) cannot process data elements according to the predetermined parameters, the server computer 1104 may select one or more other processor computers 1108 to process the data elements according to predetermined parameters. These other processor computers 1108 may be substantially similar to the candidate processor computers 1106, and may be able to individually or collectively process data elements and message features from the plurality of received messages.
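For purposes of illustration only, the message-splitting behavior described in paragraph [0225] above can be sketched in Python. All names, processor identifiers, and data values below are hypothetical and not part of the specification; this is a minimal sketch assuming messages are represented as dictionaries of data elements.

```python
# Hypothetical sketch: split one incoming message into derived ("other")
# messages according to which processor computer can handle each data
# element. All names and values are illustrative.

def split_message_by_capability(message, capabilities):
    """message: dict mapping data-element name -> value.
    capabilities: dict mapping processor id -> set of element names it
    can process according to predetermined parameters.
    Returns dict mapping processor id -> derived message (dict)."""
    derived = {}
    for processor, supported in capabilities.items():
        elements = {k: v for k, v in message.items() if k in supported}
        if elements:
            derived[processor] = elements
    return derived

# A message containing GPS data, network data, and mouse tracking data.
message = {"gps": (40.7, -74.0), "network": {"hops": 12}, "mouse": [(1, 2), (3, 4)]}
capabilities = {
    "proc_a": {"gps", "network"},   # handles GPS and network data
    "proc_b": {"mouse"},            # handles mouse tracking data
}
derived = split_message_by_capability(message, capabilities)
# derived["proc_a"] holds the GPS and network data;
# derived["proc_b"] holds the mouse tracking data.
```

Each derived message would then be transmitted to the corresponding candidate processor computer, as described above.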
[0228] As an example, the server computer 1104 may have received a transactional message among the plurality of messages received from the one or more client computers 1102. The server computer 1104 may analyze whether the candidate processor computers 1106 can process the data elements in that message within a given timeframe, such as 300 ms. If not, the server computer 1104 selects one or more other processor computers 1108 that can process the data elements in that message within that given timeframe. As another example, the server computer 1104 may determine that none of the candidate processor computers 1106 can process data elements relating to, or derived from, that type of message (i.e., data elements from a transactional message). In that case, the server computer 1104 may determine that the candidate processor computers 1106 cannot process data elements according to predetermined parameters and may select a number of other processor computers 1108 to process data elements according to those parameters.
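The selection logic of paragraph [0228] can be sketched as follows, for illustration only. The predicted latencies and processor identifiers are hypothetical stand-ins for the output of the analytical model; the specification does not prescribe this implementation.

```python
# Hypothetical sketch: check whether each candidate processor computer is
# predicted to process the data elements within a predetermined parameter
# (here, a 300 ms timeframe) and fall back to other processor computers
# when none qualifies. Predicted latencies are illustrative.

LATENCY_LIMIT_MS = 300  # predetermined parameter

def select_processors(candidates, others, predicted_latency_ms):
    """Return the processors predicted to meet the latency parameter,
    preferring candidates and falling back to other processors."""
    qualifying = [p for p in candidates if predicted_latency_ms[p] <= LATENCY_LIMIT_MS]
    if qualifying:
        return qualifying
    # None of the candidates meets the parameter: select other processors.
    return [p for p in others if predicted_latency_ms[p] <= LATENCY_LIMIT_MS]

predicted = {"cand_1": 450, "cand_2": 520, "other_1": 180}
selected = select_processors(["cand_1", "cand_2"], ["other_1"], predicted)
# selected == ["other_1"]: both candidates exceed 300 ms, so an
# other processor computer is selected instead.
```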
[0229] At step S1120, the server computer 1104 transmits the plurality of messages or other messages derived from the plurality of messages to the one or more other processor computers 1108 for processing. This transmission may occur over any suitable means and using any suitable protocol. Once the messages are transferred to either the candidate processor computers 1106 (as in step S1116) or the other processor computers 1108 (as in step S1120), the selected processor computers can process the messages according to predetermined parameters.
[0230] The flow diagram described in FIG. 11 may additionally comprise receiving, by the server computer 1104, a plurality of subsequent messages from one or more client computers 1102. These messages may be similar in form to the plurality of messages received by the server computer 1104 in step S1110. For example, if the server computer 1104 receives transactional messages from the client computers 1102 in step S1110, the plurality of subsequent messages may be transactional messages corresponding to a different set of transactions. In other cases, the subsequent messages may be entirely or partially different from the plurality of messages received in step S1110.
[0231] Additionally, the server computer 1104 may receive a plurality of responses from the one or more candidate processor computers 1106 or the one or more other processor computers 1108. These responses may comprise information relating to the processed messages. For example, a given candidate processor computer may process mouse movement data messages, or data elements in those messages, in order to produce a behavioral score. A response could comprise the behavioral score produced by that processor computer. Alternatively or additionally, a response could include information relating to network outcomes or other quality of service measurements. For example, a processor computer could transmit a response stating that it took some amount of time (e.g., 300 ms) to process the data elements in that message. Additionally, the server computer 1104 may be able to infer network outcomes from the responses. For example, the server computer 1104 could keep track of the time difference between when it transmitted the plurality of messages or other messages to either the candidate processor computers 1106 or other processor computers 1108 and when it received a response.
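The timing-based inference described in paragraph [0231] can be sketched as below, for illustration only. The `fake_processor` stand-in and the score value are hypothetical; in practice the send function would transmit over a network to a processor computer.

```python
# Hypothetical sketch: infer a network outcome from a response by timing
# the interval between transmission and receipt, as described above.

import time

def timed_send(send_fn, message):
    """Send a message via send_fn and return (response, elapsed_ms)."""
    start = time.monotonic()
    response = send_fn(message)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return response, elapsed_ms

# Illustrative stand-in for a processor computer that returns a score.
def fake_processor(message):
    return {"behavioral_score": 0.87}

response, elapsed_ms = timed_send(fake_processor, {"mouse": [(1, 2)]})
# elapsed_ms approximates the processing-plus-transit time for this message,
# a quality of service measurement the server computer can track.
```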
[0232] The server computer 1104 may determine a risk score of the one or more other processor computers 1108 based on the plurality of responses using an artificial neural network. The "risk score" may correspond to an immediate or short term measure of how well the other processor computers 1108 processed messages based on the plurality of responses. For example, if the other processor computers 1108 were selected to process messages according to a predetermined parameter such as a packet loss rate of less than 1%, and the responses indicate that the messages were processed with a packet loss rate of 2%, the risk score could indicate that there is a risk of underperformance associated with the other processor computers 1108. The risk score could also be used by the server computer 1104 in order to re-evaluate and adjust a throttling or rerouting policy determined by the rules module or throttling and routing module, e.g., as described above with reference to FIGs. 9-10. For example, the risk score could be used to determine whether network outcomes have improved and whether throttling or rerouting messages should be continued.
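For illustration only, a greatly simplified risk score computation is sketched below. The specification contemplates an artificial neural network for this determination; the stand-in here merely scores how far observed outcomes drift past their predetermined parameters, using the packet-loss example from paragraph [0232].

```python
# Hypothetical sketch: a simplified risk score comparing observed network
# outcomes in responses against predetermined parameters. A real system
# per the specification would use an artificial neural network instead.

def risk_score(responses, parameters):
    """responses: list of dicts mapping metric name -> observed value.
    parameters: dict mapping metric name -> maximum acceptable value.
    Returns the mean relative overshoot across metrics (0.0 = no risk)."""
    overshoots = []
    for resp in responses:
        for metric, limit in parameters.items():
            observed = resp.get(metric)
            if observed is not None:
                overshoots.append(max(0.0, (observed - limit) / limit))
    return sum(overshoots) / len(overshoots) if overshoots else 0.0

# Parameter: packet loss rate under 1%; responses report 2% loss,
# matching the example above.
score = risk_score([{"packet_loss_pct": 2.0}], {"packet_loss_pct": 1.0})
# score == 1.0: observed loss is 100% over the limit, indicating a risk
# of underperformance associated with the other processor computers.
```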
[0233] Further, the server computer 1104 may transmit the plurality of subsequent messages, or other messages derived from the one or more messages, to the one or more candidate processor computers 1106 based on the risk score. In effect, this reroutes messages from the other processor computers 1108 to the candidate processor computers 1106 if the risk score is too high. For example, a high risk score could correspond to an impending network issue, and messages that would otherwise be sent to the one or more other processor computers 1108 may instead be sent to the candidate processor computers 1106, either to prevent the impending network issue or to mitigate its effects.
[0234] The process described in FIG. 11 may additionally comprise determining, by the server computer 1104, based on the plurality of messages, a plurality of message events. The process may also include generating, by the server computer 1104, a graph describing sequential relationships between message events of the plurality of message events, wherein a plurality of nodes from the graph are message events and a plurality of edges from the graph are sequential relationships between message events. The process may also include identifying, by the server computer, a plurality of event sequences and a plurality of associated network outcomes using a graph learning algorithm, and comparing, by the server computer, the plurality of associated network outcomes against the predetermined parameters.
[0235] These additional process steps may be better understood with reference to FIG. 5 and the section on sequential graph learners above. In effect, the server computer may determine message events from the plurality of messages. A message event may be information or data associated with a message or message features. For example, a message event may comprise "receiving a message from New York" or "receiving a message that is 20 bytes long." These message events may be used to produce a sequential graph that relates message events according to observed or predicted sequencing. This sequential graph may comprise a number of nodes, each corresponding to a message event or a collection of message events, connected by directed or undirected edges to other events or collections of events. This sequential graph may be used by a graph learning algorithm to determine or quantify associations between events. For example, the graph learning algorithm may determine that an event "B" is frequently observed after an event "A," and may learn of an association between these two events. Once the graph learning algorithm learns of an association between events, it may match those event sequences, or motifs, to network outcomes. This plurality of associated network outcomes may be compared against predetermined parameters as part of determining whether the candidate processor computers are able to process data elements according to those predetermined parameters.
[0236] For example, via the graph learning algorithm, the server computer 1104 may identify a strong or highly probable event sequence, and use this event sequence to determine an associated network outcome, such as "response time increases to 400 ms." This network outcome could be compared to a predetermined parameter, such as "response time less than 300 ms." As the predicted network outcome is greater than the predetermined parameter, the server computer 1104 may determine that the candidate processor computers 1106 cannot process data elements in the messages according to predetermined parameters.
[0237] The process detailed in FIG. 11 may additionally comprise generating, by the server computer, a graph relating the predetermined parameters with an ability of the at least one candidate processor computer to process the data elements, and generating, by the server computer, an optimal determination using an ant colony optimization algorithm applied to the graph, the optimal determination indicating if the at least one candidate processor computer is able to process the data elements according to predetermined parameters.
[0238] The use of ant colony optimization to determine an optimal course of action given a set of rules may be better understood with reference to the description of ant colony optimization above. In general, the graph relating the predetermined parameters with the ability of the candidate processor computers 1106 to process data elements may have edge weights that relate to the relative desirability of a course of action. A collection of computational agents, or ants, may iterate through the graph and determine the highest scoring path. This path may be interpreted as a course of action, such as "throttle by 15%" or "reroute to other processor computers 1108." This optimal determination may also correspond to a determination as to whether the candidate processor computers can process the data elements according to the predetermined parameters.
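For illustration only, a minimal ant colony optimization over a tiny decision graph is sketched below. The graph topology, edge desirabilities, and node names are all hypothetical; the specification does not prescribe these parameter values.

```python
# Hypothetical sketch: ant colony optimization over a small decision graph.
# Edge weights encode the relative desirability of each course of action.
# Ants repeatedly walk from start to goal, deposit pheromone in proportion
# to path desirability, and the highest-scoring path found is returned.

import random

def ant_colony(graph, start, goal, n_ants=20, n_iters=30, evap=0.5, seed=0):
    """graph: dict mapping node -> {neighbor: desirability weight}."""
    rng = random.Random(seed)
    pheromone = {(u, v): 1.0 for u in graph for v in graph[u]}
    best_path, best_score = None, float("-inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            node, path = start, [start]
            while node != goal:
                nbrs = list(graph[node])
                weights = [pheromone[(node, v)] * graph[node][v] for v in nbrs]
                node = rng.choices(nbrs, weights=weights)[0]
                path.append(node)
            score = sum(graph[u][v] for u, v in zip(path, path[1:]))
            if score > best_score:
                best_path, best_score = path, score
            # Evaporate, then reinforce the walked edges by path score.
            for u, v in zip(path, path[1:]):
                pheromone[(u, v)] = (1 - evap) * pheromone[(u, v)] + score
    return best_path

# Decision graph: terminal branches are courses of action (illustrative).
graph = {
    "state": {"throttle_15": 2.0, "reroute": 5.0},
    "throttle_15": {"done": 1.0},
    "reroute": {"done": 3.0},
    "done": {},
}
best = ant_colony(graph, "state", "done")
# The more desirable "reroute" branch accumulates pheromone and wins.
```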
[0239] Additionally, the process may further involve determining, by the server computer 1104, a cutoff value based on an adaptive Prim's algorithm and determining, by the server computer and based on the analytical model, a trigger value, wherein the step of determining if the one or more candidate processor computers can process data elements from the plurality of messages (step S1114) occurs if the trigger value exceeds the cutoff value.
[0240] The use of Prim’s algorithm can be better understood with reference to the discussion on Prim’s algorithm above. This may involve modifying or parsing a graph in order to produce a reduced graph. This reduced graph may be further interpreted to determine what average or expected network outcomes are under normal operation. This average or expected network outcome level can be used as a baseline to determine what constitutes atypical operation. For example, the adaptive Prim’s algorithm could be used to determine a typical response time that could be used to establish a cutoff response time (such as 50% higher than typical).
[0241] This cutoff value can be used in order to determine whether the AFDCS needs to take action, e.g., throttle or reroute messages. The server computer can determine a trigger value that may relate to current or predicted network outcomes. If the trigger value exceeds the cutoff value, the AFDCS may throttle or reroute messages in order to prevent or mitigate predicted network outcomes.
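The cutoff/trigger mechanism of paragraphs [0240]-[0241] can be sketched as below, for illustration only. The graph, response times, and the 50% margin are hypothetical values; the specification's adaptive Prim's algorithm is stood in for by a plain Prim's minimum spanning tree pass.

```python
# Hypothetical sketch: a Prim's-style pass reduces a weighted graph of
# observed response times (ms) to a spanning tree, whose mean edge weight
# serves as the "typical" baseline. The cutoff is set 50% above typical,
# and action (throttling or rerouting) is indicated when the trigger value
# exceeds the cutoff. All numbers are illustrative.

import heapq

def prim_mst_edges(graph, start):
    """graph: dict mapping node -> {neighbor: weight}.
    Returns the list of edge weights in a minimum spanning tree."""
    visited, edges = {start}, []
    heap = [(w, v) for v, w in graph[start].items()]
    heapq.heapify(heap)
    while heap and len(visited) < len(graph):
        w, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        edges.append(w)
        for u, wu in graph[v].items():
            if u not in visited:
                heapq.heappush(heap, (wu, u))
    return edges

# Edge weights: observed response times (ms) between network nodes.
graph = {
    "a": {"b": 200, "c": 320},
    "b": {"a": 200, "c": 180},
    "c": {"a": 320, "b": 180},
}
mst = prim_mst_edges(graph, "a")      # [200, 180]
typical_ms = sum(mst) / len(mst)      # 190.0: baseline under normal operation
cutoff_ms = typical_ms * 1.5          # 285.0: 50% above typical
trigger_ms = 400                      # current or predicted response time
should_act = trigger_ms > cutoff_ms   # True: throttle or reroute messages
```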
[0242] Any of the computer systems mentioned herein may utilize any suitable number of subsystems. In some embodiments, a computer system includes a single computer apparatus, where the subsystems can be components of the computer apparatus. In other embodiments, a computer system can include multiple computer apparatuses, each being a subsystem, with internal components.
[0243] A computer system can include a plurality of the components or subsystems, e.g., connected together by an external interface or by an internal interface.
In some embodiments, computer systems, subsystems, or apparatuses can communicate over a network. In such instances, one computer can be considered a client and another computer a server, where each can be part of a same computer system. A client and a server can each include multiple systems, subsystems, or components.
[0244] It should be understood that any of the embodiments of the present invention can be implemented in the form of control logic using hardware (e.g., an application specific integrated circuit or field programmable gate array) and/or using computer software with a generally programmable processor in a modular or integrated manner. As used herein, a processor includes a single-core processor, multi-core processor on a same integrated chip, or multiple processing units on a single circuit board or networked. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement embodiments of the present invention using hardware and a combination of hardware and software.
[0245] Any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C, C++, C#, Objective-C, Swift, or a scripting language such as Perl or Python using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions or commands on a computer readable medium for storage and/or transmission. Suitable media include random access memory (RAM), read only memory (ROM), a magnetic medium such as a hard drive or a floppy disk, an optical medium such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and the like. The computer readable medium may be any combination of such storage or transmission devices.
[0246] Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the internet. As such, a computer readable medium according to an embodiment of the present invention may be created using a data signal encoded with such programs. Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via internet download). Any such computer readable medium may reside on or within a single computer product (e.g., a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network. A computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.
[0247] Any of the methods described herein may be totally or partially performed with a computer system including one or more processors, which can be configured to perform the steps. Thus, embodiments can be directed to computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps. Although presented as numbered steps, steps of methods herein can be performed at a same time or in a different order. Additionally, portions of these steps may be used with portions of other steps from other methods. Also, all or portions of a step may be optional. Additionally, any of the steps of any of the methods can be performed with modules, circuits, or other means for performing these steps.
[0248] The specific details of particular embodiments may be combined in any suitable manner without departing from the spirit and scope of embodiments of the invention. However, other embodiments of the invention may be directed to specific embodiments relating to each individual aspect, or specific combinations of these individual aspects. The above description of exemplary embodiments of the invention has been presented for the purpose of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form described, and many modifications and variations are possible in light of the teaching above. The
embodiments were chosen and described in order to best explain the principles of the invention and its practical applications to thereby enable others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated.
[0249] A recitation of "a", "an" or "the" is intended to mean "one or more" unless specifically indicated to the contrary. The use of "or" is intended to mean an "inclusive or," and not an "exclusive or," unless specifically indicated to the contrary.
[0250] All patents, patent applications, publications, and descriptions mentioned herein are incorporated by reference in their entirety for all purposes. None is admitted to be prior art.