Inductive logic programming (ILP) is a subfield ofsymbolic artificial intelligence which useslogic programming as a uniform representation for examples, background knowledge and hypotheses. The term "inductive" here refers tophilosophical (i.e. suggesting a theory to explain observed facts) rather thanmathematical (i.e. proving a property for all members of a well-ordered set) induction. Given an encoding of the known background knowledge and a set of examples represented as a logicaldatabase of facts, an ILP system will derive a hypothesised logic program whichentails all the positive and none of the negative examples.
Inductive logic programming is particularly useful inbioinformatics andnatural language processing.
Building on earlier work onInductive inference,Gordon Plotkin was the first to formalise induction in aclausal setting around 1970, adopting an approach of generalising from examples.[1][2] In 1981,Ehud Shapiro introduced several ideas that would shape the field in his new approach of model inference, an algorithm employing refinement and backtracing to search for a complete axiomatisation of given examples.[1][3] His first implementation was theModel Inference System in 1981:[4][5] aProlog program that inductively inferredHorn clause logic programs from positive and negative examples.[1] The termInductive Logic Programming was first introduced in a paper byStephen Muggleton in 1990, defined as the intersection of machine learning and logic programming.[1] Muggleton and Wray Buntine introduced predicate invention andinverse resolution in 1988.[1][6]
Several inductive logic programming systems that proved influential appeared in the early 1990s.FOIL, introduced byRoss Quinlan in 1990[7] was based on upgradingpropositional learning algorithmsAQ andID3.[8]Golem, introduced by Muggleton and Feng in 1990, went back to a restricted form of Plotkin's least generalisation algorithm.[8][9] TheProgol system, introduced by Muggleton in 1995, first implemented inverse entailment, and inspired many later systems.[8][10][11]Aleph, a descendant of Progol introduced by Ashwin Srinivasan in 2001, is still one of the most widely used systems as of 2022[update].[10]
At around the same time, the first practical applications emerged, particularly inbioinformatics, where by 2000 inductive logic programming had been successfully applied to drug design, carcinogenicity and mutagenicity prediction, and elucidation of the structure and function of proteins.[12] Unlike the focus onautomatic programming inherent in the early work, these fields used inductive logic programming techniques from a viewpoint ofrelational data mining. The success of those initial applications and the lack of progress in recovering larger traditional logic programs shaped the focus of the field.[13]
Recently, classical tasks from automated programming have moved back into focus, as the introduction of meta-interpretative learning makes predicate invention and learning recursive programs more feasible. This technique was pioneered with theMetagol system introduced by Muggleton, Dianhuan Lin, Niels Pahlavi and Alireza Tamaddoni-Nezhad in 2014.[14] This allows ILP systems to work with fewer examples, and brought successes in learning string transformation programs, answer set grammars and general algorithms.[15]
Inductive logic programming has adopted several different learning settings, the most common of which are learning fromentailment and learning from interpretations.[16] In both cases, the input is provided in the form ofbackground knowledgeB, a logical theory (commonly in the form ofclauses used inlogic programming), as well as positive and negative examples, denoted and respectively. The output is given as ahypothesisH, itself a logical theory that typically consists of one or more clauses.
The two settings differ in the format of examples presented.
As of 2022[update], learning from entailment is by far the most popular setting for inductive logic programming.[16] In this setting, thepositive andnegative examples are given as finite sets and of positive and negatedgroundliterals, respectively. Acorrect hypothesisH is a set of clauses satisfying the following requirements, where the turnstile symbol stands forlogical entailment:[16][17][18]Completeness requires any generated hypothesisH to explain all positive examples, and consistency forbids generation of any hypothesisH that is inconsistent with the negative examples, both given the background knowledgeB.
In Muggleton's setting of concept learning,[19] "completeness" is referred to as "sufficiency", and "consistency" as "strong consistency". Two further conditions are added: "Necessity", which postulates thatB does not entail, does not impose a restriction onH, but forbids any generation of a hypothesis as long as the positive facts are explainable without it. "Weak consistency", which states that no contradiction can be derived from, forbids generation of any hypothesisH that contradicts the background knowledgeB. Weak consistency is implied by strong consistency; if no negative examples are given, both requirements coincide. Weak consistency is particularly important in the case of noisy data, where completeness and strong consistency cannot be guaranteed.[19]
In learning from interpretations, thepositive andnegative examples are given as a set of complete or partialHerbrand structures, each of which are themselves a finite set of ground literals. Such a structuree is said to be a model of the set of clauses if for anysubstitution and any clause in such that, also holds. The goal is then to output a hypothesis that iscomplete, meaning every positive example is a model of, andconsistent, meaning that no negative example is a model of.[16]
Aninductive logic programming system is a program that takes as an input logic theories and outputs a correct hypothesisH with respect to theories. A system iscomplete if and only if for any input logic theories any correct hypothesisH with respect to these input theories can be found with its hypothesis search procedure. Inductive logic programming systems can be roughly divided into two classes, search-based and meta-interpretative systems.
Search-based systems exploit that the space of possible clauses forms acomplete lattice under thesubsumption relation, where one clause subsumes another clause if there is asubstitution such that, the result of applying to, is a subset of. This lattice can be traversed either bottom-up or top-down.
Bottom-up methods to search the subsumption lattice have been investigated since Plotkin's first work on formalising induction in clausal logic in 1970.[1][20] Techniques used include least general generalisation, based onanti-unification, and inverse resolution, based on inverting theresolution inference rule.
Aleast general generalisation algorithm takes as input two clauses and and outputs the least general generalisation of and, that is, a clause that subsumes and, and that is subsumed by every other clause that subsumes and. The least general generalisation can be computed by first computing allselections from and, which are pairs of literals sharing the same predicate symbol and negated/unnegated status. Then, the least general generalisation is obtained as the disjunction of the least general generalisations of the individual selections, which can be obtained byfirst-order syntactical anti-unification.[21]
To account for background knowledge, inductive logic programming systems employrelative least general generalisations, which are defined in terms of subsumption relative to a background theory. In general, such relative least general generalisations are not guaranteed to exist; however, if the background theoryB is a finite set ofgroundliterals, then the negation ofB is itself a clause. In this case, a relative least general generalisation can be computed by disjoining the negation ofB with both and and then computing their least general generalisation as before.[22]
Relative least general generalisations are the foundation of the bottom-up systemGolem.[8][9]
Inverse resolution is aninductive reasoning technique that involvesinverting theresolution operator.
Inverse resolution takes information about theresolvent of a resolution step to compute possible resolving clauses. Two types of inverse resolution operator are in use in inductive logic programming: V-operators and W-operators. A V-operator takes clauses andas input and returns a clause such that is the resolvent of and. A W-operator takes two clauses and and returns three clauses, and such that is the resolvent of and and is the resolvent of and.[23]
Inverse resolution was first introduced byStephen Muggleton and Wray Buntine in 1988 for use in the inductive logic programming system Cigol.[6] By 1993, this spawned a surge of research into inverse resolution operators and their properties.[23]
The ILP systems Progol,[11] Hail[24] and Imparo[25] find a hypothesisH using the principle of theinverse entailment[11] for theoriesB,E,H:. First they construct an intermediate theoryF called a bridge theory satisfying the conditions and. Then as, they generalize the negation of the bridge theoryF with anti-entailment.[26] However, the operation of anti-entailment is computationally more expensive since it is highly nondeterministic. Therefore, an alternative hypothesis search can be conducted using the inverse subsumption (anti-subsumption) operation instead, which is less non-deterministic than anti-entailment.
Questions of completeness of a hypothesis search procedure of specific inductive logic programming system arise. For example, the Progol hypothesis search procedure based on the inverse entailment inference rule is not complete byYamamoto's example.[27] On the other hand, Imparo is complete by both anti-entailment procedure[28] and its extended inverse subsumption[29] procedure.
Rather than explicitly searching the hypothesis graph, metainterpretive ormeta-level systems encode the inductive logic programming program as a meta-level logic program which is then solved to obtain an optimal hypothesis. Formalisms used to express the problem specification includeProlog andanswer set programming, with existing Prolog systems and answer set solvers used for solving the constraints.[30]
And example of a Prolog-based system isMetagol, which is based on ameta-interpreter in Prolog, while ASPAL and ILASP are based on an encoding of the inductive logic programming problem in answer set programming.[30]
Evolutionary algorithms in ILP use a population-based approach to evolve hypotheses, refining them through selection, crossover, and mutation. Methods likeEvoLearner have been shown to outperform traditional approaches on structured machine learning benchmarks.[31]
Probabilistic inductive logic programming adapts the setting of inductive logic programming to learningprobabilistic logic programs. It can be considered as a form ofstatistical relational learning within the formalism of probabilistic logic programming.[34][35]
Given
the goal of probabilistic inductive logic programming is to find a probabilistic logic program such that the probability of positive examples according to is maximized and the probability of negative examples is minimized.[35]
This problem has two variants: parameter learning and structure learning. In the former, one is given the structure (the clauses) ofH and the goal is to infer the probabilities annotations of the given clauses, while in the latter the goal is to infer both the structure and the probability parameters ofH. Just as in classical inductive logic programming, the examples can be given as examples or as (partial) interpretations.[35]
Parameter learning for languages following the distribution semantics has been performed by using anexpectation-maximisation algorithm or bygradient descent.An expectation-maximisation algorithm consists of a cycle in which the steps of expectation and maximization are repeatedly performed. In the expectation step, the distribution of the hidden variables is computed according to the current values of the probability parameters, while in the maximisation step, the new values of the parameters are computed. Gradient descent methods compute the gradient of the target function and iteratively modify the parameters moving in the direction of the gradient.[35]
Structure learning was pioneered byDaphne Koller and Avi Pfeffer in 1997,[36] where the authors learn the structure offirst-order rules with associated probabilistic uncertainty parameters. Their approach involves generating the underlyinggraphical model in a preliminary step and then applying expectation-maximisation.[35]
In 2008,De Raedt et al. presented an algorithm for performingtheory compression onProbLog programs, where theory compression refers to a process of removing as many clauses as possible from the theory in order to maximize the probability of a given set of positive and negative examples. No new clause can be added to the theory.[35][37]
In the same year, Meert, W. et al. introduced a method for learning parameters and structure ofground probabilistic logic programs by considering theBayesian networks equivalent to them and applying techniques for learning Bayesian networks.[38][35]
ProbFOIL, introduced by De Raedt and Ingo Thon in 2010, combined the inductive logic programming systemFOIL withProbLog. Logical rules are learned from probabilistic data in the sense that both the examples themselves and their classifications can be probabilistic. The set of rules has to allow one to predict the probability of the examples from their description. In this setting, the parameters (the probability values) are fixed and the structure has to be learned.[39][35]
In 2011, Elena Bellodi and Fabrizio Riguzzi introduced SLIPCASE, which performs a beam search among probabilistic logic programs by iteratively refining probabilistic theories and optimizing the parameters of each theory using expectation-maximisation.[40] Its extension SLIPCOVER, proposed in 2014, uses bottom clauses generated as inProgol to guide the refinement process, thus reducing the number of revisions and exploring the search space more effectively. Moreover, SLIPCOVER separates the search for promising clauses from that of the theory: the space of clauses is explored with abeam search, while the space of theories is searchedgreedily.[41][35]
This article incorporates text from afree content work. Licensed under CC-BY 4.0 (license statement/permission). Text taken fromA History of Probabilistic Inductive Logic Programming, Fabrizio Riguzzi, Elena Bellodi and Riccardo Zese,Frontiers Media.