Article

Specific and Complete Local Integration of Patterns in Bayesian Networks

1 Araya Incorporation, 2F Mori 15 Building, 2-8-10 Toranomon, Minato-ku, Tokyo 105-0001, Japan
2 School of Computer Science, University of Hertfordshire, Hatfield AL10 9AB, UK
3 Department of General Systems Studies, University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan
* Author to whom correspondence should be addressed.
Entropy 2017, 19(5), 230; https://doi.org/10.3390/e19050230
Submission received: 19 March 2017 / Revised: 11 May 2017 / Accepted: 12 May 2017 / Published: 18 May 2017
(This article belongs to the Special Issue Complexity, Criticality and Computation (C³))

Abstract:
We present a first formal analysis of specific and complete local integration. Complete local integration was previously proposed as a criterion for detecting entities or wholes in distributed dynamical systems. Such entities in turn were conceived to form the basis of a theory of emergence of agents within dynamical systems. Here, we give a more thorough account of the underlying formal measures. The main contribution is the disintegration theorem which reveals a special role of completely locally integrated patterns (what we call ι-entities) within the trajectories they occur in. Apart from proving this theorem we introduce the disintegration hierarchy and its refinement-free version as a way to structure the patterns in a trajectory. Furthermore, we construct the least upper bound and provide a candidate for the greatest lower bound of specific local integration. Finally, we calculate the ι-entities in small example systems as a first sanity check and find that ι-entities largely fulfil simple expectations.

    1. Introduction

    This paper investigates a formal measure and a corresponding criterion we developed in order to capture the notion of wholes or entities within Bayesian networks in general and multivariate Markov chains in particular. The main focus of this paper is to establish some formal properties of this criterion.
    The main intuition behind wholes or entities is that combinations of some events/phenomena in space(-time) can be considered as more of a single or coherent “thing” than combinations of other events in space(-time). For example, the two halves of a soap bubble (the authors thank Eric Smith for pointing out the example of a soap bubble) together seem to form more of a single thing than one half of a floating soap bubble together with a piece of rock on the ground. Similarly, the soap bubble at time t1 and the “same” soap bubble at t2 seem more like temporal parts of the same thing than the soap bubble at t1 and the piece of rock at t2. We are trying to formally define and quantify what it is that makes some spatially and temporally extended combinations of parts entities but not others.
    We envisage spatiotemporal entities as a way to address not only the problem of spatial identity but also that of temporal identity (also called identity over time [1]). In other words, in addition to determining which events in “space” (e.g., which values of different degrees of freedom) belong to the same structure, spatiotemporal entities should allow the identification of the structure at a time t2 that is the future (or past, if t2 < t1) of a structure at time t1. Given a notion of identity over time, it becomes possible to capture which things persist and in what way they persist. Without a notion of identity over time, persistence seems not to be defined: there is no way to decide whether something persisted from t1 to t2 if we cannot tell what at t2 would count as the future of the original thing.
    In everyday experience problems concerning identity over time are not of great concern. Humans routinely and unconsciously connect perceived events to spatially and temporally extended entities. Nonetheless, the problem has been known since ancient times, in particular with respect to artefacts that exchange their parts over time. A famous example is the Ship of Theseus, which has all of its planks exchanged over time; this leads to the question whether it is still the same ship. From the point of view of physics and chemistry, living organisms also exchange their parts (e.g., the constituting atoms or molecules) over time. In the long term we hope our theory can help to understand identity over time for these cases. For the moment, we are particularly interested in identity over time in formal settings like cellular automata, multivariate Markov chains, and more generally dynamical Bayesian networks. In these cases a formal notion of spatiotemporal entities (i.e., one defining spatial and temporal identity) would allow us to investigate the persistence of entities/individuals formally. The persistence (and disappearance) of individuals is in turn fundamental to Darwinian evolution [2,3]. This suggests that spatiotemporal entities may be important for understanding the emergence of Darwinian evolution in dynamical systems.
    Another area in which a formal solution to the problem of identity over time, and thereby entities (in the following, if not stated otherwise, we always mean spatiotemporal entities when we refer to entities), might become important is a theory of intelligent agents that are space-time embedded as described by Orseau and Ring [4]. Agents are examples of entities fulfilling further properties, e.g., exhibition of actions and goal-directedness (cf. e.g., [5]). Using the formalism of reinforcement learning, Legg and Hutter [6] propose a definition of intelligence. Orseau and Ring [4] argue that this definition is insufficient. They dismiss the usual assumption that the environment of the reinforcement agent cannot overwrite the agent’s memory (which in this case is seen as the memory/tape of a Turing machine). They conclude that in the most realistic case there only ever is one memory that the agent’s (and the environment’s) data is embedded in. They note that the difference between agent and environment then disappears, and that the policy of the agent can no longer be freely chosen, only its initial condition. In order to measure intelligence according to Legg and Hutter [6] we must be able to define reward functions. This seems difficult without the capability to distinguish the agent according to some criterion. Towards the end of their publication Orseau and Ring [4] propose to define a “heart” pattern and use the duration of its existence as a reward. This approach seems too specific to us since it basically defines identity over time (of the heart pattern) as invariance. In more general settings a pattern that maintains a more general criterion of identity over time would be desirable. Ideally, this criterion would also not need a specifically designed heart pattern. Another advantage would be that reward functions other than lifetime could be used if the agent were identifiable. An entity criterion in the sense of this paper would be a step in this direction.

    1.1. Illustration

    In order to introduce the contributions of this paper we illustrate the setting of our work further.
    This illustration should only be taken as a motivation for what follows and not be confused with a result. The reason we do not use a concrete example is simply that we lack the necessary computational means (which are considerable, as we will discuss in Section 5).
    Let us assume we are given the entire time-evolution (what we will call a trajectory) of some known multivariate dynamical system or stochastic process. For example, a trajectory of a one-dimensional elementary cellular automaton showing a glider collision like Figure 1a (this is produced by the rule 62 elementary cellular automaton with time increasing from left to right; however, this does not matter here. For more about this system see e.g., Boccara et al. [7]).
    We take the point of view argued for in previous work [8] that entities are phenomena that occur within trajectories and that they can be represented by (spatiotemporal) patterns. Patterns in this sense fix a part of the variables in a trajectory to definite values and leave the rest undetermined. In Figure 1b–d we show such patterns that occur in Figure 1a, with the undetermined variables coloured grey and the determined ones taking on the values of the trajectory. Visually speaking, a pattern is a snippet from a trajectory that it occurs in.
    From Figure 1a we would probably expect that what we are seeing are two gliders colliding and forming a third. However, it may also be that one of the gliders absorbs the other, maintains some form of identity, and only changes its appearance (e.g., it “grows”). This highlights the problem of identity over time. While the spatial identity of such patterns has been treated multiple times in the literature, their identity over time is rarely dealt with.
    Our approach evaluates the “integration” of spatiotemporally extended patterns at once. According to our proposed entity-criterion a pattern is an ι-entity if, due to the dynamics of the system, every part of this pattern (which is again a pattern) makes all other parts more probable. Identity over time is then included since future parts have to make past parts more probable and vice versa. In principle this would allow us to detect if one of the gliders absorbs another one without losing its identity. For example, this could result in an entity as in Figure 1e.
    In order to detect entities the straightforward approach is to evaluate the entity-criterion for every spatiotemporal pattern in a given trajectory. Evaluating our entity-criterion of positive complete local integration (CLI) for a given pattern corresponds to splitting the pattern into parts in every possible way and calculating whether all the resulting parts make each other more probable. This means evaluating the specific local integration (SLI) with respect to all partitions of the set of variables occupied by the pattern.

    1.2. Contributions

    This paper contains four contributions.
    We first give a more formal definition of patterns. Since each pattern uniquely specifies a set of trajectories (those trajectories that the pattern occurs in) one might be tempted to reduce the analysis to that of sets of trajectories. We show that this is not possible since not all sets of trajectories have a pattern that specifies them.
    Second, we try to get a general intuition for the patterns whose parts make all other parts more probable. For this we show how to construct patterns that, for given probability of the whole pattern, achieve the least upper bound of specific local integration (SLI). These turn out to be patterns for which each part occurs if and only if the whole pattern occurs. We also construct a pattern that, again for given probability of the whole pattern, has negative SLI. These patterns (which may achieve the greatest lower bound of SLI) occur when either the whole pattern occurs or all of it except exactly one part occurs.
    Third, we prove the disintegration theorem. This is the main contribution. We saw that patterns are snippets of trajectories. We can also look at the whole trajectory as a single pattern. Like all patterns the trajectory can be split up into parts, i.e., partitioned, resulting in a set of patterns. Among the partitions we find examples such as those in Figure 1g,h. These are very particular partitions picking out the gliders among all possible parts. This suggests that finding such special partitions provides a (possibly different) notion of entities.
    One intuition we might have is that entities are the most “independent” parts of a trajectory. In other words we could look for the partition whose parts make the other parts less probable. The disintegration theorem then shows that this approach again leads to the ι-entities. This shows that ι-entities do not only have an intuitive motivation but also play a particular role in the structure of probabilities of entire trajectories.
    It is not directly the parts of the partitions that minimise SLI for a trajectory which are ι-entities. To get ι-entities we first classify all partitions of the trajectory according to their SLI value. Then within each such class we choose the partitions for which no refining partition (a refining partition is one that further partitions any of the parts of the original partition) achieves an even lower level of SLI.
    So according to the disintegration theorem an ι-entity is not only a pattern that is integrated with respect to every possible partition of the pattern but also a pattern that occurs in partitions that minimise (in a certain sense) the integration of trajectories.
    A side effect of the disintegration theorem is that we naturally get a kind of hierarchy of ι-entities called the disintegration hierarchy. For each trajectory and its different levels of SLI we find different decompositions of the trajectory into ι-entities.
    Fourth, we calculate the ι-entities and disintegration hierarchy for two simple example systems. Our example systems show that in general the partitions at a particular disintegration level are not unique. This means that there are overlapping ι-entities at those levels. Furthermore, the same ι-entity can occur on multiple levels of the disintegration hierarchy.
    We do not thoroughly discuss the disintegration hierarchies in this paper and postpone this to future publications. Here we only note that many entities in the real world occur within hierarchies as well. For example, animals are entities that are composed of cells which are themselves entities.

    1.3. Related Work

    We now give a quick overview of related work. More in-depth discussions will be provided after we formally introduce our definitions.
    To our knowledge the measure of CLI was proposed for the first time by us in [8]. However, that publication contained none of the formal or numerical results in the present paper. From a formal perspective the measures of SLI and CLI are a combination of existing concepts. SLI localises multi-information [9,10] in the way proposed by Lizier [11] for other information theoretic measures. In order to get the CLI we apply the weakest-link approach proposed by Tononi and Sporns [12], Balduzzi and Tononi [13] to SLI.
    Conceptually, our work is most closely related to Beer [14]. The notion of spatiotemporal patterns used there to capture blocks, blinkers, and gliders is equivalent to the patterns we define more formally here. This work also contains an informal entity-criterion that directly deals with identity over time (not only space). It differs significantly from our proposal as it depends on the re-occurrence of certain transitions at later times in a pattern whereas our criterion only depends on the probabilities of parts of the patterns without the need for any re-occurrences.
    The organisations of chemical organisation theory [15] may also be interpreted as entity-criteria. In Fontana and Buss [15] these are defined in the following way:
    The observer will conclude that the system is an organisation to the extent that there is a compressed description of its objects and of their relations.
    The direct intuition is different from ours and it is not clear to us to what extent our entity-criterion is equivalent to this. This will be further investigated in the future.
    It is worth noting that viewing entities/objects/individuals as patterns occurring within a trajectory is in contrast to an approach that models them as sets of random variables/stochastic processes (e.g., a set of cells in a CA in contrast to a set of specific values of a set of cells). An example of the latter approach is the information theoretic individuals of Krakauer et al. [16]. These individuals are identified using an information theoretic notion of autonomy due to Bertschinger et al. [17]. The latter notion of autonomy is also somewhat related to the idea of integration here. Autonomy contains a term that measures the degree to which a random variable representing an individual at timestep t determines the random variable representing it at t+1. Similarly, CLI requires that every part of an entity pattern makes every other part more probable; in the extreme case this means that every part determines that every other part of the pattern also occurs. However, formally autonomy evaluates random variables and not patterns directly.
    At the most basic level the intuition behind entities is that some spatiotemporal patterns are more special than others. Defining (and usually finding) more important spatiotemporal patterns or structures (also called coherent structures) has a long history in the theory of cellular automata and distributed dynamical systems. As Shalizi et al. [18] have argued, most of the earlier definitions and methods [19,20,21,22] require previous knowledge about the patterns being looked for. They are therefore not suitable for a general definition of entities. More recent definitions based on information theory [18,23,24] no longer have this limitation. The difference from our entity-criterion is that they do not treat identity over time. They are well suited to identify gliders at each time-step for example, but if two gliders collide and give rise to a third glider as in Figure 1a these methods (by design) say nothing about the identity of the third glider, i.e., they cannot distinguish between a glider absorbing another one and two gliders producing a new one. While we have not been able to show that our approach actually makes such distinctions for gliders, it could do so in principle.
    We note here that the approach of identifying individuals by Friston [25] using Markov blankets has the same shortcoming as the spatiotemporal filters. For each individual time-step it returns a partition of all degrees of freedom into internal, sensory, active, and external degrees. However, it does not provide a way to resolve ambiguities in the case of multiple such partitions colliding.
    Among research related to integrated information theory (IIT) there are approaches (a first one by Balduzzi [26] and a more recent one by Hoel et al. [27]) that can be used to determine specific spatiotemporal patterns in a trajectory. They can therefore be interpreted to define a notion of entities even if that is not their main goal. These approaches are aimed at establishing the optimal spatiotemporal coarse-graining to describe the dynamics of a system. For a given trajectory we can then identify the patterns that instantiate a macro-state/coarse-grain that is optimal according to their criterion.
    In contrast to our approach the spatiotemporal grains are determined by their interactions with other grains. In our case the entities are determined first and foremost by their internal relations.
    The consequence seems to be that a pattern can be an entity in one trajectory and not an entity in another even if it occurs in both. In our conception a pattern is an entity in all trajectories it occurs in.

    2. Notation and Background

    In this section we briefly introduce our notation for sets of random variables (since every set of jointly distributed random variables can be seen as a Bayesian network and vice versa, we use these terms interchangeably) and their partition lattices.
    In general, we use the convention that upper-case letters $X, Y, Z$ are random variables, lower-case letters $x, y, z$ are specific values/outcomes of random variables, and calligraphic letters $\mathcal{X}, \mathcal{Y}, \mathcal{Z}$ are state spaces that random variables take values in. Furthermore:
    Definition 1.
    Let $\{X_i\}_{i \in V}$ be a set of random variables with totally ordered finite index set $V$ and state spaces $\{\mathcal{X}_i\}_{i \in V}$ respectively. Then for $A, B \subseteq V$ define:
    1. $X_A := (X_i)_{i \in A}$ as the joint random variable composed of the random variables indexed by $A$, where $A$ is ordered according to the total order of $V$,
    2. $\mathcal{X}_A := \prod_{i \in A} \mathcal{X}_i$ as the state space of $X_A$,
    3. $x_A := (x_i)_{i \in A} \in \mathcal{X}_A$ as a value of $X_A$,
    4. $p_A : \mathcal{X}_A \to [0, 1]$ as the probability distribution (or more precisely probability mass function) of $X_A$, which is the joint probability distribution over the random variables indexed by $A$. If $A = \{i\}$, i.e., a singleton set, we drop the parentheses and just write $p_A = p_i$,
    5. $p_{A,B} : \mathcal{X}_A \times \mathcal{X}_B \to [0, 1]$ as the probability distribution over $\mathcal{X}_A \times \mathcal{X}_B$. Note that in general for arbitrary $A, B \subseteq V$, $x_A \in \mathcal{X}_A$, and $y_B \in \mathcal{X}_B$ this can be rewritten as a distribution over the intersection of $A$ and $B$ and the respective complements. The variables in the intersection have to coincide:
    $$p_{A,B}(x_A, y_B) := p_{A \setminus B,\, A \cap B,\, B \setminus A,\, A \cap B}(x_{A \setminus B}, x_{A \cap B}, y_{B \setminus A}, y_{A \cap B}) = \delta_{x_{A \cap B}}(y_{A \cap B}) \, p_{A \setminus B,\, A \cap B,\, B \setminus A}(x_{A \setminus B}, x_{A \cap B}, y_{B \setminus A}).$$
    Here $\delta$ is the Kronecker delta (see Appendix A). If $A \cap B = \emptyset$ and $C = A \cup B$ we also write $p_C(x_A, y_B)$ to keep expressions shorter.
    6. $p_{B|A} : \mathcal{X}_A \times \mathcal{X}_B \to [0, 1]$ with $(x_A, x_B) \mapsto p_{B|A}(x_B | x_A)$ as the conditional probability distribution over $X_B$ given $X_A$:
    $$p_{B|A}(y_B | x_A) := \frac{p_{A,B}(x_A, y_B)}{p_A(x_A)}.$$
    We also just write $p_B(x_B | x_A)$ if it is clear from context which variables we are conditioning on.
    If we are given $p_V$ we can obtain every $p_A$ through marginalisation. In the notation of Definition 1 this is formally written:
    $$p_A(x_A) = \sum_{\bar{x}_{V \setminus A} \in \mathcal{X}_{V \setminus A}} p_{A,\, V \setminus A}(x_A, \bar{x}_{V \setminus A}) = \sum_{\bar{x}_{V \setminus A} \in \mathcal{X}_{V \setminus A}} p_V(x_A, \bar{x}_{V \setminus A}).$$
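    The following minimal Python sketch (our illustration, not code from the paper; the names and the example distribution are made up) shows this marginalisation for a joint distribution represented as a dictionary from trajectories to probabilities:
```python
V = (0, 1)                                   # index set of two variables
p_V = {(0, 0): 0.4, (0, 1): 0.1,
       (1, 0): 0.1, (1, 1): 0.4}             # example joint distribution p_V

def marginal(p_V, V, A):
    """p_A(x_A): sum p_V over the values of the variables in V \\ A."""
    p_A = {}
    for x_V, prob in p_V.items():
        x_A = tuple(x_V[V.index(i)] for i in A)   # project x_V onto A
        p_A[x_A] = p_A.get(x_A, 0.0) + prob
    return p_A

print(marginal(p_V, V, (0,)))   # {(0,): 0.5, (1,): 0.5}
```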
    Next we define the partition lattice of a set of random variables. Partition lattices occur as a structure of the set of possible ways to split an object/pattern into parts. Subsets of the partition lattices play an important role in the disintegration theorem.
    Definition 2 (Partition lattice of a set of random variables).
    Let $\{X_i\}_{i \in V}$ be a set of random variables.
    1. Then its partition lattice $\mathcal{L}(V)$ is the set of partitions of $V$ partially ordered by refinement (see also Appendix B).
    2. For two partitions $\pi, \rho \in \mathcal{L}(V)$ we write $\pi \sqsubseteq \rho$ if $\pi$ refines $\rho$ and $\pi \lessdot \rho$ if $\pi$ covers $\rho$. The latter means that $\pi \sqsubseteq \rho$, $\pi \neq \rho$, and there is no $\xi \in \mathcal{L}(V)$ with $\pi \sqsubseteq \xi \sqsubseteq \rho$ such that $\pi \neq \xi \neq \rho$.
    3. We write $\mathbf{0}$ for the zero element of a partially ordered set (including lattices) and $\mathbf{1}$ for the unit element.
    4. Given a partition $\pi \in \mathcal{L}(V)$ and a subset $A \subseteq V$ we define the restricted partition $\pi|_A$ of $\pi$ to $A$ via:
    $$\pi|_A := \{b \cap A : b \in \pi\}.$$
    For some examples of partition lattices see Appendix B and for more background see e.g., Grätzer [28]. For our purpose it is important to note that the partitions of sets of random variables or Bayesian networks we are investigating are partitions of the index set $V$ of these and not partitions of their state spaces $\mathcal{X}_V$.
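    As an illustration (again a sketch under our own naming conventions, not the authors' code), the partition lattice of a small index set can be enumerated by brute force and refinement tested directly:
```python
def partitions(s):
    """Yield all partitions of the list s as lists of blocks (sets)."""
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for part in partitions(rest):
        for i in range(len(part)):          # put `first` into each block
            yield part[:i] + [part[i] | {first}] + part[i + 1:]
        yield part + [{first}]              # or into a new singleton block

def refines(pi, rho):
    """pi refines rho iff every block of pi lies inside a block of rho."""
    return all(any(b <= c for c in rho) for b in pi)

V = [1, 2, 3]
L_V = list(partitions(V))
print(len(L_V))                 # 5 = Bell number B_3
zero = [{i} for i in V]         # zero element: all singletons
one = [set(V)]                  # unit element: one block
print(refines(zero, one))       # True: 0 refines every partition
```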

    3. Patterns, Entities, Specific, and Complete Local Integration

    This section contains the formal part of this contribution.
    First we introduce patterns. Patterns are the main structures of interest in this publication. Entities are seen as special kinds of patterns. The measures of specific local integration and complete local integration, which we use in our criterion for ι-entities, quantify notions of “oneness” of patterns. We give a brief motivation and show that while each pattern defines a set of “trajectories” of a set of random variables, not every such set is defined by a pattern. This justifies studying patterns for their own sake.
    Then we briefly motivate the use of specific and complete local integration (SLI and CLI) for an entity criterion on patterns. We then turn to more formal aspects of SLI and CLI. We first prove an upper bound for SLI and construct a candidate for a lower bound. We then go on to define the disintegration hierarchy and its refinement-free version. These structures are used to prove the main result, the disintegration theorem. This relates the SLI of whole trajectories of a Bayesian network to the CLI of parts of these trajectories and vice versa.

    3.1. Patterns

    This section introduces the notion of patterns. These form the basic candidate structures for entities.
    The structures we are trying to capture by entities should be analogous to spatially and temporally extended objects we encounter in everyday life (e.g., soap bubbles, living organisms). These objects seem to occur in the single history of the universe that also contains us. The purpose of patterns is then to capture arbitrary structures that occur within single trajectories or histories of a multivariate discrete dynamical system (see Figure 2 for an example of a Bayesian network of such a system; any cellular automaton is also such a system).
    We emphasise the single trajectory since many structures of interest (e.g., gliders) occur in some trajectories in some “places”, in other trajectories in other “places” (compare e.g., Figure 1a and Figure 3a), and in some trajectories not at all. We explicitly want to be able to capture such trajectory dependent structures and therefore choose patterns. Examples of formal structures for which it makes no sense to say that they occur within a trajectory are the random variables in a Bayesian network and, as we will see, general sets of trajectories of the Bayesian network.
    Unlike entities, which we conceive of as special patterns that fulfil further criteria, patterns are formed by any combination of events at arbitrary times and positions. As an example, we might think of a cellular automaton again. The time evolutions over multiple steps of the cells attributed to a glider (see [14] for a principled way to attribute cells to these) as in Figure 1b,e should be patterns, but so should arbitrary choices of events in a trajectory as in Figure 3b.
    In the more general context of (finite) Bayesian networks there may be no interpretation of time or space. Nonetheless, we can define that a trajectory in this case fixes every random variable to a particular value. We then define patterns formally in the following way.
    Definition 3 (Patterns and trajectories).
    Let $\{X_i\}_{i \in V}$ be a set of random variables with index set $V$ and state spaces $\{\mathcal{X}_i\}_{i \in V}$ respectively.
    1. A pattern at $A \subseteq V$ is an assignment
    $$X_A = x_A$$
    where $x_A \in \mathcal{X}_A$. If there is no danger of confusion we also just write $x_A$ for the pattern $X_A = x_A$ at $A$.
    2. The elements $x_V$ of the joint state space $\mathcal{X}_V$ are isomorphic to the patterns $X_V = x_V$ at $V$ which fix the complete set $\{X_i\}_{i \in V}$ of random variables. Since they will be used repeatedly we refer to them as the trajectories of $\{X_i\}_{i \in V}$.
    3. A pattern $x_A$ is said to occur in trajectory $\bar{x}_V \in \mathcal{X}_V$ if $\bar{x}_A = x_A$.
    4. Each pattern $x_A$ uniquely defines (or captures) a set of trajectories $T(x_A)$ via
    $$T(x_A) = \{\bar{x}_V \in \mathcal{X}_V : \bar{x}_A = x_A\},$$
    i.e., the set of trajectories that $x_A$ occurs in.
    5. It is convenient to allow the empty pattern $x_\emptyset$, for which we define $T(x_\emptyset) = \mathcal{X}_V$.
    Remarks:
    • Note that for every $x_A \in \mathcal{X}_A$ we can form a pattern $X_A = x_A$, so the set of all patterns is $\bigcup_{A \subseteq V} \mathcal{X}_A$.
    • Our notion of patterns is similar to “patterns” as defined in [29] and to “cylinders” as defined in [30]. More precisely, these other definitions concern (probabilistic) cellular automata where all random variables have identical state spaces $\mathcal{X}_i = \mathcal{X}_j$ for all $i, j \in V$. They also restrict the extent of the patterns or cylinders to a single time-step. Under these conditions our patterns are isomorphic to these other definitions. However, we drop both the identical state space assumption and the restriction to single time-steps.
      Our definition is inspired by the usage of the term “spatiotemporal pattern” in [14,31,32]. There is no formal definition of this notion given in these publications but we believe that our definition is a straightforward formalisation. Note that these publications only treat the Game of Life cellular automaton. The assumption of identical state spaces is therefore implicitly made. At the same time the restriction to single time-steps is explicitly dropped.
    Since every pattern defines a subset of $\mathcal{X}_V$, one could think that every subset of $\mathcal{X}_V$ is also a pattern. In that case studying patterns in a set of random variables $\{X_i\}_{i \in V}$ would be the same as studying subsets of its set of trajectories $\mathcal{X}_V$. However, the set of subsets of $\mathcal{X}_V$ defined by patterns and the set of all subsets $2^{\mathcal{X}_V}$ (i.e., the power set) of $\mathcal{X}_V$ of a set of random variables $\{X_i\}_{i \in V}$ are not identical. Formally:
    $$\bigcup_{B \subseteq V} \{T(x_B) \subseteq \mathcal{X}_V : x_B \in \mathcal{X}_B\} \neq 2^{\mathcal{X}_V}.$$
    While patterns define subsets of $\mathcal{X}_V$, not every subset of $\mathcal{X}_V$ is captured by a pattern. The difference of the two sets is characterised in Theorem 1 below. We first present a simple example of a subset $D \in 2^{\mathcal{X}_V}$ that cannot be captured by a pattern.
    Let $V = \{1, 2\}$ and $\{X_i\}_{i \in V} = \{X_1, X_2\}$ the set of random variables. Let $\mathcal{X}_1 = \mathcal{X}_2 = \{0, 1\}$. Then $\mathcal{X}_V = \{(0,0), (0,1), (1,0), (1,1)\}$. Now let $A = V = \{1, 2\}$, choose pattern $x_A = (0, 0)$ and pattern $\bar{x}_A = (1, 1)$. Then let
    $$D := \{x_A, \bar{x}_A\} = \{(0,0), (1,1)\}.$$
    In this case we can easily list the set of all patterns $\bigcup_{C \subseteq V} \mathcal{X}_C$: the empty pattern $x_\emptyset$; the patterns $X_1 = 0$, $X_1 = 1$, $X_2 = 0$, $X_2 = 1$ at the singletons; and the patterns $X_V = (0,0)$, $X_V = (0,1)$, $X_V = (1,0)$, $X_V = (1,1)$ at $V$,
    and verify that $D$ is not among the trajectory sets they capture. Before we formally characterise the difference, we define some extra terminology.
    Definition 4.
    Let $\{X_i\}_{i \in V}$ be a set of random variables with index set $V$ and state spaces $\{\mathcal{X}_i\}_{i \in V}$ respectively. For a subset $D \subseteq \mathcal{X}_V$ the set $D_A$ of all patterns at $A$ that occur in one of the trajectories in $D$ is defined as
    $$D_A := \{x_A \in \mathcal{X}_A : \exists \bar{x}_V \in D, \bar{x}_A = x_A\}.$$
    So in the previous example $D_{\{1\}} = \{0, 1\}$, $D_{\{2\}} = \{0, 1\}$, $D_{\{1,2\}} = \{(0,0), (1,1)\}$. We then get the following theorem, which establishes the difference between the subsets of $\mathcal{X}_V$ captured by patterns and general subsets.
    Theorem 1.
    Given a set of random variables $\{X_i\}_{i \in V}$, a subset $D \subseteq \mathcal{X}_V$ cannot be represented by a pattern of $\{X_i\}_{i \in V}$ if and only if there exists $A \subseteq V$ with $D_A \subsetneq \mathcal{X}_A$ (proper subset) and $|D_A| > 1$, i.e., if neither all patterns at $A$ are possible nor a unique pattern at $A$ is specified by $D$.
    Proof. 
    See Appendix D. ☐
    We saw that in the previous example the subset $D$ cannot be captured by a pattern. For $A = \{1\}$ we have $D_{\{1\}} = \{0, 1\} = \mathcal{X}_{\{1\}}$ and for $A = \{2\}$ we have $D_{\{2\}} = \{0, 1\} = \mathcal{X}_{\{2\}}$, so these do not fulfil the conditions of Theorem 1. However, for $A = \{1, 2\}$ we have $D_{\{1,2\}} = \{(0,0), (1,1)\} \subsetneq \mathcal{X}_{\{1,2\}}$ and $|D_{\{1,2\}}| > 1$, so the conditions of Theorem 1 are fulfilled and as expected $D$ cannot be captured by a pattern.
    The proof of the following corollary shows how to construct a subset that cannot be represented by a pattern for all sets of random variables $\{X_i\}_{i \in V}$ with $|\mathcal{X}_V| > 2$.
    Corollary 1.
    Given a set of random variables $\{X_i\}_{i \in V}$, if $|\mathcal{X}_V| > 2$ then
    $$\bigcup_{B \subseteq V} \{T(x_B) \subseteq \mathcal{X}_V : x_B \in \mathcal{X}_B\} \subsetneq 2^{\mathcal{X}_V}$$
    (proper subset).
    Proof. 
    Choose $D = \{x_V, y_V\} \in 2^{\mathcal{X}_V}$ with $y_V \in \{\bar{x}_V \in \mathcal{X}_V : \forall i \in V, \bar{x}_i \neq x_i\}$. Then for $A = V$ we have $|D_A| = 2$ and, since $|\mathcal{X}_V| > 2$, also $D_A \subsetneq \mathcal{X}_A$. So $D$ cannot be represented by a pattern according to Theorem 1 and therefore $D \notin \bigcup_{B \subseteq V} \{T(x_B) \subseteq \mathcal{X}_V : x_B \in \mathcal{X}_B\}$. ☐
    This means that in every set of random variables that does not consist of only a single binary random variable there are subsets of $\mathcal{X}_V$ that cannot be captured by a pattern. We can interpret this result in the following way. Patterns were constructed to be structures that occur within trajectories. It then turned out that each pattern also defines a subset of all trajectories of a system. So for sets of trajectories captured by patterns it could make sense to say they “occur” within one trajectory. However, there are sets of trajectories that are not captured by patterns. For these sets of trajectories it would not be well-defined to say that they occur within a trajectory. This is the reason we choose to investigate patterns specifically and not sets of trajectories.
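    To make this concrete, the following brute-force sketch (illustrative code, not from the paper) enumerates all patterns of the two-bit example above, computes their trajectory sets $T(x_B)$, and confirms that $D = \{(0,0), (1,1)\}$ is not captured by any pattern while, e.g., $\{(0,0), (0,1)\} = T(X_1 = 0)$ is:
```python
from itertools import combinations, product

V = (1, 2)
spaces = {1: (0, 1), 2: (0, 1)}
X_V = set(product(spaces[1], spaces[2]))     # all trajectories

captured = set()                             # all T(x_B) as frozensets
for r in range(len(V) + 1):
    for B in combinations(V, r):             # every subset B of V
        for x_B in product(*(spaces[i] for i in B)):
            T = frozenset(t for t in X_V
                          if all(t[V.index(i)] == x_B[B.index(i)] for i in B))
            captured.add(T)

print(frozenset({(0, 0), (1, 1)}) in captured)   # False: not a pattern
print(frozenset({(0, 0), (0, 1)}) in captured)   # True: T(X_1 = 0)
print(len(captured), 2 ** len(X_V))              # 9 trajectory sets vs 16 subsets
```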

    3.2. Motivation of Complete Local Integration as an Entity Criterion

    We proposed to use patterns as the candidate structures for entities since patterns comprise arbitrary structures that occur within single trajectories of multivariate systems. Here we heuristically motivate our choice of using positive complete local integration as a criterion to select entities among patterns. In general such a criterion would give us, for any Bayesian network $\{X_i\}_{i \in V}$, a subset $\mathfrak{E}(\{X_i\}_{i \in V}) \subseteq \bigcup_{A \subseteq V} \mathcal{X}_A$ of the patterns.
    So what is an entity? We can rephrase the problem of finding an entity criterion by saying an entity is composed of parts that share the same identity. So if we can define when parts share the same identity we also define entities by finding all parts that share identity with some given part. For the moment, let us decompose (as is often done [33]) the problem of identity into two parts:
    • spatial identity and
    • temporal identity.
    Our solution will make no distinction between these two aspects in the end. We note here that conceiving of entities (or objects) as composites of spatial and temporal parts as we do in this paper is referred to as four-dimensionalism or perdurantism in philosophical discussions (see e.g., [34]). The opposing view holds that entities are spatial only and endure over time. This view is called endurantism. Here we will not go into the details of this discussion.
    The main intuition behind complete local integration is that every part of an entity should make every other part more probable.
    This seems to hold for example for the spatial identity of living organisms. Parts of living organisms rarely exist without the rest of the living organisms also existing. For example, it is rare that an arm exists without a corresponding rest of a human body existing, compared to an arm and the rest of a human body existing. The body (without arm) seems to make the existence of the arm more probable and vice versa. Similar relations between parts seem to hold for all living organisms but also for some non-living structures. The best example of a non-living structure we know of for which this is obvious is the soap bubble. Half soap bubbles (or thirds, quarters, ...) only ever exist for split seconds whereas entire soap bubbles can persist for up to minutes. Any part of a soap bubble seems to make the existence of the rest more probable. Similarly, parts of hurricanes or tornadoes are rare. So what about spatial parts of structures that are not so entity-like? Does the existence of an arm make things more probable that are not parts of the corresponding body? For example, does the arm make the existence of some piece of rock more probable? Maybe to a small degree, as without the existence of any rocks in the universe humans are probably impossible. However, this effect is much smaller than the increase of probability of the existence of the rest of the body due to the arm.
    These arguments concerned the spatial identity problem. However, for temporal identity similar arguments hold. The existence of a living organism at one point in time makes it more probable that there is a living organism (in the vicinity) at a subsequent (and preceding) point in time. If we look at structures that are not entity-like with respect to the temporal dimension we find a different situation. An arm at some instance of time does not make the existence of a rock at a subsequent instance much more probable. It does make the existence of a human body at a subsequent instance much more probable. So the human body at the second instance seems to be more like a future part of the arm than the rock. Switching now to patterns in sets of random variables, we can easily formalise such intuitions. We required that for an entity every part of the structure, which is now a pattern $x_O$, makes every other part more probable. A part of a pattern is a pattern $x_b$ with $b \subsetneq O$. If we require that every part of a pattern makes every other part more probable then we can write that $x_O$ is an entity if:
    $$\min_{\emptyset \neq b \subsetneq O} \frac{p_{O \setminus b}(x_{O \setminus b} | x_b)}{p_{O \setminus b}(x_{O \setminus b})} > 1.$$
    This is equivalent to
    $$\min_{\emptyset \neq b \subsetneq O} \frac{p_O(x_O)}{p_{O \setminus b}(x_{O \setminus b}) \, p_b(x_b)} > 1.$$
    If we write $\mathcal{L}_2(O)$ for the set of all bipartitions of $O$ we can rewrite this further as
    $$\min_{\pi \in \mathcal{L}_2(O)} \frac{p_O(x_O)}{\prod_{b \in \pi} p_b(x_b)} > 1.$$
    We can interpret this form as requiring that for every possible partition $\pi \in \mathcal{L}_2(O)$ into two parts $x_{b_1}, x_{b_2}$ the probability of the whole pattern $x_O = (x_{b_1}, x_{b_2})$ is bigger than its probability would be if the two parts were independent. To see this, note that if the two parts $x_{b_1}, x_{b_2}$ were independent we would have
    $$p_O(x_O) =: p_{b_1, b_2}(x_{b_1}, x_{b_2}) = p_{b_1}(x_{b_1}) \, p_{b_2}(x_{b_2}),$$
    which would give us
    $$\frac{p_O(x_O)}{\prod_{b \in \pi} p_b(x_b)} = 1$$
    for this partition.
    From this point of view the restriction to bipartitions seems arbitrary. For example, the existence of a partition $\xi$ into three parts such that
    $$p_O(x_O) = \prod_{c \in \xi} p_c(x_c)$$
    seems to suggest that the pattern $x_O$ is not an entity but instead composite of three parts. We can therefore generalise Equation (16) to include all partitions $\mathcal{L}(O)$ (see Definition 2) of $O$ except the unit partition $\mathbf{1}_O$. Then we would say that $x_O$ is an entity if
    $$\min_{\pi \in \mathcal{L}(O) \setminus \mathbf{1}_O} \frac{p_O(x_O)}{\prod_{b \in \pi} p_b(x_b)} > 1.$$
    This criterion already results in the same entities as the criterion we propose.
    However, in order to connect with information theory, log-likelihoods, and related literature we formally introduce the logarithm into this equation. We then arrive at the following entity-criterion
    $$\min_{\pi \in \mathcal{L}(O) \setminus \mathbf{1}_O} \log \frac{p_O(x_O)}{\prod_{b \in \pi} p_b(x_b)} > 0,$$
    where the left hand side is the complete local integration (CLI), the function minimised is the specific local integration (SLI), and the inequality provides the criterion for ι-entities. For reference, we define these notions formally. We begin with SLI, which quantifies for a given partition $\pi$ of a pattern how far the probability of the whole pattern exceeds the probability it would have if the blocks of the partition were independent.
    Definition 5 (Specific local integration (SLI)).
    Given a Bayesian network $\{X_i\}_{i \in V}$ and a pattern $x_O$ the specific local integration $\mathrm{mi}_\pi(x_O)$ of $x_O$ with respect to a partition $\pi$ of $O \subseteq V$ is defined as
    $$\mathrm{mi}_\pi(x_O) := \log \frac{p_O(x_O)}{\prod_{b \in \pi} p_b(x_b)}.$$
    In this paper we use the convention that $\log \frac{0}{0} := 0$.
    Definition 6 ((Complete) local integration).
    Given a Bayesian network $\{X_i\}_{i \in V}$ and a pattern $x_O$ of this network the complete local integration $\iota(x_O)$ of $x_O$ is the minimum SLI over the non-unit partitions $\pi \in \mathcal{L}(O) \setminus \mathbf{1}_O$:
    $$\iota(x_O) := \min_{\pi \in \mathcal{L}(O) \setminus \mathbf{1}_O} \mathrm{mi}_\pi(x_O).$$
    We call a pattern $x_O$ completely locally integrated if $\iota(x_O) > 0$.
    Remarks:
    • The reason for excluding the unit partition $\mathbf{1}_O$ of $\mathcal{L}(O)$ (where $\mathbf{1}_O = \{O\}$, see Definition 2) is that with respect to it every pattern has $\mathrm{mi}_{\mathbf{1}_O}(x_O) = 0$.
    • Looking for a partition that minimises a measure of integration is known as the weakest-link approach [35] to dealing with multiple partitions. We note here that this is not the only approach that is being discussed. Another approach is to look at weighted averages of all integrations. For a further discussion of this point in the case of the expected value of SLI see Ay [35] and references therein. For our interpretation taking the average seems less well suited since requiring only a positive average would allow SLI to be negative with respect to some partitions.
    Definition 7 (ι-entity).
    Given a multivariate Markov chain $\{X_i\}_{i \in V}$ a pattern $x_O$ is an ι-entity if
    $$\iota(x_O) > 0.$$
    The entire set of ι-entities $\mathfrak{E}_\iota(\{X_i\}_{i \in V})$ is then defined as follows.
    Definition 8 (ι-entity-set).
    Given a multivariate Markov chain $\{X_i\}_{i \in V}$ the ι-entity-set is the entity-set
    $$\mathfrak{E}_\iota(\{X_i\}_{i \in V}) := \{x_O \in \bigcup_{A \subseteq V} \mathcal{X}_A : \iota(x_O) > 0\}.$$
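    For reference, here is a brute-force sketch (our own illustrative code; the helper names and the example distribution are made up) that computes $\mathrm{mi}_\pi(x_O)$ and $\iota(x_O)$ for a pattern of a small joint distribution by enumerating all partitions:
```python
from math import log

def partitions(s):
    """All partitions of the list s, as lists of blocks (sets of indices)."""
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] | {first}] + part[i + 1:]
        yield part + [{first}]

def marginal(p, O, x, b):
    """p_b(x_b): sum of p over all outcomes agreeing with x on block b."""
    return sum(q for xs, q in p.items()
               if all(xs[O.index(i)] == x[O.index(i)] for i in b))

def mi(p, O, x, pi):
    """Specific local integration of pattern x at O w.r.t. partition pi."""
    prod = 1.0
    for b in pi:
        prod *= marginal(p, O, x, b)
    return log(marginal(p, O, x, set(O)) / prod)

def iota(p, O, x):
    """Complete local integration: minimum mi over non-unit partitions."""
    return min(mi(p, O, x, pi)
               for pi in partitions(sorted(O)) if len(pi) > 1)

# Two perfectly correlated fair bits: p(0,0) = p(1,1) = 1/2.
O = (1, 2)
p = {(0, 0): 0.5, (1, 1): 0.5}
print(iota(p, O, (0, 0)))   # log 2 > 0: the pattern (0,0) is an iota-entity
```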
    Next, we look at some interpretations that the introduction of the logarithm allows.
    • A first consequence of introducing the logarithm is that we can now formulate the condition of Equation (24) analogously to an old phrase attributed to Aristotle that “the whole is more than the sum of its parts”. In our case this would need to be changed to “the log-probability of the (spatiotemporal) whole is greater than the sum of the log-probabilities of its (spatiotemporal) parts”. This can easily be seen by rewriting Equation (22) as:
      $$\mathrm{mi}_\pi(x_O) = \log p_O(x_O) - \sum_{b \in \pi} \log p_b(x_b).$$
    • Another side effect of using the logarithm is that we can interpret Equation (24) in terms of the surprise value (also called information content) $-\log p_O(x_O)$ [36] of the pattern $x_O$ and the surprise value of its parts with respect to any partition $\pi$. Rewriting Equation (22) using properties of the logarithm we get:
      $$\mathrm{mi}_\pi(x_O) = \sum_{b \in \pi} \left(-\log p_b(x_b)\right) - \left(-\log p_O(x_O)\right).$$
      Interpreting Equation (24) from this perspective we can then say that a pattern is an entity if the sum of the surprise values of its parts is larger than the surprise value of the whole.
    • In coding theory, the Kraft-McMillan theorem [37] tells us that the optimal length (in a uniquely decodable binary code) of a codeword for an event $x$ is $l(x) = -\log p(x)$ if $p(x)$ is the true probability of $x$. If the encoding is not based on the true probability of $x$ but instead on a different probability $q(x)$ then the difference between the chosen codeword length and the optimal codeword length is
      $$-\log q(x) - (-\log p(x)) = \log \frac{p(x)}{q(x)}.$$
      Then we can interpret the specific local integration as a difference in codeword lengths. Say we want to encode what occurs at the nodes/random variables indexed by $O$, i.e., we encode the random variable $X_O$. We can encode every event (now a pattern) $x_O$ based on $p_O(x_O)$. Let us call this the joint code. Given a partition $\pi \in \mathcal{L}(O)$ we can also encode every event $x_O$ based on its product probability $\prod_{b \in \pi} p_b(x_b)$. Let us call this the product code with respect to $\pi$. For a particular event $x_O$ the difference of the codeword lengths between the joint code and the product code with respect to $\pi$ is then just the specific local integration with respect to $\pi$.
      Complete local integration then requires that the joint code codeword is shorter than all possible product code codewords. This means there is no partition with respect to which the product code for the pattern $x_O$ has a shorter codeword than the joint code. So ι-entities are patterns that are shorter to encode with the joint code than with any product code. Patterns that have a shorter codeword in a product code associated to a partition $\pi$ have negative SLI with respect to this $\pi$ and are therefore not ι-entities.
    • We can relate our measure to other measures in information theory. For this we note that the expectation value of specific local integration with respect to a partition $\pi$ is the multi-information $\mathrm{MI}_\pi(X_O)$ [9,10] with respect to $\pi$, i.e.,
      $$\mathrm{MI}_\pi(X_O) := \sum_{x_O \in \mathcal{X}_O} p_O(x_O) \log \frac{p_O(x_O)}{\prod_{b \in \pi} p_b(x_b)} = \sum_{x_O \in \mathcal{X}_O} p_O(x_O) \, \mathrm{mi}_\pi(x_O).$$
      The multi-information plays a role in measures of complexity and information integration [35]. The generalisation from bipartitions to arbitrary partitions is applied to expectation values similar to the multi-information above in Tononi [38]. The relations of our localised measure (in the sense of [11]) to multi-information and information integration measures also motivate the name specific local integration. Relations to these measures will be studied further in the future. Here we note that these are not suited for measuring the identity of patterns since they are properties of the random variables $X_O$ and not of patterns $x_O$. We also show in Corollary 2 that if $x_O$ is an ι-entity then $X_O$ (the joint random variable) has a positive $\mathrm{MI}_\pi(X_O)$ for all partitions $\pi$ and is therefore a set of “integrated” random variables.
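    As a quick numerical illustration (a sketch with made-up values, not from the paper), for two perfectly correlated fair bits and the singleton partition the multi-information equals $\log 2$, and it is indeed the expectation of the specific local integration over patterns:
```python
from math import log

p = {(0, 0): 0.5, (1, 1): 0.5}   # two fully correlated fair bits
p1 = {0: 0.5, 1: 0.5}            # marginal of the first bit
p2 = {0: 0.5, 1: 0.5}            # marginal of the second bit

# MI = sum over patterns of p(x) * mi(x); here every occurring pattern
# has mi = log 2, so the expectation is log 2 as well.
MI = sum(q * log(q / (p1[x[0]] * p2[x[1]])) for x, q in p.items() if q > 0)
print(MI)                        # 0.6931... = log 2
```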

    3.3. Properties of Specific Local Integration

    This section investigates the specific local integration (SLI) (see Definition 5). After giving its expression for deterministic systems it proves upper bounds constructively and constructs an example of negative SLI.

    3.3.1. Deterministic Case

    Theorem 2 (Deterministic specific local integration).
    Given a deterministic Bayesian network (Definition A10), a uniform initial distribution over $\mathcal{X}_{V_0}$ ($V_0$ is the set of nodes without parents), and a pattern $x_O$ with $O \subseteq V$, the SLI of $x_O$ with respect to partition $\pi$ can be expressed more specifically. Let $N(x_O)$ refer to the number of trajectories in which $x_O$ occurs. Then
    $$\mathrm{mi}_\pi(x_O) = (|\pi| - 1) \log |\mathcal{X}_{V_0}| + \log N(x_O) - \sum_{b \in \pi} \log N(x_b).$$
    Proof. 
    See Appendix C.2. ☐
    The first term in Equation (30) is always positive if the partition and the set of random variables are not trivial (i.e., have cardinality larger than one) and is a constant for partitions of a given cardinality. The second term is always non-negative for patterns $x_O$ that actually occur in the system and rises with the number of trajectories that lead to them. The third term is always non-positive and becomes more and more negative the higher the number of trajectories in which the parts of the pattern occur.
    This shows that to maximise SLI for fixed partition cardinality we need to find patterns that have a high number of trajectories leading to them and a low number of occurrences for all their parts. Since the number of occurrences of the parts cannot be lower than the number of occurrences of the whole, we should get a maximum SLI for patterns whose parts occur only if the whole occurs. This turns out to be true also for non-deterministic systems as we prove in Theorem 4.
    Conversely, if we can increase the number of occurrences of the parts of the pattern without increasing the number of occurrences of the whole pattern we decrease the SLI. This leads to the intuition that, as often as possible, as many parts as possible (i.e., all but one) should co-occur without the whole pattern occurring. This consistently leads to negative SLI as we will show for the non-deterministic case in Theorem 5.
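    A minimal sketch of this counting formula (the system is our own toy example, not one from the paper): one uniform root $A \in \{0,1,2,3\}$ and a deterministic child $B = A \bmod 2$; the counting expression of Equation (30) agrees with the SLI computed directly from the probabilities.
```python
from math import log

roots = [0, 1, 2, 3]                 # X_{V_0}, uniform initial states
trajs = [(a, a % 2) for a in roots]  # all trajectories (x_A, x_B)

def N(match):
    """Number of trajectories consistent with a partial assignment
    mapping position (0 for A, 1 for B) to a required value."""
    return sum(all(t[i] == v for i, v in match.items()) for t in trajs)

x = (0, 0)                           # the pattern x_O at O = {A, B}
pi = [{0}, {1}]                      # partition into singletons

counting = (len(pi) - 1) * log(len(roots)) \
    + log(N({0: x[0], 1: x[1]})) \
    - sum(log(N({i: x[i] for i in b})) for b in pi)

direct = log((N({0: x[0], 1: x[1]}) / len(roots)) /
             ((N({0: x[0]}) / len(roots)) * (N({1: x[1]}) / len(roots))))

print(counting, direct)              # both log 2 = 0.6931...
```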

    3.3.2. Upper Bounds

    In this section we present the upper bounds of SLI. These are of general interest, but the constructive proof also provides an intuition for what kind of patterns have large SLI.
    We first show constructively that if we can choose the Bayesian network and the pattern then SLI can be arbitrarily large. This construction sets the probabilities of all blocks equal to the probability of the pattern, which implies that each of the parts of the pattern occurs only if the entire pattern occurs. The simplest example is one binary random variable determining another to always be in the same state; then the two patterns with both variables equal have this property. In the subsequent theorem we show that this property in general gives the upper bound of SLI if the cardinality of the partition is fixed. A simple extension of this example is used in the proof of the least upper bound. First we prove that there are Bayesian networks that achieve a particular SLI value. This will be used in the proofs that follow. For this we first define the anti-patterns, which are patterns that differ from a given pattern at every random variable that is specified.
    Definition 9 (Anti-pattern).
    Given a pattern $x_O$ define its set of anti-patterns $\neg(x_O)$ that have values different from those of $x_O$ on all variables in $O$:
    $$\neg(x_O) := \{\bar{x}_O \in \mathcal{X}_O : \forall i \in O, \bar{x}_i \neq x_i\}.$$
    Remark:
    • It is important to note that for an element of $\neg(x_O)$ to occur it is not sufficient that $x_O$ does not occur. Only if every random variable $X_i$ with $i \in O$ differs from the value $x_i$ specified by $x_O$ does an element of $\neg(x_O)$ necessarily occur. This is why we call $\neg(x_O)$ the set of anti-patterns of $x_O$.
    Theorem 3 (Construction of a pattern with maximum SLI).
    Given a probability $q \in (0, 1)$ and a natural number $n > 1$ there is a Bayesian network $\{X_i\}_{i \in V}$ with $|V| \geq n$ and a pattern $x_O$ such that for every partition $\pi$ of $O$ with $|\pi| = n$
    $$\mathrm{mi}_\pi(x_O) = -(n - 1) \log q.$$
    Proof. 
    We construct a Bayesian network which realises two conditions on the probability $p_O$. From these two conditions (which can also be realised by other Bayesian networks) we can then derive the theorem.
    Choose a Bayesian network $\{X_i\}_{i \in V}$ with binary random variables $\mathcal{X}_i = \{0, 1\}$ for all $i \in V$. Choose all nodes in $O$ dependent only on a node $j \in O$; the dependence of the nodes in $V \setminus O$ is arbitrary:
    • for all $i \in O \subseteq V$ let $\mathrm{pa}(i) \cap (V \setminus O) = \emptyset$, i.e., nodes in $O$ have no parents in the complement of $O$,
    • for a specific $j \in O$ and all other $i \in O \setminus \{j\}$ let $\mathrm{pa}(i) = \{j\}$, i.e., all nodes in $O$ apart from $j$ have $j \in O$ as a parent,
    • for all $i \in O \setminus \{j\}$ let $p_i(\bar{x}_i | \bar{x}_j) = \delta_{\bar{x}_j}(\bar{x}_i)$, i.e., the state of all nodes in $O$ is always the same as the state of node $j$,
    • also choose $p_j(x_j) = q$ and $\sum_{\bar{x}_j \neq x_j} p_j(\bar{x}_j) = 1 - q$.
    Then it is straightforward to see that:
    • $p_O(x_O) = q$,
    • $\sum_{\bar{x}_O \in \neg(x_O)} p_O(\bar{x}_O) = 1 - q$.
    Note that there are many Bayesian networks that realise the latter two conditions for some $x_O$. These latter two conditions are the only requirements for the following calculation.
    Next note that the two conditions imply that $p_O(\bar{x}_O) = 0$ if neither $\bar{x}_O = x_O$ nor $\bar{x}_O \in \neg(x_O)$. Then for every partition $\pi$ of $O$ with $|\pi| = n$ and $n > 1$ we have
    $$\mathrm{mi}_\pi(x_O) = \log \frac{p_O(x_O)}{\prod_{b \in \pi} p_b(x_b)} = \log \frac{p_O(x_O)}{\prod_{b \in \pi} \sum_{\bar{x}_{O \setminus b}} p_O(x_b, \bar{x}_{O \setminus b})} = \log \frac{p_O(x_O)}{\prod_{b \in \pi} \left( p_O(x_O) + \sum_{\bar{x}_{O \setminus b} \neq x_{O \setminus b}} p_O(x_b, \bar{x}_{O \setminus b}) \right)} = \log \frac{p_O(x_O)}{\prod_{b \in \pi} p_O(x_O)} = \log \frac{p_O(x_O)}{p_O(x_O)^n} = -(n - 1) \log q.$$
     ☐
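    The construction can be checked numerically. The sketch below (illustrative parameters $q = 0.3$, $n = 3$; not the authors' code) encodes only the resulting distribution $p_O$, with all mass on the pattern and its anti-pattern, and recovers $-(n-1)\log q$ for the singleton partition:
```python
from math import log

q = 0.3
p = {(1, 1, 1): q, (0, 0, 0): 1 - q}   # p_O: pattern and its anti-pattern

def marginal(x, b):
    """p_b(x_b) for the block b of positions."""
    return sum(prob for xs, prob in p.items()
               if all(xs[i] == x[i] for i in b))

x = (1, 1, 1)
pi = [{0}, {1}, {2}]                    # n = 3 singleton blocks
mi = log(p[x]) - sum(log(marginal(x, b)) for b in pi)
print(mi, -(len(pi) - 1) * log(q))      # both 2.4079... = -2 log 0.3
```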
    Theorem 4 (Upper bound of SLI).
    For any Bayesian network $\{X_i\}_{i \in V}$ and pattern $x_O$ with fixed $p_O(x_O) = q$:
    1. The tight upper bound of the SLI with respect to any partition $\pi$ with $|\pi| = n$ fixed is
    $$\max_{\{X_i\}_{i \in V} :\, \exists x_O,\, p_O(x_O) = q} \; \max_{\pi : |\pi| = n} \mathrm{mi}_\pi(x_O) = -(n - 1) \log q.$$
    2. The upper bound is achieved if and only if for all $b \in \pi$ we have
    $$p_b(x_b) = p_O(x_O) = q.$$
    3. The upper bound is achieved if and only if for all $b \in \pi$ we have that $x_b$ occurs if and only if $x_O$ occurs.
    Proof. 
    ad 1 
    By Definition 5 we have
    $$\mathrm{mi}_\pi(x_O) = \log \frac{p_O(x_O)}{\prod_{b \in \pi} p_b(x_b)}.$$
    Now note that for any $x_O$ and $b \subseteq O$
    $$p_b(x_b) = \sum_{\bar{x}_{O \setminus b}} p_O(x_b, \bar{x}_{O \setminus b}) = p_O(x_O) + \sum_{\bar{x}_{O \setminus b} \neq x_{O \setminus b}} p_O(x_b, \bar{x}_{O \setminus b}) \geq p_O(x_O).$$
    Plugging this into Equation (41) for every $p_b(x_b)$ we get
    $$\mathrm{mi}_\pi(x_O) = \log \frac{p_O(x_O)}{\prod_{b \in \pi} p_b(x_b)} \leq \log \frac{p_O(x_O)}{p_O(x_O)^{|\pi|}} = -(|\pi| - 1) \log p_O(x_O).$$
    This shows that $-(|\pi| - 1) \log p_O(x_O)$ is indeed an upper bound. To show that it is tight we have to show that for a given $p_O(x_O)$ and $|\pi|$ there are Bayesian networks with patterns $x_O$ such that this upper bound is achieved. The construction of such a Bayesian network and a pattern $x_O$ was presented in Theorem 3.
    ad 2 
    If for all $b \in \pi$ we have $p_b(x_b) = p_O(x_O)$ then clearly $\mathrm{mi}_\pi(x_O) = -(|\pi| - 1) \log p_O(x_O)$ and the least upper bound is achieved. If on the other hand $\mathrm{mi}_\pi(x_O) = -(|\pi| - 1) \log p_O(x_O)$ then
    $$\log \frac{p_O(x_O)}{\prod_{b \in \pi} p_b(x_b)} = -(|\pi| - 1) \log p_O(x_O) \;\Rightarrow\; \log \frac{p_O(x_O)}{\prod_{b \in \pi} p_b(x_b)} = \log \frac{p_O(x_O)}{p_O(x_O)^{|\pi|}} \;\Rightarrow\; \prod_{b \in \pi} p_b(x_b) = p_O(x_O)^{|\pi|},$$
    and because $p_b(x_b) \geq p_O(x_O)$ (Equation (44)) any deviation of any of the $p_b(x_b)$ from $p_O(x_O)$ leads to $\prod_{b \in \pi} p_b(x_b) > p_O(x_O)^{|\pi|}$, such that for all $b \in \pi$ we must have $p_b(x_b) = p_O(x_O)$.
    ad 3 
    By definition, for any $b \in \pi$ we have $b \subseteq O$ such that $x_b$ always occurs if $x_O$ occurs. Now assume $x_b$ occurs and $x_O$ does not occur. In that case there is a positive probability for a pattern $(x_b, \bar{x}_{O \setminus b})$ with $\bar{x}_{O \setminus b} \neq x_{O \setminus b}$, i.e., $p_O(x_b, \bar{x}_{O \setminus b}) > 0$. Recalling Equation (43) we then see that
    $$p_b(x_b) = p_O(x_O) + \sum_{\bar{x}_{O \setminus b} \neq x_{O \setminus b}} p_O(x_b, \bar{x}_{O \setminus b}) > p_O(x_O),$$
    which contradicts the fact that $p_b(x_b) = p_O(x_O)$, so $x_b$ cannot occur without $x_O$ occurring as well.
     ☐
    Remarks:
    • Note that this is the least upper bound for Bayesian networks in general. For a specific Bayesian network there might be no pattern that achieves this bound.
    • The least upper bound of SLI increases with the improbability of the pattern and the number of parts that it is split into. If $p_O(x_O) \to 0$ then we can have $\mathrm{mi}_\pi(x_O) \to \infty$.
    • Using this least upper bound it is easy to see the least upper bound for the SLI of a pattern $x_O$ across all partitions $\pi$. We just have to note that $|\pi| \leq |O|$.
    • Since CLI is the minimum value of SLI with respect to arbitrary partitions, the least upper bound of SLI is also an upper bound for CLI. It may not be the least upper bound however.

    3.3.3. Negative SLI

    This section shows that the SLI of a pattern $x_O$ with respect to a partition $\pi$ can be negative independently of the probability of $x_O$ (as long as it is not 1) and the cardinality of the partition (as long as that is not 1). The construction which achieves this also serves as an example of patterns with low SLI. We conjecture that this construction might provide the greatest lower bound but have not been able to prove this yet. An intuitive description of the construction is that patterns which either occur as a whole or occur missing exactly one part always have negative SLI.
    Theorem 5.
    For any given probability $q < 1$ and cardinality $|\pi| = n > 1$ of a partition $\pi$ there exists a Bayesian network $\{X_i\}_{i \in V}$ with a pattern $x_O$ such that $q = p_O(x_O)$ and
    $$\mathrm{mi}_\pi(x_O) = \log \frac{q}{\left(1 - \frac{1 - q}{n}\right)^n} < 0.$$
    Proof. 
    We construct the probability distribution $p_O : \mathcal{X}_O \to [0, 1]$ and ignore the behaviour of the Bayesian network $\{X_i\}_{i \in V}$ outside of $O \subseteq V$. In any case $\{X_i\}_{i \in O}$ is also by itself a Bayesian network. We define (see remarks below for some intuitions behind these definitions and Definition 9 for $\neg(x_A)$):
    1. for all $i \in O$ let $|\mathcal{X}_i| = n$,
    2. for every block $b \in \pi$ let $|b| = |O| / |\pi|$,
    3. for $\bar{x}_O \in \mathcal{X}_O$ let:
    $$p_O(\bar{x}_O) := \begin{cases} q & \text{if } \bar{x}_O = x_O, \\ \dfrac{1 - q - d}{\sum_{b \in \pi} |\neg(x_b)|} & \text{if } \exists c \in \pi \text{ s.t. } \bar{x}_{O \setminus c} = x_{O \setminus c} \text{ and } \bar{x}_c \in \neg(x_c), \\ \dfrac{d}{|\neg(x_O)|} & \text{if } \bar{x}_O \in \neg(x_O), \\ 0 & \text{else.} \end{cases}$$
    Here $d$ parameterises the probability of any pattern in $\neg(x_O)$ occurring. We will carry it through the calculation but then end up setting it to zero.
    Next we calculate the SLI. First note that, according to 1 and 2, we have $|\mathcal{X}_b| = |\mathcal{X}_c|$ for all $b, c \in \pi$ and therefore also $|\neg(x_b)| = |\neg(x_c)|$ for all $b, c \in \pi$. So let $m := |\neg(x_b)|$. Then note that, according to 3, for all $b \in \pi$
    $$\sum_{\bar{x}_{O \setminus b} \neq x_{O \setminus b}} p_O(x_b, \bar{x}_{O \setminus b}) = \sum_{c \in \pi \setminus b} \sum_{\bar{x}_c \in \neg(x_c)} p_O(x_b, x_{O \setminus (b \cup c)}, \bar{x}_c) = \sum_{c \in \pi \setminus b} \sum_{\bar{x}_c \in \neg(x_c)} \frac{1 - q - d}{m |\pi|} = \sum_{c \in \pi \setminus b} \frac{1 - q - d}{m |\pi|} |\neg(x_c)| = \frac{|\pi| - 1}{|\pi|} (1 - q - d).$$
    Plug this into the SLI definition:
    $$\mathrm{mi}_\pi(x_O) = \log \frac{p_O(x_O)}{\prod_{b \in \pi} p_b(x_b)} = \log \frac{q}{\prod_{b \in \pi} \left( q + \sum_{\bar{x}_{O \setminus b} \neq x_{O \setminus b}} p_O(x_b, \bar{x}_{O \setminus b}) \right)} = \log \frac{q}{\prod_{b \in \pi} \left( q + \frac{|\pi| - 1}{|\pi|} (1 - q - d) \right)} = \log \frac{q}{\left( q + \frac{|\pi| - 1}{|\pi|} (1 - q - d) \right)^{|\pi|}}.$$
    If we now set $d = 0$ we get:
    $$\mathrm{mi}_\pi(x_O) = \log \frac{q}{\left(1 - \frac{1 - q}{|\pi|}\right)^{|\pi|}}.$$
    Then we can use Bernoulli's inequality (the authors thank von Eitzen [39] for pointing this out; an example reference for Bernoulli's inequality is Bullen [40]) to prove that this is negative for $0 < q < 1$ and $|\pi| \geq 2$. Bernoulli's inequality is
    $$(1 + x)^n \geq 1 + n x$$
    for $x \geq -1$ and $n$ a natural number, with strict inequality for $x \neq 0$ and $n \geq 2$. Replacing $x$ by $-(1 - q) / |\pi|$ we see that
    $$\left(1 - \frac{1 - q}{|\pi|}\right)^{|\pi|} > 1 - (1 - q) = q,$$
    such that the argument of the logarithm is smaller than one, which gives us negative SLI. ☐
    Remarks:
    • The value achieved in Equation (53) is also our best candidate for a greatest lower bound of SLI for given $p_O(x_O)$ and $|\pi|$. However, we have not been able to prove this yet.
    • The construction equidistributes the probability $1 - q$ (left to be distributed after the probability $q$ of the whole pattern occurring is chosen) over the patterns $\bar{x}_O$ that are almost the same as the pattern $x_O$. These are almost the same in a precise sense: they differ in exactly one of the blocks of $\pi$, i.e., they differ by as little as can possibly be resolved/revealed by the partition $\pi$.
    • In order to achieve the negative SLI of Equation (64) the requirement is only that Equation (59) is satisfied. Our construction shows one way how this can be achieved.
    • For a pattern and partition such that $|O| / |\pi|$ is not a natural number, the same bound might still be achieved; however, a little extra effort has to go into construction step 3 of the proof such that Equation (59) still holds. This is not necessary for our purpose here as we only want to show the existence of patterns achieving the negative value.
    • Since CLI is the minimum value of SLI with respect to arbitrary partitions, the candidate for the greatest lower bound of SLI is also a candidate for the greatest lower bound of CLI.
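    The following sketch (our own check with illustrative values $q = 1/2$, $n = 2$; not the authors' code) builds the distribution of the construction for two binary variables with $d = 0$ and confirms that the SLI is negative and matches Equation (53):
```python
from math import log

# Mass q on the whole pattern (0,0); 1 - q spread over the patterns that
# differ from it in exactly one block; the anti-pattern (1,1) gets d = 0.
q = 0.5
p = {(0, 0): q, (1, 0): (1 - q) / 2, (0, 1): (1 - q) / 2, (1, 1): 0.0}

p1 = p[(0, 0)] + p[(0, 1)]           # p_1(0) = q + (1-q)/2 = 0.75
p2 = p[(0, 0)] + p[(1, 0)]           # p_2(0) = 0.75
mi = log(p[(0, 0)] / (p1 * p2))      # SLI w.r.t. the singleton partition

n = 2
bound = log(q / (1 - (1 - q) / n) ** n)
print(mi, bound)                     # both log(8/9) = -0.1177... < 0
```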

    3.4. Disintegration

    In this section we define the disintegration hierarchy and its refinement-free version. We then prove the disintegration theorem, which is the main formal result of this paper. It exposes a connection between partitions minimising the SLI of a trajectory and the CLI of the blocks of such partitions. More precisely, for a given trajectory the blocks of the finest partitions among those leading to a particular value of SLI consist only of completely locally integrated patterns. Conversely, each completely locally integrated pattern is a block in such a finest partition leading to a particular value of SLI. The theorem therefore reveals that ι-entities can not only be motivated heuristically as we tried to do in Section 3.2 but in fact play a special role within the trajectories they occur in. Furthermore, this theorem allows additional interpretations of the ι-entities which will be discussed in Section 3.5.
    The main tools we use for the proof, the disintegration hierarchy and especially its refinement-free version, are also interesting structures in their own right since they define a hierarchy among the partitions of trajectories that we did not anticipate. In the case of the refinement-free version the disintegration theorem tells us that this hierarchy among partitions of trajectories turns out to be a hierarchy of splits of the trajectory into ι-entities.
    Definition 10 (Disintegration hierarchy).
Given a Bayesian network $\{X_i\}_{i \in V}$ and a trajectory $x_V \in \mathcal{X}_V$, the disintegration hierarchy of $x_V$ is the set $\mathcal{D}(x_V) = \{\mathcal{D}_1, \mathcal{D}_2, \mathcal{D}_3, ...\}$ of sets of partitions of $x_V$ with:
1. 
\[
\mathcal{D}_1(x_V) := \operatorname*{arg\,min}_{\pi \in \mathcal{L}(V)} \mathrm{mi}_\pi(x_V),
\]
2. 
and for $i > 1$:
\[
\mathcal{D}_i(x_V) := \operatorname*{arg\,min}_{\pi \in \mathcal{L}(V) \setminus \mathcal{D}_{<i}(x_V)} \mathrm{mi}_\pi(x_V),
\]
where $\mathcal{D}_{<i}(x_V) := \bigcup_{j < i} \mathcal{D}_j(x_V)$. We call $\mathcal{D}_i(x_V)$ the $i$-th disintegration level.
    Remark:
    • Note that arg min returns all partitions that achieve the minimum SLI if there is more than one.
    • Since the Bayesian networks we use are finite, the partition latticeL(V) is finite, the set of attained SLI values is finite, and the number|D| of disintegration levels is finite.
• In most cases the Bayesian network contains some symmetries among its mechanisms which cause multiple partitions to attain the same SLI value.
• For each trajectory $x_V$ the disintegration hierarchy $\mathcal{D}$ then partitions the elements of $\mathcal{L}(V)$ into subsets $\mathcal{D}_i(x_V)$ of equal SLI. The levels of the hierarchy have increasing SLI. A brute-force computational sketch of this construction follows these remarks.
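As a computational illustration (our own sketch, only feasible for very small networks; all function names are hypothetical and the representation of the joint distribution as a dictionary is our choice), the disintegration hierarchy can be obtained by evaluating the SLI of every partition of $V$ and grouping partitions by value:
```python
# Minimal sketch (ours): brute-force construction of the disintegration
# hierarchy. `p` maps each trajectory (a tuple indexed by the nodes of V in a
# fixed order) to its probability; trajectories absent from `p` have
# probability zero. Log base 2 is an assumption of ours.
import math

def partitions(nodes):
    """Enumerate all set partitions of the list `nodes`."""
    if not nodes:
        yield []
        return
    first, rest = nodes[0], nodes[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def marginal(p, block, x_v):
    """p_b(x_b): total probability of trajectories agreeing with x_v on `block`."""
    return sum(q for traj, q in p.items()
               if all(traj[i] == x_v[i] for i in block))

def sli(p, pi, x_v):
    """mi_pi(x_V) = log2( p_V(x_V) / prod_{b in pi} p_b(x_b) )."""
    return math.log2(p[x_v] / math.prod(marginal(p, b, x_v) for b in pi))

def disintegration_hierarchy(p, x_v):
    """Levels D_1, D_2, ... of partitions grouped by SLI value, increasing SLI."""
    levels = {}
    for pi in partitions(list(range(len(x_v)))):
        levels.setdefault(round(sli(p, pi, x_v), 10), []).append(pi)
    return [levels[v] for v in sorted(levels)]
```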
    Definition 11.
Let $\mathcal{L}(V)$ be the lattice of partitions of set $V$ and let $\mathcal{E}$ be a subset of $\mathcal{L}(V)$. Then for every element $\pi \in \mathcal{L}(V)$ we can define the set
\[
\mathcal{E}_{\vartriangleleft \pi} := \{\xi \in \mathcal{E} : \xi \vartriangleleft \pi\}.
\]
That is, $\mathcal{E}_{\vartriangleleft \pi}$ is the set of partitions in $\mathcal{E}$ that are proper refinements of $\pi$ (i.e., refinements $\xi$ with $\xi \neq \pi$).
    Definition 12 (Refinement-free disintegration hierarchy).
Given a Bayesian network $\{X_i\}_{i \in V}$, a trajectory $x_V \in \mathcal{X}_V$, and its disintegration hierarchy $\mathcal{D}(x_V)$, the refinement-free disintegration hierarchy of $x_V$ is the set $\mathcal{D}^*(x_V) = \{\mathcal{D}^*_1, \mathcal{D}^*_2, \mathcal{D}^*_3, ...\}$ of sets of partitions of $x_V$ with:
1. 
\[
\mathcal{D}^*_1(x_V) := \{\pi \in \mathcal{D}_1(x_V) : \mathcal{D}_1(x_V)_{\vartriangleleft \pi} = \emptyset\},
\]
2. 
and for $i > 1$:
\[
\mathcal{D}^*_i(x_V) := \{\pi \in \mathcal{D}_i(x_V) : \mathcal{D}_{\leq i}(x_V)_{\vartriangleleft \pi} = \emptyset\},
\]
where $\mathcal{D}_{\leq i}(x_V) := \bigcup_{j \leq i} \mathcal{D}_j(x_V)$.
    Remark:
• Each level $\mathcal{D}^*_i(x_V)$ in the refinement-free disintegration hierarchy $\mathcal{D}^*(x_V)$ consists only of those partitions that have refinements neither at their own level nor at any of the preceding levels. So each partition that occurs in the refinement-free disintegration hierarchy at the $i$-th level is a finest partition that achieves such a low value of SLI, or equivalently such a high level of disintegration.
• As we will see below, the blocks of the partitions in the refinement-free disintegration hierarchy are the main reason for defining it. A computational sketch of the filtering step follows these remarks.
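Continuing the sketch above (again our own illustration, not the paper’s formal apparatus), the refinement-free hierarchy is obtained by discarding every partition that has a proper refinement at its own or any preceding level:
```python
# Minimal sketch (ours): filter each disintegration level down to its
# refinement-free version D*_i.
def is_proper_refinement(xi, pi):
    """True iff xi != pi and every block of xi lies inside some block of pi."""
    norm = lambda part: sorted(tuple(sorted(b)) for b in part)
    return norm(xi) != norm(pi) and all(
        any(set(b) <= set(c) for c in pi) for b in xi)

def refinement_free(levels):
    """`levels`: output of disintegration_hierarchy (ordered by increasing SLI)."""
    result, seen = [], []
    for level in levels:
        seen = seen + level                    # the partition's own level counts too
        result.append([pi for pi in level
                       if not any(is_proper_refinement(xi, pi) for xi in seen)])
    return result
```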
    Theorem 6 (Disintegration theorem).
Let $\{X_i\}_{i \in V}$ be a Bayesian network, $x_V \in \mathcal{X}_V$ one of its trajectories, and $\mathcal{D}^*(x_V)$ the associated refinement-free disintegration hierarchy.
1. 
Then for every $\mathcal{D}^*_i(x_V) \in \mathcal{D}^*(x_V)$ we find for every $b \in \pi$ with $\pi \in \mathcal{D}^*_i(x_V)$ that there are only the following possibilities:
(a) 
$b$ is a singleton, i.e., $b = \{i\}$ for some $i \in V$, or
(b) 
$x_b$ is completely locally integrated, i.e., $\iota(x_b) > 0$.
2. 
Conversely, for any completely locally integrated pattern $x_A$ there is a partition $\pi_A \in \mathcal{L}(V)$ and a level $\mathcal{D}^*_{i_A}(x_V) \in \mathcal{D}^*(x_V)$ such that $A \in \pi_A$ and $\pi_A \in \mathcal{D}^*_{i_A}(x_V)$.
    Proof. 
    ad 1 
We prove the theorem by contradiction. For this assume that there is a block $b$ in a partition $\pi \in \mathcal{D}^*_i(x_V)$ which is neither a singleton nor completely locally integrated. Let $\pi \in \mathcal{D}^*_i(x_V)$ and $b \in \pi$. Assume $b$ is not a singleton, i.e., there exist $i \neq j \in V$ such that $i \in b$ and $j \in b$. Also assume that $x_b$ is not completely locally integrated, i.e., there exists a partition $\xi$ of $b$ with $\xi \neq \mathbf{1}_b$ such that $\mathrm{mi}_\xi(x_b) \leq 0$. Note that a singleton cannot be completely locally integrated as it does not allow for a non-unit partition. So together the two assumptions imply $p_b(x_b) \leq \prod_{d \in \xi} p_d(x_d)$ with $|\xi| > 1$. However, then
\[
\mathrm{mi}_\pi(x_V) = \log \frac{p_V(x_V)}{p_b(x_b) \prod_{c \in \pi \setminus \{b\}} p_c(x_c)}
\]
\[
\geq \log \frac{p_V(x_V)}{\prod_{d \in \xi} p_d(x_d) \prod_{c \in \pi \setminus \{b\}} p_c(x_c)}.
\]
We treat the cases of “>” and “=” separately. First, let
\[
\mathrm{mi}_\pi(x_V) = \log \frac{p_V(x_V)}{\prod_{d \in \xi} p_d(x_d) \prod_{c \in \pi \setminus \{b\}} p_c(x_c)}.
\]
Then we can define $\rho := (\pi \setminus \{b\}) \cup \xi$ such that
• $\mathrm{mi}_\rho(x_V) = \mathrm{mi}_\pi(x_V)$, which implies that $\rho \in \mathcal{D}_i(x_V)$ because $\pi \in \mathcal{D}_i(x_V)$, and
• $\rho \vartriangleleft \pi$, which contradicts $\pi \in \mathcal{D}^*_i(x_V)$.
Second, let
\[
\mathrm{mi}_\pi(x_V) > \log \frac{p_V(x_V)}{\prod_{d \in \xi} p_d(x_d) \prod_{c \in \pi \setminus \{b\}} p_c(x_c)}.
\]
Then we can define $\rho := (\pi \setminus \{b\}) \cup \xi$ such that
\[
\mathrm{mi}_\rho(x_V) < \mathrm{mi}_\pi(x_V),
\]
which contradicts $\pi \in \mathcal{D}^*_i(x_V)$.
    ad 2 
By assumption $x_A$ is completely locally integrated. Then let $\pi_A := \{A\} \cup \{\{j\}\}_{j \in V \setminus A}$. Since $\pi_A$ is a partition of $V$ it is an element of some disintegration level $\mathcal{D}_{i_A}$. The partition $\pi_A$ is then also an element of the refinement-free disintegration level $\mathcal{D}^*_{i_A}(x_V)$, as we will see in the following. This is because any refinement must (by construction of $\pi_A$) break up $A$ into further blocks, which means that the specific local integration of all such refinements is higher. They must then be at a lower disintegration level $\mathcal{D}_k(x_V)$ with $k > i_A$. Therefore, $\pi_A$ has no refinement at its own or a higher disintegration level. More formally, let $\xi \in \mathcal{L}(V)$ with $\xi \vartriangleleft \pi_A$ and $\xi \neq \pi_A$. Since $\pi_A$ only contains singletons apart from $A$, the partition $\xi$ must split the block $A$ into multiple blocks $c \in \xi|_A$. Since $\iota(x_A) > 0$ we know that
\[
\mathrm{mi}_{\xi|_A}(x_A) = \log \frac{p_A(x_A)}{\prod_{c \in \xi|_A} p_c(x_c)} > 0,
\]
so that $\prod_{c \in \xi|_A} p_c(x_c) < p_A(x_A)$ and
\[
\mathrm{mi}_\xi(x_V) = \log \frac{p_V(x_V)}{\prod_{c \in \xi|_A} p_c(x_c) \prod_{i \in V \setminus A} p_i(x_i)}
\]
\[
> \log \frac{p_V(x_V)}{p_A(x_A) \prod_{i \in V \setminus A} p_i(x_i)}
\]
\[
= \mathrm{mi}_{\pi_A}(x_V).
\]
Therefore $\xi$ is on a disintegration level $\mathcal{D}_k(x_V)$ with $k > i_A$. Since this is true for any refinement of $\pi_A$, we get $\mathcal{D}_{\leq i_A}(x_V)_{\vartriangleleft \pi_A} = \emptyset$ and $\pi_A \in \mathcal{D}^*_{i_A}(x_V)$.
     ☐
We mentioned in Section 3.2 that the expectation value of the SLI $\mathrm{mi}_\pi(x_A)$ is the (specific) multi-information $MI_\pi(X_A)$. A positive SLI value of $x_A$ implies a positive expectation value $MI_\pi(X_A)$. Therefore every ι-entity $x_A$ implies positive specific multi-information $MI_\pi(X_A)$ with respect to every partition $\pi$. We put this into the following corollary.
    Corollary 2.
Under the conditions of Theorem 6 and for every $\mathcal{D}^*_i(x_V) \in \mathcal{D}^*(x_V)$ we find for every $b \in \pi$ with $\pi \in \mathcal{D}^*_i(x_V)$ that there are only the following possibilities:
1. 
$b$ is a singleton, i.e., $b = \{i\}$ for some $i \in V$, or
2. 
$X_b$ is completely (not only locally) integrated, i.e., $I(X_b) > 0$, where
\[
I(X_A) := \min_{\pi \in \mathcal{L}(A) \setminus \{\mathbf{1}_A\}} MI_\pi(X_A).
\]
    Proof. 
Since $MI_\pi(X_A)$ is a Kullback–Leibler divergence we know from Gibbs’ inequality that $MI_\pi(X_A) \geq 0$, with $MI_\pi(X_A) = 0$ if and only if for all $x_A \in \mathcal{X}_A$ we have $p_A(x_A) = \prod_{b \in \pi} p_b(x_b)$. To see that $MI_\pi(X_A)$ is a Kullback–Leibler divergence note:
\[
MI_\pi(X_A) := \sum_{x_A \in \mathcal{X}_A} p_A(x_A) \, \mathrm{mi}_\pi(x_A)
\]
\[
= \sum_{x_A \in \mathcal{X}_A} p_A(x_A) \log \frac{p_A(x_A)}{\prod_{b \in \pi} p_b(x_b)}
\]
\[
= KL\left[ p_A \,\middle\|\, \prod_{b \in \pi} p_b \right].
\]
Now let a specific $x_A \in \mathcal{X}_A$ be an ι-entity. Then for all $\pi \in \mathcal{L}(A) \setminus \{\mathbf{1}_A\}$ we have
\[
\log \frac{p_A(x_A)}{\prod_{b \in \pi} p_b(x_b)} > 0,
\]
which implies that
\[
p_A(x_A) \neq \prod_{b \in \pi} p_b(x_b)
\]
and therefore
\[
KL\left[ p_A \,\middle\|\, \prod_{b \in \pi} p_b \right] > 0,
\]
which implies $I(X_A) > 0$. ☐

    3.5. Disintegration Interpretation

In Section 3.2 we motivated our choice of positive complete local integration as a criterion for entities. This motivation is purely heuristic and starts from the intuition that an entity is a structure for which every part makes every other part more probable. While this heuristic argument seems sufficiently intuitive to be of a certain value, we would much rather have a formal reason why an entity criterion is a “good” entity criterion. In other words, we would ideally have a formal problem that is best solved by the entities satisfying the criterion. An example of a measure that has such an associated interpretation is the mutual information, whose maximum over the input distributions is the channel capacity. Without a formal problem associated to ι-entities there remains a risk that they (and maybe the whole concept of entities and identity over time) are artefacts of an ill-conceived conceptual approach.
Currently, we are not aware of an analogous formal problem that is solved by ι-entities. However, the different viewpoint provided by the disintegration theorem may be a first step towards finding such a problem. We will now discuss some alternative interpretations of SLI and see how CLI can be seen from a different perspective due to the disintegration theorem. These interpretations also exhibit why we chose to include the logarithm in the definition of SLI.
Using the disintegration theorem (Theorem 6) allows us to take another point of view on ι-entities. The theorem states that for each trajectory $x_V \in \mathcal{X}_V$ of a multivariate Markov chain the refinement-free disintegration hierarchy only contains partitions whose blocks are completely locally integrated patterns, i.e., they only contain ι-entities. At the same time the blocks of all those partitions together are all ι-entities that occur in that trajectory.
A partition in the refinement-free disintegration hierarchy is always a minimal/finest partition reaching such a low specific local integration.
Each ι-entity is then a block $x_c$ with $c \in \pi$ of a partition $\pi \in \mathcal{D}^*(x_V)$ for some trajectory $x_V \in \mathcal{X}_V$ of the multivariate Markov chain.
Let us recall the interpretation from coding theory above. If we want to find the optimal encoding for the entire multivariate Markov chain $\{X_i\}_{i \in V}$ this means finding the optimal encoding for the random variable $X_V$ whose values are the trajectories $x_V \in \mathcal{X}_V$. The optimal code has the codeword length $-\log p_V(x_V)$ for each trajectory $x_V$. The partitions in the lowest level $\mathcal{D}^*_1(x_V)$ of the refinement-free disintegration hierarchy for $x_V$ have minimal specific local integration, i.e.,
\[
\mathrm{mi}_\pi(x_V) = \log \frac{p_V(x_V)}{\prod_{c \in \pi} p_c(x_c)}
\]
is minimal among all partitions. At the same time these partitions are the finest partitions that achieve this low specific local integration. This implies on the one hand that the codeword lengths of the product codes associated to these partitions are the shortest possible for $x_V$ among all partitions. On the other hand these partitions split up the trajectory into as many parts as possible while generating these shortest codewords. In this combined sense the partitions in $\mathcal{D}^*_1(x_V)$ generate the “best” product codes for the particular trajectory $x_V$.
Note that the expected codeword length of the product code,
\[
\sum_{x_V \in \mathcal{X}_V} p_V(x_V) \left( -\log \prod_{c \in \pi} p_c(x_c) \right),
\]
which is the more important measure for encoding in general, might not be short at all, i.e., it might not be an efficient code for arbitrary trajectories. The product codes based on partitions in $\mathcal{D}^*_1(x_V)$ are specifically adapted to assign a short codeword to $x_V$, i.e., to a single trajectory or story of this system. As product codes they are constructed/forced to describe $x_V$ as a composition of stochastically independent parts. More precisely, they are constructed in the way that would be optimal for stochastically independent parts. The sketch below makes the two codeword lengths explicit.
Nonetheless, the product codes exist (they can be generated using Huffman coding or arithmetic coding [37] based on the product probability) and are uniquely decodable. Their parts/blocks are the ι-entities. We mentioned before that we would like to find a problem that is solved by ι-entities. This is then equivalent to finding a problem that is solved by the according product codes. Can we construct such a problem? This question is still open. A possible direction for finding such a problem may be the following line of reasoning. Say for some reason the trajectory $x_V$ is more important than any other, and we want to “tell its story” as a story of as many as possible (stochastically) independent parts (that are maybe not really stochastically independent), i.e., say we wanted to encode the trajectory as if it were a combination of as many as possible stochastically independent parts/events. And because $x_V$ is more important than all other trajectories we would want the codeword for $x_V$ to be the shortest possible. Then we would use the product codes of partitions in the refinement-free disintegration hierarchy because those combine exactly these two conditions. The pseudo-stochastically-independent parts would then be the blocks of these partitions, which according to the disintegration theorem are exactly the ι-entities occurring in $x_V$.
    Speculating about where the two conditions may arise in an actual problem, we mention that the trajectory/history that we (real living humans) live in is more important to us than all other possible trajectories of our universe (if there are any). What happens/happened in this trajectory needs to be communicated more often than what happens/happened in counterfactual trajectories. Furthermore, a good reason to think of a system as composite of as many parts as possible is that this reduces the number of parameters that need to be learned which in turn improves the learning speed (see e.g., [41]). So the entities that mankind has partitioned its history into might be related to theι-entities of the universe’s history. These would compose the shortest product codes for what actually happened. The disintegration level might be chosen to optimise rates of model learning.
    Recall that this kind of product code is not the optimal code in general (which would be the one with shortest expected codeword length). It is possibly more of a naive code that does not require deep understanding of the dynamical system but instead can be learned fast and works. The language of physics for example might be more optimal in the sense of shortest expected codeword lengths reflecting a desire to communicate efficiently about all counterfactual possibilities as well.

    3.6. Related Approaches

We now discuss in some more detail than in Section 1.3 the approaches of Beer [14] and Balduzzi [26].
In Beer [14] the construction of the entities proceeds roughly as follows. First the maps from the Moore neighbourhood to the next state of a cell are classified into five classes of local processes. Then these are used to reveal the dynamical structure in the transitions from one time-slice (or temporal part) of a pattern to the next. The example patterns used are the famous block, blinker, and glider, and they are considered including their temporal extension. Using both the processes and the spatial patterns/values/components (the black and white values of cells are called components), networks characterising the organisation of the spatiotemporally extended patterns are constructed. These can then be investigated for their organisational closure. Organisational closure occurs if the same process-component relations reoccur at a later time. Boundaries of the spatiotemporal patterns are identified by determining the cells around the pattern that have to be fixed to get re-occurrence of the organisation.
Beer [14] mentions that the current version of this method of identifying entities has its limitations. If the closure is perturbed or delayed and then recovered, the entity still loses its identity according to this definition. Two possible alternatives are also suggested. The first is to define the potential for closure as enough for the ascription of identity. This is questioned as well since a sequence of perturbations can take the entity further and further away from its “defining” organisation and make it hard to still speak of a defining organisation at all. The second alternative is to define that the persistence of any organisational closure indicates identity. It is suggested that this would allow blinkers to transform to gliders.
We note that the entity criterion we propose does not require similar choices to be made since it is not based on the re-occurrence of any organisation. Later time-slices of ι-entities need no organisational (or any other) similarity to earlier ones. Another, possibly only small, advantage is that our criterion is formalised and reasonably simple to state. Whether this is possible for the organisational closure based entities remains to be seen.
    This is related to the philosophical discussion about identity across possible worlds [33].
Some further parallels can be drawn between the present work and Balduzzi [26], especially if we take into account the disintegration theorem. Given a trajectory (entire time-evolution) of the system, in both cases a partition is sought which fulfills a particular trajectory-wide optimality criterion. Also in both cases, each block of the trajectory-wide partition fulfills a condition with respect to its own partitions. For our conditions the disintegration theorem exposes the direct connection between the trajectory-wide and the block-specific conditions. Such a connection is not known for other approaches. The main reason for this might be the simpler formal expression of CLI and SLI compared to the IIT approaches.
To what extent our approach and the IIT approaches lead to coinciding or contradicting results is beyond the scope of this paper and constitutes future work. One avenue to pursue here concerns differences with respect to entities occurring in multiple trajectories as well as the possibility of overlapping entities within single trajectories.

    4. Examples

    In this section we investigate the structure of integrated and completely locally integrated spatiotemporal patterns as it is revealed by the disintegration hierarchy. First we take a quick look at the trivial case of a set of independent random variables. Then we look at two very simple multivariate Markov chains. We use the disintegration theorem (Theorem 6) to extract the completely locally integrated spatiotemporal patterns.

    4.1. Set of Independent Random Variables

Let us first look at a set $\{X_i\}_{i \in V}$ of independently and identically distributed random variables. For each trajectory $x_V \in \mathcal{X}_V$ we can then calculate the SLI with respect to a partition $\pi \in \mathcal{L}(V)$. For every $A \subseteq V$ and every $x_A \in \mathcal{X}_A$ we have $p_A(x_A) = \prod_{i \in A} p_i(x_i)$. Then we find for every $\pi \in \mathcal{L}(V)$:
\[
\mathrm{mi}_\pi(x_V) = 0.
\]
This shows that the disintegration hierarchy for each $x_V \in \mathcal{X}_V$ contains only a single disintegration level $\mathcal{D}(x_V) = \{\mathcal{D}_1\}$ with $\mathcal{D}_1 = \mathcal{L}(V)$. The finest partition of $\mathcal{L}(V)$ is its zero element $\mathbf{0}$, which then constitutes the only element of the refinement-free disintegration level $\mathcal{D}^*_1 = \{\mathbf{0}\}$. Recall that the zero element of a partition lattice consists only of singleton sets as blocks. The set of completely locally integrated patterns, i.e., the set of ι-entities in a given trajectory $x_V$, is then the set $\{x_i : i \in V\}$. A quick computational check of the vanishing SLI is sketched below.
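A quick check (our own illustration, reusing `partitions` and `sli` from the sketches in Section 3.4) confirms that every partition yields zero SLI in this case:
```python
# Minimal sketch (ours): three i.i.d. fair binary variables; the SLI vanishes
# for every trajectory and every partition.
import itertools

n = 3
p = {traj: 0.5 ** n for traj in itertools.product((0, 1), repeat=n)}

for x_v in p:
    for pi in partitions(list(range(n))):
        assert abs(sli(p, pi, x_v)) < 1e-12
```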
    Next we will look at more structured systems.

    4.2. Two Constant and Independent Binary Random Variables:MC=

    4.2.1. Definition

Define the time- and space-homogeneous multivariate Markov chain $MC_=$ with Bayesian network $\{X_{j,t}\}_{j \in \{1,2\}, t \in \{0,1,2\}}$ and
\[
\mathrm{pa}(j,t) = \begin{cases} \emptyset & \text{if } t = 0, \\ \{(j, t-1)\} & \text{else,} \end{cases}
\]
\[
p_{j,t}(x_{j,t} | x_{j,t-1}) = \delta_{x_{j,t-1}}(x_{j,t}) = \begin{cases} 1 & \text{if } x_{j,t} = x_{j,t-1}, \\ 0 & \text{else,} \end{cases}
\]
\[
p_{V_0}(x_{V_0}) = 1/4.
\]
The Bayesian network can be seen in Figure 4.

    4.2.2. Trajectories

In order to get the disintegration hierarchy $\mathcal{D}(x_V)$ we have to choose a trajectory $x_V$ and calculate the SLI of each partition $\pi \in \mathcal{L}(V)$. There are only four different trajectories possible in $MC_=$:
\[
x_V = (x_{1,0}, x_{2,0}, x_{1,1}, x_{2,1}, x_{1,2}, x_{2,2}) = \begin{cases} (0,0,0,0,0,0) & \text{if } x_{1,0} = 0, x_{2,0} = 0; \\ (0,1,0,1,0,1) & \text{if } x_{1,0} = 0, x_{2,0} = 1; \\ (1,0,1,0,1,0) & \text{if } x_{1,0} = 1, x_{2,0} = 0; \\ (1,1,1,1,1,1) & \text{if } x_{1,0} = 1, x_{2,0} = 1. \end{cases}
\]
Each of these trajectories has probability $p_V(x_V) = 1/4$ and all other trajectories have $p_V(x_V) = 0$. We call these four trajectories the possible trajectories. We visualise the possible trajectories as a grid with each cell corresponding to one variable. The spatial indices are constant across rows and the time-slices $V_t$ correspond to the columns. A white cell indicates a 0 and a black cell indicates a 1. This results in the grids of Figure 5.

    4.2.3. Partitions of Trajectories

The disintegration hierarchy is composed of all partitions in the lattice of partitions $\mathcal{L}(V)$. Recall that we are partitioning the entire spatially and temporally extended index set $V$ of the Bayesian network and not only the time-slices. Blocks in the partitions of $\mathcal{L}(V)$ are then, in general, spatiotemporally and not only spatially extended patterns.
The number of partitions $|\mathcal{L}(V)|$ of a set of $|V| = 6$ elements is $B_6 = 203$ ($B_n$ is the Bell number of $n$). These partitions $\pi$ can be classified according to their cardinality $|\pi|$ (the number of blocks in the partition). The number of partitions of a set of cardinality $|V|$ into $|\pi|$ blocks is the Stirling number of the second kind $S(|V|, |\pi|)$. For $|V| = 6$ we find the Stirling numbers $S(6,1) = 1$, $S(6,2) = 31$, $S(6,3) = 90$, $S(6,4) = 65$, $S(6,5) = 15$, and $S(6,6) = 1$.
It is important to note that the partition lattice $\mathcal{L}(V)$ is the same for all trajectories as it is composed of partitions of $V$. On the other hand, the values of the SLI $\mathrm{mi}_\pi(x_V)$ with respect to the partitions in $\mathcal{L}(V)$ generally depend on the trajectory $x_V$. The enumeration sketched below verifies these counts directly.
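These counts are easy to verify by enumeration; the following check (our own illustration, reusing the `partitions` generator from Section 3.4) recovers both $B_6$ and the Stirling numbers:
```python
# Minimal sketch (ours): verify B_6 = 203 and the Stirling numbers S(6, k).
from collections import Counter

parts = list(partitions(list(range(6))))
print(len(parts))                                    # 203 = B_6
print(sorted(Counter(len(pi) for pi in parts).items()))
# [(1, 1), (2, 31), (3, 90), (4, 65), (5, 15), (6, 1)]
```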

    4.2.4. SLI Values of the Partitions

We can calculate the SLI $\mathrm{mi}_\pi(x_V)$ of every trajectory $x_V$ with respect to each partition $\pi \in \mathcal{L}(V)$ according to Definition 5:
\[
\mathrm{mi}_\pi(x_V) := \log \frac{p_V(x_V)}{\prod_{b \in \pi} p_b(x_b)}.
\]
In the case of $MC_=$ the SLI values with respect to each partition do not depend on the trajectory. For an overview we plotted the values of SLI with respect to each partition $\pi \in \mathcal{L}(V)$ for any trajectory of $MC_=$ in Figure 6.
We can see in Figure 6 that the cardinality does not determine the value of SLI. At the same time there seems to be a trend to higher values of SLI with increasing cardinality of the partition. We can also observe that only five different values of SLI are attained by partitions on this trajectory. We will collect these classes of partitions with equal SLI values in the disintegration hierarchy next; the sketch below reproduces the computation.
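Using the sketches from Section 3.4, the joint distribution of $MC_=$ is small enough to reproduce this computation directly (our own illustration; the node order and rounding are our choices):
```python
# Minimal sketch (ours): SLI values and disintegration levels of MC=.
# Nodes ordered as (x_{1,0}, x_{2,0}, x_{1,1}, x_{2,1}, x_{1,2}, x_{2,2}).
p_mc = {(a, b, a, b, a, b): 0.25 for a in (0, 1) for b in (0, 1)}

x_v = (0, 0, 0, 0, 0, 0)                  # the first possible trajectory
levels = disintegration_hierarchy(p_mc, x_v)
print([round(sli(p_mc, lvl[0], x_v)) for lvl in levels])  # SLI per level: 0..4
print([len(lvl) for lvl in levels])       # partitions per disintegration level
```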

    4.2.5. Disintegration Hierarchy

In order to get insight into the internal structure of the partitions of a trajectory $x_V$ we obtain the disintegration hierarchy $\mathcal{D}(x_V)$ (see Definition 10) and look at the Hasse diagrams of each of the disintegration levels $\mathcal{D}_i(x_V)$ partially ordered by refinement. If we sort the partitions of any trajectory of $MC_=$ according to increasing SLI value we obtain Figure 7. There we see groups of partitions attaining the SLI values $\{0, 1, 2, 3, 4\}$; precisely these groups are the disintegration levels $\{\mathcal{D}_1(x_V), \mathcal{D}_2(x_V), \mathcal{D}_3(x_V), \mathcal{D}_4(x_V), \mathcal{D}_5(x_V)\}$. The exact numbers of partitions in these levels are $|\mathcal{D}_1| = 2$, $|\mathcal{D}_2| = 18$, $|\mathcal{D}_3| = 71$, $|\mathcal{D}_4| = 78$, and $|\mathcal{D}_5| = 34$, which sum to $B_6 = 203$.
Next we look at the Hasse diagram of each of those disintegration levels. Since the disintegration levels are subsets of the partition lattice $\mathcal{L}(V)$, they are in general not lattices by themselves. The Hasse diagrams (see Appendix B for the definition) visualise the set of partitions in each disintegration level partially ordered by refinement. The Hasse diagrams are shown in Figure 8. We see immediately that within each disintegration level, apart from the first and the last, the Hasse diagrams contain multiple connected components.
Furthermore, within a disintegration level the connected components often have the same Hasse diagrams. For example, in $\mathcal{D}_2$ (Figure 8b) we find six connected components with three partitions each. The identical refinement structure of the connected components is related to the symmetries of the probability distribution over the trajectories. As this requires further notational overhead and is straightforward, we do not describe these symmetry properties formally. In order to see the symmetries, however, we visualise the partitions themselves in the Hasse diagrams in Figure 9. We also visualise examples of the different connected components in each disintegration level in Figure 10.
Recall that due to the disintegration theorem (Theorem 6) we are especially interested in partitions that do not have refinements at their own or any preceding (i.e., lower indexed) disintegration level. These partitions consist of blocks that are completely integrated, i.e., every non-unit partition of such a block results in a positive SLI value, or the block is a single node of the Bayesian network. The refinement-free disintegration hierarchy $\mathcal{D}^*(x_V)$ contains only these partitions and is shown in a Hasse diagram in Figure 11.

    4.2.6. Completely Integrated Patterns

Having looked at the disintegration hierarchy we now make use of it by extracting the completely (when it is clear from the context that we are talking about complete local integration we drop “local” for the sake of readability) integrated patterns (ι-entities) of the four trajectories of $MC_=$. Recall that due to the disintegration theorem (Theorem 6) we know that all blocks in partitions that occur in the refinement-free disintegration hierarchy are either singletons or correspond to ι-entities. If we look at the refinement-free disintegration hierarchy in Figure 11 we see that many blocks occur in multiple partitions and across disintegration levels. We also see that there are multiple blocks that are singletons. If we ignore singletons, which are trivially integrated as they cannot be partitioned, we end up with eight different blocks. Since the disintegration hierarchy is the same for all four possible trajectories these blocks are also the same for each of them (note that this is the case for $MC_=$ but not in general, as we will see in Section 4.3). However, the patterns that result are different due to the different values within the blocks. We show the eight ι-entities and their complete local integration (Definition 6) on the first trajectory in Figure 12 and on the second trajectory in Figure 13.
Since the disintegration hierarchies are the same for the four possible trajectories of $MC_=$ we get the same refinement-free partitions and therefore the same blocks containing the ι-entities. This is apparent when comparing Figure 12 and Figure 13 and noting that each pattern occurring on the first trajectory has a corresponding pattern on the second trajectory that differs (if at all) only in the values of the cells it fixes and not in which cells it fixes. More visually speaking, for each pattern in Figure 12 there is a corresponding pattern in Figure 13 leaving the same cells grey.
If we are not interested in a particular trajectory, we can also look at all different ι-entities on any trajectory. For $MC_=$ these are shown in Figure 14.
We see that all ι-entities $x_O$ have the same value of complete local integration $\iota(x_O) = 1$. This can be explained using the deterministic expression for the SLI of Equation (30) and noting that for $MC_=$, if any of the values $x_{j,t}$ is fixed by a pattern, then all values $(x_{j,s})_{s \in T} = x_{j,T}$ of that row are determined since they must all be the same value. This means that the number of trajectories $N(x_{j,S})$ in which any pattern $x_{j,S}$ with $S \subseteq T$ occurs is either $N(x_{j,S}) = 0$, if the pattern is impossible, or $N(x_{j,S}) = 2$, since there are two trajectories compatible with it. Note that all blocks $x_b$ in any of the ι-entities and all ι-entities $x_O$ themselves are of the form $x_{j,S}$ with $S \subseteq T$. Let $N(x_{j,S}) =: N$ and plug this into Equation (30) for an arbitrary partition $\pi$:
\[
\mathrm{mi}_\pi(x_O) = (|\pi| - 1) \log |\mathcal{X}_{V_0}| - \log \frac{\prod_{b \in \pi} N(x_b)}{N(x_O)}
\]
\[
= (|\pi| - 1) \log |\mathcal{X}_{V_0}| - \log \frac{N^{|\pi|}}{N}
\]
\[
= (|\pi| - 1) \log \frac{|\mathcal{X}_{V_0}|}{N}.
\]
To get the complete local integration value we have to minimise this with respect to $\pi$ where $|\pi| \geq 2$. So for $|\mathcal{X}_{V_0}| = 4$ and $N = 2$ we get $\iota(x_O) = 1$.
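As a one-line check of this arithmetic (our own illustration; logarithms base 2):
```python
# Minimal sketch (ours): mi_pi(x_O) = (|pi| - 1) * log2(|X_V0| / N) with
# |X_V0| = 4 and N = 2 gives |pi| - 1 bits; minimising over |pi| >= 2 yields 1.
import math
iota = min((n - 1) * math.log2(4 / 2) for n in range(2, 7))
assert iota == 1.0
```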
    Another observation is that theι-entities are all limited to one of the two rows. This shows on a simple example that, as we would expect,ι-entities cannot extend from one independent process to another.

    4.3. Two Random Variables with Small Interactions

    In this section we look at a system almost identical to that ofSection 4.2 but with a kind of noise introduced. This allows all trajectories to occur and is designed to test whether the spatiotemporal patterns maintain integration in the face of noise.

    4.3.1. Definition

We define the time- and space-homogeneous multivariate Markov chain $MC_\epsilon$ via the Markov matrix $P$ with entries
\[
P_{f(x_{1,t+1}, x_{2,t+1}),\, f(x_{1,t}, x_{2,t})} = p_{J,t+1}(x_{1,t+1}, x_{2,t+1} | x_{1,t}, x_{2,t}),
\]
where we define the function $f : \{0,1\}^2 \to [1:4]$ via
\[
f(0,0) = 1, \quad f(0,1) = 2, \quad f(1,0) = 3, \quad f(1,1) = 4.
\]
With this convention $P$ is
\[
P = \begin{pmatrix} 1 - 3\epsilon & \epsilon & \epsilon & \epsilon \\ \epsilon & 1 - 3\epsilon & \epsilon & \epsilon \\ \epsilon & \epsilon & 1 - 3\epsilon & \epsilon \\ \epsilon & \epsilon & \epsilon & 1 - 3\epsilon \end{pmatrix}.
\]
This means that the state of both random variables remains the same with probability $1 - 3\epsilon$ and transitions into each of the other possible combinations with probability $\epsilon$. The noise then does not act independently on both random variables but disturbs the joint state. This makes ι-entities possible that extend across the two processes. In the following we set $\epsilon = 1/100$. The initial distribution is again the uniform distribution
\[
p_{V_0}(x_{V_0}) = 1/4.
\]
Writing this multivariate Markov chain as a Bayesian network is possible but the conversion is tedious. The Bayesian network one obtains can be seen in Figure 15.

    4.3.2. Trajectories

In this system all trajectories are possible trajectories. This means there are $2^6 = 64$ possible trajectories, since every one of the six random variables can be in any of its two states. There are three classes of trajectories such that the trajectories within each class occur with equal probability. The first class, with the highest probability of occurring, consists of the four possible trajectories of $MC_=$. Then there are 24 trajectories that make a single $\epsilon$-transition (i.e., a transition where the next pair is not the same as the current one, $(x_{1,t+1}, x_{2,t+1}) \neq (x_{1,t}, x_{2,t})$; these transitions occur with probability $\epsilon$), and 36 trajectories with two $\epsilon$-transitions. We pick only one trajectory from each class. The representative trajectories are shown in Figure 16 and will be denoted $x^1_V$, $x^2_V$, and $x^3_V$ respectively. The probabilities are $p_V(x^1_V) = 0.235225$, $p_V(x^2_V) = 0.002425$, and $p_V(x^3_V) = 0.000025$, as the sketch below verifies.
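These probabilities follow directly from the construction; a minimal check (our own illustration):
```python
# Minimal sketch (ours): Markov matrix of MC_eps and the probabilities of the
# three representative trajectories (two transitions each, uniform 1/4 initial).
eps = 0.01
P = [[1 - 3 * eps if i == j else eps for j in range(4)] for i in range(4)]

p1 = 0.25 * (1 - 3 * eps) ** 2      # no eps-transition:   0.235225
p2 = 0.25 * (1 - 3 * eps) * eps     # one eps-transition:  0.002425
p3 = 0.25 * eps ** 2                # two eps-transitions: 0.000025
print(p1, p2, p3)
```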

    4.3.3. SLI Values of the Partitions

Again we calculate the SLI $\mathrm{mi}_\pi(x_V)$ of every trajectory $x_V$ with respect to each partition $\pi \in \mathcal{L}(V)$. In contrast to $MC_=$, the SLI values with respect to each partition of $MC_\epsilon$ do depend on the trajectory. We plot the values of SLI with respect to each partition $\pi \in \mathcal{L}(V)$ for the three representative trajectories in Figure 17.
It turns out that the SLI values of $x^1_V$ are almost the same as those of $MC_=$ in Figure 6, with small deviations due to the noise. This should be expected as $x^1_V$ is also a possible trajectory of $MC_=$. Also note that the trajectories $x^2_V$ and $x^3_V$ exhibit negative SLI with respect to some partitions. In particular, $x^3_V$ has non-positive SLI values with respect to every partition. This is due to the low probability of this trajectory compared to its parts. The blocks of any partition have so much higher probability than the entire trajectory that the product of their probabilities is still greater than or equal to the trajectory probability.

    4.3.4. Completely Integrated Patterns

In this section we look at the ι-entities for each of the three representative trajectories $x^k_V$, $k \in \{1, 2, 3\}$. They are visualised together with their complete local integration values in Figure 18, Figure 19 and Figure 20. In contrast to the situation of $MC_=$ we now have ι-entities with varying values of complete local integration.
On the first trajectory $x^1_V$ we find all eight patterns that are completely locally integrated in $MC_=$ (see Figure 13). These are also more than an order of magnitude more integrated than the rest of the ι-entities. This is also true for the other two trajectories.

    5. Discussion

In Section 3.1 we have argued for the use of patterns as candidates for entities. Patterns can be composed of arbitrary spatially and temporally extended parts of trajectories. We have seen in Theorem 1 that they are distinct from arbitrary subsets of trajectories. The important insight here is that patterns are structures that occur within trajectories, which cannot be said of sets of trajectories.
One of the main target applications of patterns is in time-unrolled Bayesian networks of cellular automata like those in Figure 1. Patterns in such Bayesian networks become spatiotemporal patterns like those used to describe the glider, block, and blinker in the Game of Life cellular automaton by Beer [14]. We would also like to investigate whether the latter spatiotemporal patterns are ι-entities. However, at the present state of the computational tools, and without approximations, this is out of reach computationally. We will discuss this further below.
In Section 3.3 we defined SLI and also gave its expression for deterministic Bayesian networks (including cellular automata). We also established the least upper bound of SLI with respect to a partition $\pi$ of cardinality $n$ for a pattern $x_A$ with probability $q$. This upper bound is achieved if each of the blocks $x_b$ in the partition $\pi$ occurs if and only if the whole pattern $x_O$ occurs. This is compatible with our interpretation of entities since in this case clearly the occurrence of any part of the pattern leads necessarily to the occurrence of the entire pattern (and not only vice versa).
We also presented a candidate for a greatest lower bound of SLI with respect to a partition of cardinality $n$ for a pattern with probability $q$. Whether or not this is the greatest lower bound, it exhibits a case for which SLI is always negative. This happens if either the whole pattern $x_A$ occurs (with probability $q$) or one of the “almost equal” patterns occurs, each with identical probability. A pattern $y_A$ is almost equal to $x_A$ with respect to $\pi$ in this sense if it only differs at one of the blocks $b \in \pi$, i.e., if $y_A = (x_{A \setminus b}, z_b)$ where $z_b \neq x_b$. This construction makes as many parts as possible (i.e., all but one) occur as many times as possible without the whole pattern occurring. This creates large marginalised probabilities $p_b(x_b)$ for each part/block, which means that their product probability also becomes large.
Beyond these quantitative interpretations an interpretation of the greatest lower bound candidate seems difficult. A more intuitive candidate for the opposite of an integrated pattern seems to be patterns with independent parts, i.e., zero SLI, but quantitatively these are not on the opposite end of the SLI spectrum. A more satisfying interpretation of the presented candidate is still to be found.
We also proved the disintegration theorem, which states that the refinement-free partitions of a trajectory among those partitions achieving a particular SLI value consist of ι-entities only, where an ι-entity is a pattern with positive CLI. This theorem allows us to interpret the ι-entities in new ways and may lead to a more formal or quantitative justification of ι-entities. It is already a first step in this direction since it establishes a special role of the ι-entities within trajectories of Bayesian networks. A further justification would tell us what in turn the refinement-free partitions can be used for. We have discussed a possible direction for further investigation in detail in Section 3.5, which tried to connect the ι-entities with a coding problem.
In Section 4 we investigated SLI and CLI in three simple example sets of random variables. We found that if the random variables are all independently distributed the corresponding entities are just all the possible values $x_j \in \mathcal{X}_j$ of each of the random variables $X_j \in \{X_i\}_{i \in V}$. This is what we would expect from an entity criterion. There are no entities with any further extent than a single random variable and each value corresponds to a different entity.
For the simple Markov chain $MC_=$ composed of two independent and constant processes we presented the entire disintegration hierarchy and the Hasse diagrams of each disintegration level ordered by refinement. The Hasse diagrams reflected the highly symmetric dynamics of the Markov chain via multiple identical components. For the refinement-free disintegration hierarchy we then get multiple partitions at the same disintegration level as well. Different partitions of the trajectory imply overlapping blocks, which in the case of the refinement-free partitions are ι-entities. So in general the ι-entities at a particular disintegration level are not necessarily unique and can overlap. We also saw in Figure 11 that the same ι-entities can occur on multiple disintegration levels.
The ι-entities of $MC_=$ included the expected three-timestep constant patterns within each of the two independent processes. They also included the two-timestep parts of these constant patterns. This may be less expected. It shows that parts of ι-entities can be ι-entities themselves. We note that these “sub-entities” (those that are parts of larger entities) are always on a different disintegration level than their “super-entities” (the larger entities). We can speculate that the existence of such sub- and super-entities on different disintegration levels may find an interpretation through multicellular organisms or similar structures. However, the overly simplistic examples here only serve as basic models for the potential phenomena, and are still far too simplistic to warrant any concrete interpretation in this direction.
We also looked at a version of $MC_=$ perturbed by noise, denoted $MC_\epsilon$. We found that the entities of $MC_=$ remain the most strongly integrated entities in $MC_\epsilon$. At the same time new entities occur. So we observe that in $MC_\epsilon$ the entities vary from one trajectory to another (Figure 18, Figure 19 and Figure 20). We also observe spatially extended entities, i.e., entities that extend across both (formerly independent) processes. We also observe entities that switch from one process to the other (from top row to bottom row or vice versa). The capacity of entities to exhibit this behaviour may be necessary to capture the movement or metabolism of entities in more realistic scenarios. In Biehl et al. [8] we argued that these properties are important and showed that they hold for a crude approximation of CLI (namely for SLI with respect to $\pi = \mathbf{0}$) but not for the full CLI measure.
We established that the ι-entities:
• correspond to fixed single random variables for a set of independent random variables,
• can vary from one trajectory to another,
• can change the degrees of freedom that they occupy over time,
• can be ambiguous at a fixed level of disintegration due to symmetries of the system,
• can overlap at the same level of disintegration due to this ambiguity,
• and can overlap across multiple levels of disintegration, i.e., parts of ι-entities can be ι-entities again.
In general the examples we investigated concretely are too small to sufficiently support the concept of positive CLI as an entity criterion. Due to the extreme computational burden, this may remain the case for a while. For a straightforward calculation of the minimum SLI of a trajectory of a Bayesian network $\{X_i\}_{i \in V}$ with $|V| = k$ nodes we have to calculate the SLI with respect to $B_k$ partitions. According to de Bruijn ([43], p. 108) the Bell numbers $B_n$ grow super-exponentially. Furthermore, to evaluate the SLI we need the joint probability distribution of the Bayesian network $\{X_i\}_{i \in V}$. Naively, this means we need the probability (a real number between 0 and 1) of each trajectory. If we only have binary random variables, the number of trajectories is $2^{|V|}$, which makes the straightforward computation of disintegration hierarchies unrealistic even for quite small systems. If we take a seven by seven grid of the Game of Life cellular automaton and want to look at three time-steps we have $|V| = 147$. If we use 32-bit floating-point numbers this gives us around $10^{30}$ petabytes of storage needed for this probability distribution. We are sceptical that the exact evaluation of reasonably large systems can be achieved even with non-naive methods. This suggests that formal proofs may be the more promising way to investigate SLI and CLI further.

    Acknowledgments

    Part of the research was performed during Martin Biehl’s time as a JSPS International Research Fellow and as an ELSI Origins Network (EON) long term visitor. Daniel Polani was supported in part by the EC Horizon 2020 H2020-641321 socSMCs FET Proactive project.

    Author Contributions

    Martin Biehl, Takashi Ikegami, and Daniel Polani conceived the problem and the measures, and wrote the paper; Martin Biehl proved the theorems and conceived and calculated the examples. All authors have read and approved the final manuscript.

    Conflicts of Interest

    The authors declare no conflict of interest.

    Abbreviations

    The following abbreviations are used in this manuscript:
    SLISpecific local integration
    CLIComplete local integration

    Appendix A. Kronecker Delta

The Kronecker delta is used in this paper to represent deterministic conditional distributions.
Definition A1 (Delta).
Let $X$ be a random variable with state space $\mathcal{X}$, then for $x \in \mathcal{X}$ and a subset $C \subseteq \mathcal{X}$ define
\[
\delta_x(C) := \begin{cases} 1 & \text{if } x \in C, \\ 0 & \text{else.} \end{cases}
\]
We will abuse this notation if $C$ is a singleton set $C = \{\bar{x}\}$ by writing
\[
\delta_x(\bar{x}) := \begin{cases} 1 & \text{if } x \in \{\bar{x}\}, \\ 0 & \text{else} \end{cases} = \begin{cases} 1 & \text{if } x = \bar{x}, \\ 0 & \text{else.} \end{cases}
\]
The second expression is the more common definition of the Kronecker delta.
    Remark:
• Let $X, Y$ be two random variables with state spaces $\mathcal{X}, \mathcal{Y}$ and $f : \mathcal{X} \to \mathcal{Y}$ a function such that
\[
p(y|x) = \delta_{f(x)}(y),
\]
then
\[
p(y) = \sum_x p_Y(y|x) \, p_X(x) = \sum_x \delta_{f(x)}(y) \, p_X(x) = \sum_x \delta_x(f^{-1}(y)) \, p_X(x) = \sum_{x \in f^{-1}(y)} p_X(x) = p_X(f^{-1}(y)).
\]

    Appendix B. Refinement and Partition Lattice Examples

This appendix recalls the definitions of set partitions, refinement and coarsening of set partitions, and Hasse diagrams. It also shows the Hasse diagram of an example partition lattice. The definitions are due to Grätzer [28].
Definition A2.
A (set) partition $\pi$ of a set $X$ is a set of non-empty subsets (called blocks) of $X$ satisfying
1. 
for all $x_1, x_2 \in \pi$, if $x_1 \neq x_2$, then $x_1 \cap x_2 = \emptyset$,
2. 
$\bigcup_{x \in \pi} x = X$.
    We writeL(X) for the set of all partitions ofX.
    Remark:
    • In words, a partition of a set is a set of disjoint non-empty subsets whose union is the whole set.
    Definition A3.
If two elements $x_1, x_2 \in X$ belong to the same block of a partition $\pi$ of $X$ we write $x_1 \equiv_\pi x_2$. We also write $x_1/\pi$ for the block $\{x_2 \in X : x_2 \equiv_\pi x_1\}$.
    Definition A4 (Refinement and coarsening).
We define the binary relation $\trianglelefteq$ between partitions $\pi, \rho \in \mathcal{L}(X)$ as:
\[
\pi \trianglelefteq \rho \quad \text{if} \quad x_1 \equiv_\pi x_2 \text{ implies } x_1 \equiv_\rho x_2.
\]
In this case $\pi$ is called a refinement of $\rho$ and $\rho$ is called a coarsening of $\pi$. We write $\pi \vartriangleleft \rho$ for a proper refinement, i.e., $\pi \trianglelefteq \rho$ and $\pi \neq \rho$.
    Remarks:
    • More intuitively,π is a refinement ofρ if all blocks ofπ can be obtained by further partitioning the blocks ofρ. Conversely,ρ is a coarsening ofπ if all blocks inρ are unions of blocks inπ.
    • Examples are contained in the Hasse diagrams (defined below) shown inFigure A1.
    Definition A5 (Hasse diagram).
A Hasse diagram is a visualisation of a poset (including lattices). Given a poset $A$ ordered by $\trianglelefteq$, the Hasse diagram represents the elements of $A$ by dots. The dots representing the elements are arranged in such a way that if $a, b \in A$, $a \trianglelefteq b$, and $a \neq b$, then the dot representing $a$ is drawn below the dot representing $b$. An edge is drawn between two elements $a, b \in A$ if $b$ covers $a$, i.e., if $a \vartriangleleft b$ and there is no $c \in A$ with $a \vartriangleleft c \vartriangleleft b$. If edges cross in the diagram this does not mean that there is an element of $A$ where they cross, and edges never pass through a dot representing an element.
Remarks:
• No edge is drawn between two elements $a, b \in A$ if $a \trianglelefteq b$ but $b$ does not cover $a$.
• Only drawing edges for the covering relation does not imply a loss of information about the poset since the covering relation determines the partial order completely.
• For an example Hasse diagram see Figure A1.
Figure A1. Hasse diagrams of the partition lattice of the four element set.

    Appendix C. Bayesian Networks

Intuitively a Bayesian network is a graph representation of the inter-dependencies of a set of random variables. Recall that any joint probability distribution over a set $\{X_i\}_{i \in I}$ with $I = \{1, ..., n\}$ of random variables can always be written as a product of conditional probabilities:
\[
p_I(x_1, ..., x_n) = \prod_{i=1}^{n-1} p_i(x_i | x_{i+1}, ..., x_n) \; p(x_n).
\]
This also holds for any reordering of the indices $i \to f(i)$ with $f : \{1, ..., n\} \to \{1, ..., n\}$ bijective.
In many cases however this factorisation can be simplified. Often some of the conditional probabilities $p(x_i | x_{i+1}, ..., x_n)$ do not depend on all the variables $\{x_{i+1}, ..., x_n\}$ listed in the product of Equation (A11). For example, $X_1$ might only depend on $X_2$ so we would have $p(x_1 | x_2, ..., x_n) = p(x_1 | x_2)$. Note that the latter conditional probability is determined by fixing $|\mathcal{X}_2|(|\mathcal{X}_1| - 1)$ probabilities whereas the former needs $\prod_{i=1}^{n-1} |\mathcal{X}_{i+1}| \, (|\mathcal{X}_1| - 1)$ probabilities to be fixed. This means the number of parameters (the probabilities) of the joint distribution $p(x_1, ..., x_n)$ is often much smaller than suggested by Equation (A11). One way to encode this simplification, and to make sure that we are dealing only with joint probabilities that reflect the dependencies we allow, are Bayesian networks. These can be defined as follows. First we define graphs that are factorisation compatible with joint probability distributions over a set of random variables and then define Bayesian networks as pairs of joint probability distributions and a factorisation compatible graph.
    Definition A6.
A directed acyclic graph $G = (V, E)$ with nodes $V$ and edges $E$ is factorisation compatible with a probability distribution $p_V : \mathcal{X}_V \to [0,1]$ iff the latter can be factorised in the way suggested by $G$, which means:
\[
p_V(x_V) = \prod_{i \in V} p(x_i | x_{\mathrm{pa}(i)}),
\]
where $\mathrm{pa}(i)$ denotes the parents of node $i$ according to $G$.
Remark:
• In general there are multiple directed acyclic graphs that are factorisation compatible with the same probability distribution. For example, if we choose any total order for the nodes in $V$ and define a graph by $\mathrm{pa}(i) = \{j \in V : j < i\}$ then Equation (A12) becomes Equation (A11), which always holds.
    Definition A7 (Bayesian network).
A Bayesian network is a (here assumed finite) set of random variables $\{X_i\}_{i \in V}$ and a directed acyclic graph $G = (V, E)$ with nodes indexed by $V$ such that the joint probability distribution $p_V : \mathcal{X}_V \to [0,1]$ of $\{X_i\}_{i \in V}$ is factorisation compatible with $G$. We also refer to the set of random variables $\{X_i\}_{i \in V}$ alone as a Bayesian network, implying the graph $G$.
    Remarks:
• On top of constituting the vertices of the graph $G$, the set $V$ is also assumed to be totally ordered in an (arbitrarily) fixed way. Whenever we use a subset $A \subseteq V$ to index a sequence of variables in the Bayesian network (e.g., in $p_A(x_A)$) we order $A$ according to this total order as well.
• Since $\{X_i\}_{i \in V}$ is finite and $G$ is acyclic there is a set $V_0$ of nodes without parents.
    Definition A8 (Mechanism).
Given a Bayesian network $\{X_i\}_{i \in V}$ with index set $V$, for each node with parents, i.e., for each node $i \in V \setminus V_0$ (with $V_0$ the set of nodes without parents), the mechanism of node $i$, also called the mechanism of the random variable $X_i$, is the conditional probability (also called a transition kernel) $p_i : \mathcal{X}_{\mathrm{pa}(i)} \times \mathcal{X}_i \to [0,1]$ mapping $(x_{\mathrm{pa}(i)}, x_i) \mapsto p_i(x_i | x_{\mathrm{pa}(i)})$. For each $x_{\mathrm{pa}(i)}$ the mechanism defines a probability distribution $p_i(\cdot | x_{\mathrm{pa}(i)}) : \mathcal{X}_i \to [0,1]$ satisfying (like any other probability distribution)
\[
\sum_{x_i \in \mathcal{X}_i} p_i(x_i | x_{\mathrm{pa}(i)}) = 1.
\]
    Remarks:
• We could define the set of all mechanisms to formally also include the mechanisms of the nodes without parents $V_0$. However, in practice it makes sense to separate out the nodes without parents as those over which we choose an initial probability distribution (similar to a boundary condition), which is then turned into a probability distribution $p_V$ over the entire Bayesian network $\{X_i\}_{i \in V}$ via Equation (A12). Note that in Equation (A12) the nodes in $V_0$ are not explicit as they are just factors $p_i(x_i | x_{\mathrm{pa}(i)})$ with $\mathrm{pa}(i) = \emptyset$.
• To construct a Bayesian network, take a graph $G = (V, E)$, equip each node $i \in (V \setminus V_0)$ with a mechanism $p_i : \mathcal{X}_{\mathrm{pa}(i)} \times \mathcal{X}_i \to [0,1]$, and for each node $i \in V_0$ choose a probability distribution $p_i : \mathcal{X}_i \to [0,1]$. The joint probability distribution is then calculated by the according version of Equation (A12) (a computational sketch follows this list):
\[
p_V(x_V) = \prod_{i \in V \setminus V_0} p_i(x_i | x_{\mathrm{pa}(i)}) \prod_{j \in V_0} p_j(x_j).
\]
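As a small computational sketch of this construction (our own illustration; the representation of mechanisms as nested dictionaries and all names are our choices, not notation from the paper):
```python
# Minimal sketch (ours): assemble the joint distribution of a Bayesian network
# from its mechanisms via Equation (A14). `mechanisms[i]` maps a tuple of
# parent values to a distribution over the states of node i; root nodes
# (pa(i) empty) are keyed by the empty tuple and hold the initial distribution.
import itertools, math

def joint(spaces, parents, mechanisms):
    """spaces: state space per node; parents: tuple of parent indices per node."""
    return {x: math.prod(mechanisms[i][tuple(x[j] for j in parents[i])][x[i]]
                         for i in range(len(spaces)))
            for x in itertools.product(*spaces)}

# Example: a two-node chain X_0 -> X_1 where X_1 copies X_0 with 10% noise.
spaces = [(0, 1), (0, 1)]
parents = [(), (0,)]
mechanisms = [
    {(): {0: 0.5, 1: 0.5}},                            # initial distribution of X_0
    {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.1, 1: 0.9}},  # mechanism of X_1
]
print(joint(spaces, parents, mechanisms))  # {(0, 0): 0.45, (0, 1): 0.05, ...}
```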

    Appendix C.1. Deterministic Bayesian Networks

    Definition A9 (Deterministic mechanism).
A mechanism $p_i : \mathcal{X}_{\mathrm{pa}(i)} \times \mathcal{X}_i \to [0,1]$ is deterministic if there is a function $f_i : \mathcal{X}_{\mathrm{pa}(i)} \to \mathcal{X}_i$ such that
\[
p_i(x_i | x_{\mathrm{pa}(i)}) = \delta_{f_i(x_{\mathrm{pa}(i)})}(x_i) = \begin{cases} 1 & \text{if } x_i = f_i(x_{\mathrm{pa}(i)}), \\ 0 & \text{else.} \end{cases}
\]
    Definition A10 (Deterministic Bayesian network).
    A Bayesian network{Xi}iV is deterministic if all its mechanisms are deterministic.
    Theorem A1.
Given a deterministic Bayesian network $\{X_i\}_{i \in V}$ there exists a function $f_{V \setminus V_0} : \mathcal{X}_{V_0} \to \mathcal{X}_{V \setminus V_0}$ which, given a value $x_{V_0}$ of the random variables without parents $X_{V_0}$, returns the value $x_{V \setminus V_0}$ fixing the values of all remaining random variables in the network.
    Proof. 
According to Equation (A12), the definition of conditional probabilities, and using the deterministic mechanisms we have:
\[
p_{V \setminus V_0}(x_{V \setminus V_0} | x_{V_0}) = \prod_{i \in V \setminus V_0} p_i(x_i | x_{\mathrm{pa}(i)}) = \prod_{i \in V \setminus V_0} \delta_{f_i(x_{\mathrm{pa}(i)})}(x_i).
\]
For every $x_{V_0}$ the right hand side, as a function of $x_{V \setminus V_0}$, is a probability distribution and is therefore always greater than or equal to zero and at most one; also, for every $x_{V_0}$ the sum of the probabilities over all $x_{V \setminus V_0} \in \mathcal{X}_{V \setminus V_0}$ is equal to one. As a product of zeros and/or ones the right hand side can only be either zero or one. This means that for every $x_{V_0}$ there must be a unique $x_{V \setminus V_0}$ such that the right hand side is equal to one. Define this as the value of the function $f_{V \setminus V_0}(x_{V_0})$. ☐
    Theorem A2 (Pattern probability in a deterministic Bayesian network).
Given a deterministic Bayesian network (Definition A10) and uniform initial distribution $p_{V_0} : \mathcal{X}_{V_0} \to [0,1]$ the probability of the occurrence of a pattern $x_A$ is:
\[
p_A(x_A) = \frac{N(x_A)}{|\mathcal{X}_{V_0}|},
\]
where $N(x_A)$ is the number of trajectories $\bar{x}_V$ in which $x_A$ occurs.
    Proof. 
Recall that in a deterministic Bayesian network we have a function $f_{V \setminus V_0} : \mathcal{X}_{V_0} \to \mathcal{X}_{V \setminus V_0}$ (see Theorem A1) which maps a given value of $x_{V_0}$ to the value of the rest of the network $x_{V \setminus V_0}$. We calculate $p_A(x_A)$ for an arbitrary subset $A \subseteq V$. To make this more readable let $A \cap V_0 = A_0$, $A \setminus V_0 = A_r$, $B := V \setminus A$, $B \cap V_0 = B_0$, and $B \setminus V_0 = B_r$. Then
\[
p_A(x_A) = \sum_{\bar{x}_B} p_V(x_A, \bar{x}_B)
\]
\[
= \sum_{\bar{x}_{B_0}, \bar{x}_{B_r}} p_V(x_{A_r}, \bar{x}_{B_r} | x_{A_0}, \bar{x}_{B_0}) \, p_{V_0}(x_{A_0}, \bar{x}_{B_0})
\]
\[
= \sum_{\bar{x}_{B_0}, \bar{x}_{B_r}} \delta_{f_{V \setminus V_0}(x_{A_0}, \bar{x}_{B_0})}(x_{A_r}, \bar{x}_{B_r}) \, p_{V_0}(x_{A_0}, \bar{x}_{B_0})
\]
\[
= \sum_{\bar{x}_{B_r}} \; \sum_{\{\bar{x}_{B_0} : (x_{A_0}, \bar{x}_{B_0}) \in f^{-1}_{V \setminus V_0}(x_{A_r}, \bar{x}_{B_r})\}} p_{V_0}(x_{A_0}, \bar{x}_{B_0})
\]
\[
= \frac{1}{|\mathcal{X}_{V_0}|} \sum_{\bar{x}_{B_r}} \left| \{\bar{x}_{B_0} \in \mathcal{X}_{B_0} : (x_{A_0}, \bar{x}_{B_0}) \in f^{-1}_{V \setminus V_0}(x_{A_r}, \bar{x}_{B_r})\} \right|
\]
\[
= \frac{1}{|\mathcal{X}_{V_0}|} N(x_A).
\]
In the second to last line we used the uniformity of the initial distribution $p_{V_0}$. The inner sum in the second to last line counts all initial conditions that are compatible with $x_{A_0}$ and lead to the occurrence of $x_{A_r}$ together with a particular $\bar{x}_{B_r}$. The outer sum then sums over all such $\bar{x}_{B_r}$ to get all initial conditions that are compatible with $x_{A_0}$ and lead to the occurrence of $x_{A_r}$. Together these are all initial conditions compatible with $x_A$. In a deterministic system the number of initial conditions that lead to the occurrence of a pattern $x_A$ is equal to the number of trajectories $N(x_A)$ since every different initial condition produces a single, unique trajectory. ☐
    Remark:
• Due to the finiteness of the network, the deterministic mechanisms, and the chosen uniform initial distribution, the minimum possible non-zero probability for a pattern $x_A$ is $1/|\mathcal{X}_{V_0}|$. This happens for any pattern that only occurs in a single trajectory. Furthermore, the probability of any pattern is a multiple of $1/|\mathcal{X}_{V_0}|$. A small counting sketch follows.
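The following sketch (our own illustration; `trajectories` stands for the list of all possible trajectories, one per initial condition, and `pattern` maps node indices to the values the pattern fixes, both hypothetical names) computes a pattern probability by counting:
```python
# Minimal sketch (ours): p_A(x_A) = N(x_A) / |X_V0| by counting trajectories.
def pattern_probability(trajectories, pattern):
    """Fraction of possible trajectories in which the pattern occurs."""
    n = sum(1 for traj in trajectories
            if all(traj[i] == v for i, v in pattern.items()))
    return n / len(trajectories)   # |X_V0| = number of possible trajectories

# Example: the four possible trajectories of MC= (node order as in Section 4.2):
trajs = [(a, b, a, b, a, b) for a in (0, 1) for b in (0, 1)]
print(pattern_probability(trajs, {0: 0, 2: 0}))  # x_{1,0} = x_{1,1} = 0 -> 0.5
```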

    Appendix C.2. Proof of Theorem 2

    Proof. 
Follows by replacing the probabilities $p_O(x_O)$ and $p_b(x_b)$ in Equation (22) with their deterministic expressions from Theorem A2, i.e., $p_A(x_A) = N(x_A)/|\mathcal{X}_{V_0}|$. Then:
\[
\mathrm{mi}_\pi(x_O) := \log \frac{p_O(x_O)}{\prod_{b \in \pi} p_b(x_b)}
= \log \frac{\frac{N(x_O)}{|\mathcal{X}_{V_0}|}}{\prod_{b \in \pi} \frac{N(x_b)}{|\mathcal{X}_{V_0}|}}
= \log \frac{N(x_O)}{|\mathcal{X}_{V_0}|} \frac{|\mathcal{X}_{V_0}|^{|\pi|}}{\prod_{b \in \pi} N(x_b)}
= \log |\mathcal{X}_{V_0}|^{|\pi| - 1} \frac{N(x_O)}{\prod_{b \in \pi} N(x_b)}
= (|\pi| - 1) \log |\mathcal{X}_{V_0}| - \log \frac{\prod_{b \in \pi} N(x_b)}{N(x_O)}.
\]
 ☐

    Appendix D. Proof of Theorem 1

    Proof. 
Given a set of random variables $\{X_i\}_{i \in V}$, a subset $D \subseteq \mathcal{X}_V$ cannot be represented by a pattern of $\{X_i\}_{i \in V}$ if and only if there exists $A \subseteq V$ with $D_A \subsetneq \mathcal{X}_A$ (proper subset) and $|D_A| > 1$, i.e., if neither all patterns at $A$ are possible nor a unique pattern at $A$ is specified by $D$.
We first show that if there exists $A \subseteq V$ with $D_A \subsetneq \mathcal{X}_A$ and $|D_A| > 1$ then there is no pattern $\tilde{x}_B \in \bigcup_{C \subseteq V} \mathcal{X}_C$ with $D = T(\tilde{x}_B)$. Then we show that if no such $A$ exists then there is such a pattern $\tilde{x}_B$.
Since $|D_A| > 1$ we have $x_A, \bar{x}_A \in D_A \subseteq \mathcal{X}_A$ with $x_A \neq \bar{x}_A$. Next note that we can write any pattern $\tilde{x}_B$ as
\[
\tilde{x}_B = (\tilde{x}_{B \setminus A}, \tilde{x}_{B \cap A}).
\]
If $B \cap A \neq \emptyset$ we can see that, since $\tilde{x}_{B \cap A}$ must take a single value, $T(\tilde{x}_B)$ cannot contain $D$: there are trajectories in $D$ taking the value $x_{B \cap A}$ on $B \cap A$ and trajectories in $D$ taking the values $\bar{x}_{B \cap A}$. More formally, we must have either $\tilde{x}_{B \cap A} = x_{B \cap A}$ or $\tilde{x}_{B \cap A} \neq x_{B \cap A}$. First, let $\tilde{x}_{B \cap A} = x_{B \cap A}$; but then $T(\bar{x}_A) \nsubseteq T(\tilde{x}_B)$, so $D \nsubseteq T(\tilde{x}_B)$. Next choose $\tilde{x}_{B \cap A} \neq x_{B \cap A}$; but then $T(x_A) \nsubseteq T(\tilde{x}_B)$, so also $D \nsubseteq T(\tilde{x}_B)$. So we must have $B \cap A = \emptyset$.
Now we show that if $B \cap A = \emptyset$ there are trajectories in $T(\tilde{x}_B)$ that are not in $D$. We construct one explicitly by fixing its value on $A$ to a value in $\mathcal{X}_A$ that is not in $D_A$ and its value on $B$ to $\tilde{x}_B$. More formally: choose $y_A \in \mathcal{X}_A \setminus D_A$; then $y_A \neq x_A$ and $y_A \neq \bar{x}_A$. This is always possible since $D_A \subsetneq \mathcal{X}_A$ (proper subset). Then consider a trajectory $\hat{x}_V = (\tilde{x}_B, y_A, \check{x}_R)$ with arbitrary $\check{x}_R \in \mathcal{X}_R$, where $R = V \setminus (B \cup A)$. Then $\hat{x}_V \in T(\tilde{x}_B)$ but $\hat{x}_V \notin D$.
Conversely, we show how to construct $\tilde{x}_B$ if no such $A$ exists. The idea is just to fix all random variables where the restriction of $D$ has a single element and leave them unspecified where it is the full state space. More formally: if there exists no $A \subseteq V$ with $D_A \subsetneq \mathcal{X}_A$ and $|D_A| > 1$, then for each $C \subseteq V$ either $D_C = \mathcal{X}_C$ or $|D_C| = 1$. Then let $B := \{i \in V : |D_{\{i\}}| = 1\}$; then $|D_B| = 1$ so that we can define $\tilde{x}_B$ as the unique element of $D_B$. If $y_V \in D$ we have $y_B = \tilde{x}_B$, so $D \subseteq T(\tilde{x}_B)$. If $z_V \in T(\tilde{x}_B)$ we have $z_B = \tilde{x}_B \in D_B$, and for $A \subseteq V$ with $A \cap B = \emptyset$ we have, by construction of $B$, $D_A = \mathcal{X}_A$ such that $D_{V \setminus B} = \mathcal{X}_{V \setminus B}$, which means $z_{V \setminus B} \in D_{V \setminus B}$ and therefore $z_V \in D$ and $T(\tilde{x}_B) \subseteq D$. So this gives $T(\tilde{x}_B) = D$. ☐

    References

    1. Gallois, A. Identity over Time. InThe Stanford Encyclopedia of Philosophy; Zalta, E.N., Ed.; Metaphysics Research Laboratory, Stanford University: Stanford, CA, USA, 2012. [Google Scholar]
    2. Grand, S.Creation: Life and How to Make It; Harvard University Press: Harvard, MA, USA, 2003. [Google Scholar]
    3. Pascal, R.; Pross, A. Stability and its manifestation in the chemical and biological worlds.Chem. Commun.2015,51, 16160–16165. [Google Scholar] [CrossRef] [PubMed]
    4. Orseau, L.; Ring, M. Space-Time Embedded Intelligence. InArtificial General Intelligence; Number 7716 in Lecture Notes in Computer Science; Bach, J., Goertzel, B., Iklé, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 209–218. [Google Scholar]
    5. Barandiaran, X.E.; Paolo, E.D.; Rohde, M. Defining Agency: Individuality, Normativity, Asymmetry, and Spatio-temporality in Action.Adapt. Behav.2009,17, 367–386. [Google Scholar] [CrossRef]
    6. Legg, S.; Hutter, M. Universal Intelligence: A Definition of Machine Intelligence.arXiv, 2007; arXiv: 0712.3329. [Google Scholar]
    7. Boccara, N.; Nasser, J.; Roger, M. Particlelike structures and their interactions in spatiotemporal patterns generated by one-dimensional deterministic cellular-automaton rules.Phys. Rev. A1991,44, 866–875. [Google Scholar] [CrossRef] [PubMed]
    8. Biehl, M.; Ikegami, T.; Polani, D. Towards information based spatiotemporal patterns as a foundation for agent representation in dynamical systems. In Proceedings of the Artificial Life Conference, Cancun, Mexico, 2016; The MIT Press: Cambridge, MA, USA, 2016; pp. 722–729. [Google Scholar]
    9. McGill, W.J. Multivariate information transmission.Psychometrika1954,19, 97–116. [Google Scholar] [CrossRef]
    10. Amari, S.I. Information geometry on hierarchy of probability distributions.IEEE Trans. Inf. Theory2001,47, 1701–1711. [Google Scholar] [CrossRef]
11. Lizier, J.T.The Local Information Dynamics of Distributed Computation in Complex Systems; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
    12. Tononi, G.; Sporns, O. Measuring information integration.BMC Neurosci.2003,4, 31. [Google Scholar] [CrossRef] [PubMed]
    13. Balduzzi, D.; Tononi, G. Integrated Information in Discrete Dynamical Systems: Motivation and Theoretical Framework.PLoS Comput. Biol.2008,4, e1000091. [Google Scholar] [CrossRef] [PubMed]
    14. Beer, R.D. Characterizing autopoiesis in the game of life.Artif. Life2014,21, 1–19. [Google Scholar] [CrossRef] [PubMed]
    15. Fontana, W.; Buss, L.W. “The arrival of the fittest”: Toward a theory of biological organization.Bull. Math. Biol.1994,56, 1–64. [Google Scholar]
    16. Krakauer, D.; Bertschinger, N.; Olbrich, E.; Ay, N.; Flack, J.C. The Information Theory of Individuality.arXiv, 2014; arXiv:1412.2447. [Google Scholar]
    17. Bertschinger, N.; Olbrich, E.; Ay, N.; Jost, J. Autonomy: An information theoretic perspective.Biosystems2008,91, 331–345. [Google Scholar] [CrossRef] [PubMed]
    18. Shalizi, C.R.; Haslinger, R.; Rouquier, J.B.; Klinkner, K.L.; Moore, C. Automatic filters for the detection of coherent structure in spatiotemporal systems.Phys. Rev. E2006,73, 036104. [Google Scholar] [CrossRef] [PubMed]
    19. Wolfram, S. Computation theory of cellular automata.Commun. Math. Phys.1984,96, 15–57. [Google Scholar] [CrossRef]
    20. Grassberger, P. Chaos and diffusion in deterministic cellular automata.Phys. D Nonlinear Phenom.1984,10, 52–58. [Google Scholar] [CrossRef]
    21. Hanson, J.E.; Crutchfield, J.P. The attractor—Basin portrait of a cellular automaton.J. Stat. Phys.1992,66, 1415–1462. [Google Scholar] [CrossRef]
    22. Pivato, M. Defect particle kinematics in one-dimensional cellular automata.Theor. Comput. Sci.2007,377, 205–228. [Google Scholar] [CrossRef]
    23. Lizier, J.T.; Prokopenko, M.; Zomaya, A.Y. Local information transfer as a spatiotemporal filter for complex systems.Phys. Rev. E2008,77, 026110. [Google Scholar] [CrossRef] [PubMed]
    24. Flecker, B.; Alford, W.; Beggs, J.M.; Williams, P.L.; Beer, R.D. Partial information decomposition as a spatiotemporal filter.Chaos Interdiscip. J. Nonlinear Sci.2011,21, 037104. [Google Scholar] [CrossRef] [PubMed]
    25. Friston, K. Life as we know it.J. R. Soc. Interface2013,10, 20130475. [Google Scholar] [CrossRef] [PubMed]
    26. Balduzzi, D. Detecting emergent processes in cellular automata with excess information.arXiv, 2011; arXiv:1105.0158. [Google Scholar]
    27. Hoel, E.P.; Albantakis, L.; Marshall, W.; Tononi, G. Can the macro beat the micro? Integrated information across spatiotemporal scales.Neurosci. Conscious.2016,2016, niw012. [Google Scholar] [CrossRef]
    28. Grätzer, G.Lattice Theory: Foundation; Springer: New York, NY, USA, 2011. [Google Scholar]
    29. Ceccherini-Silberstein, T.; Coornaert, M. Cellular Automata and Groups. InEncyclopedia of Complexity and Systems Science; Meyers, R.A., Ed.; Springer: New York, NY, USA, 2009; pp. 778–791. [Google Scholar]
    30. Busic, A.; Mairesse, J.; Marcovici, I. Probabilistic cellular automata, invariant measures, and perfect sampling.arXiv, 2010; arXiv: 1010.3133. [Google Scholar]
    31. Beer, R.D. The cognitive domain of a glider in the game of life.Artif. Life2014,20, 183–206. [Google Scholar] [CrossRef] [PubMed]
32. Beer, R.D. Autopoiesis and Enaction in the Game of Life; The MIT Press: Cambridge, MA, USA, 2016; p. 13.
33. Noonan, H.; Curtis, B. Identity. In The Stanford Encyclopedia of Philosophy; Zalta, E.N., Ed.; Metaphysics Research Laboratory, Stanford University: Stanford, CA, USA, 2014.
34. Hawley, K. Temporal Parts. In The Stanford Encyclopedia of Philosophy; Zalta, E.N., Ed.; Metaphysics Research Laboratory, Stanford University: Stanford, CA, USA, 2015.
35. Ay, N. Information Geometry on Complexity and Stochastic Interaction. Entropy 2015, 17, 2432–2458.
36. MacKay, D.J. Information Theory, Inference and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003.
37. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: Hoboken, NJ, USA, 2006.
38. Tononi, G. An information integration theory of consciousness. BMC Neurosci. 2004, 5, 42.
39. Von Eitzen, H. Prove (1 − (1 − q)/n)^n ≥ q for 0 < q < 1 and n ≥ 2 a Natural Number. Mathematics Stack Exchange, 2016. Available online: http://math.stackexchange.com/q/1974262 (accessed on 18 October 2016).
40. Bullen, P.S. Handbook of Means and Their Inequalities; Springer Science+Business Media: Dordrecht, The Netherlands, 2003.
41. Kolchinsky, A.; Rocha, L.M. Prediction and modularity in dynamical systems. In Advances in Artificial Life, ECAL; The MIT Press: Cambridge, MA, USA, 2011; pp. 423–430.
42. Pemmaraju, S.; Skiena, S. Computational Discrete Mathematics: Combinatorics and Graph Theory with Mathematica®; Cambridge University Press: Cambridge, UK, 2009.
43. De Bruijn, N.G. Asymptotic Methods in Analysis; Dover Publications: New York, NY, USA, 2010.
Figure 1. Illustration of concepts from this paper on the time-evolution (trajectory) of a one-dimensional elementary cellular automaton. Time-steps increase from left to right. None of the shown structures is derived from first principles; all are manually constructed for illustrative purposes. In (a) we show the complete (finite) trajectory; naively, two gliders can be seen to collide and give rise to a third glider. In (b–d) we show (spatiotemporal) patterns fixing the variables (allegedly) pertaining to a first, a second, and a third glider. In (e) we show a pattern fixing the variables of what could be a glider that absorbs the first glider and maintains its identity. In (f) we show a partition into the time-slices of the pattern of the first glider. In (g) we show a partition of the trajectory with three parts coinciding with the gliders and one part encompassing the rest. In (h) we again show a partition with three parts coinciding with the gliders, but now all remaining variables are treated as individual parts.
Figure 2. First time steps of a Bayesian network representing a multivariate dynamical system (or multivariate Markov chain) {X_i}_{i∈V}. Here we used V = J × T, with J indexing the spatial degrees of freedom and T the temporal extension; each node is then indexed by a tuple (j, t) as shown. The displayed edges are only an example; edges are allowed to point from any node to any node within the same or the subsequent column.
Figure 3. In (a) we show a trajectory of the same cellular automaton as in Figure 1 with a randomly chosen initial condition. The set of gliders and their paths occurring in this trajectory clearly differs from those in Figure 1a. In (b) we show an example of a random pattern that occurs in the trajectory of (a) and is probably not an entity in any sense.
Figure 4. Bayesian network of MC_=. There is no interaction between the two processes.
Figure 5. Visualisation of the four possible trajectories of MC_=. In each trajectory the time index increases from left to right; the two rows correspond to the two random variables at each time step and the three columns to the three time-steps considered here.
Figure 6. Specific local integrations mi_π(x_V) of any of the four trajectories x_V seen in Figure 5 with respect to all partitions π ∈ L(V). The partitions are ordered according to an enumeration with increasing cardinality |π| (see Pemmaraju and Skiena [42], Chapter 4.3.3, for the method). Vertical lines indicate the partitions at which the cardinality |π| increases by one.
Figure 7. Same as Figure 6 but with the partitions sorted according to increasing SLI.
Figure 8. Hasse diagrams of the five disintegration levels of the trajectories of MC_=. Every vertex corresponds to a partition, and edges indicate that the lower partition refines the higher one.
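The Hasse diagrams above are determined by the refinement order on partitions: π refines ρ whenever every block of π is contained in some block of ρ. A hedged sketch of how such diagrams can be computed (the helper names are ours, not from the paper):

```python
def canon(pi):
    """Order-independent representation of a partition (a set of blocks)."""
    return frozenset(frozenset(b) for b in pi)

def refines(pi, rho):
    """True iff every block of pi is contained in some block of rho."""
    return all(any(set(b) <= set(c) for c in rho) for b in pi)

def strictly_refines(pi, rho):
    return refines(pi, rho) and canon(pi) != canon(rho)

def hasse_edges(level):
    """Covering pairs (rho, pi) of the refinement order restricted to `level`:
    pi strictly refines rho and no member of `level` sits strictly between."""
    return [(rho, pi)
            for rho in level for pi in level
            if strictly_refines(pi, rho)
            and not any(strictly_refines(pi, s) and strictly_refines(s, rho)
                        for s in level)]
```

Feeding a disintegration level (a list of partitions, each a list of blocks) to hasse_edges yields the edge set of a diagram like those shown here.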
Figure 9. Hasse diagram of D_2 of the MC_= trajectories. Here we visualise the partitions at each vertex; the blocks of a partition are the cells of equal colour. Note that all six disconnected components can be obtained from any one of them by symmetry operations that are respected by the joint probability distribution p_V. For example, we can shift each row individually to the left or right, since every value is constant within each row; we can also swap the top and bottom rows, since the two rows have the same probability distribution even when 1 and 0 are exchanged.
Figure 10. For each disintegration level of the MC_= trajectories we show an example connected component of its Hasse diagram, with the partitions at each vertex visualised. The disintegration level increases clockwise from the top left. The blocks of a partition are the cells of equal colour.
Figure 11. Hasse diagram of the refinement-free disintegration hierarchy D of the MC_= trajectories. Here we visualise the partitions at each vertex; the blocks of a partition are the cells of equal colour. It turns out that the partitions on the same horizontal level in this diagram correspond exactly to a level of the refinement-free disintegration hierarchy D: the i-th horizontal level from the top corresponds to D_i. Take, for example, the second horizontal level from the top: the partitions on this level are just the minimal elements of the poset D_2 visualised in Figure 9. To connect this to Figure 8, note that for each disintegration level D_i shown there as a Hasse diagram, the partitions on the i-th horizontal level (counting from the top) of the present figure are the minimal elements of that disintegration level.
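A sketch of the minimal-element extraction described in the caption, reusing strictly_refines from the refinement sketch after Figure 8. This is our reading of how each horizontal level of the refinement-free hierarchy relates to the corresponding disintegration level, not a verbatim transcription of the paper's construction:

```python
def minimal_elements(level):
    """Members of `level` that no other member strictly refines (uses
    strictly_refines from the sketch after Figure 8)."""
    return [rho for rho in level
            if not any(strictly_refines(pi, rho) for pi in level)]
```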
Figure 12. All distinct completely integrated composite patterns (singletons are not shown) on the first possible trajectory of MC_=. The value of complete local integration is indicated above each pattern. We display patterns by colouring in grey the cells of random variables that are not fixed to any value by the pattern; cells of random variables that are fixed by the pattern are coloured according to their value, i.e., white for 0 and black for 1.
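Continuing the MC_= sketch given after Figure 6 (reusing trajectories, marginal, and partitions), one can search for such patterns programmatically. The snippet below assumes, as the name suggests, that complete local integration is the minimum of SLI over all partitions of the pattern's domain with at least two blocks; this is an illustrative reading, and the printed value is only what this reading yields, not a figure quoted from the paper.

```python
def complete_local_integration(pattern):
    """Minimum SLI of `pattern` over all partitions of its domain with at
    least two blocks (a hedged reading of 'complete' local integration)."""
    dom = list(pattern)
    p_pattern = marginal(pattern)
    return min(
        math.log2(p_pattern)
        - sum(math.log2(marginal({i: pattern[i] for i in block}))
              for block in pi)
        for pi in partitions(dom) if len(pi) >= 2)

# Example: the pattern fixing the whole top row of a trajectory to 0.
top_row = {(0, t): 0 for t in range(3)}
print(complete_local_integration(top_row))  # -> 1.0 under this reading
```

Enumerating all composite patterns of a trajectory and keeping those whose minimum is positive would then produce a catalogue of the kind shown in this figure.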
Figure 13. All distinct completely integrated composite patterns on the second possible trajectory of MC_=. The value of complete local integration is indicated above each pattern.
Figure 14. All distinct completely integrated composite patterns on all four possible trajectories of MC_=. The value of complete local integration is indicated above each pattern.
Figure 15. Bayesian network of MC_ϵ.
Figure 16. Visualisation of three trajectories of MC_ϵ. In each trajectory the time index increases from left to right; the two rows correspond to the two random variables at each time step and the three columns to the three time-steps considered here. The first trajectory (in (a)) makes no ϵ-transitions, the second (in (b)) makes one from t = 2 to t = 3, and the third (in (c)) makes two.
Figure 17. Specific local integrations mi_π(x_V) of one of the four trajectories of MC_= (measured w.r.t. the probability distribution of MC_=), here denoted x_V^{MC=} (this is the same data as in Figure 6), and of the three representative trajectories x_V^k, k ∈ {1, 2, 3}, of MC_ϵ (measured w.r.t. the probability distribution of MC_ϵ) seen in Figure 16, with respect to all π ∈ L(V). The partitions are ordered as in Figure 6 with increasing cardinality |π|. Vertical lines indicate partitions where the cardinality |π| increases by one. Note that the values of x_V^{MC=} are almost completely hidden from view by those of x_V^1.
Figure 18. All distinct completely integrated composite patterns on the first trajectory x_V^1 of MC_ϵ. The value of complete local integration is indicated above each pattern. See Figure 12 for colouring conventions.
Figure 19. All distinct completely integrated composite patterns on the second trajectory x_V^2 of MC_ϵ. The value of complete local integration is indicated above each pattern.
Figure 20. All distinct completely integrated composite patterns on the third trajectory x_V^3 of MC_ϵ. The value of complete local integration is indicated above each pattern.

    © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
