Incomputer science andgraph theory, the termcolor-coding refers to analgorithmic technique which is useful in the discovery ofnetwork motifs. For example, it can be used to detect asimple path of lengthk in a givengraph. The traditional color-coding algorithm isprobabilistic, but it can bederandomized without much overhead in the running time.
Color-coding also applies to the detection ofcycles of a given length, and more generally it applies to thesubgraph isomorphism problem (anNP-complete problem), where it yieldspolynomial time algorithms when the subgraph pattern that it is trying to detect has boundedtreewidth.
The color-coding method was proposed and analyzed in 1994 byNoga Alon,Raphael Yuster, andUri Zwick.[1][2]
The following results can be obtained through the method of color-coding:
To solve the problem of finding a subgraph in a given graphG = (V,E), whereH can be a path, a cycle, or any boundedtreewidth graph where, the method of color-coding begins by randomly coloring each vertex ofG with colors, and then tries to find a colorful copy ofH in coloredG. Here, a graph is colorful if every vertex in it is colored with a distinct color. This method works by repeating (1) random coloring a graph and (2) finding colorful copy of the target subgraph, and eventually the target subgraph can be found if the process is repeated a sufficient number of times.
Suppose a copy ofH inG becomes colorful with some non-zero probabilityp. It immediately follows that if the random coloring is repeated1/p times, then this copy is expected to become colorful once. Note that thoughp is small, it is shown that if,p is only polynomially small. Suppose again there exists an algorithm such that, given a graphG and a coloring which maps each vertex ofG to one of thek colors, it finds a copy of colorfulH, if one exists, within some runtimeO(r). Then the expected time to find a copy ofH inG, if one exists, is.
Sometimes it is also desirable to use a more restricted version of colorfulness. For example, in the context of finding cycles inplanar graphs, it is possible to develop an algorithm that finds well-colored cycles. Here, a cycle is well-colored if its vertices are colored by consecutive colors.
An example would be finding a simple cycle of lengthk in graphG = (V,E).
By applying random coloring method, each simple cycle has a probability of to become colorful, since there are ways of coloring thek vertices on the cycle, among which there are colorful occurrences. Then an algorithm (described next) can be used to find colorful cycles in the randomly colored graphG in time, where is the matrix multiplication constant. Therefore, it takes overall time to find a simple cycle of lengthk inG.
The colorful cycle-finding algorithm works by first finding all pairs of vertices inV that are connected by a simple path of lengthk − 1, and then checking whether the two vertices in each pair are connected. Given a coloring functionc :V → {1, ...,k} to color graphG, enumerate all partitions of the color set{1, ...,k} into two subsetsC1,C2 of size each. Note thatV can be divided intoV1 andV2 accordingly, and letG1 andG2 denote the subgraphs induced byV1 andV2 respectively. Then, recursively find colorful paths of length in each ofG1 andG2. Suppose the boolean matrixA1 andA2 represent the connectivity of each pair of vertices inG1 andG2 by a colorful path, respectively, and letB be the matrix describing the adjacency relations between vertices ofV1 and those ofV2, the boolean product gives all pairs of vertices inV that are connected by a colorful path of lengthk − 1. Thus, the recursive relation of matrix multiplications is, which yields a runtime of. Although this algorithm finds only the end points of the colorful path, another algorithm by Alon and Naor[4] that finds colorful paths themselves can be incorporated into it.
Thederandomization of color-coding involves enumerating possible colorings of a graphG, such that the randomness of coloringG is no longer required. For the target subgraphH inG to be discoverable, the enumeration has to include at least one instance where theH is colorful. To achieve this, enumerating ak-perfect familyF of hash functions from{1, ..., |V|} to{1, ...,k} is sufficient. By definition,F isk-perfect if for every subsetS of{1, ..., |V|} where, there exists a hash functionh inF such thath :S → {1, ...,k} isperfect. In other words, there must exist a hash function inF that colors any givenk vertices withk distinct colors.
There are several approaches to construct such ak-perfect hash family:
In the case of derandomizing well-coloring, where each vertex on the subgraph is colored consecutively, ak-perfect family of hash functions from{1, ..., |V|} to{1, ...,k!} is needed. A sufficientk-perfect family which maps from{1, ..., |V|} to{1, ...,kk} can be constructed in a way similar to the approach 3 above (the first step). In particular, it is done by usingnklogk random bits that are almostklogk independent, and the size of the resultingk-perfect family will be.
The derandomization of color-coding method can be easily parallelized, yielding efficientNC algorithms.
Recently, color-coding has attracted much attention in the field of bioinformatics. One example is the detection ofsignaling pathways inprotein-protein interaction (PPI) networks. Another example is to discover and to count the number ofmotifs in PPI networks. Studying bothsignaling pathways andmotifs allows a deeper understanding of the similarities and differences of many biological functions, processes, and structures among organisms.
Due to the huge amount of gene data that can be collected, searching for pathways or motifs can be highly time consuming. However, by exploiting the color-coding method, the motifs or signaling pathways with vertices in a networkG withn vertices can be found very efficiently in polynomial time. Thus, this enables us to explore more complex or larger structures in PPI networks.