Movatterモバイル変換

Affinity propagation

From Wikipedia, the free encyclopedia

Algorithm in data mining

Instatistics anddata mining,affinity propagation (AP) is aclustering algorithm based on the concept of "message passing" between data points.^[1]Unlike clustering algorithms such ask-means ork-medoids, affinity propagation does not require the number of clusters to be determined or estimated before running the algorithm. Similar tok-medoids, affinity propagation finds "exemplars," members of the input set that are representative of clusters.^[1]

Algorithm

[edit]

Letx₁ throughx_n be a set of data points, with no assumptions made about their internal structure, and lets be a function that quantifies the similarity between any two points, such thats(i,j) >s(i,k)if and only ifx_i is more similar tox_j than tox_k. For this example, the negative squared distance of two data points was used i.e. for pointsx_i andx_k, $s(i,k)=-\left\|x_{i}-x_{k}\right\|^{2}$ .^[1]

The diagonal ofs (i.e. $s(i,i)$ ) is particularly important, as it represents the instance preference, meaning how likely a particular instance is to become an exemplar. When it is set to the same value for all inputs, it controls how many classes the algorithm produces. A value close to the minimum possible similarity produces fewer classes, while a value close to or larger than the maximum possible similarity produces many classes. It is typically initialized to the median similarity of all pairs of inputs.

The algorithm proceeds by alternating between two message-passing steps, which update two matrices:^[1]

The "responsibility" matrixR has valuesr(i,k) that quantify how well-suitedx_k is to serve as the exemplar forx_i, relative to other candidate exemplars forx_i.
The "availability" matrixA contains valuesa(i,k) that represent how "appropriate" it would be forx_i to pickx_k as its exemplar, taking into account other points' preference forx_k as an exemplar.

Both matrices are initialized to all zeroes, and can be viewed aslog-probability tables. The algorithm then performs the following updates iteratively:

First, responsibility updates are sent around: $r(i,k)\gets s(i,k)-\max _{k'\neq k}{\left\{a(i,k')+s(i,k')\right\}}$
Then, availability is updated per $a(i,k)\gets \min {\left(0,r(k,k)+\sum _{i'\not \in \{i,k\}}\max(0,r(i',k))\right)}\quad {\text{ for }}i\neq k$ and $a(k,k)\leftarrow \sum _{i'\neq k}\max(0,r(i',k)).$

Iterations are performed until either the cluster boundaries remain unchanged over a number of iterations, or some predetermined number (of iterations) is reached. The exemplars are extracted from the final matrices as those whose 'responsibility + availability' for themselves is positive (i.e. $(r(i,i)+a(i,i))>0$ ).

Applications

[edit]

The inventors of affinity propagation showed it is better for certain computer vision andcomputational biology tasks, e.g. clustering of pictures of human faces and identifying regulated transcripts, thank-means,^[1] even whenk-means was allowed many random restarts and initialized usingPCA.^[2]A study comparing affinity propagation andMarkov clustering onprotein interaction graph partitioning found Markov clustering to work better for that problem.^[3] A semi-supervised variant has been proposed fortext mining applications.^[4] Another recent application was in economics, when the affinity propagation was used to find some temporal patterns in the output multipliers of the US economy between 1997 and 2017.^[5]

Software

[edit]

AJava implementation is included in theELKI data mining framework.
AJulia implementation of affinity propagation is contained in Julia Statistics's Clustering.jl package.
APython version is part of thescikit-learn library.
AnR implementation is available in the "apcluster" package.

References

[edit]

^^a ^b ^c ^d ^eBrendan J. Frey; Delbert Dueck (2007). "Clustering by passing messages between data points".Science.315 (5814):972–976.Bibcode:2007Sci...315..972F.CiteSeerX 10.1.1.121.3145.doi:10.1126/science.1136800.PMID 17218491.S2CID 6502291.
^Delbert Dueck; Brendan J. Frey (2007).Non-metric affinity propagation for unsupervised image categorization. Int'l Conf. on Computer Vision.doi:10.1109/ICCV.2007.4408853.
^James Vlasblom; Shoshana Wodak (2009)."Markov clustering versus affinity propagation for the partitioning of protein interaction graphs".BMC Bioinformatics.10 (1): 99.doi:10.1186/1471-2105-10-99.PMC 2682798.PMID 19331680.
^Renchu Guan; Xiaohu Shi; Maurizio Marchese; Chen Yang; Yanchun Liang (2011). "Text Clustering with Seeds Affinity Propagation".IEEE Transactions on Knowledge and Data Engineering.23 (4):627–637.doi:10.1109/tkde.2010.144.hdl:11572/89884.S2CID 14053903.
^Almeida, Lucas Milanez de Lima; Balanco, Paulo Antonio de Freitas (2020-06-01)."Application of multivariate analysis as complementary instrument in studies about structural changes: An example of the multipliers in the US economy".Structural Change and Economic Dynamics.53:189–207.doi:10.1016/j.strueco.2020.02.006.ISSN 0954-349X.S2CID 216406772.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Affinity_propagation&oldid=1222816116"

Category:

Cluster analysis algorithms

Hidden categories:

[8]ページ先頭