- Notifications
You must be signed in to change notification settings - Fork3
dawnranger/spark-pscan
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
A linear time complexity community detection algorithm for large scale graph.
This is a spark implementation of
Zhao, W., Martha, V., & Xu, X. (2013, March).PSCAN: a parallel Structural clustering algorithm for big networks in MapReduce. In Advanced Information Networking and Applications (AINA), 2013 IEEE 27th International Conference on (pp. 862-869). IEEE
Which is the parallel version of:
X.Xu, N.Yuruk, Z. Feng, T. Schweiger.SCAN: a structural clustering algorithm for networks,Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 824-833, 2007.
val graph:Graph[Int, Int] = GraphLoader.edgeListFile(sc, "path_to_graph_file")val components:Graph[VertexId, Int] = PSCAN.pscan(graph, epsilon = 0.2)println("num communities: " + components.vertices.map{case (vId,cId)=>cId}.distinct.count)println("nodes of every communities:")components.vertices.map(v=>(v._2, v._1)).groupByKey().collect .foreach(x=>println("%d: %s".format(x._1, x._2.mkString(" "))))
This repo is a re-organization of the PSCAN implementation inthis repo with some minor change in order to make it more readable and easy to use.