# WikiBrain

Studying collective memories of internet users using Wikipedia viewership statistics.
UPDATE: the latest version of the implementation for the *Anomaly detection in Web and Social Networks* paper is available here.
To run the code, you will need to pre-process Wikipedia page dumps and pagecounts. To do that, follow the instructions in the sparkwiki repository.
Once you have pre-processed the dumps, you can run the algorithm using `spark-submit` from the sparkwiki repository that you used for the dump pre-processing. See an example of the command below:
```bash
spark-submit --class ch.epfl.lts2.wikipedia.PeakFinder \
  --master 'local[*]' \
  --executor-memory 30g --driver-memory 30g \
  --packages org.rogach:scallop_2.11:3.1.5,com.datastax.spark:spark-cassandra-connector_2.11:2.4.0,com.typesafe:config:1.3.3,neo4j-contrib:neo4j-spark-connector:2.4.0-M6,com.github.servicenow.stl4j:stl-decomp-4j:1.0.5,org.apache.commons:commons-math3:3.6.1,org.scalanlp:breeze_2.11:1.0 \
  target/scala-2.11/sparkwiki_<VERSION OF SPARKWIKI>.jar \
  --config config/peakfinder.conf --language en \
  --parquetPagecounts --parquetPagecountPath <PATH TO THE OUTPUT OF PagecountProcessor> \
  --outputPath <PATH WHERE YOU'LL HAVE RESULTING GRAPHS WITH ANOMALIES>
```
Parameters explained:
- `--master 'local[*]'`: use all available cores.
- `--executor-memory`: amount of RAM allocated for the executor (30% of available RAM).
- `--driver-memory`: amount of RAM allocated for the driver (40% of available RAM).
- `--packages`: the list of dependencies shown in the example command above.
- `--config`: path to the config file where you specify the parameters of the algorithm.
- `--language`: language code. You can choose any language code, but you should have a graph of the corresponding language edition of Wikipedia.
- `--parquetPagecounts` / `--parquetPagecountPath`: path to the output files of `ch.epfl.lts2.wikipedia.PagecountProcessor`.
- `--outputPath`: output path where you will get the resulting graphs with anomalous pages.
We have also provided a very intuitive and concise (but inefficient) Python implementation for practitioners, to give an overall understanding of the algorithm. More details here.
In this repository, you can find an implementation of the graph learning algorithm presented in *Wikipedia graph mining: dynamic structure of collective memory*. The learning algorithm is inspired by Hebbian learning theory.
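As a rough illustration of that intuition (pages whose viewership spikes co-occur in time get connected more strongly), here is a minimal, self-contained Scala sketch. The toy data, the fixed activity threshold, and all names in it are made up for illustration; this is not the exact procedure from the paper.

```scala
// A rough sketch of the Hebbian "fire together, wire together" intuition,
// NOT the algorithm from the paper. Pages whose viewership spikes co-occur
// in time get a stronger edge between them.
object HebbianSketch {
  // Toy input (made up): page title -> visit counts per hour.
  val visits: Map[String, Vector[Double]] = Map(
    "Page_A" -> Vector(10.0, 12.0, 300.0, 11.0, 9.0),
    "Page_B" -> Vector(8.0, 9.0, 250.0, 10.0, 8.0),
    "Page_C" -> Vector(5.0, 6.0, 7.0, 6.0, 5.0)
  )

  // A page is considered "active" at hour t if its count exceeds a simple
  // fixed threshold (the paper relies on proper burst/anomaly detection instead).
  def active(counts: Vector[Double], threshold: Double = 100.0): Vector[Boolean] =
    counts.map(_ > threshold)

  // Hebbian-style weight: count the hours in which both pages are active.
  def edgeWeight(a: Vector[Double], b: Vector[Double]): Double =
    active(a).zip(active(b)).count { case (x, y) => x && y }.toDouble

  def main(args: Array[String]): Unit = {
    visits.keys.toVector.combinations(2).foreach { case Vector(p, q) =>
      println(s"$p -- $q : weight = ${edgeWeight(visits(p), visits(q))}")
    }
  }
}
```

In this toy example, `Page_A` and `Page_B` spike in the same hour and therefore receive a positive edge weight, while `Page_C` stays disconnected.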
We also reported the results, with interactive graph visualizations, in an accompanying blog post.
To reproduce the experiments, download the dataset. Then clone this project and extract the downloaded .zip files into the `/src/main/resources/` folder.
Change `PATH_RESOURCES` in `Globals.scala` to the path to this project on your computer.
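For orientation, a hypothetical sketch of what that setting might look like (the actual `Globals.scala` in the repository may be organized differently, and the path below is only a placeholder):

```scala
// Hypothetical illustration only; check the actual Globals.scala in the repo.
object Globals {
  // Absolute path to your local clone of this project (placeholder value).
  val PATH_RESOURCES: String = "/home/<user>/WikiBrain/src/main/resources/"
}
```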
If you want to reproduce only a part of the experiments, download the pre-processed data from here and unzip the files into `PATH_RESOURCES`. This should be enough to run most of the scripts in this repository.
Open `WikiBrainHebbStatic.scala` and run the code (Shift+F10 in IntelliJ IDEA).
You may have to change your Spark configuration depending on the amount of RAM available on your computer:
```scala
val spark = SparkSession.builder
  .master("local[*]") // use all available cores
  .appName("Wiki Brain")
  .config("spark.driver.maxResultSize", "20g") // change this if needed
  .config("spark.executor.memory", "50g") // change this if needed
  .getOrCreate()
```
You will find the resulting graph `graph.gexf` in `PATH_RESOURCES`. This file can be opened in Gephi.
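If you want to know what to expect inside that file, or to post-process it yourself, the sketch below writes a minimal GEXF file of the same general shape. It is not the project's actual export code; the node labels, weights, and output filename are invented for illustration.

```scala
import java.io.PrintWriter

// Minimal GEXF writer sketch (illustration only, not the project's export code).
object GexfSketch {
  def main(args: Array[String]): Unit = {
    // Toy nodes and weighted edges; the real output contains Wikipedia pages
    // and the learned co-activation weights.
    val nodes = Seq(1 -> "Page_A", 2 -> "Page_B")
    val edges = Seq((1, 2, 3.0)) // (source, target, weight)

    val nodeXml = nodes.map { case (id, label) =>
      s"""      <node id="$id" label="$label"/>"""
    }.mkString("\n")
    val edgeXml = edges.zipWithIndex.map { case ((src, dst, w), i) =>
      s"""      <edge id="$i" source="$src" target="$dst" weight="$w"/>"""
    }.mkString("\n")

    val gexf =
      s"""<?xml version="1.0" encoding="UTF-8"?>
         |<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
         |  <graph defaultedgetype="undirected">
         |    <nodes>
         |$nodeXml
         |    </nodes>
         |    <edges>
         |$edgeXml
         |    </edges>
         |  </graph>
         |</gexf>
         |""".stripMargin

    val writer = new PrintWriter("graph-sketch.gexf")
    try writer.write(gexf) finally writer.close()
  }
}
```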