A PyTorch implementation of Splitter: Learning Node Representations that Capture Multiple Social Contexts (WWW 2019).
Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the nodes participate. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph, a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to 90%. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.
This repository provides a PyTorch implementation of Splitter as described in the paper:
Splitter: Learning Node Representations that Capture Multiple Social Contexts. Alessandro Epasto and Bryan Perozzi. WWW, 2019. [Paper]
The original TensorFlow implementation is available [here].
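To make the ego-network decomposition idea concrete, here is a minimal, hypothetical sketch of persona splitting: every node is replaced by one persona per connected component of its neighborhood, and each persona later receives its own embedding. This is only an illustration of the concept, not the code in this repository; the function name and the example graph are made up for the sketch.

```python
import networkx as nx

def split_into_personas(graph):
    """Toy persona decomposition: one persona per connected component of
    each node's ego-network (its neighborhood with the ego removed).
    Illustrative sketch only, not the repository's implementation."""
    persona_map = {}   # persona id -> (original node, members of its local community)
    persona_id = 0
    for node in graph.nodes():
        # Subgraph induced by the node's neighbors (the ego itself is excluded).
        neighborhood = graph.subgraph(graph.neighbors(node))
        for community in nx.connected_components(neighborhood):
            persona_map[persona_id] = (node, set(community))
            persona_id += 1
    return persona_map

# Small demonstration on a built-in example graph.
graph = nx.karate_club_graph()
print(graph.number_of_nodes(), "nodes expand into",
      len(split_into_personas(graph)), "personas")
```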
The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.
```
networkx          1.11
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
torch             1.1.0
gensim            3.6.0
```
The code takes the **edge list** of the graph in a CSV file. Every row indicates an edge between two nodes separated by a comma. The first row is a header. Nodes should be indexed starting with 0. A sample graph for `Cora` is included in the `input/` directory.
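As an illustration, a toy edge list in this format could be generated as follows; the file name and column headers here are hypothetical, only the structure (header row, comma-separated node pairs, zero-based IDs) matters.

```python
import pandas as pd

# A toy edge list: header row first, then one "source,target" pair per line,
# with nodes indexed from 0. File name and column names are made up.
edges = pd.DataFrame({"node_1": [0, 0, 1, 2],
                      "node_2": [1, 2, 2, 3]})
edges.to_csv("input/toy_edges.csv", index=False)
```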
The embeddings are saved in the `output/` directory. Each embedding has a header and a column with the node IDs. Finally, the node embedding is sorted by the node ID column.
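Assuming the default output path used below, the saved embedding can be inspected with pandas roughly like this; the sketch only relies on the ID column coming first, and the exact column names are whatever the script writes.

```python
import pandas as pd

# Load the persona embedding written by src/main.py (default path assumed).
embedding = pd.read_csv("output/chameleon_embedding.csv")

# The first column holds the node (persona) IDs, already sorted;
# the remaining columns are the embedding dimensions.
ids = embedding.iloc[:, 0].values
vectors = embedding.iloc[:, 1:].values
print(vectors.shape)
```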
The training of a Splitter embedding is handled by the `src/main.py` script which provides the following command line arguments.
```
  --edge-path               STR     Edge list CSV.                      Default is input/chameleon_edges.csv.
  --embedding-output-path   STR     Embedding output CSV.               Default is output/chameleon_embedding.csv.
  --persona-output-path     STR     Persona mapping JSON.               Default is output/chameleon_personas.json.
  --seed                    INT     Random seed.                        Default is 42.
  --number-of-walks         INT     Number of random walks per node.    Default is 10.
  --window-size             INT     Skip-gram window size.              Default is 5.
  --negative-samples        INT     Number of negative samples.         Default is 5.
  --walk-length             INT     Random walk length.                 Default is 40.
  --lambd                   FLOAT   Regularization parameter.           Default is 0.1.
  --dimensions              INT     Number of embedding dimensions.     Default is 128.
  --workers                 INT     Number of cores for pre-training.   Default is 4.
  --learning-rate           FLOAT   SGD learning rate.                  Default is 0.025.
```
The following commands learn an embedding and save it with the persona map. Training a model on the default dataset:
```
python src/main.py
```
Training a Splitter model with 32 dimensions:
```
python src/main.py --dimensions 32
```
Increasing the number of walks and the walk length:
```
python src/main.py --number-of-walks 20 --walk-length 80
```