
Introduction to Node Embedding
Introduction
In this article, we will try to answer the following questions:
- What are node embeddings?
- How to generate node embeddings?
- When can we even use node embeddings?
This is a lot to cover in one article, but let's give it our best. Prior knowledge of graphs or machine learning isn't necessary, just a bonus.
First, let's start with graphs. 🚀
Graphs
Graphs consist of nodes and edges - connections between the nodes.
In social networks, nodes could represent users, and links between them could represent friendships.
One interesting thing you can do with graphs is to predict which tweets on Twitter are from bots, and which are from organic users. How would we achieve that? Well, stick around and you will get an idea of how it can be done.
What are node embeddings?
So what does embedding mean, and why is it useful?
To embed, per the English dictionary, means to fix something in a substance or solid object. With graphs, it means mapping the whole graph into N-dimensional space. Take a look at the example below, where we mapped all the nodes into 2-dimensional space. Now it should be obvious that there are two clusters (or communities) in the graph. For us humans, it is easier to identify clusters in 2-dimensional space. In this example, it is also easy to spot the clusters just from the graph layout, but imagine the graph having 1000 nodes - things aren't as straightforward anymore.
Furthermore, a computer finds it easier to work with node embeddings (vectors of numbers), because it is easier to calculate how similar (close in space) two nodes are from their embeddings in N-dimensional space than from the graph alone. There is no straightforward way to calculate the closeness of two nodes just from the graph - you could use something like the shortest path algorithm, but that by itself isn't representative enough. With vectors, it's easier: the most commonly used metric is called cosine similarity.
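To make the comparison of embeddings concrete, here is a minimal sketch (in Python, with made-up 2-dimensional vectors) of how cosine similarity between two node embeddings could be computed:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 2-dimensional embeddings of three nodes.
node_a = np.array([0.9, 0.1])
node_b = np.array([0.8, 0.2])
node_c = np.array([-0.7, 0.6])

print(cosine_similarity(node_a, node_b))  # close to 1 -> the nodes are similar
print(cosine_similarity(node_a, node_c))  # much lower -> the nodes are dissimilar
```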
Now we have something a computer can work with:
Now we know what embeddings are, but what do we use node embeddings for?
Supervised machine learning is a subset of machine learning where algorithms try to learn from data. The data is represented by input-output pairs, e.g. [2] -> 2, [1] -> 1. Our model tries to learn from the data in such a way that it maps inputs to the correct outputs. In our example ([2] -> 2, [1] -> 1), the model would try to learn the function y = x. Here, it would be pretty easy for the model to learn the input-output mapping, but imagine a problem where many different points from the input space map to the same output value. That's why we can't directly apply a machine learning algorithm to raw input-output pairs - we first need to find a set of informative, discriminating, and independent features among the input data points. Finding such features is often a difficult task.
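As a tiny illustration of the [2] -> 2, [1] -> 1 example (a sketch assuming scikit-learn is available), a linear model can recover the mapping y = x from a handful of input-output pairs:

```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]  # inputs
y = [1, 2, 3, 4]          # outputs

model = LinearRegression().fit(X, y)
print(model.predict([[5]]))  # ~5.0, since the learned function is y = x
```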
In prediction problems on networks, we would need to do the same for the nodes and edges. A typical solution involves hand-engineering domain-specific features based on expert knowledge. Even if one discounts the tedious effort required for feature engineering, such features are usually designed for specific tasks and do not generalize across different prediction tasks.
This sounds like bad news. 👎
We want our algorithm to be independent of the downstream prediction task and to learn the representations in a purely unsupervised way. This is where node embeddings come into play.
We will make our algorithm learn embeddings, and after that, we can apply those embeddings in any of the following applications, one of which is Twitter bot detection. Let's dig in.
How to generate node embeddings?
Researchers have divided these methods into three broad categories:
1) Factorization based
2) Random Walk based
3) Deep Learning based
1. Factorization based
Factorization based algorithms represent the connections between nodes in the form of a matrix and factorize this matrix to obtain the embeddings. One such method, called Local Linear Embedding, assumes that every node is a linear combination of its neighbors, so the algorithm tries to represent the embedding of every node as a linear combination of its neighbors' embeddings. It is like the example from high school where you need to represent one vector as a linear combination of two other vectors - only here you can have many more vectors, and they are much more complex.
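To give a rough feel for the factorization idea (this is a generic matrix-factorization sketch, not Local Linear Embedding itself), one could factorize the adjacency matrix of a tiny graph with a truncated SVD and keep the leading components as embeddings:

```python
import numpy as np

# Adjacency matrix of a toy 4-node graph (1 = edge, 0 = no edge).
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Factorize A and keep the top-2 components as 2-dimensional node embeddings.
U, S, _ = np.linalg.svd(A)
embeddings = U[:, :2] * S[:2]
print(embeddings)  # one 2-dimensional vector per node
```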
2. Random walks
Random walk based methods use walks to generate (sample) network neighborhoods for nodes. For every node, we generate its network neighborhood by choosing the next node of our walk in some way (depending on the method, it can be random or it can involve transition probabilities). Take a look at the picture below to see what it looks like.
The maximum walk length is determined before this process of walk sampling, and for every node, we generate N random walks. By doing so, we have created a network neighborhood of a node. Our goal is now to make a node as similar as possible to the nodes in its network neighborhood.
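Here is a minimal sketch of how such walk sampling could look in Python, on a toy graph stored as an adjacency list and with the next node picked uniformly at random (real methods may bias this choice with probabilities):

```python
import random

graph = {            # toy graph as an adjacency list
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

def sample_walks(graph, walks_per_node=3, walk_length=5):
    """For every node, sample `walks_per_node` walks of at most `walk_length` steps."""
    walks = []
    for start in graph:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:
                    break
                walk.append(random.choice(neighbors))
            walks.append(walk)
    return walks

print(sample_walks(graph))
```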
Again with this boring question, but why do this?
It turns out this process has proven very useful in another area called natural language processing, which deals with words and documents, where you want to find similar words. For example, the words "intelligent" and "smart" should be similar. This method from natural language processing is called word2vec. Words that appear in a similar context (the words before or after them) should be similar. Thankfully, the same applies to nodes: nodes that appear in a similar context (sampled walks) should be similar. Our process of walk sampling is used to create a dataset on which we will try to make node embeddings as similar as possible. 🤯 And that is it. The dataset in word2vec methods is every sentence of a document, and analogously for us, it is every sampled random walk of the graph.
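A DeepWalk-style sketch of this analogy (assuming the gensim library and the hypothetical sample_walks helper from the previous snippet) would simply feed the sampled walks into word2vec as if they were sentences:

```python
from gensim.models import Word2Vec

walks = sample_walks(graph, walks_per_node=10, walk_length=10)

model = Word2Vec(
    sentences=walks,  # each sampled walk plays the role of a sentence
    vector_size=16,   # embedding dimension
    window=5,         # context size within a walk
    min_count=1,
    sg=1,             # skip-gram, as used by DeepWalk-style methods
)

print(model.wv["A"])  # the learned 16-dimensional embedding of node "A"
```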
3. Deep learning
The growing research on deep learning has led to deep neural network-based methods being applied to graphs. With deep learning, it is easier to model non-linear structures, so deep autoencoders have been used for dimensionality reduction. A few popular methods from this area are Structural Deep Network Embedding (SDNE) and Deep Neural Networks for Learning Graph Representations (DNGR), so feel free to check them out.
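As a loose sketch of the autoencoder idea (assuming PyTorch; this only hints at the spirit of SDNE and is not a faithful reimplementation), each node's adjacency row can be compressed into a small embedding and then reconstructed:

```python
import torch
import torch.nn as nn

# Adjacency rows of a toy 4-node graph; each row describes one node.
adjacency = torch.tensor([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=torch.float32)

encoder = nn.Linear(4, 2)   # adjacency row -> 2-dimensional embedding
decoder = nn.Linear(2, 4)   # embedding -> reconstructed adjacency row
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=0.01
)
loss_fn = nn.MSELoss()

for _ in range(200):
    embeddings = encoder(adjacency)
    reconstruction = decoder(embeddings)
    loss = loss_fn(reconstruction, adjacency)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(encoder(adjacency))  # one 2-dimensional embedding per node
```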
Where can node embeddings be applied?
We know that graphs occur naturally in various real-world scenarios such as social networks (social sciences), word co-occurrence networks (linguistics), interaction networks (e.g. protein-protein interactions in biology), and so on. Modeling the interactions between entities as graphs has enabled researchers to understand the various networks in a systematic manner. For example, social networks have been used for applications like friendship or content recommendations, as well as for advertisement.
But how are researchers modeling such interactions?
You may already have the answer.
By embedding a large graph in a low-dimensional space (a.k.a. node embeddings). Embeddings have recently attracted significant interest due to their wide applications in areas such as graph visualization, link prediction, clustering, and node classification. It has been demonstrated that graph embedding is superior to alternatives in many supervised learning tasks, such as node classification, link prediction, and graph reconstruction. Here is a chronological list of research papers you can check out:
- Distributed large-scale natural graph factorization, Ahmed et al., 2013;
- DeepWalk: online learning of social representations, Perozzi et al., 2014;
- Deep Neural Networks for Learning Graph Representations, Cao et al., 2015;
- LINE: Large-scale Information Network Embedding, Tang et al., 2015;
- node2vec: Scalable Feature Learning for Networks, Grover and Leskovec, 2016;
- Asymmetric Transitivity Preserving Graph Embedding, Ou et al., 2016.
Node classification
Node classification aims to determine the label of nodes (a.k.a. vertices) based on other labeled nodes and the topology of the network. Often, only a fraction of the nodes in a network are labeled. In social networks, labels may indicate interests, beliefs, or demographics, whereas the labels of entities in biology networks may be based on functionality. For example, researchers have painstakingly worked out the functional role of specific proteins in their system of interest and characterized the details of their interaction partners and the pathways in which they function. But still, a lot of them haven't been worked out completely. With embeddings, we could try to predict the missing labels with high precision.
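A minimal sketch of that workflow (assuming scikit-learn and an embeddings array with one row per node; the data here is random and purely illustrative) trains a classifier on the labeled nodes and predicts labels for the rest:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

embeddings = np.random.rand(6, 16)       # stand-in for learned node embeddings
labels = np.array([0, 0, 1, 1, -1, -1])  # -1 marks nodes with unknown labels

known = labels != -1
classifier = LogisticRegression().fit(embeddings[known], labels[known])

print(classifier.predict(embeddings[~known]))  # predicted labels for the unlabeled nodes
```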
Link prediction
Link prediction refers to the task of predicting missing links or links that are likely to occur in the future. For example, in a protein-protein interaction network, where verifying the existence of links between proteins requires costly experimental tests, link prediction might save you a lot of money by letting you test only the pairs with the highest chance of being linked.
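One simple way to do this with embeddings (a sketch; the embeddings dictionary below is purely illustrative) is to score each candidate pair by the cosine similarity of the two node vectors and test the highest-scoring pairs first:

```python
import numpy as np

def link_score(u, v, embeddings):
    """Score a candidate link by the cosine similarity of the two node embeddings."""
    a, b = embeddings[u], embeddings[v]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

embeddings = {
    "protein_1": np.array([0.9, 0.1]),
    "protein_2": np.array([0.8, 0.3]),
    "protein_3": np.array([-0.5, 0.7]),
}
candidates = [("protein_1", "protein_2"), ("protein_1", "protein_3")]

ranked = sorted(candidates, key=lambda pair: link_score(*pair, embeddings), reverse=True)
print(ranked)  # verify the top-ranked pairs experimentally first
```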
Clustering
Clustering is used to find subsets of similar nodes and group them; finally, visualization helps in providing insights into the structure of the network.
So back to our bot case. One assumption could be that bots have a small number of links to real users (because who would want to be friends with them?), but a lot of links between themselves, so that they appear as real users. Graph clustering or community detection comes into play here: we want to find those clusters and remove the bot users. This can be done with node embeddings, especially dynamic node embeddings, where interactions happen every second.
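A minimal sketch of that idea (assuming scikit-learn and a random stand-in embeddings array, one row per user) clusters the embeddings and counts how many accounts land in each cluster; a dense, isolated cluster would then be a candidate group of bots:

```python
import numpy as np
from sklearn.cluster import KMeans

embeddings = np.random.rand(100, 16)  # stand-in for learned user embeddings

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)
cluster_ids = kmeans.labels_

# Users in the same cluster are close in embedding space; a cluster with few
# links to the rest of the graph would be a candidate set of bot accounts.
print(np.bincount(cluster_ids))
```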
There is a cool article, The Rise of Social Bots, in which you can read how bots are used to affect and possibly manipulate the online debate about vaccination policy.
Conclusion
Phew. There was a lot to cover, but we made it! Good job if you stayed with me until here! ❤️ Now that we have covered the theory, you can check out some implementations of node embedding algorithms, for static and dynamic graphs.
Our team of engineers is currently tackling the problem of graph analytics algorithms on real-time data. If you want to discuss how to apply online/streaming algorithms on connected data, feel free to join our Discord server and message us.
MAGE shares his wisdom on a Twitter channel. Get to know him better by following him 🐦
Last but not least, check out MAGE and don't hesitate to give a star ⭐ or contribute with new ideas.