| Title: | Sample Generalized Random Dot Product Graphs in Linear Time |
| Version: | 0.4.0 |
| Description: | Samples generalized random product graphs, a generalization of a broad class of network models. Given matrices X, S, and Y with with non-negative entries, samples a matrix with expectation X S Y^T and independent Poisson or Bernoulli entries using the fastRG algorithm of Rohe et al. (2017)https://www.jmlr.org/papers/v19/17-128.html. The algorithm first samples the number of edges and then puts them down one-by-one. As a result it is O(m) where m is the number of edges, a dramatic improvement over element-wise algorithms that which require O(n^2) operations to sample a random graph, where n is the number of nodes. |
| License: | MIT + file LICENSE |
| URL: | https://rohelab.github.io/fastRG/,https://github.com/RoheLab/fastRG |
| BugReports: | https://github.com/RoheLab/fastRG/issues |
| Depends: | Matrix |
| Imports: | dplyr, ggplot2, glue, igraph, methods, rlang (≥ 1.0.0),RSpectra, stats, tibble, tidygraph, tidyr |
| Suggests: | covr, irlba, knitr, magrittr, purrr, rmarkdown, scales,testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2025-12-05 20:20:25 UTC; alex |
| Author: | Alex Hayes |
| Maintainer: | Alex Hayes <alexpghayes@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-12-06 15:20:02 UTC |
Create an undirected Chung-Lu object
Description
To specify a Chung-Lu graph, you must specifythe degree-heterogeneity parameters (vian ortheta).We provide reasonable defaults to enable rapid explorationor you can invest the effortfor more control over the model parameters. Westrongly recommendsetting theexpected_degree orexpected_density argumentto avoid large memory allocations associated withsampling large, dense graphs.
Usage
chung_lu( n = NULL, theta = NULL, ..., sort_nodes = TRUE, poisson_edges = TRUE, allow_self_loops = TRUE, force_identifiability = FALSE)Arguments
n | (degree heterogeneity) The number of nodes in the graph.Use when you don't want to specify the degree-heterogeneityparameters |
theta | (degree heterogeneity) A numeric vectorexplicitly specifying the degree heterogeneityparameters. This implicitly determines the number of nodesin the resulting graph, i.e. it will have |
... | Arguments passed on to
|
sort_nodes | Logical indicating whether or not to sort the nodesso that they are grouped by block and by |
poisson_edges | Logical indicating whether or notmultiple edges are allowed to form between a pair ofnodes. Defaults to |
allow_self_loops | Logical indicating whether or notnodes should be allowed to form edges with themselves.Defaults to |
force_identifiability | Logical indicating whether or not tonormalize |
Value
Anundirected_chung_lu S3 object, a subclass ofdcsbm().
See Also
Other undirected graphs:dcsbm(),erdos_renyi(),mmsbm(),overlapping_sbm(),planted_partition(),sbm()
Examples
set.seed(27)cl <- chung_lu(n = 100, expected_density = 0.01)cltheta <- round(stats::rlnorm(100, 2))cl2 <- chung_lu( theta = theta, expected_degree = 5)cl2edgelist <- sample_edgelist(cl)edgelistCreate an undirected degree corrected stochastic blockmodel object
Description
To specify a degree-corrected stochastic blockmodel, you must specifythe degree-heterogeneity parameters (vian ortheta),the mixing matrix (viak orB), and the relative blockprobabilities (optional, viapi). We provide defaults for most of theseoptions to enable rapid exploration, or you can invest the effortfor more control over the model parameters. Westrongly recommendsetting theexpected_degree orexpected_density argumentto avoid large memory allocations associated withsampling large, dense graphs.
Usage
dcsbm( n = NULL, theta = NULL, k = NULL, B = NULL, ..., block_sizes = NULL, pi = NULL, sort_nodes = TRUE, force_identifiability = FALSE, poisson_edges = TRUE, allow_self_loops = TRUE)Arguments
n | (degree heterogeneity) The number of nodes in the blockmodel.Use when you don't want to specify the degree-heterogeneityparameters |
theta | (degree heterogeneity) A numeric vectorexplicitly specifying the degree heterogeneityparameters. This implicitly determines the number of nodesin the resulting graph, i.e. it will have |
k | (mixing matrix) The number of blocks in the blockmodel.Use when you don't want to specify the mixing-matrix by hand.When |
B | (mixing matrix) A |
... | Arguments passed on to
|
block_sizes | (block sizes) Number of nodes in each block,as a vector of integers. Must match the dimensions of |
pi | (block sizes) Relative blockprobabilities. Must be positive, but do not need to sumto one, as they will be normalized internally.Must match the dimensions of |
sort_nodes | Logical indicating whether or not to sort the nodesso that they are grouped by block and by |
force_identifiability | Logical indicating whether or not tonormalize |
poisson_edges | Logical indicating whether or notmultiple edges are allowed to form between a pair ofnodes. Defaults to |
allow_self_loops | Logical indicating whether or notnodes should be allowed to form edges with themselves.Defaults to |
Value
Anundirected_dcsbm S3 object, a subclass of theundirected_factor_model() with the following additionalfields:
theta: A numeric vector of degree-heterogeneity parameters.z: The community memberships of each node, as afactor().The factor will haveklevels, wherekis the number ofcommunities in the stochastic blockmodel. There will notalways necessarily be observed nodes in each community.pi: Sampling probabilities for each block.sorted: Logical indicating where nodes are arranged byblock (and additionally by degree heterogeneity parameter)within each block.
Generative Model
There are two levels of randomness in a degree-correctedstochastic blockmodel. First, we randomly chose a blockmembership for each node in the blockmodel. This ishandled bydcsbm(). Then, given these block memberships,we randomly sample edges between nodes. This secondoperation is handled bysample_edgelist(),sample_sparse(),sample_igraph() andsample_tidygraph(), depending depending on your desiredgraph representation.
Block memberships
Letz_i represent the block membership of nodei.To generatez_i we sample from a categoricaldistribution (note that this is a special case of amultinomial) with parameter\pi, such that\pi_i represents the probability of ending up inthe ith block. Block memberships for each node are independent.
Degree heterogeneity
In addition to block membership, the DCSBM also allowsnodes to have different propensities for edge formation.We represent this propensity for nodei by a positivenumber\theta_i. Typically the\theta_i areconstrained to sum to one for identifiability purposes,but this doesn't really matter during sampling (i.e.without the sum constraint scalingB and\thetahas the same effect on edge probabilities, but whetherB or\theta is responsible for this changeis uncertain).
Edge formulation
Once we know the block membershipsz and the degreeheterogeneity parameterstheta, we need one moreingredient, which is the baseline intensity of connectionsbetween nodes in blocki and blockj. Then each edgeA_{i,j} is Poisson distributed with parameter
\lambda[i, j] = \theta_i \cdot B_{z_i, z_j} \cdot \theta_j.
See Also
Other stochastic block models:directed_dcsbm(),mmsbm(),overlapping_sbm(),planted_partition(),sbm()
Other undirected graphs:chung_lu(),erdos_renyi(),mmsbm(),overlapping_sbm(),planted_partition(),sbm()
Examples
set.seed(27)lazy_dcsbm <- dcsbm(n = 100, k = 5, expected_density = 0.01)lazy_dcsbm# sometimes you gotta let the world burn and# sample a wildly dense graphdense_lazy_dcsbm <- dcsbm(n = 50, k = 3, expected_density = 0.8)dense_lazy_dcsbm# explicitly setting the degree heterogeneity parameter,# mixing matrix, and relative community sizes rather# than using randomly generated defaultsk <- 5n <- 100B <- matrix(stats::runif(k * k), nrow = k, ncol = k)theta <- round(stats::rlnorm(n, 2))pi <- c(1, 2, 4, 1, 1)custom_dcsbm <- dcsbm( theta = theta, B = B, pi = pi, expected_degree = 50)custom_dcsbmedgelist <- sample_edgelist(custom_dcsbm)edgelistdcsbm_explicit_block_sizes <- dcsbm( theta = rexp(100, 1 / 3) + 1, B = B, block_sizes = c(13, 17, 40, 14, 16), expected_degree = 5)# respects block sizessummary(dcsbm_explicit_block_sizes$z)# efficient eigendecompostion that leverages low-rank structure in# E(A) so that you don't have to form E(A) to find eigenvectors,# as E(A) is typically dense. computation is# handled via RSpectrapopulation_eigs <- eigs_sym(custom_dcsbm)Create a directed degree corrected stochastic blockmodel object
Description
To specify a degree-corrected stochastic blockmodel, you must specifythe degree-heterogeneity parameters (vian ortheta_out andtheta_in), the mixing matrix(viak_out andk_in, orB), and the relative blockprobabilities (optional, viap_out andpi_in).We provide defaults for most of theseoptions to enable rapid exploration, or you can invest the effortfor more control over the model parameters. Westrongly recommendsetting theexpected_out_degree,expected_in_degree,orexpected_density argumentto avoid large memory allocations associated withsampling large, dense graphs.
Usage
directed_dcsbm( n = NULL, theta_out = NULL, theta_in = NULL, k_out = NULL, k_in = NULL, B = NULL, ..., pi_out = rep(1/k_out, k_out), pi_in = rep(1/k_in, k_in), sort_nodes = TRUE, force_identifiability = TRUE, poisson_edges = TRUE, allow_self_loops = TRUE)Arguments
n | (degree heterogeneity) The number of nodes in the blockmodel.Use when you don't want to specify the degree-heterogeneityparameters |
theta_out | (degree heterogeneity) A numeric vectorexplicitly specifying the degree heterogeneityparameters. This implicitly determines the number of nodesin the resulting graph, i.e. it will have |
theta_in | (degree heterogeneity) A numeric vectorexplicitly specifying the degree heterogeneityparameters. This implicitly determines the number of nodesin the resulting graph, i.e. it will have |
k_out | (mixing matrix) The number of outgoing blocks in the blockmodel.Use when you don't want to specify the mixing-matrix by hand.When |
k_in | (mixing matrix) The number of incoming blocks in the blockmodel.Use when you don't want to specify the mixing-matrix by hand.When |
B | (mixing matrix) A |
... | Arguments passed on to
|
pi_out | (relative block probabilities) Relative blockprobabilities. Must be positive, but do not need to sumto one, as they will be normalized internally.Must match the rows of |
pi_in | (relative block probabilities) Relative blockprobabilities. Must be positive, but do not need to sumto one, as they will be normalized internally.Must match the columns of |
sort_nodes | Logical indicating whether or not to sort the nodesso that they are grouped by block. Useful for plotting.Defaults to |
force_identifiability | Logical indicating whether or not tonormalize |
poisson_edges | Logical indicating whether or notmultiple edges are allowed to form between a pair ofnodes. Defaults to |
allow_self_loops | Logical indicating whether or notnodes should be allowed to form edges with themselves.Defaults to |
Value
Adirected_dcsbm S3 object, a subclass of thedirected_factor_model() with the following additionalfields:
theta_out: A numeric vector of incoming communitydegree-heterogeneity parameters.theta_in: A numeric vector of outgoing communitydegree-heterogeneity parameters.z_out: The incoming community memberships of each node,as afactor(). The factor will havek_outlevels,wherek_outis the number of incomingcommunities in the stochastic blockmodel. There will notalways necessarily be observed nodes in each community.z_in: The outgoing community memberships of each node,as afactor(). The factor will havek_inlevels,wherek_inis the number of outgoingcommunities in the stochastic blockmodel. There will notalways necessarily be observed nodes in each community.pi_out: Sampling probabilities for each incomingcommunity.pi_in: Sampling probabilities for each outgoingcommunity.sorted: Logical indicating where nodes are arranged byblock (and additionally by degree heterogeneity parameter)within each block.
Generative Model
There are two levels of randomness in a directed degree-correctedstochastic blockmodel. First, we randomly chose a incomingblock membership and an outgoing block membershipfor each node in the blockmodel. This ishandled bydirected_dcsbm(). Then, given these block memberships,we randomly sample edges between nodes. This secondoperation is handled bysample_edgelist(),sample_sparse(),sample_igraph() andsample_tidygraph(), depending on your desiredgraph representation.
Block memberships
Letx represent the incoming block membership of a nodeandy represent the outgoing block membership of a node.To generatex we sample from a categoricaldistribution with parameter\pi_out.To generatey we sample from a categoricaldistribution with parameter\pi_in.Block memberships are independent across nodes. Incoming and outgoingblock memberships of the same node are also independent.
Degree heterogeneity
In addition to block membership, the DCSBM alsonodes to have different propensities for incoming andoutgoing edge formation.We represent the propensity to form incoming edges for agiven node by a positive number\theta_out.We represent the propensity to form outgoing edges for agiven node by a positive number\theta_in.Typically the\theta_out (andtheta_in) across all nodes areconstrained to sum to one for identifiability purposes,but this doesn't really matter during sampling.
Edge formulation
Once we know the block membershipsx andyand the degree heterogeneity parameters\theta_{in} and\theta_{out}, we need one moreingredient, which is the baseline intensity of connectionsbetween nodes in blocki and blockj. Then each edge formsindependently according to a Poisson distribution withparameters
\lambda = \theta_{in} * B_{x, y} * \theta_{out}.
See Also
Other stochastic block models:dcsbm(),mmsbm(),overlapping_sbm(),planted_partition(),sbm()
Other directed graphs:directed_erdos_renyi()
Examples
set.seed(27)B <- matrix(0.2, nrow = 5, ncol = 8)diag(B) <- 0.9ddcsbm <- directed_dcsbm( n = 100, B = B, k_out = 5, k_in = 8, expected_density = 0.01)ddcsbmpopulation_svd <- svds(ddcsbm)Create an directed erdos renyi object
Description
Create an directed erdos renyi object
Usage
directed_erdos_renyi( n, ..., p = NULL, poisson_edges = TRUE, allow_self_loops = TRUE)Arguments
n | Number of nodes in graph. |
... | Arguments passed on to
|
p | Probability of an edge between any two nodes. You must specifyeither |
poisson_edges | Logical indicating whether or notmultiple edges are allowed to form between a pair ofnodes. Defaults to |
allow_self_loops | Logical indicating whether or notnodes should be allowed to form edges with themselves.Defaults to |
Value
Adirected_factor_model S3 class based on a listwith the following elements:
X: The incoming latent positions as aMatrix::Matrix()object.S: The mixing matrix as aMatrix::Matrix()object.Y: The outgoing latent positions as aMatrix::Matrix()object.n: The number of nodes with incoming edges in the network.k1: The dimension of the latent node position vectorsencoding incoming latent communities (i.e. inX).d: The number of nodes with outgoing edges in the network.Does not need to matchn– rectangular adjacency matricesare supported.k2: The dimension of the latent node position vectorsencoding outgoing latent communities (i.e. inY).poisson_edges: Whether or not the graph is taken to be havePoisson or Bernoulli edges, as indicated by a logical vectorof length 1.allow_self_loops: Whether or not self loops are allowed.
See Also
Other erdos renyi:erdos_renyi()
Other directed graphs:directed_dcsbm()
Examples
set.seed(87)er <- directed_erdos_renyi(n = 10, p = 0.1)erbig_er <- directed_erdos_renyi(n = 1000, expected_in_degree = 5)big_erA <- sample_sparse(er)ACreate a directed factor model graph
Description
A directed factor model graph is a directedgeneralized Poisson random dot product graph. The edgesin this graph are assumpted to be independent and Poissondistributed. The graph is parameterized by its expectedadjacency matrix, with isE[A] = X S Y'. We do not recommendthat causal users use this function, see insteaddirected_dcsbm()and related functions, which will formulate common variantsof the stochastic blockmodels as undirected factor modelswith lots of helpful input validation.
Usage
directed_factor_model( X, S, Y, ..., expected_in_degree = NULL, expected_out_degree = NULL, expected_density = NULL, poisson_edges = TRUE, allow_self_loops = TRUE)Arguments
X | A |
S | A |
Y | A |
... | Ignored. For internal developer use only. |
expected_in_degree | If specified, the desired expected in degreeof the graph. Specifying |
expected_out_degree | If specified, the desired expected out degreeof the graph. Specifying |
expected_density | If specified, the desired expected densityof the graph. Specifying |
poisson_edges | Logical indicating whether or notmultiple edges are allowed to form between a pair ofnodes. Defaults to |
allow_self_loops | Logical indicating whether or notnodes should be allowed to form edges with themselves.Defaults to |
Value
Adirected_factor_model S3 class based on a listwith the following elements:
X: The incoming latent positions as aMatrix::Matrix()object.S: The mixing matrix as aMatrix::Matrix()object.Y: The outgoing latent positions as aMatrix::Matrix()object.n: The number of nodes with incoming edges in the network.k1: The dimension of the latent node position vectorsencoding incoming latent communities (i.e. inX).d: The number of nodes with outgoing edges in the network.Does not need to matchn– rectangular adjacency matricesare supported.k2: The dimension of the latent node position vectorsencoding outgoing latent communities (i.e. inY).poisson_edges: Whether or not the graph is taken to be havePoisson or Bernoulli edges, as indicated by a logical vectorof length 1.allow_self_loops: Whether or not self loops are allowed.
Examples
n <- 1000k1 <- 5k2 <- 3d <- 500X <- matrix(rpois(n = n * k1, 1), nrow = n)S <- matrix(runif(n = k1 * k2, 0, .1), nrow = k1, ncol = k2)Y <- matrix(rexp(n = k2 * d, 1), nrow = d)fm <- directed_factor_model(X, S, Y)fmfm2 <- directed_factor_model(X, S, Y, expected_in_degree = 50)fm2Compute the eigendecomposition of the expected adjacency matrix of an undirected factor model
Description
Compute the eigendecomposition of the expected adjacency matrix of an undirected factor model
Usage
## S3 method for class 'undirected_factor_model'eigs_sym(A, k = A$k, which = "LM", sigma = NULL, opts = list(), ...)Arguments
A | |
k | Desired rank of decomposition. |
which | Selection criterion. SeeDetails below. |
sigma | Shift parameter. See sectionShift-And-Invert Mode. |
opts | Control parameters related to the computingalgorithm. SeeDetails below. |
... | Unused, included only for consistency with generic signature. |
Details
Thewhich argument is a character stringthat specifies the type of eigenvalues to be computed.Possible values are:
| "LM" | Thek eigenvalues with largest magnitude. Here the magnitude means the Euclidean norm of complex numbers. |
| "SM" | Thek eigenvalues with smallest magnitude. |
| "LR" | Thek eigenvalues with largest real part. |
| "SR" | Thek eigenvalues with smallest real part. |
| "LI" | Thek eigenvalues with largest imaginary part. |
| "SI" | Thek eigenvalues with smallest imaginary part. |
| "LA" | Thek largest (algebraic) eigenvalues, considering any negative sign. |
| "SA" | Thek smallest (algebraic) eigenvalues, considering any negative sign. |
| "BE" | Computek eigenvalues, half from each end of the spectrum. Whenk is odd, compute more from the high and then from the low end. |
eigs() with matrix types "matrix", "dgeMatrix", "dgCMatrix"and "dgRMatrix" can use "LM", "SM", "LR", "SR", "LI" and "SI".
eigs_sym() with all supported matrix types,andeigs() with symmetric matrix types("dsyMatrix", "dsCMatrix", and "dsRMatrix") can use "LM", "SM", "LA", "SA" and "BE".
Theopts argument is a list that can supply any of thefollowing parameters:
ncvNumber of Lanzcos basis vectors to use. More vectorswill result in faster convergence, but with greatermemory use. For general matrix,
ncvmust satisfyk+2\le ncv \le n, andfor symmetric matrix, the constraint isk < ncv \le n.Default ismin(n, max(2*k+1, 20)).tolPrecision parameter. Default is 1e-10.
maxitrMaximum number of iterations. Default is 1000.
retvecWhether to compute eigenvectors. If FALSE,only calculate and return eigenvalues.
initvecInitial vector of length
nsupplied to theArnoldi/Lanczos iteration. It may speed up the convergenceifinitvecis close to an eigenvector ofA.
Create an undirected erdos renyi object
Description
Create an undirected erdos renyi object
Usage
erdos_renyi(n, ..., p = NULL, poisson_edges = TRUE, allow_self_loops = TRUE)Arguments
n | Number of nodes in graph. |
... | Arguments passed on to
|
p | Probability of an edge between any two nodes. You must specifyeither |
poisson_edges | Logical indicating whether or notmultiple edges are allowed to form between a pair ofnodes. Defaults to |
allow_self_loops | Logical indicating whether or notnodes should be allowed to form edges with themselves.Defaults to |
Value
Anundirected_factor_model S3 class based on a listwith the following elements:
X: The latent positions as aMatrix::Matrix()object.S: The mixing matrix as aMatrix::Matrix()object.n: The number of nodes in the network.k: The rank of expectation matrix. Equivalently,the dimension of the latent node position vectors.
See Also
Other erdos renyi:directed_erdos_renyi()
Other undirected graphs:chung_lu(),dcsbm(),mmsbm(),overlapping_sbm(),planted_partition(),sbm()
Examples
set.seed(87)er <- erdos_renyi(n = 10, p = 0.1)erer <- erdos_renyi(n = 10, expected_density = 0.1)erbig_er <- erdos_renyi(n = 1000, expected_degree = 5)big_erA <- sample_sparse(er)ACalculate the expected adjacency matrix
Description
Calculate the expected adjacency matrix
Usage
expectation(model, ...)## S3 method for class 'undirected_factor_model'expectation(model, ...)## S3 method for class 'directed_factor_model'expectation(model, ...)Arguments
model | A |
... | Unused. |
Value
The expected value of the adjacency matrix, conditional on thelatent factorsX andY (if the model is directed).
Calculate the expected edges in Poisson RDPG graph
Description
These calculations are conditional on the latent factorsX andY.
Usage
expected_edges(factor_model, ...)expected_degrees(factor_model, ...)expected_degree(factor_model, ...)expected_in_degree(factor_model, ...)expected_out_degree(factor_model, ...)expected_density(factor_model, ...)Arguments
factor_model | |
... | Ignored. Do not use. |
Details
Note that the runtime of thefastRG algorithm is proportional tothe expected number of edges in the graph. Expected edge count will bean underestimate of expected number of edges for Bernoulligraphs. See the Rohe et al for details.
Value
Expected edge counts, or graph densities.
References
Rohe, Karl, Jun Tao, Xintian Han, and Norbert Binkiewicz. 2017."A Note on Quickly Sampling a Sparse Matrix with Low Rank Expectation."Journal of Machine Learning Research; 19(77):1-13, 2018.https://www.jmlr.org/papers/v19/17-128.html
Examples
##### an undirected blockmodel examplen <- 100pop <- n / 2a <- .1b <- .05B <- matrix(c(a, b, b, a), nrow = 2)b_model <- sbm(n = n, B = B, poisson_edges = FALSE)b_modelA <- sample_sparse(b_model)# comparemean(rowSums(triu(A)))pop * a + pop * b # analytical average degree##### more generic examplesn <- 1000k <- 5X <- matrix(rpois(n = n * k, 1), nrow = n)S <- matrix(runif(n = k * k, 0, .1), nrow = k)ufm <- undirected_factor_model(X, S)expected_edges(ufm)expected_degree(ufm)eigs_sym(ufm)n <- 1000d <- 100k1 <- 5k2 <- 3X <- matrix(rpois(n = n * k1, 1), nrow = n)Y <- matrix(rpois(n = d * k2, 1), nrow = d)S <- matrix(runif(n = k1 * k2, 0, .1), nrow = k1)dfm <- directed_factor_model(X = X, S = S, Y = Y)expected_edges(dfm)expected_in_degree(dfm)expected_out_degree(dfm)svds(dfm)Create an undirected degree-corrected mixed membership stochastic blockmodel object
Description
To specify a degree-corrected mixed membership stochastic blockmodel, you must specifythe degree-heterogeneity parameters (vian ortheta),the mixing matrix (viak orB), and the relative blockpropensities (optional, viaalpha). We provide defaults for most of theseoptions to enable rapid exploration, or you can invest the effortfor more control over the model parameters. Westrongly recommendsetting theexpected_degree orexpected_density argumentto avoid large memory allocations associated withsampling large, dense graphs.
Usage
mmsbm( n = NULL, theta = NULL, k = NULL, B = NULL, ..., alpha = rep(1, k), sort_nodes = TRUE, force_pure = TRUE, poisson_edges = TRUE, allow_self_loops = TRUE)Arguments
n | (degree heterogeneity) The number of nodes in the blockmodel.Use when you don't want to specify the degree-heterogeneityparameters |
theta | (degree heterogeneity) A numeric vectorexplicitly specifying the degree heterogeneityparameters. This implicitly determines the number of nodesin the resulting graph, i.e. it will have |
k | (mixing matrix) The number of blocks in the blockmodel.Use when you don't want to specify the mixing-matrix by hand.When |
B | (mixing matrix) A |
... | Arguments passed on to
|
alpha | (relative block propensities) Relative blockpropensities, which are parameters of a Dirichlet distribution.All elments of |
sort_nodes | Logical indicating whether or not to sort the nodesso that they are grouped by block and by |
force_pure | Logical indicating whether or not to force presence of"pure nodes" (nodes that belong only to a single community) for the sakeof identifiability. To include pure nodes, block membership samplingfirst proceeds as per usual. Then, after it is complete, |
poisson_edges | Logical indicating whether or notmultiple edges are allowed to form between a pair ofnodes. Defaults to |
allow_self_loops | Logical indicating whether or notnodes should be allowed to form edges with themselves.Defaults to |
Value
Anundirected_mmsbm S3 object, a subclass of theundirected_factor_model() with the following additionalfields:
theta: A numeric vector of degree-heterogeneity parameters.Z: The community memberships of each node, amatrix()withkcolumns, whose row sums all equal one.alpha: Community membership proportion propensities.sorted: Logical indicating where nodes are arranged byblock (and additionally by degree heterogeneity parameter)within each block.
Generative Model
There are two levels of randomness in a degree-correctedstochastic blockmodel. First, we randomly choose how mucheach node belongs to each block in the blockmodel. Each nodeis one unit of block membership to distribute. This ishandled bymmsbm(). Then, given these block memberships,we randomly sample edges between nodes. This secondoperation is handled bysample_edgelist(),sample_sparse(),sample_igraph() andsample_tidygraph(), depending depending on your desiredgraph representation.
Block memberships
LetZ_i by a vector on thek dimensional simplexrepresenting the block memberships of nodei.To generatez_i we sample from a Dirichletdistribution with parameter vector\alpha.Block memberships for each node are independent.
Degree heterogeneity
In addition to block membership, the MMSBM also allowsnodes to have different propensities for edge formation.We represent this propensity for nodei by a positivenumber\theta_i.
Edge formulation
Once we know the block membership vectorz_i, z_j and the degreeheterogeneity parameters\theta, we need one moreingredient, which is the baseline intensity of connectionsbetween nodes in blocki and blockj. This is given by ak \times k matrixB. Then each edgeA_{i,j} is Poisson distributed with parameter
\lambda_{i, j} = \theta_i \cdot z_i^T B z_j \cdot \theta_j.
See Also
Other stochastic block models:dcsbm(),directed_dcsbm(),overlapping_sbm(),planted_partition(),sbm()
Other undirected graphs:chung_lu(),dcsbm(),erdos_renyi(),overlapping_sbm(),planted_partition(),sbm()
Examples
set.seed(27)lazy_mmsbm <- mmsbm(n = 100, k = 5, expected_density = 0.01)lazy_mmsbm# sometimes you gotta let the world burn and# sample a wildly dense graphdense_lazy_mmsbm <- mmsbm(n = 500, k = 3, expected_density = 0.8)dense_lazy_mmsbm# explicitly setting the degree heterogeneity parameter,# mixing matrix, and relative community sizes rather# than using randomly generated defaultsk <- 5n <- 100B <- matrix(stats::runif(k * k), nrow = k, ncol = k)theta <- round(stats::rlnorm(n, 2))alpha <- c(1, 2, 4, 1, 1)custom_mmsbm <- mmsbm( theta = theta, B = B, alpha = alpha, expected_degree = 50)custom_mmsbmedgelist <- sample_edgelist(custom_mmsbm)edgelist# efficient eigendecompostion that leverages low-rank structure in# E(A) so that you don't have to form E(A) to find eigenvectors,# as E(A) is typically dense. computation is# handled via RSpectrapopulation_eigs <- eigs_sym(custom_mmsbm)svds(custom_mmsbm)$dCreate an undirected overlapping degree corrected stochastic blockmodel object
Description
To specify a overlapping stochastic blockmodel, you must specifythe number of nodes (vian),the mixing matrix (viak orB), and the blockprobabilities (optional, viapi). We provide defaults for most of theseoptions to enable rapid exploration, or you can invest the effortfor more control over the model parameters. Westrongly recommendsetting theexpected_degree orexpected_density argumentto avoid large memory allocations associated withsampling large, dense graphs.
Usage
overlapping_sbm( n, k = NULL, B = NULL, ..., pi = rep(1/k, k), sort_nodes = TRUE, force_pure = TRUE, poisson_edges = TRUE, allow_self_loops = TRUE)Arguments
n | The number of nodes in the overlapping SBM. |
k | (mixing matrix) The number of blocks in the blockmodel.Use when you don't want to specify the mixing-matrix by hand.When |
B | (mixing matrix) A |
... | Arguments passed on to
|
pi | (block probabilities) Probability of membership in eachblock. Membership in each block is independent under theoverlapping SBM. Defaults to |
sort_nodes | Logical indicating whether or not to sort the nodesso that they are grouped by block. Useful for plotting.Defaults to |
force_pure | Logical indicating whether or not to force presence of"pure nodes" (nodes that belong only to a single community) for the sakeof identifiability. To include pure nodes, block membership samplingfirst proceeds as per usual. Then, after it is complete, |
poisson_edges | Logical indicating whether or notmultiple edges are allowed to form between a pair ofnodes. Defaults to |
allow_self_loops | Logical indicating whether or notnodes should be allowed to form edges with themselves.Defaults to |
Value
Anundirected_overlapping_sbm S3 object, a subclass of theundirected_factor_model() with the following additionalfields:
pi: Sampling probabilities for each block.sorted: Logical indicating where nodes are arranged byblock (and additionally by degree heterogeneity parameter)within each block.
Generative Model
There are two levels of randomness in a degree-correctedoverlapping stochastic blockmodel. First, for each node, weindependently determine if that node is a member of each block. This ishandled byoverlapping_sbm(). Then, given these block memberships,we randomly sample edges between nodes. This secondoperation is handled bysample_edgelist(),sample_sparse(),sample_igraph() andsample_tidygraph(), depending depending on your desiredgraph representation.
Identifiability
In order to be identifiable, an overlapping SBM must satisfy two conditions:
Bmust be invertible, andthe must be at least one "pure node" in each block that belongs to noother blocks.
Block memberships
Note that some nodes may not belong to any blocks.
TODO
Edge formulation
Once we know the block memberships, we need one moreingredient, which is the baseline intensity of connectionsbetween nodes in blocki and blockj. Then each edgeA_{i,j} is Poisson distributed with parameter
TODO
References
Kaufmann, Emilie, Thomas Bonald, and Marc Lelarge."A Spectral Algorithm with Additive Clustering for the Recovery ofOverlapping Communities in Networks," Vol. 9925.Lecture Notes in Computer Science.Cham: Springer International Publishing, 2016.https://doi.org/10.1007/978-3-319-46379-7.
Latouche, Pierre, Etienne Birmelé, and Christophe Ambroise."Overlapping Stochastic Block Models with Application to theFrench Political Blogosphere." The Annals of Applied Statistics 5,no. 1 (March 2011): 309–36. https://doi.org/10.1214/10-AOAS382.
Zhang, Yuan, Elizaveta Levina, and Ji Zhu. "DetectingOverlapping Communities in Networks Using Spectral Methods."ArXiv:1412.3432, December 10, 2014. http://arxiv.org/abs/1412.3432.
See Also
Other stochastic block models:dcsbm(),directed_dcsbm(),mmsbm(),planted_partition(),sbm()
Other undirected graphs:chung_lu(),dcsbm(),erdos_renyi(),mmsbm(),planted_partition(),sbm()
Examples
set.seed(27)lazy_overlapping_sbm <- overlapping_sbm(n = 1000, k = 5, expected_density = 0.01)lazy_overlapping_sbm# sometimes you gotta let the world burn and# sample a wildly dense graphdense_lazy_overlapping_sbm <- overlapping_sbm(n = 500, k = 3, expected_density = 0.8)dense_lazy_overlapping_sbmk <- 5n <- 1000B <- matrix(stats::runif(k * k), nrow = k, ncol = k)pi <- c(1, 2, 4, 1, 1) / 5custom_overlapping_sbm <- overlapping_sbm( n = 200, B = B, pi = pi, expected_degree = 5)custom_overlapping_sbmedgelist <- sample_edgelist(custom_overlapping_sbm)edgelist# efficient eigendecompostion that leverages low-rank structure in# E(A) so that you don't have to form E(A) to find eigenvectors,# as E(A) is typically dense. computation is# handled via RSpectrapopulation_eigs <- eigs_sym(custom_overlapping_sbm)Create an undirected planted partition object
Description
To specify a planted partition model, you must specifythe number of nodes (vian), the mixing matrix (optional, either viawithin_block/between_block ora/b),and the relative block probabilites (optional, viapi).We provide defaults for most of these options to enablerapid exploration, or you can invest the effortfor more control over the model parameters. Westrongly recommendsetting theexpected_degree orexpected_density argumentto avoid large memory allocations associated withsampling large, dense graphs.
Usage
planted_partition( n, k, ..., within_block = NULL, between_block = NULL, a = NULL, b = NULL, block_sizes = NULL, pi = NULL, sort_nodes = TRUE, poisson_edges = TRUE, allow_self_loops = TRUE)Arguments
n | The number of nodes in the network. Must bea positive integer. This argument is required. |
k | Number of planted partitions, as a positive integer.This argument is required. |
... | Arguments passed on to
|
within_block | Probability of within block edges. Must bestrictly between zero and one. Must specify either |
between_block | Probability of between block edges. Must bestrictly between zero and one. Must specify either |
a | Integer such that |
b | Integer such that |
block_sizes | (block sizes) Number of nodes in each block,as a vector of integers. Must match the dimensions of |
pi | (block sizes) Relative blockprobabilities. Must be positive, but do not need to sumto one, as they will be normalized internally.Must match the dimensions of |
sort_nodes | Logical indicating whether or not to sort the nodesso that they are grouped by block and by |
poisson_edges | Logical indicating whether or notmultiple edges are allowed to form between a pair ofnodes. Defaults to |
allow_self_loops | Logical indicating whether or notnodes should be allowed to form edges with themselves.Defaults to |
Details
A planted partition model is stochastic blockmodel in whichthe diagonal and the off-diagonal of the mixing matrixBare both constant. This means that edge probabilitiesdepend only on whether two nodes belong to the same block,or to different blocks, but the particular blocks themselvesdon't have any impact apart from this.
Value
Anundirected_planted_partition S3 object, which is a subclassof thesbm() object, with additional fields:
within_block: The probability of edge formation within a block.between_block: The probability of edge formation between two distinctblocks.
See Also
Other stochastic block models:dcsbm(),directed_dcsbm(),mmsbm(),overlapping_sbm(),sbm()
Other undirected graphs:chung_lu(),dcsbm(),erdos_renyi(),mmsbm(),overlapping_sbm(),sbm()
Examples
set.seed(27)lazy_pp <- planted_partition( n = 100, k = 5, expected_density = 0.01, within_block = 0.1, between_block = 0.01)lazy_ppPlot (expected) adjacency matrices
Description
Plot (expected) adjacency matrices
Usage
plot_expectation(model)plot_dense_matrix(A, ...)plot_sparse_matrix(A)Arguments
model | A |
A | A |
... | Unused. |
Value
Aggplot2::ggplot2() plot.
Examples
set.seed(27)model <- dcsbm(n = 10, k = 2, expected_density = 0.2)plot_expectation(model)A <- sample_sparse(model)plot_sparse_matrix(A)Objects exported from other packages
Description
These objects are imported from other packages. Follow the linksbelow to see their documentation.
Sample a random edgelist from a random dot product graph
Description
There are two steps to using thefastRG package. First,you must parameterize a random dot product graph bysampling the latent factors. Use functions such asdcsbm(),sbm(), etc, to perform this specification.Then, usesample_*() functions to generate a random graphin your preferred format.
Usage
sample_edgelist(factor_model, ...)## S3 method for class 'undirected_factor_model'sample_edgelist(factor_model, ...)## S3 method for class 'directed_factor_model'sample_edgelist(factor_model, ...)Arguments
factor_model | |
... | Ignored. Do not use. |
Details
This function implements thefastRG algorithm asdescribed in Rohe et al (2017). Please see the paper(which is short and open access!!) for details.
Value
A single realization of a random Poisson (or Bernoulli)Dot Product Graph, represented as atibble::tibble() with twointeger columns,from andto.
NOTE: Indices for isolated nodes will not appear in the edgelist!This can lead to issues if you construct network objects from theedgelist directly.
In the undirected case,from andto do not encodeinformation about edge direction, but we will always havefrom <= to for convenience of edge identification.
To avoid handling such considerations yourself, we recommend usingsample_sparse(),sample_igraph(), andsample_tidygraph()oversample_edgelist().
References
Rohe, Karl, Jun Tao, Xintian Han, and Norbert Binkiewicz. 2017."A Note on Quickly Sampling a Sparse Matrix with Low Rank Expectation."Journal of Machine Learning Research; 19(77):1-13, 2018.https://www.jmlr.org/papers/v19/17-128.html
See Also
Other samplers:sample_edgelist.matrix(),sample_igraph(),sample_sparse(),sample_tidygraph()
Examples
library(igraph)library(tidygraph)set.seed(27)##### undirected examples ----------------------------n <- 100k <- 5X <- matrix(rpois(n = n * k, 1), nrow = n)S <- matrix(runif(n = k * k, 0, .1), nrow = k)# S will be symmetrized internal here, or left unchanged if# it is already symmetricufm <- undirected_factor_model( X, S, expected_density = 0.1)ufm### sampling graphs as edgelists ----------------------edgelist <- sample_edgelist(ufm)edgelist### sampling graphs as sparse matrices ----------------A <- sample_sparse(ufm)inherits(A, "dsCMatrix")isSymmetric(A)dim(A)B <- sample_sparse(ufm)inherits(B, "dsCMatrix")isSymmetric(B)dim(B)### sampling graphs as igraph graphs ------------------sample_igraph(ufm)### sampling graphs as tidygraph graphs ---------------sample_tidygraph(ufm)##### directed examples ----------------------------n2 <- 100k1 <- 5k2 <- 3d <- 50X <- matrix(rpois(n = n2 * k1, 1), nrow = n2)S <- matrix(runif(n = k1 * k2, 0, .1), nrow = k1, ncol = k2)Y <- matrix(rexp(n = k2 * d, 1), nrow = d)fm <- directed_factor_model(X, S, Y, expected_in_degree = 2)fm### sampling graphs as edgelists ----------------------edgelist2 <- sample_edgelist(fm)edgelist2### sampling graphs as sparse matrices ----------------A2 <- sample_sparse(fm)inherits(A2, "dgCMatrix")isSymmetric(A2)dim(A2)B2 <- sample_sparse(fm)inherits(B2, "dgCMatrix")isSymmetric(B2)dim(B2)### sampling graphs as igraph graphs ------------------# since the number of rows and the number of columns# in `fm` differ, we will get a bipartite igraph here# creating the bipartite igraph is slow relative to other# sampling -- if this is a blocker for# you please open an issue and we can investigate speedupsdig <- sample_igraph(fm)is_bipartite(dig)### sampling graphs as tidygraph graphs ---------------sample_tidygraph(fm)Low level interface to sample RPDG edgelists
Description
This is a brakes-off, no safety checks interface.We strongly recommend that you do not callsample_edgelist.matrix() unless you know what you are doing,and even then, we still do not recommend it, as you willbypass all typical input validation.extremely loud coughing All those who bypass inputvalidation suffer foolishly at their own hand.extremely loud coughing
Usage
## S3 method for class 'matrix'sample_edgelist( factor_model, S, Y, directed, poisson_edges, allow_self_loops, ...)## S3 method for class 'Matrix'sample_edgelist( factor_model, S, Y, directed, poisson_edges, allow_self_loops, ...)Arguments
factor_model | An |
S | A |
Y | A |
directed | Logical indicating whether or not the graph should bedirected. When |
poisson_edges | Whether or not to remove duplicate edgesafter sampling. See Section 2.3 of Rohe et al. (2017)for some additional details. Defaults to |
allow_self_loops | Logical indicating whether or notnodes should be allowed to form edges with themselves.Defaults to |
... | Ignored, for generic consistency only. |
Details
This function implements thefastRG algorithm asdescribed in Rohe et al (2017). Please see the paper(which is short and open access!!) for details.
Value
A single realization of a random Poisson (or Bernoulli)Dot Product Graph, represented as atibble::tibble() with twointeger columns,from andto.
NOTE: Indices for isolated nodes will not appear in the edgelist!This can lead to issues if you construct network objects from theedgelist directly.
In the undirected case,from andto do not encodeinformation about edge direction, but we will always havefrom <= to for convenience of edge identification.
To avoid handling such considerations yourself, we recommend usingsample_sparse(),sample_igraph(), andsample_tidygraph()oversample_edgelist().
References
Rohe, Karl, Jun Tao, Xintian Han, and Norbert Binkiewicz. 2017."A Note on Quickly Sampling a Sparse Matrix with Low Rank Expectation."Journal of Machine Learning Research; 19(77):1-13, 2018.https://www.jmlr.org/papers/v19/17-128.html
See Also
Other samplers:sample_edgelist(),sample_igraph(),sample_sparse(),sample_tidygraph()
Examples
set.seed(46)n <- 10000d <- 1000k1 <- 5k2 <- 3X <- matrix(rpois(n = n * k1, 1), nrow = n)S <- matrix(runif(n = k1 * k2, 0, .1), nrow = k1)Y <- matrix(rpois(n = d * k2, 1), nrow = d)sample_edgelist(X, S, Y, TRUE, TRUE, TRUE)Sample a random dot product graph as an igraph graph
Description
There are two steps to using thefastRG package. First,you must parameterize a random dot product graph bysampling the latent factors. Use functions such asdcsbm(),sbm(), etc, to perform this specification.Then, usesample_*() functions to generate a random graphin your preferred format.
Usage
sample_igraph(factor_model, ...)## S3 method for class 'undirected_factor_model'sample_igraph(factor_model, ...)## S3 method for class 'directed_factor_model'sample_igraph(factor_model, ...)Arguments
factor_model | |
... | Ignored. Do not use. |
Details
This function implements thefastRG algorithm asdescribed in Rohe et al (2017). Please see the paper(which is short and open access!!) for details.
Value
Anigraph::igraph() object that is possibly amultigraph (that is, we take there to be multiple edgesrather than weighted edges).
Whenfactor_model isundirected:
- the graph is undirected and one-mode.
Whenfactor_model isdirected andsquare:
- the graph is directed and one-mode.
Whenfactor_model isdirected andrectangular:
- the graph is undirected and bipartite.
Note that working with bipartite graphs inigraph is morecomplex than working with one-mode graphs.
References
Rohe, Karl, Jun Tao, Xintian Han, and Norbert Binkiewicz. 2017."A Note on Quickly Sampling a Sparse Matrix with Low Rank Expectation."Journal of Machine Learning Research; 19(77):1-13, 2018.https://www.jmlr.org/papers/v19/17-128.html
See Also
Other samplers:sample_edgelist(),sample_edgelist.matrix(),sample_sparse(),sample_tidygraph()
Examples
library(igraph)library(tidygraph)set.seed(27)##### undirected examples ----------------------------n <- 100k <- 5X <- matrix(rpois(n = n * k, 1), nrow = n)S <- matrix(runif(n = k * k, 0, .1), nrow = k)# S will be symmetrized internal here, or left unchanged if# it is already symmetricufm <- undirected_factor_model( X, S, expected_density = 0.1)ufm### sampling graphs as edgelists ----------------------edgelist <- sample_edgelist(ufm)edgelist### sampling graphs as sparse matrices ----------------A <- sample_sparse(ufm)inherits(A, "dsCMatrix")isSymmetric(A)dim(A)B <- sample_sparse(ufm)inherits(B, "dsCMatrix")isSymmetric(B)dim(B)### sampling graphs as igraph graphs ------------------sample_igraph(ufm)### sampling graphs as tidygraph graphs ---------------sample_tidygraph(ufm)##### directed examples ----------------------------n2 <- 100k1 <- 5k2 <- 3d <- 50X <- matrix(rpois(n = n2 * k1, 1), nrow = n2)S <- matrix(runif(n = k1 * k2, 0, .1), nrow = k1, ncol = k2)Y <- matrix(rexp(n = k2 * d, 1), nrow = d)fm <- directed_factor_model(X, S, Y, expected_in_degree = 2)fm### sampling graphs as edgelists ----------------------edgelist2 <- sample_edgelist(fm)edgelist2### sampling graphs as sparse matrices ----------------A2 <- sample_sparse(fm)inherits(A2, "dgCMatrix")isSymmetric(A2)dim(A2)B2 <- sample_sparse(fm)inherits(B2, "dgCMatrix")isSymmetric(B2)dim(B2)### sampling graphs as igraph graphs ------------------# since the number of rows and the number of columns# in `fm` differ, we will get a bipartite igraph here# creating the bipartite igraph is slow relative to other# sampling -- if this is a blocker for# you please open an issue and we can investigate speedupsdig <- sample_igraph(fm)is_bipartite(dig)### sampling graphs as tidygraph graphs ---------------sample_tidygraph(fm)Sample a random dot product graph as a sparse Matrix
Description
There are two steps to using thefastRG package. First,you must parameterize a random dot product graph bysampling the latent factors. Use functions such asdcsbm(),sbm(), etc, to perform this specification.Then, usesample_*() functions to generate a random graphin your preferred format.
Usage
sample_sparse(factor_model, ...)## S3 method for class 'undirected_factor_model'sample_sparse(factor_model, ...)## S3 method for class 'directed_factor_model'sample_sparse(factor_model, ...)Arguments
factor_model | |
... | Ignored. Do not use. |
Details
This function implements thefastRG algorithm asdescribed in Rohe et al (2017). Please see the paper(which is short and open access!!) for details.
Value
For undirected factor models, a sparseMatrix::Matrix() of classdsCMatrix. In particular,this means theMatrix object (1) hasdouble data type, (2) is symmetric, and (3) is incolumn compressed storage format.
For directed factor models, a sparseMatrix::Matrix() of classdgCMatrix. This meanstheMatrix object (1) has double data type,(2) innot symmetric, and (3) is in columncompressed storage format.
To reiterate: for undirected graphs, you will geta symmetric matrix. For directed graphs, you willget a general sparse matrix.
References
Rohe, Karl, Jun Tao, Xintian Han, and Norbert Binkiewicz. 2017."A Note on Quickly Sampling a Sparse Matrix with Low Rank Expectation."Journal of Machine Learning Research; 19(77):1-13, 2018.https://www.jmlr.org/papers/v19/17-128.html
See Also
Other samplers:sample_edgelist(),sample_edgelist.matrix(),sample_igraph(),sample_tidygraph()
Examples
library(igraph)library(tidygraph)set.seed(27)##### undirected examples ----------------------------n <- 100k <- 5X <- matrix(rpois(n = n * k, 1), nrow = n)S <- matrix(runif(n = k * k, 0, .1), nrow = k)# S will be symmetrized internal here, or left unchanged if# it is already symmetricufm <- undirected_factor_model( X, S, expected_density = 0.1)ufm### sampling graphs as edgelists ----------------------edgelist <- sample_edgelist(ufm)edgelist### sampling graphs as sparse matrices ----------------A <- sample_sparse(ufm)inherits(A, "dsCMatrix")isSymmetric(A)dim(A)B <- sample_sparse(ufm)inherits(B, "dsCMatrix")isSymmetric(B)dim(B)### sampling graphs as igraph graphs ------------------sample_igraph(ufm)### sampling graphs as tidygraph graphs ---------------sample_tidygraph(ufm)##### directed examples ----------------------------n2 <- 100k1 <- 5k2 <- 3d <- 50X <- matrix(rpois(n = n2 * k1, 1), nrow = n2)S <- matrix(runif(n = k1 * k2, 0, .1), nrow = k1, ncol = k2)Y <- matrix(rexp(n = k2 * d, 1), nrow = d)fm <- directed_factor_model(X, S, Y, expected_in_degree = 2)fm### sampling graphs as edgelists ----------------------edgelist2 <- sample_edgelist(fm)edgelist2### sampling graphs as sparse matrices ----------------A2 <- sample_sparse(fm)inherits(A2, "dgCMatrix")isSymmetric(A2)dim(A2)B2 <- sample_sparse(fm)inherits(B2, "dgCMatrix")isSymmetric(B2)dim(B2)### sampling graphs as igraph graphs ------------------# since the number of rows and the number of columns# in `fm` differ, we will get a bipartite igraph here# creating the bipartite igraph is slow relative to other# sampling -- if this is a blocker for# you please open an issue and we can investigate speedupsdig <- sample_igraph(fm)is_bipartite(dig)### sampling graphs as tidygraph graphs ---------------sample_tidygraph(fm)Sample a random dot product graph as a tidygraph graph
Description
There are two steps to using thefastRG package. First,you must parameterize a random dot product graph bysampling the latent factors. Use functions such asdcsbm(),sbm(), etc, to perform this specification.Then, usesample_*() functions to generate a random graphin your preferred format.
Usage
sample_tidygraph(factor_model, ...)## S3 method for class 'undirected_factor_model'sample_tidygraph(factor_model, ...)## S3 method for class 'directed_factor_model'sample_tidygraph(factor_model, ...)Arguments
factor_model | |
... | Ignored. Do not use. |
Details
This function implements thefastRG algorithm asdescribed in Rohe et al (2017). Please see the paper(which is short and open access!!) for details.
Value
Atidygraph::tbl_graph() object that is possibly amultigraph (that is, we take there to be multiple edgesrather than weighted edges).
Whenfactor_model isundirected:
- the graph is undirected and one-mode.
Whenfactor_model isdirected andsquare:
- the graph is directed and one-mode.
Whenfactor_model isdirected andrectangular:
- the graph is undirected and bipartite.
Note that working with bipartite graphs intidygraph is morecomplex than working with one-mode graphs.
References
Rohe, Karl, Jun Tao, Xintian Han, and Norbert Binkiewicz. 2017."A Note on Quickly Sampling a Sparse Matrix with Low Rank Expectation."Journal of Machine Learning Research; 19(77):1-13, 2018.https://www.jmlr.org/papers/v19/17-128.html
See Also
Other samplers:sample_edgelist(),sample_edgelist.matrix(),sample_igraph(),sample_sparse()
Examples
library(igraph)library(tidygraph)set.seed(27)##### undirected examples ----------------------------n <- 100k <- 5X <- matrix(rpois(n = n * k, 1), nrow = n)S <- matrix(runif(n = k * k, 0, .1), nrow = k)# S will be symmetrized internal here, or left unchanged if# it is already symmetricufm <- undirected_factor_model( X, S, expected_density = 0.1)ufm### sampling graphs as edgelists ----------------------edgelist <- sample_edgelist(ufm)edgelist### sampling graphs as sparse matrices ----------------A <- sample_sparse(ufm)inherits(A, "dsCMatrix")isSymmetric(A)dim(A)B <- sample_sparse(ufm)inherits(B, "dsCMatrix")isSymmetric(B)dim(B)### sampling graphs as igraph graphs ------------------sample_igraph(ufm)### sampling graphs as tidygraph graphs ---------------sample_tidygraph(ufm)##### directed examples ----------------------------n2 <- 100k1 <- 5k2 <- 3d <- 50X <- matrix(rpois(n = n2 * k1, 1), nrow = n2)S <- matrix(runif(n = k1 * k2, 0, .1), nrow = k1, ncol = k2)Y <- matrix(rexp(n = k2 * d, 1), nrow = d)fm <- directed_factor_model(X, S, Y, expected_in_degree = 2)fm### sampling graphs as edgelists ----------------------edgelist2 <- sample_edgelist(fm)edgelist2### sampling graphs as sparse matrices ----------------A2 <- sample_sparse(fm)inherits(A2, "dgCMatrix")isSymmetric(A2)dim(A2)B2 <- sample_sparse(fm)inherits(B2, "dgCMatrix")isSymmetric(B2)dim(B2)### sampling graphs as igraph graphs ------------------# since the number of rows and the number of columns# in `fm` differ, we will get a bipartite igraph here# creating the bipartite igraph is slow relative to other# sampling -- if this is a blocker for# you please open an issue and we can investigate speedupsdig <- sample_igraph(fm)is_bipartite(dig)### sampling graphs as tidygraph graphs ---------------sample_tidygraph(fm)Create an undirected stochastic blockmodel object
Description
To specify a stochastic blockmodel, you must specifythe number of nodes (vian), the mixing matrix (viak orB),and the relative block probabilites (optional, viapi).We provide defaults for most of these options to enablerapid exploration, or you can invest the effortfor more control over the model parameters. Westrongly recommendsetting theexpected_degree orexpected_density argumentto avoid large memory allocations associated withsampling large, dense graphs.
Usage
sbm( n, k = NULL, B = NULL, ..., block_sizes = NULL, pi = NULL, sort_nodes = TRUE, poisson_edges = TRUE, allow_self_loops = TRUE)Arguments
n | The number of nodes in the network. Must bea positive integer. This argument is required. |
k | (mixing matrix) The number of blocks in the blockmodel.Use when you don't want to specify the mixing-matrix by hand.When |
B | (mixing matrix) A |
... | Arguments passed on to
|
block_sizes | (block sizes) Number of nodes in each block,as a vector of integers. Must match the dimensions of |
pi | (block sizes) Relative blockprobabilities. Must be positive, but do not need to sumto one, as they will be normalized internally.Must match the dimensions of |
sort_nodes | Logical indicating whether or not to sort the nodesso that they are grouped by block and by |
poisson_edges | Logical indicating whether or notmultiple edges are allowed to form between a pair ofnodes. Defaults to |
allow_self_loops | Logical indicating whether or notnodes should be allowed to form edges with themselves.Defaults to |
Details
A stochastic block is equivalent to a degree-correctedstochastic blockmodel where the degree heterogeneity parametershave all been set equal to 1.
Value
Anundirected_sbm S3 object, which is a subclass of thedcsbm() object.
See Also
Other stochastic block models:dcsbm(),directed_dcsbm(),mmsbm(),overlapping_sbm(),planted_partition()
Other undirected graphs:chung_lu(),dcsbm(),erdos_renyi(),mmsbm(),overlapping_sbm(),planted_partition()
Examples
set.seed(27)lazy_sbm <- sbm(n = 100, k = 5, expected_density = 0.01)lazy_sbm# by default we get a multigraph (i.e. multiple edges are# allowed between the same two nodes). using bernoulli edges# will with an adjacency matrix with only zeroes and onesbernoulli_sbm <- sbm( n = 500, k = 30, poisson_edges = FALSE, expected_degree = 8)bernoulli_sbmedgelist <- sample_edgelist(bernoulli_sbm)edgelistA <- sample_sparse(bernoulli_sbm)# only zeroes and ones!sign(A)# sbm with repeated eigenvalues# block sizes equal by default, needed to prevent variation in spectrum# from variation in block sizes. also need B to have a single repeated# eigenvaluerepeated_eigen <- sbm( n = 100, B = diag(rep(0.8, 5)), expected_degree = 10)# exactly repeated eigenvalues in the populatione <- eigs_sym(repeated_eigen)e$valuesCompute the singular value decomposition of the expected adjacency matrix of a directed factor model
Description
Compute the singular value decomposition of the expected adjacency matrix of a directed factor model
Usage
## S3 method for class 'directed_factor_model'svds(A, k = min(A$k1, A$k2), nu = k, nv = k, opts = list(), ...)Arguments
A | |
k | Desired rank of decomposition. |
nu | Number of left singular vectors to be computed. This mustbe between 0 and |
nv | Number of right singular vectors to be computed. This mustbe between 0 and |
opts | Control parameters related to the computingalgorithm. SeeDetails below. |
... | Unused, included only for consistency with generic signature. |
Details
Theopts argument is a list that can supply any of thefollowing parameters:
ncvNumber of Lanzcos basis vectors to use. More vectorswill result in faster convergence, but with greatermemory use.
ncvmust be satisfyk < ncv \le pwherep = min(m, n).Default ismin(p, max(2*k+1, 20)).tolPrecision parameter. Default is 1e-10.
maxitrMaximum number of iterations. Default is 1000.
centerEither a logical value (
TRUE/FALSE), or a numericvector of lengthn. If a vectorcis supplied, thenSVD is computed on the matrixA - 1c',in an implicit way without actually forming this matrix.center = TRUEhas the same effect ascenter = colMeans(A). Default isFALSE.scaleEither a logical value (
TRUE/FALSE), or a numericvector of lengthn. If a vectorsis supplied, thenSVD is computed on the matrix(A - 1c')S,wherecis the centering vector andS = diag(1/s).Ifscale = TRUE, then the vectorsis computed asthe column norm ofA - 1c'.Default isFALSE.
Compute the singular value decomposition of the expected adjacency matrix of an undirected factor model
Description
Compute the singular value decomposition of the expected adjacency matrix of an undirected factor model
Usage
## S3 method for class 'undirected_factor_model'svds(A, k = A$k, nu = k, nv = k, opts = list(), ...)Arguments
A | |
k | Desired rank of decomposition. |
nu | Number of left singular vectors to be computed. This mustbe between 0 and |
nv | Number of right singular vectors to be computed. This mustbe between 0 and |
opts | Control parameters related to the computingalgorithm. SeeDetails below. |
... | Unused, included only for consistency with generic signature. |
Details
Theopts argument is a list that can supply any of thefollowing parameters:
ncvNumber of Lanzcos basis vectors to use. More vectorswill result in faster convergence, but with greatermemory use.
ncvmust be satisfyk < ncv \le pwherep = min(m, n).Default ismin(p, max(2*k+1, 20)).tolPrecision parameter. Default is 1e-10.
maxitrMaximum number of iterations. Default is 1000.
centerEither a logical value (
TRUE/FALSE), or a numericvector of lengthn. If a vectorcis supplied, thenSVD is computed on the matrixA - 1c',in an implicit way without actually forming this matrix.center = TRUEhas the same effect ascenter = colMeans(A). Default isFALSE.scaleEither a logical value (
TRUE/FALSE), or a numericvector of lengthn. If a vectorsis supplied, thenSVD is computed on the matrix(A - 1c')S,wherecis the centering vector andS = diag(1/s).Ifscale = TRUE, then the vectorsis computed asthe column norm ofA - 1c'.Default isFALSE.
Create an undirected factor model graph
Description
An undirected factor model graph is an undirectedgeneralized Poisson random dot product graph. The edgesin this graph are assumed to be independent and Poissondistributed. The graph is parameterized by its expectedadjacency matrix, which isE[A|X] = X S X'. We do not recommendthat casual users use this function, see insteaddcsbm()and related functions, which will formulate common variantsof the stochastic blockmodels as undirected factor modelswith lots of helpful input validation.
Usage
undirected_factor_model( X, S, ..., expected_degree = NULL, expected_density = NULL, poisson_edges = TRUE, allow_self_loops = TRUE)Arguments
X | A |
S | A |
... | Ignored. Must be empty. |
expected_degree | If specified, the desired expected degreeof the graph. Specifying |
expected_density | If specified, the desired expected densityof the graph. Specifying |
poisson_edges | Logical indicating whether or notmultiple edges are allowed to form between a pair ofnodes. Defaults to |
allow_self_loops | Logical indicating whether or notnodes should be allowed to form edges with themselves.Defaults to |
Value
Anundirected_factor_model S3 class based on a listwith the following elements:
X: The latent positions as aMatrix::Matrix()object.S: The mixing matrix as aMatrix::Matrix()object.n: The number of nodes in the network.k: The rank of expectation matrix. Equivalently,the dimension of the latent node position vectors.
Examples
n <- 100k <- 5X <- matrix(rpois(n = n * k, 1), nrow = n)S <- matrix(runif(n = k * k, 0, .1), nrow = k)ufm <- undirected_factor_model(X, S)ufmufm2 <- undirected_factor_model(X, S, expected_degree = 50)ufm2svds(ufm2)