| Type: | Package |
| Title: | Approximate and Exact Optimal Transport Methods |
| Version: | 1.2 |
| Date: | 2025-01-06 |
| Maintainer: | Eric Dunipace <edunipace@mail.harvard.edu> |
| Description: | R and C++ functions to perform exact and approximate optimal transport. All C++ methods can be linked to other R packages via their header files. |
| License: | GPL (== 3.0) |
| Imports: | Rcpp (≥ 1.0.3), stats |
| LinkingTo: | Rcpp, RcppEigen, RcppCGAL, BH |
| BugReports: | https://github.com/ericdunipace/approxOT/issues |
| Suggests: | testthat (≥ 2.1.0), transport |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| SystemRequirements: | C++17 |
| URL: | https://github.com/ericdunipace/approxOT |
| NeedsCompilation: | yes |
| Packaged: | 2025-01-09 00:44:28 UTC; eifer |
| Author: | Eric Dunipace |
| Repository: | CRAN |
| Date/Publication: | 2025-01-09 12:30:06 UTC |
An R package to perform exact and approximate optimal transport.
Description
R and C++ functions to perform exact and approximate optimal transport. All C++ methods are linkable to other R packages via their header files.
Author(s)
Eric Dunipace
See Also
Useful links:
Report bugs at https://github.com/ericdunipace/approxOT/issues
Transform transportation plan to transportation matrix
Description
Transform transportation plan to transportation matrix
Usage
## S3 method for class 'transport.plan'
as.matrix(x, ...)
Arguments
x | An object of class 'transport.plan'. See the output of transport_plan(). |
... | Unused arguments |
Value
A matrix specifying the minimal joint distribution between samples. Its margins equal the marginal distributions of the samples.
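The relationship between a sparse plan and its dense matrix can be illustrated in base R. This is a minimal sketch, not the package's implementation: `plan_to_matrix_sketch` is a hypothetical helper, and it assumes the plan is a list of parallel vectors "from", "to", and "mass" with 1-based indices.

```r
# Hypothetical helper (not part of approxOT): accumulate the mass of each
# (from, to) pair into a dense n_from x n_to matrix.
plan_to_matrix_sketch <- function(plan, n_from, n_to) {
  mat <- matrix(0, nrow = n_from, ncol = n_to)
  for (k in seq_along(plan$mass)) {
    i <- plan$from[k]
    j <- plan$to[k]
    mat[i, j] <- mat[i, j] + plan$mass[k]
  }
  mat
}

# Toy plan between 3 source points and 2 target points (illustrative values).
plan <- list(from = c(1L, 2L, 2L, 3L),
             to   = c(1L, 1L, 2L, 2L),
             mass = c(1/3, 1/6, 1/6, 1/3))
mat <- plan_to_matrix_sketch(plan, n_from = 3, n_to = 2)
rowSums(mat)  # margin of the first sample
colSums(mat)  # margin of the second sample
```

The row and column sums of the result recover the two marginal distributions, which is the property described in the Value section.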
Examples
set.seed(203987)
n <- 5
d <- 2
x <- matrix(rnorm(d*n), nrow=d, ncol=n)
y <- matrix(rnorm(d*n), nrow=d, ncol=n)
#get hilbert sort orders for x in backwards way
trans_plan <- transport_plan(X=x, Y=x, ground_p = 2, p = 2,
                             observation.orientation = "colwise",
                             method = "hilbert")
trans_matrix <- as.matrix(trans_plan)
print(trans_matrix)
Transform transportation matrix to transportation plan
Description
Transform transportation matrix to transportation plan
Usage
as.transport.plan(transport_matrix, ...)
Arguments
transport_matrix | A matrix that is a transportation matrix, i.e. the minimal joint distribution for two samples. |
... | Unused arguments |
Value
An object of class 'transport.plan'. See the output of transport_plan().
Examples
set.seed(203987)
n <- 5
d <- 2
x <- matrix(stats::rnorm(d*n), nrow=d, ncol=n)
y <- matrix(stats::rnorm(d*n), nrow=d, ncol=n)
#get hilbert sort orders for x in backwards way
trans_plan <- transport_plan(X=x, Y=x, ground_p = 2, p = 2,
                             observation.orientation = "colwise",
                             method = "hilbert")
trans_matrix <- as.matrix(trans_plan$tplan)
tplan2 <- as.transport.plan(trans_matrix)
all.equal(tplan2, trans_plan$tplan)
Calculate cost matrix
Description
Calculate cost matrix
Usage
cost_calc(X, Y, ground_p)
Arguments
X | Matrix of values in the first sample. Observations should be by column, not by row. |
Y | Matrix of values in the second sample. Observations should be by column, not by row. |
ground_p | power of the Lp norm to use in cost calculation. |
Value
matrix of costs
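As a rough illustration of what such a cost matrix contains, the following base R sketch computes pairwise Lp distances between the columns of two matrices. It is not the package's C++ routine: `cost_calc_sketch` is a hypothetical name, and the sketch assumes the cost is the plain Lp norm of the difference (not raised to any further power).

```r
# Hypothetical helper (not part of approxOT): pairwise Lp distances between
# the columns of X (d x n) and the columns of Y (d x m).
cost_calc_sketch <- function(X, Y, ground_p = 2) {
  stopifnot(nrow(X) == nrow(Y))
  cost <- matrix(0, nrow = ncol(X), ncol = ncol(Y))
  for (j in seq_len(ncol(Y))) {
    # Y[, j] has length d, so it is recycled down every column of X.
    diffs <- abs(X - Y[, j])
    cost[, j] <- colSums(diffs^ground_p)^(1 / ground_p)
  }
  cost
}

# Two observations in X, one in Y, each of dimension 2.
X_toy <- matrix(c(0, 0, 3, 4), nrow = 2, ncol = 2)
Y_toy <- matrix(c(0, 0), nrow = 2, ncol = 1)
cost_toy <- cost_calc_sketch(X_toy, Y_toy, ground_p = 2)
```

Entry (i, j) of the result is the distance between observation i of the first sample and observation j of the second, so the matrix has one row per column of X and one column per column of Y.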
Examples
X <- matrix(rnorm(10*100), 10, 100)
Y <- matrix(rnorm(10*100), 10, 100)
# the Euclidean distance
cost <- cost_calc(X, Y, ground_p = 2)
Convert a 2-dimensional index to a 1-dimensional index
Description
Convert a 2-dimensional index to a 1-dimensional index
Usage
dist_2d_to_1d(i, j, n, m)
Arguments
i | Index of row |
j | Index of column |
n | Total number of rows |
m | Total number of columns |
Value
a 1d index for easy matrix entry
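The indexing convention is not stated here. As one plausible reading, the sketch below assumes 1-based indices and R's column-major storage; `index_2d_to_1d_sketch` is a hypothetical helper and may differ from what dist_2d_to_1d actually computes.

```r
# Hypothetical helper (not part of approxOT). Assumes 1-based, column-major
# indexing, which is how R stores matrices.
index_2d_to_1d_sketch <- function(i, j, n, m) {
  stopifnot(i >= 1, i <= n, j >= 1, j <= m)
  (j - 1) * n + i
}

mat <- matrix(seq_len(12), nrow = 3, ncol = 4)
k <- index_2d_to_1d_sketch(2, 3, n = 3, m = 4)
mat[k] == mat[2, 3]  # the linear index addresses the same entry
```

Under this assumption the argument m is only needed for bounds checking; a row-major convention would use it in the formula instead of n.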
One-dimensional optimal transport for measures with more general mass
Description
One-dimensional optimal transport for measures with more general mass
Usage
general_1d_transport(X, Y, a = NULL, b = NULL, method = c("hilbert", "univariate"))
Arguments
X | Data for sample one. Should be a vector if method is "univariate" or a matrix if method is "hilbert". |
Y | Data for sample two. Should be a vector if method is "univariate" or a matrix if method is "hilbert". |
a | Empirical measure for sample one. |
b | Empirical measure for sample two. |
method | One of "hilbert" or "univariate" |
Value
An optimal transportation plan as a list with slots "from", "to", and "mass"
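For univariate data with unequal masses, a standard way to build such a plan is to sort both samples and match mass greedily from the smallest values upward. The base R sketch below shows that idea only; `transport_1d_sketch` and the tolerance `tol` are hypothetical, and the package's own routine may be implemented differently.

```r
# Hypothetical helper (not part of approxOT): greedy matching of sorted
# one-dimensional samples with general masses a and b (each summing to 1).
transport_1d_sketch <- function(x, y,
                                a = rep(1 / length(x), length(x)),
                                b = rep(1 / length(y), length(y))) {
  ox <- order(x)
  oy <- order(y)
  a <- a[ox]
  b <- b[oy]
  from <- integer(0)
  to <- integer(0)
  mass <- numeric(0)
  i <- 1L
  j <- 1L
  tol <- 1e-12  # treat leftovers below this as exhausted mass
  while (i <= length(a) && j <= length(b)) {
    moved <- min(a[i], b[j])
    if (moved > tol) {
      from <- c(from, ox[i])
      to <- c(to, oy[j])
      mass <- c(mass, moved)
    }
    a[i] <- a[i] - moved
    b[j] <- b[j] - moved
    if (a[i] <= tol) i <- i + 1L
    if (b[j] <= tol) j <- j + 1L
  }
  list(from = from, to = to, mass = mass)
}

plan_1d <- transport_1d_sketch(x = c(3, 1, 2), y = c(10, 20))
```

The returned list uses the same "from", "to", and "mass" slots described above, with indices referring to the original (unsorted) positions.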
Examples
set.seed(23423)
n <- 100
d <- 10
x <- matrix(stats::rnorm((n + 11)*d), n + 11, d)
y <- matrix(stats::rnorm(n*d), n, d)
trans <- general_1d_transport(t(x), t(y))
Get order along the Hilbert curve
Description
Get order along the Hilbert curve
Usage
hilbert.projection(X, Sigma = NULL)
Arguments
X | matrix of values. Observations are unique by rows. |
Sigma | Covariance of the data. If provided, uses a Mahalanobis distance. |
Value
Index of orders
Examples
X <- matrix(rnorm(10*3), 3, 10)
idx <- hilbert.projection(X)
print(idx)
Returns orders along the Hilbert space-filling Curve
Description
Returns orders along the Hilbert space-filling Curve
Usage
hilbert_proj_(A)
Arguments
A | a matrix of data-values of class Eigen::MatrixXd |
Value
An integer vector of orders
Check if an object is a transport.plan
Description
Check if an object is a transport.plan
Usage
is.transport.plan(tplan)
Arguments
tplan | An object of class 'transport.plan'. See the output of transport_plan(). |
Value
Logical
Examples
set.seed(203987)
n <- 5
d <- 2
x <- matrix(rnorm(d*n), nrow=d, ncol=n)
y <- matrix(rnorm(d*n), nrow=d, ncol=n)
#get hilbert sort orders for x in backwards way
trans_plan <- transport_plan(X=x, Y=x, ground_p = 2, p = 2,
                             observation.orientation = "colwise",
                             method = "hilbert")
print(is.transport.plan(trans_plan))
Round transportation matrix to feasible set
Description
Round transportation matrix to feasible set
Usage
round_transport_matrix(transport_matrix, mass_x, mass_y)
Arguments
transport_matrix | A transportation matrix returned by an approximate method |
mass_x | The distribution of the first margin |
mass_y | The distribution of the second margin |
Value
Returns a transportation matrix projected to the feasible set.
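One well-known way to project an approximate plan onto the feasible set is the rounding step of Altschuler, Weed, and Rigollet: shrink rows and columns that carry too much mass, then spread the remaining deficit as a rank-one correction. The base R sketch below illustrates that scheme; whether round_transport_matrix uses exactly this procedure is an assumption, `round_to_feasible_sketch` is a hypothetical name, and the input is assumed to have strictly positive row and column sums.

```r
# Hypothetical helper (not part of approxOT): round a nonnegative matrix so
# that its row sums equal mass_x and its column sums equal mass_y.
round_to_feasible_sketch <- function(transport_matrix, mass_x, mass_y) {
  # Scale down rows whose sums exceed the target first margin.
  row_scale <- pmin(mass_x / rowSums(transport_matrix), 1)
  scaled <- transport_matrix * row_scale  # row_scale[i] multiplies row i
  # Scale down columns whose sums exceed the target second margin.
  col_scale <- pmin(mass_y / colSums(scaled), 1)
  scaled <- sweep(scaled, 2, col_scale, `*`)
  # Distribute the missing mass with a rank-one correction.
  err_x <- mass_x - rowSums(scaled)
  err_y <- mass_y - colSums(scaled)
  total_err <- sum(abs(err_x))
  if (total_err > 0) {
    scaled <- scaled + tcrossprod(err_x, err_y) / total_err
  }
  scaled
}

approx_plan <- matrix(c(0.2, 0.3, 0.1, 0.3), nrow = 2, ncol = 2)
mass_x <- c(0.5, 0.5)
mass_y <- c(0.5, 0.5)
rounded <- round_to_feasible_sketch(approx_plan, mass_x, mass_y)
```

Because both scaling steps only shrink entries, the deficits `err_x` and `err_y` are nonnegative, so the correction keeps the matrix nonnegative while making both margins exact.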
Return the dual potentials for the Sinkhorn distance
Description
Return the dual potentials for the Sinkhorn distance
Usage
sinkhorn_pot(mass_x, mass_y, p = 2, cost = NULL, cost_a = NULL, cost_b = NULL, ...)
Arguments
mass_x | The empirical distribution of the first sample |
mass_y | The empirical distribution of the second sample |
p | The power to raise the cost by |
cost | The cost matrix between first and second samples |
cost_a | The cost matrix for the first sample |
cost_b | The cost matrix for the second sample |
... | Additional arguments including
|
Value
A list with slots "f" and "g", the potentials of the rows and margins, respectively.
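To make the role of the two potentials concrete, the following base R sketch runs plain log-domain Sinkhorn updates and returns "f" and "g". It is an illustration under stated assumptions, not the package's solver: `sinkhorn_potentials_sketch`, `log_sum_exp`, and the regularization parameter `lambda` are hypothetical names, the cost matrix is assumed to be already raised to the power p, and all masses are assumed strictly positive.

```r
# Numerically stable log(sum(exp(v))).
log_sum_exp <- function(v) {
  m <- max(v)
  m + log(sum(exp(v - m)))
}

# Hypothetical helper (not part of approxOT): entropic dual potentials such
# that P[i, j] = mass_x[i] * mass_y[j] * exp((f[i] + g[j] - cost[i, j]) / lambda).
sinkhorn_potentials_sketch <- function(mass_x, mass_y, cost,
                                       lambda = 0.1, niter = 500) {
  f <- numeric(length(mass_x))
  g <- numeric(length(mass_y))
  log_a <- log(mass_x)
  log_b <- log(mass_y)
  for (it in seq_len(niter)) {
    # Update f so that the row sums of P match mass_x.
    f <- -lambda * apply(cost, 1, function(c_i) log_sum_exp(log_b + (g - c_i) / lambda))
    # Update g so that the column sums of P match mass_y.
    g <- -lambda * apply(cost, 2, function(c_j) log_sum_exp(log_a + (f - c_j) / lambda))
  }
  list(f = f, g = g)
}

x_pts <- c(0, 0.5, 1)
y_pts <- c(0.25, 0.75)
cost_sq <- outer(x_pts, y_pts, function(u, v) (u - v)^2)
mass_a <- rep(1 / 3, 3)
mass_b <- rep(1 / 2, 2)
pot <- sinkhorn_potentials_sketch(mass_a, mass_b, cost_sq)
# Recover the entropic plan from the potentials.
plan_sk <- outer(mass_a, mass_b) *
  exp((outer(pot$f, pot$g, `+`) - cost_sq) / lambda_used <- 0.1)
```

Here "f" has one entry per row of the cost matrix and "g" one entry per column; the last lines show how the two potentials together determine the regularized transport plan.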
Function returning supported optimal transportation methods.
Description
Function returning supported optimal transportation methods.
Usage
transport_options()
Details
The currently supported methods are
exact, networkflow: Utilize the networkflow algorithm to solve the exact optimal transport problem
shortsimplex: Use the shortsimplex algorithm to solve the exact optimal transport problem
sinkhorn: Use Sinkhorn's algorithm to solve the approximate optimal transport problem
sinkhorn_log: Use Sinkhorn's algorithm on a log-scale for added stability to solve the approximate optimal transport problem
greenkhorn: Use the Greenkhorn algorithm to solve the approximate optimal transport problem
hilbert: Use Hilbert sorting to perform approximate optimal transport
rank: Use the average covariate ranks to perform approximate optimal transport
univariate: Use appropriate optimal transport methods for univariate data
swapping: Utilize the swapping algorithm to perform approximate optimal transport
sliced: Use the sliced optimal transport distance
Value
Returns a vector of supported transport methods
Optimal transport plans
Description
Optimal transport plans
Usage
transport_plan(X, Y, a = NULL, b = NULL, p = 2, ground_p = 2, observation.orientation = c("rowwise", "colwise"), method = transport_options(), ...)
Arguments
X | The covariate data of the first sample. |
Y | The covariate data of the second sample. |
a | Optional. Empirical measure of the first sample |
b | Optional. Empirical measure of the second sample |
p | The power of the Wasserstein distance |
ground_p | The power of the Lp norm |
observation.orientation | Are observations by row ("rowwise") or column ("colwise"). |
method | Which transportation method to use. See transport_options(). |
... | Additional arguments for various methods
|
Value
a list with slots "tplan" and "cost". "tplan" is the optimal transport plan and "cost" is the optimal transport distance.
Examples
set.seed(203987)
n <- 100
d <- 10
x <- matrix(stats::rnorm(d*n), nrow=d, ncol=n)
y <- matrix(stats::rnorm(d*n), nrow=d, ncol=n)
#get hilbert sort orders for x in backwards way
transx <- transport_plan(X=x, Y=x, ground_p = 2, p = 2,
                         observation.orientation = "colwise",
                         method = "hilbert")
Optimal transport plans given a pre-specified cost
Description
Optimal transport plans given a pre-specified cost
Usage
transport_plan_given_C(mass_x, mass_y, p = 2, cost = NULL, method = "exact", cost_a = NULL, cost_b = NULL, ...)
Arguments
mass_x | The empirical measure of the first sample |
mass_y | The empirical measure of the second sample. |
p | The power of the Wasserstein distance |
cost | Specify the cost matrix in advance. |
method | The transportation method to use, one of "exact", "networkflow", "shortsimplex", "sinkhorn", "greenkhorn" |
cost_a | The cost matrix for the first sample with itself. Only used for unbiased Sinkhorn |
cost_b | The cost matrix for the second sample with itself. Only used for unbiased Sinkhorn |
... | Additional arguments for various methods
|
Value
A transportation plan as an object of class "transport.plan", which is a list with slots "from", "to", and "mass".
Examples
n <- 32
d <- 5
set.seed(293897)
A <- matrix(stats::rnorm(n*d), nrow=d, ncol=n)
B <- matrix(stats::rnorm(n*d), nrow=d, ncol=n)
transp.meth <- "sinkhorn"
niter <- 1e2
test <- transport_plan_given_C(rep(1/n,n), rep(1/n,n), 2,
                               cost = cost_calc(A,B,2),
                               "sinkhorn", niter = niter)
Multimarginal optimal transport plans
Description
Multimarginal optimal transport plans
Usage
transport_plan_multimarg(..., p = 2, ground_p = 2, observation.orientation = c("rowwise", "colwise"), method = c("hilbert", "univariate", "sliced"), nsim = 1000)
Arguments
... | Either data matrices as separate arguments or a list of data matrices. Arguments after the data must be specified by name. |
p | The power of the Wasserstein distance to use |
ground_p | The power of the Euclidean distance to use |
observation.orientation | Are observations by rows or columns |
method | One of "hilbert", "univariate", or "sliced" |
nsim | Number of simulations to use for the sliced method |
Value
transport plan
Examples
set.seed(23423)
n <- 100
d <- 10
p <- ground_p <- 2 #euclidean cost, p = 2
x <- matrix(stats::rnorm((n + 11)*d), n + 11, d)
y <- matrix(stats::rnorm(n*d), n, d)
z <- matrix(stats::rnorm((n + 455)*d), n + 455, d)
# make data a list
data <- list(x,y,z)
tplan <- transport_plan_multimarg(data, p = p, ground_p = ground_p,
                                  observation.orientation = "rowwise",
                                  method = "hilbert")
#transpose data works too
datat <- lapply(data, t)
tplan2 <- transport_plan_multimarg(datat, p = p, ground_p = ground_p,
                                   observation.orientation = "colwise",
                                   method = "hilbert")
Calculate the Wasserstein distance
Description
Calculate the Wasserstein distance
Usage
wasserstein(X = NULL, Y = NULL, a = NULL, b = NULL, cost = NULL, tplan = NULL, p = 2, ground_p = 2, method = transport_options(), cost_a = NULL, cost_b = NULL, ...)
Arguments
X | The covariate data of the first sample. |
Y | The covariate data of the second sample. |
a | Optional. Empirical measure of the first sample |
b | Optional. Empirical measure of the second sample |
cost | Specify the cost matrix in advance. |
tplan | A transportation plan with slots "from", "to", and "mass", like that returned by the transport_plan() function. |
p | The power of the Wasserstein distance |
ground_p | The power of the Lp norm |
method | Which transportation method to use. See transport_options(). |
cost_a | The cost matrix for the first sample with itself. Only used for unbiased Sinkhorn |
cost_b | The cost matrix for the second sample with itself. Only used for unbiased Sinkhorn |
... | Additional arguments for various methods:
|
Value
The p-Wasserstein distance, a numeric value
Examples
set.seed(11289374)
n <- 100
z <- stats::rnorm(n)
w <- stats::rnorm(n)
uni <- approxOT::wasserstein(X = z, Y = w, p = 2, ground_p = 2,
                             observation.orientation = "colwise",
                             method = "univariate")
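For two univariate samples of equal size with uniform weights, the p-Wasserstein distance has a closed form: sort both samples and average the p-th powers of the differences between matched order statistics. The base R sketch below computes that quantity as a reference point; `wasserstein_1d_sketch` is a hypothetical helper, and agreement with the package's "univariate" method is expected under these assumptions but is not verified here.

```r
# Hypothetical helper (not part of approxOT): p-Wasserstein distance between
# two equal-size univariate samples with uniform weights.
wasserstein_1d_sketch <- function(z, w, p = 2) {
  stopifnot(length(z) == length(w))
  mean(abs(sort(z) - sort(w))^p)^(1 / p)
}

set.seed(11289374)
n <- 100
z <- stats::rnorm(n)
w <- stats::rnorm(n)
uni_sketch <- wasserstein_1d_sketch(z, w, p = 2)
```

Sorting works here because, in one dimension, the optimal plan for a convex cost pairs the k-th smallest point of one sample with the k-th smallest point of the other.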