A C++ header-only library of statistical distribution functions.
kthohr/stats
StatsLib is a templated C++ library of statistical distribution functions, featuring unique compile-time computing capabilities and seamless integration with several popular linear algebra libraries (Armadillo, Blaze, and Eigen).
Features:
- A header-only library of probability density functions, cumulative distribution functions, quantile functions, and random sampling methods.
- Functions are written in C++11 `constexpr` format, enabling the library to operate as both a compile-time and run-time computation engine.
- Designed with a simple R-like syntax.
- Optional vector-matrix functionality with wrappers to support:
  - STL vectors (`std::vector`)
  - Armadillo
  - Blaze
  - Eigen
- Matrix-based operations are parallelizable with OpenMP.
- Released under a permissive, non-GPL license.
- Distributions
- Installation
- Documentation
- Jupyter Notebook
- Options
- Syntax and Examples
- Compile-time Computation Capabilities
- Author and License
Functions to compute the pdf, cdf, and quantile function, as well as random sampling methods, are available for the following distributions:
- Bernoulli
- Beta
- Binomial
- Cauchy
- Chi-squared
- Exponential
- F
- Gamma
- Inverse-Gamma
- Inverse-Gaussian
- Laplace
- Logistic
- Log-Normal
- Normal (Gaussian)
- Poisson
- Rademacher
- Student's t
- Uniform
- Weibull
In addition, pdf and random sampling functions are available for several multivariate distributions:
- Inverse-Wishart
- Multivariate Normal
- Wishart
StatsLib is a header-only library. Simply add the header files to your project using

```cpp
#include "stats.hpp"
```

The only dependency is the latest version of GCEM and a C++11-compatible compiler.
Full documentation is available online:
A PDF version of the documentation is available here.
You can test the library online using an interactive Jupyter notebook:
The following options should be declared before including the StatsLib header files.
- For inline-only functionality (i.e., no `constexpr` specifiers):
  ```cpp
  #define STATS_GO_INLINE
  ```
- OpenMP functionality is enabled by default if the `_OPENMP` macro is detected (e.g., by invoking `-fopenmp` with GCC or Clang). To explicitly enable OpenMP features, use:
  ```cpp
  #define STATS_USE_OPENMP
  ```
- To disable OpenMP functionality:
  ```cpp
  #define STATS_DONT_USE_OPENMP
  ```
- To use StatsLib with Armadillo, Blaze or Eigen:
  ```cpp
  #define STATS_ENABLE_ARMA_WRAPPERS
  #define STATS_ENABLE_BLAZE_WRAPPERS
  #define STATS_ENABLE_EIGEN_WRAPPERS
  ```
- To enable wrappers for `std::vector`:
  ```cpp
  #define STATS_ENABLE_STDVEC_WRAPPERS
  ```
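Because these macros are read while the header is being preprocessed, they only take effect if they are defined before the `#include`. The sketch below shows one common way such a switch can be wired up; the names `MYLIB_GO_INLINE`, `MYLIB_CONSTEXPR`, `mylib_square`, and `mylib_constexpr_build` are hypothetical and are not StatsLib's internals.

```cpp
// Hypothetical option: if MYLIB_GO_INLINE is defined *before* this point,
// functions are declared inline instead of constexpr (mirroring the
// description of STATS_GO_INLINE above).
#ifdef MYLIB_GO_INLINE
  #define MYLIB_CONSTEXPR inline
#else
  #define MYLIB_CONSTEXPR constexpr
#endif

// A function whose specifier depends on the option chosen above.
// C++11 constexpr rules: the body is a single return statement.
MYLIB_CONSTEXPR double mylib_square(double x) { return x * x; }

// Reports which configuration this translation unit was compiled with.
MYLIB_CONSTEXPR bool mylib_constexpr_build() {
#ifdef MYLIB_GO_INLINE
    return false;
#else
    return true;
#endif
}
```

Defining `MYLIB_GO_INLINE` after this block has no effect on it, which is the same reason the StatsLib options must come before the header.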
Functions are called using an R-like syntax. Some general rules:
- density functions: `stats::d*`. For example, the Normal (Gaussian) density is called using
  ```cpp
  stats::dnorm(<value>,<mean parameter>,<standard deviation>);
  ```
- cumulative distribution functions: `stats::p*`. For example, the Gamma CDF is called using
  ```cpp
  stats::pgamma(<value>,<shape parameter>,<scale parameter>);
  ```
- quantile functions: `stats::q*`. For example, the Beta quantile is called using
  ```cpp
  stats::qbeta(<value>,<a parameter>,<b parameter>);
  ```
- random sampling: `stats::r*`. For example, to generate a single draw from the Logistic distribution:
  ```cpp
  stats::rlogis(<location parameter>,<scale parameter>,<seed value or random number engine>);
  ```
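To make the `d`/`p`/`q`/`r` prefixes concrete, here is a standard-library-only sketch of the four Normal functions, using the same argument order as the StatsLib calls listed above. The `my_` names are hypothetical; `my_qnorm` uses plain bisection purely for illustration (not necessarily how StatsLib computes quantiles), and `my_rnorm` relies on `std::normal_distribution`, so its draws will not match `stats::rnorm`.

```cpp
#include <cmath>
#include <random>

// d*: density, arguments (value, mean, standard deviation).
double my_dnorm(double x, double mu, double sigma) {
    const double pi = 3.14159265358979323846;
    const double z = (x - mu) / sigma;
    return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * pi));
}

// p*: CDF, written via the complementary error function.
double my_pnorm(double x, double mu, double sigma) {
    return 0.5 * std::erfc(-(x - mu) / (sigma * std::sqrt(2.0)));
}

// q*: quantile, obtained by inverting the CDF with bisection
// (simple and robust, but slower than specialized methods).
double my_qnorm(double p, double mu, double sigma) {
    double lo = mu - 40.0 * sigma;
    double hi = mu + 40.0 * sigma;
    for (int i = 0; i < 200; ++i) {
        const double mid = 0.5 * (lo + hi);
        if (my_pnorm(mid, mu, sigma) < p) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

// r*: random draw, arguments (mean, standard deviation, seed value).
double my_rnorm(double mu, double sigma, unsigned long long seed) {
    std::mt19937_64 engine(seed);
    std::normal_distribution<double> dist(mu, sigma);
    return dist(engine);
}
```

For example, `my_pnorm(1.0, 0.0, 1.0)` evaluates the standard Normal CDF at 1 (about 0.8413), the same quantity as `stats::pnorm(1.0,0.0,1.0)`.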
All of these functions have matrix-based equivalents using Armadillo, Blaze, and Eigen dense matrices.
- The pdf, cdf, and quantile functions can take matrix-valued arguments. For example,
  ```cpp
  // Using Armadillo:
  arma::mat norm_pdf_vals = stats::dnorm(arma::ones(10,20),1.0,2.0);
  ```
- The randomization functions (`r*`) can output random matrices of arbitrary size. For example, the following code will generate a 100-by-50 matrix of iid draws from a Gamma(3,2) distribution:
  ```cpp
  // Armadillo:
  arma::mat gamma_rvs = stats::rgamma<arma::mat>(100,50,3.0,2.0);
  // Blaze:
  blaze::DynamicMatrix<double> gamma_rvs = stats::rgamma<blaze::DynamicMatrix<double>>(100,50,3.0,2.0);
  // Eigen:
  Eigen::MatrixXd gamma_rvs = stats::rgamma<Eigen::MatrixXd>(100,50,3.0,2.0);
  ```
- All matrix-based operations are parallelizable with OpenMP. For GCC and Clang compilers, simply include the `-fopenmp` option during compilation.
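The matrix examples above rely on Armadillo, Blaze, or Eigen, but the underlying idea can be sketched with the standard library alone. `ColMajorMatrix`, `gamma_matrix`, and `dnorm_elementwise` are hypothetical helpers, not StatsLib API. `gamma_matrix` fills a column-major buffer with iid draws from `std::gamma_distribution`, whose parameters are (shape, scale), the same order as `<shape parameter>,<scale parameter>` in `stats::pgamma`. The sampling loop stays serial because a single `std::mt19937_64` is not safe to share across threads; the density loop has no shared state, so an OpenMP `parallel for` applies directly when compiled with `-fopenmp` (without it, the pragma is ignored).

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// rows-by-cols matrix stored column-major in a flat vector
// (the default layout of Armadillo and Eigen).
struct ColMajorMatrix {
    std::size_t rows, cols;
    std::vector<double> data;
    double& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
};

// iid Gamma(shape, scale) draws; serial because one engine is not thread-safe.
ColMajorMatrix gamma_matrix(std::size_t rows, std::size_t cols,
                            double shape, double scale, unsigned long long seed) {
    ColMajorMatrix m{rows, cols, std::vector<double>(rows * cols)};
    std::mt19937_64 engine(seed);
    std::gamma_distribution<double> dist(shape, scale);
    for (std::size_t k = 0; k < m.data.size(); ++k) m.data[k] = dist(engine);
    return m;
}

// Elementwise Normal density; every element is independent,
// so the loop can be split across OpenMP threads.
ColMajorMatrix dnorm_elementwise(const ColMajorMatrix& x, double mu, double sigma) {
    const double pi = 3.14159265358979323846;
    ColMajorMatrix out{x.rows, x.cols, std::vector<double>(x.data.size())};
    const long n = static_cast<long>(x.data.size());
#pragma omp parallel for
    for (long k = 0; k < n; ++k) {
        const double z = (x.data[k] - mu) / sigma;
        out.data[k] = std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * pi));
    }
    return out;
}
```

Calling `dnorm_elementwise` on a 10-by-20 matrix of ones with `mu = 1.0` and `sigma = 2.0` mirrors the Armadillo `stats::dnorm(arma::ones(10,20),1.0,2.0)` example and yields 1/(2√(2π)) ≈ 0.1995 in every entry.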
Random number seeding is available in two forms: seed values and random number engines.
- Seed values are passed as unsigned integers. For example, to generate a draw from a normal distribution N(1,2) (mean 1, standard deviation 2) with seed value 1776:
  ```cpp
  stats::rnorm(1,2,1776);
  ```
- Random engines in StatsLib use the 64-bit Mersenne-Twister generator (`std::mt19937_64`) and are passed by reference. Example:
  ```cpp
  std::mt19937_64 engine(1776);
  stats::rnorm(1,2,engine);
  ```
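The practical difference between the two seeding styles can be sketched with the standard library alone. The helper names are hypothetical and `std::normal_distribution` stands in for StatsLib's own sampler, so the numbers will differ from `stats::rnorm`; what carries over is the behavior: a fixed seed value reproduces the same draw on every call, while an engine passed by reference advances its state, so consecutive calls return different draws.

```cpp
#include <random>

// Seed-value style: a fresh engine is built from the seed on every call,
// so the same seed always yields the same draw.
double draw_with_seed(double mu, double sigma, unsigned long long seed) {
    std::mt19937_64 engine(seed);
    std::normal_distribution<double> dist(mu, sigma);
    return dist(engine);
}

// Engine style: the caller's engine is taken by reference and its state
// advances, so successive calls produce a stream of different draws.
double draw_with_engine(double mu, double sigma, std::mt19937_64& engine) {
    std::normal_distribution<double> dist(mu, sigma);
    return dist(engine);
}
```

Re-creating an engine with the same seed restarts the stream, so a whole sequence of draws is reproducible, not just a single one.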
More examples with code:
```cpp
// evaluate the normal PDF at x = 1, mu = 0, sigma = 1
double dval_1 = stats::dnorm(1.0,0.0,1.0);

// evaluate the normal PDF at x = 1, mu = 0, sigma = 1, and return the log value
double dval_2 = stats::dnorm(1.0,0.0,1.0,true);

// evaluate the normal CDF at x = 1, mu = 0, sigma = 1
double pval = stats::pnorm(1.0,0.0,1.0);

// evaluate the Laplacian quantile at p = 0.1, mu = 0, sigma = 1
double qval = stats::qlaplace(0.1,0.0,1.0);

// draw from a t-distribution with dof = 30
double rval = stats::rt(30);

// matrix output
arma::mat beta_rvs = stats::rbeta<arma::mat>(100,100,3.0,2.0);

// matrix input
arma::mat beta_cdf_vals = stats::pbeta(beta_rvs,3.0,2.0);
```
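The `true` argument in `stats::dnorm(1.0,0.0,1.0,true)` requests the log density. Working on the log scale matters when many densities are combined, as in a likelihood: a product of small densities underflows to zero long before the corresponding sum of logs loses precision. A standard-library sketch of the closed-form Normal log density and a log-likelihood built from it (`normal_log_pdf` and `normal_log_likelihood` are hypothetical names):

```cpp
#include <cmath>
#include <vector>

// log N(x | mu, sigma) = -0.5*log(2*pi) - log(sigma) - 0.5*((x - mu)/sigma)^2
double normal_log_pdf(double x, double mu, double sigma) {
    const double pi = 3.14159265358979323846;
    const double z = (x - mu) / sigma;
    return -0.5 * std::log(2.0 * pi) - std::log(sigma) - 0.5 * z * z;
}

// Sum of log densities: stays finite where the product of densities would underflow.
double normal_log_likelihood(const std::vector<double>& xs, double mu, double sigma) {
    double total = 0.0;
    for (double x : xs) total += normal_log_pdf(x, mu, sigma);
    return total;
}
```

For `x = 1`, `mu = 0`, `sigma = 1` the log density is -0.5·log(2π) - 0.5 ≈ -1.4189, the value `dval_2` above is expected to hold.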
StatsLib is designed to operate equally well as a compile-time computation engine. Compile-time computation allows the compiler to replace function calls (e.g., `dnorm(0,0,1)`) with constant values in the compiled program. That is, functions are evaluated during the compilation process, rather than at run-time. This capability is made possible by the templated `constexpr` design of the library and can be verified by inspecting the assembly code generated by the compiler.

The compile-time features are enabled using the `constexpr` specifier. The example below computes the pdf, cdf, and quantile function of the Laplace distribution.
```cpp
#include "stats.hpp"

int main()
{
    constexpr double dens_1  = stats::dlaplace(1.0,1.0,2.0); // answer = 0.25
    constexpr double prob_1  = stats::plaplace(1.0,1.0,2.0); // answer = 0.5
    constexpr double quant_1 = stats::qlaplace(0.1,1.0,2.0); // answer = -2.218875...

    return 0;
}
```
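Assuming the second and third arguments of the Laplace functions are the location $\mu$ and scale $b$ (an assumption, though the values below check out), the commented answers follow from the closed-form pdf, cdf, and quantile:

$$
\begin{aligned}
f(x) &= \frac{1}{2b}\exp\!\left(-\frac{|x-\mu|}{b}\right), \\
F(x) &= \begin{cases} \tfrac{1}{2}\exp\!\left(\frac{x-\mu}{b}\right), & x < \mu, \\ 1 - \tfrac{1}{2}\exp\!\left(-\frac{x-\mu}{b}\right), & x \ge \mu, \end{cases} \\
Q(p) &= \begin{cases} \mu + b\ln(2p), & p \le \tfrac{1}{2}, \\ \mu - b\ln\bigl(2(1-p)\bigr), & p > \tfrac{1}{2}. \end{cases}
\end{aligned}
$$

With $\mu = 1$ and $b = 2$:

$$
f(1) = \frac{1}{4} = 0.25, \qquad F(1) = 1 - \frac{1}{2} = 0.5, \qquad Q(0.1) = 1 + 2\ln(0.2) \approx 1 - 3.218876 = -2.218876.
$$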
Assembly code generated by Clang without any optimization:
```asm
LCPI0_0:
    .quad -4611193153885729483       ## double -2.2188758248682015
LCPI0_1:
    .quad 4602678819172646912        ## double 0.5
LCPI0_2:
    .quad 4598175219545276417        ## double 0.25000000000000006
    .section __TEXT,__text,regular,pure_instructions
    .globl _main
    .p2align 4, 0x90
_main:                                ## @main
    push rbp
    mov rbp, rsp
    xor eax, eax
    movsd xmm0, qword ptr [rip + LCPI0_0]   ## xmm0 = mem[0],zero
    movsd xmm1, qword ptr [rip + LCPI0_1]   ## xmm1 = mem[0],zero
    movsd xmm2, qword ptr [rip + LCPI0_2]   ## xmm2 = mem[0],zero
    mov dword ptr [rbp - 4], 0
    movsd qword ptr [rbp - 16], xmm2
    movsd qword ptr [rbp - 24], xmm1
    movsd qword ptr [rbp - 32], xmm0
    pop rbp
    ret
```
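Besides reading the assembly, a `static_assert` offers a compiler-checked way to confirm compile-time evaluation, since it only accepts constant expressions. `std::exp` is not `constexpr` in C++11 (the gap GCEM fills for StatsLib), so the sketch below substitutes a hypothetical truncated Taylor series, `ce_exp`, which is only accurate for small |x|. The `ce_` functions are illustrative stand-ins written as single-return C++11 `constexpr` functions; they are not GCEM's or StatsLib's algorithms.

```cpp
// |x| usable in constant expressions (std::abs is not constexpr in C++11).
constexpr double ce_abs(double x) { return x < 0.0 ? -x : x; }

// Hypothetical exp: Taylor series summed up to the x^40/40! term,
// adequate only for small |x| (roughly |x| <= 5).
constexpr double ce_exp_series(double x, int n, double term, double sum) {
    return n > 40 ? sum
                  : ce_exp_series(x, n + 1, term * x / n, sum + term * x / n);
}
constexpr double ce_exp(double x) { return ce_exp_series(x, 1, 1.0, 1.0); }

// Laplace density and CDF with location mu and scale b.
constexpr double ce_dlaplace(double x, double mu, double b) {
    return ce_exp(-ce_abs(x - mu) / b) / (2.0 * b);
}
constexpr double ce_plaplace(double x, double mu, double b) {
    return x < mu ? 0.5 * ce_exp((x - mu) / b)
                  : 1.0 - 0.5 * ce_exp(-(x - mu) / b);
}

// These lines only compile if the values are computed by the compiler.
constexpr double dens_1 = ce_dlaplace(1.0, 1.0, 2.0);
constexpr double prob_1 = ce_plaplace(1.0, 1.0, 2.0);
static_assert(dens_1 == 0.25, "density at the location equals 1/(2b)");
static_assert(prob_1 == 0.5, "CDF at the location equals 1/2");
```

Swapping `constexpr` for a function the compiler cannot evaluate at compile time (for instance, one calling `std::exp`) turns both `static_assert` lines into compile errors, which is the property the assembly listing demonstrates indirectly.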
Author: Keith O'Hara

License: Apache Version 2