KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
Please visit our website for documentation, contribution guidelines and tutorials.
The KeOps library lets you compute reductions of large arrays whose entries are given by a mathematical formula or a neural network. It combines efficient C++ routines with an automatic differentiation engine and can be used with Python (NumPy, PyTorch), Matlab and R.
It is perfectly suited to the computation of kernel matrix-vector products, K-nearest neighbors queries, N-body interactions, point cloud convolutions and the associated gradients. Crucially, it performs well even when the corresponding kernel or distance matrices do not fit into the RAM or GPU memory. Compared with a PyTorch GPU baseline, KeOps provides a x10-x100 speed-up on a wide range of geometric applications, from kernel methods to geometric deep learning.
Why use KeOps? Math libraries represent most objects as matrices and tensors:
(a) Dense matrices. Variables are often encoded as dense numerical arrays of coefficients M[i, j]. This representation is convenient and well-supported, but also puts a heavy load on the memory of our computers. Unfortunately, large arrays are expensive to move around and may not even fit in RAM or GPU memories.
In practice, this means that a majority of scientific programs are memory-bound. Run times for most neural networks and mathematical computations are not limited by the raw capabilities of our CPUs and CUDA cores, but by the time-consuming transfers of large arrays from memory circuits to arithmetic computing units.
(b) Sparse matrices. To work around this problem, a common solution is to rely on sparse matrices: tensors that have few non-zero coefficients. We represent these objects using lists of indices (i_n, j_n) and values M_n = M[i_n, j_n] that correspond to a small number of non-zero entries. Matrix-vector operations are then implemented with indexing methods and scattered memory accesses.

This method is elegant and allows us to represent large arrays with a small memory footprint. But unfortunately, it does not stream well on GPUs: parallel computing devices are wired to perform block-wise memory accesses and have a hard time dealing with lists of random indices (i_n, j_n). As a consequence, when compared with dense arrays, sparse encodings only speed up computations for matrices that have less than 1% non-zero coefficients. This restrictive condition prevents sparse matrices from being very useful outside of graph and mesh processing.
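For concreteness, here is a minimal sketch of this storage scheme in PyTorch's COO sparse format; the sizes and values below are hypothetical, chosen only to illustrate the indices/values layout:

```python
import torch

# A 3x3 matrix with only three non-zero entries, stored as
# lists of indices (i_n, j_n) and values M_n = M[i_n, j_n]:
indices = torch.tensor([[0, 1, 2],   # row indices    i_n
                        [2, 0, 1]])  # column indices j_n
values = torch.tensor([1.0, 2.0, 3.0])  # non-zero values M_n
M = torch.sparse_coo_tensor(indices, values, size=(3, 3))

# Matrix-vector products rely on indexing and scattered memory accesses:
v = torch.randn(3, 1)
result = torch.sparse.mm(M, v)  # Dense (3, 1) result
```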
(c) Symbolic matrices. KeOps provides another solution to speed up tensor programs. Our key remark is that most of the large arrays that are used in machine learning and applied mathematics share a common mathematical structure. Distance matrices, kernel matrices, point cloud convolutions and attention layers can all be described as symbolic tensors: given two collections of vectors (x_i) and (y_j), their coefficients M[i, j] at location (i, j) are given by mathematical formulas F(x_i, y_j) that are evaluated on data samples x_i and y_j.
These objects are not "sparse" in the traditional sense... but can nevertheless be described efficiently using a mathematical formula F and relatively small data arrays (x_i) and (y_j). The main purpose of the KeOps library is to provide support for this abstraction with all the perks of a deep learning library:
- A transparent interface with CPU and GPU integration.
- Numerous tutorials and benchmarks.
- Full support for automatic differentiation, batch processing and approximate computations.
In practice, KeOps symbolic tensors are both fast and memory-efficient. We take advantage of the structure of CUDA registers to bypass costly memory transfers between arithmetic and memory circuits. This allows us to provide a x10-x100 speed-up to PyTorch GPU programs in a wide range of settings.
Using our Python interface, a typical sample of code looks like:
```python
# Create two arrays with 3 columns and a (huge) number of lines, on the GPU
import torch  # NumPy, Matlab and R are also supported

M, N, D = 1000000, 2000000, 3
x = torch.randn(M, D, requires_grad=True).cuda()  # x.shape = (1e6, 3)
y = torch.randn(N, D).cuda()                      # y.shape = (2e6, 3)

# Turn our dense Tensors into KeOps symbolic variables with "virtual"
# dimensions at positions 0 and 1 (for "i" and "j" indices):
from pykeops.torch import LazyTensor

x_i = LazyTensor(x.view(M, 1, D))  # x_i.shape = (1e6, 1, 3)
y_j = LazyTensor(y.view(1, N, D))  # y_j.shape = ( 1, 2e6, 3)

# We can now perform large-scale computations, without memory overflows:
D_ij = ((x_i - y_j) ** 2).sum(dim=2)  # Symbolic (1e6, 2e6, 1) matrix of squared distances
K_ij = (-D_ij).exp()                  # Symbolic (1e6, 2e6, 1) Gaussian kernel matrix

# We come back to vanilla PyTorch Tensors or NumPy arrays using
# reduction operations such as .sum(), .logsumexp() or .argmin()
# on one of the two "symbolic" dimensions 0 and 1.
# Here, the kernel density estimation a_i = sum_j exp(-|x_i-y_j|^2)
# is computed using a CUDA scheme that has a linear memory footprint and
# outperforms standard PyTorch implementations by two orders of magnitude.
a_i = K_ij.sum(dim=1)  # Genuine torch.cuda.FloatTensor, a_i.shape = (1e6, 1)

# Crucially, KeOps fully supports automatic differentiation!
g_x = torch.autograd.grad((a_i ** 2).sum(), [x])
```
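For reference, a dense PyTorch implementation of the same kernel density estimation has to materialize the full (M, N) matrix of squared distances: at the sizes above, that is 10^6 x 2*10^6 float32 entries, i.e. roughly 8 TB of memory. The sketch below uses deliberately reduced, hypothetical sizes so that it actually runs:

```python
import torch

# Dense baseline: materializes the full (M, N) distance matrix in memory.
M, N, D = 1000, 2000, 3
x = torch.randn(M, D)
y = torch.randn(N, D)

D_ij = ((x[:, None, :] - y[None, :, :]) ** 2).sum(dim=2)  # Dense (M, N) matrix
a_i = (-D_ij).exp().sum(dim=1)  # (M,) kernel density estimation
```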
KeOps allows you to get the most out of your hardware without compromising on usability. It provides:
- Linear (instead of quadratic) memory footprint for numerous types of computations.
- Support for a wide range of mathematical formulas that can be composed at will.
- Seamless computation of derivatives and gradients, up to arbitrary orders.
- Sum, LogSumExp, Min, Max but also ArgMin, ArgMax or K-min reductions.
- A conjugate gradient solver for large-scale spline interpolation and Gaussian process regression (see the sketch after this list).
- Transparent integration with standard packages, such as the SciPy solvers for linear algebra.
- An interface for block-sparse and coarse-to-fine strategies.
- Support for multi-GPU configurations.
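As an illustration of the conjugate gradient solver mentioned in the list above, here is a minimal sketch of kernel interpolation in the style of the KeOps tutorials; the sizes, kernel and regularization value alpha are hypothetical:

```python
import torch
from pykeops.torch import LazyTensor

N, D = 10000, 3
x = torch.randn(N, D).cuda()  # sample locations
b = torch.randn(N, 1).cuda()  # target values

x_i = LazyTensor(x[:, None, :])  # (N, 1, D)
x_j = LazyTensor(x[None, :, :])  # (1, N, D)
K_xx = (-((x_i - x_j) ** 2).sum(dim=2)).exp()  # Symbolic (N, N) Gaussian kernel

# Solve (alpha * Id + K_xx) @ a = b with a kernel-aware conjugate gradient:
a = K_xx.solve(b, alpha=0.1)
```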
More details are provided below:
Symbolic matrices are to geometric learning what sparse matrices are to graph processing. KeOps can thus be used in a wide range of settings, from shape analysis (registration, geometric deep learning, optimal transport...) to machine learning (kernel methods, k-means, UMAP...), Gaussian processes, computational biology and physics.
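For instance, the K-nearest neighbors queries that underlie k-means and UMAP boil down to an .argKmin() reduction on a symbolic distance matrix; a minimal sketch with hypothetical sizes:

```python
import torch
from pykeops.torch import LazyTensor

M, N, D, K = 100000, 200000, 3, 10
x = torch.randn(M, D).cuda()  # query points
y = torch.randn(N, D).cuda()  # reference points

x_i = LazyTensor(x[:, None, :])  # (M, 1, D)
y_j = LazyTensor(y[None, :, :])  # (1, N, D)

D_ij = ((x_i - y_j) ** 2).sum(dim=2)  # Symbolic (M, N) squared distances
indices = D_ij.argKmin(K, dim=1)      # (M, K) indices of the K nearest neighbors
```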
KeOps provides core routines for the following projects and libraries:
- GPyTorch (from the universities of Cornell, Columbia and Pennsylvania) and Falkon (from the University of Genoa and the Sierra Inria team), two libraries for Gaussian process regression that now scale up to billion-scale datasets.
- Deformetrica, a computational anatomy software from the Aramis Inria team.
- The Gudhi library for topological data analysis and higher dimensional geometry understanding, from the DataShape Inria team.
- GeomLoss, a PyTorch package for Chamfer (Hausdorff) distances, Kernel (Sobolev) divergences and Earth Mover's (Wasserstein) distances. It provides optimal transport solvers that scale up to millions of samples in seconds.
- The deep graph matching consensus module, for learning and refining structural correspondences between graphs.
- FshapesTk and the Shapes toolbox, two research-oriented LDDMM toolkits.
- HyenaDNA, for parallel computations of the Vandermonde matrix multiplication kernel and reductions used in the S4D kernel.
This library is licensed under the permissive MIT license, which is fully compatible with both academic and commercial applications.
If you use this code in a research paper, please cite our original publication:
Charlier, B., Feydy, J., Glaunès, J. A., Collin, F.-D. & Durif, G. Kernel Operations on the GPU, with Autodiff, without Memory Overflows. Journal of Machine Learning Research 22, 1–6 (2021).
```bibtex
@article{JMLR:v22:20-275,
  author  = {Benjamin Charlier and Jean Feydy and Joan Alexis Glaunès and François-David Collin and Ghislain Durif},
  title   = {Kernel Operations on the GPU, with Autodiff, without Memory Overflows},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {74},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v22/20-275.html}
}
```

For applications to geometric (deep) learning, you may also consider our NeurIPS 2020 paper:
```bibtex
@article{feydy2020fast,
  title   = {Fast geometric learning with symbolic matrices},
  author  = {Feydy, Jean and Glaun{\`e}s, Joan and Charlier, Benjamin and Bronstein, Michael},
  journal = {Advances in Neural Information Processing Systems},
  volume  = {33},
  year    = {2020}
}
```

Please contact us for any bug report, question or feature request by filing a report on our GitHub issue tracker!
Core library - KeOps, PyKeOps, KeOpsLab:
- Benjamin Charlier, from the University of Montpellier.
- Jean Feydy, from the HeKA team (Inria Paris, Inserm, Université Paris Cité).
- Joan Alexis Glaunès, from MAP5 laboratory, Université Paris Cité.
R bindings - RKeOps:
- Ghislain Durif, from the University of Montpellier.
Contributors:
- François-David Collin, from the University of Montpellier: Tensordot operation, CI setup.
- Tanguy Lefort, from the University of Montpellier: conjugate gradient solver.
- Amélie Vernay and Chloé Serre-Combe, from the University of Montpellier: support for LazyTensors in RKeOps.
- Mauricio Diaz, from Inria Paris: CI setup.
- Benoît Martin, from the Aramis Inria team: multi-GPU support.
- Francis Williams, from New York University: maths operations.
- Kshiteej Kalambarkar, from Quansight: maths operations.
- Hugo Aguettaz, from ETH Zürich: trigonometric functions.
- D. J. Sutherland, from TTI-Chicago: bug fix in the Python package.
- David Völgyes, from the Norwegian Institute of Science and Technology: bug fix in the formula parser.
- Jean-Baptiste Keck, from the University Grenoble-Alpes: bug fix in the Python package.
Beyond explicit code contributions, KeOps has grown out of numerous discussions with applied mathematicians and machine learning experts. We would especially like to thank Alain Trouvé, Stanley Durrleman, Gabriel Peyré and Michael Bronstein for their valuable suggestions and financial support.
KeOps was awarded an open science prize by the French Ministry of Higher Education and Research in 2023 ("Espoir - Documentation").