
KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows


getkeops/keops


Please visit our website for documentation, contribution guidelines and tutorials.

Kernel Operations on the GPU, with autodiff, without memory overflows

The KeOps library lets you compute reductions of large arrays whose entries are given by a mathematical formula or a neural network. It combines efficient C++ routines with an automatic differentiation engine and can be used with Python (NumPy, PyTorch), Matlab and R.

It is perfectly suited to the computation of kernel matrix-vector products, K-nearest neighbors queries, N-body interactions, point cloud convolutions and the associated gradients. Crucially, it performs well even when the corresponding kernel or distance matrices do not fit into the RAM or GPU memory. Compared with a PyTorch GPU baseline, KeOps provides a x10-x100 speed-up on a wide range of geometric applications, from kernel methods to geometric deep learning.
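As a rough illustration of the memory problem that such queries run into, a K-nearest-neighbors search can be written in plain NumPy by streaming over blocks of the distance matrix instead of materializing all M x N entries at once. This is only a conceptual sketch: `knn_blockwise` and its `block` parameter are hypothetical names for this example, not part of the KeOps API.

```python
import numpy as np

def knn_blockwise(x, y, K=3, block=1024):
    """Indices of the K nearest neighbors in y for each point of x.
    Distances are computed block by block, so the full (M, N) distance
    matrix is never stored: the memory footprint is O(block * N)
    instead of O(M * N)."""
    M = x.shape[0]
    indices = np.empty((M, K), dtype=np.int64)
    for start in range(0, M, block):
        chunk = x[start:start + block]  # (b, D) slice of the query points
        # (b, N) tile of squared distances, recomputed on the fly:
        D_ij = ((chunk[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
        indices[start:start + block] = np.argsort(D_ij, axis=1)[:, :K]
    return indices

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 3))   # 100 query points in R^3
y = rng.standard_normal((200, 3))   # 200 reference points
nn = knn_blockwise(x, y, K=3)       # nn.shape = (100, 3)
```

KeOps applies the same "recompute tiles on the fly" idea, but fuses the formula into a single CUDA kernel so the tiles live in registers rather than in RAM.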

Symbolic matrices

Why use KeOps? Math libraries represent most objects as matrices and tensors:

  • (a) Dense matrices. Variables are often encoded as dense numerical arrays Mi,j = M[i,j]. This representation is convenient and well-supported, but also puts a heavy load on the memories of our computers. Unfortunately, large arrays are expensive to move around and may not even fit in RAM or GPU memories.

    In practice, this means that a majority of scientific programs are memory-bound. Run times for most neural networks and mathematical computations are not limited by the raw capabilities of our CPUs and CUDA cores, but by the time-consuming transfers of large arrays from memory circuits to arithmetic computing units.

  • (b) Sparse matrices. To work around this problem, a common solution is to rely on sparse matrices: tensors that have few non-zero coefficients. We represent these objects using lists of indices (in, jn) and values Mn = M[in, jn] that correspond to a small number of non-zero entries. Matrix-vector operations are then implemented with indexing methods and scattered memory accesses.

    This method is elegant and allows us to represent large arrays with a small memory footprint. But unfortunately, it does not stream well on GPUs: parallel computing devices are wired to perform block-wise memory accesses and have a hard time dealing with lists of random indices (in, jn). As a consequence, when compared with dense arrays, sparse encodings only speed up computations for matrices that have less than 1% non-zero coefficients. This restrictive condition prevents sparse matrices from being very useful outside of graph and mesh processing.

  • (c) Symbolic matrices. KeOps provides another solution to speed up tensor programs. Our key remark is that most of the large arrays that are used in machine learning and applied mathematics share a common mathematical structure. Distance matrices, kernel matrices, point cloud convolutions and attention layers can all be described as symbolic tensors: given two collections of vectors (xi) and (yj), their coefficients Mi,j at location (i, j) are given by mathematical formulas F(xi, yj) that are evaluated on data samples xi and yj.

    These objects are not "sparse" in the traditional sense... but can nevertheless be described efficiently using a mathematical formula F and relatively small data arrays (xi) and (yj). The main purpose of the KeOps library is to provide support for this abstraction with all the perks of a deep learning library:

    • A transparent interface with CPU and GPU integration.
    • Numerous tutorials and benchmarks.
    • Full support for automatic differentiation, batch processing and approximate computations.
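As a mental model of point (c) above (a pure NumPy sketch, not the KeOps implementation), a symbolic tensor K_ij = F(x_i, y_j) can be reduced without ever allocating the full M x N matrix: only the small data arrays x and y are stored, and tiles of K are regenerated from the formula on the fly. The name `gaussian_kernel_sum` is illustrative, not a KeOps function.

```python
import numpy as np

def gaussian_kernel_sum(x, y, block=512):
    """a_i = sum_j exp(-|x_i - y_j|^2), evaluated tile by tile.
    Each (M, block) tile of the symbolic matrix K_ij is computed from
    the formula, reduced, and immediately discarded, so memory usage
    stays linear in M + N instead of quadratic."""
    a = np.zeros(x.shape[0])
    for start in range(0, y.shape[0], block):
        y_blk = y[start:start + block]                                 # (b, D)
        D_ij = ((x[:, None, :] - y_blk[None, :, :]) ** 2).sum(axis=2)  # (M, b) tile
        a += np.exp(-D_ij).sum(axis=1)                                 # partial reduction
    return a

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 3))
y = rng.standard_normal((2000, 3))
a_i = gaussian_kernel_sum(x, y)  # a_i.shape = (100,)
```

KeOps pushes this scheme down to the CUDA level, keeping each tile in registers, and wraps it in an autodiff-aware `LazyTensor` interface.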

In practice, KeOps symbolic tensors are both fast and memory-efficient. We take advantage of the structure of CUDA registers to bypass costly memory transfers between arithmetic and memory circuits. This allows us to provide a x10-x100 speed-up to PyTorch GPU programs in a wide range of settings.

Using our Python interface, a typical sample of code looks like:

```python
# Create two arrays with 3 columns and a (huge) number of lines, on the GPU
import torch  # NumPy, Matlab and R are also supported

M, N, D = 1000000, 2000000, 3
x = torch.randn(M, D, requires_grad=True).cuda()  # x.shape = (1e6, 3)
y = torch.randn(N, D).cuda()                      # y.shape = (2e6, 3)

# Turn our dense Tensors into KeOps symbolic variables with "virtual"
# dimensions at positions 0 and 1 (for "i" and "j" indices):
from pykeops.torch import LazyTensor

x_i = LazyTensor(x.view(M, 1, D))  # x_i.shape = (1e6,   1, 3)
y_j = LazyTensor(y.view(1, N, D))  # y_j.shape = (  1, 2e6, 3)

# We can now perform large-scale computations, without memory overflows:
D_ij = ((x_i - y_j) ** 2).sum(dim=2)  # Symbolic (1e6,2e6,1) matrix of squared distances
K_ij = (-D_ij).exp()                  # Symbolic (1e6,2e6,1) Gaussian kernel matrix

# We come back to vanilla PyTorch Tensors or NumPy arrays using
# reduction operations such as .sum(), .logsumexp() or .argmin()
# on one of the two "symbolic" dimensions 0 and 1.
# Here, the kernel density estimation   a_i = sum_j exp(-|x_i-y_j|^2)
# is computed using a CUDA scheme that has a linear memory footprint and
# outperforms standard PyTorch implementations by two orders of magnitude.
a_i = K_ij.sum(dim=1)  # Genuine torch.cuda.FloatTensor, a_i.shape = (1e6, 1)

# Crucially, KeOps fully supports automatic differentiation!
g_x = torch.autograd.grad((a_i ** 2).sum(), [x])
```

KeOps allows you to get the most out of your hardware without compromising on usability. It provides:

  • Linear (instead of quadratic) memory footprint for numerous types of computations.
  • Support for a wide range of mathematical formulas that can be composed at will.
  • Seamless computation of derivatives and gradients, up to arbitrary orders.
  • Sum, LogSumExp, Min, Max but also ArgMin, ArgMax or K-min reductions.
  • A conjugate gradient solver for large-scale spline interpolation and Gaussian process regression.
  • Transparent integration with standard packages, such as the SciPy solvers for linear algebra.
  • An interface for block-sparse and coarse-to-fine strategies.
  • Support for multi-GPU configurations.
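To illustrate what a conjugate gradient solver of this kind does (a hypothetical NumPy sketch, not the KeOps interface), the kernel ridge regression system (K + ridge * I) a = b can be solved using only matrix-vector products with K, which is exactly the operation that KeOps evaluates with a linear memory footprint:

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-9, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A, given only
    the matrix-vector product x -> A @ x (the matrix A itself is never
    formed by this routine)."""
    x = np.zeros_like(b)
    r = b - matvec(x)      # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Gaussian process regression / spline interpolation with a ridge term:
rng = np.random.default_rng(0)
x_pts = rng.standard_normal((300, 3))
b = rng.standard_normal(300)
K = np.exp(-((x_pts[:, None, :] - x_pts[None, :, :]) ** 2).sum(axis=2))
a = conjugate_gradient(lambda v: K @ v + 0.1 * v, b)  # solves (K + 0.1*I) a = b
```

In KeOps, the dense `K @ v` product above would be replaced by a symbolic kernel reduction, so the same solver scales to point clouds that are far too large for an explicit (N, N) kernel matrix.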

More details are provided below:

Projects using KeOps

Symbolic matrices are to geometric learning what sparse matrices are to graph processing. KeOps can thus be used in a wide range of settings, from shape analysis (registration, geometric deep learning, optimal transport...) to machine learning (kernel methods, k-means, UMAP...), Gaussian processes, computational biology and physics.

KeOps provides core routines for the following projects and libraries:

  • GPyTorch (from the universities of Cornell, Columbia, Pennsylvania) and Falkon (from the university of Genoa and the Sierra Inria team), two libraries for Gaussian Process regression that now scale up to billion-scale datasets.

  • Deformetrica, a computational anatomy software from the Aramis Inria team.

  • The Gudhi library for topological data analysis and higher dimensional geometry understanding, from the DataShape Inria team.

  • GeomLoss, a PyTorch package for Chamfer (Hausdorff) distances, Kernel (Sobolev) divergences and Earth Mover's (Wasserstein) distances. It provides optimal transport solvers that scale up to millions of samples in seconds.

  • The deep graph matching consensus module, for learning and refining structural correspondences between graphs.

  • FshapesTk and the Shapes toolbox, two research-oriented LDDMM toolkits.

  • HyenaDNA for parallel computations of the Vandermonde matrix multiplication kernel and reductions used in the S4D kernel.

Licensing, citation, academic use

This library is licensed under the permissive MIT license, which is fully compatible with both academic and commercial applications.

If you use this code in a research paper, please cite our original publication:

Charlier, B., Feydy, J., Glaunès, J. A., Collin, F.-D. & Durif, G. Kernel Operations on the GPU, with Autodiff, without Memory Overflows. Journal of Machine Learning Research 22, 1–6 (2021).

```bibtex
@article{JMLR:v22:20-275,
  author  = {Benjamin Charlier and Jean Feydy and Joan Alexis Glaunès and François-David Collin and Ghislain Durif},
  title   = {Kernel Operations on the GPU, with Autodiff, without Memory Overflows},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {74},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v22/20-275.html}
}
```

For applications to geometric (deep) learning, you may also consider our NeurIPS 2020 paper:

```bibtex
@article{feydy2020fast,
  title   = {Fast geometric learning with symbolic matrices},
  author  = {Feydy, Jean and Glaun{\`e}s, Joan and Charlier, Benjamin and Bronstein, Michael},
  journal = {Advances in Neural Information Processing Systems},
  volume  = {33},
  year    = {2020}
}
```

Authors

Please contact us for any bug report, question or feature request by filing a report on our GitHub issue tracker!

Core library - KeOps, PyKeOps, KeOpsLab:

R bindings - RKeOps:

Contributors:

Beyond explicit code contributions, KeOps has grown out of numerous discussions with applied mathematicians and machine learning experts. We would especially like to thank Alain Trouvé, Stanley Durrleman, Gabriel Peyré and Michael Bronstein for their valuable suggestions and financial support.

KeOps was awarded an open science prize by the French Ministry of Higher Education and Research in 2023 ("Espoir - Documentation").
