
KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows


getkeops/keops


Please visit our website for documentation, contribution guidelines and tutorials.

Kernel Operations on the GPU, with autodiff, without memory overflows

The KeOps library lets you compute reductions of large arrays whose entries are given by a mathematical formula or a neural network. It combines efficient C++ routines with an automatic differentiation engine and can be used with Python (NumPy, PyTorch), Matlab and R.

It is perfectly suited to the computation of kernel matrix-vector products, K-nearest neighbors queries, N-body interactions, point cloud convolutions and the associated gradients. Crucially, it performs well even when the corresponding kernel or distance matrices do not fit into the RAM or GPU memory. Compared with a PyTorch GPU baseline, KeOps provides a x10-x100 speed-up on a wide range of geometric applications, from kernel methods to geometric deep learning.
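As a rough illustration of the memory problem that such queries run into, a K-nearest-neighbors search can be written in plain NumPy by streaming over blocks of the distance matrix instead of materializing all M x N entries at once. This is only a conceptual sketch: `knn_blockwise` and its `block` parameter are hypothetical names for this example, not part of the KeOps API.

```python
import numpy as np

def knn_blockwise(x, y, K=3, block=1024):
    """Indices of the K nearest neighbors in y for each point of x.
    Distances are computed block by block, so the full (M, N) distance
    matrix is never stored: the memory footprint is O(block * N)
    instead of O(M * N)."""
    M = x.shape[0]
    indices = np.empty((M, K), dtype=np.int64)
    for start in range(0, M, block):
        chunk = x[start:start + block]  # (b, D) slice of the query points
        # (b, N) tile of squared distances, recomputed on the fly:
        D_ij = ((chunk[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
        indices[start:start + block] = np.argsort(D_ij, axis=1)[:, :K]
    return indices

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 3))   # 100 query points in R^3
y = rng.standard_normal((200, 3))   # 200 reference points
nn = knn_blockwise(x, y, K=3)       # nn.shape = (100, 3)
```

KeOps applies the same "recompute tiles on the fly" idea, but fuses the formula into a single CUDA kernel so the tiles live in registers rather than in RAM.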

Symbolic matrices

Why use KeOps? Math libraries represent most objects as matrices and tensors:

  • (a) Dense matrices. Variables are often encoded as dense numerical arrays Mi,j = M[i,j]. This representation is convenient and well-supported, but also puts a heavy load on the memories of our computers. Unfortunately, large arrays are expensive to move around and may not even fit in RAM or GPU memories.

    In practice, this means that a majority of scientific programs are memory-bound. Run times for most neural networks and mathematical computations are not limited by the raw capabilities of our CPUs and CUDA cores, but by the time-consuming transfers of large arrays from memory circuits to arithmetic computing units.

  • (b) Sparse matrices. To work around this problem, a common solution is to rely on sparse matrices: tensors that have few non-zero coefficients. We represent these objects using lists of indices (in, jn) and values Mn = M[in, jn] that correspond to a small number of non-zero entries. Matrix-vector operations are then implemented with indexing methods and scattered memory accesses.

    This method is elegant and allows us to represent large arrays with a small memory footprint. But unfortunately, it does not stream well on GPUs: parallel computing devices are wired to perform block-wise memory accesses and have a hard time dealing with lists of random indices (in, jn). As a consequence, when compared with dense arrays, sparse encodings only speed up computations for matrices that have less than 1% non-zero coefficients. This restrictive condition prevents sparse matrices from being very useful outside of graph and mesh processing.

  • (c) Symbolic matrices. KeOps provides another solution to speed up tensor programs. Our key remark is that most of the large arrays that are used in machine learning and applied mathematics share a common mathematical structure. Distance matrices, kernel matrices, point cloud convolutions and attention layers can all be described as symbolic tensors: given two collections of vectors (xi) and (yj), their coefficients Mi,j at location (i, j) are given by mathematical formulas F(xi, yj) that are evaluated on data samples xi and yj.

    These objects are not "sparse" in the traditional sense... but can nevertheless be described efficiently using a mathematical formula F and relatively small data arrays (xi) and (yj). The main purpose of the KeOps library is to provide support for this abstraction with all the perks of a deep learning library:

    • A transparent interface with CPU and GPU integration.
    • Numerous tutorials and benchmarks.
    • Full support for automatic differentiation, batch processing and approximate computations.
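As a mental model of point (c) above (a pure NumPy sketch, not the KeOps implementation), a symbolic tensor K_ij = F(x_i, y_j) can be reduced without ever allocating the full M x N matrix: only the small data arrays x and y are stored, and tiles of K are regenerated from the formula on the fly. The name `gaussian_kernel_sum` is illustrative, not a KeOps function.

```python
import numpy as np

def gaussian_kernel_sum(x, y, block=512):
    """a_i = sum_j exp(-|x_i - y_j|^2), evaluated tile by tile.
    Each (M, block) tile of the symbolic matrix K_ij is computed from
    the formula, reduced, and immediately discarded, so memory usage
    stays linear in M + N instead of quadratic."""
    a = np.zeros(x.shape[0])
    for start in range(0, y.shape[0], block):
        y_blk = y[start:start + block]                                 # (b, D)
        D_ij = ((x[:, None, :] - y_blk[None, :, :]) ** 2).sum(axis=2)  # (M, b) tile
        a += np.exp(-D_ij).sum(axis=1)                                 # partial reduction
    return a

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 3))
y = rng.standard_normal((2000, 3))
a_i = gaussian_kernel_sum(x, y)  # a_i.shape = (100,)
```

KeOps pushes this scheme down to the CUDA level, keeping each tile in registers, and wraps it in an autodiff-aware `LazyTensor` interface.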

In practice, KeOps symbolic tensors are both fast and memory-efficient. We take advantage of the structure of CUDA registers to bypass costly memory transfers between arithmetic and memory circuits. This allows us to provide a x10-x100 speed-up to PyTorch GPU programs in a wide range of settings.

Using our Python interface, a typical sample of code looks like:

```python
# Create two arrays with 3 columns and a (huge) number of lines, on the GPU
import torch  # NumPy, Matlab and R are also supported

M, N, D = 1000000, 2000000, 3
x = torch.randn(M, D, requires_grad=True).cuda()  # x.shape = (1e6, 3)
y = torch.randn(N, D).cuda()                      # y.shape = (2e6, 3)

# Turn our dense Tensors into KeOps symbolic variables with "virtual"
# dimensions at positions 0 and 1 (for "i" and "j" indices):
from pykeops.torch import LazyTensor

x_i = LazyTensor(x.view(M, 1, D))  # x_i.shape = (1e6,   1, 3)
y_j = LazyTensor(y.view(1, N, D))  # y_j.shape = (  1, 2e6, 3)

# We can now perform large-scale computations, without memory overflows:
D_ij = ((x_i - y_j) ** 2).sum(dim=2)  # Symbolic (1e6,2e6,1) matrix of squared distances
K_ij = (-D_ij).exp()                  # Symbolic (1e6,2e6,1) Gaussian kernel matrix

# We come back to vanilla PyTorch Tensors or NumPy arrays using
# reduction operations such as .sum(), .logsumexp() or .argmin()
# on one of the two "symbolic" dimensions 0 and 1.
# Here, the kernel density estimation   a_i = sum_j exp(-|x_i-y_j|^2)
# is computed using a CUDA scheme that has a linear memory footprint and
# outperforms standard PyTorch implementations by two orders of magnitude.
a_i = K_ij.sum(dim=1)  # Genuine torch.cuda.FloatTensor, a_i.shape = (1e6, 1)

# Crucially, KeOps fully supports automatic differentiation!
g_x = torch.autograd.grad((a_i ** 2).sum(), [x])
```

KeOps allows you to get the most out of your hardware without compromising on usability. It provides:

  • Linear (instead of quadratic) memory footprint for numerous types of computations.
  • Support for a wide range of mathematical formulas that can be composed at will.
  • Seamless computation of derivatives and gradients, up to arbitrary orders.
  • Sum, LogSumExp, Min, Max but also ArgMin, ArgMax or K-min reductions.
  • A conjugate gradient solver for large-scale spline interpolation and Gaussian process regression.
  • Transparent integration with standard packages, such as the SciPy solvers for linear algebra.
  • An interface for block-sparse and coarse-to-fine strategies.
  • Support for multi-GPU configurations.
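To illustrate what a conjugate gradient solver of this kind does (a hypothetical NumPy sketch, not the KeOps interface), the kernel ridge regression system (K + ridge * I) a = b can be solved using only matrix-vector products with K, which is exactly the operation that KeOps evaluates with a linear memory footprint:

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-9, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A, given only
    the matrix-vector product x -> A @ x (the matrix A itself is never
    formed by this routine)."""
    x = np.zeros_like(b)
    r = b - matvec(x)      # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Gaussian process regression / spline interpolation with a ridge term:
rng = np.random.default_rng(0)
x_pts = rng.standard_normal((300, 3))
b = rng.standard_normal(300)
K = np.exp(-((x_pts[:, None, :] - x_pts[None, :, :]) ** 2).sum(axis=2))
a = conjugate_gradient(lambda v: K @ v + 0.1 * v, b)  # solves (K + 0.1*I) a = b
```

In KeOps, the dense `K @ v` product above would be replaced by a symbolic kernel reduction, so the same solver scales to point clouds that are far too large for an explicit (N, N) kernel matrix.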

More details are provided below:

Projects using KeOps

Symbolic matrices are to geometric learning what sparse matrices are to graph processing. KeOps can thus be used in a wide range of settings, from shape analysis (registration, geometric deep learning, optimal transport...) to machine learning (kernel methods, k-means, UMAP...), Gaussian processes, computational biology and physics.

KeOps provides core routines for the following projects and libraries:

  • GPyTorch (from the universities of Cornell, Columbia, Pennsylvania) and Falkon (from the university of Genoa and the Sierra Inria team), two libraries for Gaussian Process regression that now scale up to billion-scale datasets.

  • Deformetrica, a computational anatomy software from the Aramis Inria team.

  • The Gudhi library for topological data analysis and higher dimensional geometry understanding, from the DataShape Inria team.

  • GeomLoss, a PyTorch package for Chamfer (Hausdorff) distances, Kernel (Sobolev) divergences and Earth Mover's (Wasserstein) distances. It provides optimal transport solvers that scale up to millions of samples in seconds.

  • The deep graph matching consensus module, for learning and refining structural correspondences between graphs.

  • FshapesTk and the Shapes toolbox, two research-oriented LDDMM toolkits.

  • HyenaDNA for parallel computations of the Vandermonde matrix multiplication kernel and reductions used in the S4D kernel.

Licensing, citation, academic use

This library is licensed under the permissive MIT license, which is fully compatible with both academic and commercial applications.

If you use this code in a research paper, please cite our original publication:

Charlier, B., Feydy, J., Glaunès, J. A., Collin, F.-D. & Durif, G. Kernel Operations on the GPU, with Autodiff, without Memory Overflows. Journal of Machine Learning Research 22, 1–6 (2021).

```bibtex
@article{JMLR:v22:20-275,
  author  = {Benjamin Charlier and Jean Feydy and Joan Alexis Glaunès and François-David Collin and Ghislain Durif},
  title   = {Kernel Operations on the GPU, with Autodiff, without Memory Overflows},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {74},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v22/20-275.html}
}
```

For applications to geometric (deep) learning, you may also consider our NeurIPS 2020 paper:

```bibtex
@article{feydy2020fast,
  title   = {Fast geometric learning with symbolic matrices},
  author  = {Feydy, Jean and Glaun{\`e}s, Joan and Charlier, Benjamin and Bronstein, Michael},
  journal = {Advances in Neural Information Processing Systems},
  volume  = {33},
  year    = {2020}
}
```

Authors

Please contact us for any bug report, question or feature request by filing a report on our GitHub issue tracker!

Core library - KeOps, PyKeOps, KeOpsLab:

R bindings - RKeOps:

Contributors:

Beyond explicit code contributions, KeOps has grown out of numerous discussions with applied mathematicians and machine learning experts. We would especially like to thank Alain Trouvé, Stanley Durrleman, Gabriel Peyré and Michael Bronstein for their valuable suggestions and financial support.

KeOps was awarded an open science prize by the French Ministry of Higher Education and Research in 2023 ("Espoir - Documentation").
