Annoy is a small, fast and lightweight library for Approximate Nearest Neighbours with a particular focus on efficient memory use and the ability to load a pre-saved index.
Annoy was written by Erik Bernhardsson for use at Spotify, and is implemented in about 500 lines of a single C++ template header file, which Erik wrapped into a loadable Python module.
It provides a nice example for Rcpp Modules and the use of templates: Annoy uses two template data types (generally float and int32_t for efficiency) and one of two distance measures. This package shows that it is easy to wrap both.
It also shows how easy it is to have Python and R share the exact same functionality by virtue of module bindings on both the Python side and the R side (where Rcpp helps).
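As a minimal sketch of what wrapping both distance measures looks like from R, the snippet below stores the same two vectors under each measure and compares the reported distance with a hand-computed value. It assumes the two measures are exposed as the classes `AnnoyEuclidean` and `AnnoyAngular`, and that Annoy's angular distance is defined as sqrt(2 * (1 - cos(u, v))); the vectors and variable names are purely illustrative.

```r
library(RcppAnnoy)

u <- c(0, 1)
v <- c(1, 1)

# Store the same two vectors under each distance measure
eu <- new(AnnoyEuclidean, 2)
eu$addItem(0, u)
eu$addItem(1, v)

an <- new(AnnoyAngular, 2)
an$addItem(0, u)
an$addItem(1, v)

# Hand-computed reference values
euclid_ref  <- sqrt(sum((u - v)^2))
cosine      <- sum(u * v) / sqrt(sum(u^2) * sum(v^2))
# Assumption: Annoy's angular distance is sqrt(2 * (1 - cosine))
angular_ref <- sqrt(2 * (1 - cosine))

d_eu <- eu$getDistance(0, 1)
d_an <- an$getDistance(0, 1)

# Annoy stores single-precision floats, so compare with a tolerance
print(all.equal(d_eu, euclid_ref,  tolerance = 1e-5))
print(all.equal(d_an, angular_ref, tolerance = 1e-5))
```

Because the index stores `float` values, exact equality with R's double-precision arithmetic should not be expected; a small tolerance is the safer comparison.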
Source code resides in the RcppAnnoy GitHub repo.
The example below is implemented as demo/simpleExample.R and mirrors the Python example on the Annoy repo page.
```r
library(RcppAnnoy)
set.seed(123)                     # be reproducible

f <- 40
a <- new(AnnoyEuclidean, f)
n <- 50                           # not specified
for (i in seq(n)) {
    v <- rnorm(f)
    a$addItem(i - 1, v)
}
a$build(50)                       # 50 trees
a$save("/tmp/test.tree")

b <- new(AnnoyEuclidean, f)       # new object, could be in another process
b$load("/tmp/test.tree")          # super fast, will just mmap the file
print(b$getNNsByItem(0, 40))
```

The package matches the behaviour of the original Python wrapper for the Annoy library. It also replicates all unit tests written for the Python frontend, including a test for efficiently mmap-ing a binary index file. While setting it up, some small contributions were made back to Annoy as well.
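The save-and-reload round trip can be checked with a short sketch along the following lines. It is an illustration only and not part of the package demos: it writes to a `tempfile()` rather than `/tmp`, uses 10 trees and 5 neighbours as arbitrary choices, and relies on the `getNItems()`, `getItemsVector()` and `getNNsByVector()` methods in addition to the calls shown above.

```r
library(RcppAnnoy)
set.seed(123)

f <- 40                                   # vector dimension
n <- 50                                   # number of items
a <- new(AnnoyEuclidean, f)
for (i in seq(n)) a$addItem(i - 1, rnorm(f))
a$build(10)                               # 10 trees (arbitrary choice)

path <- tempfile(fileext = ".tree")       # hypothetical file name
a$save(path)

b <- new(AnnoyEuclidean, f)               # fresh object, as in another process
b$load(path)                              # memory-maps the saved index

# The reloaded index should hold the same items and return the same neighbours
same_count <- b$getNItems() == n
same_nns   <- identical(a$getNNsByItem(0, 5), b$getNNsByItem(0, 5))

# Querying with the stored vector of item 0 should return item 0 first
q <- b$getItemsVector(0)
first_hit <- b$getNNsByVector(q, 5)[1]

print(c(same_count = same_count, same_nns = same_nns, first_hit = first_hit))
```

With only 50 items the search is close to exhaustive, so the checks are expected to hold here; on large indexes the results are approximate by design.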
As it uses mmap for fast disk access to the stored index file, a Windows build is possible via MapViewOfFile (see e.g. Jeff Ryan's mmap CRAN package), but we have not needed that functionality. A clean pull request to the Annoy or RcppAnnoy repos would be welcome.
Dirk Eddelbuettel
Initially created: Sun Nov 16 07:45:09 CST 2014
Last modified: Sun May 26 10:09:42 CDT 2024