
A flexible and efficient C++ template library for dimension reduction

License

BSD-3-Clause, LGPL-3.0 licenses found

Licenses found

BSD-3-Clause
LICENSE
LGPL-3.0
LGPL-LICENSE

lisitsyn/tapkee

Tapkee is a C++ template library for dimensionality reduction with some bias on spectral methods. Tapkee originates from code developed during GSoC 2011 as part of the Shogun machine learning toolbox. The project aims to provide an efficient and flexible standalone library for dimensionality reduction that can be easily integrated into existing codebases. Tapkee leverages the capabilities of the efficient Eigen3 linear algebra library and optionally makes use of the ARPACK eigensolver. The library uses CoverTree and VP-tree data structures to compute nearest neighbors. To achieve greater flexibility we provide a callback interface that decouples dimension reduction algorithms from the data representation and storage schemes.

The library is distributed under the permissive BSD 3-clause license (except a few optional parts that are distributed under other open source licenses; see the Licensing section of this document). If you use this software in any publication we would be happy if you cite the following paper:

Sergey Lisitsyn and Christian Widmer and Fernando J. Iglesias Garcia. Tapkee: An Efficient Dimension Reduction Library. Journal of Machine Learning Research, 14: 2355-2359, 2013.

To get started with dimension reduction you may try the go.py script, which embeds common datasets (swissroll, helix, scurve) using the Tapkee library and plots the result with the help of the Matplotlib library. To use the script, build the sample application (see the Command line application section for more details) and call go.py with the following command:

./examples/go.py [swissroll|helix|scurve|...] [lle|isomap|...]

You may also try a minimal example using make minimal (examples/minimal) and the RNA example using make rna (examples/rna). There are also a few graphical examples. To run the MNIST digits embedding example use make mnist (examples/mnist), to run the promoters embedding example use make promoters (examples/promoters), and to run the embedding for the faces dataset use make faces (examples/faces). All graphical examples require Matplotlib, which can usually be installed with a package manager. The promoters example also has a non-trivial dependency on the Shogun machine learning toolbox (minimal version is 2.1.0). We also provide some examples of using Tapkee in Shogun as the make langs (examples/langs) example.

API

We provide an interface based on the method chaining technique. The chain starts with a call of the with(const ParametersSet&) method, which is used to provide parameters such as the method to use and its settings. The provided argument is formed with the following syntax:

(keyword1=value1, keyword2=value2)

Such syntax is possible due to comma operator overloading, which groups all assigned keywords in a comma-separated list.

Keywords are defined in the tapkee namespace. Currently, the following keywords are defined: method, eigen_method, neighbors_method, num_neighbors, target_dimension, diffusion_map_timesteps, gaussian_kernel_width, max_iteration, spe_global_strategy, spe_num_updates, spe_tolerance, landmark_ratio, nullspace_shift, klle_shift, check_connectivity, fa_epsilon, progress_function, cancel_function, sne_perplexity, sne_theta. See the documentation for their detailed meaning.

As an example of parameter setting, if you want to use the Isomap algorithm with the number of neighbors set to 15:

tapkee::with((method=Isomap,num_neighbors=15))

Please note that the inner parentheses are necessary: without them the commas would be parsed as function argument separators rather than as the overloaded comma operator.
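To make the keyword syntax less magical, here is a minimal sketch of how an overloaded assignment and an overloaded comma operator can collect keyword=value pairs into one object. Every type and function in it (Parameter, ParametersSet, Keyword, count_parameters) is a hypothetical stand-in written for this illustration and is not Tapkee's actual implementation; values are restricted to double to keep it short.

```cpp
#include <map>
#include <string>

// Illustrative sketch only: hypothetical stand-ins, not Tapkee's classes.
struct Parameter
{
    std::string name;
    double value;
};

struct ParametersSet
{
    std::map<std::string, double> values;
};

struct Keyword
{
    std::string name;

    // `keyword = value` assigns nothing; it builds a Parameter instead.
    Parameter operator=(double value) const
    {
        Parameter parameter = {name, value};
        return parameter;
    }
};

// `parameter, parameter` starts a new set.
inline ParametersSet operator,(const Parameter& lhs, const Parameter& rhs)
{
    ParametersSet set;
    set.values[lhs.name] = lhs.value;
    set.values[rhs.name] = rhs.value;
    return set;
}

// `set, parameter` appends to an existing set, so longer lists chain.
inline ParametersSet operator,(ParametersSet set, const Parameter& parameter)
{
    set.values[parameter.name] = parameter.value;
    return set;
}

// A function that takes exactly one ParametersSet argument.
inline int count_parameters(const ParametersSet& set)
{
    return static_cast<int>(set.values.size());
}

// Hypothetical keywords named after two of the keywords listed above.
const Keyword num_neighbors = {"num_neighbors"};
const Keyword target_dimension = {"target_dimension"};
```

With these definitions, count_parameters((num_neighbors = 15, target_dimension = 2)) passes a single ParametersSet. Dropping the inner parentheses would instead pass two separate Parameter arguments, which is why the extra pair is required.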

Next, you may either embed the provided matrix with:

tapkee::with((method=Isomap,num_neighbors=15)).embedUsing(matrix);

Or provide callbacks (kernel, distance and features) using any combination of the withKernel(KernelCallback), withDistance(DistanceCallback) and withFeatures(FeaturesCallback) member functions:

tapkee::with((method=Isomap,num_neighbors=15))
       .withKernel(kernel_callback)
       .withDistance(distance_callback)
       .withFeatures(features_callback)

Once the callbacks are initialized you may either embed data using an STL-compatible sequence of indices or objects (one that supports the begin() and end() methods to obtain the corresponding iterators) with the embedUsing(Sequence) member function, or embed the data using a sequence range with the embedRange(RandomAccessIterator, RandomAccessIterator) member function.
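The decoupling that callbacks provide can be illustrated without the library. The sketch below assumes a callback shape with a distance(l, r) member, as in the minimal example of this document; IndexDistanceCallback and pairwise_distances are hypothetical names invented for the illustration, not part of Tapkee.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical callback: objects are plain integer indices and the
// distance between two of them is their absolute difference.
struct IndexDistanceCallback
{
    double distance(int lhs, int rhs) const
    {
        return static_cast<double>(std::abs(lhs - rhs));
    }
};

// Hypothetical consumer: fills a row-major n x n distance matrix from a
// random access iterator range. It never looks at the data itself,
// only at what the callback returns.
template <class RandomAccessIterator, class DistanceCallback>
std::vector<double> pairwise_distances(RandomAccessIterator begin,
                                       RandomAccessIterator end,
                                       const DistanceCallback& callback)
{
    const std::size_t n = static_cast<std::size_t>(end - begin);
    std::vector<double> distances(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            distances[i * n + j] = callback.distance(begin[i], begin[j]);
    return distances;
}
```

Because the consumer is templated on both the iterator and the callback, the same code works for indices into an external storage, for strings, or for any other objects, as long as the callback knows how to compare two of them.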

To summarize, here are a few examples:

TapkeeOutput output = with((method=Isomap,num_neighbors=15))
    .embedUsing(matrix);

TapkeeOutput output = with((method=Isomap,num_neighbors=15))
    .withDistance(distance_callback)
    .embedUsing(indices);

TapkeeOutput output = with((method=Isomap,num_neighbors=15))
    .withDistance(distance_callback)
    .embedRange(indices.begin(),indices.end());

Minimal example

A minimal working example of a program that uses the library is:

#include <cstdlib>
#include <iostream>
#include <vector>

#include <tapkee/tapkee.hpp>
#include <tapkee/callbacks/dummy_callbacks.hpp>

using namespace std;
using namespace tapkee;

// Distance between two indices is their absolute difference.
struct MyDistanceCallback
{
    ScalarType distance(IndexType l, IndexType r) { return abs(l-r); }
};

int main(int argc, const char** argv)
{
    // The "data" is just the indices 0..N-1.
    const int N = 100;
    vector<IndexType> indices(N);
    for (int i=0; i<N; i++) indices[i] = i;

    MyDistanceCallback d;

    // Embed the indices into one dimension with MDS.
    TapkeeOutput output = tapkee::with((method=MultidimensionalScaling,target_dimension=1))
        .withDistance(d)
        .embedUsing(indices);

    cout << output.embedding.transpose() << endl;
    return 0;
}

This example requires Tapkee to be in the include path. With Linux compilers you can do that with the -I/path/to/tapkee/headers/folder option.

Integration

There are a few issues related to including the Tapkee library in your code. First, if your library already includes Eigen3 (and only then), you might need to let Tapkee know about that with the following define:

#define TAPKEE_EIGEN_INCLUDE_FILE <path/to/your/eigen/include/file.h>

Please note that if you don't include Eigen3 in your project there is no need to define that variable; in this case Eigen3 will be included by Tapkee. This issue comes from the need to include the Eigen3 library only once when using some specific parameters (like debug and extensions).

If you are able to use less restrictive licenses (such as LGPLv3) you may define the following variable:

  • TAPKEE_USE_LGPL_COVERTREE to use Covertree code by John Langford.

When compiling software that includes Tapkee, be sure the Eigen3 headers are in the include path and your code is linked against the ARPACK library (the -larpack key for g++ and clang++).

For an example of integration you may check the Tapkee adapter in Shogun.

When working with installed headers you can check which version of the library you have by checking the values of the TAPKEE_WORLD_VERSION, TAPKEE_MAJOR_VERSION and TAPKEE_MINOR_VERSION defines.
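As a rough sketch of such a check, the helper below compares the three defines against a required version. It assumes the defines expand to plain integers, which this document does not state. The fallback values in the #ifndef block are placeholders that only let the sketch compile without Tapkee installed, and tapkee_version_at_least is a hypothetical helper, not a library function.

```cpp
// Placeholder values so the sketch compiles without Tapkee installed.
// When the real headers are included first, the library's own values
// are used and this block is skipped.
#ifndef TAPKEE_WORLD_VERSION
#define TAPKEE_WORLD_VERSION 1
#define TAPKEE_MAJOR_VERSION 0
#define TAPKEE_MINOR_VERSION 0
#endif

// Hypothetical helper: true if the available version is not older than
// world.major.minor, comparing the components from left to right.
inline bool tapkee_version_at_least(int world, int major, int minor)
{
    if (TAPKEE_WORLD_VERSION != world) return TAPKEE_WORLD_VERSION > world;
    if (TAPKEE_MAJOR_VERSION != major) return TAPKEE_MAJOR_VERSION > major;
    return TAPKEE_MINOR_VERSION >= minor;
}
```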

We welcome any integration, so please contact the authors if you have any questions. If you have successfully used the library please also let the authors know about that; mentions of any applications are much appreciated.

Customization

Tapkee is designed to be highly customizable with preprocessor definitions.

If you want to use float as the internal numeric type (the default is double) you can do that by adding #define TAPKEE_CUSTOM_NUMTYPE float before including the defines header.
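The reason the define has to come before the include can be seen in a small sketch of how a header might honor such a setting. SKETCH_CUSTOM_NUMTYPE, SketchScalar and squared are hypothetical names used only here; how Tapkee's own defines header is written is not shown in this document.

```cpp
// Hypothetical header fragment. SKETCH_CUSTOM_NUMTYPE plays the role of
// a define such as TAPKEE_CUSTOM_NUMTYPE: it is consulted once, at the
// point where the header is included, so it must be defined earlier.
#ifdef SKETCH_CUSTOM_NUMTYPE
typedef SKETCH_CUSTOM_NUMTYPE SketchScalar;
#else
typedef double SketchScalar; // default when nothing is defined
#endif

// Code below the typedef is written against SketchScalar only.
inline SketchScalar squared(SketchScalar x)
{
    return x * x;
}
```

Compiling this fragment with -DSKETCH_CUSTOM_NUMTYPE=float (or with a #define placed above it) switches every use of SketchScalar to float without touching the rest of the code.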

If you use a non-standard STL-compatible implementation of vector, map and pair you may redefine them with TAPKEE_INTERNAL_VECTOR, TAPKEE_INTERNAL_PAIR and TAPKEE_INTERNAL_MAP (they are set to std::vector, std::pair and std::map by default).

You may define TAPKEE_USE_FIBONACCI_HEAP or TAPKEE_USE_PRIORITY_QUEUE to select which data structure should be used in the shortest paths computing algorithm. By default a priority queue is used.
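For readers unfamiliar with the role of that data structure, here is a generic Dijkstra-style sketch built on std::priority_queue. It assumes non-negative edge weights and an adjacency-list graph; Graph, QueueEntry and shortest_paths are names made up for this illustration, and the code is not taken from Tapkee's shortest paths routine.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

typedef std::pair<double, int> QueueEntry;                        // (distance, vertex)
typedef std::vector<std::vector<std::pair<int, double> > > Graph; // (neighbor, weight)

// Generic Dijkstra sketch: returns the shortest distance from source to
// every vertex, or infinity for unreachable vertices.
inline std::vector<double> shortest_paths(const Graph& graph, int source)
{
    std::vector<double> distance(graph.size(),
                                 std::numeric_limits<double>::infinity());
    // std::greater turns the default max-heap into a min-heap.
    std::priority_queue<QueueEntry, std::vector<QueueEntry>,
                        std::greater<QueueEntry> > queue;
    distance[source] = 0.0;
    queue.push(QueueEntry(0.0, source));
    while (!queue.empty())
    {
        const QueueEntry top = queue.top();
        queue.pop();
        const int u = top.second;
        if (top.first > distance[u]) continue; // stale entry, skip it
        for (std::size_t k = 0; k < graph[u].size(); ++k)
        {
            const int v = graph[u][k].first;
            const double candidate = top.first + graph[u][k].second;
            if (candidate < distance[v])
            {
                distance[v] = candidate;
                queue.push(QueueEntry(candidate, v));
            }
        }
    }
    return distance;
}
```

std::priority_queue has no decrease-key operation, so the sketch pushes duplicate entries and skips stale ones. A Fibonacci heap supports decrease-key directly, which is the usual motivation for offering it as an alternative.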

Other properties can be loaded from a provided header file using #define TAPKEE_CUSTOM_PROPERTIES. Currently such a file should define only one variable, COVERTREE_BASE, which defines the base of the CoverTree (default is 1.3).

Command line application

Tapkee comes with a sample application which can be used to construct low-dimensional representations of dense feature matrices. For more information on its usage please run:

./bin/tapkee -h

The application takes a plain ASCII file containing a dense matrix (each vector is a column and each line contains the values of one feature). The output of the application is stored in the provided file in the same format (each line is a feature).
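The layout described above (one feature per line, one vector per column) can be read with a few lines of standard C++. This is a sketch under one explicit assumption: values on a line are separated by whitespace, since the delimiter is not specified in this section. FeatureRows, read_dense_matrix and column are hypothetical helpers and are not part of the application.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// rows[i][j] is the value of feature i for vector j.
typedef std::vector<std::vector<double> > FeatureRows;

// Reads one feature per line. Assumption: whitespace-separated values.
inline FeatureRows read_dense_matrix(std::istream& input)
{
    FeatureRows rows;
    std::string line;
    while (std::getline(input, line))
    {
        std::istringstream tokens(line);
        std::vector<double> row;
        double value;
        while (tokens >> value) row.push_back(value);
        if (!row.empty()) rows.push_back(row); // ignore blank lines
    }
    return rows;
}

// Collects vector j, i.e. the j-th value of every feature line.
inline std::vector<double> column(const FeatureRows& rows, std::size_t j)
{
    std::vector<double> result;
    for (std::size_t i = 0; i < rows.size(); ++i)
        result.push_back(rows[i].at(j));
    return result;
}
```

For a file with two lines, "1 2 3" and "4 5 6", this yields two features and three vectors, and column(rows, 1) returns the second vector (2, 5).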

To compile the application please use CMake. Compiling Tapkee follows the usual CMake workflow. On Unix-based systems you may use the following command to compile the Tapkee application:

mkdir build && cd build && cmake [definitions] .. && make

There are a few cases when you'd want to put some definitions:

  • To enable unit-tests compilation add -DBUILD_TESTS=1 to [definitions] when building. Please note that building unit-tests requires googletest. If you are running Ubuntu you may install the libgtest-dev package for that. Otherwise, if you have gtest sources around you may provide them as -DGTEST_SOURCE_DIR and -DGTEST_INCLUDES_DIR. You may also download gtest with the following command:

wget https://github.com/google/googletest/archive/release-1.8.0.tar.gz && tar xfv release-1.8.0.tar.gz

Downloaded sources will be used by Tapkee. To run tests use the make test command (or better 'ctest -VV').

  • To let the make script store test coverage information using GCOV and add a target that outputs test coverage in HTML with LCOV, add the -DUSE_GCOV=1 flag to [definitions].

  • To enable precomputation of kernel/distance matrices, which can speed up algorithms (but requires much more memory), add -DPRECOMPUTED=1 to [definitions] when building.

  • To build the application without the parts licensed under LGPLv3 use the -DGPL_FREE=1 definition.

The library requires Eigen3 to be available in your path. The ARPACK library is also highly recommended to achieve best performance. On Debian/Ubuntu these packages can be installed with

sudo apt-get install libeigen3-dev libarpack2-dev

If you are using Mac OS X and Macports you can install these packages with

sudo port install eigen3 && sudo port install arpack

In case you want to use a non-default compiler, use CC=your-C-compiler CXX=your-C++-compiler cmake [definitions] .. when running cmake.

Directory contents

The repository of Tapkee contains the following directories:

  • src/ that contains a simple command-line application (src/cli) and CMake module finders (src/cmake).
  • includes/ that contains the library itself in the includes/tapkee subdirectory.
  • test/ that contains unit-tests in the test/unit subdirectory and a few helper scripts.
  • examples/ that contains a few examples including those already mentioned (these examples are supposed to be called through make as described above, e.g. make minimal).
  • data/ a git submodule that contains data required for the examples. To initialize this submodule use git submodule update --init.
  • doc/ that contains the Doxygen interface file which is used to generate HTML documentation of the library. Calling doxygen doc/Doxyfile will generate it in this folder.

Once built, the root will also contain the following directories:

  • bin that contains binaries (tapkee, which is the command line application, and various tests with the common naming test_*)
  • lib that contains gtest shared libraries.

Need help?

If you need any help or advice don't hesitate to send an email or file an issue on GitHub.

Supported platforms

Tapkee is tested to be fully functional on Linux (ICC, GCC, Clang compilers) and Mac OS X (GCC and Clang compilers). It also compiles under Windows natively (MSVS 2012 compiler) with a few known issues. In general, Tapkee uses no platform-specific code and should work on other systems as well. Please let us know if you have successfully compiled it or have run into any issues on any other system not listed above.

Supported dimension reduction methods

Tapkee provides implementations of the following dimension reduction methods:

  • Locally Linear Embedding and Kernel Locally Linear Embedding (LLE/KLLE)
  • Neighborhood Preserving Embedding (NPE)
  • Local Tangent Space Alignment (LTSA)
  • Linear Local Tangent Space Alignment (LLTSA)
  • Hessian Locally Linear Embedding (HLLE)
  • Laplacian eigenmaps
  • Locality Preserving Projections
  • Diffusion map
  • Isomap and landmark Isomap
  • Multidimensional scaling and landmark Multidimensional scaling (MDS/lMDS)
  • Stochastic Proximity Embedding (SPE)
  • Principal Component Analysis (PCA)
  • Kernel Principal Component Analysis (kPCA)
  • Random projection
  • Factor analysis
  • t-SNE
  • Barnes-Hut-SNE

Licensing

The library is distributed under the BSD 3-clause license.

Exceptions are:
