A Modern C++ Data Sciences Toolkit

License

MIT and NCSA (see LICENSE.mit and LICENSE.ncsa)


meta-toolkit/meta

Please visit our web page for information and tutorials about MeTA!

Intro

MeTA is a modern C++ data sciences toolkit featuring

  • text tokenization, including deep semantic features like parse trees
  • inverted and forward indexes with compression and various caching strategies
  • a collection of ranking functions for searching the indexes
  • topic models
  • classification algorithms
  • graph algorithms
  • language models
  • CRF implementation (POS-tagging, shallow parsing)
  • wrappers for liblinear and libsvm (including libsvm dataset parsers)
  • UTF8 support for analysis on various languages
  • multithreaded algorithms

Documentation

Doxygen documentation can be found here.

Tutorials

We have walkthroughs for a few different parts of MeTA on the MeTA homepage.

Citing

If you used MeTA in your research, we would greatly appreciate a citation for our ACL demo paper:

@InProceedings{meta-toolkit,
  author    = {Massung, Sean and Geigle, Chase and Zhai, Cheng{X}iang},
  title     = {{MeTA: A Unified Toolkit for Text Retrieval and Analysis}},
  booktitle = {Proceedings of ACL-2016 System Demonstrations},
  month     = {August},
  year      = {2016},
  address   = {Berlin, Germany},
  publisher = {Association for Computational Linguistics},
  pages     = {91--96},
  url       = {http://anthology.aclweb.org/P16-4016}
}

Project setup

Mac OS X Build Guide

Mac OS X 10.6 or higher is required. You may have success with 10.5, but this is not tested.

You will need to have homebrew installed, as well as the Command Line Tools for Xcode (homebrew requires these as well, and it will prompt for them during install, or you can install them with xcode-select --install on recent versions of OS X).

Once you have homebrew installed, run the following commands to get the dependencies for MeTA:

brew update
brew install cmake jemalloc lzlib icu4c

To get started, run the following commands:

# clone the project
git clone https://github.com/meta-toolkit/meta.git
cd meta/
# set up submodules
git submodule update --init --recursive
# set up a build directory
mkdir build
cd build
cp ../config.toml .
# configure and build the project
CXX=clang++ cmake ../ -DCMAKE_BUILD_TYPE=Release -DICU_ROOT=/usr/local/opt/icu4c
make

You can now test the system by running the following command:

./unit-test --reporter=spec

If everything passes, congratulations! MeTA seems to be working on your system.

Ubuntu Build Guide

The directions here depend greatly on your installed version of Ubuntu. To check what version you are on, run the following command:

cat /etc/issue

Based on what you see, you should proceed with one of the following guides:

  • Ubuntu 12.04 LTS Build Guide
  • Ubuntu 14.04 LTS Build Guide
  • Ubuntu 15.10 Build Guide

If your version is less than 12.04 LTS, your operating system is not supported (even by your vendor!) and you should upgrade to at least 12.04 LTS (or 14.04 LTS, if possible).
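As a rough sketch, the choice of guide can be written as a small shell function. The name ubuntu_guide is hypothetical and not part of MeTA, and sending releases that fall between the covered ones (for example 13.10) to the closest older guide is our assumption rather than something that has been tested.

```shell
# Hypothetical helper (not part of MeTA): map an Ubuntu release number such as
# "14.04" or "14.04.5" to the guide in this README that most likely applies.
ubuntu_guide() {
    release=$1
    major=${release%%.*}
    rest=${release#*.}
    minor=${rest%%.*}
    # Strip one leading zero so "04" is not read as an octal literal.
    minor=${minor#0}
    minor=${minor:-0}
    # Fold the release into one comparable integer, e.g. 14.04 -> 1404.
    num=$(( $major * 100 + $minor ))

    if [ "$num" -lt 1204 ]; then
        echo "unsupported: upgrade to at least 12.04 LTS"
    elif [ "$num" -lt 1404 ]; then
        # Assumption: releases between 12.04 and 14.04 follow the 12.04 guide.
        echo "Ubuntu 12.04 LTS Build Guide"
    elif [ "$num" -lt 1510 ]; then
        # Assumption: releases between 14.04 and 15.10 follow the 14.04 guide.
        echo "Ubuntu 14.04 LTS Build Guide"
    else
        echo "Ubuntu 15.10 Build Guide"
    fi
}

# Example (lsb_release -rs prints just the release number on Ubuntu):
#   ubuntu_guide "$(lsb_release -rs)"
```

The release number can also be read by eye from the output of cat /etc/issue and passed in directly.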

Ubuntu 12.04 LTS Build Guide

Building on Ubuntu 12.04 LTS requires more work than its more up-to-date 14.04 sister, but it can be done relatively easily. You will, however, need to install a newer C++ compiler from a ppa, and switch to it in order to build MeTA. We will also need to install a newer CMake version than is natively available.

Start by running the following commands to get the dependencies that we will need for building MeTA.

# this might take a while
sudo apt-get update
sudo apt-get install python-software-properties
# add the ppa that contains an updated g++
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
# this will probably take a while
sudo apt-get install g++ g++-4.8 git make wget libjemalloc-dev zlib1g-dev
wget http://www.cmake.org/files/v3.2/cmake-3.2.0-Linux-x86_64.sh
sudo sh cmake-3.2.0-Linux-x86_64.sh --prefix=/usr/local

During CMake installation, you should agree to the license and then say "n" to including the subdirectory. You should be able to run the following commands and see the following output:

g++-4.8 --version

should print

g++-4.8 (Ubuntu 4.8.1-2ubuntu1~12.04) 4.8.1
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and

/usr/local/bin/cmake --version

should print

cmake version 3.2.0
CMake suite maintained and supported by Kitware (kitware.com/cmake).

Once the dependencies are all installed, you should be ready to build. Run the following commands to get started:

# clone the project
git clone https://github.com/meta-toolkit/meta.git
cd meta/
# set up submodules
git submodule update --init --recursive
# set up a build directory
mkdir build
cd build
cp ../config.toml .
# configure and build the project
CXX=g++-4.8 /usr/local/bin/cmake ../ -DCMAKE_BUILD_TYPE=Release
make

You can now test the system by running the following command:

./unit-test --reporter=spec

If everything passes, congratulations! MeTA seems to be working on your system.

Ubuntu 14.04 LTS Build Guide

Ubuntu 14.04 has a recent enough GCC for building MeTA, but we'll need to add a ppa for a more recent version of CMake.

Start by running the following commands to install the dependencies for MeTA.

# this might take a while
sudo apt-get update
sudo apt-get install software-properties-common
# add the ppa for cmake
sudo add-apt-repository ppa:george-edison55/cmake-3.x
sudo apt-get update
# install dependencies
sudo apt-get install g++ cmake libicu-dev git libjemalloc-dev zlib1g-dev

Once the dependencies are all installed, you should double check your versions by running the following commands.

g++ --version

should output

g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and

cmake --version

should output

cmake version 3.2.2
CMake suite maintained and supported by Kitware (kitware.com/cmake).
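If you would rather script the version check than compare the output by eye, something like the following may help. This is only a sketch: version_at_least is a hypothetical helper, and the thresholds in the usage comments (4.8 for g++, 3.2.0 for CMake) are simply the oldest versions shown in these guides, not confirmed minimum requirements.

```shell
# Hypothetical helper (not part of MeTA): succeed when dotted version HAVE is
# greater than or equal to WANT, comparing component by component as integers.
# Missing components count as 0, so "4.8" is treated like "4.8.0".
version_at_least() {
    have=$1
    want=$2
    while [ -n "$have" ] || [ -n "$want" ]; do
        h=${have%%.*}
        w=${want%%.*}
        h=${h:-0}
        w=${w:-0}
        [ "$h" -gt "$w" ] && return 0
        [ "$h" -lt "$w" ] && return 1
        # Drop the component just compared and move on to the next one.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    done
    return 0
}

# Example usage (thresholds are assumptions taken from the guides above):
#   version_at_least "$(g++ -dumpversion)" 4.8 || echo "g++ looks too old" >&2
#   version_at_least "$(cmake --version | awk 'NR==1 {print $3}')" 3.2.0 \
#       || echo "cmake looks too old" >&2
```

Comparing each component numerically avoids the trap of plain string comparison, where "3.10.0" would sort before "3.9.0".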

Once the dependencies are all installed, you should be ready to build. Run the following commands to get started:

# clone the project
git clone https://github.com/meta-toolkit/meta.git
cd meta/
# set up submodules
git submodule update --init --recursive
# set up a build directory
mkdir build
cd build
cp ../config.toml .
# configure and build the project
cmake ../ -DCMAKE_BUILD_TYPE=Release
make

You can now test the system by running the following command:

./unit-test --reporter=spec

If everything passes, congratulations! MeTA seems to be working on your system.

Ubuntu 15.10 Build Guide

Ubuntu's non-LTS desktop offering in 15.10 has enough modern software in its repositories to build MeTA without much trouble. To install the dependencies, run the following commands.

sudo apt update
sudo apt install g++ git cmake make libjemalloc-dev zlib1g-dev

Once the dependencies are all installed, you should be ready to build. Run the following commands to get started:

# clone the project
git clone https://github.com/meta-toolkit/meta.git
cd meta/
# set up submodules
git submodule update --init --recursive
# set up a build directory
mkdir build
cd build
cp ../config.toml .
# configure and build the project
cmake ../ -DCMAKE_BUILD_TYPE=Release
make

You can now test the system by running the following command:

./unit-test --reporter=spec

If everything passes, congratulations! MeTA seems to be working on your system.

Arch Linux Build Guide

Arch Linux consistently has the most up to date packages due to its rolling release setup, so it's often the easiest platform to get set up on.

To install the dependencies, run the following commands.

sudo pacman -Sy
sudo pacman -S clang cmake git icu libc++ make jemalloc zlib

Once the dependencies are all installed, you should be ready to build. Run the following commands to get started:

# clone the project
git clone https://github.com/meta-toolkit/meta.git
cd meta/
# set up submodules
git submodule update --init --recursive
# set up a build directory
mkdir build
cd build
cp ../config.toml .
# configure and build the project
CXX=clang++ cmake ../ -DCMAKE_BUILD_TYPE=Release
make

You can now test the system by running the following command:

./unit-test --reporter=spec

If everything passes, congratulations! MeTA seems to be working on your system.

Fedora Build Guide

This has been tested with Fedora 22+ (the oldest currently supported Fedora as of the time of writing). You may have success with earlier versions, but this is not tested. (If you're on an older version of Fedora, use yum instead of dnf for the commands given below.)

To get started, install some dependencies:

# These may be already installed
sudo dnf install make git wget gcc-c++ jemalloc-devel cmake zlib-devel

You should be able to run the following commands and see the following output:

g++ --version

should print

g++ (GCC) 5.3.1 20151207 (Red Hat 5.3.1-2)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and

cmake --version

should print

cmake version 3.3.2
CMake suite maintained and supported by Kitware (kitware.com/cmake).

Once the dependencies are all installed, you should be ready to build. Run the following commands to get started:

# clone the project
git clone https://github.com/meta-toolkit/meta.git
cd meta/
# set up submodules
git submodule update --init --recursive
# set up a build directory
mkdir build
cd build
cp ../config.toml .
# configure and build the project
cmake ../ -DCMAKE_BUILD_TYPE=Release
make

You can now test the system with the following command:

./unit-test --reporter=spec

CentOS Build Guide

MeTA can be built in CentOS 7 and above. CentOS 7 comes with a recent enough compiler (GCC 4.8.5), but too old a version of CMake. We'll thus install the compiler and related libraries from the package manager and install a more recent cmake ourselves.

# install build dependencies (this will probably take a while)
sudo yum install gcc gcc-c++ git make wget zlib-devel epel-release
sudo yum install jemalloc-devel
wget http://www.cmake.org/files/v3.2/cmake-3.2.0-Linux-x86_64.sh
sudo sh cmake-3.2.0-Linux-x86_64.sh --prefix=/usr/local --exclude-subdir

You should be able to run the following commands and see the following output:

g++ --version

should print

g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and

/usr/local/bin/cmake --version

should print

cmake version 3.2.0
CMake suite maintained and supported by Kitware (kitware.com/cmake).

Once the dependencies are all installed, you should be ready to build. Run the following commands to get started:

# clone the project
git clone https://github.com/meta-toolkit/meta.git
cd meta/
# set up submodules
git submodule update --init --recursive
# set up a build directory
mkdir build
cd build
cp ../config.toml .
# configure and build the project
/usr/local/bin/cmake ../ -DCMAKE_BUILD_TYPE=Release
make

You can now test the system by running the following command:

./unit-test --reporter=spec

If everything passes, congratulations! MeTA seems to be working on your system.

EWS/EngrIT Build Guide

Note: Please don't do this if you are able to get MeTA working in any other possible way, as the EWS filesystem has a habit of being unbearably slow, slowing the build severalfold (and the cmake step by roughly an order of magnitude). For example, comparing the cmake, make, and unit-test steps on my desktop vs. EWS gives the following:

system      cmake time  make time   unit-test time
my desktop  0m7.523s    2m30.715s   0m36.631s
EWS         1m28s       11m28.473s  1m25.326s

If you are on a machine managed by Engineering IT at UIUC, you should follow this guide. These systems have software that is much too old for building MeTA, but EngrIT has been kind enough to package updated versions of research software as modules. The modules provided for GCC and CMake are recent enough to build MeTA, so it is actually mostly straightforward.

To set up your dependencies (you will need to do this every time you log back in to the system), run the following commands:

module load gcc
module load cmake/3.5.0

Once you have done this, double check your versions by running the following commands.

g++ --version

should output

g++ (GCC) 5.3.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and

cmake --version

should output

cmake version 3.5.0
CMake suite maintained and supported by Kitware (kitware.com/cmake).

If your versions are correct, you should be ready to build. To get started, run the following commands:

# clone the project
git clone https://github.com/meta-toolkit/meta.git
cd meta/
# set up submodules
git submodule update --init --recursive
# set up a build directory
mkdir build
cd build
cp ../config.toml .
# configure and build the project
CXX=`which g++` CC=`which gcc` cmake ../ -DCMAKE_BUILD_TYPE=Release
make

You can now test the system by running the following command:

./unit-test --reporter=spec

If everything passes, congratulations! MeTA seems to be working on your system.

Windows Build Guide

MeTA can be built on Windows using the MinGW-w64 toolchain with gcc. We strongly recommend using MSYS2 as this makes fetching the compiler and related libraries significantly easier than it would be otherwise, and it tends to have very up-to-date packages relative to other similar MinGW distributions.

Note: If you find yourself confused or lost by the instructions below, please refer to our visual setup guide for Windows which includes screenshots for every step, including updating MSYS2 and the MinGW-w64 toolchain.

To start, download the installer for MSYS2 from the linked website and follow the instructions on that page. Once you've got it installed, you should use the MinGW shell to start a new terminal, in which you should run the following commands to download dependencies and related software needed for building:

pacman -Syu git make patch mingw-w64-x86_64-{gcc,cmake,icu,jemalloc,zlib} --force

(The --force is needed to work around a bug with the latest MSYS2 installer as of the time of writing.)

Then, exit the shell and launch the "MinGW-w64 Win64" shell. You can obtain the toolkit and get started with:

# clone the project
git clone https://github.com/meta-toolkit/meta.git
cd meta
# set up submodules
git submodule update --init --recursive
# set up a build directory
mkdir build
cd build
cp ../config.toml .
# configure and build the project
cmake .. -G"MSYS Makefiles" -DCMAKE_BUILD_TYPE=Release
make

You can now test the system by running the following command:

./unit-test --reporter=spec

If everything passes, congratulations! MeTA seems to be working on your system.

Generic Setup Notes

  • There are rules for clean, tidy, and doc. After you run the cmake command once, you will be able to just run make as usual when you're developing; it will detect when the CMakeLists.txt file has changed and rebuild the Makefiles if it needs to.

  • To compile in debug mode, just replace Release with Debug in the appropriate cmake command for your OS above and rebuild using make after.

  • Don't hesitate to reach out on the forum if you encounter problems getting set up. We routinely build with a wide variety of compilers and operating systems through our continuous integration setups (travis-ci for Linux and OS X and Appveyor for Windows), so we can be fairly certain that things should build on nearly all major platforms.
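To make the Release/Debug switch from the notes above concrete, here is a minimal sketch. The function meta_configure_cmd is hypothetical and not something MeTA ships; it only prints the generic configure command, so any platform-specific pieces from your OS guide (such as CXX=clang++ or a /usr/local/bin/cmake path) still need to be added by hand.

```shell
# Hypothetical helper (not part of MeTA): print the generic configure command
# for a build type so it can be inspected before it is run from build/.
# Only the two build types mentioned in this README are accepted here; CMake
# itself knows others.
meta_configure_cmd() {
    build_type=${1:-Release}
    case "$build_type" in
        Release|Debug)
            printf 'cmake ../ -DCMAKE_BUILD_TYPE=%s\n' "$build_type"
            ;;
        *)
            printf 'unsupported build type: %s\n' "$build_type" >&2
            return 1
            ;;
    esac
}

# Example, run from inside the build directory:
#   eval "$(meta_configure_cmd Debug)" && make
```

Printing the command instead of running it directly keeps the helper harmless to call anywhere and easy to adapt to the per-platform variations above.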

