C++ wrapper library for the NLP library spaCy

d99kris/spacy-cpp

Spacy-cpp is a C++ wrapper library for the NLP library spaCy. This project is not affiliated with spaCy; it is, however, distributed under the same type of license (MIT).

The goal of spacy-cpp is to expose the functionality of spaCy to C++ applications, and to provide an API that is similar to that of spaCy, enabling rapid development in Python and simple porting to C++.

Spacy-cpp is under development and does not yet support all APIs of spaCy; refer to the API Documentation section below.

Example Usage

Simple POS tagging example using spacy-cpp:

Spacy::Spacy spacy;
auto nlp = spacy.load("en_core_web_sm");
auto doc = nlp.parse("This is a sentence.");
for (auto& token : doc.tokens())
    std::cout << token.text() << " [" << token.pos_() << "]\n";

For reference - doing the same using the spaCy API in Python:

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(u"This is a sentence.")
for token in doc:
    print(token.text + " [" + token.pos_ + "]")
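
The C++ loop above can also be wrapped in a small reusable function. The following is only a sketch: pos_tag_lines is a hypothetical name that is not part of spacy-cpp, and it is written as a template so that it compiles without the spacy-cpp headers. It assumes only what the example above shows, namely an object with tokens() whose elements provide text() and pos_(), and it further assumes that both return values are convertible to std::string.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of spacy-cpp): returns one "text [POS]"
// string per token. Works with any object that exposes tokens(), whose
// elements expose text() and pos_(), as Spacy::Doc does in the example.
template <typename TokenSource>
std::vector<std::string> pos_tag_lines(TokenSource&& source)
{
    std::vector<std::string> lines;
    for (auto& token : source.tokens())
    {
        // Assumption: text() and pos_() are convertible to std::string.
        lines.push_back(std::string(token.text()) + " [" +
                        std::string(token.pos_()) + "]");
    }
    return lines;
}
```

With spacy-cpp installed and <spacy/spacy> included, the result of nlp.parse("This is a sentence.") could be passed to pos_tag_lines() and the returned lines printed with std::cout.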

Supported Platforms

Spacy-cpp is implemented using C++11 with the intention of being portable. The current version has been tested on:

  • macOS Ventura 13.1
  • Ubuntu 22.04 LTS

Pre-requisites

Spacy-cpp requires the Python development library, pip, spaCy and typically a spaCy model.

macOS

Install build dependencies:

brew install cmake python

Install spaCy and an English model:

pip3 install -U spacy
python3 -m spacy download en_core_web_sm

Ubuntu

Install build dependencies:

sudo apt install cmake python3-pip libpython3-dev

Install spaCy and an English model:

pip3 install -U spacy
python3 -m spacy download en_core_web_sm

Installation

Spacy-cpp can be used either as a shared library or as a header-only library.

Shared Library

Build and install spacy-cpp:

mkdir -p build && cd build && cmake .. && make && sudo make install

Link library:

-lspacy

Include header (convenience header including all modules):

#include <spacy/spacy>

Header-only Library

Copy the src/spacy directory to the source directory of your project. Then define SPACY_HEADER_ONLY and include the headers needed (spacy/spacy includes all headers):

#define SPACY_HEADER_ONLY
#include <spacy/spacy>

CMake Usage

The source tree includes two CMake project examples:

FAQ

No module named spacy. Why does spacy-cpp not find spacy?

If a system has more than one Python installation, each installation has its own set of pip-installed Python packages. One must ensure that spacy is installed for the Python version used by spacy-cpp (alternatively, point spacy-cpp to the desired Python installation). When building spacy-cpp using CMake (example: ./make.sh tests), the Python version used is output, for example PYTHON_EXECUTABLE="/usr/local/bin/python3.11". Use this interpreter to check that spacy works correctly in Python, for example: /usr/local/bin/python3.11 ./examples/python-spacy-usage.py. If it does not work, use this Python version to install spacy and a language model:

/usr/local/bin/python3.11 -m pip install -U spacy
/usr/local/bin/python3.11 -m spacy download en_core_web_sm

API Documentation

Spacy-cpp is under development and does not support the complete spaCy API yet.

Supported Classes

Supported Methods / Attributes

Attrs supports all attribute constants.

Doc supports the following methods / attributes:

  • count_by()
  • ents()
  • has_vector()
  • is_parsed()
  • is_tagged()
  • noun_chunks()
  • sentiment()
  • sents()
  • similarity()
  • text()
  • text_with_ws()
  • tokens()
  • vector_norm()
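
As an illustration of the Doc methods listed above, the following sketch collects the named entities of a document. entity_labels is a hypothetical helper, not part of spacy-cpp. It assumes that ents() returns an iterable collection of Span-like objects and that their text() and label_() results are convertible to std::string; the exact return types should be checked against the spacy-cpp headers. It is a template so that it compiles on its own.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of spacy-cpp): returns a (text, label)
// pair for every entity reported by ents().
template <typename DocLike>
std::vector<std::pair<std::string, std::string>> entity_labels(DocLike&& doc)
{
    std::vector<std::pair<std::string, std::string>> result;
    for (auto& ent : doc.ents())
    {
        // Assumption: text() and label_() are convertible to std::string.
        result.emplace_back(ent.text(), ent.label_());
    }
    return result;
}
```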

MorphAnalysis supports the following methods / attributes:

  • get()
  • str()
  • to_dict()

Nlp supports the following methods / attributes:

  • parse()
  • vocab()

Spacy supports the following methods / attributes:

  • load()
  • attrs()

Span supports the following methods / attributes:

  • doc()
  • label()
  • label_()
  • lemma_()
  • orth_()
  • root()
  • sentiment()
  • text()
  • text_with_ws()
  • tokens()
  • vector_norm()

StringStore supports the following methods / attributes:

  • add()

Token supports the following methods / attributes:

  • check_flag()
  • children()
  • cluster()
  • dep()
  • dep_()
  • ent_iob_()
  • has_vector()
  • head()
  • i()
  • idx()
  • is_alpha()
  • is_ascii()
  • is_bracket()
  • is_digit()
  • is_left_punct()
  • is_lower()
  • is_oov()
  • is_punct()
  • is_quote()
  • is_right_punct()
  • is_space()
  • is_stop()
  • is_title()
  • is_upper()
  • lang()
  • lang_()
  • lemma()
  • lemma_()
  • like_email()
  • like_num()
  • like_url()
  • lower()
  • lower_()
  • morph()
  • nbor()
  • norm()
  • norm_()
  • orth()
  • orth_()
  • pos()
  • pos_()
  • prob()
  • rank()
  • sentiment()
  • shape()
  • shape_()
  • tag()
  • tag_()
  • text()
  • text_with_ws()
  • whitespace_()
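
The boolean Token attributes listed above lend themselves to simple filtering. The sketch below keeps the lemma of every alphabetic token that is not a stop word. content_lemmas is a hypothetical helper, not part of spacy-cpp. It assumes that is_alpha() and is_stop() return values usable as bool and that lemma_() is convertible to std::string. It is a template so that it compiles without the spacy-cpp headers.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of spacy-cpp): returns the lemmas of all
// alphabetic tokens that are not stop words.
template <typename TokenSource>
std::vector<std::string> content_lemmas(TokenSource&& source)
{
    std::vector<std::string> lemmas;
    for (auto& token : source.tokens())
    {
        // Assumption: is_alpha() and is_stop() are usable as bool.
        if (token.is_alpha() && !token.is_stop())
            lemmas.push_back(token.lemma_());
    }
    return lemmas;
}
```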

Vocab supports the following methods / attributes:

  • strings()

Key Differences with spaCy

  1. In spacy-cpp Nlp cannot be called as a method in order to perform parsing. Instead one needs to use Nlp::parse().
  2. In spacy-cpp Doc is not an iterable; instead one needs to use Doc::tokens() to get a std::vector of the tokens in the Doc. Likewise for Span.
  3. In spacy-cpp non-ASCII strings must be UTF-8 encoded in order to be correctly processed.
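
Regarding the third point, input from files or legacy systems may arrive in another encoding such as Latin-1. A caller could check a string before handing it to Nlp::parse(). The function below is a minimal sketch of such a check. is_valid_utf8 is a hypothetical helper, not part of spacy-cpp, and it only tests well-formedness; it does not convert between encodings.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of spacy-cpp): returns true if `s` is
// well-formed UTF-8 (no stray or missing continuation bytes, no overlong
// encodings, no surrogates, nothing above U+10FFFF).
bool is_valid_utf8(const std::string& s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n)
    {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;       // total bytes in this sequence
        unsigned long cp;      // decoded code point
        unsigned long min_cp;  // smallest code point valid for this length

        if (c < 0x80)                { len = 1; cp = c;        min_cp = 0x0; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; min_cp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min_cp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (len > n - i) return false;  // sequence is cut off

        for (std::size_t k = 1; k < len; ++k)
        {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min_cp) return false;                    // overlong encoding
        if (cp > 0x10FFFF) return false;                  // beyond Unicode
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate
        i += len;
    }
    return true;
}
```

A string that fails this check would need to be converted to UTF-8 by the application before parsing; how to do that depends on the source encoding and is outside the scope of this sketch.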

Technical Details

Spacy-cpp uses CMake for its tests. Commands to build and execute the test suite:

mkdir -p build && cd build && cmake -DSPACYCPP_BUILD_TESTS=ON .. && make && ctest --output-on-failure ; cd -

License

Spacy-cpp is distributed under the MIT license. See the LICENSE file.

Contributions

Bug reports, PRs, etc. are welcome on the GitHub project page: https://github.com/d99kris/spacy-cpp

Keywords

c++, c++11, natural language processing, nlp, spacy.
