C++ wrapper library for the NLP library spaCy
d99kris/spacy-cpp
Spacy-cpp is a C++ wrapper library for the NLP library spaCy. This project is not affiliated with spaCy; it is, however, distributed under the same type of license (MIT).
The goal of spacy-cpp is to expose the functionality of spaCy to C++ applications, and to provide an API that is similar to that of spaCy, enabling rapid development in Python and simple porting to C++.
Spacy-cpp is under development and does not yet support all APIs of spaCy; refer to the API Documentation section below.
Simple POS tagging example using spacy-cpp:

    Spacy::Spacy spacy;
    auto nlp = spacy.load("en_core_web_sm");
    auto doc = nlp.parse("This is a sentence.");
    for (auto& token : doc.tokens())
      std::cout << token.text() << " [" << token.pos_() << "]\n";

For reference - doing the same using the spaCy API in Python:

    import spacy
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(u"This is a sentence.")
    for token in doc:
        print(token.text + " [" + token.pos_ + "]")

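The token loop from the C++ example above can be wrapped in a small reusable helper. The sketch below is one way to do that, not part of spacy-cpp: `format_pos_tags` is a hypothetical name, and it is written as a template so that it only assumes what the example already uses (a `tokens()` method on the document, and `text()` and `pos_()` on each token). `StubToken` and `StubDoc` are hypothetical stand-ins that let the sketch compile and run without spaCy installed.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Formats every token of a parsed document as "text [POS]", one per line.
// DocT is any type with a tokens() method returning a container of tokens
// that provide text() and pos_(), as used in the spacy-cpp example above.
template <typename DocT>
std::string format_pos_tags(DocT& doc)
{
  std::ostringstream out;
  for (auto& token : doc.tokens())
  {
    out << token.text() << " [" << token.pos_() << "]\n";
  }
  return out.str();
}

// Hypothetical stand-ins (NOT part of spacy-cpp), used only so that this
// sketch can be compiled and exercised without spaCy installed.
struct StubToken
{
  std::string m_text;
  std::string m_pos;
  std::string text() const { return m_text; }
  std::string pos_() const { return m_pos; }
};

struct StubDoc
{
  std::vector<StubToken> m_tokens;
  std::vector<StubToken> tokens() const { return m_tokens; }
};
```

With the real library, the same helper would presumably be called as `std::cout << format_pos_tags(doc);` on the `doc` returned by `nlp.parse(...)`.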
Spacy-cpp is implemented using C++11 with the intention of being portable. The current version has been tested on:
- macOS Ventura 13.1
- Ubuntu 22.04 LTS
Spacy-cpp requires the Python development library, pip, spaCy and typically a spaCy model.
On macOS, install build dependencies:

    brew install cmake python

Install spaCy and an English model:

    pip3 install -U spacy
    python3 -m spacy download en_core_web_sm

On Ubuntu, install build dependencies:

    sudo apt install cmake python3-pip libpython3-dev

Install spaCy and an English model:

    pip3 install -U spacy
    python3 -m spacy download en_core_web_sm

Spacy-cpp can be used either as a shared library or as a header-only library.
To use spacy-cpp as a shared library, build and install it:

    mkdir -p build && cd build && cmake .. && make && sudo make install

Link library:

    -lspacy

Include header (convenience header including all modules):

    #include <spacy/spacy>

To use spacy-cpp as a header-only library, copy the src/spacy directory to the source directory of your project. Then define SPACY_HEADER_ONLY and include the headers needed (spacy/spacy includes all headers):

    #define SPACY_HEADER_ONLY
    #include <spacy/spacy>

The source tree includes two CMake project examples.
If a system has more than one Python installation, each installation will have its own set of pip-installed Python packages. One must ensure that spaCy is installed for the Python version used by spacy-cpp (alternatively, point spacy-cpp to the desired Python installation). When building spacy-cpp using CMake (example: ./make.sh tests), the Python version used will be output, for example PYTHON_EXECUTABLE="/usr/local/bin/python3.11". Use this interpreter to check that spaCy works correctly in Python, for example: /usr/local/bin/python3.11 ./examples/python-spacy-usage.py. If it does not work, use this Python version to install spaCy and a language model:

    /usr/local/bin/python3.11 -m pip install -U spacy
    /usr/local/bin/python3.11 -m spacy download en_core_web_sm

Spacy-cpp is under development and does not support the complete spaCy API yet.
Attrs supports all attribute constants.
Doc supports the following methods / attributes:
- count_by()
- ents()
- has_vector()
- is_parsed()
- is_tagged()
- noun_chunks()
- sentiment()
- sents()
- similarity()
- text()
- text_with_ws()
- tokens()
- vector_norm()
MorphAnalysis supports the following methods / attributes:
- get()
- str()
- to_dict()
Nlp supports the following methods / attributes:
- parse()
- vocab()
Spacy supports the following methods / attributes:
- load()
- attrs()
Span supports the following methods / attributes:
- doc()
- label()
- label_()
- lemma_()
- orth_()
- root()
- sentiment()
- text()
- text_with_ws()
- tokens()
- vector_norm()
StringStore supports the following methods / attributes:
- add()
Token supports the following methods / attributes:
- check_flag()
- children()
- cluster()
- dep()
- dep_()
- ent_iob_()
- has_vector()
- head()
- i()
- idx()
- is_alpha()
- is_ascii()
- is_bracket()
- is_digit()
- is_left_punct()
- is_lower()
- is_oov()
- is_punct()
- is_quote()
- is_right_punct()
- is_space()
- is_stop()
- is_title()
- is_upper()
- lang()
- lang_()
- lemma()
- lemma_()
- like_email()
- like_num()
- like_url()
- lower()
- lower_()
- morph()
- nbor()
- norm()
- norm_()
- orth()
- orth_()
- pos()
- pos_()
- prob()
- rank()
- sentiment()
- shape()
- shape_()
- tag()
- tag_()
- text()
- text_with_ws()
- whitespace_()
Vocab supports the following methods / attributes:
- strings()
- In spacy-cpp, Nlp cannot be called as a function in order to perform parsing. Instead, one needs to use Nlp::parse().
- In spacy-cpp, Doc is not iterable; instead, one needs to use Doc::tokens() to get a std::vector of the tokens in the Doc. Likewise for Span.
- In spacy-cpp, non-ASCII strings must be UTF-8 encoded in order to be correctly processed.
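Since non-ASCII input has to be UTF-8 encoded, it can be useful to check strings of unknown origin before parsing them. The following is a minimal sketch of such a check in plain C++11; `is_valid_utf8` is a hypothetical helper name and is not part of spacy-cpp. It rejects stray or missing continuation bytes, overlong encodings, surrogates, and code points above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8.
// Hypothetical helper for illustration; NOT part of spacy-cpp.
bool is_valid_utf8(const std::string& s)
{
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n)
  {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t extra = 0;     // number of continuation bytes expected
    unsigned long cp = 0;      // decoded code point
    unsigned long min_cp = 0;  // smallest code point valid for this length

    if (c < 0x80) { ++i; continue; }  // plain ASCII byte
    else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min_cp = 0x80; }
    else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min_cp = 0x800; }
    else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min_cp = 0x10000; }
    else return false;  // stray continuation byte or invalid lead byte

    if (n - i <= extra) return false;  // sequence is truncated

    for (std::size_t k = 1; k <= extra; ++k)
    {
      const unsigned char cc = static_cast<unsigned char>(s[i + k]);
      if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (cc & 0x3F);
    }

    if (cp < min_cp) return false;                   // overlong encoding
    if (cp > 0x10FFFF) return false;                 // beyond Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate

    i += extra + 1;
  }
  return true;
}
```

A caller might run this check on a string before passing it to Nlp::parse(), and convert or reject the input when the check fails (for example when the text turns out to be Latin-1 encoded).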
Spacy-cpp uses CMake for its tests. Commands to build and execute the test suite:

    mkdir -p build && cd build && cmake -DSPACYCPP_BUILD_TESTS=ON .. && make && ctest --output-on-failure ; cd -

Spacy-cpp is distributed under the MIT license. See the LICENSE file.
Bugs, PRs, etc. are welcome on the GitHub project page https://github.com/d99kris/spacy-cpp
c++, c++11, natural language processing, nlp, spacy.