Self-contained Japanese Morphological Analyzer written in pure Go


ikawaha/kagome


Kagome v2

Kagome is an open source Japanese morphological analyzer written in pure Go. It can tokenize Japanese text into words and analyze parts of speech, with dictionaries embedded in the binary for easy deployment.

Note

Key features (Improvements from v1):

  • Self-contained binaries with embedded dictionaries (MeCab-IPADIC, UniDic)
  • Multiple segmentation modes for different use cases
  • RESTful API server mode for production use
  • WebAssembly support for browser environments


Basic Usage

Command line

    % kagome -h
    Japanese Morphological Analyzer -- github.com/ikawaha/kagome/v2
    usage: kagome <command>
    The commands are:
       [tokenize] - command line tokenize (*default)
       server - run tokenize server
       lattice - lattice viewer
       sentence - tiny sentence splitter
       version - show version
    tokenize [-file input_file] [-dict dic_file] [-userdict user_dic_file] [-sysdict (ipa|uni)] [-simple false] [-mode (normal|search|extended)] [-split] [-json]
      -dict string
          dict
      -file string
          input file
      -json
          outputs in JSON format
      -mode string
          tokenize mode (normal|search|extended) (default "normal")
      -simple
          display abbreviated dictionary contents
      -split
          use tiny sentence splitter
      -sysdict string
          system dict type (ipa|uni) (default "ipa")
      -udict string
          user dict
    % # piped standard input
    % echo "すもももももももものうち" | kagome
    すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
    も	助詞,係助詞,*,*,*,*,も,モ,モ
    もも	名詞,一般,*,*,*,*,もも,モモ,モモ
    も	助詞,係助詞,*,*,*,*,も,モ,モ
    もも	名詞,一般,*,*,*,*,もも,モモ,モモ
    の	助詞,連体化,*,*,*,*,の,ノ,ノ
    うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
    EOS

As a Go library

    # Install Kagome module
    go get github.com/ikawaha/kagome/v2
    package main

    import (
    	"fmt"
    	"strings"

    	"github.com/ikawaha/kagome-dict/ipa"
    	"github.com/ikawaha/kagome/v2/tokenizer"
    )

    func main() {
    	t, err := tokenizer.New(ipa.Dict(), tokenizer.OmitBosEos())
    	if err != nil {
    		panic(err)
    	}
    	// wakati (simple word splitting/segmentation)
    	fmt.Println("---wakati---")
    	seg := t.Wakati("すもももももももものうち")
    	fmt.Println(seg)

    	// tokenize w/ morphological analysis
    	fmt.Println("---tokenize---")
    	tokens := t.Tokenize("すもももももももものうち")
    	for _, token := range tokens {
    		features := strings.Join(token.Features(), ",")
    		fmt.Printf("%s\t%v\n", token.Surface, features)
    	}
    }

Output:

    ---wakati---
    [すもも も もも も もも の うち]
    ---tokenize---
    すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
    も	助詞,係助詞,*,*,*,*,も,モ,モ
    もも	名詞,一般,*,*,*,*,もも,モモ,モモ
    も	助詞,係助詞,*,*,*,*,も,モ,モ
    もも	名詞,一般,*,*,*,*,もも,モモ,モモ
    の	助詞,連体化,*,*,*,*,の,ノ,ノ
    うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
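Each tokenize line above is a surface form, a tab, and a comma-separated feature list. With MeCab-IPADIC the nine features are part of speech, three part-of-speech subcategories, conjugation type, conjugation form, base form, reading, and pronunciation. The sketch below parses such lines using only the standard library; the Morph type and parseLine helper are hypothetical names for illustration, not part of Kagome's API. It assumes the tab separator used by the Printf call above, and it checks the feature count before indexing because unknown words may carry fewer features.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Morph holds the fields of one "surface<TAB>features" output line.
// Indexes follow the MeCab-IPADIC feature layout (hypothetical helper type).
type Morph struct {
	Surface       string
	POS           []string // part of speech and up to three subcategories
	BaseForm      string   // feature 7 (index 6), if present
	Reading       string   // feature 8 (index 7), empty if absent
	Pronunciation string   // feature 9 (index 8), empty if absent
}

// parseLine splits one output line. It returns ok=false for lines without a
// tab, such as the trailing "EOS" marker printed by the CLI.
func parseLine(line string) (m Morph, ok bool) {
	surface, feats, found := strings.Cut(line, "\t")
	if !found {
		return Morph{}, false
	}
	f := strings.Split(feats, ",")
	m.Surface = surface
	n := len(f)
	if n > 4 {
		n = 4
	}
	m.POS = f[:n]
	// Unknown words can carry fewer features, so check lengths before indexing.
	if len(f) > 6 {
		m.BaseForm = f[6]
	}
	if len(f) > 8 {
		m.Reading, m.Pronunciation = f[7], f[8]
	}
	return m, true
}

func main() {
	// Two lines copied from the output above, plus the CLI's EOS marker.
	out := "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ\n" +
		"うち\t名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ\n" +
		"EOS\n"
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		if m, ok := parseLine(sc.Text()); ok {
			fmt.Printf("%s pos=%v base=%s pron=%s\n", m.Surface, m.POS, m.BaseForm, m.Pronunciation)
		}
	}
}
```

When you use the library directly, token.Features() already returns the split slice, so a parser like this is mainly useful for post-processing CLI output.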

Install

To get the kagome command line tool, choose your preferred installation method below:

  • Go (recommended)

    go install github.com/ikawaha/kagome/v2@latest
  • Homebrew

    # macOS and Linux (for both AMD64 and Arm64)
    brew install ikawaha/kagome/kagome
  • Manual Install

    • For manual installation, download the archive for your OS and architecture from the releases page and extract it.
    • Place the extracted binary in an accessible directory (for example, one on your PATH) and make sure it has execute permission.
  • Docker/Docker Compose: see the Docker section below.

Commands

Major sub-commands of the kagome command line tool.

Tokenize command

    % # interactive/REPL mode
    % kagome
    すもももももももものうち
    すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
    も	助詞,係助詞,*,*,*,*,も,モ,モ
    もも	名詞,一般,*,*,*,*,もも,モモ,モモ
    も	助詞,係助詞,*,*,*,*,も,モ,モ
    もも	名詞,一般,*,*,*,*,もも,モモ,モモ
    の	助詞,連体化,*,*,*,*,の,ノ,ノ
    うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
    EOS
    % # piped standard input
    % echo "すもももももももものうち" | kagome
    すもも	名詞,一般,*,*,*,*,すもも,スモモ,スモモ
    も	助詞,係助詞,*,*,*,*,も,モ,モ
    もも	名詞,一般,*,*,*,*,もも,モモ,モモ
    も	助詞,係助詞,*,*,*,*,も,モ,モ
    もも	名詞,一般,*,*,*,*,もも,モモ,モモ
    の	助詞,連体化,*,*,*,*,の,ノ,ノ
    うち	名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
    EOS
    % # JSON output
    % # (For jq command see https://jqlang.org/)
    % echo "猫" | kagome -json | jq .
    [
      {
        "id": 286994,
        "start": 0,
        "end": 1,
        "surface": "猫",
        "class": "KNOWN",
        "pos": [
          "名詞",
          "一般",
          "*",
          "*"
        ],
        "base_form": "猫",
        "reading": "ネコ",
        "pronunciation": "ネコ",
        "features": [
          "名詞",
          "一般",
          "*",
          "*",
          "*",
          "*",
          "猫",
          "ネコ",
          "ネコ"
        ]
      }
    ]
    % # word splitting/segmentation only (equivalent to "wakati" functionality)
    % echo "すもももももももものうち" | kagome -json | jq -r '[.[].surface] | join("/")'
    すもも/も/もも/も/もも/の/うち
    % # Extract only pronunciations using jq (for Text-to-Speech purposes, etc.)
    % echo "私ははにわよわわわんわん" | kagome -json | jq -r '.[].pronunciation'
    ワタシハニワワンワン

Server command

For continuous use, kagome provides a server mode so that the tokenizer's startup cost is paid once, not on every invocation.

RESTful API

Start a server and try to access the "/tokenize" endpoint.

    % kagome server &
    % curl -XPUT localhost:6060/tokenize -d '{"sentence":"すもももももももものうち", "mode":"normal"}' | jq .

Web App

Start a server and access http://localhost:6060 in your browser.

    % kagome server &

(Screenshot: the kagome demo web app)

Important

The demo web application uses Graphviz to draw the lattice, so Graphviz must be installed on your system.

Tip

Kagome can be compiled to WebAssembly (wasm) and run locally in a web browser as well. For details, see the WebAssembly section.

Lattice command

The lattice command is a debugging tool for the tokenization process: it outputs the lattice in Graphviz DOT format.

    % kagome lattice 私は鰻 | dot -Tpng -o lattice.png

(Image: the lattice for 私は鰻, rendered with Graphviz)
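A lattice like the one above contains every dictionary word that matches some span of the input, and a MeCab-style tokenizer picks the lowest-cost path from the start of the text to the end. The sketch below shows that idea as a simplified Viterbi (shortest-path) search. The word costs in wordCost are made up for illustration, and unlike a real analyzer the sketch has no connection costs between adjacent words and no nodes for unknown words.

```go
package main

import (
	"fmt"
	"math"
)

// wordCost is a hypothetical dictionary of word costs (lower is more likely).
var wordCost = map[string]int{
	"す":   4000,
	"すもも": 3000,
	"も":   1500,
	"もも":  2500,
}

// bestPath returns the lowest-cost segmentation of text using only words in
// wordCost, and false if no path covers the whole text.
func bestPath(text string) ([]string, bool) {
	r := []rune(text)
	n := len(r)
	cost := make([]int, n+1) // cost[i]: best cost of a path covering r[:i]
	prev := make([]int, n+1) // prev[i]: start position of the last word on that path
	for i := 1; i <= n; i++ {
		cost[i] = math.MaxInt
	}
	for end := 1; end <= n; end++ {
		for start := 0; start < end; start++ {
			if cost[start] == math.MaxInt {
				continue // no path reaches this start position
			}
			c, ok := wordCost[string(r[start:end])]
			if !ok {
				continue // no dictionary node spans r[start:end]
			}
			if total := cost[start] + c; total < cost[end] {
				cost[end], prev[end] = total, start
			}
		}
	}
	if cost[n] == math.MaxInt {
		return nil, false
	}
	// Walk back from the end of the text to the start.
	var words []string
	for end := n; end > 0; end = prev[end] {
		words = append([]string{string(r[prev[end]:end])}, words...)
	}
	return words, true
}

func main() {
	words, ok := bestPath("すももも")
	fmt.Println(words, ok)
}
```

With these costs, すもも + も (3000 + 1500) beats alternatives such as す + もも + も, so the program selects [すもも も].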

Sentence command

Split long text into sentences:

    % echo "吾輩は猫である。名前はまだ無い。" | kagome sentence
    吾輩は猫である。
    名前はまだ無い。

This command is useful when a single line of input is too long and you want to avoid errors such as "bufio.Scanner: token too long".

    % echo "吾輩は猫である。名前はまだ無い。" | kagome -json | jq -r '[.[].surface] | join("/")'
    吾輩/は/猫/で/ある/。/名前/は/まだ/無い/。
    % echo "吾輩は猫である。名前はまだ無い。" | kagome sentence | kagome -json | jq -r '[.[].surface] | join("/")'
    吾輩/は/猫/で/ある/。
    名前/は/まだ/無い/。

This command is equivalent to the -split option of the tokenize command.

    % echo "吾輩は猫である。名前はまだ無い。" | kagome -split -json | jq -r '[.[].surface] | join("/")'
    吾輩/は/猫/で/ある/。
    名前/は/まだ/無い/。

Dictionaries

Note

For more details and differences between the dictionaries, see the wiki.

Segmentation modes

Similar to Kuromoji, Kagome also supports various segmentation modes (splitting strategies) to tokenize the input text.

  • Normal: regular segmentation
  • Search: uses a heuristic to perform additional segmentation that is useful for search purposes
  • Extended: like search mode, but also splits unknown words into uni-grams
| Untokenized | Normal | Search | Extended |
|---|---|---|---|
| 関西国際空港 | 関西国際空港 | 関西 国際 空港 | 関西 国際 空港 |
| 日本経済新聞 | 日本経済新聞 | 日本 経済 新聞 | 日本 経済 新聞 |
| シニアソフトウェアエンジニア | シニアソフトウェアエンジニア | シニア ソフトウェア エンジニア | シニア ソフトウェア エンジニア |
| デジカメを買った | デジカメ を 買っ た | デジカメ を 買っ た | デ ジ カ メ を 買っ た |
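The last row shows what extended mode adds on top of search mode: the unknown word デジカメ is broken into single characters. The sketch below reproduces only that step on a hypothetical token type. It assumes unknown tokens are marked with the class "UNKNOWN" (by analogy with the "KNOWN" class in the JSON output), and it does not model the compound splitting that search mode performs.

```go
package main

import "fmt"

// token is a hypothetical, minimal token: a surface form plus a class such
// as "KNOWN" or "UNKNOWN" (class names assumed from the JSON output).
type token struct {
	Surface string
	Class   string
}

// extendUnknown splits every unknown token into single characters
// (uni-grams) and keeps known tokens whole.
func extendUnknown(toks []token) []string {
	var out []string
	for _, t := range toks {
		if t.Class != "UNKNOWN" {
			out = append(out, t.Surface)
			continue
		}
		for _, r := range t.Surface {
			out = append(out, string(r))
		}
	}
	return out
}

func main() {
	// Search-mode result for デジカメを買った, with デジカメ assumed unknown.
	toks := []token{{"デジカメ", "UNKNOWN"}, {"を", "KNOWN"}, {"買っ", "KNOWN"}, {"た", "KNOWN"}}
	fmt.Println(extendUnknown(toks))
}
```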

Note

If you are tokenizing text for search, try changing the mode before switching to another dictionary.

Docker


We provide scratch-based Docker images that simply run the kagome command line tool on various architectures: AMD64, Arm64, and Arm32 (Arm v5, v6, and v7).

  • Pull the image

    docker pull ikawaha/kagome:latest
    # Alternatively, you can pull from GitHub Container Registry
    docker pull ghcr.io/ikawaha/kagome:latest
  • Run the command via Docker

    # Interactive/REPL mode
    docker run --rm -it ikawaha/kagome:latest
    # If pulling from GitHub Container Registry
    docker run --rm -it ghcr.io/ikawaha/kagome:latest
  • Run the server via Docker

    # Server mode (http://localhost:6060)
    docker run --rm -p 6060:6060 ikawaha/kagome:latest server
    # If pulling from GitHub Container Registry
    docker run --rm -p 6060:6060 ghcr.io/ikawaha/kagome:latest server
  • docker-compose.yml example

    services:
      kagome:
        image: ikawaha/kagome:latest
        ports: ["6060:6060"]
        command: server
        restart: unless-stopped

Note: The base image doesn't include Graphviz. For lattice visualization, see the examples.

WebAssembly

Kagome compiles to WebAssembly for browser use.

Reference

License

  • MIT
