pygeonlp, a Python module for geotagging Japanese texts.

geonlp-platform/pygeonlp

pygeonlp is open source software for geotagging/geoparsing Japanese natural-language text to extract place names.

More detailed Japanese documentation and API references are available in the /docs/source directory. You can also find the latest online documentation at PyGeoNLP Reference.

How To Use

Import pygeonlp.api and initialize it by specifying the directory where the place-name database is stored.

>>> import pygeonlp.api as api
>>> api.init(db_dir='mydic')

Then, run geoparse("text to parse").

>>> result = api.geoparse("国立情報学研究所は千代田区にあります。")

The result is a list of dict objects, with POS/spatial attributes assigned to each word.

A GeoJSON representation is obtained by JSON-encoding each dict object.

>>> import json
>>> print(json.dumps(result, indent=2, ensure_ascii=False))
[
  {
    "type": "Feature",
    "geometry": null,
    "properties": {
      "surface": "国立",
      "node_type": "NORMAL",
      "morphemes": {
        "conjugated_form": "名詞-固有名詞-地名語",
        "conjugation_type": "*",
        "original_form": "国立",
        "pos": "名詞",
        "prononciation": "コクリツ",
        "subclass1": "固有名詞",
        "subclass2": "地名修飾語",
        "subclass3": "*",
        "surface": "国立",
        "yomi": "コクリツ"
      }
    }
  }, ...
  {
    "type": "Feature",
    "geometry": {
      "type": "Point",
      "coordinates": [
        139.753634,
        35.694003
      ]
    },
    "properties": {
      "surface": "千代田区",
      "node_type": "GEOWORD",
      "morphemes": {
        "conjugated_form": "*",
        "conjugation_type": "*",
        "original_form": "千代田区",
        "pos": "名詞",
        "prononciation": "",
        "subclass1": "固有名詞",
        "subclass2": "地名語",
        "subclass3": "WWIY7G:千代田区",
        "surface": "千代田区",
        "yomi": ""
      },
      "geoword_properties": {
        "address": "東京都千代田区",
        "body": "千代田",
        "body_variants": "千代田",
        "code": {},
        "countyname": "",
        "countyname_variants": "",
        "dictionary_id": 1,
        "entry_id": "13101A1968",
        "geolod_id": "WWIY7G",
        "hypernym": [
          "東京都"
        ],
        "latitude": "35.69400300",
        "longitude": "139.75363400",
        "ne_class": "市区町村",
        "prefname": "東京都",
        "prefname_variants": "東京都",
        "source": "1/千代田区役所/千代田区九段南1-2-1/P34-14_13.xml",
        "suffix": [
          "区"
        ],
        "valid_from": "",
        "valid_to": "",
        "dictionary_identifier": "geonlp:geoshape-city"
      }
    }
  },
  {
    "type": "Feature",
    "geometry": null,
    "properties": {
      "surface": "に",
      "node_type": "NORMAL",
      "morphemes": {
        "conjugated_form": "*",
        "conjugation_type": "*",
        "original_form": "に",
        "pos": "助詞",
        "prononciation": "ニ",
        "subclass1": "格助詞",
        "subclass2": "一般",
        "subclass3": "*",
        "surface": "に",
        "yomi": "ニ"
      }
    }
  },
  ...
]
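
Because each element of the result is a GeoJSON-style Feature, ordinary dict and list operations are enough to pull out the place names. The following is a minimal sketch, not part of pygeonlp itself: sample_result is a hand-trimmed copy of the output above, and the helper names extract_geowords and to_feature_collection are hypothetical. It assumes that nodes with "node_type" equal to "GEOWORD" carry a Point geometry whose coordinates are ordered longitude, latitude, as in the example output.

```python
import json

# Hand-trimmed sample shaped like the geoparse() output shown above.
sample_result = [
    {"type": "Feature", "geometry": None,
     "properties": {"surface": "国立", "node_type": "NORMAL"}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [139.753634, 35.694003]},
     "properties": {"surface": "千代田区", "node_type": "GEOWORD"}},
    {"type": "Feature", "geometry": None,
     "properties": {"surface": "に", "node_type": "NORMAL"}},
]


def extract_geowords(features):
    """Return (surface, longitude, latitude) for every located GEOWORD node."""
    places = []
    for feature in features:
        props = feature.get("properties", {})
        geometry = feature.get("geometry")
        if props.get("node_type") == "GEOWORD" and geometry:
            lon, lat = geometry["coordinates"]
            places.append((props["surface"], lon, lat))
    return places


def to_feature_collection(features):
    """Wrap the features that have a geometry in a GeoJSON FeatureCollection."""
    located = [f for f in features if f.get("geometry")]
    return {"type": "FeatureCollection", "features": located}


places = extract_geowords(sample_result)
collection = to_feature_collection(sample_result)
print(places)
print(json.dumps(collection, ensure_ascii=False))
```

With a real database, the same helpers could be applied to the list returned by api.geoparse() instead of sample_result.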

Prerequisites

pygeonlp requires the MeCab C++ library and a UTF-8 dictionary for Japanese morphological analysis.

The C++ part of the implementation also depends on Boost C++.

$ sudo apt install libmecab-dev mecab-ipadic-utf8 libboost-all-dev

Install

The pygeonlp package can be installed with the pip command. It is recommended that you upgrade pip and setuptools to the latest versions before running it.

$ pip install --upgrade pip setuptools
$ pip install pygeonlp

The database must be prepared before first use.

Prepare the database

Run the following to register the basic place-name word analysis dictionaries (*.json, *.csv) included in this package into the database under mydic/.

>>> import pygeonlp.api as api
>>> api.setup_basic_database(db_dir='mydic/')

This command registers three dictionaries:

  • "Prefectures of Japan" (geonlp:geoshape-pref)

  • "Historical Administrative Area Data Set Beta Dictionary of Place Names" (geonlp:geoshape-city)

  • "Railroad Stations in Japan (2019)" (geonlp:ksj-station-N02-2019)
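
Since the database only has to be prepared once, a script may want to run the setup step conditionally. The sketch below shows one possible way to do that; ensure_database is a hypothetical helper, not a pygeonlp function. It takes the imported pygeonlp.api module as its first argument and uses only the two calls shown in this README, api.setup_basic_database() and api.init(). It assumes that a missing or empty directory means no database has been created yet, which may not hold for every setup.

```python
import os


def ensure_database(api, db_dir):
    """Register the basic dictionaries if db_dir looks unused, then initialize.

    `api` is expected to be the imported pygeonlp.api module.
    Returns True if the setup step was run, False otherwise.
    """
    # Assumption: a missing or empty directory means no database exists yet.
    needs_setup = not os.path.isdir(db_dir) or not os.listdir(db_dir)
    if needs_setup:
        api.setup_basic_database(db_dir=db_dir)
    api.init(db_dir=db_dir)
    return needs_setup


# Intended usage (requires pygeonlp to be installed):
#   import pygeonlp.api as api
#   ensure_database(api, "mydic")
```

Passing the module in as an argument keeps the helper free of import-time dependencies, so it can be exercised with a stand-in object before pygeonlp is installed.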

Install GDAL library (Optional)

If the GDAL library is installed, pygeonlp can use "spatial distance" for disambiguation when there are multiple place names with the same name, thus improving accuracy. You can also use spatial filters.

$ sudo apt install libgdal-dev
$ pip install gdal

Install jageocoder (Optional)

pygeonlp can use address geocoding if an address dictionary for jageocoder is installed.

See the jageocoder documentation for installation instructions.
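
To see which of the two optional packages are importable in the current environment, a quick check along these lines may help. This is a sketch using only the Python standard library; optional_features is a hypothetical helper name. The GDAL Python bindings are imported under the module name osgeo. Note that finding the jageocoder module does not by itself confirm that an address dictionary has been installed.

```python
import importlib.util


def optional_features():
    """Report which optional Python packages can be imported."""
    return {
        # The GDAL Python bindings are imported as `osgeo`.
        "gdal": importlib.util.find_spec("osgeo") is not None,
        # Only checks the module, not whether an address dictionary exists.
        "jageocoder": importlib.util.find_spec("jageocoder") is not None,
    }


features = optional_features()
print(features)
```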

Run tests (Optional)

Run the unit tests with the pytest command.

Uninstall

Use the pip command to uninstall.

$ pip uninstall pygeonlp

Delete the database

When you register a place-name word analysis dictionary in the database, a sqlite3 database and some other files are created in the specified directory.

If you want to delete them, just delete the whole directory.

$ rm -r mydic/
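
The same cleanup can be done from Python, for example at the end of a test run. This is a minimal sketch using the standard library; delete_database is a hypothetical helper, not a pygeonlp function. Like rm -r, it removes everything under the given directory, so the path should be double-checked before calling it.

```python
import shutil
from pathlib import Path


def delete_database(db_dir):
    """Remove db_dir and everything in it; return True if it existed."""
    path = Path(db_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


# Intended usage:
#   delete_database("mydic")
```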

License

The 2-Clause BSD License

Acknowledgements

This software is supported by DIAS (Data Integration and Analysis System) and ROIS-DS CODH (Center for Open Data in the Humanities).

It was also supported by the JST (Japan Science and Technology Agency) PRESTO program.
