Extract city and country mentions from text, like GeoText, but using FlashText (an Aho-Corasick implementation) instead of regex.
iwpnd/flashgeotext
Extract and count countries and cities (+ their synonyms) from text, like GeoText on steroids, using FlashText, an Aho-Corasick implementation. Flashgeotext is a fast, batteries-included (and BYOD), native Python library that extracts one or more sets of given city and country names (+ synonyms) from an input text.
Introductory blog post: https://iwpnd.github.io/articles/2020-02/flashgeotext-library
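To make the idea of keyword-based extraction with synonyms concrete, here is a minimal, pure-Python sketch. It is not flashgeotext's or FlashText's actual code: it is a simplified character-trie scan with longest-match and word-boundary checks, and every name in it (`build_trie`, `extract`, `WORD_CHARS`, `END`, the sample `synonyms`) is hypothetical and chosen for illustration only.

```python
import string

# Assumption: letters, digits and "_" form words; anything else is a boundary.
WORD_CHARS = set(string.ascii_letters + string.digits + "_")
END = "_end_"  # multi-character marker key, cannot collide with a single char


def build_trie(synonyms):
    """Build a character trie mapping every synonym to its canonical name."""
    trie = {}
    for canonical, names in synonyms.items():
        for name in names:
            node = trie
            for char in name:
                node = node.setdefault(char, {})
            node[END] = canonical
    return trie


def extract(text, trie):
    """Scan text left to right and collect count, spans and matched forms."""
    found = {}
    i = 0
    while i < len(text):
        # A match may only start at a word boundary.
        if i > 0 and text[i - 1] in WORD_CHARS:
            i += 1
            continue
        node, j, best = trie, i, None
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            # Keep the longest keyword that also ends at a word boundary.
            if END in node and (j == len(text) or text[j] not in WORD_CHARS):
                best = (node[END], j)
        if best is None:
            i += 1
            continue
        canonical, end = best
        entry = found.setdefault(
            canonical, {"count": 0, "span_info": [], "found_as": []}
        )
        entry["count"] += 1
        entry["span_info"].append((i, end))
        entry["found_as"].append(text[i:end])
        i = end  # continue scanning after the match
    return found


# Hypothetical synonym table: canonical name -> accepted spellings.
synonyms = {
    "Shanghai": ["Shanghai"],
    "Washington, D.C.": ["Washington", "Washington, D.C."],
    "United States": ["US", "USA", "United States"],
}
result = extract("Shanghai. Washington welcomes the US decision.", build_trie(synonyms))
print(result)
```

Because lookups walk the trie character by character, the cost of a scan depends on the text length rather than on the number of keywords, which is the property that makes this family of approaches attractive compared with one large regex.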
```python
from flashgeotext.geotext import GeoText

geotext = GeoText()

input_text = '''Shanghai. The Chinese Ministry of Finance in Shanghai said that China plans to cut tariffs on $75 billion worth of goods that the country imports from the US. Washington welcomes the decision.'''

geotext.extract(input_text=input_text)

# >> {
#     'cities': {
#         'Shanghai': {
#             'count': 2,
#             'span_info': [(0, 8), (45, 53)],
#             'found_as': ['Shanghai', 'Shanghai'],
#         },
#         'Washington, D.C.': {
#             'count': 1,
#             'span_info': [(175, 185)],
#             'found_as': ['Washington'],
#         }
#     },
#     'countries': {
#         'China': {
#             'count': 1,
#             'span_info': [(64, 69)],
#             'found_as': ['China'],
#         },
#         'United States': {
#             'count': 1,
#             'span_info': [(171, 173)],
#             'found_as': ['US'],
#         }
#     }
# }
```
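The result is a plain nested dictionary, so it can be post-processed without any further library calls. The following sketch assumes only the shape shown in the example output above (`count`, `span_info`, `found_as` per canonical name); the helper names `summarize` and `matched_substrings`, as well as the sample text and sample result, are hypothetical and hand-written, not produced by flashgeotext.

```python
def summarize(result):
    """Flatten a result into (category, name, count) rows, most frequent first."""
    rows = [
        (category, name, info["count"])
        for category, places in result.items()
        for name, info in places.items()
    ]
    # sorted() is stable, so ties keep their original order.
    return sorted(rows, key=lambda row: row[2], reverse=True)


def matched_substrings(text, result):
    """Slice the input text with the (start, end) offsets in span_info."""
    return {
        name: [text[start:end] for start, end in info["span_info"]]
        for places in result.values()
        for name, info in places.items()
    }


# Hand-written sample in the same shape as the output above (an assumption).
sample_text = "Shanghai. Washington welcomes the US decision."
sample_result = {
    "cities": {
        "Shanghai": {"count": 1, "span_info": [(0, 8)], "found_as": ["Shanghai"]},
        "Washington, D.C.": {
            "count": 1,
            "span_info": [(10, 20)],
            "found_as": ["Washington"],
        },
    },
    "countries": {
        "United States": {"count": 1, "span_info": [(34, 36)], "found_as": ["US"]},
    },
}

for category, name, count in summarize(sample_result):
    print(f"{category}: {name} x{count}")
print(matched_substrings(sample_text, sample_result))
```

Slicing with `span_info` treats the offsets as end-exclusive, which matches the example output, where `(0, 8)` covers the eight characters of "Shanghai".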
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
pip:
pip install flashgeotext
conda:
conda install flashgeotext
for development:
```shell
git clone https://github.com/iwpnd/flashgeotext.git
cd flashgeotext/
poetry install

# run the test suite
poetry run pytest . -v
```
- Benjamin Ramser - Initial work - iwpnd
See also the list of contributors who participated in this project.
This project is licensed under the MIT License - see the LICENSE.md file for details.
Demo data: cities from http://www.geonames.org, licensed under the Creative Commons Attribution 3.0 License.