Extract city and country mentions from text, like GeoText, but using FlashText, an Aho-Corasick implementation, instead of regex.
iwpnd/flashgeotext
Extract and count countries and cities (plus their synonyms) from text, like GeoText on steroids, using FlashText, an Aho-Corasick implementation. Flashgeotext is a fast, batteries-included (and BYOD), native Python library that extracts one or more sets of given city and country names (plus synonyms) from an input text.
Introductory blog post: https://iwpnd.github.io/articles/2020-02/flashgeotext-library
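To make the idea of regex-free keyword extraction with synonyms more concrete, here is a minimal sketch of a trie-based scan in the spirit of FlashText. It is not the library's actual code, and it is simpler than a full Aho-Corasick automaton (there are no failure links). The names `build_trie`, `scan`, and the `_end_` marker are hypothetical and chosen only for illustration.

```python
def is_word_char(ch):
    """Characters that count as part of a word for boundary checks."""
    return ch.isalnum() or ch == "_"


def build_trie(names):
    """Build a character trie from {canonical_name: [synonyms]}."""
    trie = {}
    for canonical, synonyms in names.items():
        for keyword in [canonical, *synonyms]:
            node = trie
            for ch in keyword:
                node = node.setdefault(ch, {})
            # Multi-character key, so it cannot collide with a single character.
            node["_end_"] = canonical
    return trie


def scan(text, trie):
    """Return (canonical_name, start, end) for the longest whole-word matches."""
    matches = []
    i, n = 0, len(text)
    while i < n:
        # Only start matching at the beginning of a word.
        if i > 0 and is_word_char(text[i - 1]):
            i += 1
            continue
        node, j, best = trie, i, None
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            # Accept a keyword only if it ends at a word boundary.
            if "_end_" in node and (j == n or not is_word_char(text[j])):
                best = (node["_end_"], i, j)
        if best:
            matches.append(best)
            i = best[2]
        else:
            i += 1
    return matches


# Hypothetical demo data: canonical names mapped to their synonyms.
names = {
    "Washington, D.C.": ["Washington"],
    "United States": ["US"],
    "China": [],
}
trie = build_trie(names)
found = scan("Washington welcomes China. The US agreed.", trie)
print(found)
```

Each synonym resolves to its canonical name, and the text is walked once from left to right rather than being tested against one pattern per keyword.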
    from flashgeotext.geotext import GeoText

    geotext = GeoText()

    input_text = '''Shanghai. The Chinese Ministry of Finance in Shanghai said that China plans to cut tariffs on $75 billion worth of goods that the country imports from the US. Washington welcomes the decision.'''

    geotext.extract(input_text=input_text)
    >> {
        'cities': {
            'Shanghai': {
                'count': 2,
                'span_info': [(0, 8), (45, 53)],
                'found_as': ['Shanghai', 'Shanghai'],
            },
            'Washington, D.C.': {
                'count': 1,
                'span_info': [(175, 185)],
                'found_as': ['Washington'],
            }
        },
        'countries': {
            'China': {
                'count': 1,
                'span_info': [(64, 69)],
                'found_as': ['China'],
            },
            'United States': {
                'count': 1,
                'span_info': [(171, 173)],
                'found_as': ['US'],
            }
        }
    }
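The nested result is easy to post-process with plain Python. The sketch below flattens a result of the shape shown above into rows sorted by count. The helper name `summarize` is hypothetical and not part of flashgeotext, and the `result` dict is copied by hand from the example output rather than produced by a live call.

```python
def summarize(result):
    """Flatten a {category: {name: info}} result into (category, name, count) rows."""
    rows = [
        (category, name, info["count"])
        for category, places in result.items()
        for name, info in places.items()
    ]
    # Highest count first; ties are broken alphabetically by category, then name.
    return sorted(rows, key=lambda row: (-row[2], row[0], row[1]))


# Copied from the example output above (assumed shape, not a live call).
result = {
    "cities": {
        "Shanghai": {
            "count": 2,
            "span_info": [(0, 8), (45, 53)],
            "found_as": ["Shanghai", "Shanghai"],
        },
        "Washington, D.C.": {
            "count": 1,
            "span_info": [(175, 185)],
            "found_as": ["Washington"],
        },
    },
    "countries": {
        "China": {
            "count": 1,
            "span_info": [(64, 69)],
            "found_as": ["China"],
        },
        "United States": {
            "count": 1,
            "span_info": [(171, 173)],
            "found_as": ["US"],
        },
    },
}

rows = summarize(result)
for category, name, count in rows:
    print(f"{category:10} {name:18} {count}")
```

Because the canonical name is the dictionary key, mentions found via a synonym (such as "US") are already counted under "United States".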
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
pip:
pip install flashgeotext
conda:
conda install flashgeotext
for development:
git clone https://github.com/iwpnd/flashgeotext.git
cd flashgeotext/
poetry install
poetry run pytest . -v
- Benjamin Ramser - Initial work - iwpnd
See also the list of contributors who participated in this project.
This project is licensed under the MIT License - see the LICENSE.md file for details.
Demo Data cities from http://www.geonames.org licensed under the Creative Commons Attribution 3.0 License.