A script to generate multilingual GPS data of all places in the world.
canbax/GPS-miner
Run the command below for a full execution:

`./process-dr5hn.sh && ./process-planet-geonames.sh && python3 merge-data.py && python3 verify-TR-data.py && python3 verify-major-cities.py && python3 verify-gps-data.py && node create-index.js && node create-grid.js && gzip -r generated_data/grid.json`

Run the script `process-dr5hn`. It does the following steps.

download `cities.csv` from the GitHub repo (countries-states-cities-database) of dr5hn
- get only TR cities
- get only necessary columns
- Replace `â` with `a`
- Replace the name `Merkez` with the corresponding state name
- Generate entries for the states that do not exist as a name, such as İstanbul and Ankara, from the average GPS of their children
- lower case the country code to be consistent with geo names
- name the result file as "turkish_cities.csv"
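
The clean-up steps above can be sketched in Python. This is only an illustration and not the contents of `process-dr5hn`: the function name and the column names (`name`, `state_name`, `country_code`, `latitude`, `longitude`) are assumptions, and the rows are plain dictionaries rather than the real CSV file.

```python
from collections import defaultdict


def normalize_turkish_cities(rows):
    """Clean TR rows and add one averaged entry per state that has no row of its own.

    `rows` is a list of dicts with hypothetical keys:
    name, state_name, country_code, latitude, longitude.
    """
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not modified
        row["name"] = row["name"].replace("â", "a")
        if row["name"] == "Merkez":
            # "Merkez" means "centre", so use the state name instead
            row["name"] = row["state_name"]
        row["country_code"] = row["country_code"].lower()
        cleaned.append(row)

    by_state = defaultdict(list)
    for row in cleaned:
        by_state[row["state_name"]].append(row)

    for state, children in by_state.items():
        if any(child["name"] == state for child in children):
            continue  # the state already exists as a name
        lat = sum(float(c["latitude"]) for c in children) / len(children)
        lon = sum(float(c["longitude"]) for c in children) / len(children)
        cleaned.append({
            "name": state,
            "state_name": state,
            "country_code": children[0]["country_code"],
            "latitude": lat,
            "longitude": lon,
        })
    return cleaned
```

A plain arithmetic mean of latitude and longitude is used here because the states are small and far from the poles and the antimeridian; that choice is an assumption of the sketch.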
Run the script `process-planet-geonames`. It uses the `planet-latest_geonames.tsv.gz` file from the https://github.com/OSMNames/OSMNames/releases page. Currently it uses v2.2.0. This script enriches the Turkish cities with alternative names and adds all other cities of the world with their alternative names.

generate `filtered_geonames.tsv` from the planet-scale OSMNames data using a bash script
- get only cities or towns that could be a municipality
- filter out unnecessary columns in the bash script
- convert certain names to the Turkish alphabet, such as "Istanbul" -> "İstanbul"
- filter out all names with the `tr` country code that are not in "turkish_cities.csv"
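
The repository does this filtering in a bash script. The same logic is sketched below in Python purely for illustration. The column names (`type`, `name`, `alternative_names`, `country_code`, `lat`, `lon`), the set of kept columns, and the decision to respell only `tr` rows are assumptions; the spelling map contains only the one pair mentioned above.

```python
# Only the pair named in this README; a real map would need more entries.
TURKISH_SPELLING = {"Istanbul": "İstanbul"}
KEPT_TYPES = {"city", "town"}
# Hypothetical choice of "necessary" columns.
KEPT_COLUMNS = ("name", "alternative_names", "country_code", "lat", "lon")


def filter_geonames(rows, turkish_city_names):
    """Keep cities/towns, trim columns, and drop unknown `tr` names.

    `rows` is an iterable of dicts (one per TSV line);
    `turkish_city_names` is the set of names found in turkish_cities.csv.
    """
    kept = []
    for row in rows:
        if row["type"] not in KEPT_TYPES:
            continue
        slim = {col: row[col] for col in KEPT_COLUMNS}
        if slim["country_code"] == "tr":
            # Respell first, so the lookup below uses the Turkish form.
            slim["name"] = TURKISH_SPELLING.get(slim["name"], slim["name"])
            if slim["name"] not in turkish_city_names:
                continue
        kept.append(slim)
    return kept
```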
Run the Python 3 script `merge-data.py`. It does the following steps.
- merge 3 data files (dr5hn, OSMNames, ip2location) into a single TSV file, which is simply the database (let's name the file "db.tsv"). The rows are sorted by "country_code", "state_name", "name" so that the file looks tidy and duplicates are easy to detect.
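
A minimal sketch of the merge-and-sort idea follows. It is not the code of `merge-data.py`: the function names and the `lat`/`lon` columns are assumptions, and the inputs are lists of dictionaries rather than the real files. It shows why sorting helps: rows with the same key end up next to each other, so duplicates can be found by comparing neighbours.

```python
import csv
import io

# Hypothetical column layout of "db.tsv".
FIELDS = ["country_code", "state_name", "name", "lat", "lon"]


def sort_key(row):
    return (row["country_code"], row["state_name"], row["name"])


def merge_sources(*sources):
    """Concatenate rows from all sources and sort them by the DB key."""
    merged = [row for source in sources for row in source]
    merged.sort(key=sort_key)
    return merged


def adjacent_duplicates(sorted_rows):
    """Return the keys that appear on two neighbouring rows."""
    return [
        sort_key(b)
        for a, b in zip(sorted_rows, sorted_rows[1:])
        if sort_key(a) == sort_key(b)
    ]


def to_tsv(rows):
    """Serialize rows as TSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=FIELDS, delimiter="\t",
        lineterminator="\n", extrasaction="ignore",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Python compares strings by code point here, so the resulting order is stable but not locale-aware.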
test whether the DB stores everything correctly
- check all Turkish cities and states (run the `verify-TR-data.py` script)
- check some major cities such as "Ulaanbaatar" (run the `verify-major-cities.py` script)
- check whether all valid GPS data entries are present and correct (run the `verify-gps-data.py` script)
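
What the verify scripts check in detail is not described here, so the following is only a guess at the kind of test they might run: that expected names exist in the DB and that coordinates fall in the valid latitude/longitude range. The function names and the `name`, `lat`, `lon` columns are assumptions.

```python
def find_missing(db_rows, expected_names):
    """Return the expected names that have no row in the DB."""
    present = {row["name"] for row in db_rows}
    return sorted(set(expected_names) - present)


def invalid_coordinates(db_rows):
    """Return names of rows whose lat/lon are missing, non-numeric or out of range."""
    bad = []
    for row in db_rows:
        try:
            lat, lon = float(row["lat"]), float(row["lon"])
        except (KeyError, TypeError, ValueError):
            bad.append(row.get("name"))
            continue
        # NaN fails both comparisons, so it is reported as invalid too.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            bad.append(row.get("name"))
    return bad
```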
- create an index file from the DB to find an entry in O(1) time (run `node create-index.js`)
- read a line in O(1) time (run `node read-lines.js`)
- create a gzipped "grid" file from the DB to find the closest locations in nearly O(1) time (run `node create-grid.js`)
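
The two ideas above can be illustrated with a short sketch. The repository implements them in Node.js; the Python below is not a port of those scripts, and every name in it, as well as the 1-degree cell size, is an assumption. The index stores the byte offset of each line so a line can be read with a single seek, and the grid buckets line numbers by rounded-down coordinates so a lookup only inspects the 9 cells around the query point.

```python
import io
import math
from collections import defaultdict


def build_line_index(data: bytes):
    """Return the byte offset at which each line of `data` starts."""
    offsets, pos = [], 0
    for line in io.BytesIO(data):
        offsets.append(pos)
        pos += len(line)
    return offsets


def read_line(stream, offsets, line_no):
    """Jump straight to a line: one seek plus one readline."""
    stream.seek(offsets[line_no])
    return stream.readline().decode("utf-8").rstrip("\n")


def grid_key(lat, lon, cell_deg=1.0):
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def build_grid(points, cell_deg=1.0):
    """Map each grid cell to the line numbers of the points inside it."""
    grid = defaultdict(list)
    for line_no, (lat, lon) in enumerate(points):
        grid[grid_key(lat, lon, cell_deg)].append(line_no)
    return grid


def nearby(grid, lat, lon, cell_deg=1.0):
    """Candidate line numbers from the query cell and its 8 neighbours."""
    row, col = grid_key(lat, lon, cell_deg)
    return [
        line_no
        for d_row in (-1, 0, 1)
        for d_col in (-1, 0, 1)
        for line_no in grid.get((row + d_row, col + d_col), [])
    ]
```

The candidates returned by `nearby` would still need to be ranked by actual distance. The sketch also ignores wrap-around at the antimeridian and keeps the offsets in memory, whereas an on-disk index with fixed-width offsets would be needed to avoid loading them all.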
create a Trie by reading the DB and putting pointers to the DB entries using the index file
- for each non-English character such as "ç", "ö" ... also add its English mapping in the Trie structure
- gzip the Trie data file if it's big

read the gzip file and then implement the search function
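
A minimal Trie along these lines could look like the sketch below. It is an assumption-laden illustration, not the repository's implementation: the class and function names are made up, the pointers are plain integers standing in for index-file entries, and the English mapping is derived with Unicode decomposition (plus a special case for "ı") instead of a hand-written character table.

```python
import unicodedata


def fold(text):
    """Map e.g. "Çanakkale" to "canakkale": strip accents, then lowercase."""
    decomposed = unicodedata.normalize("NFKD", text.replace("ı", "i"))
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, name, pointer):
        """Store `pointer` under the original name and under its folded form."""
        for key in {name, fold(name)}:
            node = self.root
            for ch in key:
                node = node.setdefault(ch, {})
            # The None key holds the pointers of names ending at this node.
            node.setdefault(None, set()).add(pointer)

    def search(self, prefix):
        """Return sorted pointers of all names starting with `prefix`."""
        found = set()
        for key in {prefix, fold(prefix)}:
            node = self.root
            for ch in key:
                node = node.get(ch)
                if node is None:
                    break
            else:
                self._collect(node, found)
        return sorted(found)

    @staticmethod
    def _collect(node, found):
        stack = [node]
        while stack:
            current = stack.pop()
            for key, value in current.items():
                if key is None:
                    found.update(value)
                else:
                    stack.append(value)
```

With this layout a query typed on an English keyboard, such as `istan`, reaches the same pointer as `İst`, because both the stored name and the query are also tried in folded form.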