A script to generate multilingual GPS data of all places in the world.


canbax/GPS-miner


Run the command below to execute the full pipeline:

./process-dr5hn.sh && ./process-planet-geonames.sh && python3 merge-data.py && python3 verify-TR-data.py && python3 verify-major-cities.py && python3 verify-gps-data.py && node create-index.js && node create-grid.js && gzip -r generated_data/grid.json

Generate Turkish cities from dr5hn/countries-states-cities-database

Run the script process-dr5hn.sh. It performs the following steps:

  • download cities.csv from the GitHub repo of dr5hn (countries-states-cities-database)

    • get only TR cities
    • get only necessary columns
    • replace â with a
    • replace the name Merkez with the corresponding state name
    • generate entries for states that do not exist as a name, such as İstanbul and Ankara, from the average GPS coordinates of their children
    • lowercase the country code to be consistent with the geonames data
    • name the result file as "turkish_cities.csv"
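
The steps above can be sketched in Python as follows. This is only an illustration, not the actual process-dr5hn.sh script: the column names (name, state_name, country_code, latitude, longitude) and the function name are assumptions, and the download and file-writing steps are left out.

```python
import csv
import io
from collections import defaultdict

def process_turkish_cities(csv_text):
    """Return cleaned rows for Turkish cities from cities.csv-like text.

    Column names are assumed for illustration.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["country_code"] != "TR":
            continue  # keep only TR cities
        name = row["name"].replace("â", "a")
        if name == "Merkez":
            name = row["state_name"]  # "Merkez" takes the state name
        # Keep only the necessary columns, with a lowercase country code.
        rows.append({
            "name": name,
            "state_name": row["state_name"],
            "country_code": row["country_code"].lower(),
            "latitude": float(row["latitude"]),
            "longitude": float(row["longitude"]),
        })

    # Group the cities by state so that missing state entries can be added.
    by_state = defaultdict(list)
    for r in rows:
        by_state[r["state_name"]].append(r)

    for state, children in by_state.items():
        if any(c["name"] == state for c in children):
            continue  # the state already exists as a name
        rows.append({
            "name": state,
            "state_name": state,
            "country_code": "tr",
            # Average position of the state's cities.
            "latitude": sum(c["latitude"] for c in children) / len(children),
            "longitude": sum(c["longitude"] for c in children) / len(children),
        })
    return rows
```

Averaging latitudes and longitudes directly is a simplification that is reasonable for a small region such as a single state.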

Process planet-scale OSM names data

Run the script process-planet-geonames.sh. It uses the planet-latest_geonames.tsv.gz file from the https://github.com/OSMNames/OSMNames/releases page (currently v2.2.0). This script enriches the Turkish cities with alternative names and adds all other cities of the world with their alternative names.

  • generate filtered_geonames.tsv from planet-scale OSM names data using a bash script

    • get only cities or towns that could be a municipality
    • filter out unnecessary columns in the bash script
    • convert certain names to Turkish alphabet such as "Istanbul" -> "İstanbul"
    • filter out all names with the tr country code that are not in "turkish_cities.csv"
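
The filtering logic above might look like the sketch below. The repository does this in a bash script; Python is used here only to make the rules explicit. The column names (name, alternative_names, type, country_code, lat, lon) and the single-entry spelling table are assumptions for illustration.

```python
import csv
import io

# Hypothetical spelling table; the real mapping is not given in this README.
TURKISH_SPELLING = {"Istanbul": "İstanbul"}

KEPT_COLUMNS = ("alternative_names", "country_code", "lat", "lon")

def filter_geonames(tsv_text, turkish_city_names):
    """Keep cities and towns, fix spellings, drop unknown 'tr' names."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    kept = []
    for row in reader:
        if row["type"] not in ("city", "town"):
            continue  # only places that could be a municipality
        name = TURKISH_SPELLING.get(row["name"], row["name"])
        if row["country_code"] == "tr" and name not in turkish_city_names:
            continue  # Turkish names must exist in turkish_cities.csv
        filtered = {"name": name}
        filtered.update({col: row[col] for col in KEPT_COLUMNS})
        kept.append(filtered)
    return kept
```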

Merge TSV and CSV files

Run the Python 3 script merge-data.py. It performs the following step:

  • merge 3 data files (dr5hn, OSMNames, ip2location) into a single TSV file, which serves as the database (let's call it "db.tsv"). The rows are sorted by "country_code", "state_name", "name" so that the file looks tidy and duplicates are easy to detect.
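
A minimal sketch of the merge-and-sort step, assuming each source has already been loaded as a list of dictionaries. The column list and function names are hypothetical; the real merge-data.py may handle the three sources differently.

```python
import csv
import io

# Assumed column layout of the merged database.
COLUMNS = ["name", "alternative_names", "country_code", "state_name", "lat", "lon"]

def merge_sources(*sources):
    """Concatenate row lists and sort by country_code, state_name, name."""
    merged = [row for source in sources for row in source]
    merged.sort(key=lambda r: (r["country_code"], r["state_name"], r["name"]))
    return merged

def to_tsv(rows):
    """Serialize rows as TSV text; missing columns become empty strings."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=COLUMNS,
        delimiter="\t",
        lineterminator="\n",
        extrasaction="ignore",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Because the sort key matches the three columns named above, duplicate entries end up on adjacent lines of the output.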

Check and ensure data

  • test that the DB stores everything correctly

    • check all Turkish cities and states (run the verify-TR-data.py script)
    • check some major cities such as "Ulaanbaatar" (run the verify-major-cities.py script)
    • check that all valid GPS data entries are present and correct (run the verify-gps-data.py script)
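
Checks of this kind could be sketched as below. The function names and the row layout (name, country_code) are assumptions; the actual verify-*.py scripts are not shown in this README and may test different things.

```python
def find_missing(db_rows, expected):
    """Return the expected (name, country_code) pairs absent from the DB."""
    present = {(r["name"], r["country_code"]) for r in db_rows}
    return [pair for pair in expected if pair not in present]

def is_valid_gps(lat, lon):
    """True if the coordinates fall inside the valid latitude/longitude range."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
```

A verification script could then fail loudly whenever find_missing returns a non-empty list or any row has coordinates for which is_valid_gps is false.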

Prepare searchable data structures

  • create an index file from the DB to find an entry in O(1) time (run node create-index.js)
  • read a line in O(1) time (run node read-lines.js)
  • create a gzipped "grid" file from the DB to find the closest locations in nearly O(1) time (run node create-grid.js)
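
The two ideas above can be illustrated as follows. The repository implements them in Node.js; this Python sketch only assumes that the index stores the byte offset of each line and that the grid buckets entries into fixed-size latitude/longitude cells. All names and the 1-degree cell size are hypothetical.

```python
import io
from collections import defaultdict

def build_line_index(data):
    """Byte offset of the start of every line in the given bytes."""
    offsets = [0]
    pos = data.find(b"\n")
    while pos != -1 and pos + 1 < len(data):
        offsets.append(pos + 1)
        pos = data.find(b"\n", pos + 1)
    return offsets

def read_line(f, offsets, n):
    """Seek straight to line n of a binary file object and decode it."""
    f.seek(offsets[n])
    return f.readline().decode("utf-8").rstrip("\n")

def build_grid(points, cell_deg=1.0):
    """Map each (lat, lon) cell to the indices of the points inside it."""
    grid = defaultdict(list)
    for i, (lat, lon) in enumerate(points):
        grid[(int(lat // cell_deg), int(lon // cell_deg))].append(i)
    return grid

def nearest(grid, points, lat, lon, cell_deg=1.0):
    """Index of the closest point in the 3x3 block of cells around the query.

    Simplification: compares squared degree differences and ignores
    longitude wrap-around at +/-180.
    """
    ci, cj = int(lat // cell_deg), int(lon // cell_deg)
    candidates = [
        i
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        for i in grid.get((ci + di, cj + dj), [])
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda i: (points[i][0] - lat) ** 2 + (points[i][1] - lon) ** 2,
    )
```

Only a constant number of cells is inspected per query, which is why the lookup is close to O(1) as long as no cell holds too many entries.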

Create importable/exportable Trie data structure from TSV (in "irem")

  • create Trie by reading the DB and putting pointers to the DB entry using index file

    • for each non-English character such as "ç", "ö" ..., also add its English mapping to the Trie structure
    • gzip Trie data file if it's big
  • read gzip file and then implement the search function
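
A minimal sketch of such a Trie, assuming the pointers are line numbers taken from the index file and that the structure is serialized as gzipped JSON. The class, the character table, and the "_end" marker are hypothetical choices, not the format used by "irem".

```python
import gzip
import json

# Assumed mapping of Turkish letters to their English look-alikes.
ASCII_MAP = str.maketrans("çöüşğıİÇÖÜŞĞ", "cousgiICOUSG")

# Multi-character key, so it can never collide with a single-character edge.
END = "_end"

class Trie:
    def __init__(self, root=None):
        self.root = root if root is not None else {}

    def insert(self, word, pointer):
        """Store the pointer under the word and under its English mapping."""
        for key in {word, word.translate(ASCII_MAP)}:
            node = self.root
            for ch in key:
                node = node.setdefault(ch, {})
            node.setdefault(END, []).append(pointer)

    def search(self, prefix):
        """Return the sorted pointers of all words starting with prefix."""
        node = self.root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return []
        found, stack = set(), [node]
        while stack:
            current = stack.pop()
            for key, value in current.items():
                if key == END:
                    found.update(value)
                else:
                    stack.append(value)
        return sorted(found)

    def dumps(self):
        """Serialize the Trie as gzipped JSON bytes."""
        text = json.dumps(self.root, ensure_ascii=False)
        return gzip.compress(text.encode("utf-8"))

    @classmethod
    def loads(cls, blob):
        """Rebuild a Trie from bytes produced by dumps()."""
        return cls(json.loads(gzip.decompress(blob).decode("utf-8")))
```

With this layout a search for "Ist" and a search for "İst" both reach the entry for İstanbul, because the name is inserted under both spellings.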

