Toponym resolution

From Wikipedia, the free encyclopedia

(Redirected from Geoparsing)

Relationship process between a toponym and an unambiguous spatial footprint of the same place

In geographic information systems, toponym resolution is the relationship process between a toponym, i.e. the mention of a place, and an unambiguous spatial footprint of the same place.^[1]

The places mentioned in digitized text collections constitute a rich data source for researchers in many disciplines. However, toponyms in language use are ambiguous, and it is difficult to assign them a definite real-world referent. Over time, established geographic names may change (as in "Byzantium" > "Constantinople" > "Istanbul"), or they may be reused verbatim ("Boston" in England, UK vs. "Boston" in Massachusetts, USA) or with modifications (as in "York" vs. "New York"). To map a set of place names or toponyms that occur in a document to their corresponding latitude/longitude coordinates, a polygon, or any other spatial footprint, a disambiguation step is necessary. A toponym resolution algorithm is an automatic method that performs a mapping from a toponym to a spatial footprint.

Some methods for toponym resolution employ a gazetteer of possible mappings between names and spatial footprints.^[2]

Resolution process

The "unambiguous spatial footprint of the same place"^[1] in the definition may be truly unambiguous or only approximately so. The resolution process can occur in several different contexts of uncertainty:

When the evidence is geographical and carries no meaningful uncertainty. For example, obtaining the country name for the place where a photo was taken, when the place is given as a GPS position (10 meters of error) located 1000 km from any country border.
When the evidence is geographical but carries considerable uncertainty. Imagine a similar scenario where the GPS error is 100 meters and the place is about 100 meters from a country border.
When the evidence is only textual. Imagine a letter in which the narrator is a tourist describing a trip after returning from vacation. The only evidence is textual, in the narrative.
Mixed sources of evidence: more than one piece of evidence, none of them precise.
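
The difference between the first two scenarios can be sketched as a comparison between the positional error and the distance to the nearest border. This is a minimal illustration only; the function name `country_is_certain` and the circular error model are assumptions, not part of any cited method.

```python
def country_is_certain(gps_error_m: float, distance_to_border_m: float) -> bool:
    """Return True when the error circle around the GPS fix cannot reach the border."""
    return gps_error_m < distance_to_border_m


# First scenario: 10 m of error, 1000 km from any border.
print(country_is_certain(10, 1_000_000))
# Second scenario: 100 m of error, about 100 m from the border.
print(country_is_certain(100, 100))
```

Under this simplified model the first call reports a certain result and the second does not, because the error circle touches the border.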

From geographical evidence

Toponym resolution is sometimes a simple conversion from a name to an abbreviation, especially when the abbreviation is used as a standard geocode. For example, converting the official country name Afghanistan into an ISO country code, AF.
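
A name-to-code conversion of this kind amounts to a table lookup. The sketch below assumes a tiny hand-written excerpt of ISO 3166-1 alpha-2 codes; the table `ISO_ALPHA2` and the function `country_to_iso` are hypothetical names chosen for illustration.

```python
from typing import Optional

# Illustrative excerpt only; the full ISO 3166-1 list is far longer.
ISO_ALPHA2 = {
    "afghanistan": "AF",
    "canada": "CA",
    "united kingdom": "GB",
}


def country_to_iso(name: str) -> Optional[str]:
    """Normalise case and surrounding whitespace, then look the name up."""
    return ISO_ALPHA2.get(name.strip().lower())


print(country_to_iso("Afghanistan"))
print(country_to_iso("Atlantis"))  # not in the table, so None
```

Returning `None` for unknown names keeps the ambiguity visible to the caller instead of guessing a code.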

When annotating media and metadata, the most common way to obtain a toponym, or a geocode that represents the toponym, is to combine a map with geographical evidence (e.g. GPS).

From textual evidence

In contrast to geocoding of postal addresses, which are typically stored in structured database records, toponym resolution is typically applied to large collections of unstructured text documents to associate the locations mentioned in them with maps. If some of those text documents are geotagged (e.g. because they are micro-blog posts with latitude and longitude automatically added), they can be used to infer the varying geographical specificity of arbitrary terms, e.g. "cable car" or "high tide".^[3]

The process of annotating media (e.g., image, text, video) with spatial footprints is known as geotagging. To automatically geotag a text document, two steps are usually undertaken: toponym recognition (i.e., spotting textual references to geographic locations) and toponym resolution (i.e., selecting an appropriate location interpretation for each geographic reference).

Toponym recognition can be considered a special case of named-entity recognition in which the objective is to extract only location entities. However, the output of named-entity recognition can be further improved using hand-crafted rules or statistical rules.^[4]
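
The recognition step, followed by a hand-crafted rule, might be sketched as below. A plain set lookup stands in for a statistical named-entity recognizer here, and both the place list and the personal-title rule are hypothetical illustrations rather than rules taken from the cited work.

```python
# Toy stand-in for the location labels a named-entity recognizer would emit.
KNOWN_PLACES = {"Toronto", "London", "Canada", "Paris"}
PERSON_TITLES = {"Mr.", "Mrs.", "Ms.", "Dr."}


def recognize_toponyms(text):
    """Return place-name tokens, dropping those that look like surnames."""
    tokens = text.split()
    found = []
    for i, token in enumerate(tokens):
        word = token.strip(".,;:'\"()")
        if word not in KNOWN_PLACES:
            continue
        # Hand-crafted rule: a place name directly after a personal title
        # is more likely a surname than a location.
        if i > 0 and tokens[i - 1] in PERSON_TITLES:
            continue
        found.append(word)
    return found


print(recognize_toponyms("Dr. London arrived in Paris."))
```

In this sentence the rule discards "London" as a probable surname and keeps "Paris".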

To obtain location interpretations, resolution models tend to leverage gazetteers (i.e., huge databases of locations) such as GeoNames and OpenStreetMap. A naive approach to resolving toponyms is to pick the most populated interpretation from the list of candidates. For example, in the following excerpt:

Toronto man living, working in London 'uncertain of future' in U.K. after Brexit
— CBC

The naive approach seems viable, since the toponyms Toronto and London refer to their most common interpretations, located in Canada and Britain respectively, whereas in the following piece from a news article:

High-speed rail between Toronto and London by 2025
— CBC

This approach fails to identify the toponym London as the city located in Ontario, Canada. Hence, selecting the candidate with the highest population does not work well for toponyms in a localized context.
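
The population heuristic and its limitation can be sketched with a toy gazetteer. The `Candidate` type, the `GAZETTEER` table and the population figures below are rough illustrative assumptions, not data from GeoNames or OpenStreetMap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    region: str
    population: int  # rough illustrative figure, not authoritative


GAZETTEER = {
    "London": [
        Candidate("London", "England, UK", 8_900_000),
        Candidate("London", "Ontario, Canada", 420_000),
    ],
    "Toronto": [
        Candidate("Toronto", "Ontario, Canada", 2_800_000),
    ],
}


def resolve_by_population(toponym):
    """Pick the most populated candidate, or None if the name is unknown."""
    candidates = GAZETTEER.get(toponym, [])
    return max(candidates, key=lambda c: c.population, default=None)


print(resolve_by_population("London").region)
```

Because the function never looks at the surrounding text, it returns the English city for both headlines, which is right for the first and wrong for the second.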

Additionally, toponym resolution does not address metonymy in general. Nonetheless, a resolution technique can still disambiguate a metonymic reference as long as it is identified as a toponym in the recognition phase. For instance, in the following excerpt:

Canada is also adjusting its driving laws to account for cannabis DUIs.
— Esquire

Canada is used metonymically and refers to "the government of Canada". However, a generic named-entity recognizer can identify it as a location, and a toponym resolver is then able to disambiguate it.

Approaches

Toponym resolution methods can be generally divided into supervised and unsupervised models. Supervised methods typically cast the problem as a learning task in which the model first extracts contextual and non-contextual features, and a classifier is then trained on a labelled dataset. The adaptive model^[5] is one of the prominent models proposed for resolving toponyms. For each interpretation of a toponym, the model derives context-sensitive features based on geographical proximity and sibling relationships with other interpretations. In addition to context-related features, the model benefits from context-free features, including population and audience location. Unsupervised models, on the other hand, do not require annotated data. They can outperform supervised models when the annotated corpus is not sufficiently large and supervised models therefore may not generalize well.^[6]
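
The feature-extraction step of a supervised resolver might look like the following sketch. It is not the feature set of the adaptive model: it assumes a simplified notion of "sibling" (two interpretations sharing the same parent region), leaves out geographical proximity and audience location, and omits classifier training. The names `Candidate` and `features` and the population figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    parent: str      # containing administrative region
    population: int  # rough illustrative figure


def features(candidate, other_toponym_candidates):
    """Build a feature dictionary for one interpretation of a toponym.

    other_toponym_candidates holds one list of candidates for every
    other toponym mentioned in the same document.
    """
    # Context-sensitive: number of other toponyms that have at least one
    # interpretation in the same parent region as this candidate.
    sibling_count = sum(
        any(other.parent == candidate.parent for other in others)
        for others in other_toponym_candidates
    )
    # Context-free: population of the candidate itself.
    return {"sibling_count": sibling_count, "population": candidate.population}


toronto = Candidate("Toronto", "Ontario", 2_800_000)
london_uk = Candidate("London", "England", 8_900_000)
london_on = Candidate("London", "Ontario", 420_000)

print(features(london_on, [[toronto]]))
print(features(london_uk, [[toronto]]))
```

A classifier trained on labelled examples could then learn how much weight the sibling signal deserves relative to population.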

Unsupervised models tend to better exploit the interplay of toponyms mentioned in a document. The Context-Hierarchy Fusion^[6] model estimates the geographic scope of documents and leverages the connections between nearby place names as evidence to resolve toponyms. By mapping the problem to a conflict-free set cover problem, this model achieves a coherent and robust resolution.
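
The idea of using nearby place names as mutual evidence can be illustrated with a much simpler stand-in than the set cover formulation: choose the combination of interpretations with the smallest total pairwise distance. The sketch below is a brute-force simplification under that assumption, with approximate coordinates and hypothetical names (`CANDIDATES`, `resolve_jointly`); it is not the Context-Hierarchy Fusion algorithm.

```python
import math
from itertools import combinations, product


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


# Approximate coordinates, for illustration only.
CANDIDATES = {
    "Toronto": {"Toronto, Ontario": (43.65, -79.38)},
    "London": {
        "London, England": (51.51, -0.13),
        "London, Ontario": (42.98, -81.25),
    },
}


def resolve_jointly(toponyms):
    """Pick one interpretation per toponym, minimising total pairwise distance."""
    options = [list(CANDIDATES[t].items()) for t in toponyms]
    best, best_cost = None, math.inf
    for combo in product(*options):
        cost = sum(haversine_km(p[1], q[1]) for p, q in combinations(combo, 2))
        if cost < best_cost:
            best, best_cost = combo, cost
    return {t: label for t, (label, _) in zip(toponyms, best)}


print(resolve_jointly(["Toronto", "London"]))
```

For the high-speed rail headline this selects London, Ontario, since it lies far closer to Toronto than the English city. The exhaustive search grows exponentially with the number of toponyms, so it is only practical for short texts.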

Furthermore, the use of Wikipedia and knowledge bases has been shown to be effective in toponym resolution. TopoCluster^[7] models the geographical senses of words by incorporating Wikipedia pages of locations and disambiguates toponyms using the spatial senses of the words in the text.

Geoparsing

Geoparsing is a special toponym resolution process of converting free-text descriptions of places (such as "twenty miles northeast of Jalalabad") into unambiguous geographic identifiers, such as geographic coordinates expressed as latitude-longitude. One can also geoparse location references from other forms of media, for example audio content in which a speaker mentions a place. With geographic coordinates, the features can be mapped and entered into geographic information systems. Two primary uses of the geographic coordinates derived from unstructured content are to plot portions of the content on maps and to search the content using a map as a filter.
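
A relative description such as "twenty miles northeast of Jalalabad" can be turned into coordinates by resolving the anchor place and then offsetting it along a bearing. The sketch below assumes a spherical Earth, a single fixed phrase pattern, a small number-word table and an approximate anchor coordinate; `geoparse`, `ANCHORS` and the other names are hypothetical, and a real geoparser would handle far more phrasing and would query a gazetteer.

```python
import math
import re

MILES_TO_KM = 1.609344
EARTH_RADIUS_KM = 6371.0
NUMBER_WORDS = {"five": 5, "ten": 10, "twenty": 20, "fifty": 50}
BEARINGS = {"north": 0, "northeast": 45, "east": 90, "southeast": 135,
            "south": 180, "southwest": 225, "west": 270, "northwest": 315}
# Approximate anchor coordinates (lat, lon), for illustration only.
ANCHORS = {"Jalalabad": (34.43, 70.45)}

PATTERN = re.compile(r"(\w+) miles (\w+) of (\w+)")


def geoparse(text):
    """Return (lat, lon) in degrees for '<n> miles <direction> of <place>'."""
    match = PATTERN.search(text)
    if match is None:
        return None
    amount, direction, place = match.groups()
    miles = NUMBER_WORDS.get(amount)
    if miles is None and amount.isdigit():
        miles = int(amount)
    if miles is None or direction not in BEARINGS or place not in ANCHORS:
        return None

    lat1, lon1 = map(math.radians, ANCHORS[place])
    theta = math.radians(BEARINGS[direction])
    delta = miles * MILES_TO_KM / EARTH_RADIUS_KM  # angular distance
    # Standard destination-point formula on a sphere.
    lat2 = math.asin(math.sin(lat1) * math.cos(delta)
                     + math.cos(lat1) * math.sin(delta) * math.cos(theta))
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


print(geoparse("twenty miles northeast of Jalalabad"))
```

The returned point lies north and east of the anchor, and unrecognised phrases yield `None` rather than a guess.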

Geoparsing goes beyond geocoding. Geocoding analyzes unambiguous structured location references, such as postal addresses and rigorously formatted numerical coordinates. Geoparsing handles ambiguous references in unstructured discourse, such as "Al Hamra," which is the name of several places, including towns in both Syria and Yemen.

A geoparser is a piece of software or a (web) service that helps in this process. Some examples:

GEOLocate – Automated georeferencing
BioGeomancer – Semi-automatic georeferencing
GEOnet Names Server – Freely available GIS information for areas outside of the US and Antarctica, updated monthly by the National Geospatial-Intelligence Agency (NGA) and the U.S. Board on Geographic Names (US BGN)
Geographic Names Information System (GNIS) – Freely available database containing information on almost 2 million physical features, places, and landmarks in the U.S.A.
CLAVIN (Cartographic Location And Vicinity INdexer) – Open source software package for document geotagging and geoparsing that employs context-based geographic entity resolution
Geocode.xyz – Web service that identifies both place names and street addresses mentioned in text^[8]
geoparsepy – Free Python geoparsing library supporting free text location identification and disambiguation using the OpenStreetMap database

References

[1] Leidner, Jochen L. (2007). Toponym Resolution in Text: Annotation, Evaluation and Applications of Spatial Grounding (PhD). University of Edinburgh. hdl:1842/1849.
[2] Hill, Linda L. (2006). Georeferencing: The geographic associations of information. The MIT Press. ISBN 978-0262083546.
[3] Berggren, Max; Karlgren, Jussi; Östling, Robert; Parkvall, Mikael (2016). "Inferring the location of authors from words in their texts". Proceedings of the Nordic Conference on Computational Linguistics. arXiv:1612.06671.
[4] Lieberman, Michael D.; Samet, Hanan (2011). Multifaceted toponym recognition for streaming news (PDF). Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. pp. 843–852. doi:10.1145/2009916.2010029.
[5] Lieberman, Michael D.; Samet, Hanan (2012). Adaptive context features for toponym resolution in streaming news (PDF). Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. pp. 731–740. doi:10.1145/2348283.2348381.
[6] Kamalloo, Ehsan; Rafiei, Davood (2018). A Coherent Unsupervised Model for Toponym Resolution. Proceedings of the 2018 World Wide Web Conference. pp. 1287–1296. arXiv:1805.01952. doi:10.1145/3178876.3186027.
[7] DeLozier, Grant; Baldridge, Jason; London, Loretta (2015). Gazetteer-Independent Toponym Resolution Using Geographic Word Profiles. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. pp. 2382–2388.
[8] "Perl Advent Calendar 2016 - A Geo Parser for vast amounts of Text".
