whosonfirst-data/whosonfirst-data


Disclaimer

As of May 2019, the whosonfirst-data repository has split into per-country repositories. You can read more about that change here. While we still track all issues in this repository, the data itself will live in the per-country repositories for the foreseeable future.

Per-country repositories have the following repository naming convention:

whosonfirst-data-admin-{2-char country code}

Meaning administrative data for Mexico, for example, would live in the following repository:

whosonfirst-data-admin-mx
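The naming convention above can be sketched as a small helper. This is only an illustration: the function name `admin_repo_name` and the validation rule (exactly two ASCII letters) are assumptions, not part of the Who's On First tooling.

```python
import re


def admin_repo_name(country_code: str) -> str:
    """Build a per-country admin repository name from a 2-character country code."""
    code = country_code.strip().lower()
    # Assumption: country codes are exactly two ASCII letters.
    if not re.fullmatch(r"[a-z]{2}", code):
        raise ValueError(f"expected a 2-character country code, got {country_code!r}")
    return f"whosonfirst-data-admin-{code}"


print(admin_repo_name("MX"))  # whosonfirst-data-admin-mx
```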

At the bottom of this README, you will find a full list of per-country repositories.

whosonfirst-data

Who's On First is a gazetteer of places. Not quite all the places in the world but a whole lot of them and, we hope, the kinds of places that we mostly share in common.

A gazetteer is a big list of places, each with a stable identifier and some number of descriptive properties about that location. An interesting way to think about a gazetteer is to consider it as the space where debate about a place is managed but not decided. We call our gazetteer "Who's On First" (or sometimes "WOF" for short).

According to Wikipedia, Who’s on First:

...is a comedy routine made famous by Abbott and Costello. The premise of the sketch is that Abbott is identifying the players on a baseball team for Costello, but their names and nicknames can be interpreted as non-responsive answers to Costello's questions. For example, the first baseman is named "Who"; thus, the utterance "Who's on first" is ambiguous between the question ("Which person is the first baseman?") and the answer ("The name of the first baseman is 'Who'"). "Who's on First?" is descended from turn-of-the-century burlesque sketches that used plays on words and names. Examples are "The Baker Scene" (the shop is located on Watt Street) and "Who Dyed" (the owner is named Who). In the 1930 movie Cracked Nuts, comedians Bert Wheeler and Robert Woolsey examine a map of a mythical kingdom with dialogue like this: "What is next to Which." "What is the name of the town next to Which?" "Yes." In English music halls (Britain's equivalent of vaudeville theatres), comedian Will Hay performed a routine in the early 1930s (and possibly earlier) as a schoolmaster interviewing a schoolboy named Howe who came from Ware but now lives in Wye.

Which sort of sums up the “problem” of geo, nicely. It might be easier, perhaps, if we all understood and experienced the world as coordinate data but we don’t, so the burden of “place” and its many meanings is one we trundle along with to this day.

Our gazetteer is absolutely not finished – both in terms of data coverage as well as data quality – so, in the near-term, you should adjust your expectations accordingly when you approach the data. We are releasing the data now because we believe it is important not just to articulate our goals and intentions around the project but also to back them up with tangible proofs.

Learn more about the Who’s On First data model over at https://whosonfirst.org/docs/.

First Principles

The gazetteer starts from a series of first principles:

Who's On First has an opinion

It is important that Who's On First have an opinion not about any one place but rather about the nature of place itself. It is important for us to know and understand the boundaries of our project in order to know what the project is for and, critically, what the project is not.

Leave as many decisions as possible to the "edges"

The world is a complicated place and we would like the gazetteer to be a project that can support, or act as a scaffolding for, the sometimes contradictory opinions that people have about it. We aim to leave as much meaning or inference, as we can, about a place to individual users and applications. How this will manifest itself in concrete terms remains to be seen but this is a goal we have set for ourselves.

Portability

The canonical source for a place is a text file, specifically GeoJSON with a unique 64-bit numeric ID. This is because all computers speak "text files" and "numbers". Text files can be inspected or updated in any old text editor. Text files can be printed. Numbers are fast and cheap for databases to index.

We use text files because our primary concern for the data is ease of use, robustness and portability over time. On balance, the benefits of plain old text files outweigh both the costs and, in many cases, the benefits of other formats.

Google's Protocol Buffers, for example, are awesome but require that you install a whole lot of Google on your computer in order to use them. ESRI's Shapefiles are equally awesome and their ubiquity and longevity is a testament to their utility but they too require bespoke applications for even the most trivial of updates.

That does not mean that plain text or static files are necessarily the optimal choice for delivery or distribution. We will account for that on a case-by-case basis. If we need to pre-process all the data into a smaller and nimbler format for a specific use-case then we will, but you will always be able to access the data as simple text files.

GeoJSON

We use GeoJSON as the primary exchange format for the gazetteer for two interconnected and complementary reasons:

  • It is structured data with the least amount of markup today. If someone creates another markup language with even less scaffolding we might use that instead but for now GeoJSON is a good happy medium.

  • There are lots of tools for working with GeoJSON and, importantly, for converting it into all the other formats that different people use.
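As a small illustration of the "plain text plus a numeric ID" idea, a record can be read with nothing more than a standard JSON parser. The sketch below is hedged: `load_feature` is a hypothetical helper, the sample reuses only the `id` and `edtf:*` values shown in the New Zealand example elsewhere in this README, the geometry is deliberately left as `null`, and treating the ID as a signed 64-bit value is an assumption.

```python
import json

# Illustrative record: "id" and "edtf:*" are copied from the New Zealand
# sample in this README; the geometry is omitted (null) in this sketch.
RECORD_TEXT = """
{
  "id": 85633345,
  "type": "Feature",
  "properties": {"edtf:cessation": "u", "edtf:inception": "u"},
  "geometry": null
}
"""


def load_feature(text):
    """Parse one GeoJSON Feature and return (id, properties)."""
    feature = json.loads(text)
    if feature.get("type") != "Feature":
        raise ValueError("expected a GeoJSON Feature")
    wof_id = feature["id"]
    # Assumption: IDs fit in a signed 64-bit integer.
    if not isinstance(wof_id, int) or not 0 <= wof_id < 2**63:
        raise ValueError("id does not fit in a signed 64-bit integer")
    return wof_id, feature.get("properties", {})


wof_id, props = load_feature(RECORD_TEXT)
print(wof_id, sorted(props))
```

Nothing here is specific to Who's On First beyond the sample values, which is rather the point of choosing GeoJSON.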

Some Very Very (Very Very) Important Caveats

Who’s On First is a work in progress

This means a few things:

  1. Some (maybe even a lot) of the data will be wrong.

  2. Some things are missing. Some things are missing in a known unknown kind of way in which case they’ll be addressed shortly. Some things may still be missing in an unknown unknown kind of way in which case they’ll be addressed as the errors become apparent.

  3. Some (probably most) of the data will change in some way, if only to account for #1.

  4. We have not formalized or finalized the tools for updating all the ancestors or dependencies of a record when that record is updated. This means that in the short-term it is possible there will be inconsistencies between a record and its relations. We’ll get there.

The purpose of releasing the data now is not to sound the trumpets and herald a new dawn of perfect data but rather to give substance to everything we’ve been talking about and to have a meaningful dataset with which to prove or disprove those assumptions and to work through the practicalities of working with that data.

If you don’t have the time or the temperament (personally or institutionally) to deal with a little bit of on-going bad craziness as we work through the issues, diving in to the data now is probably premature. We intend to continue working in public and discussing the project openly so keep an eye on the blog and we’ll let you know as things improve.

Git and GitHub

Don’t get too attached to working with or managing Who’s On First data in GitHub (or Git in general). We haven’t quite figured out the best way of both distributing the Who’s On First data and accepting corrections or suggestions from the community.

Even though the nice people at GitHub continue to do excellent work at making Git easier for a broader population to use, the reality remains that Git is a significant barrier to participation for many people. Absent a more formal decision about an alternative, GitHub at least allows us to point in the general direction of:

  • An open and readily distributed dataset that people can download and work with

  • A way for people to contribute corrections (and general nuance) about a place

  • A way for us to be able to do everything above while still assuring us a measure of authority around the assertions we make about the data

  • Also a way for us to think about how and where we store an audit trail (of sorts) for updates to a place

Git and large files

We have started using git-lfs for managing large files. For example, the record for New Zealand, which contains a very very very very detailed coastline, is fast approaching the 100MB file size limit for any individual file on GitHub.

You can see the current list of files being managed by invoking the git lfs ls-files command, like this:

    $> cd /usr/local/mapzen/whosonfirst-data
    $> git lfs ls-files
    65ccc4825e * data/856/333/45/85633345.geojson
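The path in the listing above looks like it is derived from the record ID by splitting its digits into groups of three. The sketch below reproduces that one example; generalizing from a single path is an assumption, and `id_to_relpath` is a hypothetical name rather than an official helper.

```python
def id_to_relpath(wof_id: int) -> str:
    """Map a numeric ID to a nested path like data/856/333/45/85633345.geojson."""
    digits = str(wof_id)
    # Split the digits into chunks of up to three characters: 856 / 333 / 45.
    parts = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return "data/" + "/".join(parts) + "/" + digits + ".geojson"


print(id_to_relpath(85633345))  # data/856/333/45/85633345.geojson
```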

When you clone this repo the files (managed by git-lfs) only contain metadata, like this:

    $> cat data/856/333/45/85633345.geojson
    version https://git-lfs.github.com/spec/v1
    oid sha256:65ccc4825e65c30f00fcebf1f3d57f4385f18a47e3c5e524114a67050186ae48
    size 71879893
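If a script needs to tell whether a checked-out file is still one of these metadata stubs, the three "key value" lines can be read directly. This is a minimal sketch: `parse_lfs_pointer` and `is_lfs_pointer` are hypothetical helpers, and the sample text is the pointer shown above.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each non-empty 'key value' line of a git-lfs pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def is_lfs_pointer(text: str) -> bool:
    """Heuristic check: a pointer starts with the git-lfs spec version line."""
    return text.startswith("version https://git-lfs.github.com/spec/")


POINTER = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:65ccc4825e65c30f00fcebf1f3d57f4385f18a47e3c5e524114a67050186ae48\n"
    "size 71879893\n"
)

info = parse_lfs_pointer(POINTER)
size_mb = int(info["size"]) / (1024 * 1024)
print(is_lfs_pointer(POINTER), f"{size_mb:.2f} MB")  # True 68.55 MB
```

The computed size agrees with the 68.55 MB that git lfs reports when it fetches the real file.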

In order to fetch the file itself you will need to run git lfs fetch and then git lfs checkout. Because computers... but anyway, like this:

    $> git lfs fetch
    Fetching master
    (1 of 1 files) 68.54 MB / 68.55 MB
    $> cat data/856/333/45/85633345.geojson
    version https://git-lfs.github.com/spec/v1
    oid sha256:65ccc4825e65c30f00fcebf1f3d57f4385f18a47e3c5e524114a67050186ae48
    size 71879893
    $> git lfs checkout
    (1 of 1 files) 68.55 MB / 68.55 MB
    $> cat data/856/333/45/85633345.geojson
    {
      "id": 85633345,
      "type": "Feature",
      "properties": {
        "edtf:cessation":"u",
        "edtf:inception":"u",
        "geom:area":29.187792061074827,
        "geom:bbox":"166.426148,-47.289992,178.577244,-33

Woosh! We're still working through the details on this so suggestions, tips and (gentle) cluebats are welcome.

Theory (or "the even-longer version")

Where appropriate we have moved the theory (and sometimes history) around decisions for specific Who's On First properties into dedicated GitHub repositories. They are:

Blog posts and related musings

All of the blog posts can be found over here: https://whosonfirst.org/blog/.

Practice

The spelunker

There is a read-only "spelunker" for viewing Who's On First online at:

https://spelunker.whosonfirst.org/

Venues

For the time being Who's On First maintains separate repositories with venues and points-of-interest. The starting point for venue data is:

https://github.com/whosonfirst-data/whosonfirst-data-venue-*

Note: The * should be replaced with a country code (ex: ca) and/or a country code and region code (ex: us-ny).
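The substitution described in the note can be sketched like this. The helper name `venue_repo_url` is hypothetical and whether a repository actually exists for a given code is not checked; only the URL pattern comes from this README.

```python
BASE_URL = "https://github.com/whosonfirst-data/"


def venue_repo_url(country, region=None):
    """Fill in the * of whosonfirst-data-venue-* with a country (and optional region) code."""
    suffix = country.strip().lower()
    if region:
        suffix += "-" + region.strip().lower()
    return BASE_URL + "whosonfirst-data-venue-" + suffix


print(venue_repo_url("ca"))
print(venue_repo_url("us", "ny"))
```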

Repository examples:

License

Crediting Who's On First is recommended and linking back to this License is required.

Data from Who's On First. License.

The Who's On First dataset is both original work and a modification of existing open data. Some of those open data projects do require attribution. We have listed some sources below.

When we source other open data projects we make best effort to indicate them (e.g.: 'src:geom':naturalearth) and we also include the original source's properties prefixed with the following namespaces:

See an up-to-date list of sources here.

Please notify us if you believe that an open data project has not been properly noted.

Our original work is generally indicated with properties prefixed with wof or is not prefixed (like name).
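One way to see which namespaces a record draws on is to group its property keys by the text before the first colon. This is a rough sketch, not an official tool: `group_by_namespace` is a hypothetical helper, the `name` value is illustrative, and the remaining sample keys are the ones quoted in this README.

```python
from collections import defaultdict


def group_by_namespace(properties: dict) -> dict:
    """Group property keys by their prefix; unprefixed keys go under ''."""
    groups = defaultdict(dict)
    for key, value in properties.items():
        prefix, sep, _ = key.partition(":")
        namespace = prefix if sep else ""
        groups[namespace][key] = value
    return dict(groups)


# Sample keys taken from this README; the "name" value is illustrative.
sample = {
    "name": "New Zealand",
    "src:geom": "naturalearth",
    "edtf:inception": "u",
    "geom:area": 29.187792061074827,
}

for namespace, props in sorted(group_by_namespace(sample).items()):
    print(repr(namespace), sorted(props))
```

Keys under "wof" or under the empty namespace would then be candidates for original work, per the description above; anything else should be checked against the License file.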

Remember, some sources require attribution, some do not. Mapzen's original work, including the format and structure that allows Who's On First to operate, is made available under the Creative Commons Zero designation, and a shout out would be lovely.

Read the full License file for more details per data source.

Caveats and "known knowns"

We've added a separate document called README.KNOWN.KNOWNS.md that lists the current state of known knowns and other gotchas you might encounter working with the Who's On First data.

See also:

Repositories
