JSON Utilities
The JSON Utilities provide helper methods for developers using Esri JSON file formats in MapReduce applications.
This library contains:
- InputFormats - These are classes that Hadoop uses to decide how input data is split between mappers.
- RecordReaders - These are classes that define how a data-split is broken into records (key/value pairs). RecordReaders are used in conjunction with the InputFormats.
- EsriFeatureClass - This class provides a data structure that represents a set of geometries and associated attributes, and can be directly constructed from the Enclosed JSON file format.
InputFormats and RecordReaders separate data into splits and records that can be distributed across multiple mappers and map tasks. Each split is given to a mapper, which then loops through each record in the split and calls `map(K, V)` on it. `K` is the key associated with a record and `V` is the value. For our formats, the key is the character offset from the start of the file to the start of the record, and the value is the JSON text of the record. Don't focus too much on `K`, as we are really only interested in `V`.
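To make the offset/record pairing concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of what an unenclosed-JSON record reader conceptually does: walk a buffer of back-to-back JSON objects and emit (offset, record text) pairs. The class and method names are illustrative; the real record reader also handles braces inside quoted strings and records that straddle split boundaries, which this toy version does not.

```java
import java.util.ArrayList;
import java.util.List;

public class UnenclosedJsonSketch {
    // Returns "offset\trecord" strings, mimicking the key/value pairs a mapper sees.
    static List<String> records(String buffer) {
        List<String> out = new ArrayList<>();
        int depth = 0;
        int start = -1;
        for (int i = 0; i < buffer.length(); i++) {
            char c = buffer.charAt(i);
            if (c == '{') {
                if (depth == 0) start = i; // key = offset of the record's first character
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    out.add(start + "\t" + buffer.substring(start, i + 1));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // two records back to back, as in an unenclosed JSON file
        String data = "{\"attributes\":{\"Name\":\"Utah\"}}"
                    + "{\"attributes\":{\"Name\":\"Colorado\"}}";
        for (String kv : records(data)) {
            System.out.println(kv);
        }
    }
}
```

The second record's key is simply the length of the first record, which is why the keys in the walkthrough below differ per file.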
Take this small dataset of two Unenclosed JSON files that live in the Hadoop file system (HDFS). Each file has a couple of records that contain a U.S. state name and a geometry representing the state's boundary.
- /path/to/data/
  - data-1.json
    - `{"attributes" : {"Name" : "California"}, "geometry" : ...}`
    - `{"attributes" : {"Name" : "Arizona"}, "geometry" : ...}`
  - data-2.json
    - `{"attributes" : {"Name" : "Utah"}, "geometry" : ...}`
    - `{"attributes" : {"Name" : "Colorado"}, "geometry" : ...}`
For this dataset, we will use `UnenclosedEsriJsonInputFormat` and `UnenclosedEsriJsonRecordReader`. The RecordReader is created for each input split by the InputFormat under the covers.
In this case, each mapper will receive an entire `.json` file.
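A driver wires this up by setting the input format on the job. The sketch below assumes the classes live in the `com.esri.json.hadoop` package (check your build of the library for the exact package) and uses a hypothetical driver class name; mapper, reducer, and output configuration are elided.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import com.esri.json.hadoop.UnenclosedEsriJsonInputFormat;

public class StateLookupDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // pass the enclosed JSON file path to every mapper; this matches the
        // config.get("com.esri.geometry") lookup in the mapper's setup method
        conf.set("com.esri.geometry", args[0]);

        Job job = Job.getInstance(conf, "state lookup");
        job.setJarByClass(StateLookupDriver.class);

        // hand each unenclosed JSON record to the mapper as (offset, JSON text)
        job.setInputFormatClass(UnenclosedEsriJsonInputFormat.class);
        FileInputFormat.setInputPaths(job, new Path("/path/to/data/"));

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // ... mapper, reducer, and output format configuration elided ...

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```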
Mapper 1 (`data-1.json`)

- Record 1
  - Key: `0`
  - Value: `{ "attributes" : { "Name" : "California" }, "geometry" : ... }`
- Record 2
  - Key: `62`
  - Value: `{ "attributes" : { "Name" : "Arizona" }, "geometry" : ... }`

Mapper 2 (`data-2.json`)

- Record 1
  - Key: `0`
  - Value: `{ "attributes" : { "Name" : "Utah" }, "geometry" : ... }`
- Record 2
  - Key: `56`
  - Value: `{ "attributes" : { "Name" : "Colorado" }, "geometry" : ... }`
The class `EsriFeatureClass` is a direct mapping of the data contained in an Enclosed JSON document.
Creating a feature class object is as simple as calling `EsriFeatureClass.fromJson(InputStream)`, where the `InputStream` contains the entire JSON file. Here is an example of how a mapper would create a feature class in Hadoop in its `setup` method.
```java
EsriFeatureClass featureClass;

@Override
public void setup(Context context) {
    Configuration config = context.getConfiguration();
    FSDataInputStream iStream = null;

    try {
        // load the JSON file provided as argument 0
        FileSystem hdfs = FileSystem.get(config);
        iStream = hdfs.open(new Path(config.get("com.esri.geometry")));

        // create feature class from stream
        featureClass = EsriFeatureClass.fromJson(iStream);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (iStream != null) {
            try {
                iStream.close();
            } catch (IOException e) { }
        }
    }
}
```
Now that you have the feature class loaded, each `map` method can access the geometries and attributes in the feature class. For example:
```java
@Override
public void map(LongWritable key, Text val, Context context)
        throws IOException, InterruptedException {
    // ... code to process these values and pull out longitude and latitude ...

    // construct a point using the Point class from the esri-geometry-api
    // and set the x,y values to coordinates from our source data
    Point point = new Point(longitude, latitude);

    // loop through each feature in the feature class
    for (EsriFeature feature : featureClass.features) {
        // check to see if the feature geometry contains the point that we
        // are interested in
        if (GeometryEngine.contains(feature.geometry, point, spatialReference)) {
            String name = (String) feature.attributes.get("name");

            // emit the name as a key and any data you want associated with
            // that key as the value
            context.write(new Text(name), ...);
            break;
        }
    }
}
```