- Notifications
You must be signed in to change notification settings - Fork161
Hive JSON SerDe
Randall Whitman edited this pageJul 14, 2017 ·6 revisions
Hive usesSerDes (Serializers/Deserializers) to map both structured and unstructured data into tables that can then be queried in a similar manner to a traditional relational database. JSON is not tabular, but it is structured and theJSON Formats exported from ArcGIS can easily be mapped to rows and columns using the Esri-JSON SerDe provided with these tools.
Here is an example of a table mapped with the SerDe using the Unenclosed Esri JSON format from theJSON Formats overview.
JSON source data
{"attributes" : {"OBJECTID" :1,"NAME" :"Redlands"... },"geometry" : {"rings" : [... ] }}{"attributes" : {"OBJECTID" :2,"NAME" :"Riverside"... },"geometry" : {"rings" : [... ] }}Create a table that maps theNAME attribute and the geometry from JSON to columnsName andBoundaryShape.
CREATE EXTERNAL TABLE IF NOT EXISTS counties (Name string, BoundaryShape binary) ROW FORMAT SERDE'com.esri.hadoop.hive.serde.EsriJsonSerDe' STOREDAS INPUTFORMAT'com.esri.json.hadoop.UnenclosedEsriJsonInputFormat'OUTPUTFORMAT'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
There are a few things to note in the create statement:
- Column names are not case sensitive. So while the attribute in the JSON is
NAME, you can reference it asName. - The geometry is mapped to whichever column is defined with the type
binary. As a consequence, we only allow one column to be defined asbinaryin order to remove any potential ambiguity. - Attributes that are not defined in the column list will simply be discarded.
- The Hadoop input format
com.esri.json.hadoop.UnenclosedEsriJsonInputFormatfrom theSpatial Framework for Hadoop is necessary for the SerDe to work. SeeInputFormats and RecordReaders.
Once the table is created, you can interact with the JSON as if it were a table.
hive> DESCRIBE counties;OKnamestringfrom deserializerboundaryshapebinaryfrom deserializer
hive> SELECT Name, ST_GeometryType(BoundaryShape) FROM counties;OKRedlandsST_MULTIPOLYGONRiverside ST_MULTIPOLYGON