A vector map, with points, polylines and polygons | |
Filename extension | .shp, .shx, .dbf |
---|---|
Internet media type | application/vnd.shp |
Developed by | Esri |
Type of format | GIS |
Standard | Shapefile Technical Description |
The shapefile format is a geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a mostly open specification for data interoperability among Esri and other GIS software products.[1] The shapefile format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature.
The shapefile format is a digital vector storage format for storing geographic location and associated attribute information. This format lacks the capacity to store topological information. The shapefile format was introduced with ArcView GIS version 2 in the early 1990s. It is now possible to read and write geographical datasets using the shapefile format with a wide variety of software.
The shapefile format stores the geometry as primitive geometric shapes like points, lines, and polygons. These shapes, together with data attributes that are linked to each shape, create the representation of the geographic data. The term "shapefile" is quite common, but the format consists of a collection of files with a common filename prefix, stored in the same directory. The three mandatory files have filename extensions .shp, .shx, and .dbf. The actual shapefile relates specifically to the .shp file, but alone is incomplete for distribution as the other supporting files are required. In line with the ESRI Shapefile Technical Description,[1] legacy GIS software may expect that the filename prefix be limited to eight characters to conform to the DOS 8.3 filename convention, though modern software applications accept files with longer names.
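A quick way to check that a dataset is complete before distributing it is to look for the three mandatory files next to each other. The following is a minimal sketch, not part of any library: the function name `missing_components` is hypothetical, and it assumes lower-case extensions on a case-sensitive file system.

```python
from pathlib import Path

# The three mandatory components share one filename prefix.
MANDATORY_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_components(shp_path):
    """Return the mandatory extensions that have no file beside shp_path."""
    base = Path(shp_path)
    return [
        ext
        for ext in MANDATORY_EXTENSIONS
        # with_suffix swaps only the final extension, keeping the prefix.
        if not base.with_suffix(ext).exists()
    ]
```

An empty list means all three mandatory files are present; optional files such as .prj are not checked here.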
Mandatory files:
- .shp: the main file, containing the feature geometry {content-type: application/vnd.shp}
- .shx: the index file, a positional index of the feature geometry {content-type: application/vnd.shp.shx}
- .dbf: the attribute file, storing the attributes for each shape in dBase IV format {content-type: application/vnd.dbf}

Other files:
- .prj: projection description, using a well-known text representation of coordinate reference systems {content-type: text/plain OR application/text}
- .sbn and .sbx: a spatial index of the features {content-type: application/vnd.shp}
- .fbn and .fbx: a spatial index of the features that are read-only {content-type: application/vnd.shp}
- .ain and .aih: an attribute index of the active fields in a table {content-type: application/vnd.shp}
- .ixs: a geocoding index for read-write datasets {content-type: application/vnd.shp}
- .mxs: a geocoding index for read-write datasets (ODB format) {content-type: application/vnd.shp}
- .atx: an attribute index for the .dbf file in the form of shapefile.columnname.atx (ArcGIS 8 and later) {content-type: application/vnd.shp}
- .shp.xml: geospatial metadata in XML format, such as ISO 19115 or other XML schema {content-type: application/fgdc+xml}
- .cpg: used to specify the code page (only for .dbf) for identifying the character encoding to be used {content-type: text/plain OR application/vnd.shp}
- .qix: an alternative quadtree spatial index used by MapServer and GDAL/OGR software {content-type: application/vnd.shp}

In each of the .shp, .shx, and .dbf files, the shapes in each file correspond to each other in sequence (i.e., the first record in the .shp file corresponds to the first record in the .shx and .dbf files, etc.). The .shp and .shx files have various fields with different endianness, so an implementer of the file formats must be very careful to respect the endianness of each field and treat it properly.
The main file (.shp) contains the geometry data. Geometry of a given feature is stored as a set of vector coordinates.[1]: 5 The binary file consists of a single fixed-length header followed by one or more variable-length records. Each of the variable-length records includes a record-header component and a record-contents component. A detailed description of the file format is given in the ESRI Shapefile Technical Description.[1] This format should not be confused with the AutoCAD shape font source format, which shares the .shp extension.
The 2D axis ordering of coordinate data assumes a Cartesian coordinate system, using the order (X Y) or (Easting Northing). This axis order is consistent for geographic coordinate systems, where the order is similarly (longitude latitude). Geometries may also support 3- or 4-dimensional Z and M coordinates, for elevation and measure, respectively. A Z-dimension stores the elevation of each coordinate in 3D space, which can be used for analysis or for visualisation of geometries using 3D computer graphics. The user-defined M dimension can be used for one of many functions, such as storing linear referencing measures or relative time of a feature in 4D space.
The main file header is fixed at 100 bytes in length and contains 17 fields; nine 4-byte (32-bit signed integer or int32) integer fields followed by eight 8-byte (double) signed floating point fields:
Bytes | Type | Endianness | Usage |
---|---|---|---|
0–3 | int32 | big | File code (always hex value 0x0000270a) |
4–23 | int32 | big | Unused; five int32 |
24–27 | int32 | big | File length (in 16-bit words, including the header) |
28–31 | int32 | little | Version |
32–35 | int32 | little | Shape type (see reference below) |
36–67 | double | little | Minimum bounding rectangle (MBR) of all shapes contained within the dataset; four doubles in the following order: min X, min Y, max X, max Y |
68–83 | double | little | Range of Z; two doubles in the following order: min Z, max Z |
84–99 | double | little | Range of M; two doubles in the following order: min M, max M |
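
The mixed byte order in the table above can be handled by decoding the header in three slices, each with its own byte-order prefix. This is an illustrative sketch using Python's standard `struct` module; the function name `parse_main_header` and the dictionary keys are assumptions made for the example, not names from the specification.

```python
import struct


def parse_main_header(buf):
    """Decode the 100-byte main file header from a bytes object."""
    if len(buf) < 100:
        raise ValueError("the main file header needs 100 bytes")
    # Bytes 0-27: seven big-endian int32 (file code, five unused, file length).
    file_code, *_unused, length_words = struct.unpack(">7i", buf[:28])
    if file_code != 0x0000270A:
        raise ValueError("unexpected file code")
    # Bytes 28-35: two little-endian int32 (version, shape type).
    version, shape_type = struct.unpack("<2i", buf[28:36])
    # Bytes 36-99: eight little-endian doubles (MBR, Z range, M range).
    xmin, ymin, xmax, ymax, zmin, zmax, mmin, mmax = struct.unpack(
        "<8d", buf[36:100]
    )
    return {
        "file_length_bytes": 2 * length_words,  # stored in 16-bit words
        "version": version,
        "shape_type": shape_type,
        "bbox": (xmin, ymin, xmax, ymax),
        "z_range": (zmin, zmax),
        "m_range": (mmin, mmax),
    }
```

Typical use would be `parse_main_header(open(path, "rb").read(100))`, where `path` points at a .shp file.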
The file then contains any number of variable-length records. Each record is prefixed with a record header of 8 bytes:
Bytes | Type | Endianness | Usage |
---|---|---|---|
0–3 | int32 | big | Record number (1-based) |
4–7 | int32 | big | Record length (in 16-bit words) |
Following the record header is the actual record:
Bytes | Type | Endianness | Usage |
---|---|---|---|
0–3 | int32 | little | Shape type (see reference below) |
4– | – | – | Shape content |
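
Walking the records then amounts to reading an 8-byte big-endian record header, followed by contents that begin with a little-endian shape type. The sketch below assumes the whole .shp file is already in memory as bytes, and relies on a detail from the technical description that the table does not spell out: the record length counts only the record contents, not the 8-byte record header. The name `iter_records` is hypothetical.

```python
import struct


def iter_records(shp_data):
    """Yield (record_number, shape_type, shape_content) for each record."""
    pos = 100  # skip the fixed-length file header
    while pos + 8 <= len(shp_data):
        # Record header: big-endian record number and content length.
        number, length_words = struct.unpack(">2i", shp_data[pos:pos + 8])
        start = pos + 8
        end = start + 2 * length_words  # length is in 16-bit words
        if length_words < 2 or end > len(shp_data):
            raise ValueError(f"malformed record at byte {pos}")
        # Record contents: little-endian shape type, then the shape content.
        (shape_type,) = struct.unpack("<i", shp_data[start:start + 4])
        yield number, shape_type, shp_data[start + 4:end]
        pos = end
```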
The variable-length record contents depend on the shape type, which must be either the shape type given in the file header or Null. The following are the possible shape types:
Value | Shape type | Fields |
---|---|---|
0 | Null shape | None |
1 | Point | X, Y |
3 | Polyline | MBR, Number of parts, Number of points, Parts, Points |
5 | Polygon | MBR, Number of parts, Number of points, Parts, Points |
8 | MultiPoint | MBR, Number of points, Points |
11 | PointZ | Mandatory: X, Y, Z; Optional: M |
13 | PolylineZ | Mandatory: MBR, Number of parts, Number of points, Parts, Points, Z range, Z array; Optional: M range, M array |
15 | PolygonZ | Mandatory: MBR, Number of parts, Number of points, Parts, Points, Z range, Z array; Optional: M range, M array |
18 | MultiPointZ | Mandatory: MBR, Number of points, Points, Z range, Z array; Optional: M range, M array |
21 | PointM | X, Y, M |
23 | PolylineM | Mandatory: MBR, Number of parts, Number of points, Parts, Points; Optional: M range, M array |
25 | PolygonM | Mandatory: MBR, Number of parts, Number of points, Parts, Points; Optional: M range, M array |
28 | MultiPointM | Mandatory: MBR, Number of points, Points; Optional: M range, M array |
31 | MultiPatch | Mandatory: MBR, Number of parts, Number of points, Parts, Part types, Points, Z range, Z array; Optional: M range, M array |
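
As an illustration of how the listed fields are laid out, the following sketch decodes the contents of a Polyline (3) or Polygon (5) record. It assumes the layout given in the technical description: all fields little-endian, the MBR as four doubles, the two counts as int32, and each entry in Parts holding the index of the first point of that part. The function name `parse_poly` is hypothetical.

```python
import struct


def parse_poly(content):
    """Decode Polyline/Polygon record contents, starting at the shape type."""
    (shape_type,) = struct.unpack_from("<i", content, 0)
    if shape_type not in (3, 5):
        raise ValueError("not a Polyline or Polygon record")
    bbox = struct.unpack_from("<4d", content, 4)  # min X, min Y, max X, max Y
    num_parts, num_points = struct.unpack_from("<2i", content, 36)
    parts = struct.unpack_from(f"<{num_parts}i", content, 44)
    points_offset = 44 + 4 * num_parts
    flat = struct.unpack_from(f"<{2 * num_points}d", content, points_offset)
    points = list(zip(flat[0::2], flat[1::2]))  # pair up X and Y
    # Each part runs from its start index to the next part's start index.
    bounds = list(parts) + [num_points]
    rings = [points[bounds[i]:bounds[i + 1]] for i in range(num_parts)]
    return bbox, rings
```

Returning one list of points per part keeps multi-part lines and polygons with holes distinguishable without further bookkeeping.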
The index file (.shx) contains a positional index of the feature geometry. It begins with the same 100-byte header as the .shp file, followed by any number of 8-byte fixed-length records, each consisting of the following two fields:
Bytes | Type | Endianness | Usage |
---|---|---|---|
0–3 | int32 | big | Record offset (in 16-bit words) |
4–7 | int32 | big | Record length (in 16-bit words) |
Using this index, it is possible to seek backwards in the shapefile by first seeking backwards in the shape index (which is possible because it uses fixed-length records), then reading the record offset, and using that offset to seek to the correct position in the .shp file. It is also possible to seek forwards an arbitrary number of records using the same method.
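
The seek procedure can be sketched as follows. The example assumes two binary file objects opened on the .shp and .shx files, a 0-based record position, and (per the technical description) offsets measured in 16-bit words from the start of the .shp file to the record header. The function name `read_record` is hypothetical.

```python
import struct


def read_record(shp_file, shx_file, index):
    """Return the raw contents of the record at 0-based position `index`."""
    # Index entries are fixed-length, so entry i sits at a computable offset.
    shx_file.seek(100 + 8 * index)
    entry = shx_file.read(8)
    if len(entry) < 8:
        raise IndexError("record index out of range")
    offset_words, length_words = struct.unpack(">2i", entry)
    # Convert words to bytes and skip the 8-byte record header.
    shp_file.seek(2 * offset_words + 8)
    return shp_file.read(2 * length_words)
```

Because each lookup is two seeks and two reads, records can be visited in any order without scanning the variable-length records that precede them.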
It is possible to generate the complete index file given a lone .shp file. However, since a shapefile is supposed to always contain an index, doing so counts as repairing a corrupt file.[2]
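Such a repair can be sketched by walking the .shp records once and emitting one index entry per record. The example below is an illustration rather than a reference implementation: it assumes the .shp contents are intact and held in memory as bytes, that offsets and lengths are in 16-bit words, and that the record length excludes the 8-byte record header. The name `build_shx` is hypothetical.

```python
import struct


def build_shx(shp_data):
    """Return .shx bytes rebuilt from the bytes of a complete .shp file."""
    if len(shp_data) < 100:
        raise ValueError("missing main file header")
    entries = bytearray()
    pos = 100  # first record starts right after the header
    while pos + 8 <= len(shp_data):
        _number, length_words = struct.unpack(">2i", shp_data[pos:pos + 8])
        if length_words < 0:
            raise ValueError(f"malformed record at byte {pos}")
        # One fixed-length entry: record offset and length, both in words.
        entries += struct.pack(">2i", pos // 2, length_words)
        pos += 8 + 2 * length_words
    # Reuse the .shp header, patching the file length (bytes 24-27).
    header = bytearray(shp_data[:100])
    struct.pack_into(">i", header, 24, (100 + len(entries)) // 2)
    return bytes(header) + bytes(entries)
```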
The attribute file (.dbf) stores the attributes for each shape; it uses the dBase IV format. The format is public knowledge, and has been implemented in many dBase clones known as xBase. The open-source shapefile C library, for example, calls its format "xBase" even though it's plain dBase IV.[3]
The names and values of attributes are not standardized, and will be different depending on the source of the shapefile.
The .sbn file is a binary spatial index file, which is used only by Esri software. The format is not documented by Esri; however, it has been reverse-engineered and documented by the open source community. The 100-byte header is similar to the one in .shp.[4] It is not currently implemented by other vendors. The .sbn file is not strictly necessary, since the .shp file contains all of the information necessary to successfully parse the spatial data.
The shapefile format has a number of limitations.[5]
The shapefile format does not have the ability to store topological relationships between shapes. The ESRI ArcInfo coverages and many geodatabases do have the ability to store feature topology.
The size of both .shp and .dbf component files cannot exceed 2 GB (or 2^31 bytes), which allows around 70 million point features at best.[6] The maximum number of features for other geometry types varies depending on the number of vertices used.
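The point-feature figure can be checked with a short calculation. The sketch below assumes plain 2D Point records, each taking an 8-byte record header plus a 4-byte shape type and two 8-byte doubles, after the 100-byte file header; it considers only the .shp file, so the .dbf limit may be reached sooner depending on the attribute columns.

```python
FILE_LIMIT = 2**31          # bytes
FILE_HEADER = 100           # fixed-length main file header
RECORD_HEADER = 8           # record number + record length
POINT_CONTENT = 4 + 2 * 8   # shape type + X and Y doubles


def max_point_records(limit=FILE_LIMIT):
    """Upper bound on 2D Point records that fit in a .shp of `limit` bytes."""
    return (limit - FILE_HEADER) // (RECORD_HEADER + POINT_CONTENT)


print(max_point_records())  # 76695841
```

At 28 bytes per point record, the bound is roughly 76.7 million, in line with the "around 70 million" figure.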
The attribute database format for the .dbf component file is based on an older dBase standard. This database format inherently has a number of limitations.[6]
Because the shape type precedes each geometry record, a shapefile is technically capable of storing a mixture of different shape types. However, the specification states, "All the non-Null shapes in a shapefile are required to be of the same shape type." Therefore, this ability to mix shape types must be limited to interspersing null shapes with the single shape type declared in the file's header. For example, a shapefile must not contain both polyline and polygon data: the descriptions for a well (point), a river (polyline), and a lake (polygon) would be stored in three separate datasets.
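A reader can enforce this rule by comparing each record's shape type against the one declared in the header. The following is a minimal sketch under the same layout assumptions as the tables above (shape type at bytes 32 to 35 of the header, record length in 16-bit words and excluding the record header); the function name `has_single_shape_type` is hypothetical.

```python
import struct

NULL_SHAPE = 0


def has_single_shape_type(shp_data):
    """Return True if every record is Null or matches the header's type."""
    (declared,) = struct.unpack("<i", shp_data[32:36])
    pos = 100
    while pos + 12 <= len(shp_data):
        _number, length_words = struct.unpack(">2i", shp_data[pos:pos + 8])
        (shape_type,) = struct.unpack("<i", shp_data[pos + 8:pos + 12])
        if shape_type not in (NULL_SHAPE, declared):
            return False
        if length_words < 2:
            raise ValueError(f"malformed record at byte {pos}")
        pos += 8 + 2 * length_words
    return True
```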