Specification for storing geospatial data in Apache Arrow


geoarrow/geoarrow


This repository contains a specification for storing geospatial data in Apache Arrow and Arrow-compatible data structures and formats.

The Apache Arrow project specifies a standardized language-independent columnar memory format. It enables shared computational libraries, zero-copy shared memory and streaming messaging, interprocess communication, and is supported by many programming languages and data libraries.

Spatial information can be represented as a collection of discrete objects using points, lines and polygons (i.e., vector data). The Simple Feature Access standard provides a widely used abstraction, defining a set of geometries: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. Next to a geometry, simple features can also have non-spatial attributes that describe the feature.

Geospatial data often comes in tabular format, with one or more columns with feature geometries and additional columns with feature attributes. The Arrow columnar memory model is well-suited to store both vector features and their attribute data. The GeoArrow specification defines how the vector features (geometries) can be stored in Arrow (and Arrow-compatible) data structures.
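To make the tabular picture concrete, here is a minimal sketch in plain Python with no Arrow dependency. Plain lists stand in for Arrow column buffers, and the column names (`name`, `population`, `geometry`), the values, and the helper functions `feature` and `bounding_box` are all made up for illustration. The point column is stored as one list per coordinate dimension; whether this matches a specific encoding in the specification should be checked against format.md.

```python
# Hypothetical feature table, stored column by column rather than row by row.
# Plain Python lists stand in for Arrow buffers; names and values are made up.
table = {
    "name": ["A", "B", "C"],
    "population": [1200, 560, 87],
    # Point geometry column: one child list per coordinate dimension.
    "geometry": {"x": [10.0, 11.5, -3.25], "y": [50.0, 48.0, 40.5]},
}


def feature(table, i):
    """Reassemble row i (attributes plus an (x, y) point) from the columns."""
    geom = table["geometry"]
    return {
        "name": table["name"][i],
        "population": table["population"][i],
        "geometry": (geom["x"][i], geom["y"][i]),
    }


def bounding_box(table):
    """Compute (xmin, ymin, xmax, ymax) by scanning only the geometry column."""
    geom = table["geometry"]
    return (min(geom["x"]), min(geom["y"]), max(geom["x"]), max(geom["y"]))
```

Because each column is contiguous, an operation such as `bounding_box` reads only the coordinate lists and never touches the attribute columns, which is the main appeal of a columnar layout for geometry-heavy workloads.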

This repository contains the specifications for:

  • The memory layout for storing geometries in an Arrow array (format.md)
  • The Arrow extension type definitions that ensure type-level metadata (e.g., CRS) is propagated when used in Arrow implementations (extension-types.md)
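The two parts above can be illustrated with a rough sketch in plain Python. It is not taken from format.md or extension-types.md: it assumes that a LineString array is stored as an offsets list over shared coordinate lists (the usual Arrow variable-length list pattern), and that the extension type is carried in the field metadata keys `ARROW:extension:name` and `ARROW:extension:metadata` that Arrow uses for extension types. The extension name `geoarrow.linestring`, the `crs` key, the `EPSG:4326` placeholder value, and the helpers `linestring` and `crs_of` are assumptions for illustration; consult the two documents for the authoritative definitions.

```python
import json

# Three LineStrings: 2 vertices, 3 vertices, and an empty one.
# offsets has len(geometries) + 1 entries; geometry i owns the vertex
# range offsets[i]:offsets[i + 1] of the shared coordinate lists.
offsets = [0, 2, 5, 5]
xs = [0.0, 1.0, 10.0, 11.0, 12.0]
ys = [0.0, 1.0, 20.0, 21.0, 22.0]


def linestring(i):
    """Return the vertices of geometry i as a list of (x, y) tuples."""
    start, end = offsets[i], offsets[i + 1]
    return list(zip(xs[start:end], ys[start:end]))


# Field-level metadata travelling with the column (assumed key names and
# values; the CRS string is only a placeholder).
field_metadata = {
    "ARROW:extension:name": "geoarrow.linestring",
    "ARROW:extension:metadata": json.dumps({"crs": "EPSG:4326"}),
}


def crs_of(metadata):
    """Decode the serialized extension metadata and return its CRS, if any."""
    return json.loads(metadata["ARROW:extension:metadata"]).get("crs")
```

Keeping the CRS on the field rather than on each geometry means it is stated once per column and can follow the array wherever the field definition goes.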

Defining a standard and efficient way to store geospatial data in the Arrow memory layout enables interoperability between different tools and ensures geospatial tools can leverage the growing Apache Arrow ecosystem:

  • Efficient, columnar file formats: leveraging the performant and compact storage of Apache Parquet as a vector data format in geospatial tools using GeoParquet
  • Accelerated between-process geospatial data exchange using the Apache Arrow IPC message format and Apache Arrow Flight
  • Zero-copy in-process geospatial data transport using the Apache Arrow C Data Interface (e.g., GDAL)
  • Shared libraries for geospatial data type representation and computation for query engines that support columnar data formats (e.g., Velox, DuckDB, and Acero)

Relationship with GeoParquet

The GeoParquet specification originally started in this repo, but was moved out into its own repo, leaving this repo to focus on the Arrow-specific specifications (Arrow layout and extension type metadata). Whereas GeoParquet is a file-level metadata specification, GeoArrow is a field-level metadata and memory layout specification that applies in-memory (e.g., an Arrow array), on disk (e.g., using Parquet readers/writers provided by an Arrow implementation), and over the wire (e.g., using the Arrow IPC format).

Implementations

  • geoarrow-c: geospatial type system and generic coordinate-shuffling library written in C with bindings in C++, R, and Python
  • geoarrow-rs: Rust implementation of the GeoArrow specification and bindings to GeoRust algorithms for efficient spatial operations on GeoArrow memory. See also:
  • geoarrow-python: Python bindings to geoarrow-c that provide integrations with libraries like pyarrow, pandas, and geopandas.
  • geoarrow-r: R bindings to geoarrow-c that provide integrations with libraries like sf and Arrow for geospatial data handling.
  • geoarrow-js: Pure TypeScript implementation of GeoArrow, on top of the Arrow JavaScript implementation.

Downstream libraries

  • Lonboard: fast, interactive geospatial vector data visualization in Jupyter, building on top of GeoArrow.
