geoarrow/geoarrow

Specification for storing geospatial data in Apache Arrow

License

BSD-3-Clause license

GeoArrow Specification

Version 0.2.

This repository contains a specification for storing geospatial data in Apache Arrow and Arrow-compatible data structures and formats.

The Apache Arrow project specifies a standardized, language-independent columnar memory format. It enables shared computational libraries, zero-copy shared memory and streaming messaging, and interprocess communication, and it is supported by many programming languages and data libraries.

Spatial information can be represented as a collection of discrete objects using points, lines, and polygons (i.e., vector data). The Simple Feature Access standard provides a widely used abstraction, defining a set of geometries: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. Next to a geometry, simple features can also have non-spatial attributes that describe the feature.

Geospatial data often comes in tabular format, with one or more columns with feature geometries and additional columns with feature attributes. The Arrow columnar memory model is well-suited to store both vector features and their attribute data. The GeoArrow specification defines how the vector features (geometries) can be stored in Arrow (and Arrow-compatible) data structures.
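
To make the columnar idea concrete, here is a minimal sketch of how a column of linestrings might be held as flat coordinate arrays plus an offsets array. It is a simplified, stdlib-only model rather than real Arrow buffers; the names `xs`, `ys`, `offsets`, `linestring`, and `bounds` are hypothetical, and format.md remains the normative description of the layout.

```python
from array import array

# Simplified model of a linestring column with separated coordinates.
# Real GeoArrow arrays use Arrow buffers; this only mimics the idea.
xs = array("d", [0.0, 1.0, 2.0, 10.0, 11.0])
ys = array("d", [0.0, 1.0, 0.0, 5.0, 6.0])

# offsets[i]:offsets[i + 1] is the coordinate range of linestring i,
# so two linestrings need three offsets.
offsets = array("i", [0, 3, 5])


def linestring(i):
    """Return the coordinates of the i-th linestring as (x, y) tuples."""
    start, end = offsets[i], offsets[i + 1]
    return list(zip(xs[start:end], ys[start:end]))


def bounds():
    """Bounding box of the whole column, read straight from the flat arrays."""
    return (min(xs), min(ys), max(xs), max(ys))


print(linestring(1))  # [(10.0, 5.0), (11.0, 6.0)]
print(bounds())       # (0.0, 0.0, 11.0, 6.0)
```

Because all coordinates sit in contiguous arrays, a whole-column operation such as `bounds()` never has to materialize individual geometry objects.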

This repository contains the specifications for:

- The memory layout for storing geometries in an Arrow array (format.md)
- The Arrow extension type definitions that ensure type-level metadata (e.g., CRS) is propagated when used in Arrow implementations (extension-types.md)
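
As a rough illustration of how type-level metadata such as a CRS can travel with a column, the sketch below builds the two field-metadata entries that Arrow uses for extension types. The helper names `geoarrow_field_metadata` and `crs_of` are hypothetical, and the `crs` key and the example value are assumptions made for illustration; extension-types.md defines the actual extension names and metadata keys.

```python
import json

# Field-metadata keys that Arrow uses to carry extension type information.
EXTENSION_NAME_KEY = "ARROW:extension:name"
EXTENSION_METADATA_KEY = "ARROW:extension:metadata"


def geoarrow_field_metadata(extension_name, crs=None):
    """Build field-level metadata for a geometry column (hypothetical helper)."""
    params = {}
    if crs is not None:
        # Assumed key name; check extension-types.md for the normative keys.
        params["crs"] = crs
    return {
        EXTENSION_NAME_KEY: extension_name,
        EXTENSION_METADATA_KEY: json.dumps(params),
    }


def crs_of(field_metadata):
    """Read the CRS back from field metadata, or return None if it is absent."""
    raw = field_metadata.get(EXTENSION_METADATA_KEY, "{}")
    return json.loads(raw).get("crs")


meta = geoarrow_field_metadata("geoarrow.point", crs="OGC:CRS84")
print(meta[EXTENSION_NAME_KEY])  # geoarrow.point
print(crs_of(meta))              # OGC:CRS84
```

Storing the parameters as a serialized string next to the extension name means any Arrow implementation can pass the metadata along unchanged, even if it does not understand the geometry type itself.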

Defining a standard and efficient way to store geospatial data in the Arrow memory layout enables interoperability between different tools and ensures geospatial tools can leverage the growing Apache Arrow ecosystem:

- Efficient, columnar file formats: leveraging the performant and compact storage of Apache Parquet as a vector data format in geospatial tools using GeoParquet
- Accelerated between-process geospatial data exchange using the Apache Arrow IPC message format and Apache Arrow Flight
- Zero-copy in-process geospatial data transport using the Apache Arrow C Data Interface (e.g., GDAL)
- Shared libraries for geospatial data type representation and computation for query engines that support columnar data formats (e.g., Velox, DuckDB, and Acero)

Relationship with GeoParquet

The GeoParquet specification originally started in this repo, but was moved out into its own repo, leaving this repo to focus on the Arrow-specific specifications (Arrow layout and extension type metadata). Whereas GeoParquet is a file-level metadata specification, GeoArrow is a field-level metadata and memory layout specification that applies in-memory (e.g., an Arrow array), on disk (e.g., using Parquet readers/writers provided by an Arrow implementation), and over the wire (e.g., using the Arrow IPC format).
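
The difference between file-level and field-level metadata can be sketched with plain dictionaries. This is only a stand-in for a table schema, not a real Arrow or Parquet API: the `schema` structure and the two helper functions are hypothetical, and the metadata values are abbreviated for illustration.

```python
import json

# Plain-dict stand-in for a table schema (not a real Arrow or Parquet API).
schema = {
    # File-level metadata in the style of GeoParquet: one "geo" entry
    # describes the geometry columns of the whole file.
    "metadata": {
        "geo": json.dumps({
            "primary_column": "geometry",
            "columns": {"geometry": {"encoding": "WKB"}},
        })
    },
    "fields": [
        {"name": "name", "metadata": {}},
        # Field-level metadata in the style of GeoArrow: the geometry
        # information is attached to the column itself.
        {"name": "geometry",
         "metadata": {"ARROW:extension:name": "geoarrow.wkb"}},
    ],
}


def geometry_columns_file_level(schema):
    """List geometry column names recorded in the file-level "geo" entry."""
    geo = json.loads(schema["metadata"].get("geo", "{}"))
    return sorted(geo.get("columns", {}))


def is_geometry_field(field):
    """Check a single field using only the metadata attached to it."""
    name = field["metadata"].get("ARROW:extension:name", "")
    return name.startswith("geoarrow.")


# A field handed around without its schema still identifies itself.
geometry_field = schema["fields"][1]
print(geometry_columns_file_level(schema))  # ['geometry']
print(is_geometry_field(geometry_field))    # True
```

The field-level check needs nothing but the field, which is why the same description can apply to an in-memory array, a file, or a stream.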

Implementations

- geoarrow-c: geospatial type system and generic coordinate-shuffling library written in C with bindings in C++, R, and Python
- geoarrow-rs: Rust implementation of the GeoArrow specification and bindings to GeoRust algorithms for efficient spatial operations on GeoArrow memory. See also:
  - Python bindings to geoarrow-rs
  - geoarrow-wasm, JavaScript (WebAssembly) bindings to geoarrow-rs
- geoarrow-python: Python bindings to geoarrow-c that provide integrations with libraries like pyarrow, pandas, and geopandas
- geoarrow-r: R bindings to geoarrow-c that provide integrations with libraries like sf and Arrow for geospatial data handling
- geoarrow-js: Pure TypeScript implementation of GeoArrow, on top of the Arrow JavaScript implementation

Downstream libraries

- Lonboard: fast, interactive geospatial vector data visualization in Jupyter, building on top of GeoArrow

About

geoarrow.org

Releases

Version 0.2.0 (latest), May 27, 2025
