Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
apache/arrow
Apache Arrow is a universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics. It contains a set of technologies that enable data systems to efficiently store, process, and move data.
Major components of the project include:
- The Arrow Columnar Format: a standard and efficient in-memory representation of various datatypes, plain or nested
- The Arrow IPC Format: an efficient serialization of the Arrow format and associated metadata, for communication between processes and heterogeneous environments
- ADBC (Arrow Database Connectivity) ↗: Arrow-powered API, drivers, and libraries for access to databases and query engines
- The Arrow Flight RPC protocol: based on the Arrow IPC format, a building block for remote services exchanging Arrow data with application-defined semantics (for example a storage server or a database)
- C++ libraries
- C bindings using GLib
- C# .NET libraries
- Gandiva: an LLVM-based Arrow expression compiler, part of the C++ codebase
- Go libraries ↗
- Java libraries ↗
- JavaScript libraries ↗
- Julia implementation ↗
- Python libraries
- R libraries
- Ruby libraries
- Rust libraries ↗
- Swift libraries ↗
The ↗ icon denotes that this component of the project is maintained in a separate repository.
Arrow is an Apache Software Foundation project. Learn more at arrow.apache.org.
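To give a feel for what "columnar in-memory representation" means in the component list above, here is a minimal sketch in plain Python (standard library only). It is not the Arrow implementation and does not use any Arrow library. It assumes a single nullable 32-bit integer column and lays it out as a validity bitmap (one bit per slot, least-significant bit first, 1 meaning "valid") plus a contiguous values buffer. The function names `encode_int32_column` and `decode_int32_column` are hypothetical, and the sketch omits the buffer padding and alignment that the real format specifies.

```python
import struct


def encode_int32_column(values):
    """Encode ints/None as (validity_bitmap, values_buffer, null_count).

    Illustrative only: no padding or alignment is applied.
    """
    bitmap = bytearray((len(values) + 7) // 8)
    data = bytearray()
    null_count = 0
    for i, v in enumerate(values):
        if v is None:
            null_count += 1
            # The slot still occupies 4 bytes; its content is a placeholder.
            data += struct.pack("<i", 0)
        else:
            # Mark slot i as valid using least-significant-bit numbering.
            bitmap[i // 8] |= 1 << (i % 8)
            data += struct.pack("<i", v)
    return bytes(bitmap), bytes(data), null_count


def decode_int32_column(bitmap, data, length):
    """Rebuild the Python list from the two buffers."""
    out = []
    for i in range(length):
        if (bitmap[i // 8] >> (i % 8)) & 1:
            out.append(struct.unpack_from("<i", data, i * 4)[0])
        else:
            out.append(None)
    return out


column = [1, None, 3, 4]
bitmap, data, nulls = encode_int32_column(column)
roundtrip = decode_int32_column(bitmap, data, len(column))
```

Because every value sits at a fixed offset (`i * 4` here), a reader can jump to any slot or scan the whole column without parsing row boundaries, which is the property that columnar layouts rely on for fast analytics.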
The reference Arrow libraries contain many distinct software components:
- Columnar vector and table-like containers (similar to data frames) supporting flat or nested types
- Fast, language agnostic metadata messaging layer (using Google's Flatbuffers library)
- Reference-counted off-heap buffer memory management, for zero-copy memory sharing and handling memory-mapped files
- IO interfaces to local and remote filesystems
- Self-describing binary wire formats (streaming and batch/file-like) for remote procedure calls (RPC) and interprocess communication (IPC)
- Integration tests for verifying binary compatibility between the implementations (e.g. sending data from Java to C++)
- Conversions to and from other in-memory data structures
- Readers and writers for various widely-used file formats (such as Parquet, CSV)
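The zero-copy and memory-mapped file items in the list above can be illustrated with a short sketch. It uses only the Python standard library (`mmap` and `memoryview`), not the Arrow libraries, and the helper name `read_window` and the file layout (a flat run of native-endian 32-bit integers) are assumptions made for the example. The point is that a typed view and a slice of it share the mapped pages, and only the values actually requested are materialized.

```python
import mmap
import os
import struct
import tempfile


def read_window(path, start, stop):
    """Return int32 values [start, stop) from a memory-mapped file."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # raw, ints, and window all reference the same mapped memory;
        # none of these steps copies the underlying bytes.
        with memoryview(mm) as raw, \
                raw.cast("i") as ints, \
                ints[start:stop] as window:
            # Only the selected slots are converted to Python objects.
            return window.tolist()


# Write ten native-endian int32 values to a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack("10i", *range(10)))
    path = tmp.name

try:
    result = read_window(path, 2, 5)
finally:
    os.remove(path)
```

The views are released before the mapping is closed, since an `mmap` object cannot be closed while buffers exported from it are still alive.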
The official Arrow libraries in this repository are in different stages of implementing the Arrow format and related features. See our current feature matrix on git main.
Please read our latest project contribution guide.
Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved:
- Join the mailing list: send an email to dev-subscribe@arrow.apache.org. Share your ideas and use cases for the project.
- Follow our activity on GitHub issues
- Learn the format
- Contribute code to one of the reference implementations