Apache Arrow

From Wikipedia, the free encyclopedia
Software framework
Developer: Apache Software Foundation
Initial release: October 10, 2016
Stable release: 22.0.0[1] / 24 October 2025
Repository: github.com/apache/arrow
Written in: C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, Rust
Type: Data format, algorithms
License: Apache License 2.0
Website: arrow.apache.org

Apache Arrow is a language-agnostic software framework for developing data analytics applications that process columnar data. It contains a standardized column-oriented memory format that is able to represent flat and hierarchical data for efficient analytic operations on modern CPU and GPU hardware.[2][3][4][5][6] This reduces or eliminates factors that limit the feasibility of working with large sets of data, such as the cost, volatility, or physical constraints of dynamic random-access memory.[7]
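
The difference between a row-oriented and a column-oriented layout, including a nested (hierarchical) field, can be sketched in plain Python. This illustrates the general idea only and is not Arrow's implementation: the names `rows`, `to_columnar`, and `list_at` are hypothetical, and details of the real format such as validity bitmaps and buffer alignment are left out. The variable-length list column is stored here as an offsets buffer plus one contiguous values buffer.

```python
from array import array

# Row-oriented input: one record per dict (illustrative data).
rows = [
    {"id": 1, "scores": [0.5, 0.7]},
    {"id": 2, "scores": []},
    {"id": 3, "scores": [0.9]},
]


def to_columnar(records):
    """Convert records into contiguous per-column buffers."""
    ids = array("q")            # flat column of 64-bit integers
    offsets = array("i", [0])   # start/end positions of each nested list
    values = array("d")         # all nested floats, stored back to back
    for record in records:
        ids.append(record["id"])
        values.extend(record["scores"])
        offsets.append(len(values))
    return ids, offsets, values


def list_at(offsets, values, i):
    """Recover the nested list for row i from the offsets and values buffers."""
    return values[offsets[i]:offsets[i + 1]].tolist()


ids, offsets, values = to_columnar(rows)
print(ids.tolist())                  # [1, 2, 3]
print(offsets.tolist())              # [0, 2, 2, 3]
print(list_at(offsets, values, 0))   # [0.5, 0.7]
```

Because each column sits in one contiguous buffer, a scan over `ids` never touches the bytes of `scores`, which is the property that makes columnar layouts attractive for analytic queries.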

Interoperability

Arrow can be used with Apache Parquet, Apache Spark, NumPy, PySpark, pandas and other data processing libraries. The project includes native software libraries written in C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python (PyArrow[8]), R, Ruby, and Rust. Arrow allows for zero-copy reads and fast data access and interchange without serialization overhead between these languages and systems.[2]
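
What "zero-copy" means can be shown with a small sketch that uses only Python's built-in buffer protocol rather than Arrow's own API. The variable names are hypothetical. The point is the contrast between a slice that copies elements and a view that refers to the same memory as the original buffer.

```python
from array import array

# A column of 64-bit integers held in one contiguous buffer.
column = array("q", [10, 20, 30, 40, 50])

# Slicing the array itself allocates a new array and copies the elements.
copied = column[1:4]

# A memoryview slice refers to the same memory; no elements are copied.
shared = memoryview(column)[1:4]

# A change to the original buffer is visible through the view, not the copy.
column[1] = 99

print(copied.tolist())   # [20, 30, 40]
print(shared.tolist())   # [99, 30, 40]
```

A consumer that reads through a shared view pays no per-element copy or serialization cost, which is the kind of saving the zero-copy claim refers to.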

Applications

Arrow has been used in diverse domains, including analytics,[9] genomics,[10][7] and cloud computing.[11]

Comparison to Apache Parquet and ORC

Apache Parquet and Apache ORC are popular examples of on-disk columnar data formats. Arrow is designed as a complement to these formats for processing data in memory.[12] The hardware resource trade-offs for in-memory processing differ from those associated with on-disk storage.[13] The Arrow and Parquet projects include libraries for reading and writing data between the two formats.[14]

Governance

Apache Arrow was announced by The Apache Software Foundation on February 17, 2016,[15] with development led by a coalition of developers from other open source data analytics projects.[16][17][6][18][19] The initial codebase and Java library were seeded by code from Apache Drill.[15]

References

  1. ^ "Release 22.0.0". 24 October 2025. Retrieved 11 November 2025.
  2. ^ a b "Apache Arrow and Distributed Compute with Kubernetes". 13 Dec 2018.
  3. ^ Baer, Tony (17 February 2016). "Apache Arrow: Lining Up The Ducks In A Row... Or Column". Seeking Alpha.
  4. ^ Baer, Tony (25 February 2019). "Apache Arrow: The little data accelerator that could". ZDNet.
  5. ^ Hall, Susan (23 February 2016). "Apache Arrow's Columnar Layouts of Data Could Accelerate Hadoop, Spark". The New Stack.
  6. ^ a b Yegulalp, Serdar (27 February 2016). "Apache Arrow aims to speed access to big data". InfoWorld.
  7. ^ a b Tanveer Ahmad (2019). "ArrowSAM: In-Memory Genomics Data Processing through Apache Arrow Framework". bioRxiv 741843. doi:10.1101/741843.
  8. ^ "Python — Apache Arrow v20.0.0".
  9. ^ Dinsmore T.W. (2016). "In-Memory Analytics: Satisfying the Need for Speed". Disruptive Analytics. Apress, Berkeley, CA. pp. 97–116. doi:10.1007/978-1-4842-1311-7_5. ISBN 978-1-4842-1312-4.
  10. ^ Versaci F, Pireddu L, Zanetti G (2016). "Scalable genomics: from raw data to aligned reads on Apache YARN" (PDF). IEEE International Conference on Big Data: 1232–1241.
  11. ^ Maas M, Asanović K, Kubiatowicz J (2017). "Return of the Runtimes: Rethinking the Language Runtime System for the Cloud 3.0 Era". Proceedings of the 16th Workshop on Hot Topics in Operating Systems. pp. 138–143. doi:10.1145/3102980.3103003. ISBN 978-1-4503-5068-6.
  12. ^ Le Dem, Julien. "Apache Arrow and Apache Parquet: Why We Needed Different Projects for Columnar Data, On Disk and In-Memory". KDnuggets.
  13. ^ "Apache Arrow vs. Parquet and ORC: Do we really need a third Apache project for columnar data representation?". 2017-10-31.
  14. ^ "PyArrow: Reading and Writing the Apache Parquet Format".
  15. ^ a b "The Apache® Software Foundation Announces Apache Arrow™ as a Top-Level Project". The Apache Software Foundation Blog. 17 February 2016. Archived from the original on 2016-03-13.
  16. ^ Martin, Alexander J. (17 February 2016). "Apache Foundation rushes out Apache Arrow as top-level project". The Register.
  17. ^ "Big data gets a new open-source project, Apache Arrow: It offers performance improvements of more than 100x on analytical workloads, the foundation says". 2016-02-17. Archived from the original on 2016-07-27. Retrieved 2018-01-31.
  18. ^ Le Dem, Julien (28 November 2016). "The first release of Apache Arrow". SD Times.
  19. ^ "Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow".
