Getting Involved#
Right now the primary audience for Apache Arrow are the developers of datasystems; most people will use Apache Arrow indirectly through systems that useit for internal data handling and interoperating with other Arrow-enabledsystems.
Even if you do not plan to contribute to Apache Arrow itself or Arrowintegrations in other projects, we’d be happy to have you involved:
Join the mailing list: send an email todev-subscribe@arrow.apache.org.Share your ideas and use cases for the project or read through theArchive.
Follow our activity onGitHub
Learn theFormat / Specification
PyArrow Architecture#
PyArrow is for the major part a wrapper around the functionalities thatArrow C++ implementation provides. The library tries to take what’s availablein C++ and expose it through a user experience that is more pythonic andless complex to use. So while in some cases it might be easy to map what’sin C++ to what’s in Python, in many cases the C++ classes and methods areused as foundations to build easier to use entities.
The
*.pyfiles in the pyarrow package are usually where the entitiesexposed to the user are declared. In some cases, those files might directlyimport the entities from inner implementation if they want to expose itas is without modification.The
lib.pyxfile is where the majority of the core C++ libarrowcapabilities are exposed to Python. Most of the implementation of thismodule relies on included*.pxifiles where the specific piecesare built. While being exposed to Python aspyarrow.libits contentshould be considered internal. The public classes are then directly exposedin other modules (likepyarrowitself) by virtue of importing them frompyarrow.libThe
_*.pyxfiles are where the glue code is usually created, it putstogether the C++ capabilities turning it into Python classes and methods.They can be considered the internal implementation of the capabilitiesexposed by the*.pyfiles.The
includes/*.pxdfiles are where the raw C++ library APIs are declaredfor usage in Cython. Here the C++ classes and methods are declared as they areso that in the other.pyxfiles they can be used to implement Python classes,functions and helpers.Apart from Arrow C++ library, which dependence is mentioned in the previous line,PyArrow is also based on PyArrow C++, dedicated pieces of code that live in
python/pyarrow/src/arrow/pythondirectory and provide the low levelcode for capabilities like converting to and from numpy or pandas and the classesthat allow to use Python objects and callbacks in C++.

