Getting Involved

Right now the primary audience for Apache Arrow is the developers of data systems; most people will use Apache Arrow indirectly through systems that use it for internal data handling and for interoperating with other Arrow-enabled systems.

Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we’d be happy to have you involved:

PyArrow Architecture

PyArrow is for the most part a wrapper around the functionality that the Arrow C++ implementation provides. The library tries to take what’s available in C++ and expose it through a user experience that is more Pythonic and less complex to use. So while in some cases it might be easy to map what’s in C++ to what’s in Python, in many cases the C++ classes and methods are used as foundations to build easier-to-use entities.

Four layers of PyArrow architecture: .py, .pyx, .pxd and low level C++ code.
  • The *.py files in the pyarrow package are usually where the entities exposed to the user are declared. In some cases, those files might directly import the entities from the inner implementation if they want to expose them as is, without modification.

  • The lib.pyx file is where the majority of the core C++ libarrow capabilities are exposed to Python. Most of the implementation of this module relies on included *.pxi files where the specific pieces are built. Although it is exposed to Python as pyarrow.lib, its content should be considered internal. The public classes are then directly exposed in other modules (like pyarrow itself) by virtue of importing them from pyarrow.lib.

  • The _*.pyx files are where the glue code is usually created: it puts together the C++ capabilities, turning them into Python classes and methods. They can be considered the internal implementation of the capabilities exposed by the *.py files.

  • The includes/*.pxd files are where the raw C++ library APIs are declared for usage in Cython. Here the C++ classes and methods are declared as they are, so that the other .pyx files can use them to implement Python classes, functions and helpers.

  • Apart from the Arrow C++ library, whose APIs are declared in the includes/*.pxd files described in the previous item, PyArrow also builds on PyArrow C++: dedicated code that lives in the python/pyarrow/src/arrow/python directory and provides the low-level code for capabilities such as converting to and from NumPy or pandas, and the classes that allow Python objects and callbacks to be used in C++.
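
The layering described above can be observed from Python by asking a public object which module defines it. The following is a minimal sketch, not part of PyArrow: the helper defining_module is a hypothetical name introduced for illustration, and the example assumes a PyArrow installation that includes the CSV module (it only prints a notice otherwise).

```python
import importlib
import importlib.util


def defining_module(qualified_name):
    """Return the name of the module that defines a publicly exposed object.

    `qualified_name` is a dotted path such as "pyarrow.Table". This is a
    hypothetical helper for illustration; it is not a PyArrow API.
    """
    module_name, _, attr = qualified_name.rpartition(".")
    obj = getattr(importlib.import_module(module_name), attr)
    return obj.__module__


if importlib.util.find_spec("pyarrow") is None:
    print("pyarrow is not installed; nothing to inspect")
else:
    # Public class exposed by the top-level package, built in lib.pyx
    print(defining_module("pyarrow.Table"))
    # Public class exposed by a *.py module, built in a _*.pyx module
    print(defining_module("pyarrow.csv.ReadOptions"))
```

On a typical installation this is expected to report pyarrow.lib for the first name and pyarrow._csv for the second, showing that the user-facing modules re-export classes implemented in the Cython layer. The module names reported are implementation details and may differ between PyArrow versions.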