Installing PyArrow#
System Compatibility#
PyArrow is regularly built and tested on Windows, macOS, and various Linux distributions. We strongly recommend using a 64-bit system.
Python Compatibility#
PyArrow is currently compatible with Python 3.10, 3.11, 3.12 and 3.13.
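A quick way to confirm that the running interpreter falls in this range is sketched below. The helper name is_supported_python and the SUPPORTED_MINORS set are illustrative only and simply mirror the versions listed above; they are not part of PyArrow.

```python
import sys

# Versions listed as compatible in this section (an assumption to update
# whenever the list above changes).
SUPPORTED_MINORS = {(3, 10), (3, 11), (3, 12), (3, 13)}


def is_supported_python(version_info=sys.version_info):
    """Return True if the (major, minor) pair is in the supported set."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINORS


print(
    f"Python {sys.version_info[0]}.{sys.version_info[1]} "
    f"listed as compatible: {is_supported_python()}"
)
```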
Using Conda#
Install the latest version of PyArrow from conda-forge using Conda:
conda install -c conda-forge pyarrow
Note
While the pyarrow conda-forge package is the right choice for most users, both a minimal and a maximal variant of the package exist, either of which may be better for your use case. See Differences between conda-forge packages.
Using Pip#
Install the latest version from PyPI (Windows, Linux, and macOS):
pip install pyarrow
If you encounter issues importing the pip wheels on Windows, you may need to install the latest Visual C++ Redistributable for Visual Studio.
Warning
On Linux, you will need pip >= 19.0 to detect the prebuilt binary packages.
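The pip requirement can be checked from Python before installing. The following is a minimal sketch that reads the installed pip version through the standard library's importlib.metadata; the helpers parse_version and pip_is_new_enough are hypothetical names introduced here, and the parsing is deliberately simple (it compares only the major and minor numbers).

```python
from importlib import metadata

MIN_PIP = (19, 0)  # minimum pip version named in the warning above


def parse_version(text):
    """Turn '24.0' or '19.0.3' into a tuple of ints, stopping at non-numeric parts."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def pip_is_new_enough(version_text):
    """Compare (major, minor), padding with zeros so '19' counts as 19.0."""
    major_minor = (parse_version(version_text) + (0, 0))[:2]
    return major_minor >= MIN_PIP


try:
    installed = metadata.version("pip")
except metadata.PackageNotFoundError:
    installed = None  # pip is not installed in this environment

if installed is None:
    print("pip not found in this environment")
else:
    print(f"pip {installed} new enough: {pip_is_new_enough(installed)}")
```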
Installing nightly packages or from source#
Dependencies#
Optional dependencies
NumPy 1.21.2 or higher.
pandas 1.3.4 or higher.
cffi.
Additional packages PyArrow is compatible with are fsspec and, for timezones, the pytz, dateutil, or tzdata package.
tzdata on Windows#
While Arrow uses the OS-provided timezone database on Linux and macOS, it requires a user-provided database on Windows. To download and extract the text version of the IANA timezone database, follow the instructions in the C++ Runtime Dependencies or use the pyarrow utility function pyarrow.util.download_tzdata_on_windows(), which does the same.
By default, the timezone database will be detected at %USERPROFILE%\Downloads\tzdata. If the database has been downloaded to a different location, you will need to set a custom path to the database from Python:
>>> import pyarrow as pa
>>> pa.set_timezone_db_path("custom_path")
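As a rough illustration of the default location and the override described above, the sketch below builds the default path from USERPROFILE and calls pa.set_timezone_db_path only on Windows. The helper names default_tzdata_dir and configure_timezone_db are hypothetical and not part of PyArrow.

```python
import os
import sys
from pathlib import PureWindowsPath


def default_tzdata_dir(env=os.environ):
    """Return the default database location under USERPROFILE, or None if unset."""
    profile = env.get("USERPROFILE")
    if not profile:
        return None
    # Mirrors the default described above: %USERPROFILE%\Downloads\tzdata
    return PureWindowsPath(profile) / "Downloads" / "tzdata"


def configure_timezone_db(custom_dir=None):
    """Point PyArrow at custom_dir on Windows; do nothing on other platforms."""
    if sys.platform != "win32" or custom_dir is None:
        return False
    import pyarrow as pa  # imported lazily so the sketch loads without PyArrow

    pa.set_timezone_db_path(str(custom_dir))
    return True


print("default tzdata location:", default_tzdata_dir())
```

On Linux and macOS the second helper returns False without touching PyArrow, since the OS-provided database is used there.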
You may encounter problems writing datetime data to an ORC file if you install pyarrow with pip. One possible solution:
Install tzdata with
pip install tzdata
Set the environment variable
TZDIR=path\to\.venv\Lib\site-packages\tzdata\
You can find where tzdata is installed with the following Python command:
>>> import tzdata
>>> print(tzdata.__file__)
path\to\.venv\Lib\site-packages\tzdata\__init__.py
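The two steps above can also be combined in Python. This is a minimal sketch, assuming the tzdata package directory is the value TZDIR should take, as in the example path above; the helper names tzdata_package_dir and set_tzdir are hypothetical. Setting the variable before importing pyarrow is the cautious choice, though that ordering is an assumption rather than something stated here.

```python
import importlib.util
import os
from pathlib import Path


def tzdata_package_dir():
    """Return the installed tzdata package directory, or None if it is absent."""
    spec = importlib.util.find_spec("tzdata")
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at tzdata/__init__.py; its parent is the package dir.
    return Path(spec.origin).parent


def set_tzdir(environ=os.environ):
    """Set TZDIR in the given mapping to the tzdata directory, if found."""
    pkg_dir = tzdata_package_dir()
    if pkg_dir is None:
        return None
    environ["TZDIR"] = str(pkg_dir)
    return environ["TZDIR"]


print("tzdata package directory:", tzdata_package_dir())
```

Calling set_tzdir() with no arguments updates os.environ for the current process only; a persistent setting still has to be made in the shell or system configuration.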
Differences between conda-forge packages#
On conda-forge, PyArrow is published as three separate packages, each providing varying levels of functionality. This is in contrast to PyPI, where only a single PyArrow package is provided.
The purpose of this split is to minimize the size of the installed package for most users (pyarrow), provide a smaller, minimal package for specialized use cases (pyarrow-core), while still providing a complete package for users who require it (pyarrow-all). What was historically pyarrow on conda-forge is now pyarrow-all, though most users can continue using pyarrow.
The pyarrow-core package includes the following functionality:
Compute Functions (i.e., pyarrow.compute)
Streaming, Serialization, and IPC (i.e., pyarrow.ipc)
Filesystem Interface (i.e., pyarrow.fs. Note: It’s planned to move cloud filesystems (i.e., S3, GCS, etc.) into pyarrow in a future release, though Local FS will remain in pyarrow-core.)
File formats: Arrow/Feather, JSON, CSV, ORC (but not Parquet)
The pyarrow package adds the following:
Acero (i.e., pyarrow.acero)
Tabular Datasets (i.e., pyarrow.dataset)
Parquet (i.e., pyarrow.parquet)
Substrait (i.e., pyarrow.substrait)
Finally, pyarrow-all adds:
Arrow Flight RPC and Flight SQL (i.e., pyarrow.flight)
Gandiva (i.e., pyarrow.gandiva)
The following table lists the functionality provided by each package and may be useful when deciding to use one package over another or when Creating A Custom Selection.
Component | Package | pyarrow-core | pyarrow | pyarrow-all |
Core | pyarrow-core | ✓ | ✓ | ✓ |
Parquet | libparquet | | ✓ | ✓ |
Dataset | libarrow-dataset | | ✓ | ✓ |
Acero | libarrow-acero | | ✓ | ✓ |
Substrait | libarrow-substrait | | ✓ | ✓ |
Flight | libarrow-flight | | | ✓ |
Flight SQL | libarrow-flight-sql | | | ✓ |
Gandiva | libarrow-gandiva | | | ✓ |
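To see which of these optional components an existing environment actually provides, one option is to try importing the corresponding Python modules. The sketch below does that with the standard library only; the function available_components and the OPTIONAL_MODULES mapping are illustrative names, and the module list simply follows the i.e. entries given above.

```python
import importlib

# Optional component -> Python module, following the lists above.
OPTIONAL_MODULES = {
    "Parquet": "pyarrow.parquet",
    "Dataset": "pyarrow.dataset",
    "Acero": "pyarrow.acero",
    "Substrait": "pyarrow.substrait",
    "Flight": "pyarrow.flight",
    "Gandiva": "pyarrow.gandiva",
}


def available_components(modules=OPTIONAL_MODULES):
    """Map each label to True if its module imports cleanly, else False."""
    found = {}
    for label, name in modules.items():
        try:
            importlib.import_module(name)
        except ImportError:
            # Also covers the case where pyarrow itself is not installed.
            found[label] = False
        else:
            found[label] = True
    return found


for label, ok in available_components().items():
    print(f"{label}: {'available' if ok else 'missing'}")
```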
Creating A Custom Selection#
If you know which components you need and want to control what’s installed, you can create a custom selection of packages to include only the extra features you need. For example, to install pyarrow-core and add support for reading and writing Parquet, install libparquet alongside pyarrow-core:
conda install -c conda-forge pyarrow-core libparquet
Or if you wish to use pyarrow but need support for Flight RPC:
conda install -c conda-forge pyarrow libarrow-flight
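When custom selections are scripted, the component-to-package mapping from the table above can be turned into an install command. This is only a sketch: conda_install_command and the lowercase component keys are hypothetical names chosen for illustration, and the function just builds the command string without running it.

```python
import shlex

# Component -> conda-forge package, taken from the table above.
COMPONENT_PACKAGES = {
    "parquet": "libparquet",
    "dataset": "libarrow-dataset",
    "acero": "libarrow-acero",
    "substrait": "libarrow-substrait",
    "flight": "libarrow-flight",
    "flight-sql": "libarrow-flight-sql",
    "gandiva": "libarrow-gandiva",
}


def conda_install_command(base="pyarrow-core", components=()):
    """Build a conda install command for a base package plus extra components."""
    unknown = [c for c in components if c not in COMPONENT_PACKAGES]
    if unknown:
        raise ValueError(f"unknown components: {unknown}")
    packages = [base] + [COMPONENT_PACKAGES[c] for c in components]
    return shlex.join(["conda", "install", "-c", "conda-forge", *packages])


print(conda_install_command("pyarrow-core", ["parquet"]))
print(conda_install_command("pyarrow", ["flight"]))
```

The two calls reproduce the commands shown above; an unrecognized component name raises ValueError rather than producing a partial command.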

