CUDA Integration
Arrow is not limited to CPU buffers (located in the computer’s main memory, also named “host memory”). It also has provisions for accessing buffers located on a CUDA-capable GPU device (in “device memory”).
Note
This functionality is optional and must have been enabled at build time. If this is not done by your package manager, you might have to build Arrow yourself.
CUDA Contexts
A CUDA context represents access to a particular CUDA-capable device. For example, the following creates a CUDA context accessing CUDA device number 0:
>>> from pyarrow import cuda
>>> ctx = cuda.Context(0)
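If you are unsure how many devices are available, you can query this first. A minimal sketch using the Context.get_num_devices() static method (the output shown is illustrative, for a single-GPU machine):

>>> cuda.Context.get_num_devices()
1
>>> ctx.device_number
0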
CUDA Buffers
A CUDA buffer can be created by copying data from host memory to the memory of a CUDA device, using the Context.buffer_from_data() method. The source data can be any Python buffer-like object, including Arrow buffers:
>>> import numpy as np
>>> arr = np.arange(4, dtype=np.int32)
>>> arr.nbytes
16
>>> cuda_buf = ctx.buffer_from_data(arr)
>>> type(cuda_buf)
pyarrow._cuda.CudaBuffer
>>> cuda_buf.size            # The buffer's size in bytes
16
>>> cuda_buf.address         # The buffer's address in device memory
30088364544
>>> cuda_buf.context.device_number
0
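Device memory can also be allocated without an initial copy, using the Context.new_buffer() method, and filled later with CudaBuffer.copy_from_host(). A minimal sketch (we assume here that copy_from_host() returns the number of bytes copied):

>>> new_buf = ctx.new_buffer(arr.nbytes)  # allocate 16 uninitialized bytes on the device
>>> new_buf.copy_from_host(arr)           # copy the host data into the device buffer
16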
Conversely, you can copy a CUDA buffer back to host memory, getting a regular CPU buffer:
>>> buf = cuda_buf.copy_to_host()
>>> type(buf)
pyarrow.lib.Buffer
>>> np.frombuffer(buf, dtype=np.int32)
array([0, 1, 2, 3], dtype=int32)
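copy_to_host() also accepts position and nbytes arguments, so you can transfer a subrange of the device buffer instead of the whole thing; a sketch:

>>> tail = cuda_buf.copy_to_host(position=8, nbytes=8)  # the last two int32 values
>>> np.frombuffer(tail, dtype=np.int32)
array([2, 3], dtype=int32)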
Warning
Many Arrow functions expect a CPU buffer but will not check the buffer’s actual type. You will get a crash if you pass a CUDA buffer to such a function:
>>> import pyarrow as pa
>>> pa.py_buffer(b"x" * 16).equals(cuda_buf)
Segmentation fault
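One way to guard against this is to check the buffer’s is_cpu attribute before handing it to CPU-only code; a sketch:

>>> cuda_buf.is_cpu
False
>>> buf.is_cpu
True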
Numba Integration
There is not much you can do directly with Arrow CUDA buffers from Python, but they support interoperation with Numba, a JIT compiler which can turn Python code into optimized CUDA kernels.
Arrow to Numba
First let’s define a Numba CUDA kernel operating on an int32 array. Here, we will simply increment each array element (assuming the array is writable):
import numba.cuda


@numba.cuda.jit
def increment_by_one(an_array):
    # Absolute position of this thread in the 1D grid
    pos = numba.cuda.grid(1)
    # Guard: only threads mapped to a valid element do work
    if pos < an_array.size:
        an_array[pos] += 1
Then we need to wrap our CUDA buffer into a Numba “device array” with the right array metadata (shape, strides and datatype). This is necessary so that Numba can identify the array’s characteristics and compile the kernel with the appropriate type declarations.
In this case the metadata can simply be obtained from the original NumPy array. Note the GPU data isn’t copied, just pointed to:
>>> from numba.cuda.cudadrv.devicearray import DeviceNDArray
>>> device_arr = DeviceNDArray(arr.shape, arr.strides, arr.dtype,
...                            gpu_data=cuda_buf.to_numba())
(Ideally we could have defined an Arrow array in CPU memory, copied it to CUDA memory without losing type information, and then invoked the Numba kernel on it without constructing the DeviceNDArray by hand; this is not yet possible.)
Finally we can run the Numba CUDA kernel on the Numba device array (here with 16 blocks of 16 threads each):
>>> increment_by_one[16, 16](device_arr)
And the results can be checked by copying back the CUDA buffer to CPU memory:
>>> np.frombuffer(cuda_buf.copy_to_host(), dtype=np.int32)
array([1, 2, 3, 4], dtype=int32)
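The launch above starts 256 threads for only 4 elements; the bounds check inside the kernel keeps the extra threads from writing out of range. For larger arrays, the launch configuration is usually derived from the array size, along these lines:

>>> threads_per_block = 128
>>> blocks_per_grid = (device_arr.size + threads_per_block - 1) // threads_per_block
>>> increment_by_one[blocks_per_grid, threads_per_block](device_arr)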
Numba to Arrow
Conversely, a Numba-created device array can be viewed as an Arrow CUDA buffer, using the CudaBuffer.from_numba() factory method.
For the sake of example, let’s first create a Numba device array:
>>> arr = np.arange(10, 14, dtype=np.int32)
>>> arr
array([10, 11, 12, 13], dtype=int32)
>>> device_arr = numba.cuda.to_device(arr)
Then we can create a CUDA buffer pointing to the device array’s memory. We don’t need to pass a CUDA context explicitly this time: the appropriate CUDA context is automatically retrieved and adapted from the Numba object.
>>> cuda_buf = cuda.CudaBuffer.from_numba(device_arr.gpu_data)
>>> cuda_buf.size
16
>>> cuda_buf.address
30088364032
>>> cuda_buf.context.device_number
0
Of course, we can copy the CUDA buffer back to host memory:
>>> np.frombuffer(cuda_buf.copy_to_host(), dtype=np.int32)
array([10, 11, 12, 13], dtype=int32)
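Because the Arrow buffer is a zero-copy view of the device array’s memory, changes made through Numba are visible through Arrow as well. For instance, reusing the increment_by_one kernel defined earlier (the output shown assumes that kernel):

>>> increment_by_one[1, 32](device_arr)  # one block of 32 threads covers all 4 elements
>>> np.frombuffer(cuda_buf.copy_to_host(), dtype=np.int32)
array([11, 12, 13, 14], dtype=int32)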
See also
Documentation for Numba’s CUDA support.

