Allow access to CUDA pointers for interoperability with other libraries #16513


Merged: opencv-pushbot merged 1 commit into opencv:master from pwuertz:cuda_py_interop on Mar 5, 2020

Conversation

@pwuertz (Contributor) commented Feb 5, 2020 (edited by alalek):

This is a proposal for adding CV_WRAP-compatible cudaPtr() getter methods to GpuMat and Stream, required for enabling interoperability between OpenCV and other CUDA-capable Python libraries like Numba, CuPy, PyTorch, etc.

Here is an example of sharing a GpuMat with CuPy:

```python
import numpy as np
import cv2 as cv
import cupy as cp

# Create GPU array with OpenCV
data_gpu_cv = cv.cuda_GpuMat()
data_gpu_cv.upload(np.eye(64, dtype=np.float32))

# Modify the same GPU array with CuPy
data_gpu_cp = cp.asarray(CudaArrayInterface(data_gpu_cv))
data_gpu_cp *= 42.0

# Download and verify
assert np.allclose(data_gpu_cp.get(), np.eye(64) * 42.0)
```

In this example, CudaArrayInterface is an (incomplete) adapter class that implements the CUDA array interface used by other frameworks:

```python
class CudaArrayInterface:
    def __init__(self, gpu_mat):
        w, h = gpu_mat.size()
        type_map = {
            cv.CV_8U: "u1", cv.CV_8S: "i1",
            cv.CV_16U: "u2", cv.CV_16S: "i2",
            cv.CV_32S: "i4", cv.CV_32F: "f4", cv.CV_64F: "f8",
        }
        self.__cuda_array_interface__ = {
            "version": 2,
            "shape": (h, w),
            "data": (gpu_mat.cudaPtr(), False),
            "typestr": type_map[gpu_mat.type()],
            "strides": (gpu_mat.step, gpu_mat.elemSize()),
        }
```

If possible, I'd like to implement __cuda_array_interface__ within the GpuMat Python binding in a future PR (not sure how to define a Python property using the wrapper generator though).
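A hypothetical sketch (not part of this PR) of what that end state would enable: cp.asarray and numba.cuda.as_cuda_array already accept any object exposing __cuda_array_interface__, so the adapter above would no longer be needed.

```python
# Hypothetical usage once GpuMat itself carries __cuda_array_interface__
gpu = cv.cuda_GpuMat()
gpu.upload(np.zeros((4, 4), dtype=np.float32))
view = cp.asarray(gpu)   # would work directly, without the adapter class
view += 1.0              # CuPy modifies OpenCV's device buffer
```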


force_builders=Custom
buildworker:Custom=linux-4
build_image:Custom=ubuntu-cuda:18.04

@leofang commented:

Hi @pwuertz, thanks for joining the discussion on Numba. I don't have comments on your effort here (yet), but when this PR (and any subsequent ones) is merged, it'd be nice if you could follow numba/numba#5104 and add OpenCV to the list 🙂 Thank you.

@alalek (Member) commented:

> CudaArrayInterface

I believe it should own the upstream gpu_mat object in its fields (extending the lifetime of data_gpu_cv in the example).
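A minimal sketch (not from the PR) of that suggestion: the adapter keeps a reference to the GpuMat so the device allocation stays alive at least as long as the adapter itself (typestr is hard-coded here, assuming a single-channel CV_32F matrix for brevity):

```python
class CudaArrayInterface:
    def __init__(self, gpu_mat):
        # Hold the upstream GpuMat so its device memory is not freed
        # while this adapter is still referenced.
        self._gpu_mat = gpu_mat
        w, h = gpu_mat.size()
        self.__cuda_array_interface__ = {
            "version": 2,
            "shape": (h, w),
            "data": (gpu_mat.cudaPtr(), False),
            "typestr": "f4",  # assumption: CV_32F, single channel
            "strides": (gpu_mat.step, gpu_mat.elemSize()),
        }
```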

@pwuertz (Contributor, Author) commented:

@alalek Please note that class CudaArrayInterface is not part of this PR. The PR is meant to provide the minimum requirement for any kind of interoperability in Python, which is access to the CUDA pointers.

For out-of-the-box interoperability with other libraries like Numba and CuPy, I was planning to implement the CUDA array interface (CAI) in a follow-up PR. As described earlier, I'm having trouble figuring out some OpenCV Python binding generator details though.

Also note that under the current CAI specification (version 2), the responsibility for lifetime and synchronization resides with the user (similar to using cv::Mat constructors with data pointers). If this changes in some future version, I'd of course be willing to update the CAI version on the OpenCV side.
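To make that user responsibility concrete, a hedged sketch (assuming the adapter from above and an explicitly created OpenCV stream): finish OpenCV's asynchronous work before another library touches the buffer, and keep the GpuMat referenced while any view of it exists.

```python
stream_cv = cv.cuda_Stream()
data_gpu_cv = cv.cuda_GpuMat()
data_gpu_cv.upload(np.eye(64, dtype=np.float32), stream_cv)  # asynchronous upload

stream_cv.waitForCompletion()  # user-managed synchronization point
data_gpu_cp = cp.asarray(CudaArrayInterface(data_gpu_cv))    # now safe to consume
# data_gpu_cv must stay referenced for as long as data_gpu_cp is in use
```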


```cpp
operator bool_type() const;

//! return Pointer to CUDA stream
CV_WRAP void* cudaPtr() const;
```
@asmorkalov (Contributor) commented on this hunk:

There is StreamAccessor to do it: https://docs.opencv.org/master/d6/df1/structcv_1_1cuda_1_1StreamAccessor.html. I think it's better to wrap the accessor rather than expose private fields.

@alalek (Member) replied:

> StreamAccessor

Not sure that this can help to reach the final goal:

> to implement __cuda_array_interface__ within the GpuMat python binding

```cpp
inline
void* GpuMat::cudaPtr() const
{
    return data;
}
```
@asmorkalov (Contributor) commented on this hunk:

There is a CV_PROP_RW macro that allows exposing object properties to Python and other languages. You do not need your own method for it.

@alalek (Member) replied:

I believe the current approach is fine. We should not allow changing this pointer through CV_PROP_RW.

@pwuertz (Contributor, Author) commented:

@asmorkalov

  • I tried adding uchar* data as CV_PROP, but it doesn't compile; CV_PROP apparently doesn't support pointer types.
  • Like @alalek, I figured that changing the data pointer via the public interface shouldn't be allowed (neither from C++ nor from Python), so having/promoting a getter method for it seems like a good thing.
  • The GpuMat::data pointer looks identical to Mat::data (name, uint8_t*, doc), yet it represents something completely different. I think void* cudaPtr() is a good name for what this pointer represents: an opaque handle to the CUDA array, not a data pointer for reading/writing uint8_t (see the sketch below).
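As an illustration of that "opaque handle" point (not part of the PR), a hedged sketch that hands the raw address straight to CuPy, assuming a single-channel CV_32F GpuMat named gpu_mat:

```python
import cupy as cp

w, h = gpu_mat.size()                      # GpuMat.size() returns (width, height)
nbytes = gpu_mat.step * h                  # pitched allocation size in bytes
mem = cp.cuda.UnownedMemory(gpu_mat.cudaPtr(), nbytes, owner=gpu_mat)
view = cp.ndarray((h, w), dtype=cp.float32,
                  memptr=cp.cuda.MemoryPointer(mem, 0),
                  strides=(gpu_mat.step, gpu_mat.elemSize()))
```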

@pwuertz (Contributor, Author) commented:

@alalek It's true that __cuda_array_interface__ currently does not specify stream handling, but stream interoperability really is low-hanging fruit. With access to the CUDA pointers, I did the following:

```python
# allocated: dataX_cpu, dataX_gpu_cv, stream_cv
# (proof of principle: numba views dataX_gpu_nb, stream_nb)
t1 = time.time()
data1_gpu_cv.upload(arr=data1_cpu, stream=stream_cv)         # OpenCV upload
kernel_nb(data1_gpu_nb, out=data2_gpu_nb, stream=stream_nb)  # Numba operation
data2_gpu_cv.download(dst=data2_cpu, stream=stream_cv)       # OpenCV download
t2 = time.time()  # CPU time: 0.001 s
stream_nb.synchronize()
t3 = time.time()  # GPU time: 0.042 s
# verified data2_cpu
```

So you can freely mix OpenCV and Numba operations on a single, fully async stream.
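A hedged sketch (not from the PR) of how those Numba-side views could be built from the OpenCV objects, assuming the CudaArrayInterface adapter from the first comment; numba.cuda.as_cuda_array accepts any object implementing the interface, and recent Numba releases provide cuda.external_stream for wrapping a foreign stream handle:

```python
from numba import cuda

data1_gpu_nb = cuda.as_cuda_array(CudaArrayInterface(data1_gpu_cv))
data2_gpu_nb = cuda.as_cuda_array(CudaArrayInterface(data2_gpu_cv))
stream_nb = cuda.external_stream(stream_cv.cudaPtr())  # wrap OpenCV's raw cudaStream_t
```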

@asmorkalov Oh sorry, I hadn't noticed the StreamAccessor class before. I'll try adding wrapper definitions to it. This means that cudaStream_t needs some kind of globally defined conversion rule too?

@pwuertz (Contributor, Author) commented:

@asmorkalov StreamAccessor uses the cudaStream_t typedef and thus has a hard dependency on the CUDA SDK. I assume this is the reason for keeping it separate from Stream, which provides a public interface without a CUDA dependency.

Even with HAVE_CUDA defined, neither the misc/python headers nor cv2.cpp are able to include <cuda_runtime.h>.

How should we proceed? Using void* for transporting CUDA stream pointers appears to be the least intrusive solution.
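On the Python side, that raw handle is already enough for stream interop; a hedged sketch using CuPy's ExternalStream (assuming a cv.cuda_Stream named stream_cv and a CuPy array data_gpu_cp):

```python
# Wrap OpenCV's stream handle; CuPy does not take ownership of it.
stream_cp = cp.cuda.ExternalStream(stream_cv.cudaPtr())
with stream_cp:
    data_gpu_cp *= 2.0  # CuPy kernel launched on OpenCV's stream
```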

@alalek (Member) left a review:

Looks good to me 👍


Reviewers

@asmorkalov approved these changes
@alalek approved these changes

Assignees

@asmorkalov

Milestone

4.3.0

5 participants

@pwuertz @leofang @alalek @asmorkalov @opencv-pushbot
