Use cuda filters to support 10-bit videos #899


Merged
scotts merged 2 commits into meta-pytorch:main from dvrogozh:filters-cuda2
Sep 23, 2025

Conversation

@dvrogozh (Contributor):

As we discussed in #853, this is the version that enables 10-bit support in the CUDA device interface by managing the CUDA filter graph within the CUDA device interface. This PR makes the following changes:

  1. Added support for 10-bit (and likely 12-bit, which I did not test) via the scale_cuda FFmpeg filter on FFmpeg >= n5.0
  2. Added support for 10-bit via a CPU fallback on FFmpeg n4.4 (color conversion in scale_cuda appeared in FFmpeg >= n5.0)
  3. Added support in the CPU device interface to directly return the input if it does not require conversion (used in combination with the CUDA n4.4 fallback)

CC: @scotts @NicolasHug

@meta-cla bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Sep 12, 2025
@dvrogozh mentioned this pull request on Sep 12, 2025
auto deleter = [filteredAVFramePtr](void*) {
  UniqueAVFrame avFrameToDelete(filteredAVFramePtr);
std::vector<int64_t> strides = {avFrame->linesize[0], 3, 1};
AVFrame* avFrameClone = av_frame_clone(avFrame.get());
Contributor:

We weren't calling av_frame_clone() here before. Was that an error?

Contributor (Author):

No, that was not a mistake. The function has changed. Previously it combined two operations: 1) conversion of the frame with the filtergraph, and 2) creating a tensor. The frame converted by the filtergraph lived locally inside a UniqueAVFrame, and at the end we just released it to the tensor deleter. I.e., the schema was:

torch::Tensor func(const UniqueAVFrame& input) {
  UniqueAVFrame output = filtergraph->convert(input);
  AVFrame* outputPtr = output.release();
  return torch::from_blob(outputPtr->data, deleter(outputPtr), ...);
}

In the new code I moved the frame conversion with the filtergraph outside of the function, and the function signature changed: it now accepts just a constant reference to a frame, without knowing whether the frame will be needed afterwards. But we still need to pass a reference to the deleter to make sure the tensor unreferences the frame once it is no longer needed. To achieve that, we clone the frame (which references the same data) and pass the clone to the deleter. This way the caller of the function can still do something with the frame if needed, or just unreference it (which is what we actually do). So the schema now is:

torch::Tensor func(const UniqueAVFrame& frame) {
  AVFrame* frameClone = av_frame_clone(frame.get());
  return torch::from_blob(frameClone->data, deleter(frameClone), ...);
}

std::stringstream filters;

unsigned version_int = avfilter_version();
if (version_int < AV_VERSION_INT(8, 0, 103)) {
Contributor:
I don't love that we're doing FFmpeg version checks here, but we also do this in several other places in CudaDeviceInterface.cpp. In the rest of the C++ code, we've kept such checks hidden behind functions in FFMPEGCommon. Most of those, however, are about a particular field of a struct being renamed. This logic is very specific to filtergraph, and it does make sense to keep it here.

There's no action to take based on this comment, I'm just pointing out it's an awkward situation. I might end up refactoring it in some of my decoder-native transform work.

@scotts (Contributor) left a comment:

@dvrogozh, thank you for fixing this! There are some minor changes to make, the biggest one being that I think we're mistakenly comparing pointer values instead of actual objects. But this is great; we should be able to merge after these minor changes.

For: meta-pytorch#776
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@scotts (Contributor) left a comment:

@dvrogozh, this looks great, thank you for the fix!

@scotts merged commit fc60ed6 into meta-pytorch:main on Sep 23, 2025
47 checks passed
}
return;
}

Contributor:

@dvrogozh QQ - do we expect frames to come out as AV_PIX_FMT_RGB24? I'm a bit surprised that this would be the case; I'd assume most encoded frames are in YUV or something similar?

Contributor (Author):

This change in the CPU device interface is due to the CPU fallback in the CUDA device interface that handles 10-bit streams on FFmpeg 4.4:
https://github.com/dvrogozh/torchcodec/blob/f298fa7e722e9e0932813545e819e7da1a31994d/src/torchcodec/_core/CudaDeviceInterface.cpp#L242-L247
The filter graph gets set up in the CUDA device interface and outputs RGB24; the output then needs to be converted to a tensor, which happens here in the CPU device interface.

That being said, I think there are two cases in the future where we might see RGB24 coming out of decoders:

  1. Image support and "image" codecs such as motion JPEG
  2. HW decoders, which sometimes blend decoding and color space conversion/scaling into a single pass for optimization purposes


Reviewers

@NicolasHug left review comments

@scotts approved these changes

3 participants: @dvrogozh, @scotts, @NicolasHug
