Abstract filters support #853


Draft
dvrogozh wants to merge 3 commits into meta-pytorch:main from dvrogozh:filters-cuda

Conversation

@dvrogozh (Contributor) commented Aug 26, 2025 (edited):

Changes:

  1. Added a DeviceInterface::initializeFiltersContext() API which returns device-specific FilterGraph initialization settings
  2. Enabled filter graphs on the SingleStreamDecoder level
  3. Switched the CPU device interface to use the above changes
  4. Enabled filter graphs in the CUDA device interface to handle non-NV12 decoder output (10/12-bit videos) through scale_cuda
    • The filter graph converts 10/12-bit videos to NV12 (as scale_cuda does not currently support conversion of YUV to RGB), and then the CUDA device interface converts NV12 to RGB (via the existing NPP path)

Basically, the idea behind this change is the following. Let the device interface perform trivial and performance-optimized conversions in the convertAVFrameToFrameOutput() method. If the device interface cannot handle a conversion in convertAVFrameToFrameOutput(), it can set up an ffmpeg filters pipeline by returning a valid description in initializeFiltersContext().
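The split described above can be sketched in a few lines. This is a minimal, self-contained model of the proposed flow, not the actual TorchCodec code: the placeholder structs and the `CudaLikeDevice` class are hypothetical stand-ins, and the filtergraph run is simulated by a format change. Only `DeviceInterface`, `initializeFiltersContext()`, `convertAVFrameToFrameOutput()`, and the `scale_cuda=format=nv12` description come from the PR itself.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins for the real AVFrame / FrameOutput types.
struct AVFramePlaceholder { std::string format; };
struct FrameOutputPlaceholder { std::string format; };

struct DeviceInterface {
  // Returns a filtergraph description when the device cannot convert the
  // frame directly, or std::nullopt when convertAVFrameToFrameOutput()
  // can handle the conversion on its own.
  virtual std::optional<std::string> initializeFiltersContext(
      const AVFramePlaceholder& frame) = 0;
  virtual void convertAVFrameToFrameOutput(
      const AVFramePlaceholder& in, FrameOutputPlaceholder& out) = 0;
  virtual ~DeviceInterface() = default;
};

// A CUDA-like device: NV12 goes straight through the NPP path; other
// formats (e.g. 10-bit) are first routed through scale_cuda down to NV12.
struct CudaLikeDevice : DeviceInterface {
  std::optional<std::string> initializeFiltersContext(
      const AVFramePlaceholder& frame) override {
    if (frame.format == "nv12") {
      return std::nullopt;  // trivial path, no filtergraph needed
    }
    return "scale_cuda=format=nv12";  // 10/12-bit -> NV12 before NPP
  }
  void convertAVFrameToFrameOutput(
      const AVFramePlaceholder&, FrameOutputPlaceholder& out) override {
    out.format = "rgb24";  // models the existing NPP NV12 -> RGB path
  }
};

// Decoder-side orchestration: ask the device for a filters context and
// run the filtergraph only when one is returned.
FrameOutputPlaceholder decodeConvert(DeviceInterface& dev,
                                     AVFramePlaceholder frame) {
  if (auto filters = dev.initializeFiltersContext(frame)) {
    // In the real code an FFmpeg filtergraph built from *filters would
    // run here; we simulate its effect on the frame format.
    frame.format = "nv12";
  }
  FrameOutputPlaceholder out;
  dev.convertAVFrameToFrameOutput(frame, out);
  return out;
}
```

The point of the sketch is that SingleStreamDecoder owns the orchestration while the device only answers "do I need a filtergraph, and which one".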

I tested the pipeline on 10-bit videos, h264 and h265. However, this setup should also be valid for 12-bit videos, which I did not try.

Note:

  1. On ffmpeg-n4.4 the filters pipeline falls back to CPU, as scale_cuda does not support format conversion in this ffmpeg version (it was added in n5.0)
  2. More work might be needed to align scale_cuda-converted outputs with the CPU implementation. I do see a difference in the outputs, likely due to different handling of color standards. This is something I was not able to overcome at the moment.
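Note 1 amounts to a version gate when building the CUDA filter description. The helper below is a hypothetical illustration (the function name and shape are not the real TorchCodec API); the only facts taken from the note are that scale_cuda's format conversion exists from FFmpeg n5.0 and that the pipeline falls back to CPU before that.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: decide whether a CUDA filter description can be
// used for 10/12-bit input, given the FFmpeg major version.
std::optional<std::string> cudaFilterDescFor10Bit(int ffmpegMajorVersion) {
  if (ffmpegMajorVersion < 5) {
    // ffmpeg-n4.4: scale_cuda cannot change pixel formats, so no CUDA
    // description is returned and the pipeline falls back to the CPU path.
    return std::nullopt;
  }
  // 10/12-bit YUV -> NV12 on the GPU; the existing NPP path then
  // performs NV12 -> RGB.
  return "scale_cuda=format=nv12";
}
```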

CC: @scotts @NicolasHug @eromomon

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) Aug 26, 2025
@dvrogozh (Contributor, Author) commented:

I forgot to mention that I did not yet change the cpu device interface in a way similar to cuda (initializeFiltersContext() is not implemented for cpu). This would be a next step if this design is accepted.

@dvrogozh (Contributor, Author) commented:

I've added one more commit to update the CPU device interface to use the filters path introduced in SingleStreamDecoder and DeviceInterface. Still a draft, as #831 needs to be reviewed first.

@dvrogozh marked this pull request as ready for review September 2, 2025 19:18
@dvrogozh (Contributor, Author) commented:

I've rebased the PR on top of the recently landed #831 prerequisite. The change is ready for review. @scotts @NicolasHug

@dvrogozh (Contributor, Author) commented:

I pushed an update to fix a linter issue which I had overlooked.

```
Error: Unable to download artifact(s): Artifact not found for name: pytorch_torchcodec__3.10_cu129_x86_64
```

This error in CI test seems unrelated to the change.

@dvrogozh (Contributor, Author) commented:

Pushed an update to address another lint issue in the last commit.

@scotts (Contributor) commented on this diff excerpt:

```cpp
        avFrame->width,
        "x",
        avFrame->height);
}
```

I'm not sure what we're enabling by exposing DeviceInterface::initializeFiltersContext() and then calling it here in SingleStreamDecoder. That is, if DeviceInterface just didn't expose the concept of filter graphs at all, each device implementation could still decide to use a filter graph on its own.

We already have the ability to ask a device to do device-specific frame conversions with DeviceInterface::convertAVFrameToFrameOutput(). Pulling filter graph capabilities out of that, and then forcing them to be orchestrated at the level of SingleStreamDecoder does not, as far as I can tell, buy us any new capabilities. But it does come at a cost, because CpuDeviceInterface::convertAVFrameToFrameOutput() is now (to me) quite strange: it does swscale-based conversion and filter graph cleanup.

The logic in this PR is effectively:

```cpp
if (stream.kind == AUDIO) {
  convertAudio(inFrame, outFrame);
} else if (stream.kind == VIDEO && device != nullptr) {
  filterGraph = device->getFilterGraph();
  if (filterGraph != nullptr) {
    inFrame = filterGraph->convert(inFrame);
  }
  device->convertVideo(inFrame, outFrame);
}
return outFrame;
```

I don't see how the above buys us anything over:

```cpp
if (stream.kind == AUDIO) {
  convertAudio(inFrame, outFrame);
} else if (stream.kind == VIDEO && device != nullptr) {
  // device is free to use filter graph if it wants; it's
  // entirely an implementation detail for the device
  device->convertVideo(inFrame, outFrame);
}
return outFrame;
```

I do understand that in the future, we'll likely want to tell a device "Hey, here are the particular filters we want you to apply." But I don't think these interfaces actually buy us that capability. I think we can pass that just as easily through the VideoStreamOptions we currently pass to DeviceInterface::convertAVFrameToFrameOutput().
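The suggestion above can be sketched concretely. VideoStreamOptions is a real TorchCodec type, but the `filters` field and the `planConversion()` helper below are hypothetical, added only to show how a requested filter description could ride along in the options the device already receives, keeping the filtergraph an implementation detail of the device.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Minimal stand-in for VideoStreamOptions; the `filters` field is a
// hypothetical addition, not an existing member.
struct VideoStreamOptions {
  std::optional<std::string> filters;  // e.g. "scale_cuda=format=nv12"
};

// Hypothetical device-internal planning: the device alone decides
// whether a filtergraph stage precedes its native conversion, so
// SingleStreamDecoder never needs to know about filter graphs.
std::string planConversion(const VideoStreamOptions& opts) {
  if (opts.filters.has_value()) {
    // Filtergraph stage requested via options, then the native path.
    return *opts.filters + " -> npp(nv12->rgb)";
  }
  // Trivial case: native NPP conversion only.
  return "npp(nv12->rgb)";
}
```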

@dvrogozh (Contributor, Author) replied:

This PR moves ownership and management of the filter graph to SingleStreamDecoder. Otherwise this would be done in each device interface separately, while doing it in SingleStreamDecoder makes all devices behave the same and shares the code across them. There are further considerations around that, which I highlight in #853 (comment).

@scotts (Contributor) commented:

@dvrogozh, thank you for this work! I left a detailed comment where I'm trying to understand the value we're getting out of the new abstractions - please help me understand that better. At a higher level, I wonder if we could take the following approach:

  1. DeviceInterface is left unchanged; we don't add the concept of filter graph initialization to it.
  2. SingleStreamDecoder is left unchanged.
  3. CpuDeviceInterface is left unchanged.
  4. CudaDeviceInterface is changed to add the new Cuda-specific filter graph implementation.

I'm sorry if that's the direction you were already going in - I know I pushed to see what the full generalization would look like. One quirk of the approach above is that each device may end up implementing the same patterns for their filter graph implementation. In that case, we could potentially add some member functions to DeviceInterface, but they would be private. They wouldn't be called by SingleStreamDecoder, but would just be guides for how best to implement filter graph support for a device.

@dvrogozh (Contributor, Author) commented:

@scotts : you have highlighted the alternate design approach to take. I think we have 2 options on the plate:

  1. Keep decoder output conversions (filter graphs, sws calls, npp calls, etc.) within the device interface classes
  2. Define output conversions separately from the device interface classes. In this case the device interface convertAVFrameToFrameOutput() becomes responsible only for the trivial operation - wrapping an AVFrame into a Tensor without any conversions.

I am proposing to consider the 2nd direction, which gives a more modular structure for the project and simplifies the device interface to a set of trivial operations. I am trying to reduce the complexity of writing a device interface for non-CUDA GPUs by reusing as much as possible.

In the sense of the 2nd direction, this PR is the first step. As you saw, it moves ownership and management of the Filter Graph out of the device interface (to SingleStreamDecoder). If we take this direction further, I suggest considering:

  1. An AbstractFilterGraph to allow multiple implementations, i.e. understand under Filter Graph any conversion, not only via ffmpeg filters but also via sws or npp or other libraries.
  2. Implement SwsGraph and NppGraph, i.e. move these out of CpuDeviceInterface and CudaDeviceInterface
  3. The above steps leave the device interface supporting only the trivial wrapping/copy of an AVFrame into a Tensor, which is a backend-specific operation

Please share your thoughts on the above. If you want, we can start by going with the 1st direction (keep conversions in device interfaces) and continue weighing the 2nd approach. I think enabling encoders might contribute to its justification, as we will again consider filter operations, this time before encoding. If we decide to go with the 1st approach for now, then I will submit another PR focused on enabling 10-bit support in the CUDA device interface.
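The 2nd direction proposed above can be sketched as follows. AbstractFilterGraph, SwsGraph, and NppGraph are the names proposed in this comment, not existing TorchCodec classes, and the `convert()` bodies here only simulate format changes; the chaining mirrors the PR's 10-bit -> NV12 -> RGB pipeline.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Proposed common conversion interface: any conversion backend (ffmpeg
// filters, swscale, NPP, ...) behind one abstraction.
struct AbstractFilterGraph {
  virtual std::string convert(const std::string& inFormat) = 0;
  virtual ~AbstractFilterGraph() = default;
};

// Generic ffmpeg-filters backend, e.g. scale_cuda bringing 10-bit
// frames down to NV12 on the GPU.
struct FfmpegFilterGraph : AbstractFilterGraph {
  explicit FfmpegFilterGraph(std::string desc) : desc_(std::move(desc)) {}
  std::string convert(const std::string&) override { return "nv12"; }
  std::string desc_;  // e.g. "scale_cuda=format=nv12"
};

// NPP backend moved out of CudaDeviceInterface: NV12 -> RGB.
struct NppGraph : AbstractFilterGraph {
  std::string convert(const std::string&) override { return "rgb24"; }
};

// swscale backend moved out of CpuDeviceInterface: YUV -> RGB on CPU.
struct SwsGraph : AbstractFilterGraph {
  std::string convert(const std::string&) override { return "rgb24"; }
};

// The decoder chains whichever graphs are configured; the device
// interface is left with only the trivial AVFrame -> Tensor wrapping.
std::string runGraphs(
    const std::vector<std::unique_ptr<AbstractFilterGraph>>& graphs,
    std::string format) {
  for (const auto& g : graphs) {
    format = g->convert(format);
  }
  return format;
}
```

Under this sketch, the CUDA 10-bit path from the PR becomes a two-stage chain (FfmpegFilterGraph then NppGraph), while the CPU path is a single SwsGraph stage.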

@scotts (Contributor) replied:

@dvrogozh, thanks for explaining the direction you're going in! Based on where we are, I still prefer the first option: keeping all decoder output conversions within the device implementation.

I do think that in the future, we'll want better abstractions around filtergraph, swscale and NPP. That is, these are all ways to convert a raw decoded frame into our desired color space and potentially perform some native transforms. But what exact abstractions we should build is not obvious to me right now. In particular, this PR has us in an in-between state where some stuff remains in the device implementation, and some sits outside.

By pursuing option 1, I think we'll eventually get to a state where we have three device implementations: CPU and CUDA in this repo, and XPU in an Intel repo. At that point, it will be more clear to us what the patterns are and what abstractions to build.

For: meta-pytorch#776
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@dvrogozh changed the title from "Use cuda filters to support 10-bit videos" to "Abstract filters support" Sep 12, 2025
@dvrogozh (Contributor, Author) commented:

> By pursuing option 1, I think we'll eventually get to a state...

@scotts : I have submitted a PR to follow option 1. Please help review:

I will probably keep #853 open and rebase it on top of #899 to continue working on option 2.

@dvrogozh marked this pull request as a draft September 12, 2025 23:49

Reviewers

@scotts left review comments

@NicolasHug - awaiting requested review from NicolasHug


Labels

CLA Signed (this label is managed by the Meta Open Source bot)


2 participants

@dvrogozh @scotts
