Abstract filters support #853


Draft
dvrogozh wants to merge 3 commits into meta-pytorch:main from dvrogozh:filters-cuda

Conversation

@dvrogozh (Contributor) commented Aug 26, 2025 (edited):

Changes:

  1. Added a DeviceInterface::initializeFiltersContext() API which returns device-specific FilterGraph initialization settings
  2. Enabled filter graphs on the SingleStreamDecoder level
  3. Switched the CPU device interface to use the above changes
  4. Enabled filter graphs in the CUDA device interface to handle non-NV12 decoder output (10/12-bit videos) through scale_cuda
    • The filter graph converts 10/12-bit videos to NV12 (as scale_cuda does not currently support conversion of YUV to RGB), and then the CUDA device interface converts NV12 to RGB (via the existing NPP path)

Basically, the idea behind this change is the following. Let the device interface perform trivial and performance-optimized conversions in the convertAVFrameToFrameOutput() method. If the device interface cannot handle a conversion in convertAVFrameToFrameOutput(), it can set up an ffmpeg filters pipeline by returning a valid description in initializeFiltersContext().
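The split described above can be sketched in a few lines. This is a minimal, self-contained model of the proposed flow, not the actual TorchCodec code: the placeholder structs and the `CudaLikeDevice` class are hypothetical stand-ins, and the filtergraph run is simulated by a format change. Only `DeviceInterface`, `initializeFiltersContext()`, `convertAVFrameToFrameOutput()`, and the `scale_cuda=format=nv12` description come from the PR itself.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins for the real AVFrame / FrameOutput types.
struct AVFramePlaceholder { std::string format; };
struct FrameOutputPlaceholder { std::string format; };

struct DeviceInterface {
  // Returns a filtergraph description when the device cannot convert the
  // frame directly, or std::nullopt when convertAVFrameToFrameOutput()
  // can handle the conversion on its own.
  virtual std::optional<std::string> initializeFiltersContext(
      const AVFramePlaceholder& frame) = 0;
  virtual void convertAVFrameToFrameOutput(
      const AVFramePlaceholder& in, FrameOutputPlaceholder& out) = 0;
  virtual ~DeviceInterface() = default;
};

// A CUDA-like device: NV12 goes straight through the NPP path; other
// formats (e.g. 10-bit) are first routed through scale_cuda down to NV12.
struct CudaLikeDevice : DeviceInterface {
  std::optional<std::string> initializeFiltersContext(
      const AVFramePlaceholder& frame) override {
    if (frame.format == "nv12") {
      return std::nullopt;  // trivial path, no filtergraph needed
    }
    return "scale_cuda=format=nv12";  // 10/12-bit -> NV12 before NPP
  }
  void convertAVFrameToFrameOutput(
      const AVFramePlaceholder&, FrameOutputPlaceholder& out) override {
    out.format = "rgb24";  // models the existing NPP NV12 -> RGB path
  }
};

// Decoder-side orchestration: ask the device for a filters context and
// run the filtergraph only when one is returned.
FrameOutputPlaceholder decodeConvert(DeviceInterface& dev,
                                     AVFramePlaceholder frame) {
  if (auto filters = dev.initializeFiltersContext(frame)) {
    // In the real code an FFmpeg filtergraph built from *filters would
    // run here; we simulate its effect on the frame format.
    frame.format = "nv12";
  }
  FrameOutputPlaceholder out;
  dev.convertAVFrameToFrameOutput(frame, out);
  return out;
}
```

The point of the sketch is that SingleStreamDecoder owns the orchestration while the device only answers "do I need a filtergraph, and which one".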

I tested the pipeline on 10-bit videos, h264 and h265. However, this setup should also be valid for 12-bit videos, which I did not try.

Note:

  1. On ffmpeg-n4.4 the filters pipeline falls back to CPU, as scale_cuda does not support format conversion in this ffmpeg version (it was added in n5.0)
  2. More work might be needed to align scale_cuda-converted outputs with the CPU implementation. I do see a difference in the outputs, likely due to different handling of color standards. This is something I was not able to overcome at the moment.
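Note 1 amounts to a version gate when building the CUDA filter description. The helper below is a hypothetical illustration (the function name and shape are not the real TorchCodec API); the only facts taken from the note are that scale_cuda's format conversion exists from FFmpeg n5.0 and that the pipeline falls back to CPU before that.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: decide whether a CUDA filter description can be
// used for 10/12-bit input, given the FFmpeg major version.
std::optional<std::string> cudaFilterDescFor10Bit(int ffmpegMajorVersion) {
  if (ffmpegMajorVersion < 5) {
    // ffmpeg-n4.4: scale_cuda cannot change pixel formats, so no CUDA
    // description is returned and the pipeline falls back to the CPU path.
    return std::nullopt;
  }
  // 10/12-bit YUV -> NV12 on the GPU; the existing NPP path then
  // performs NV12 -> RGB.
  return "scale_cuda=format=nv12";
}
```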

CC: @scotts @NicolasHug @eromomon

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) Aug 26, 2025
@dvrogozh (Contributor, Author) commented:

I forgot to mention that I did not yet change the cpu device interface in a way similar to cuda (initializeFiltersContext() is not implemented for cpu). This would be a next step if this design is accepted.

@dvrogozh (Contributor, Author) commented:

I've added one more commit to update the CPU device interface to use the filters path introduced in SingleStreamDecoder and DeviceInterface. Still a draft, as #831 needs to be reviewed first.

@dvrogozh marked this pull request as ready for review September 2, 2025 19:18
@dvrogozh (Contributor, Author) commented:

I've rebased the PR on top of the recently landed #831 prerequisite. The change is ready for review. @scotts @NicolasHug

@dvrogozh (Contributor, Author) commented:

I pushed an update to fix a linter issue which I had overlooked.

```
Error: Unable to download artifact(s): Artifact not found for name: pytorch_torchcodec__3.10_cu129_x86_64
```

This error in CI test seems unrelated to the change.

@dvrogozh (Contributor, Author) commented:

Pushed an update to address another lint issue in the last commit.

@scotts (Contributor) commented on this diff excerpt:

```cpp
        avFrame->width,
        "x",
        avFrame->height);
}
```

I'm not sure what we're enabling by exposing DeviceInterface::initializeFiltersContext() and then calling it here in SingleStreamDecoder. That is, if DeviceInterface just didn't expose the concept of filter graphs at all, each device implementation could still decide to use a filter graph on its own.

We already have the ability to ask a device to do device-specific frame conversions with DeviceInterface::convertAVFrameToFrameOutput(). Pulling filter graph capabilities out of that, and then forcing them to be orchestrated at the level of SingleStreamDecoder does not, as far as I can tell, buy us any new capabilities. But it does come at a cost, because CpuDeviceInterface::convertAVFrameToFrameOutput() is now (to me) quite strange: it does swscale-based conversion and filter graph cleanup.

The logic in this PR is effectively:

```cpp
if (stream.kind == AUDIO) {
  convertAudio(inFrame, outFrame);
} else if (stream.kind == VIDEO && device != nullptr) {
  filterGraph = device->getFilterGraph();
  if (filterGraph != nullptr) {
    inFrame = filterGraph->convert(inFrame);
  }
  device->convertVideo(inFrame, outFrame);
}
return outFrame;
```

I don't see how the above buys us anything over:

```cpp
if (stream.kind == AUDIO) {
  convertAudio(inFrame, outFrame);
} else if (stream.kind == VIDEO && device != nullptr) {
  // device is free to use filter graph if it wants; it's
  // entirely an implementation detail for the device
  device->convertVideo(inFrame, outFrame);
}
return outFrame;
```

I do understand that in the future, we'll likely want to tell a device "Hey, here are the particular filters we want you to apply." But I don't think these interfaces actually buy us that capability. I think we can pass that just as easily through the VideoStreamOptions we currently pass to DeviceInterface::convertAVFrameToFrameOutput().
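The suggestion above can be sketched concretely. VideoStreamOptions is a real TorchCodec type, but the `filters` field and the `planConversion()` helper below are hypothetical, added only to show how a requested filter description could ride along in the options the device already receives, keeping the filtergraph an implementation detail of the device.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Minimal stand-in for VideoStreamOptions; the `filters` field is a
// hypothetical addition, not an existing member.
struct VideoStreamOptions {
  std::optional<std::string> filters;  // e.g. "scale_cuda=format=nv12"
};

// Hypothetical device-internal planning: the device alone decides
// whether a filtergraph stage precedes its native conversion, so
// SingleStreamDecoder never needs to know about filter graphs.
std::string planConversion(const VideoStreamOptions& opts) {
  if (opts.filters.has_value()) {
    // Filtergraph stage requested via options, then the native path.
    return *opts.filters + " -> npp(nv12->rgb)";
  }
  // Trivial case: native NPP conversion only.
  return "npp(nv12->rgb)";
}
```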

@dvrogozh (Contributor, Author) replied:

This PR moves ownership and management of the filter graph to SingleStreamDecoder. Otherwise this would be done in each device interface separately, while doing it in SingleStreamDecoder makes all devices behave the same and shares the code across them. There are further considerations around that, which I highlight in #853 (comment).

@scotts (Contributor) commented:

@dvrogozh, thank you for this work! I left a detailed comment where I'm trying to understand the value we're getting out of the new abstractions - please help me understand that better. At a higher level, I wonder if we could take the following approach:

  1. DeviceInterface is left unchanged; we don't add the concept of filter graph initialization to it.
  2. SingleStreamDecoder is left unchanged.
  3. CpuDeviceInterface is left unchanged.
  4. CudaDeviceInterface is changed to add the new Cuda-specific filter graph implementation.

I'm sorry if that's the direction you were already going in - I know I pushed to see what the full generalization would look like. One quirk of the approach above is that each device may end up implementing the same patterns for their filter graph implementation. In that case, we could potentially add some member functions to DeviceInterface, but they would be private. They wouldn't be called by SingleStreamDecoder, but would just be guides for how best to implement filter graph support for a device.

@dvrogozh (Contributor, Author) commented:

@scotts : you have highlighted the alternate design approach to take. I think we have 2 options on the plate:

  1. Keep decoder output conversions (filter graphs, sws calls, npp calls, etc.) within the device interface classes
  2. Define output conversions separately from the device interface classes. In this case the device interface convertAVFrameToFrameOutput() becomes responsible only for the trivial operation - wrapping an AVFrame into a Tensor without any conversions.

I am proposing to consider the 2nd direction, which gives a more modular structure for the project and simplifies the device interface to a set of trivial operations. I am trying to reduce the complexity of writing a device interface for non-CUDA GPUs by reusing as much as possible.

In the sense of the 2nd direction, this PR is the first step. As you saw, it moves ownership and management of the Filter Graph out of the device interface (to SingleStreamDecoder). If we take this direction further, I suggest considering:

  1. An AbstractFilterGraph to allow multiple implementations, i.e. understand under Filter Graph any conversion, not only via ffmpeg filters but also via sws or npp or other libraries.
  2. Implement SwsGraph and NppGraph, i.e. move these out of CpuDeviceInterface and CudaDeviceInterface
  3. The above steps leave the device interface supporting only the trivial wrapping/copy of an AVFrame into a Tensor, which is a backend-specific operation

Please share your thoughts on the above. If you want, we can start by going with the 1st direction (keep conversions in device interfaces) and continue weighing the 2nd approach. I think enabling encoders might contribute to its justification, as we will again consider filter operations, this time before encoding. If we decide to go with the 1st approach for now, then I will submit another PR focused on enabling 10-bit support in the CUDA device interface.
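The 2nd direction proposed above can be sketched as follows. AbstractFilterGraph, SwsGraph, and NppGraph are the names proposed in this comment, not existing TorchCodec classes, and the `convert()` bodies here only simulate format changes; the chaining mirrors the PR's 10-bit -> NV12 -> RGB pipeline.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Proposed common conversion interface: any conversion backend (ffmpeg
// filters, swscale, NPP, ...) behind one abstraction.
struct AbstractFilterGraph {
  virtual std::string convert(const std::string& inFormat) = 0;
  virtual ~AbstractFilterGraph() = default;
};

// Generic ffmpeg-filters backend, e.g. scale_cuda bringing 10-bit
// frames down to NV12 on the GPU.
struct FfmpegFilterGraph : AbstractFilterGraph {
  explicit FfmpegFilterGraph(std::string desc) : desc_(std::move(desc)) {}
  std::string convert(const std::string&) override { return "nv12"; }
  std::string desc_;  // e.g. "scale_cuda=format=nv12"
};

// NPP backend moved out of CudaDeviceInterface: NV12 -> RGB.
struct NppGraph : AbstractFilterGraph {
  std::string convert(const std::string&) override { return "rgb24"; }
};

// swscale backend moved out of CpuDeviceInterface: YUV -> RGB on CPU.
struct SwsGraph : AbstractFilterGraph {
  std::string convert(const std::string&) override { return "rgb24"; }
};

// The decoder chains whichever graphs are configured; the device
// interface is left with only the trivial AVFrame -> Tensor wrapping.
std::string runGraphs(
    const std::vector<std::unique_ptr<AbstractFilterGraph>>& graphs,
    std::string format) {
  for (const auto& g : graphs) {
    format = g->convert(format);
  }
  return format;
}
```

Under this sketch, the CUDA 10-bit path from the PR becomes a two-stage chain (FfmpegFilterGraph then NppGraph), while the CPU path is a single SwsGraph stage.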

@scotts (Contributor) replied:

@dvrogozh, thanks for explaining the direction you're going in! Based on where we are, I still prefer the first option: keeping all decoder output conversions within the device implementation.

I do think that in the future, we'll want better abstractions around filtergraph, swscale and NPP. That is, these are all ways to convert a raw decoded frame into our desired color space and potentially perform some native transforms. But what exact abstractions we should build is not obvious to me right now. In particular, this PR has us in an in-between state where some stuff remains in the device implementation, and some sits outside.

By pursuing option 1, I think we'll eventually get to a state where we have three device implementations: CPU and CUDA in this repo, and XPU in an Intel repo. At that point, it will be more clear to us what the patterns are and what abstractions to build.

For: meta-pytorch#776
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@dvrogozh changed the title from "Use cuda filters to support 10-bit videos" to "Abstract filters support" Sep 12, 2025
@dvrogozh (Contributor, Author) commented:

> By pursuing option 1, I think we'll eventually get to a state...

@scotts : I have submitted a PR to follow option 1. Please help review:

I will probably keep #853 open and rebase it on top of #899 to continue working on option 2.

@dvrogozh marked this pull request as a draft September 12, 2025 23:49

Reviewers

@scotts left review comments

@NicolasHug - awaiting requested review from NicolasHug


Labels

CLA Signed (this label is managed by the Meta Open Source bot)


2 participants

@dvrogozh @scotts
