
Add performance tips tutorial#1065


Draft

mollyxu wants to merge 7 commits into meta-pytorch:main from mollyxu:performance-tips-tutorial

Contributor

mollyxu commented Nov 20, 2025

Consolidate performance tips in docs

first draft of performance tips tutorial

304fdf9

meta-clabot added the CLA Signed label

Nov 20, 2025

modify format

5693776

meta-pytorch deleted a comment from meta-codesyncbot

Nov 20, 2025

mollyxu added 2 commits

November 20, 2025 14:18

Merge branch 'meta-pytorch:main' into performance-tips-tutorial

e8b2a73

Merge branch 'meta-pytorch:main' into performance-tips-tutorial

7ac0d2f

NicolasHug reviewed

Nov 21, 2025

Contributor

NicolasHug left a comment

Made a first pass, thanks @mollyxu, it looks great!


examples/decoding/performance_tips.py Outdated

Comment on lines 36 to 37

		# If you need to decode multiple frames at once, it is faster when using the batch methods. TorchCodec's batch APIs reduce overhead and can leverage
		# internal optimizations.

Contributor

Nit: here it might be useful to explicitly say that the batch methods are faster than the single-frame decoding methods, e.g. get_frames_at() is faster than calling get_frame_at() multiple times.
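To make the nit concrete, here is a minimal sketch (not taken from the PR) of the two call patterns being compared. The helper names `decode_one_by_one`, `decode_batched`, `timed`, and `demo` are hypothetical; only `VideoDecoder`, `get_frame_at()`, and `get_frames_at()` come from TorchCodec. A single timed run like this is only a rough indication, not a benchmark.

```python
import time


def decode_one_by_one(decoder, indices):
    """Single-frame pattern: one get_frame_at() call per index."""
    return [decoder.get_frame_at(i) for i in indices]


def decode_batched(decoder, indices):
    """Batch pattern: one get_frames_at() call for all indices."""
    return decoder.get_frames_at(indices=indices)


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def demo(path, indices=(0, 10, 20, 30)):
    """Hypothetical driver: time both patterns on a real video file."""
    # Lazy import so the helpers above work without torchcodec installed.
    from torchcodec.decoders import VideoDecoder

    decoder = VideoDecoder(path)
    _, t_single = timed(decode_one_by_one, decoder, list(indices))
    _, t_batch = timed(decode_batched, decoder, list(indices))
    print(f"one by one: {t_single:.4f}s, batched: {t_batch:.4f}s")
```

Calling `demo("file.mp4")` on a local video should show whether the batch call is faster on a given machine; the actual gap will depend on the video and the indices requested.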


examples/decoding/performance_tips.py Outdated

		# - If you care about exactness of frame seeking, use “exact”.
		# - If you can sacrifice exactness of seeking for speed, which is usually the case when doing clip sampling, use “approximate”.
		# - If your videos don’t have variable framerate and their metadata is correct, then “approximate” mode is a net win: it will be just as accurate as the “exact” mode while still being significantly faster.
		# - If your size is small enough and we’re decoding a lot of frames, there’s a chance exact mode is actually faster.

Contributor

Above:

This is a good description. I think we can be more nuanced about when to recommend approximate, e.g. we should try to clearly articulate the last 3 bullet points which are currently slightly overlapping and contradictory (we now know that approximate won't always be "a net win").

That's on me: I need to first have a clear understanding of why approximate mode is sometimes slower, and I'll need to update the approximate mode tutorial with more detailed recommendations.

I won't be able to do that in the next few days, so to unblock yourself I think you can just remove the claims about approximate being strictly superior (bullet points 2 and 3), and the more generic reco could be something like:

If the video is long and you're only decoding a small number of frames, approximate mode should be faster.

It's not super actionable for users but I hope the dedicated tutorial I'll edit will be more precise.
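Until more precise guidance exists, one option is to measure both modes on a representative video rather than assume either is faster. The sketch below is an illustration, not part of the PR: `time_seek_mode`, `compare_seek_modes`, and `demo` are hypothetical helpers, and decoder construction is included in the timing on the assumption that the seek mode can affect setup cost as well as decoding cost. Only `VideoDecoder`, its `seek_mode` argument, and `get_frames_at()` are TorchCodec names.

```python
import time


def time_seek_mode(make_decoder, seek_mode, indices):
    """Time decoder construction plus one batch decode for a seek mode."""
    start = time.perf_counter()
    decoder = make_decoder(seek_mode)
    decoder.get_frames_at(indices=indices)
    return time.perf_counter() - start


def compare_seek_modes(make_decoder, indices):
    """Return {seek_mode: elapsed_seconds} for "exact" and "approximate"."""
    return {
        mode: time_seek_mode(make_decoder, mode, indices)
        for mode in ("exact", "approximate")
    }


def demo(path, indices=(0, 100, 200)):
    """Hypothetical driver: compare both seek modes on a real video file."""
    # Lazy import so the helpers above work without torchcodec installed.
    from torchcodec.decoders import VideoDecoder

    timings = compare_seek_modes(
        lambda mode: VideoDecoder(path, seek_mode=mode), list(indices)
    )
    for mode, seconds in timings.items():
        print(f"{mode}: {seconds:.4f}s")
```

Passing a factory instead of a decoder keeps the two runs independent, since each mode gets a freshly constructed decoder.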


examples/decoding/performance_tips.py

		#
		# Performance impact: CUDA decoding can significantly outperform CPU decoding,
		# especially for high-resolution videos and when combined with GPU-based transforms.
		# Actual speedup varies by hardware, resolution, and codec.

Contributor

NicolasHug Nov 21, 2025

I think it's good to have those bullet points here. They overlap with what is already in the CUDA decoding tutorial, and I think we'll want to remove them from there and have them here instead.

Eventually we'll also want to update the CUDA tutorial to explain to users how to check whether they're falling back to the CPU.

Mainly here in this tutorial, I think we should insist on one thing (as the main point): users should be using the Beta interface with

with set_cuda_backend("beta"):
    dec = VideoDecoder("file.mp4", device="cuda")

address feedback

a74f653

Contributor Author

mollyxu commented Nov 21, 2025

Thanks for the feedback!

Dan-Flores reviewed

Nov 21, 2025


examples/decoding/performance_tips.py

		# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` for specific indices
		# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_in_range` for ranges
		# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_at` for timestamps
		# - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_in_range` for time ranges

Contributor

Dan-Flores Nov 21, 2025

Maybe it would be clearer to group the similar functions here?
For example, we could add two small headers to group index-based vs timestamp-based retrieval:

For index-based frame retrieval:

get_frames_at
get_frames_in_range

For timestamp-based frame retrieval:
...
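The proposed grouping can also be read as two families of calls. The sketch below is only an illustration of that split: `frames_by_index` and `frames_by_time` are hypothetical wrappers, not TorchCodec APIs, and they simply dispatch to the four batch methods listed in the snippet under review.

```python
def frames_by_index(decoder, indices=None, start=None, stop=None, step=1):
    """Index-based retrieval: explicit indices, or a [start, stop) range."""
    if indices is not None:
        return decoder.get_frames_at(indices=indices)
    if start is None or stop is None:
        raise ValueError("pass either indices, or both start and stop")
    return decoder.get_frames_in_range(start=start, stop=stop, step=step)


def frames_by_time(decoder, seconds=None, start_seconds=None, stop_seconds=None):
    """Timestamp-based retrieval: explicit timestamps, or a time range."""
    if seconds is not None:
        return decoder.get_frames_played_at(seconds=seconds)
    if start_seconds is None or stop_seconds is None:
        raise ValueError("pass either seconds, or both start_seconds and stop_seconds")
    return decoder.get_frames_played_in_range(
        start_seconds=start_seconds, stop_seconds=stop_seconds
    )
```

With a real decoder, `frames_by_index(decoder, start=0, stop=10)` and `frames_by_time(decoder, seconds=[0.5, 1.0])` would cover one method from each group.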

Contributor

Dan-Flores commented Nov 21, 2025

Let's update docs/source/index.rst so this tutorial appears on the main index.html page (similar to these changes)

Dan-Flores reviewed

Nov 21, 2025


examples/decoding/performance_tips.py

		#
		# - You need bit-exact results
		# - Small resolution videos and the PCI-e transfer latency is large
		# - GPU is already busy and CPU is idle

Contributor

Dan-Flores Nov 21, 2025

super nit: this is a personal writing style preference, but within a section let's consistently use either active or passive voice. For example, we could remove "you" from the first bullet point and use the passive voice instead: "bit-exact results are needed".