🚀 The feature
Torchvision's `read_image` currently decodes JPEG images at full resolution. However, both `libjpeg` and `libjpeg-turbo` support decoding at lower resolutions (1/2, 1/4, or 1/8 of the original size). Introducing a `size_hint` parameter would allow users to specify an approximate target size, with torchvision selecting the closest larger available scale factor and downscaling the JPEG image during decoding.
Example usage:

```python
from torchvision.io.image import decode_image

tensor = decode_image("image.jpeg", size_hint=(224, 224))
```
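To illustrate the selection logic described above, here is a minimal sketch: given the original dimensions and a `size_hint`, pick the largest `libjpeg` downscale denominator in {1, 2, 4, 8} whose output is still at least as large as the hint in both dimensions. The helper name `pick_jpeg_scale` is hypothetical, not part of the proposed patch.

```python
def pick_jpeg_scale(orig_size, size_hint):
    """Pick the largest libjpeg/libjpeg-turbo downscale denominator
    (1, 2, 4, or 8) whose decoded output is still at least as large
    as the requested hint in both dimensions.

    Hypothetical helper; the actual torchvision patch may differ.
    """
    ow, oh = orig_size
    hw, hh = size_hint
    chosen = 1
    for d in (2, 4, 8):
        if ow // d >= hw and oh // d >= hh:
            chosen = d
        else:
            break
    return chosen

# A 1920x1080 image decoded with a (960, 540) hint can use exactly 1/2 scale:
print(pick_jpeg_scale((1920, 1080), (960, 540)))  # 2
# With a (224, 224) hint, 1/8 would give 240x135 (too small), so 1/4 is chosen:
print(pick_jpeg_scale((1920, 1080), (224, 224)))  # 4
```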
Motivation, pitch
- Many ML pipelines process images at fixed sizes (e.g., 224x224 for ImageNet models). Decoding large images only to downscale them later is inefficient.
- This also improves memory usage, since the full-sized image never needs to be held in memory.
- Pillow provides a similar feature via `Image.draft`, allowing for approximate size-based decoding.
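For reference, Pillow's `Image.draft` works like this (a sketch using an in-memory JPEG as a stand-in for a real file):

```python
from io import BytesIO
from PIL import Image

# Create a 1920x1080 JPEG in memory as a stand-in for a real file.
buf = BytesIO()
Image.new("RGB", (1920, 1080), "blue").save(buf, format="JPEG")
buf.seek(0)

img = Image.open(buf)
# Ask the decoder for roughly 224x224. JPEG decoding only supports
# 1/1, 1/2, 1/4, and 1/8 scales, so Pillow picks the closest scale
# that keeps both dimensions at least as large as requested.
img.draft("RGB", (224, 224))
img.load()
print(img.size)  # (480, 270), i.e. 1/4 of the original
```

Note that `draft` is approximate: the caller still resizes to the exact target afterwards, but the decode itself is much cheaper.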
Alternatives
- Use Pillow for decoding with downscaling, but torchvision's native decoder is typically faster than decoding with Pillow and then converting to a tensor.
- Decode at full resolution and then resize, but this is inefficient; see the benchmark below.
Additional context
Benchmark
We implemented a proof of concept and ran performance tests decoding a 1920x1080 image to 960x540.
We compared the following:
- Use the existing `decode_jpeg` and resize afterwards.
- Patch `decode_jpeg` to allow `libjpeg`/`libjpeg-turbo` downscaling via the `size_hint` parameter.
Benchmark results (1000 iters):

```
9.91s call .../test_jpeg.py::test_torchvision_image_load_with_resize_960_540
4.00s call .../test_jpeg.py::test_fastjpeg_image_load_with_size_hint_960_540
```

~2.5x speedup.
I'm happy to contribute a patch if people consider this useful.