Decoder metadata interface #2672


Draft · 197g wants to merge 10 commits into main from decoder-metadata-interface

Conversation

@197g (Member) commented Nov 26, 2025 (edited)

See #2245, the intended ImageDecoder changes.

This changes the ImageDecoder trait to fix some underlying issues. The main change is a clarification of responsibilities: the trait is an interface from an implementor towards the image library. That is, the protocol established by its interface should allow us to drive the decoder into our buffers and our metadata. It is not optimized for use by an external caller, which should prefer ImageReader and other inherent methods instead.

This is a work in progress; the list below motivates the changes and discusses open points.

  • ImageDecoder::peek_layout encourages decoders to read headers after the constructor. This fixes the inherent problem we had with communicating limits. The sequence for internal use is roughly:

```rust
let mut decoder = …;
decoder.set_limits();
// Other global configuration we have?
{
    // Potentially multiple times:
    let layout_info = decoder.peek_layout()?;
    let mut buffer = allocate_for(&layout_info);
    decoder.read_image(&mut buffer)?;
    decoder.xmp_metadata()?; // and other meta
}
// … for sequences, start again from `peek_layout()`
```
  • ImageDecoder::read_image(&mut self) no longer consumes self. We no longer need the additional boxed method and its trait workaround; the trait is now dyn-compatible.
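To make the dyn-compatibility point concrete, here is a minimal sketch of why taking `&mut self` instead of `self` lets a boxed decoder drive the peek/allocate/read sequence. The trait, types, and impl here are illustrative stand-ins, not the actual `image` API.

```rust
// Toy decoder trait: `read_image(&mut self)` needs no `Self: Sized` bound,
// so `Box<dyn SimpleDecoder>` works without a separate `boxed` method.
struct Layout {
    width: u32,
    height: u32,
    bytes_per_pixel: u32,
}

impl Layout {
    fn total_bytes(&self) -> usize {
        (self.width * self.height * self.bytes_per_pixel) as usize
    }
}

trait SimpleDecoder {
    fn peek_layout(&mut self) -> Layout;
    fn read_image(&mut self, buf: &mut [u8]);
}

// A fake decoder that produces a 2x2 single-channel image of one gray value.
struct SolidGray(u8);

impl SimpleDecoder for SolidGray {
    fn peek_layout(&mut self) -> Layout {
        Layout { width: 2, height: 2, bytes_per_pixel: 1 }
    }
    fn read_image(&mut self, buf: &mut [u8]) {
        buf.fill(self.0);
    }
}

// The driving sequence from the PR description, against a type-erased decoder:
// peek the layout, allocate *our* buffer, then let the decoder fill it.
fn decode_boxed(mut decoder: Box<dyn SimpleDecoder>) -> Vec<u8> {
    let layout = decoder.peek_layout();
    let mut buffer = vec![0u8; layout.total_bytes()];
    decoder.read_image(&mut buffer);
    buffer
}

fn main() {
    let pixels = decode_boxed(Box::new(SolidGray(7)));
    assert_eq!(pixels, vec![7u8; 4]);
}
```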

Discussion

  • Implement next_layout more consistently
    • avif
    • tga
    • pnm
    • tiff
    • dxt
    • qoi
    • dds
    • gif
  • Maybe init/peek_layout should return the full layout information in a single struct. We have a similar open issue for png in its own crate, and the related work for tiff is in the pipeline, where its BufferLayoutPreference already exists to be extended with said information.
    • Review limits and remove its size bounds insofar as they can be checked against the communicated bounds in the metadata step on the image side.
    • Idea: if a decoder supports built-in transforms (e.g. YCbCr → Rgb conversion, grayscale, thumbnailing) that are more efficient than post-processing, then there could be a negotiation phase here where information is polled twice or more by different methods. The design should leave this negative space to be added in 1.1, but it's not highly critical.
  • Fix the sequence decoder to use the new API.
  • Make sure that read_image is 'destructive' in all decoders, i.e. re-reading an image, or reading an image before init, should never access an incorrect part of the underlying stream but instead return an error. This affects pnm and qoi, for instance, where the read interprets bytes based on the dimensions and color, which would be invalid before reading the header and only valid for one read.
  • Tests for reading an image with read_image, then switching to a sequence reader. But that is supposed to become mainly an adapter that implements the iterator protocol.
  • Remove remnants of the dyn-compatibility issue.
  • Adapt to the possibility of fetching metadata after the image. This includes changing ImageReader with a new interface to return some of it. That may be better suited for a separate PR, though.
    • Extract the CICP part of the metadata as CicpRgb and apply it to a decoded DynamicImage.
    • Ensure that this is supported by all the bindings.
  • Deal with limits: Decoder metadata interface #2672 (comment)
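The 'destructive read' rule in the list above can be sketched as a small state machine: a read before the header is parsed, or a second read of the same image, must return an error instead of reinterpreting bytes at the wrong stream position. Everything below is illustrative, not the `image` API.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
    HeaderNotRead,
    AlreadyRead,
}

enum State {
    Fresh,
    HeaderParsed,
    ImageConsumed,
}

struct StatefulDecoder {
    state: State,
}

impl StatefulDecoder {
    fn new() -> Self {
        StatefulDecoder { state: State::Fresh }
    }

    fn peek_layout(&mut self) -> (u32, u32) {
        self.state = State::HeaderParsed;
        (4, 4) // pretend we parsed width/height from the header
    }

    fn read_image(&mut self, buf: &mut [u8]) -> Result<(), DecodeError> {
        match self.state {
            // pnm/qoi would interpret bytes based on dimensions that are not
            // known yet -- reject instead of reading garbage.
            State::Fresh => Err(DecodeError::HeaderNotRead),
            // After one read the stream position no longer matches the image
            // data -- a second read must not silently return wrong pixels.
            State::ImageConsumed => Err(DecodeError::AlreadyRead),
            State::HeaderParsed => {
                buf.fill(1);
                self.state = State::ImageConsumed;
                Ok(())
            }
        }
    }
}

fn main() {
    let mut d = StatefulDecoder::new();
    let mut buf = [0u8; 16];
    assert_eq!(d.read_image(&mut buf), Err(DecodeError::HeaderNotRead));
    let _ = d.peek_layout();
    assert_eq!(d.read_image(&mut buf), Ok(()));
    assert_eq!(d.read_image(&mut buf), Err(DecodeError::AlreadyRead));
}
```

For sequences, `peek_layout` would transition the state back to `HeaderParsed` for the next frame, matching the "start again from `peek_layout()`" flow.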

@mstoeckl (Contributor) commented Nov 27, 2025 (edited)

The main change is a clarification to the responsibilities; the trait is an interface from an implementor towards the image library. That is, the protocol established from its interface should allow us to drive the decoder into our buffers and our metadata. It is not optimized to be used by an external caller which should prefer the use of ImageReader and other inherent methods instead.

With this framing, I think Limits::max_image_width and Limits::max_image_height no longer need to be communicated to or handled by the ImageDecoder trait, because the external code can check ImageDecoder::dimensions() before invoking ImageDecoder::read_image(); only the memory limit (Limits::max_alloc) is essential. That being said, the current way Limits are handled by ImageDecoder isn't that awkward to implement, so to reduce migration costs keeping the current ImageDecoder::set_limits() API may be OK.
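The point above — that width/height limits can move out of the decoder entirely — amounts to a caller-side check against the reported dimensions before `read_image` is invoked. A minimal sketch, with a hypothetical `Limits` stand-in rather than the actual `image::Limits`:

```rust
// Hypothetical caller-side dimension check: only the allocation limit would
// remain the decoder's responsibility.
struct Limits {
    max_image_width: u32,
    max_image_height: u32,
}

fn check_dimensions(limits: &Limits, dims: (u32, u32)) -> Result<(), String> {
    let (w, h) = dims;
    if w > limits.max_image_width || h > limits.max_image_height {
        return Err(format!("{}x{} exceeds configured limits", w, h));
    }
    Ok(())
}

fn main() {
    let limits = Limits { max_image_width: 1024, max_image_height: 1024 };
    // The caller inspects dimensions() before ever calling read_image():
    assert!(check_dimensions(&limits, (800, 600)).is_ok());
    assert!(check_dimensions(&limits, (4096, 600)).is_err());
}
```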


@fintelia (Contributor) commented:

A couple thoughts...

I do like the idea of handling animation decoding with this same trait. To understand: are you thinking of "sequences" as being animations, or also stuff like the multiple images stored in a TIFF file? Even just handling animation has some tricky cases, though. For instance in PNG, the default image that you get if you treat the image as non-animated may be different from the first frame of the animation. We might need both a read_image and a read_frame method.

The addition of an init method doesn't seem like it gains us much. The tricky part of our current new + set_limits API is that you get to look at the image dimensions and total output size in bytes when deciding what decoding limits to set. Requiring init (and by extension set_limits) to be called before reading the dimensions makes it basically the same as just having a with_limits constructor.

@197g (Member, Author) commented:

Requiring init (and by extension set_limits) to be called before reading the dimensions makes it basically the same as just having a with_limits constructor.

It's a dyn-compatible way that achieves the goal of the constructor so it is actually an abstraction.

The tricky part of our current new+set_limits API is that you get to look at the image dimensions and total output size in bytes when deciding what decoding limits to set.

What do you mean by this? The main problem in png that I'm aware of is the lack of configured limits for reading the header in the ImageReader path, which motivated the extra constructor in the first place. With png we cannot modify the limits after the fact, but we also don't really perform any large size-dependent allocation within the decoder.

I'm also not suggesting that calling set_limits after the layout inspection would be disallowed, but whether that 'frees' additional capacity is obviously decoder-dependent. I guess whether that is sufficient remains to be seen? When we allocate a buffer (with applied allocator limits), that allows forwarding the remaining buffer size to the decoder. Or, set aside a different buffer allowance for metadata vs. image data. Whatever change is necessary in png just comes on top anyways; the init flow just allows us to abstract this and thus apply it with an existing Box<dyn ImageDecoder>, so we don't have to do it all before. Indeed, as the comment on size alludes to, we may want two different limit structs: one user-facing that we use in ImageReader, and one binding-facing that is passed to ImageDecoder::set_limits. Then settings just need to be

@197g (Member, Author) commented Dec 4, 2025

@fintelia This now includes the other changes, including to ImageReader, as a draft of what I meant in #2679 (comment). In short:

  • The file-guessing routines and the construction of the boxed decoder are split into a separate type, ImageFile, which provides the previous methods of ImageReader but also an into_reader for the mutable, stateful interface.
  • ImageReader:
    • features all accessors for metadata, considering that some formats fill said metadata (or a pointer to it) after an image is decoded.
    • has viewbox as a re-imagining of the previous ImageDecoderRect trait, but split into two responsibilities: the trait makes the efficiency decision on an image-by-image basis with an interface that allows a partial application of the viewbox (in jpeg and tiff we would decode whole tiles); then the reader takes care of translating that into an exact layout. Note that another type of image buffer with offset + row-pitch information could do that adjustment zero-copy; I still want to get those benefits of the type-erased buffer/image-canvas someday, and this fits in.
  • The code also retrieves the CICP from the color profile and annotates the DynamicImage with it where available. For sanity's sake the moxcms integration was rewritten to allow a smaller dependency to be used here; I'll split these off the PR if we decide to go that route.
  • Conceivably there's a gain_map (or similar) that may be queried similarly to the metadata methods. For that to be more ergonomic I'd like to seriously consider read_plane for, in tiff lingo, planar images as well as associated and non-associated mask data; and, more speculatively, other extra samples that are bump maps? uv? true cmyk? While that does not necessarily all go into 1.* for any output that is not quite as neatly statically sorted and sized as an Rgba 4-channel-homogeneous-host-order, I imagine it will be much simpler for a decoder to provide its data successively in multiple calls instead of one contiguous large byte slice. Similar to viewbox, we'd allow this where ImageReader provides the compatibility to re-layout the image for the actual user, except where explicitly instructed. Adjusting ImageReader::decode to that effect should be no problem in principle.

@RunDevelopment (Contributor) commented:

I can't speak about image metadata, but I really don't like the new ImageDecoder interface, as both an implementor of the interface and a potential user of it. Right now it's just not clear to me at all how decoders should behave. My problems are:

  1. init. This is just two-phase initialization and opens so many questions.
    • Do users have to call it? The docs say "should", not "must", be called before read_image.
    • Are users allowed to call it multiple times? If so, the decoder has to keep track of whether the header has already been read.
    • Since init returns a layout, what's the point of dimensions() and color_type()? And what if they disagree?
    • What should dimensions and co. do before init is called?
    • If init fails, what should happen to methods like dimensions and read_image? When called, should they panic, return an error, return default values?
    • After calling read_image, do you have to re-init before calling read_image again?
  2. viewbox makes it more difficult to implement decoders.
    • Now they always have to internally keep track of the viewbox rect if rect decoding is supported.
    • After calling viewbox, what should dimensions be? If they should be the viewbox size, should they reflect the new viewbox even before calling init?
    • It's not clear what should happen if viewbox returns ok, but init errors.
    • What should happen if users supply a viewbox outside the bounds of the image?
  3. When calling viewbox, is the offset of the rect relative to the (0,0) of the full image or the last set viewbox?
  4. What should happen if read_image is called twice? Should it read the same image again, error, or read the next image in the sequence? The docs don't say.
    • If the intended behavior is "read the same image again", then those semantics would force all decoders to require Seek for the reader (or keep an in-memory copy of the image for subsequent reads). Not an unreasonable requirement, but it should be explicitly documented.

Regarding rectangle decoding, I think it would be better if we force decoders to support arbitrary rects. That's because the current interface is actually less efficient by allowing decoders to support only certain rects. To read a specific rect that is not supported as-is, ImageReader has to read a too-large rect and then crop the read image, allocating the memory for the too-large image only to throw it away. It is forced to do this because of the API.

However, most image formats are based on lines of blocks (macro pixels). So we can do a trick: decode a line according to the too-large rect, then copy only the pixels in the real rect to the output buffer. This reduces the memory overhead for unsupported rects from O(width*height) to O(width*block_height). Supported rects don't need this dance and can decode into the output buffer directly. That's roughly what DDS does.
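The line-based trick can be sketched as a generic helper: decode the too-large rect one row-band at a time into a scratch buffer of O(width * band_height), and copy only the requested pixels into the output. All names here are illustrative (1 byte per pixel, rect relative to the decoded area), not a proposed API.

```rust
/// Decode a `decoded_width` x `decoded_height` area band by band and keep
/// only `rect` = (x, y, w, h). `decode_band(y0, buf)` is a hypothetical
/// callback that fills `buf` with the rows starting at `y0`.
fn crop_bandwise(
    decoded_width: usize,
    decoded_height: usize,
    band_height: usize,
    rect: (usize, usize, usize, usize),
    mut decode_band: impl FnMut(usize, &mut [u8]),
) -> Vec<u8> {
    let (rx, ry, rw, rh) = rect;
    let mut out = vec![0u8; rw * rh];
    // Scratch is O(width * band_height), not O(width * height).
    let mut scratch = vec![0u8; decoded_width * band_height];
    let mut y0 = 0;
    while y0 < decoded_height {
        let rows = band_height.min(decoded_height - y0);
        decode_band(y0, &mut scratch[..decoded_width * rows]);
        for row in 0..rows {
            let y = y0 + row;
            if y < ry || y >= ry + rh {
                continue; // row outside the requested rect: discard it
            }
            let src = &scratch[row * decoded_width + rx..][..rw];
            out[(y - ry) * rw..][..rw].copy_from_slice(src);
        }
        y0 += rows;
    }
    out
}

fn main() {
    // A fake 4x4 image whose pixel value at (x, y) is 10*y + x.
    let decode = |y0: usize, band: &mut [u8]| {
        let rows = band.len() / 4;
        for r in 0..rows {
            for x in 0..4 {
                band[r * 4 + x] = (10 * (y0 + r) + x) as u8;
            }
        }
    };
    // Extract the 2x2 rect at (1, 1) with a band height of 2.
    let out = crop_bandwise(4, 4, 2, (1, 1, 2, 2), decode);
    assert_eq!(out, vec![11, 12, 21, 22]);
}
```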

And if a format can't do the line-based trick for unsupported rects, then decoders should just allocate a temp buffer for the too-large rect and then crop (= copy what is needed). This is still just as efficient as the best ImageReader can do.

For use cases where users can use row-pitch to ignore the excess parts of the too-large rect, we could just have a method that gives back a preferred rect, which can be decoded very efficiently.

So the API could look like this:

```rust
trait ImageDecoder {
    // ...

    /// Returns a viewbox that contains all pixels of the given rect but can
    /// potentially be decoded more efficiently. If rect decoding is not
    /// supported or no more-efficient rect exists, the given rect is
    /// returned as is.
    fn preferred_viewbox(&self, viewbox: Rect) -> Rect {
        viewbox // default impl
    }

    fn read_image_rect(&mut self, buf: &mut [u8], viewbox: Rect) -> ImageResult<()> {
        Err(ImageError::Decoding(Decoding::RectDecodingNotSupported)) // or similar
    }
}
```

This API should make rect decoding easier to use, easier to implement, and allow for more efficient implementations.

@197g force-pushed the decoder-metadata-interface branch from 86c9194 to cdc0363 on December 7, 2025 at 18:22
@197g (Member, Author) commented Dec 7, 2025

  1. init. This is just two-phase initialization and opens so many questions.

That was one of the open questions; the argument you're presenting makes it clear it should return the layout and that's it. Renamed to next_layout accordingly. I'd like to remove the existing dimensions()/color_type methods from the trait as well. There's no point using separate method calls for communicating them.


  • For use cases where users can use rowpitch, […]

    That is ultimately the crux of the problem. I'd say it's pretty much the only problem, even though that does not appreciate the complexity. A lot of what you put forth is overly specific to solving one instance of it, obviously focusing on DDS. That's not bad, but take a step back to the larger picture. There's no good way to communicate all kinds of layouts that the caller could handle: tiled, planar, depths, sample types, …. With the information being exchanged right now, no one can find a best match between the requirements of image's data types (and Limits) and what the decoder can provide. This won't be solved by moving complexity into the decoders; we need to get structured information out of them primarily, then make that decision and handle the resulting byte data in image's code.

  • viewbox makes it more difficult to implement decoders.

    The point of the default implementation in this PR is that it is purely opt-in. Don't implement the method for decoders that cannot provide viewbox decoding and everything works correctly. The documentation seems to be confusing; point taken. We're always going to have inefficiencies; I'm for working through the distinct alternative layouts that allow an optimization one by one. More important for this PR immediately is what outcome a caller may want and what interface would give it to them; in this case I've worked on the use case of extracting part of an atlas.

  • However, most image formats are based on lines of blocks (macro pixels). So we can do a trick.

    I'm not designing anything in this interface around a singular "trick"; that's the wrong way around. That is how we got here. That's precisely what created ImageDecoderRect, almost to the dot. Falsehoods programmers assume about image decoding will lead that to break down and be horrible to maintain. The trick you mention should live in the decoder's trait impl and nowhere else, and we can bring it back where appropriate and possible. (Note that if you do it for a specific format, some formats will be even more efficient and not require you to decode anything line by line but to skip ahead, do tiles, …. That's just to drive home the point that you do not want to do this above the decoder abstraction but below it, in the ImageDecoder impl.)

  • It is forced to do this, because of the API.

    The decoder impls are only forced to do anything if we force it via an interface; this PR does not. read_image_rect(&mut self, buf, viewbox) does force a decoder to be able to handle all possible viewboxes; this PR does not. I'm definitely taking worse short-term efficiency over code maintenance problems; the latter won't get us efficiency in the long run either.


When calling viewbox, is the offset of the rect relative to the (0,0) of the full image or the last set viewbox?

It's supposed to be relative to the full image. Yeah, that needs more documentation and pointers to the proper implementation.


```rust
self.limits.check_dimensions(layout.width, layout.height)?;
// Check that we do not allocate a bigger buffer than we are allowed to
// FIXME: should this rather go in `DynamicImage::from_decoder` somehow?
self.limits.reserve(layout.total_bytes())?;
```
Contributor commented:

This has a subtle behavior change. Previously we reserved the space for the output buffer and then passed the reduced memory limit on to the decoder to limit its metadata/scratch buffer allocations. But at this point in the code we've already told the decoder that it is free to use the entire memory limit for those purposes. So we are effectively only limiting memory to 2x the requested amount.
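The accounting issue can be illustrated with a toy budget: if the output buffer is reserved against our budget first, the decoder only ever sees the remainder; handing the decoder the full limit before reserving allows roughly double the intended usage. `Limits` here is a hypothetical stand-in, not the actual `image::Limits`.

```rust
// Toy allocation budget demonstrating why reservation order matters.
#[derive(Clone)]
struct Limits {
    max_alloc: u64,
}

impl Limits {
    fn reserve(&mut self, bytes: u64) -> Result<(), &'static str> {
        if bytes > self.max_alloc {
            return Err("allocation limit exceeded");
        }
        self.max_alloc -= bytes;
        Ok(())
    }
}

fn main() {
    let mut limits = Limits { max_alloc: 100 };

    // Correct order: reserve the output buffer against our budget first...
    let output_bytes = 60;
    limits.reserve(output_bytes).unwrap();
    // ...then the decoder only receives the reduced remainder for scratch space.
    let decoder_budget = limits.clone();
    assert_eq!(decoder_budget.max_alloc, 40);

    // With the order reversed, the decoder could have spent the full 100 on
    // scratch buffers while we still allocate 60 on top: ~2x the limit.
}
```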

@197g (Member, Author) commented Dec 8, 2025 (edited):

That's true. The quicker fix here is to pass limits only once we have gotten the layout, which would be closest to what we're doing right now on the main branch, but obviously does not really address whether decoders are able to handle it. In gif we duplicate the allocation arbitrarily right now, too:

image/src/codecs/gif.rs, lines 172 to 174 in 7f44147:

```rust
self.limits.reserve_usize(buffer_size)?;
let mut frame_buffer = vec![0; buffer_size];
self.limits.free_usize(buffer_size);
```

We must rework the limits system regardless, if it is to work with animations too. That is to say, at some point the struct will have been moved into the decoder, but some higher-level part of IO is still going to do allocations; and also de-allocations (!). Should those count? How do we access the limits if so?

For consuming animations you'd typically have a swapchain-like fixed-capacity queue setup. The decoder would consume some of the limits while allocating each new frame; then those limits should get refreshed over time as the frames are fully consumed and dropped. We can't fully handle this either right now (as seen in gif).
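The swapchain-like pattern can be sketched as a bounded queue where dropping a consumed frame refunds its allowance, so an arbitrarily long animation decodes within a fixed budget. This is illustrative only; the current `Limits` type cannot express the refund step.

```rust
use std::collections::VecDeque;

// Hypothetical allowance with a refund operation.
struct Allowance {
    remaining: u64,
}

impl Allowance {
    fn reserve(&mut self, bytes: u64) -> Result<(), &'static str> {
        if bytes > self.remaining {
            return Err("limit exhausted");
        }
        self.remaining -= bytes;
        Ok(())
    }
    fn free(&mut self, bytes: u64) {
        self.remaining += bytes;
    }
}

fn main() {
    let frame_bytes: u64 = 40;
    let mut allowance = Allowance { remaining: 100 };
    let mut queue: VecDeque<Vec<u8>> = VecDeque::new();

    // Decode 5 frames through a queue of capacity 2.
    for _ in 0..5 {
        if queue.len() == 2 {
            // Presenting/dropping the oldest frame refreshes the limits.
            let presented = queue.pop_front().unwrap();
            allowance.free(presented.len() as u64);
        }
        allowance.reserve(frame_bytes).unwrap();
        queue.push_back(vec![0u8; frame_bytes as usize]);
    }

    // Never more than 2 frames (80 bytes) in flight at once.
    assert_eq!(queue.len(), 2);
    assert_eq!(allowance.remaining, 100 - 2 * frame_bytes);
}
```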

```rust
/// methods than [`ImageFile`] and under the hood dispatches into the set of supported
/// [`ImageDecoder`] implementations. For decoder interfaces that are provided for efficiency it
/// negotiates support with the underlying decoder and then emulates them if necessary.
pub struct ImageReader<'lt> {
```
Contributor commented:

I like having two structs, though it would be nice to pick names that make the connection clearer. Perhaps ImageReader / ImageReaderOptions or ImageFile / ImageFileBuilder or similar? We could make it a bit less verbose by adding constructors directly on the second struct for the common operations:

```rust
impl ImageReader {
    fn new<R>(reader: R) -> ImageResult<Self> {
        ImageReaderOptions::from_reader(reader).with_guessed_format().load()
    }

    fn open<P>(path: P) -> ImageResult<Self> {
        ImageReaderOptions::from_path(path).load()
    }
}
```

```rust
    }
}

impl ImageReader<'_> {
```
Contributor commented:

Nit: it is easier to scroll through the file if the impl blocks for each struct immediately follow their definitions.

```rust
/// The decoder should attempt to restrict the amount of decoded image data to the indicated
/// box. It should return `Ok` if it supports the viewbox directly, or `Err` if it does not
/// support that exact viewbox. In either case the decoder must indicate the new image
/// dimensions in subsequent calls to [`Self::init`].
```
Contributor commented:

Which formats do we expect to be able to implement viewboxes?

TIFF should be able to do it by only reading the necessary strips/tiles. And DDS can, because GPU compressed formats are designed for random access. But for other compressed formats, I think it might be rather annoying?

@197g (Member, Author) commented Dec 8, 2025 (edited):

Even jpeg could profit. We'll need to be more precise in the description, but I had intended that, in either return case, the values outside the initially indicated viewbox are allowed to be treated as 'don't-care' by the decoder in their original (or only partially cropped) layout. For these pixels the decoder need not present the actual underlying image data.

With some integration that would allow wins beyond just a binary decoding decision. Some form of integration would be possible for:

  • all the PAM varieties, bmp, and tga save on IO with full support; and for small viewboxes, if we Seek/read_at
  • in jpeg we need not process the data from any of the MCUs that do not intersect the viewbox, even if the bitstream must be decoded to stay aligned with the file
  • avif and webp seem similar in that they might avoid processing values (obviously needing integration of it).

For anything non-RGB we get to save on conversion, too. I expect we will uncover different forms of gains once we have a better understanding of a declarative form.

```rust
        self.viewbox = Some(viewbox);
    }

    /// Get the previously decoded EXIF metadata if any.
```
Contributor commented:

The "previously decoded" part here makes me a bit nervous. I think we'll want to be clearer about what the user has to do to make sure they don't get None for images that actually do have EXIF data.

@197g (Member, Author) commented:

This is a bit odd and will depend on the format and metadatum. For instance, XMP is encoded per-image in tiff but only once in gif (despite us representing this as an image sequence) and also only once in png (no word about APNG). Problematically, in gif and png the standard requires absolutely no ordering with any of the other chunks. So it might be encountered before all of the header information is done, or after all the images have been consumed.

The problem with back references is of course the unclear association. And when multiple are included we always have a problem with consuming them 'after the end', since the data would need to be buffered or the decoder able to store seek-back points (like png). Encoding all that in the interface is a challenge; it will incur some unavoidable complexity.

@RunDevelopment

This comment was marked as outdated.

@197g

This comment was marked as resolved.

@RunDevelopment

This comment was marked as resolved.

@197g force-pushed the decoder-metadata-interface branch from cdc0363 to 1a114c3 on December 16, 2025 at 17:40
This was one of the silently broken features when adding the new methods. While reviewing how to address the underlying cause with a clippy fix, it became obvious a few other methods were missing, too.

Change ImageDecoder::read_image to `fn(&mut self)`. There are some codecs where metadata and other information is only available after decoding. Add ImageDecoder::init for codecs where reading the header needs to take into account limits and other configuration we are to introduce.

Renames it `peek_layout` to clarify the relation to `read_image`: it is to be called potentially multiple times, once for each such call. Document the trait more with this relationship, moving some methods around to clarify.

The latter interface will be something that interfaces and interacts with the underlying decoder. Introduces ImageDecoder::set_viewport as a stand-in for what is intended here: color, subsampling, and gain map application will take more interaction, clarification of how the library will request their data, and adjustments to the layout definition by the decoder.

The purpose of the trait is to be careful with the boundary to the `moxcms` dependency. We use it because it is a quality implementation, but it is heavyweight for what we need. There are other possible ways to provide transfer functions and color space transforms. This now also introduces ICC profile parsing, but again that could be done with a *much* lighter dependency, as we only need basic information from it. The trait should make every little additional cross-dependency a conscious decision. Also, it should be the start of a customization point, by feature flag or actually at runtime.
@197g force-pushed the decoder-metadata-interface branch from 1a114c3 to 306c6d2 on December 16, 2025 at 18:40
@197g (Member, Author) commented:

Resolving the naming question as peek_layout, hopefully satisfactorily for now.

No longer responsible for ensuring the size constraints are met under the new policy and with the availability of constructing a reader from an instance of a boxed decoder.

This allows us to write a generic iterator which uses the same decoder function to generate a whole sequence of frames. The attributes are designed to be extensible to describe changes in available metadata as well; very concretely, some formats require that XMP/ICC/… are polled for each individual image, whereas others have one for the whole file and put that at the end. So there's no universal sequence for querying the metadata, and we need to hold runtime information. This will be the focus of the next commit.

Reviewers

@fintelia and @RunDevelopment left review comments.

At least 1 approving review is required to merge this pull request.


4 participants: @197g, @mstoeckl, @fintelia, @RunDevelopment
