Immutable biovecs and biovec iterators
Kent Overstreet <kmo@daterainc.com>
As of 3.13, biovecs should never be modified after a bio has been submitted. Instead, we have a new struct bvec_iter which represents a range of a biovec - the iterator will be modified as the bio is completed, not the biovec.
More specifically, old code that needed to partially complete a bio would update bi_sector and bi_size, and advance bi_idx to the next biovec. If it ended up partway through a biovec, it would increment bv_offset and decrement bv_len by the number of bytes completed in that biovec.
In the new scheme of things, everything that must be mutated in order to partially complete a bio is segregated into struct bvec_iter: bi_sector, bi_size and bi_idx have been moved there; and instead of modifying bv_offset and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of bytes completed in the current bvec.
There are a bunch of new helper macros for hiding the gory details - in particular, presenting the illusion of partially completed biovecs so that normal code doesn’t have to deal with bi_bvec_done.
Driver code should no longer refer to biovecs directly; we now have bio_iovec() and bio_iter_iovec() macros that return literal struct biovecs, constructed from the raw biovecs but taking into account bi_bvec_done and bi_size.
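To make the "literal struct biovec" idea concrete, here is a small userspace sketch of how such a view could be built from the raw array plus the iterator. It is not the kernel source: the model_ names and the simplified types are assumptions made for illustration (the real macros work on struct page pointers), and only the field names bi_size, bi_idx, bi_bvec_done, bv_len and bv_offset are taken from the text.

```c
#include <assert.h>

/* Userspace stand-ins (hypothetical); field names mirror the kernel's. */
struct model_bvec {
    unsigned int bv_len;
    unsigned int bv_offset;
};

struct model_iter {
    unsigned int bi_size;      /* bytes remaining in the bio */
    unsigned int bi_idx;       /* index of the current raw bvec */
    unsigned int bi_bvec_done; /* bytes already completed in that bvec */
};

/*
 * Build the current segment by value. The raw array is only read,
 * so a partially completed bvec is an illusion created here.
 */
static struct model_bvec model_iter_bvec(const struct model_bvec *vecs,
                                         struct model_iter iter)
{
    const struct model_bvec *raw = &vecs[iter.bi_idx];
    struct model_bvec cur;
    unsigned int left;

    assert(iter.bi_bvec_done <= raw->bv_len);
    left = raw->bv_len - iter.bi_bvec_done;

    /* Skip what is already done, and never run past the end of the bio. */
    cur.bv_offset = raw->bv_offset + iter.bi_bvec_done;
    cur.bv_len = left < iter.bi_size ? left : iter.bi_size;
    return cur;
}
```

Because the view is returned by value, callers can trim or inspect it freely without touching the shared array.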
bio_for_each_segment() has been updated to take a bvec_iter argument instead of an integer (that corresponded to bi_idx); for a lot of code the conversion just required changing the types of the arguments to bio_for_each_segment().
Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a wrapper around bio_advance_iter() that operates on bio->bi_iter, and also advances the bio integrity’s iter if present.
There is a lower level advance function - bvec_iter_advance() - which takes a pointer to a biovec, not a bio; this is used by the bio integrity code.
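The layering of the three advance functions can be sketched in userspace as follows. This is a simplified model, not the kernel implementation: the model_ names and struct layouts are hypothetical, bio integrity is left out entirely, and 512-byte sectors are assumed for the bi_sector update.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_bvec { unsigned int bv_len; unsigned int bv_offset; };

struct model_iter {
    unsigned long long bi_sector; /* assumed: 512-byte sectors */
    unsigned int bi_size;         /* bytes remaining */
    unsigned int bi_idx;          /* current raw bvec */
    unsigned int bi_bvec_done;    /* bytes done in the current bvec */
};

struct model_bio {
    const struct model_bvec *bi_io_vec; /* never written by these helpers */
    struct model_iter bi_iter;
};

/* Lowest level: works on a raw bvec array, knows nothing about bios. */
static void model_bvec_iter_advance(const struct model_bvec *vecs,
                                    struct model_iter *iter,
                                    unsigned int bytes)
{
    assert(bytes <= iter->bi_size);  /* cannot advance past the end */
    iter->bi_size -= bytes;
    bytes += iter->bi_bvec_done;     /* measure from the bvec start */
    while (bytes && bytes >= vecs[iter->bi_idx].bv_len) {
        bytes -= vecs[iter->bi_idx].bv_len;
        iter->bi_idx++;
    }
    iter->bi_bvec_done = bytes;
}

/* Middle level: advances any iterator over the bio's array. */
static void model_bio_advance_iter(const struct model_bio *bio,
                                   struct model_iter *iter,
                                   unsigned int bytes)
{
    iter->bi_sector += bytes >> 9;
    model_bvec_iter_advance(bio->bi_io_vec, iter, bytes);
}

/* Top level: advances the bio's own iterator. */
static void model_bio_advance(struct model_bio *bio, unsigned int bytes)
{
    model_bio_advance_iter(bio, &bio->bi_iter, bytes);
}
```

Note that only the iterator changes; advancing a private copy of bi_iter leaves the bio itself untouched, which is what makes iterating without side effects possible.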
What’s all this get us?
Having a real iterator, and making biovecs immutable, has a number of advantages:
- Before, iterating over bios was very awkward when you weren’t processing exactly one bvec at a time - for example, bio_copy_data() in block/bio.c, which copies the contents of one bio into another. Because the biovecs wouldn’t necessarily be the same size, the old code was tricky and convoluted - it had to walk two different bios at the same time, keeping both bi_idx and an offset into the current biovec for each.
  The new code is much more straightforward - have a look. This sort of pattern comes up in a lot of places; a lot of drivers were essentially open coding bvec iterators before, and having a common implementation considerably simplifies a lot of code.
- Before, any code that might need to use the biovec after the bio had been completed (perhaps to copy the data somewhere else, or perhaps to resubmit it somewhere else if there was an error) had to save the entire bvec array - again, this was being done in a fair number of places.
- Biovecs can be shared between multiple bios - a bvec iter can represent an arbitrary range of an existing biovec, both starting and ending midway through biovecs. This is what enables efficient splitting of arbitrary bios. Note that this means we _only_ use bi_size to determine when we’ve reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes bi_size into account when constructing biovecs.
- Splitting bios is now much simpler. The old bio_split() didn’t even work on bios with more than a single bvec! Now, we can efficiently split arbitrary size bios - because the new bio can share the old bio’s biovec.
  Care must be taken to ensure the biovec isn’t freed while the split bio is still using it, in case the original bio completes first, though. Using bio_chain() when splitting bios helps with this.
- Submitting partially completed bios is now perfectly fine - this comes up occasionally in stacking block drivers and various code (e.g. md and bcache) had some ugly workarounds for this.
  It used to be the case that submitting a partially completed bio would work fine to _most_ devices, but since accessing the raw bvec array was the norm, not all drivers would respect bi_idx and those would break. Now, since all drivers _must_ go through the bvec iterator - and have been audited to make sure they are - submitting partially completed bios is perfectly fine.
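The sharing and splitting points can be illustrated with a userspace sketch in which a split produces two iterators over one untouched array. This is a loose model under stated assumptions, not the kernel's bio_split(): all model_ names are hypothetical, there is no allocation or reference counting, and lifetime issues (the reason bio_chain() matters) are ignored.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_bvec { unsigned int bv_len; unsigned int bv_offset; };

struct model_iter {
    unsigned long long bi_sector; /* assumed: 512-byte sectors */
    unsigned int bi_size, bi_idx, bi_bvec_done;
};

struct model_bio {
    const struct model_bvec *vecs; /* shared, read-only array */
    struct model_iter iter;
};

/* Consume bytes from the bio's iterator; the array is not modified. */
static void model_advance(struct model_bio *bio, unsigned int bytes)
{
    struct model_iter *it = &bio->iter;

    assert(bytes <= it->bi_size);
    it->bi_sector += bytes >> 9;
    it->bi_size -= bytes;
    bytes += it->bi_bvec_done;
    while (bytes && bytes >= bio->vecs[it->bi_idx].bv_len) {
        bytes -= bio->vecs[it->bi_idx].bv_len;
        it->bi_idx++;
    }
    it->bi_bvec_done = bytes;
}

/*
 * Split off the first `bytes`. The returned front part shares the
 * array and is simply truncated via bi_size; the original is advanced
 * and may now start in the middle of a bvec.
 */
static struct model_bio model_split(struct model_bio *bio, unsigned int bytes)
{
    struct model_bio front = *bio;

    assert(bytes > 0 && bytes < bio->iter.bi_size);
    front.iter.bi_size = bytes;
    model_advance(bio, bytes);
    return front;
}

/* Walk a bio by value; the end is decided by bi_size alone. */
static unsigned int model_count_segments(struct model_bio bio,
                                         unsigned int *bytes_seen)
{
    unsigned int segments = 0;

    *bytes_seen = 0;
    while (bio.iter.bi_size) {
        unsigned int left = bio.vecs[bio.iter.bi_idx].bv_len -
                            bio.iter.bi_bvec_done;
        unsigned int n = left < bio.iter.bi_size ? left : bio.iter.bi_size;

        *bytes_seen += n;
        segments++;
        model_advance(&bio, n);
    }
    return segments;
}
```

Neither half needs a count of array entries: the front half stops partway through the second bvec purely because its bi_size runs out, which is the reason bi_vcnt is not consulted when iterating.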
Other implications:
- Almost all usage of bi_idx is now incorrect and has been removed; instead, where previously you would have used bi_idx you’d now use a bvec_iter, probably passing it to one of the helper macros.
  I.e. instead of using bio_iovec_idx() (or bio->bi_io_vec[bio->bi_idx]), you now use bio_iter_iovec(), which takes a bvec_iter and returns a literal struct bio_vec - constructed on the fly from the raw biovec but taking into account bi_bvec_done (and bi_size).
- bi_vcnt can’t be trusted or relied upon by driver code - i.e. anything that doesn’t actually own the bio. The reason is twofold: firstly, it’s not actually needed for iterating over the bio anymore - we only use bi_size. Secondly, when cloning a bio and reusing (a portion of) the original bio’s biovec, in order to calculate bi_vcnt for the new bio we’d have to iterate over all the biovecs in the new bio - which is silly as it’s not needed.
  So, don’t use bi_vcnt anymore.
- The current interface allows the block layer to split bios as needed, so we could eliminate a lot of complexity particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don’t have to deal with both their own bio size limitations and the limitations of the underlying devices. Thus there’s no need to define ->merge_bvec_fn() callbacks for individual block drivers.
Usage of helpers:
- The following helpers whose names have the suffix of _all can only be used on non-BIO_CLONED bio. They are usually used by filesystem code. Drivers shouldn’t use them because the bio may have been split before it reached the driver.
    bio_for_each_segment_all()
    bio_for_each_bvec_all()
    bio_first_bvec_all()
    bio_first_page_all()
    bio_last_bvec_all()
- The following helpers iterate over single-page segment. The passed ‘struct bio_vec’ will contain a single-page IO vector during the iteration:
    bio_for_each_segment()
    bio_for_each_segment_all()
- The following helpers iterate over multi-page bvec. The passed ‘struct bio_vec’ will contain a multi-page IO vector during the iteration:
    bio_for_each_bvec()
    bio_for_each_bvec_all()
    rq_for_each_bvec()
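The difference between the single-page and multi-page helpers comes down to whether a vector is clipped at page boundaries. The following userspace sketch counts how many single-page segments one multi-page bvec would yield. It is a model only: model_single_page_segments() and MODEL_PAGE_SIZE are hypothetical names, and a 4 KiB page size is assumed.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/*
 * Count the single-page segments covering a multi-page vector that
 * starts bv_offset bytes into its first page and is bv_len bytes long.
 * Each segment ends at the next page boundary or at the end of the
 * vector, whichever comes first.
 */
static unsigned int model_single_page_segments(unsigned int bv_offset,
                                               unsigned int bv_len)
{
    unsigned int segments = 0;

    while (bv_len) {
        unsigned int room = MODEL_PAGE_SIZE - bv_offset % MODEL_PAGE_SIZE;
        unsigned int seg = bv_len < room ? bv_len : room;

        assert(seg > 0); /* guarantees the loop makes progress */
        bv_offset += seg;
        bv_len -= seg;
        segments++;
    }
    return segments;
}
```

Under this model, an 8192-byte vector starting 512 bytes into a page is visited once by a multi-page helper but as three pieces (3584, 4096 and 512 bytes) by a single-page one.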