Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Comments

feat: BlobWriteChannelV2 - same throughput less GC#2110

Merged
BenWhitehead merged 3 commits intomainfrom
blob-write-channel-v2
Jul 13, 2023
Merged

feat: BlobWriteChannelV2 - same throughput less GC#2110
BenWhitehead merged 3 commits intomainfrom
blob-write-channel-v2

Conversation

@BenWhitehead
Copy link
Collaborator

@BenWhiteheadBenWhitehead commentedJul 11, 2023
edited
Loading

Use stable buffer allocation with laziness.

Leverage new JsonResumableSession to provide more robustness and easier separation of concerns compared to BlobWriteChannel

  • rename blobWriteChannel.ser.properties to the correct blobReadChannel.ser.properties

Runtime improvments

Throughput is on par with the existing v1 implementation, however GC impact has been lightened with the new implementation.

Below is the summary of the GC improvement between v1 and v2.

These GC numbers were collected while uploading 4096 randomly sized objects, from 128KiB..2GiB across 16 concurrent threads, using a default chunkSize of 16MiB.

metricunitv1v2% decrease
gc.alloc.rateMB/sec2240.0561457.73134.924
gc.alloc.rate.normB/op95579672621763840373050733.207
gc.churn.G1_Eden_SpaceMB/sec1597.0091454.3048.936
gc.churn.G1_Eden_Space.normB/op6814184243206369029652486.533
gc.churn.G1_Old_GenMB/sec691.87711.31698.364
gc.churn.G1_Old_Gen.normB/op295213237398495594433198.321
gc.churn.G1_Survivor_SpaceMB/sec0.0040.00250.000
gc.churn.G1_Survivor_Space.normB/op157286478643250.000
gc.countcounts1670131921.018
gc.timems15936952740.217

Overall allocation rate is decreased, while Old_Gen use is almost entirely eliminated.

openjdk version "11.0.18" 2023-01-17OpenJDK Runtime Environment (build 11.0.18+10-post-Debian-1deb11u1)OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Debian-1deb11u1, mixed mode, sharing)-Xms12g -Xmx12g

All other java parameters are defaults.

@BenWhiteheadBenWhitehead added do not mergeIndicates a pull request not ready for merge, due to either quality or timing. owlbot:ignoreinstruct owl-bot to ignore a PR labelsJul 11, 2023
@BenWhiteheadBenWhitehead requested a review froma team as acode ownerJuly 11, 2023 22:55
@BenWhiteheadBenWhitehead requested a review froma teamJuly 11, 2023 22:55
@product-auto-labelproduct-auto-labelbot added size: xlPull request size is extra large. api: storageIssues related to the googleapis/java-storage API. labelsJul 11, 2023
public BlobWriteChannel writer(URL signedURL) {
public StorageWriteChannel writer(URL signedURL) {
// TODO: is it possible to know if a signed url is configured to have a constraint which makes
// it idempotent?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The signed URL in this case creates a resumable upload session which are idempotent per session.

Creating a signed URL that can only be once would requirehttps://cloud.google.com/storage/docs/xml-api/reference-headers#xgoogifgenerationmatch with value 0 but it's possible that a session is created but not completed when this preconditions is checked which can't be helped.

Copy link
CollaboratorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

If I'm understanding correctly, that if-generation-match header would be part of the signed URL and not something our client could be aware of. Meaning, our client doesn't have visibility into any precondition which might apply to the request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

oof, this is an interesting case; if a user generates a v4 signed url and adds this header as signed then our client tries to create a resumable upload; it will fail without this header in the request with the expected value.

Our client code could extract the header/values pairs from the signed URL, but sounds like a feature request.

ByteRangeSpec rangeSpec = ByteRangeSpec.explicit(cumulativeByteCount, newFinalByteOffset);
if (available % ByteSizeConstants._256KiB == 0) {
header = HttpContentRange.of(rangeSpec);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Wouldn't this cause issues for customers that write data as it comes in (might be less than 256KiB) so buffer would be filled over multiple writes to perform a final flush on falling out of writer scope or explicitly flushing or close()?

Copy link
CollaboratorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Potentially in standalone yet, however customers are not interacting with this class directly themselves.

We have the following pipeline to usage of this class (bash pipeline shorthand):

BlobWriteChannelV2 | BaseStorageWriteChannel | DefaultBufferedWritableByteChannel | ApiaryUnbufferedWritableByteChannel                     ^-- buffer allocation and alignment happens here

frankyn reacted with thumbs up emoji
byteBuffers(100, 1_000),
byteBuffers(1_000, 10_000),
byteBuffers(10_000, 100_000),
byteBuffers(100_000, 1_000_000)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Pinged you through chat; but what's the rationale for using multiple buffers versus one? A few reasons come to mind (easier to allocate smaller contiguous memory buffers, individual buffers can be passed as-is to transport writes); and wondering if there are others.

Copy link
CollaboratorAuthor

@BenWhiteheadBenWhiteheadJul 12, 2023
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

One big one for us specifically, is from the buffered to unbuffered layer, this enables the ability to reduce (and in some cases completely remove) copies between theByteBuffer provided to ourwrite method, and any underlying location it needs to be propagated to.

A codified test case that illustrates this minimized copying can be seen here:

* 0 61
* data: |------------------------------------------------------------|
* 16 (16) 32 (16) 48 (13) 61
* writes: |---------------|---------------|---------------|------------|
* 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60
* flushes: |--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--||

In that test flushes will happen every 3 bytes, where we're doing 16 byte writes. In order to flush, we don't need to copy into our buffer, we can simply take a 3 byte slice of the providedByteBuffer and flush that. Then if there is any outstanding bytes they are enqueued. Then the next timewrite is called, in a single call we can flush our buffer and the slice from the newByteBuffer.

frankyn reacted with thumbs up emoji
Base automatically changed fromput-file tomainJuly 13, 2023 20:06
Use stable buffer allocation with laziness.Leverage new JsonResumableSession to provide more robustness and easierseparation of concerns compared to BlobWriteChannel* rename blobWriteChannel.ser.properties to the correct blobReadChannel.ser.properties### Runtime improvmentsThroughput is on par with the existing v1 implementation, however GCimpact has been lightened with the new implementation.Below is the summary of the GC improvement between v1 and v2.These GC numbers were collected while uploading 4096 randomly sizedobjects, from 128KiB..2GiB across 16 concurrent threads, using a defaultchunkSize of 16MiB.| metric                          | unit   |           v1 |           v2 | % decrease ||---------------------------------|--------|-------------:|-------------:|-----------:|| gc.alloc.rate                   | MB/sec |     2240.056 |     1457.731 |     34.924 || gc.alloc.rate.norm              | B/op   | 955796726217 | 638403730507 |     33.207 || gc.churn.G1_Eden_Space          | MB/sec |     1597.009 |     1454.304 |      8.936 || gc.churn.G1_Eden_Space.norm     | B/op   | 681418424320 | 636902965248 |      6.533 || gc.churn.G1_Old_Gen             | MB/sec |      691.877 |       11.316 |     98.364 || gc.churn.G1_Old_Gen.norm        | B/op   | 295213237398 |   4955944331 |     98.321 || gc.churn.G1_Survivor_Space      | MB/sec |        0.004 |        0.002 |     50.000 || gc.churn.G1_Survivor_Space.norm | B/op   |      1572864 |       786432 |     50.000 || gc.count                        | counts |         1670 |         1319 |     21.018 || gc.time                         | ms     |        15936 |         9527 |     40.217 |Overall allocation rate is decreased, while Old_Gen use is almost entirely eliminated.```openjdk version "11.0.18" 2023-01-17OpenJDK Runtime Environment (build 11.0.18+10-post-Debian-1deb11u1)OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Debian-1deb11u1, mixedmode, sharing)-Xms12g -Xmx12g```All other java parameters are defaults.
@BenWhiteheadBenWhitehead removed the do not mergeIndicates a pull request not ready for merge, due to either quality or timing. labelJul 13, 2023
@BenWhiteheadBenWhitehead merged commit1b52a10 intomainJul 13, 2023
@BenWhiteheadBenWhitehead deleted the blob-write-channel-v2 branchJuly 13, 2023 23:20
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

1 more reviewer

@frankynfrankynfrankyn approved these changes

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Labels

api: storageIssues related to the googleapis/java-storage API.owlbot:ignoreinstruct owl-bot to ignore a PRsize: xlPull request size is extra large.

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

2 participants

@BenWhitehead@frankyn

[8]ページ先頭

©2009-2026 Movatter.jp