chore: transition the library to the new microgenerator #158


Merged

plamut merged 35 commits into googleapis:master from plamut:use-microgenerator on Sep 7, 2020

Conversation

@plamut (Contributor) commented Jul 15, 2020 · edited

Closes #131.
Closes #168.

This PR replaces the old code generator; the generated parts of the code are now produced by the microgenerator. This also means dropping support for Python 2.7 and 3.5.

There are a lot of changes and the transition itself was far from smooth, so it's probably best to review this commit by commit (I tried to keep the commits self-contained, each containing a single change/fix).

Things to focus on in reviews

  • Regenerating the code overwrites some of the URLs in the samples README. Seems like a synthtool issue.

  • The clients no longer support the client_config argument. At least the ordering keys feature used it to change the timeout to "infinity". We need to see whether that is crucial, and whether the same can be achieved in a different way (maybe through the retry policy...).

  • The SERVICE_ADDRESS and _DEFAULT_SCOPES constants in clients might be obsolete. Let's see if there are more modern alternatives (they are currently still injected into the generated clients).

  • Regenerating the code out of the box fails, because the Bazel tool incorrectly tries to use Python 2, resulting in syntax errors (could be just a problem on my machine, but it is a known Bazel issue).

    Workaround:

    • Add the snippet from the comment to the google/pubsub/v1/BUILD.bazel file in the local googleapis clone and point the synthtool to it:
      $ export SYNTHTOOL_GOOGLEAPIS=/path/to/googleapis
    • Patch the local synthtool installation: add the following two Bazel arguments to the _generate_code() method in synthtool/gcp/gapic_bazel.py (lines 177-178):
      "--python_top=//google/pubsub/v1:myruntime",
      "--incompatible_use_python_toolchains=false",

    The workaround should convince Bazel to use Python 3, as this is the Python version in the configs.

Things left to do

  • Double check that the new client's performance is adequate. There have been reports of possibly degraded performance with the new microgenerator. (Update: the ~10% performance hit was declared acceptable and is no longer a release blocker.)
  • After approvals, re-generate the code one more time to make sure it works without errors (such as a new version of black wanting to re-generate some files and causing the CI check to fail).
  • Lint samples. It appears that the "fixup_keywords" step in the migration guide made the linter unhappy. 😄
  • Fix samples and their tests.
  • Fix system tests.
  • Determine how to handle methods that are now either missing, e.g. get_iam_policy(), or no longer support all config options, e.g. create_subscription(). Adjust or delete?
  • Determine a replacement for the now-unsupported client_config argument to the client constructor, or decide whether the code generator needs an update. client_config has been replaced by custom Retry settings passed to the GAPIC client's publish(); if we want to support custom retries, we must update the user-facing client's publish() method (see the sketch after this list).
  • Add UPGRADING guide to the docs (also depends on the previous point - some of the changes might actually be bugs in the new generator).
  • Make the samples CI checks required?
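
A possible direction for the client_config replacement mentioned above (the sketch referenced in the list): the microgenerated methods accept google.api_core retry/timeout arguments, so an effectively "infinite" publish deadline could be approximated roughly like this. This is a minimal sketch, not a confirmed design; the retry values and resource names are illustrative.

```python
from google.api_core import retry as retries
from google.pubsub_v1 import types as gapic_types
from google.pubsub_v1.services.publisher.client import PublisherClient

client = PublisherClient()

# Approximate the old client_config "infinity" timeout used for ordering keys
# with a retry whose overall deadline is effectively unbounded.
publish_retry = retries.Retry(
    initial=0.1,
    maximum=60.0,
    multiplier=1.3,
    deadline=2.0**32,  # effectively "never give up"
)

response = client.publish(
    topic="projects/my-project/topics/my-topic",  # hypothetical topic
    messages=[gapic_types.PubsubMessage(data=b"payload")],
    retry=publish_retry,
)
print(response.message_ids)
```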

PR checklist

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

@plamut added the "type: process" label (A process-related concern. May include testing, release, or the like.) on Jul 15, 2020
@googlebot added the "cla: yes" label (This human has signed the Contributor License Agreement.) on Jul 15, 2020
@plamut changed the title from "Transition the library to the new microgenerator" to "chore: transition the library to the new microgenerator" on Jul 15, 2020
@plamut force-pushed the use-microgenerator branch 4 times, most recently from a18ea36 to 1ecc48f on July 16, 2020 10:02
The argument is not supported anymore in the new generated client.
This configuration is no longer supported by the generated subscriber client.
@plamut (Contributor, Author) commented

I believe this PR is now ready to be reviewed. The things that still fail are related to incompatible changes in the client itself, for example:

  • Missing IAM methods - get_iam_policy(), set_iam_policy(), test_iam_permissions()
  • Missing create_subscription() parameters - dead_letter_policy, expiration_policy, etc.
  • Missing client_config parameter in the publisher client constructor (is there an alternative mechanism?)

We need to figure out how to handle these, and I will add the UPGRADING.md file to the docs after all these things have been clarified.

Since this PR is quite complex, every pair of eyes would be beneficial. 🙂

  • @pradn Please check whether ordering keys are affected in any way by the lack of client_config options - they need "infinite" timeouts, right?
  • @anguillanneuf Any feedback on samples would be great, as you're probably most familiar with them.
  • @busunkim96 General feedback on the migration process, in case I missed something.

Let's not rush and take the time to review this thoroughly, thanks!

@plamut marked this pull request as ready for review on July 16, 2020 11:57
@plamut requested a review from pradn on July 16, 2020 11:57
@software-dov commented

I don't think I understand the benchmark result very well; are the two different throughput numbers for cps_publisher_task.py and cps_subscriber_task.py respectively?

Also, a thought that may be relevant: is the manual layer invoking the asynchronous client or the synchronous one? I see a lot of task/future names and semantics, which made me ask. The asynchronous surface has had basically no optimization focus.

@plamut (Contributor, Author) commented Aug 20, 2020 · edited

I did some publisher profiling with yappi, using roughly the code the benchmark framework uses, and it seems that in the main thread a lot of time (~70%) is spent constructing PubsubMessage instances inside the publisher.publish() method:

```python
from google.pubsub_v1 import types as gapic_types

...

# Create the Pub/Sub message object.
message = gapic_types.PubsubMessage(
    data=data, ordering_key=ordering_key, attributes=attrs
)
```

FWIW, profiling the current released version (i.e. the master branch) did not show that piece of code to be problematic.

Is instantiating a PubsubMessage more expensive with the microgenerator, and are there ways to construct it faster, e.g. by circumventing any checks/extra logic that might be redundant for this use case?
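
For reference, a yappi run over the publish path can be set up roughly like the sketch below. This is not the actual profiling script from the benchmark framework; the project/topic names are hypothetical and the message count is arbitrary.

```python
import yappi

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "my-topic")  # hypothetical names

yappi.set_clock_type("wall")  # wall time, so time spent waiting on gRPC shows up
yappi.start()

futures = [publisher.publish(topic_path, b"A" * 250) for _ in range(10_000)]
for future in futures:
    future.result()

yappi.stop()
yappi.get_func_stats().sort("ttot").print_all()
```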

@software-dov commented

Yes, constructing messages is more expensive with the microgenerator because of its use of proto-plus. Depending on what data, ordering_key, and attrs (actual params, not formal) are, proto-plus may be doing non-trivial work to marshal Python structures to protobuf structures.

Can you send me the raw data in some way? I'd like to take a closer look at which part of construction is expensive.
There's a potential hack to get around this, but it's not really something we can recommend our users do.
It's possible to interact with 'vanilla' proto types in Python and then just shove them into proto-plus types before passing the result into a method call.

E.g.

```python
# PubsubMessage.pb() with no arguments returns the raw generated protobuf class.
vanilla_pb = gapic_types.PubsubMessage.pb()(
    data=data, ordering_key=ordering_key, attributes=attrs
)
proto_plus_pb = gapic_types.PubsubMessage.wrap(vanilla_pb)
```

@plamut (Contributor, Author) commented Aug 21, 2020 · edited

@software-dov

> There's a potential hack to get around this, but it's not really something we can recommend our users do.

This actually improved things considerably - the benchmark showed a publisher throughput of ~57 MB/s, which is only ~11% worse than the current stable version. In addition, the PubsubMessage instance is constructed internally (users only pass in the message data and kwargs), meaning that we can actually use this approach.

> Can you send me the raw data in some way?

The following is roughly how the messages look in the benchmark and in my local profiling script:

```python
data = b"A" * 250
ordering_key = ""
attrs = {'clientId': '0', 'sendTime': '1598046901869', 'sequenceNumber': '0'}

...

message = gapic_types.PubsubMessage(
    data=data, ordering_key=ordering_key, attributes=attrs
)
```

(sequenceNumber monotonically increases)

If using the hack, instantiation is considerably faster, but there is still quite a lot of time spent in Marshal.to_proto() - if we can speed this up further, it would be great.
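
For anyone wanting to reproduce the instantiation comparison locally, a rough micro-benchmark might look like the following. This is a minimal sketch, separate from the benchmark framework; the iteration count is arbitrary.

```python
import timeit

setup = """
from google.pubsub_v1 import types as gapic_types
data = b"A" * 250
attrs = {'clientId': '0', 'sendTime': '1598046901869', 'sequenceNumber': '0'}
"""

# Regular proto-plus construction (what publish() currently does).
plus = timeit.timeit(
    "gapic_types.PubsubMessage(data=data, ordering_key='', attributes=attrs)",
    setup=setup,
    number=10_000,
)

# The raw-pb + wrap() shortcut suggested above.
raw = timeit.timeit(
    "gapic_types.PubsubMessage.wrap(gapic_types.PubsubMessage.pb()("
    "data=data, ordering_key='', attributes=attrs))",
    setup=setup,
    number=10_000,
)

print(f"proto-plus: {plus:.3f}s   raw pb + wrap: {raw:.3f}s")
```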

I'll also see if the same can be used to improve subscriber performance.

> I don't think I understand the benchmark result very well; are the two different throughput numbers for cps_publisher_task.py and cps_subscriber_task.py respectively?

Correct. The benchmark framework spins up separate compute instances for publisher and subscriber and then measures how well they perform individually.

> Also, just had a thought that may be relevant: is the manual layer invoking the asynchronous client or the synchronous? I see a lot of task/future names and semantics, which made me ask.

It's the synchronous client, i.e. google.pubsub_v1.services.publisher.client.PublisherClient. Those futures are subclasses of google.api_core.future.Future and are managed manually by the (hand-written) publisher client itself.
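
For context, the hand-written layer's publish surface looks roughly like this from a user's perspective (project and topic names are hypothetical):

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()  # hand-written client wrapping the GAPIC layer
topic_path = publisher.topic_path("my-project", "my-topic")  # hypothetical names

# publish() returns a google.api_core.future.Future subclass managed by the
# hand-written client; result() blocks until the message ID is available.
future = publisher.publish(topic_path, b"A" * 250, clientId="0")
message_id = future.result()
```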

@plamut (Contributor, Author) commented

I have some good news - circumventing the wrapper classes around the raw protobuf messages to speed up instantiation and attribute access seems to help a lot. Running the benchmarks with the two most recent experimental commits produces the following:

```
Results for cps-gcloud-python-publisher:
  50%:   5.0625 - 7.59375
  99%:   86.49755859375 - 129.746337890625
  99.9%: 194.6195068359375 - 291.92926025390625
  Average throughput: 57.02 MB/s

Results for cps-gcloud-python-subscriber:
  50%:   2216.8378200531006 - 3325.256730079651
  99%:   16834.112196028233 - 25251.16829404235
  99.9%: 25251.16829404235 - 37876.75244106352
  Average throughput: 57.83 MB/s
```

The performance hit is now only around 10% (publisher ~12%, subscriber ~8%), which might actually be acceptable.

Profiling shows that the speed of creating a new pubsub message and the speed of accessing the message's attributes significantly affect the throughput of publisher and subscriber. This commit makes everything faster by circumventing the wrapper class around the raw protobuf pubsub messages where possible.
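
To illustrate the idea behind that commit (a sketch, not the actual diff): reading fields through the underlying protobuf message skips the proto-plus marshalling layer that the wrapper applies on every attribute lookup.

```python
from google.pubsub_v1 import types as gapic_types

message = gapic_types.PubsubMessage(data=b"A" * 250, ordering_key="")

# Attribute access on the proto-plus wrapper goes through marshalling...
data = message.data

# ...whereas grabbing the raw protobuf message once lets hot paths read
# fields (and compute sizes) without the wrapper overhead.
raw_pb = gapic_types.PubsubMessage.pb(message)
data = raw_pb.data
size = raw_pb.ByteSize()
```
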
@software-dov commented

Optimizing marshal.to_proto is a mid-low priority ongoing effort. The good news is that any changes made to proto-plus will be available to all users for free and transparently, so if the new throughput numbers are acceptable they may eventually get better without having to cut a new release.
The not so good news is that I don't think to_proto is likely to get much faster in the general case. There may be specific instances we can optimize, or some crazy fast-path shenanigans, but the code there is doing nonzero work in order to facilitate a more flexible and ergonomic interface. The downside of proto-plus is that its user-visible benefits come with a performance cost.

@software-dov commented

My bad, when I said "the raw data" above, I meant the profiling data. I don't particularly want to set up the benchmark environment, but I do want to try looking through the data to see what the hot spots are and if they could be optimized.

@software-dov commented

I have a special, optimized-but-not-reviewed-or-production-ready tarball of proto-plus. Is it possible to run the benchmark suite using it? My own benchmarking has these changes saving about 10-15 milliseconds out of about 500 milliseconds. YMMV, but if it improves the throughput enough it could be worth putting the proto-plus changes up for review.
proto-plus-1.7.1.tar.gz

@plamut (Contributor, Author) commented

> I have a special, optimized-but-not-reviewed-or-production-ready tarball of proto-plus. Is it possible to run the benchmark suite using it?

If pip can access and install it, we can specify this version as a dependency in the framework's requirements.txt, so it should be doable.
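
For example, something along these lines in the framework's requirements.txt should do it (the path and git ref below are hypothetical):

```
# Local tarball of the experimental proto-plus build:
./proto-plus-1.7.1.tar.gz

# ...or straight from the fork branch:
# git+https://github.com/software-dov/proto-plus-python@hackaday
```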

Just to clarify, would you like me to run the benchmark using the tip of this PR branch, i.e. the version that already includes the optimizations that try to bypass the wrapper classes around the protobuf messages? Or on the version that excludes this last commit and uses the wrapper classes more heavily?

> My bad, when I said "the raw data" above, I meant the profiling data.

No worries, sent you an email with the data just now.

@software-dov commented

The tip of this PR branch.

Changes are also available in this branch of my fork: https://github.com/software-dov/proto-plus-python/tree/hackaday

@plamut (Contributor, Author) commented Aug 26, 2020 · edited

First run, using proto-plus 1.7.1 on the tip of this PR branch. The results seem more or less similar to the previous benchmark:

```
Results for cps-gcloud-python-publisher:
  50%:   5.0625 - 7.59375
  99%:   86.49755859375 - 129.746337890625
  99.9%: 194.6195068359375 - 291.92926025390625
  Average throughput: 56.58 MB/s

Results for cps-gcloud-python-subscriber:
  50%:   1477.8918800354004 - 2216.8378200531006
  99%:   25251.16829404235 - 37876.75244106352
  99.9%: 37876.75244106352 - 56815.128661595285
  Average throughput: 57.37 MB/s
```

I also did another run; the measured throughput was around 55 MB/s - a bit worse, but within the normal variance of the results (the typical difference appears to be somewhere in the 1-2 MB/s range).

@software-dov commented

Okay. In light of those numbers I'm inclined not to merge the proto-plus changes. Is there an approval process for accepting the throughput regression?

@plamut (Contributor, Author) commented Aug 26, 2020 · edited

Please just mind that the proto-plus optimizations might not have a noticeable effect here, as the optimizations added in this PR actively try to circumvent proto-plus, but other libraries might still see benefits.

I'm not aware of any formal process, but we do have weekly Pub/Sub meetings on Thursdays where we discuss these things. I added a question to the agenda asking whether a -10% performance hit is acceptable in the new major release.

Update: Confirmed, -10% is good enough for now, although we should strive to improve this further in the mid-term.

@plamut requested a review from cguardia on August 28, 2020 10:02
@plamut mentioned this pull request on Aug 31, 2020
@cguardia (Contributor) left a comment

This looks good to me. Sorry this took so long. Had to go commit by commit.

@plamut (Contributor, Author) commented

@cguardia That's fine, appreciated. I will re-generate and benchmark the code again now, just in case, and see if we are ready for the major release.

@plamut (Contributor, Author) commented Sep 7, 2020 · edited

After re-generating the code yet again, performance still appears to be in line with the previous benchmarks:

```
Results for cps-gcloud-python-publisher:
  50%:   5.0625 - 7.59375
  99%:   86.49755859375 - 129.746337890625
  99.9%: 194.6195068359375 - 291.92926025390625
  Average throughput: 56.04 MB/s

Results for cps-gcloud-python-subscriber:
  50%:   194.6195068359375 - 291.92926025390625
  99%:   16834.112196028233 - 25251.16829404235
  99.9%: 37876.75244106352 - 56815.128661595285
  Average throughput: 56.33 MB/s
```

Merging. 🎉

Edit: Ah, @kamalaboulhosn expressed a wish to review the UPGRADING guide, so I will hold off on merging a bit longer.

@kamalaboulhosn (Contributor) left a comment

The upgrade guide looks good.


Reviewers

@kamalaboulhosn approved these changes
@hongalex approved these changes
@cguardia approved these changes
@software-dov approved these changes
@anguillanneuf approved these changes
@pradn - awaiting requested review
@busunkim96 - awaiting requested review


Labels

cla: yes (This human has signed the Contributor License Agreement.)
type: process (A process-related concern. May include testing, release, or the like.)


Development

Successfully merging this pull request may close these issues.

PublisherOptions Not Found in API Docs
Transition the library to the new microgenerator

9 participants

@plamut @anguillanneuf @busunkim96 @software-dov @cguardia @kamalaboulhosn @hongalex @googlebot @yoshi-kokoro
