matthieucham for Onepoint x Stack Labs

How to overcome Cloud Run's 32MB request limit

Cloud Run is an awesome serverless product from Google Cloud that is often a perfect fit for running containerized web services. It offers many advantages such as autoscaling, rolling updates, automatic restarts and scale to zero, to name just a few, and all of it without the hassle of provisioning and managing a cluster!

You would definitely pick this product to host, say, a Python Flask REST API with the following design:

(Diagram: original design, where the client POSTs the data file to the Cloud Run webservice, which processes it and loads it into BigQuery)
1- Upload a data file by HTTP POST to a REST endpoint
2- Process the file
3- Insert the data into BigQuery using the client lib
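
To make this design concrete, here is a minimal sketch of such an endpoint, assuming a Flask app; the route, the CSV parsing logic and the BigQuery dataset and table names are hypothetical and only serve to illustrate the flow:

from csv import DictReader
from io import TextIOWrapper

from flask import Flask, request
from google.cloud import bigquery

app = Flask(__name__)
bq_client = bigquery.Client()


@app.route("/upload", methods=["POST"])
def upload():
    # 1- the data file arrives in the POST body, so it transits through Cloud Run
    data_file = request.files["file"]
    # 2- process the file (hypothetical CSV parsing, replace with your own logic)
    rows = list(DictReader(TextIOWrapper(data_file.stream, encoding="utf-8")))
    # 3- insert the rows into BigQuery with the client library
    bq_client.insert_rows_json("my_dataset.my_table", rows)
    return "", 204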

Which is perfectly fine... unless you want to handle data files bigger than 32 MB!

Indeed, Cloud Run won't let you upload such a big file. Instead, you'll get an error message:

413: Request entity too large

Congratulations, you've just hit the hard size limit on Cloud Run inbound requests.
But don't worry: you can keep using Cloud Run for your service if you apply the improved design below.

Improved design with Cloud Storage, signed URLs and Pub/Sub notifications

To work around the limitation, you can design a solution based upon Cloud Storage signed URLs:

(Diagram: improved design, where the client uploads the file to Cloud Storage through a signed URL, and a Pub/Sub notification triggers the Cloud Run webservice)

This time, the file is not uploaded directly to the REST endpoint but to Cloud Storage instead, thus bypassing the 32 MB limitation.

The downside of this process is that the client has to make two requests instead of one. Hence, the whole new sequence goes like this:

1- the client requests a signed URL to upload to
2- the webservice, using the Cloud Storage client, generates a signed URL and returns it to the client
3- the client uploads the file directly to the Cloud Storage bucket (HTTP PUT to the signed URL)
4- at the end of the upload, an OBJECT_FINALIZE notification is sent to Pub/Sub
5- the notification is then pushed back to the webservice on Cloud Run through a push subscription (see the sketch after this list for steps 5 to 8)
6- the webservice reacts to the notification by downloading the file
7- the webservice can then process the file, exactly as it did in the original design
8- likewise, the data are inserted into BigQuery
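
Steps 5 to 8 happen on the webservice side. Below is a minimal sketch of the Pub/Sub push endpoint, again assuming Flask; the route, the parsing logic and the BigQuery table name are hypothetical. Pub/Sub push subscriptions POST a JSON envelope whose message.data field contains the base64-encoded Cloud Storage notification, from which the bucket and object names can be read:

import base64
import json
from csv import DictReader
from io import StringIO

from flask import Flask, request
from google.cloud import bigquery, storage

app = Flask(__name__)
storage_client = storage.Client()
bq_client = bigquery.Client()


@app.route("/notifications", methods=["POST"])
def handle_gcs_notification():
    # 5- Pub/Sub pushes the OBJECT_FINALIZE notification to this endpoint
    envelope = request.get_json()
    payload = json.loads(base64.b64decode(envelope["message"]["data"]))
    bucket_name, blob_name = payload["bucket"], payload["name"]

    # 6- download the uploaded file from Cloud Storage
    content = storage_client.bucket(bucket_name).blob(blob_name).download_as_bytes()

    # 7- process the file (hypothetical CSV parsing, replace with your own logic)
    rows = list(DictReader(StringIO(content.decode("utf-8"))))

    # 8- insert the rows into BigQuery
    bq_client.insert_rows_json("my_dataset.my_table", rows)

    # A 2xx response tells Pub/Sub the notification has been processed
    return "", 204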

This design is entirely serverless and scales neatly, without any single point of failure. Now, let's see in more detail how to implement it.

Make a signed URL from Cloud Run

Gotcha! The Cloud Run service account must have the role roles/iam.serviceAccountTokenCreator in order to generate a signed URL. This is not really documented, and if you don't grant it, you get an HTTP 403 error without much more information.

This Python code, courtesy of this blog post by Evan Peterson, shows how to produce signed URLs with the Cloud Run webservice's default service account, without requiring the private key file locally (which is a big no-no for security reasons!):

from typing import Optional
from datetime import timedelta

from google import auth
from google.auth.transport import requests
from google.cloud.storage import Client


def make_signed_upload_url(
    bucket: str,
    blob: str,
    *,
    exp: Optional[timedelta] = None,
    content_type="application/octet-stream",
    min_size=1,
    max_size=int(1e6),
):
    """
    Compute a GCS signed upload URL without needing a private key file.
    Can only be called when a service account is used as the application
    default credentials, and when that service account has the proper IAM
    roles, like `roles/storage.objectCreator` for the bucket, and
    `roles/iam.serviceAccountTokenCreator`.

    Source: https://stackoverflow.com/a/64245028

    Parameters
    ----------
    bucket : str
        Name of the GCS bucket the signed URL will reference.
    blob : str
        Name of the GCS blob (in `bucket`) the signed URL will reference.
    exp : timedelta, optional
        Time from now when the signed url will expire.
    content_type : str, optional
        The required mime type of the data that is uploaded to the generated
        signed url.
    min_size : int, optional
        The minimum size the uploaded file can be, in bytes (inclusive).
        If the file is smaller than this, GCS will return a 400 code on upload.
    max_size : int, optional
        The maximum size the uploaded file can be, in bytes (inclusive).
        If the file is larger than this, GCS will return a 400 code on upload.
    """
    if exp is None:
        exp = timedelta(hours=1)
    credentials, project_id = auth.default()
    if credentials.token is None:
        # Perform a refresh request to populate the access token of the
        # current credentials.
        credentials.refresh(requests.Request())
    client = Client()
    bucket = client.get_bucket(bucket)
    blob = bucket.blob(blob)
    return blob.generate_signed_url(
        version="v4",
        expiration=exp,
        service_account_email=credentials.service_account_email,
        access_token=credentials.token,
        method="PUT",
        content_type=content_type,
        headers={"X-Goog-Content-Length-Range": f"{min_size},{max_size}"},
    )
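
For instance, the webservice can expose make_signed_upload_url() to the client through a small endpoint (steps 1 and 2 of the sequence). This is only a sketch: the route, the blob naming scheme and the 1 GB limit are arbitrary choices, and the bucket name matches the one created by the Terraform configuration below:

from uuid import uuid4

from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/upload-urls", methods=["POST"])
def create_upload_url():
    # Pick a unique object name so that concurrent uploads cannot collide
    blob_name = f"incoming/{uuid4()}.csv"
    url = make_signed_upload_url(
        "upload-big-files",   # bucket provisioned by the Terraform config below
        blob_name,
        max_size=int(1e9),    # accept files up to ~1 GB
    )
    return jsonify({"upload_url": url, "blob": blob_name})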

Terraform

There is no robust way to do cloud without infrastructure as code, and Terraform is the perfect tool to manage your cloud resources.


Here are the Terraform fragments for deploying this design:

# Resources to handle big data files (>32 Mb)
# These files are uploaded to a special bucket with notifications

provider "google-beta" {
  project = <your GCP project name>
}

data "google_project" "default" {
  provider = google-beta
}

resource "google_storage_bucket" "bigframes_bucket" {
  project  = <your GCP project name>
  name     = "upload-big-files"
  location = "EU"
  cors {
    origin          = ["*"]
    method          = ["*"]
    response_header = ["Content-Type", "Access-Control-Allow-Origin", "X-Goog-Content-Length-Range"]
    max_age_seconds = 3600
  }
}

resource "google_service_account" "default" {
  provider   = google-beta
  account_id = "sa-webservice"
}

resource "google_storage_bucket_iam_member" "bigframes_admin" {
  bucket = google_storage_bucket.bigframes_bucket.name
  role   = "roles/storage.admin"
  member = "serviceAccount:${google_service_account.default.email}"
}

# required to generate a signed url
resource "google_service_account_iam_member" "tokencreator" {
  provider           = google-beta
  service_account_id = google_service_account.default.name
  role               = "roles/iam.serviceAccountTokenCreator"
  member             = "serviceAccount:${google_service_account.default.email}"
}

# upload topic for notifications
resource "google_pubsub_topic" "bigframes_topic" {
  provider = google-beta
  name     = "topic-bigframes"
}

# upload deadletter topic for failed notifications
resource "google_pubsub_topic" "bigframes_topic_deadletter" {
  provider = google-beta
  name     = "topic-bigframesdeadletter"
}

# add frame upload notifications on the bucket
resource "google_storage_notification" "bigframes_notification" {
  provider       = google-beta
  bucket         = google_storage_bucket.bigframes_bucket.name
  payload_format = "JSON_API_V1"
  topic          = google_pubsub_topic.bigframes_topic.id
  event_types    = ["OBJECT_FINALIZE"]
  depends_on     = [google_pubsub_topic_iam_binding.bigframes_binding]
}

# required for storage notifications
# seriously, Google, this should be by default !
resource "google_pubsub_topic_iam_binding" "bigframes_binding" {
  topic   = google_pubsub_topic.bigframes_topic.id
  role    = "roles/pubsub.publisher"
  members = ["serviceAccount:service-${data.google_project.default.number}@gs-project-accounts.iam.gserviceaccount.com"]
}

# frame upload main sub
resource "google_pubsub_subscription" "bigframes_sub" {
  provider = google-beta
  name     = "sub-bigframes"
  topic    = google_pubsub_topic.bigframes_topic.id
  push_config {
    push_endpoint = <URL where pushed notifications are POST-ed>
  }
  dead_letter_policy {
    dead_letter_topic = google_pubsub_topic.bigframes_topic_deadletter.id
  }
}

# frame upload deadletter subscription
resource "google_pubsub_subscription" "bigframes_sub_deadletter" {
  provider             = google-beta
  name                 = "sub-bigframesdeadletter"
  topic                = google_pubsub_topic.bigframes_topic_deadletter.id
  ack_deadline_seconds = 600
  push_config {
    push_endpoint = <URL where pushed notifications are POST-ed>
  }
}

Just terraform apply it!

How to upload

One final gotcha: to upload to Cloud Storage with the signed URL, you must set an additional header in the PUT request:
X-Goog-Content-Length-Range: <min size>,<max size>
where <min size> and <max size> match the min_size and max_size arguments of the make_signed_upload_url() method above.
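
On the client side, the whole flow then boils down to two HTTP calls. Here is a minimal sketch using the Python requests library; the service URL, the endpoint and the size bounds are placeholders and must match the values used when the URL was signed:

import requests

SERVICE_URL = "https://my-service-xxxxx-ew.a.run.app"  # your Cloud Run URL (placeholder)

# 1- ask the webservice for a signed upload URL
resp = requests.post(f"{SERVICE_URL}/upload-urls")
resp.raise_for_status()
signed_url = resp.json()["upload_url"]

# 3- PUT the file directly to Cloud Storage through the signed URL
with open("big_data_file.csv", "rb") as f:
    upload = requests.put(
        signed_url,
        data=f,
        headers={
            "Content-Type": "application/octet-stream",
            # must match min_size and max_size passed to make_signed_upload_url()
            "X-Goog-Content-Length-Range": "1,1000000000",
        },
    )
upload.raise_for_status()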

Conclusion

Have you tried this design? How would you improve it? Please let me know in the comments.

Thanks for reading! I’m Matthieu, data engineer at Stack Labs.
If you want to discover the Stack Labs Data Platform or join an enthusiastic Data Engineering team, please contact us.


Design schemas made with Excalidraw and the GCP Icons library by @clementbosc
Cover photo by joel herzog on Unsplash

Top comments (2)

Bruno Hernandez

I went through the same problem on Cloud Run and my request was limited to 32MB. I tried signed URLs, but in my case, since I had a form where I attached the file, I always needed to read it, and it was blocked due to that limitation. In the end the solution was to switch the site to HTTP/2, which solved the problem.

Note: not only did I have to switch the Cloud Run service to HTTP/2 in the networking tab, I also had to serve the website with a server that supports HTTP/2, for example Hypercorn.

In my Dockerfile, I just ran it in ASGI:

CMD exec hypercorn --bind :$PORT --workers 1 myproject.asgi:application

matthieucham

Hi Bruno, thanks for your insight. I guess that in your situation the form was also hosted on Cloud Run, in which case you cannot implement my pattern. So, nice work on finding a solution, and thanks for sharing!

As a workaround, you could have implemented a small piece of frontend in, let's say, ReactJS, whose only purpose would have been to let the user select a file and push it directly to Cloud Storage. That way the upload would have been done from the user's browser, without Cloud Run in the picture :)
