Build an ML vision analytics solution with Dataflow and Cloud Vision API

Last reviewed 2024-05-23 UTC

In this reference architecture, you'll learn about the use cases, design alternatives, and design considerations when deploying a Dataflow pipeline to process image files with Cloud Vision and to store processed results in BigQuery. You can use those stored results for large-scale data analysis and to train BigQuery ML models.

This reference architecture document is intended for data engineers and data scientists.

Architecture

The following diagram illustrates the system flow for this reference architecture.

An architecture showing the flow of information for ingest and trigger, processing, and store and analyze processes.

As shown in the preceding diagram, information flows as follows:

  1. Ingest and trigger: This is the first stage of the system flow where images first enter the system. During this stage, the following actions occur:

    1. Clients upload image files to a Cloud Storage bucket.
    2. For each file upload, Cloud Storage automatically sends an input notification by publishing a message to Pub/Sub.
  2. Process: This stage immediately follows the ingest and trigger stage. For each new input notification, the following actions occur:

    1. The Dataflow pipeline listens for these file input notifications, extracts file metadata from the Pub/Sub message, and sends the file reference to Vision API for processing.
    2. Vision API reads the image and creates annotations.
    3. The Dataflow pipeline stores the annotations produced by Vision API in BigQuery tables.
  3. Store and analyze: This is the final stage in the flow. At this stage, you can do the following with the saved results:

    1. Query BigQuery tables and analyze the stored annotations.
    2. Use BigQuery ML or Vertex AI to build models and execute predictions based on the stored annotations.
    3. Perform additional analysis in the Dataflow pipeline (not shown in this diagram).
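As a concrete illustration of the handoff between the ingest-and-trigger and process stages, the sketch below parses the JSON payload that Cloud Storage publishes to Pub/Sub (assuming the JSON_API_V1 notification payload format, whose body includes the object's `bucket` and `name` fields) and reconstructs the `gs://` URI that the pipeline would pass to Vision API. The function and bucket names are illustrative, not part of any client library.

```python
import json


def gcs_uri_from_notification(message_data: bytes) -> str:
    """Extract a gs:// URI from a Cloud Storage Pub/Sub notification payload.

    Assumes the JSON_API_V1 payload format, in which the message body is the
    JSON representation of the uploaded object, including "bucket" and "name".
    """
    payload = json.loads(message_data.decode("utf-8"))
    return f"gs://{payload['bucket']}/{payload['name']}"


# Example notification body (abbreviated to the two fields used here):
msg = json.dumps({"bucket": "images-input", "name": "uploads/cat.jpg"}).encode()
print(gcs_uri_from_notification(msg))  # gs://images-input/uploads/cat.jpg
```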

Products used

This reference architecture uses the following Google Cloud products:

Use cases

Vision API supports multiple processing features, including image labeling, face and landmark detection, optical character recognition, explicit content tagging, and others. Each of these features enables several use cases that are applicable to different industries. This document contains some simple examples of what's possible when using Vision API, but the spectrum of possible applications is very broad.

Vision API also offers powerful pre-trained machine learning models through REST and RPC APIs. You can assign labels to images and classify them into millions of predefined categories. It helps you detect objects, read printed and handwritten text, and build valuable metadata into your image catalog.

This architecture doesn't require any model training before you can use it. If you need a custom model trained on your specific data, Vertex AI lets you train an AutoML model or a custom model for computer vision objectives, like image classification and object detection. Or, you can use Vertex AI Vision for an end-to-end application development environment that lets you build, deploy, and manage computer vision applications.

Design alternatives

Instead of storing images in a Cloud Storage bucket, the process that produces the images can publish them directly to a messaging system, such as Pub/Sub, and the Dataflow pipeline can send the images directly to Vision API.

This design alternative can be a good solution for latency-sensitive use cases where you need to analyze images of relatively small sizes. Pub/Sub limits the maximum message size to 10 MB.
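A minimal guard for this alternative might check an image's payload size before publishing it directly to Pub/Sub. The constant and function below are an illustrative sketch, not part of any Google Cloud client library; a real publisher would also need to budget for message attributes and any encoding overhead.

```python
# Pub/Sub's documented maximum message size.
MAX_PUBSUB_MESSAGE_BYTES = 10 * 1024 * 1024  # 10 MB


def fits_in_pubsub_message(image_bytes: bytes) -> bool:
    """Return True if the raw image payload fits under the Pub/Sub limit.

    This sketch checks only the raw payload size; attributes and encoding
    overhead are ignored for simplicity.
    """
    return len(image_bytes) <= MAX_PUBSUB_MESSAGE_BYTES
```

Images that fail this check would instead go through the Cloud Storage path described in the main architecture.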

If you need to batch process a large number of images, you can use the asyncBatchAnnotate API, which is designed specifically for that purpose.

Design considerations

This section describes the design considerations for this reference architecture:

Security, privacy, and compliance

Images received from untrusted sources can contain malware. Because Vision API doesn't execute anything based on the images it analyzes, image-based malware wouldn't affect the API. If you need to scan images, change the Dataflow pipeline to add a scanning step. To achieve the same result, you can also use a separate subscription to the Pub/Sub topic and scan images in a separate process.

For more information, see Automate malware scanning for files uploaded to Cloud Storage.

Vision API uses Identity and Access Management (IAM) for authentication. To access the Vision API, the security principal needs Cloud Storage > Storage Object Viewer (roles/storage.objectViewer) access to the bucket that contains the files that you want to analyze.

For security principles and recommendations that are specific to AI and ML workloads, see AI and ML perspective: Security in the Well-Architected Framework.

Cost optimization

Compared to the other options discussed, like low-latency processing and asynchronous batch processing, this reference architecture uses a cost-efficient way to process the images in streaming pipelines by batching the API requests. The lower-latency direct image streaming mentioned in the Design alternatives section could be more expensive due to the additional Pub/Sub and Dataflow costs. For image processing that doesn't need to happen within seconds or minutes, you can run the Dataflow pipeline in batch mode. Running the pipeline in batch mode can provide some savings when compared to the cost of running the streaming pipeline.

Vision API supports offline asynchronous batch image annotation for all features. The asynchronous request supports up to 2,000 images per batch. In response, Vision API returns JSON files that are stored in a Cloud Storage bucket.
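A simple way to stay within that per-request limit is to split the list of image URIs into chunks before issuing each asynchronous batch request. The helper below is an illustrative sketch of that chunking step only; it does not call the API.

```python
from typing import Iterator, List

# Vision API's documented per-request limit for asynchronous batch annotation.
ASYNC_BATCH_LIMIT = 2000


def batch_image_uris(
    uris: List[str], batch_size: int = ASYNC_BATCH_LIMIT
) -> Iterator[List[str]]:
    """Yield successive batches of image URIs, each no larger than the limit."""
    for start in range(0, len(uris), batch_size):
        yield uris[start:start + batch_size]
```

For example, 4,500 images would be split into batches of 2,000, 2,000, and 500, each of which could then be submitted as one asynchronous request.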

Vision API also provides a set of features for analyzing images. The pricing is per image, per feature. To reduce costs, only request the specific features that you need for your solution.
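Because billing is per feature, it helps to construct each annotation request with an explicit feature list rather than asking for everything. The sketch below builds one entry of the `requests` array in the REST `images:annotate` body shape; the function name is illustrative, and a real call would wrap this in a client library or HTTP request.

```python
from typing import List


def annotate_request(image_uri: str, feature_types: List[str]) -> dict:
    """Build one images:annotate request entry (REST JSON shape) that asks
    only for the listed feature types, to avoid paying for unused features."""
    return {
        "image": {"source": {"imageUri": image_uri}},
        "features": [{"type": t} for t in feature_types],
    }


# Request only label and text detection for a single image:
req = annotate_request("gs://images-input/cat.jpg", ["LABEL_DETECTION", "TEXT_DETECTION"])
```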

To generate a cost estimate based on your projected usage, use the pricing calculator.

For cost optimization principles and recommendations that are specific to AI and ML workloads, see AI and ML perspective: Cost optimization in the Well-Architected Framework.

Performance optimization

Vision API is a resource-intensive API. Because of that, processing images at scale requires careful orchestration of the API calls. The Dataflow pipeline takes care of batching the API requests, gracefully handling exceptions related to reaching quotas, and producing custom metrics of API usage. These metrics can help you decide whether an API quota increase is warranted, or whether the Dataflow pipeline parameters should be adjusted to reduce the frequency of requests. For more information about increasing quota requests for Vision API, see Quotas and limits.
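One common way to handle quota-related exceptions is to retry with capped exponential backoff. The sketch below computes a jitter-free delay schedule as an illustration of the idea; it is not part of the pipeline described here, and production retries usually add random jitter and may already be covered by Dataflow's and the client library's built-in retry behavior.

```python
def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Capped exponential backoff delay (in seconds) for a 0-based retry attempt.

    Delays double on each attempt (1s, 2s, 4s, ...) until they hit the cap.
    Jitter is omitted here so the schedule is deterministic and easy to read.
    """
    return min(cap, base * (2 ** attempt))


# Delay schedule for the first six attempts: 1, 2, 4, 8, 16, 32 seconds.
schedule = [backoff_delay(a) for a in range(6)]
```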

The Dataflow pipeline has several parameters that can affect the processing latencies. For more information about these parameters, see Deploy an ML vision analytics solution with Dataflow and Vision API.

For performance optimization principles and recommendations that are specific to AI and ML workloads, see AI and ML perspective: Performance optimization in the Well-Architected Framework.

Deployment

To deploy this architecture, see Deploy an ML vision analytics solution with Dataflow and Vision API.


