AI and ML perspective: Cost optimization
This document in the Well-Architected Framework: AI and ML perspective provides an overview of principles and recommendations to optimize the cost of your AI systems throughout the ML lifecycle. By adopting a proactive and informed cost management approach, your organization can realize the full potential of AI and ML systems and also maintain financial discipline. The recommendations in this document align with the cost optimization pillar of the Google Cloud Well-Architected Framework.
AI and ML systems can help you unlock valuable insights and predictive capabilities from data. For example, you can reduce friction in internal processes, improve user experiences, and gain deeper customer insights. The cloud offers vast amounts of resources and quick time-to-value without large up-front investments for AI and ML workloads. To maximize business value and to align the spending with your business goals, you need to understand the cost drivers, proactively optimize costs, set up spending controls, and adopt FinOps practices.
The recommendations in this document are mapped to the following core principles:
- Define and measure costs and returns
- Optimize resource allocation
- Enforce data management and governance practices
- Automate and streamline with MLOps
- Use managed services and pre-trained models
Define and measure costs and returns
To effectively manage AI and ML costs in Google Cloud, you must define and measure the cloud resource costs and the business value of your AI and ML initiatives. To help you track expenses granularly, Google Cloud provides comprehensive billing and cost management tools, such as the following:
- Cloud Billing reports and tables
- Looker Studio dashboards, budgets, and alerts
- Cloud Monitoring
- Cloud Logging
To make informed decisions about resource allocation and optimization, consider the following recommendations.
Establish business goals and KPIs
Align the technical choices in your AI and ML projects with business goals and key performance indicators (KPIs).
Define strategic objectives and ROI-focused KPIs
Ensure that AI and ML projects are aligned with strategic objectives like revenue growth, cost reduction, customer satisfaction, and efficiency. Engage stakeholders to understand the business priorities. Define AI and ML objectives that are specific, measurable, attainable, relevant, and time-bound (SMART). For example, a SMART objective is: "Reduce chat handling time for customer support by 15% in 6 months by using an AI chatbot".
To make progress towards your business goals and to measure the return on investment (ROI), define KPIs for the following categories of metrics:
- Costs for training, inference, storage, and network resources, including specific unit costs (such as the cost per inference, data point, or task). These metrics help you gain insights into efficiency and cost optimization opportunities. You can track these costs by using Cloud Billing reports and Cloud Monitoring dashboards.
- Business value metrics like revenue growth, cost savings, customer satisfaction, efficiency, accuracy, and adoption. You can track these metrics by using BigQuery analytics and Looker dashboards.
- Industry-specific metrics like the following:
- Retail industry: measure revenue lift and churn
- Healthcare industry: measure patient time and patient outcomes
- Finance industry: measure fraud reduction
- Project-specific metrics. You can track these metrics by using Vertex AI Experiments and evaluation.
- Predictive AI: measure accuracy and precision
- Generative AI: measure adoption, satisfaction, and content quality
- Computer vision AI: measure accuracy
Foster a culture of cost awareness and continuous optimization
Adopt FinOps principles to ensure that each AI and ML project has estimated costs and has ways to measure and track actual costs throughout its lifecycle. Ensure that the costs and business benefits of your projects have assigned owners and clear accountability.
For more information, see Foster a culture of cost awareness in the Cost Optimization pillar of the Google Cloud Well-Architected Framework.
Drive value and continuous optimization through iteration and feedback
Map your AI and ML applications directly to your business goals and measure the ROI.
To validate your ROI hypotheses, start with pilot projects and use the following iterative optimization cycle:
- Monitor continuously and analyze data: Monitor KPIs and costs to identify deviations and opportunities for optimization.
- Make data-driven adjustments: Optimize strategies, models, infrastructure, and resource allocation based on data insights.
- Refine iteratively: Adapt business objectives and KPIs based on lessons learned and evolving business needs. This iteration helps you maintain relevance and strategic alignment.
- Establish a feedback loop: Review performance, costs, and value with stakeholders to inform ongoing optimization and future project planning.
Manage billing data with Cloud Billing and labels
Effective cost optimization requires visibility into the source of each cost element. The recommendations in this section can help you use Google Cloud tools to get granular insights into your AI and ML costs. You can also attribute costs to specific AI and ML projects, teams, and activities. These insights lay the groundwork for cost optimization.
Organize and label Google Cloud resources
- Structure your projects and resources in a hierarchy that reflects your organizational structure and your AI and ML workflows. To track and analyze costs at different levels, organize your Google Cloud resources by using organizations, folders, and projects. For more information, see Decide a resource hierarchy for your Google Cloud landing zone.
- Apply meaningful labels to your resources. You can use labels that indicate the project, team, environment, model name, dataset, use case, and performance requirements. Labels provide valuable context for your billing data and enable granular cost analysis.
- Maintain consistency in your labeling conventions across all of your AI and ML projects. Consistent labeling conventions ensure that your billing data is organized and can be readily analyzed.
Use billing-related tools
- To facilitate detailed analysis and reporting, export the billing data to BigQuery. BigQuery has powerful query capabilities that let you analyze the billing data to help you understand your costs.
- To aggregate costs by labels, projects, or specific time periods, you can write custom SQL queries in BigQuery. Such queries let you attribute costs to specific AI and ML activities, such as model training, hyperparameter tuning, or inference. For an example, see the sketch after this list.
- To identify cost anomalies or unexpected spending spikes, use the analytic capabilities in BigQuery. This approach can help you detect potential issues or inefficiencies in your AI and ML workloads.
- To identify and manage unexpected costs, use the anomaly detection dashboard in Cloud Billing.
- To distribute costs across different teams or departments based on resource usage, use Google Cloud's cost allocation feature. Cost allocation promotes accountability and transparency.
- To gain insights into spending patterns, explore the prebuilt Cloud Billing reports. You can filter and customize these reports to focus on specific AI and ML projects or services.
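The following is a minimal sketch of label-based cost attribution, assuming a standard Cloud Billing export in BigQuery and a hypothetical label key `ml-activity`; substitute your own table name and labeling conventions.

```python
# A minimal sketch: aggregate 30 days of billed cost by a hypothetical
# "ml-activity" resource label, using the standard Cloud Billing export schema.
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder table name; billing exports follow the
# gcp_billing_export_v1_<BILLING_ACCOUNT_ID> naming convention.
query = """
SELECT
  (SELECT l.value FROM UNNEST(labels) AS l WHERE l.key = 'ml-activity') AS ml_activity,
  SUM(cost) AS total_cost
FROM `my-project.billing_dataset.gcp_billing_export_v1_XXXXXX`
WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY ml_activity
ORDER BY total_cost DESC
"""

for row in client.query(query).result():
    print(f"{row.ml_activity}: {row.total_cost:.2f}")
```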
Monitor resources continuously with dashboards, alerts, and reports
To create a scalable and resilient way to track costs, you need continuous monitoring and reporting. Dashboards, alerts, and reports constitute the foundation for effective cost tracking. This foundation lets you maintain constant access to cost information, identify areas of optimization, and ensure alignment between business goals and costs.
Create a reporting system
Create scheduled reports and share them with appropriate stakeholders.
Use Cloud Monitoring to collect metrics from various sources, including your applications, infrastructure, and Google Cloud services like Compute Engine, Google Kubernetes Engine (GKE), and Cloud Run functions. To visualize metrics and logs in real time, you can use the prebuilt Cloud Monitoring dashboards or create custom dashboards. Custom dashboards let you define and add metrics to track specific aspects of your systems, like model performance, API calls, or business-level KPIs.
Use Cloud Logging for centralized collection and storage of logs from your applications, systems, and Google Cloud services. Use the logs for the following purposes:
- Track costs and utilization of resources like CPU, memory, storage, and network.
- Identify cases of over-provisioning (where resources aren't fully utilized) and under-provisioning (where there are insufficient resources). Over-provisioning results in unnecessary costs. Under-provisioning slows training times and might cause performance issues.
- Identify idle or underutilized resources, such as VMs and GPUs, and take steps to shut down or rightsize them to optimize costs.
- Identify cost spikes to detect sudden and unexpected increases in resource usage or costs.
Use Looker or Looker Studio to create interactive dashboards and reports. Connect the dashboards and reports to various data sources, including BigQuery and Cloud Monitoring.
Set alert thresholds based on KPIs
For your KPIs, determine the thresholds that should trigger alerts. Meaningful alert thresholds can help you avoid alert fatigue. Create alerting policies in Cloud Monitoring to get notifications related to your KPIs. For example, you can get notifications when accuracy drops below a certain threshold or latency exceeds a defined limit. Alerts based on log data can notify you about potential cost issues in real time. Such alerts let you take corrective actions promptly and prevent further financial loss. A sketch of creating an alerting policy programmatically follows.
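The following is a minimal sketch of a latency alerting policy created with the Cloud Monitoring Python client. The metric type, threshold value, and project ID are illustrative assumptions; substitute the metrics that back your own KPIs and attach notification channels as needed.

```python
# A minimal sketch, assuming a custom metric "custom.googleapis.com/model/latency"
# that your serving code already writes; metric name and threshold are illustrative.
from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

project = "projects/my-project"  # replace with your project ID
client = monitoring_v3.AlertPolicyServiceClient()

condition = monitoring_v3.AlertPolicy.Condition(
    display_name="Inference latency above limit",
    condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
        filter='metric.type = "custom.googleapis.com/model/latency"',
        comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
        threshold_value=500.0,  # assumed unit: milliseconds
        duration=duration_pb2.Duration(seconds=300),  # sustained for 5 minutes
    ),
)

policy = monitoring_v3.AlertPolicy(
    display_name="Model latency alert",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[condition],
)

created = client.create_alert_policy(name=project, alert_policy=policy)
print(f"Created policy: {created.name}")
```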
Optimize resource allocation
To achieve cost efficiency for your AI and ML workloads in Google Cloud, you must optimize resource allocation. To help you avoid unnecessary expenses and ensure that your workloads have the resources that they need to perform optimally, align resource allocation with the needs of your workloads.
To optimize the allocation of cloud resources to AI and ML workloads, consider the following recommendations.
Use autoscaling to dynamically adjust resources
Use Google Cloud services that support autoscaling, which automatically adjusts resource allocation to match the current demand. Autoscaling provides the following benefits:
- Cost and performance optimization: You avoid paying for idle resources. At the same time, autoscaling ensures that your systems have the necessary resources to perform optimally, even at peak load.
- Improved efficiency: You free up your team to focus on other tasks.
- Increased agility: You can respond quickly to changing demands and maintain high availability for your applications.
The following table summarizes the techniques that you can use to implement autoscaling for different stages of your AI projects.

| Stage | Autoscaling techniques |
|---|---|
| Training | Use managed training services like Vertex AI training, which provision the resources that a job requests and release them when the job completes. For custom training workloads on GKE or Ray on Vertex AI, configure cluster autoscaling so that worker capacity tracks the training queue. |
| Inference | Deploy models to Vertex AI endpoints with minimum and maximum replica counts so that prediction nodes scale with traffic. For containerized serving, use the GKE Horizontal Pod Autoscaler or the built-in autoscaling in Cloud Run. |
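As one concrete example, the following sketch deploys a model to a Vertex AI endpoint with replica bounds so that prediction nodes autoscale with traffic. The project, model ID, and machine type are placeholders.

```python
# A minimal sketch of inference autoscaling on Vertex AI, assuming a model
# that's already uploaded to Model Registry; IDs and machine type are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model("projects/my-project/locations/us-central1/models/1234567890")

# Vertex AI scales prediction nodes between the replica bounds based on traffic.
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,  # keep at least one node warm
    max_replica_count=5,  # cap spend at peak load
)
print(endpoint.resource_name)
```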
Start with small models and datasets
To help reduce costs, test ML hypotheses at a small scale when possible and use an iterative approach. This approach, with smaller models and datasets, provides the following benefits:
- Reduced costs from the start: Less compute power, storage, and processing time can result in lower costs during the initial experimentation and development phases.
- Faster iteration: Less training time is required, which lets you iterate faster, explore alternative approaches, and identify promising directions more efficiently.
- Reduced complexity: Simpler debugging, analysis, and interpretation of results, which leads to faster development cycles.
- Efficient resource utilization: Reduced chance of over-provisioning resources. You provision only the resources that are necessary for the current workload.
Consider the following recommendations:
- Use sample data first: Train your models on a representative subset of your data. This approach lets you assess the model's performance and identify potential issues without processing the entire dataset.
- Experiment by using notebooks: Start with smaller instances and scale as needed. You can use Vertex AI Workbench, a managed Jupyter notebook environment that's well suited for experimentation with different model architectures and datasets.
- Start with simpler or pre-trained models: Use Vertex AI Model Garden to discover and explore the pre-trained models. Such models require fewer computational resources. Gradually increase the complexity as needed based on performance requirements.
- Use pre-trained models for tasks like image classification and natural language processing. To save on training costs, you can fine-tune the models on smaller datasets initially.
- Use BigQuery ML for structured data. BigQuery ML lets you create and deploy models directly within BigQuery. This approach can be cost-effective for initial experimentation, because you can take advantage of the pay-per-query pricing model for BigQuery. For an example, see the sketch after this list.
- Scale for resource optimization: Use Google Cloud's flexible infrastructure to scale resources as needed. Start with smaller instances and adjust their size or number when necessary.
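The following sketch shows initial experimentation with BigQuery ML on a sampled dataset, as mentioned in the list above. The dataset, table, and column names are hypothetical placeholders.

```python
# A minimal sketch of training a model with BigQuery ML on a data sample.
from google.cloud import bigquery

client = bigquery.Client()

# CREATE MODEL is billed like a query, so training on a LIMIT-ed sample
# keeps early experimentation inexpensive.
sql = """
CREATE OR REPLACE MODEL `my_dataset.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT tenure_months, monthly_spend, support_tickets, churned
FROM `my_dataset.customers`
LIMIT 10000
"""

client.query(sql).result()  # waits for training to complete
```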
Discover resource requirements through experimentation
Resource requirements for AI and ML workloads can vary significantly. To optimize resource allocation and costs, you must understand the specific needs of your workloads through systematic experimentation. To identify the most efficient configuration for your models, test different configurations and analyze their performance. Then, based on the requirements, right-size the resources that you use for training and serving.
We recommend the following approach for experimentation:
- Start with a baseline: Begin with a baseline configuration based on your initial estimates of the workload requirements. To create a baseline, you can use the cost estimator for new workloads or use an existing billing report. For more information, see Unlock the true cost of enterprise AI on Google Cloud.
- Understand your quotas: Before launching extensive experiments, familiarize yourself with your Google Cloud project quotas for the resources and APIs that you plan to use. The quotas determine the range of configurations that you can realistically test. By becoming familiar with quotas, you can work within the available resource limits during the experimentation phase.
- Experiment systematically: Adjust parameters like the number of CPUs, amount of memory, number and type of GPUs and TPUs, and storage capacity. Vertex AI training and Vertex AI predictions let you experiment with different machine types and configurations, as shown in the sketch after this list.
- Monitor utilization, cost, and performance: Track the resource utilization, cost, and key performance metrics, such as training time, inference latency, and model accuracy, for each configuration that you experiment with.
- To track resource utilization and performance metrics, you can use the Vertex AI console.
- To collect and analyze detailed performance metrics, use Cloud Monitoring.
- To view costs, use Cloud Billing reports and Cloud Monitoring dashboards.
- To identify performance bottlenecks in your models and optimize resource utilization, use profiling tools like Vertex AI TensorBoard.
- Analyze costs: Compare the cost and performance of each configuration to identify the most cost-effective option.
- Establish resource thresholds and improvement targets based on quotas: Define thresholds for when scaling begins to yield diminishing returns in performance, such as minimal reduction in training time or latency for a significant cost increase. Consider project quotas when setting these thresholds. Determine the point where the cost and potential quota implications of further scaling are no longer justified by performance gains.
- Refine iteratively: Repeat the experimentation process with refined configurations based on your findings. Always ensure that the resource usage remains within your allocated quotas and aligns with established cost-benefit thresholds.
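The following sketch runs one configuration trial as a Vertex AI custom training job. The container image, machine type, and accelerator settings are assumptions; vary them across trials and compare cost and training time for each run.

```python
# A minimal sketch of one configuration trial as a Vertex AI custom job.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.CustomJob(
    display_name="config-trial-n1s8-t4",
    worker_pool_specs=[{
        "machine_spec": {
            "machine_type": "n1-standard-8",
            "accelerator_type": "NVIDIA_TESLA_T4",
            "accelerator_count": 1,
        },
        "replica_count": 1,
        "container_spec": {
            "image_uri": "us-docker.pkg.dev/my-project/trainers/train:latest"
        },
    }],
)
job.run()  # blocks until the job finishes; job.resource_name identifies the trial
```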
Use MLOps to reduce inefficiencies
As organizations increasingly use ML to drive innovation and efficiency, managing the ML lifecycle effectively becomes critical. ML operations (MLOps) is a set of practices that automate and streamline the ML lifecycle, from model development to deployment and monitoring.
Align MLOps with cost drivers
To take advantage of MLOps for cost efficiency, identify the primary cost drivers in the ML lifecycle. Then, you can adopt and implement MLOps practices that are aligned with the cost drivers. Prioritize and adopt the MLOps features that address the most impactful cost drivers. This approach helps ensure a manageable and successful path to significant cost savings.
Implement MLOps for cost optimization
The following are common MLOps practices that help to reduce cost:
- Version control: Tools like Git can help you to track versions of code, data, and models. Version control ensures reproducibility, facilitates collaboration, and prevents costly rework that can be caused by versioning issues.
- Continuous integration and continuous delivery (CI/CD): Cloud Build and Artifact Registry let you implement CI/CD pipelines to automate building, testing, and deployment of your ML models. CI/CD pipelines ensure efficient resource utilization and minimize the costs associated with manual interventions.
- Observability: Cloud Monitoring and Cloud Logging let you track model performance in production, identify issues, and trigger alerts for proactive intervention. Observability lets you maintain model accuracy, optimize resource allocation, and prevent costly downtime or performance degradation.
- Model retraining: Vertex AI Pipelines simplifies the processes for retraining models periodically or when performance degrades. When you use Vertex AI Pipelines for retraining, it helps ensure that your models remain accurate and efficient, which can prevent unnecessary resource consumption and maintain optimal performance. For a minimal retraining pipeline, see the sketch after this list.
- Automated testing and evaluation: Vertex AI helps you accelerate and standardize model evaluation. Implement automated tests throughout the ML lifecycle to ensure the quality and reliability of your models. Such tests can help you catch errors early, prevent costly issues in production, and reduce the need for extensive manual testing.
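As a sketch of the retraining practice above, the following minimal pipeline uses the Kubeflow Pipelines (kfp) SDK and submits to Vertex AI Pipelines. The component body, bucket paths, and project details are placeholders.

```python
# A minimal sketch of a retraining pipeline; the component is a placeholder
# for your real training step.
from kfp import dsl, compiler
from google.cloud import aiplatform

@dsl.component(base_image="python:3.11")
def train_model(dataset_uri: str) -> str:
    # Placeholder logic; in practice, load data from dataset_uri, train,
    # and write the model artifacts to Cloud Storage.
    return f"trained-on:{dataset_uri}"

@dsl.pipeline(name="retraining-pipeline")
def retraining_pipeline(dataset_uri: str = "gs://my-bucket/data/latest"):
    train_model(dataset_uri=dataset_uri)

compiler.Compiler().compile(retraining_pipeline, "retraining_pipeline.json")

# Submit the compiled pipeline to Vertex AI Pipelines; schedule it to run
# periodically or trigger it when monitoring detects drift.
aiplatform.init(project="my-project", location="us-central1")
job = aiplatform.PipelineJob(
    display_name="retraining",
    template_path="retraining_pipeline.json",
)
job.run()
```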
For more information, see MLOps: Continuous delivery and automation pipelines in machine learning.
Enforce data management and governance practices
Effective data management and governance practices are critical to cost optimization. Well-organized data can encourage teams to reuse datasets, avoid needless duplication, and reduce the effort to obtain high-quality data. By proactively managing data, you can reduce storage costs, enhance data quality, and ensure that your ML models are trained on the most relevant and valuable data.
To implement data management and governance practices, consider the following recommendations.
Establish and adopt a data governance framework
The growing prominence of AI and ML has made data the most valuable asset for organizations that are undergoing digital transformation. A robust framework for data governance is a crucial requirement for managing AI and ML workloads cost-effectively at scale. A data governance framework with clearly defined policies, procedures, and roles provides a structured approach for managing data throughout its lifecycle. Such a framework helps to improve data quality, enhance security, improve utilization, and reduce redundancy.
Establish a data governance framework
There are many pre-existing frameworks for data governance, such as the frameworks published by the EDM Council, with options available for different industries and organization sizes. Choose and adapt a framework that aligns with your specific needs and priorities.
Implement the data governance framework
Google Cloud provides the following services and tools to help you implement a robust data governance framework:
Dataplex Universal Catalog is an intelligent data fabric that helps you unify distributed data and automate data governance without the need to consolidate data sets in one place. This helps to reduce the cost to distribute and maintain data, facilitate data discovery, and promote reuse.
- To organize data, use Dataplex Universal Catalog abstractions and set up logical data lakes and zones.
- To administer access to data lakes and zones, use Google Groups and Dataplex Universal Catalog roles.
- To streamline data quality processes, enable auto data quality.
Dataplex Universal Catalog is also a fully managed and scalable metadata management service. The catalog provides a foundation that ensures that data assets are accessible and reusable.
- Metadata from the supported Google Cloud sources is automatically ingested into the universal catalog. For data sources outside of Google Cloud, create custom entries.
- To improve the discoverability and management of data assets, enrich technical metadata with business metadata by using aspects.
- Ensure that data scientists and ML practitioners have sufficient permissions to access Dataplex Universal Catalog and use the search function.
BigQuery sharing lets you efficiently and securely exchange data assets across your organizations to address challenges of data reliability and cost.
- Set up data exchanges and ensure that curated data assets can be viewed as listings.
- Use data clean rooms to securely manage access to sensitive data and efficiently partner with external teams and organizations on AI and ML projects.
- Ensure that data scientists and ML practitioners have sufficient permissions to view and publish datasets to BigQuery sharing.
Make datasets and features reusable throughout the ML lifecycle
For significant efficiency and cost benefits, reuse datasets and features across multiple ML projects. When you avoid redundant data engineering and feature development efforts, your organization can accelerate model development, reduce infrastructure costs, and free up valuable resources for other critical tasks.
Google Cloud provides the following services and tools to help you reuse datasets and features:
- Data and ML practitioners can publish data products to maximize reuse across teams. The data products can then be discovered and used through Dataplex Universal Catalog and BigQuery sharing.
- For tabular and structured datasets, you can use Vertex AI Feature Store to promote reusability and streamline feature management through BigQuery.
- You can store unstructured data in Cloud Storage and govern the data by using BigQuery object tables and signed URLs.
- You can manage vector embeddings by including metadata in your Vector Search indexes.
Automate and streamline with MLOps
A primary benefit of adopting MLOps practices is a reduction in costs for technology and personnel. Automation helps you avoid the duplication of ML activities and reduce the workload for data scientists and ML engineers.
To automate and streamline ML development with MLOps, consider the following recommendations.
Automate and standardize data collection and processing
To help reduce ML development effort and time, automate and standardize your data collection and processing technologies.
Automate data collection and processing
This section summarizes the products, tools, and techniques that you can use to automate data collection and processing.
Identify and choose the relevant data sources for your AI and ML tasks:
- Database options such as Cloud SQL, Spanner, AlloyDB for PostgreSQL, Firestore, and BigQuery. Your choice depends on your requirements, such as latency on write access (static or dynamic), data volume (high or low), and data format (structured, unstructured, or semi-structured). For more information, see Google Cloud databases.
- Data lakes such as Cloud Storage with BigLake.
- Dataplex Universal Catalog for governing data across sources.
- Streaming events platforms such as Pub/Sub, Dataflow, or Apache Kafka.
- External APIs.
For each of your data sources, choose an ingestion tool:
- Dataflow: For batch and stream processing of data from various sources, with ML-component integration. For an event-driven architecture, you can combine Dataflow with Eventarc to efficiently process data for ML. To enhance MLOps and ML job efficiency, use GPU and right-fitting capabilities.
- Cloud Run functions: For event-driven data ingestion that gets triggered by changes in data sources for real-time applications.
- BigQuery: For classical tabular data ingestion with frequent access.
Choose tools for data transformation and loading:
- Use tools such as Dataflow or Dataform to automate data transformations like feature scaling, encoding categorical variables, and creating new features in batch, streaming, or real time. The tools that you select depend upon your requirements and chosen services.
- Use Vertex AI Feature Store to automate feature creation and management. You can centralize features for reuse across different models and projects.
Standardize data collection and processing
To discover, understand, and manage data assets, use metadata management services like Dataplex Universal Catalog. It helps you standardize data definitions and ensure consistency across your organization.
To enforce standardization and avoid the cost of maintaining multiple custom implementations, use automated training pipelines and orchestration. For more information, see the next section.
Automate training pipelines and reuse existing assets
To boost efficiency and productivity in MLOps, automated training pipelines are crucial. Google Cloud offers a robust set of tools and services to build and deploy training pipelines, with a strong emphasis on reusing existing assets. Automated training pipelines help to accelerate model development, ensure consistency, and reduce redundant effort.
Automate training pipelines
The following table describes the Google Cloud services and features that you can use to automate the different functions of a training pipeline.

| Function | Google Cloud services and features |
|---|---|
| Orchestration: Define complex ML workflows that consist of multiple steps and dependencies. You can define each step as a separate containerized task, which helps you manage and scale individual tasks with ease. | Use Vertex AI Pipelines, which runs pipelines that you build with the Kubeflow Pipelines SDK or TensorFlow Extended (TFX). |
| Versioning: Track and control different versions of pipelines and components to ensure reproducibility and auditability. | Store Kubeflow pipeline templates in a Kubeflow Pipelines repository in Artifact Registry. |
| Reusability: Reuse existing pipeline components and artifacts, such as prepared datasets and trained models, to accelerate development. | Store your pipeline templates in Cloud Storage and share them across your organization. |
| Monitoring: Monitor pipeline execution to identify and address any issues. | Use Cloud Logging and Cloud Monitoring. For more information, see Monitor resources continuously with dashboards, alerts, and reports. |
Expand reusability beyond pipelines
Look for opportunities to expand reusability beyond training pipelines. The following are examples of Google Cloud capabilities that let you reuse ML features, datasets, models, and code.
- Vertex AI Feature Store provides a centralized repository for organizing, storing, and serving ML features. It lets you reuse features across different projects and models, which can improve consistency and reduce feature engineering effort. You can store, share, and access features for both online and offline use cases.
- Vertex AI datasets enable teams to create and manage datasets centrally, so your organization can maximize reusability and reduce data duplication. Your teams can search and discover the datasets by using Dataplex Universal Catalog.
- Vertex AI Model Registry lets you store, manage, and deploy your trained models. Model Registry lets you reuse the models in subsequent pipelines or for online prediction, which helps you take advantage of previous training efforts. For a sketch of registering a model, see the example after this list.
- Custom containers let you package your training code and dependencies into containers and store the containers in Artifact Registry. Custom containers let you provide consistent and reproducible training environments across different pipelines and projects.
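The following sketch registers a trained model in Model Registry so that later pipelines can reuse it. The artifact URI and serving container are placeholders; use the prebuilt or custom container that matches your framework.

```python
# A minimal sketch of registering a trained model in Vertex AI Model Registry.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model.upload(
    display_name="churn-model",
    artifact_uri="gs://my-bucket/models/churn/",  # output of an earlier training run
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
)
print(model.resource_name)  # reference this name in downstream pipelines
```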
Use Google Cloud services for model evaluation and tuning
Google Cloud offers a powerful suite of tools and services to streamline and automate model evaluation and tuning. These tools and services can help you reduce your time to production and reduce the resources required for continuous training and monitoring. By using these services, your AI and ML teams can enhance model performance with fewer expensive iterations, achieve faster results, and minimize wasted compute resources.
Use resource-efficient model evaluation and experimentation
Begin an AI project with experiments before you scale up your solution. In your experiments, track various metadata such as dataset version, model parameters, and model type. For further reproducibility and comparison of the results, use metadata tracking in addition to code versioning, similar to the capabilities in Git. To avoid missing information or deploying the wrong version in production, use Vertex AI Experiments before you implement full-scale deployment or training jobs.
Vertex AI Experiments lets you do the following:
- Streamline and automate metadata tracking and discovery through a user-friendly UI and API for production-ready workloads.
- Analyze the model's performance metrics and compare metrics across multiple models. For a sketch of experiment tracking, see the example after this list.
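A minimal sketch of experiment tracking with the Vertex AI SDK follows. The experiment name, parameters, and metrics are illustrative.

```python
# A minimal sketch of tracking one experiment run with Vertex AI Experiments.
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",
    location="us-central1",
    experiment="pricing-model-experiments",
)

aiplatform.start_run("run-lr-0-01")
aiplatform.log_params({"learning_rate": 0.01, "batch_size": 64})
# ... train the model here ...
aiplatform.log_metrics({"accuracy": 0.92, "training_cost_usd": 3.40})
aiplatform.end_run()
```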
After the model is trained, continuously monitor the performance and data drift over time for incoming data. To streamline this process, use Vertex AI Model Monitoring to directly access the created models in Model Registry. Model Monitoring also automates monitoring for data and results through online and batch predictions. You can export the results to BigQuery for further analysis and tracking.
Choose optimal strategies to automate training
For hyperparameter tuning, we recommend the following approaches:
- To automate the process of finding the optimal hyperparameters for your models, use Vertex AI hyperparameter tuning. Vertex AI uses advanced algorithms to explore the hyperparameter space and identify the best configuration. For an example, see the sketch after this list.
- For efficient hyperparameter tuning, consider using Bayesian optimization techniques, especially when you deal with complex models and large datasets.
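The following sketch configures an automated hyperparameter search on Vertex AI, per the first recommendation above. The training image and metric name are assumptions, and the training container must report the metric (for example, by using the cloudml-hypertune library).

```python
# A minimal sketch of an automated hyperparameter search on Vertex AI.
from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

aiplatform.init(project="my-project", location="us-central1")

custom_job = aiplatform.CustomJob(
    display_name="hpt-base-job",
    worker_pool_specs=[{
        "machine_spec": {"machine_type": "n1-standard-8"},
        "replica_count": 1,
        "container_spec": {
            "image_uri": "us-docker.pkg.dev/my-project/trainers/train:latest"
        },
    }],
)

hpt_job = aiplatform.HyperparameterTuningJob(
    display_name="hpt-search",
    custom_job=custom_job,
    metric_spec={"accuracy": "maximize"},
    parameter_spec={
        "learning_rate": hpt.DoubleParameterSpec(min=1e-4, max=1e-1, scale="log"),
        "batch_size": hpt.DiscreteParameterSpec(values=[32, 64, 128], scale="linear"),
    },
    max_trial_count=20,      # caps total spend
    parallel_trial_count=4,  # trades search efficiency for wall-clock time
)
hpt_job.run()
```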
For distributed training, we recommend the following approaches:
- For large datasets and complex models, use the distributed training infrastructure of Vertex AI. This approach lets you train your models on multiple machines, which helps to significantly reduce training time and associated costs. Use tools like the following:
- Vertex AI tuning to perform supervised fine-tuning of Gemini, Imagen, and other models.
- Vertex AI training or Ray on Vertex AI for custom distributed training.
- Choose optimized ML frameworks, like Keras and PyTorch, that support distributed training and efficient resource utilization.
Use explainable AI
It's crucial to understand why a model makes certain decisions and to identify potential biases or areas for improvement. Use Vertex Explainable AI to gain insights into your model's predictions. Vertex Explainable AI offers a way to automate feature-based and example-based explanations that are linked to your Vertex AI experiments.
- Feature-based: To understand which features are most influential in your model's predictions, analyze feature attributions. This understanding can guide feature-engineering efforts and improve model interpretability.
- Example-based: To return a list of examples (typically from the training set) that are most similar to the input, Vertex AI uses nearest neighbor search. Because similar inputs generally yield similar predictions, you can use these explanations to explore and explain a model's behavior.
Use managed services and pre-trained models
Adopt an incremental approach to model selection and model development. This approach helps you avoid excessive costs that are associated with starting afresh every time. To control costs, use ML frameworks, managed services, and pre-trained models.
To get the maximum value from managed services and pre-trained models, consider the following recommendations.
Use notebooks for exploration and experiments
Notebook environments are crucial for cost-effective ML experimentation. A notebook provides an interactive and collaborative space for data scientists and engineers to explore data, develop models, share knowledge, and iterate efficiently. Collaboration and knowledge sharing through notebooks significantly accelerate development, code reviews, and knowledge transfer. Notebooks help streamline workflows and reduce duplicated effort.
Instead of procuring and managing expensive hardware for your development environment, you can use the scalable and on-demand infrastructure of Vertex AI Workbench and Colab Enterprise.
Vertex AI Workbench is a Jupyter notebook development environment for the entire data science workflow. You can interact with Vertex AI and other Google Cloud services from within an instance's Jupyter notebook. Vertex AI Workbench integrations and features help you do the following:
- Access and explore data from a Jupyter notebook by using BigQuery and Cloud Storage integrations.
- Automate recurring updates to a model by using scheduled executions of code that runs on Vertex AI.
- Process data quickly by running a notebook on a Dataproc cluster.
- Run a notebook as a step in a pipeline by using Vertex AI Pipelines.
Colab Enterprise is a collaborative, managed notebook environment that has the security and compliance capabilities of Google Cloud. Colab Enterprise is ideal if your project's priorities include collaborative development and reducing the effort to manage infrastructure. Colab Enterprise integrates with Google Cloud services and AI-powered assistance that uses Gemini. Colab Enterprise lets you do the following:
- Work in notebooks without the need to manage infrastructure.
- Share a notebook with a single user, Google group, or Google Workspace domain. You can control notebook access through Identity and Access Management (IAM).
- Interact with features built into Vertex AI and BigQuery.
To track changes and revert to previous versions when necessary, you can integrate your notebooks with version control tools like Git.
Start with existing and pre-trained models
Training complex models from scratch, especially deep-learning models, requires significant computational resources and time. To accelerate your model selection and development process, start with existing and pre-trained models. These models, which are trained on vast datasets, eliminate the need to train models from scratch and significantly reduce cost and development time.
Reduce training and development costs
Select an appropriate model or API for each ML task and combine them to create an end-to-end ML development process.
Vertex AI Model Garden offers a vast collection of pre-trained models for tasks such as image classification, object detection, and natural language processing. The models are grouped into the following categories:
- Google models like the Gemini family of models and Imagen for image generation.
- Open-source models like Gemma and Llama.
- Third-party models from partners like Anthropic and Mistral AI.
Google Cloud provides AI and ML APIs that let developers integrate powerful AI capabilities into applications without the need to build models from scratch.
- Cloud Vision API lets you derive insights from images. This API is valuable for applications like image analysis, content moderation, and automated data entry.
- Cloud Natural Language API lets you analyze text to understand its structure and meaning. This API is useful for tasks like customer feedback analysis, content categorization, and understanding social media trends.
- Speech-to-Text API converts audio to text. This API supports a wide range of languages and dialects.
- Video Intelligence API analyzes video content to identify objects, scenes, and actions. Use this API for video content analysis, content moderation, and video search.
- Document AI API processes documents to extract, classify, and understand data. This API helps you automate document processing workflows.
- Dialogflow API enables the creation of conversational interfaces, such as chatbots and voice assistants. You can use this API to create customer service bots and virtual assistants.
- Gemini API in Vertex AI provides access to Google's most capable and general-purpose AI model. For a sketch of calling Gemini through the Vertex AI SDK, see the example after this list.
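As an example of using these APIs, the following sketch calls a Gemini model through the Vertex AI SDK for Python. The model version shown is illustrative, and availability varies by region.

```python
# A minimal sketch of calling a Gemini model through the Vertex AI SDK.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")

model = GenerativeModel("gemini-1.5-flash")  # example model version
response = model.generate_content(
    "Summarize the main cost drivers of ML training in two sentences."
)
print(response.text)
```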
Reduce tuning costs
To help reduce the need for extensive data and compute time, fine-tune your pre-trained models on specific datasets. We recommend the following approaches:
- Transfer learning: Use the knowledge from a pre-trained model for a new task, instead of starting from scratch. This approach requires less data and compute time, which helps to reduce costs.
- Adapter tuning (parameter-efficient tuning): Adapt models to new tasks or domains without full fine-tuning. This approach requires significantly lower computational resources and a smaller dataset.
- Supervised fine-tuning: Adapt model behavior with a labeled dataset. This approach simplifies the management of the underlying infrastructure and the development effort that's required for a custom training job. For a sketch of supervised fine-tuning, see the example after this list.
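The following sketch starts a supervised fine-tuning job with the Vertex AI SDK, per the last approach above. The base model name and dataset URI are assumptions; the dataset is a JSONL file of labeled examples in Cloud Storage.

```python
# A minimal sketch of supervised fine-tuning on Vertex AI.
import time

import vertexai
from vertexai.tuning import sft

vertexai.init(project="my-project", location="us-central1")

tuning_job = sft.train(
    source_model="gemini-1.5-flash-002",  # example base model name
    train_dataset="gs://my-bucket/tuning/train.jsonl",
)

# Tuning runs asynchronously; poll until the managed job completes.
while not tuning_job.has_ended:
    time.sleep(60)
    tuning_job.refresh()

print(tuning_job.tuned_model_endpoint_name)
```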
Explore and experiment by using Vertex AI Studio
Vertex AI Studio lets you rapidly test, prototype, and deploy generative AI applications.
- Integration with Model Garden: Provides quick access to the latest models and lets you efficiently deploy the models to save time and costs.
- Unified access to specialized models: Consolidates access to a wide range of pre-trained models and APIs, including those for chat, text, media, translation, and speech. This unified access can help you reduce the time spent searching for and integrating individual services.
Use managed services to train or serve models
Managed services can help reduce the cost of model training and simplify the infrastructure management, which lets you focus on model development and optimization. This approach can result in significant cost benefits and increased efficiency.
Reduce operational overhead
To reduce the complexity and cost of infrastructure management, use managed services such as the following:
- Vertex AI training provides a fully managed environment for training your models at scale. You can choose from various prebuilt containers with popular ML frameworks or use your own custom containers. Google Cloud handles infrastructure provisioning, scaling, and maintenance, so you incur lower operational overhead.
- Vertex AI predictions handles infrastructure scaling, load balancing, and request routing. You get high availability and performance without manual intervention.
- Ray on Vertex AI provides a fully managed Ray cluster. You can use the cluster to run complex custom AI workloads that perform many computations (hyperparameter tuning, model fine-tuning, distributed model training, and reinforcement learning from human feedback) without the need to manage your own infrastructure.
Use managed services to optimize resource utilization
For details about efficient resource utilization, see Optimize resource utilization.
Contributors
Authors:
- Isaac Lo | AI Business Development Manager
- Anastasia Prokaeva | Field Solutions Architect, Generative AI
- Amy Southwood | Technical Solutions Consultant, Data Analytics & AI
Other contributors:
- Filipe Gracio, PhD | Customer Engineer, AI/ML Specialist
- Kumar Dhanagopal | Cross-Product Solution Developer
- Marwan Al Shawi | Partner Customer Engineer
- Nicolas Pintaux | Customer Engineer, Application Modernization Specialist