Agentic AI use case: Automate data science workflows

Last reviewed 2025-12-08 UTC

This document describes a high-level architecture for an application that runs a data science workflow to automate complex data analytics and machine learning tasks.

This architecture uses datasets that are hosted in BigQuery or AlloyDB for PostgreSQL. The architecture is a multi-agent system that lets users run actions by using natural language commands, which eliminates the need to write complex SQL or Python code.

The intended audience for this document includes architects, developers, and administrators who build and manage agentic AI applications. This architecture lets business and data teams analyze metrics across a wide range of industries, such as retail, finance, and manufacturing. The document assumes a foundational understanding of agentic AI systems. For information about how agents differ from non-agentic systems, see What is the difference between AI agents, AI assistants, and bots?

The Deployment section of this document provides links to code samples to help you experiment with deploying an agentic AI application that runs a data science workflow.

Architecture

The following diagram shows the architecture for a data science workflow agent.

Architecture for a data science workflow agent.

This architecture includes the following components:

Frontend: Users interact with the multi-agent system through a frontend, such as a chat interface, that runs as a serverless Cloud Run service.
Agents: This architecture uses the following agents:
  • Root agent: A coordinator agent that receives requests from the frontend service. The root agent interprets the user's request and attempts to resolve the request itself. If the task requires specialized tools, the root agent delegates the request to the appropriate specialized agent, as shown in the sketch that follows this list.
  • Specialized agents: The root agent invokes the following specialized agents by using the agent-as-a-tool feature:
    • Analytics agent: A specialized agent for data analysis and visualization. The analytics agent uses the AI model to generate and run Python code to process datasets, create charts, and perform statistical analysis.
    • AlloyDB for PostgreSQL agent: A specialized agent for interacting with data in AlloyDB for PostgreSQL. The agent uses the AI model to interpret the user's request and to generate SQL in the PostgreSQL dialect. The agent securely connects to the database by using MCP Toolbox for Databases and then runs the query to retrieve the requested data.
    • BigQuery agent: A specialized agent for interacting with data in BigQuery. The agent uses the AI model to interpret the user's request and to generate GoogleSQL queries. The agent connects to the database by using the Agent Development Kit (ADK) built-in BigQuery tool and then runs the query to retrieve the requested data.
  • BigQuery ML agent: A subagent of the root agent that is dedicated to machine learning workflows. The agent interacts with BigQuery ML to manage the end-to-end ML lifecycle. The agent can create and train models, run evaluations, and generate predictions based on user requests.
Agents runtime: The AI agents in this architecture are deployed as serverless Cloud Run services.
ADK: ADK provides tools and a framework to develop, test, and deploy agents. ADK abstracts the complexity of agent creation and lets AI developers focus on the agent's logic and capabilities.
AI model and model runtimes: For inference serving, the agents in this example architecture use the latest Gemini model on Vertex AI.
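
The following Python sketch illustrates the coordinator pattern that this list describes: a root agent that resolves simple requests itself, calls a specialized BigQuery agent through the agent-as-a-tool feature, and hands ML work to a BigQuery ML subagent. This is a minimal sketch, not the sample implementation: the model ID, agent instructions, and the BigQuery toolset import path are assumptions that you should verify against the ADK documentation for your ADK version.

```python
# Minimal sketch of the root-agent pattern with ADK.
# Assumptions: the google-adk package is installed and the BigQuery
# toolset import path matches your ADK version.
from google.adk.agents import Agent
from google.adk.tools.agent_tool import AgentTool
from google.adk.tools.bigquery import BigQueryToolset  # assumed import path

# Specialized agent: interprets the request, generates GoogleSQL, and runs
# it with ADK's built-in BigQuery tool.
bigquery_agent = Agent(
    name="bigquery_agent",
    model="gemini-2.0-flash",  # assumed model ID; use the Gemini model that you serve on Vertex AI
    instruction=(
        "Answer data questions by generating and running GoogleSQL "
        "queries against BigQuery."
    ),
    tools=[BigQueryToolset()],
)

# Subagent dedicated to ML workflows. A full implementation would give it
# tools for creating, evaluating, and predicting with BigQuery ML models.
bigquery_ml_agent = Agent(
    name="bigquery_ml_agent",
    model="gemini-2.0-flash",
    instruction="Manage BigQuery ML models: create, train, evaluate, and predict.",
)

# Root agent: answers simple requests itself, calls bigquery_agent as a
# tool for data questions, and can transfer ML tasks to the subagent.
root_agent = Agent(
    name="root_agent",
    model="gemini-2.0-flash",
    instruction=(
        "You coordinate a data science workflow. Answer simple requests "
        "yourself, call bigquery_agent for questions that need data from "
        "BigQuery, and hand machine learning tasks to bigquery_ml_agent."
    ),
    tools=[AgentTool(agent=bigquery_agent)],
    sub_agents=[bigquery_ml_agent],
)
```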

Products used

This example architecture uses the following Google Cloud and open-source products and tools:

  • Cloud Run: A serverless compute platform that lets you run containers directly on top of Google's scalable infrastructure.
  • Agent Development Kit (ADK): A set of tools and libraries to develop, test, and deploy AI agents.
  • Vertex AI: An ML platform that lets you train and deploy ML models and AI applications, and customize LLMs for use in AI-powered applications.
  • Gemini: A family of multimodal AI models developed by Google.
  • BigQuery: An enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence.
  • AlloyDB for PostgreSQL: A fully managed, PostgreSQL-compatible database service that's designed for your most demanding workloads, including hybrid transactional and analytical processing.
  • MCP Toolbox for Databases: An open-source Model Context Protocol (MCP) server that lets AI agents securely connect to databases by managing database complexities like connection pooling, authentication, and observability, as shown in the sketch after this list.
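
The following sketch shows one way an ADK agent can pick up AlloyDB tools from a running MCP Toolbox for Databases server. The server URL, the toolset name, and the toolbox-core client calls shown here are assumptions for illustration; check the MCP Toolbox documentation for the exact client interface and configuration.

```python
# Minimal sketch: load database tools from an MCP Toolbox for Databases
# server and hand them to an ADK agent. The Toolbox server owns the
# AlloyDB connection details (pooling, authentication, observability).
# Assumptions: toolbox-core is installed, a Toolbox server runs at the
# URL below, and its tools.yaml defines a toolset named "alloydb-analytics".
from google.adk.agents import Agent
from toolbox_core import ToolboxSyncClient

toolbox = ToolboxSyncClient("http://127.0.0.1:5000")  # assumed server URL
alloydb_tools = toolbox.load_toolset("alloydb-analytics")  # hypothetical toolset name

# The AlloyDB agent generates PostgreSQL-dialect SQL and runs it through
# the Toolbox tools instead of connecting to the database directly.
alloydb_agent = Agent(
    name="alloydb_agent",
    model="gemini-2.0-flash",  # assumed model ID
    instruction=(
        "Answer questions about data in AlloyDB for PostgreSQL by using "
        "the provided database tools."
    ),
    tools=alloydb_tools,
)
```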

Deployment

To deploy a sample implementation of this architecture, use the Data Science with Multiple Agents sample. The repository provides two sample datasets to demonstrate the system's flexibility: a flight dataset for operational analysis and an ecommerce sales dataset for business analytics.
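
Before you deploy to Cloud Run, you can send natural language requests to an agent locally. The following is a minimal sketch that uses ADK's in-memory runner; the agent definition is a placeholder for the root agent, and the app name, user ID, and example question are illustrative values that might differ in the sample repository.

```python
# Minimal sketch: send one natural-language request to an agent locally by
# using ADK's in-memory runner before you deploy to Cloud Run.
# The agent below is a placeholder; in the full architecture it would be
# the root agent with its specialized agents and tools attached.
import asyncio

from google.adk.agents import Agent
from google.adk.runners import InMemoryRunner
from google.genai import types

root_agent = Agent(
    name="root_agent",
    model="gemini-2.0-flash",  # assumed model ID
    instruction="You coordinate a data science workflow over the sample datasets.",
)

runner = InMemoryRunner(agent=root_agent, app_name="data_science_app")

async def ask(question: str) -> None:
    # Each conversation runs inside a session that the runner's session service manages.
    session = await runner.session_service.create_session(
        app_name="data_science_app", user_id="demo_user"
    )
    message = types.Content(role="user", parts=[types.Part(text=question)])
    async for event in runner.run_async(
        user_id="demo_user", session_id=session.id, new_message=message
    ):
        # Print any text that the agent produces, including the final answer.
        if event.content and event.content.parts and event.content.parts[0].text:
            print(event.content.parts[0].text)

asyncio.run(ask("Summarize weekly sales trends in the ecommerce dataset"))
```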

Contributors

Author: Samantha He | Technical Writer
