Oct 31, 2025 · Aug 15, 2025 · Aug 18, 2025 · Aug 18, 2025 · Aug 18, 2025 · Aug 19, 2025
diff --git a/notebooks/generative_ai/bq_dataframes_llm_output_schema.ipynb b/notebooks/generative_ai/bq_dataframes_llm_output_schema.ipynb
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# BigFrames LLM Output Schema\n",
    "# Format LLM output using an output schema\n",
    "\n",
    "<table align=\"left\">\n",
    "\n",
    "  <td>\n",
    "    <a href=\"https://console.cloud.google.com/bigquery/import?url=https://github.com/googleapis/python-bigquery-dataframes/blob/main/notebooks/generative_ai/bq_dataframes_llm_output_schema.ipynb\">\n",
    "      <img src=\"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTW1gvOovVlbZAIZylUtf5Iu8-693qS1w5NJw&s\" alt=\"BQ logo\" width=\"35\">\n",
    "      Open in BQ Studio\n",
    "      Open in BigQuery Studio\n",
    "    </a>\n",
    "  </td>\n",
    "</table>\n"
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This Notebook introduces BigFrames LLM with output schema to generate structured output dataframes."
    "This notebook shows you how to create structured LLM output by specifying an output schema when generating predictions with a Gemini model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup"
    "## Costs\n",
    "\n",
    "This tutorial uses billable components of Google Cloud:\n",
    "\n",
    "* BigQuery (compute)\n",
    "* BigQuery ML\n",
    "* Generative AI support on Vertex AI\n",
    "\n",
    "Learn about [BigQuery compute pricing](https://cloud.google.com/bigquery/pricing#analysis_pricing_models), [Generative AI support on Vertex AI pricing](https://cloud.google.com/vertex-ai/generative-ai/pricing),\n",
    "and [BigQuery ML pricing](https://cloud.google.com/bigquery/pricing#section-11),\n",
    "and use the [Pricing Calculator](https://cloud.google.com/products/calculator/)\n",
    "to generate a cost estimate based on your projected usage."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Before you begin\n",
    "\n",
    "Complete the tasks in this section to set up your environment."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up your Google Cloud project\n",
    "\n",
    "**The following steps are required, regardless of your notebook environment.**\n",
    "\n",
    "1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 credit towards your compute/storage costs.\n",
    "\n",
    "2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).\n",
    "\n",
    "3. [Click here](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,bigqueryconnection.googleapis.com,aiplatform.googleapis.com) to enable the following APIs:\n",
    "\n",
    "  * BigQuery API\n",
    "  * BigQuery Connection API\n",
    "  * Vertex AI API\n",
    "\n",
    "4. If you are running this notebook locally, install the [Cloud SDK](https://cloud.google.com/sdk)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Authenticate your Google Cloud account\n",
    "\n",
    "Depending on your Jupyter environment, you might have to manually authenticate. Follow the relevant instructions below.\n",
    "\n",
    "**BigQuery Studio** or **Vertex AI Workbench**\n",
    "\n",
    "Do nothing; you are already authenticated.\n",
    "\n",
    "**Local JupyterLab instance**\n",
    "\n",
    "Uncomment and run the following cell:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "PROJECT = \"bigframes-dev\" # replace with your project\n",
    "# ! gcloud auth login"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Colab**\n",
    "\n",
    "Uncomment and run the following cell:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# from google.colab import auth\n",
    "# auth.authenticate_user()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up your project"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set your project and import necessary modules. If you don't know your project ID, see [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "PROJECT = \"\" # replace with your project\n",
    "import bigframes\n",
    "# Setup project\n",
    "bigframes.options.bigquery.project = PROJECT\n",
    "bigframes.options.display.progress_bar = None\n",
    "\n",
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Create a BigFrames DataFrame and a Gemini model\n",
    "Starting from creating a simple dataframe of several cities and a Gemini model in BigFrames"
    "## Create a DataFrame and a Gemini model\n",
    "Create a simple [DataFrame](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.dataframe.DataFrame) of several cities:"
   ]
  },
  {
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Connect to a Gemini model using the [`GeminiTextGenerator` class](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.llm.GeminiTextGenerator):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Generate structured output data\n",
    "Before, llm models can only generate text output. Saying if you want to know whether the city is a US city, for example:"
    "## Generate structured output data\n",
    "Previously, LLMs could only generate text output. For example, you could generate output that identifies whether a given city is a US city:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The outputs are text results that human can read. But if want the output data to be more useful for analysis, it is better to transfer to structured data like boolean, int or float values. Usually the process wasn't easy.\n",
    "The output is text that a human can read. However, if you want the output to be more useful for analysis, it is better to format the output as structured data. This is especially true when you want to have Boolean, integer, or float values to work with instead of string values. Previously, formatting the output in this way wasn't easy.\n",
    "\n",
    "Now you can get structured output out-of-the-box by specifying the output_schema parameter in Gemini model predict method. In below example, the outputs are only boolean values."
    "Now, you can get structured output out-of-the-box by specifying the `output_schema` parameter when calling the Gemini model's [`predict` method](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.ml.llm.GeminiTextGenerator#bigframes_ml_llm_GeminiTextGenerator_predict). In the following example, the model output is formatted as Boolean values:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also get float or int values, for example, to get populations in millions:"
    "You can also format model output as float or integer values. In the following example, the model output is formatted as float values to show the city's population in millions:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And yearly rainy days:"
    "In the following example, the model output is formatted as integer values to show the number of rainy days per year in each city:"
   ]
  },
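BigFrames handles the `output_schema` formatting for you, so you don't need code like this in practice. Still, a small pure-Python sketch can make concrete what the `bool`, `float64`, and `int64` schema types mean for the values you get back: each column is checked against its declared type instead of being left as free text. The helpers `coerce_value` and `coerce_row` and the JSON response string are hypothetical illustrations, not how BigFrames or Gemini implement structured output.

```python
import json


def coerce_value(value, type_name: str):
    """Check one decoded JSON value against a scalar schema type name."""
    if type_name == "bool":
        ok = isinstance(value, bool)
    elif type_name == "int64":
        # bool is a subclass of int in Python, so exclude it explicitly.
        ok = isinstance(value, int) and not isinstance(value, bool)
    elif type_name == "float64":
        ok = isinstance(value, (int, float)) and not isinstance(value, bool)
    elif type_name == "string":
        ok = isinstance(value, str)
    else:
        raise ValueError(f"unsupported scalar type: {type_name!r}")
    if not ok:
        raise TypeError(f"expected {type_name}, got {value!r}")
    # JSON numbers such as 2 decode to int; widen them for float64 columns.
    return float(value) if type_name == "float64" else value


def coerce_row(raw_json: str, schema: dict[str, str]) -> dict:
    """Decode a JSON response and return one typed value per schema column."""
    data = json.loads(raw_json)
    return {col: coerce_value(data[col], t) for col, t in schema.items()}


schema = {
    "is_US_city": "bool",
    "population_in_millions": "float64",
    "rainy_days_per_year": "int64",
}
# Made-up response text, for illustration only (not real model output).
raw = '{"is_US_city": true, "population_in_millions": 2, "rainy_days_per_year": 100}'
row = coerce_row(raw, schema)
print(row)
# {'is_US_city': True, 'population_in_millions': 2.0, 'rainy_days_per_year': 100}
```

A free-text answer such as `"yes"` in the `is_US_city` field raises `TypeError` in this sketch, which is exactly the kind of string-to-type cleanup that specifying `output_schema` spares you.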
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Generate all types of data in one prediction\n",
    "You can get the different output columns and types in one prediction. \n",
    "### Format output as multiple data types in one prediction\n",
    "Within a single prediction, you can generate multiple columns of output that use different data types.\n",
    "\n",
    "Note it doesn't require dedicated prompts, as long as the output column names are informative to the model."
    "You don't need a dedicated prompt for each output column, as long as the column names are informative to the model."
   ]
  },
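Because the column names are what tell the model what to produce, it can help to define them once in Python and derive the `output_schema` dict from ordinary Python types. The sketch below assumes a hypothetical helper, `build_output_schema`, which is not part of BigFrames; the type strings it emits (`bool`, `int64`, `float64`, `string`) are the ones this notebook's examples pass to `predict`.

```python
# Hypothetical mapping (not a BigFrames API) from Python types to the
# type names used in this notebook's output_schema examples.
PYTHON_TO_SCHEMA_TYPE = {
    bool: "bool",
    int: "int64",
    float: "float64",
    str: "string",
}


def build_output_schema(columns: dict[str, type]) -> dict[str, str]:
    """Return an output_schema-style dict from {column_name: python_type}."""
    schema = {}
    for name, py_type in columns.items():
        if py_type not in PYTHON_TO_SCHEMA_TYPE:
            raise TypeError(f"no schema type for {py_type!r} (column {name!r})")
        schema[name] = PYTHON_TO_SCHEMA_TYPE[py_type]
    return schema


# Descriptive column names double as instructions to the model.
schema = build_output_schema(
    {
        "is_US_city": bool,
        "population_in_millions": float,
        "rainy_days_per_year": int,
    }
)
print(schema)
# {'is_US_city': 'bool', 'population_in_millions': 'float64', 'rainy_days_per_year': 'int64'}
```

The resulting dict has the same shape as the `output_schema` argument in this notebook's `gemini.predict(df, prompt=[df["city"]], output_schema=...)` call, so it could be passed there unchanged.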
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Generate composite data types"
    "### Format output as a composite data type"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Composite datatypes like array and struct can also be generated. Here the example generates a places_to_visit column as array of strings and a gps_coordinates as struct of floats. Along with previous fields, all in one prediction."
    "You can generate composite data types like arrays and structs. The following example generates a `places_to_visit` column as an array of strings and a `gps_coordinates` column as a struct of floats, together with the columns from the previous examples, in a single prediction:"
   ]
  },
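The type strings `array<string>` and `struct<latitude float64, longitude float64>` follow the same nesting as BigQuery's `ARRAY<...>` and `STRUCT<...>` types. To make that nesting explicit, here is a minimal parser sketch that turns such strings into nested Python tuples and dicts. It rests on assumptions: the grammar is inferred only from this notebook's examples, `parse_type` and `split_top_level` are hypothetical names, and it doesn't handle quoted field names or any other BigQuery type syntax.

```python
def split_top_level(body: str) -> list[str]:
    """Split on commas that are not nested inside <...>."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(body[start:i])
            start = i + 1
    parts.append(body[start:])
    return parts


def parse_type(spec: str):
    """Parse 'array<...>' / 'struct<name type, ...>' into nested structures.

    Arrays become ("array", element_type); structs become
    ("struct", {field_name: field_type}); scalars stay as strings.
    """
    spec = spec.strip()
    lowered = spec.lower()
    if lowered.startswith("array<") and spec.endswith(">"):
        return ("array", parse_type(spec[len("array<"):-1]))
    if lowered.startswith("struct<") and spec.endswith(">"):
        fields = {}
        for field in split_top_level(spec[len("struct<"):-1]):
            # Each struct field is "<name> <type>"; the type may itself nest.
            name, field_type = field.strip().split(None, 1)
            fields[name] = parse_type(field_type)
        return ("struct", fields)
    return spec  # scalar type name such as 'string' or 'float64'


print(parse_type("array<string>"))
# ('array', 'string')
print(parse_type("struct<latitude float64, longitude float64>"))
# ('struct', {'latitude': 'float64', 'longitude': 'float64'})
```

Splitting only on top-level commas is what lets a struct field hold another composite type, for example `struct<tags array<string>, score float64>`.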
  {
    "result = gemini.predict(df, prompt=[df[\"city\"]], output_schema={\"is_US_city\": \"bool\", \"population_in_millions\": \"float64\", \"rainy_days_per_year\": \"int64\", \"places_to_visit\": \"array<string>\", \"gps_coordinates\": \"struct<latitude float64, longitude float64>\"})\n",
    "result[[\"city\", \"is_US_city\", \"population_in_millions\", \"rainy_days_per_year\", \"places_to_visit\", \"gps_coordinates\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clean up\n",
    "\n",
    "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n",
    "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n",
    "\n",
    "Otherwise, run the following cell to delete the temporary cloud artifacts created during the BigFrames session:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bpd.close_session()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next steps\n",
    "\n",
    "Learn more about BigQuery DataFrames in the [documentation](https://cloud.google.com/python/docs/reference/bigframes/latest) and find more sample notebooks in the [GitHub repo](https://github.com/googleapis/python-bigquery-dataframes/tree/main/notebooks)."
   ]
  }
 ],
 "metadata": {