Create recommendations based on implicit feedback with a matrix factorization model
This tutorial teaches you how to create a matrix factorization model and train it on the Google Analytics 360 user session data in the public GA360_test.ga_sessions_sample table. You then use the matrix factorization model to generate content recommendations for site users.
Training the model on indirect customer preference information, such as user session duration, is called training with implicit feedback. When you use implicit feedback as training data, matrix factorization models are trained using the Weighted-Alternating Least Squares algorithm.
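To make the idea concrete, here is a minimal pure-Python sketch of weighted alternating least squares for implicit feedback. It follows the commonly published formulation in which every user-item pair gets a binary preference (1 if the implicit signal is positive, otherwise 0) and a confidence weight of 1 + alpha * signal, so unobserved pairs still count, just with low confidence. This is an illustration under those assumptions, not BigQuery ML's implementation: the function names, the `alpha` weighting, and the toy data are hypothetical, and BigQuery ML's internal weighting and defaults may differ.

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Move the row with the largest remaining entry in this column onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def update_side(fixed, observed, k, reg, alpha):
    """Recompute one side's factors while the other side's factors (`fixed`) stay constant.

    observed[row] maps a column index to its implicit signal (> 0); missing pairs count as 0.
    """
    updated = []
    for row_obs in observed:
        a = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]  # start from reg * I
        b = [0.0] * k
        for col, y in enumerate(fixed):
            signal = row_obs.get(col, 0.0)
            conf = 1.0 + alpha * signal        # confidence weight c
            pref = 1.0 if signal > 0 else 0.0  # binary preference p
            for i in range(k):
                b[i] += conf * pref * y[i]
                for j in range(k):
                    a[i][j] += conf * y[i] * y[j]
        updated.append(solve(a, b))  # weighted, regularized least-squares solution
    return updated


def wals(ratings, n_users, n_items, k=2, reg=0.1, alpha=10.0, iters=15, seed=0):
    """Alternate user and item updates; ratings maps (user, item) -> implicit signal."""
    rng = random.Random(seed)
    by_user = [{} for _ in range(n_users)]
    by_item = [{} for _ in range(n_items)]
    for (u, i), signal in ratings.items():
        by_user[u][i] = signal
        by_item[i][u] = signal
    users = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    items = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(iters):
        users = update_side(items, by_user, k, reg, alpha)
        items = update_side(users, by_item, k, reg, alpha)
    return users, items


def score(users, items, u, i):
    """Predicted preference for (u, i): the dot product of the two latent vectors."""
    return sum(a * b for a, b in zip(users[u], items[i]))
```

Each half-step is an ordinary regularized least-squares problem with a closed-form solution, which is why the method alternates: fixing one side's factors makes the other side's update exact.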
Important: You must have a reservation in order to use a matrix factorization model. For more information, see Pricing.
Objectives
This tutorial guides you through completing the following tasks:
- Creating a matrix factorization model by using the CREATE MODEL statement.
- Evaluating the model by using the ML.EVALUATE function.
- Generating content recommendations for users by using the model with the ML.RECOMMEND function.
Costs
This tutorial uses billable components of Google Cloud, including:
- BigQuery
- BigQuery ML
For more information about BigQuery costs, see the BigQuery pricing page.
For more information about BigQuery ML costs, see BigQuery ML pricing.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role; you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Verify that billing is enabled for your Google Cloud project.
- BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, go to Enable the BigQuery API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Required permissions
To create the dataset, you need the bigquery.datasets.create IAM permission.
To create the model, you need the following permissions:
- bigquery.jobs.create
- bigquery.models.create
- bigquery.models.getData
- bigquery.models.updateData
To run inference, you need the following permissions:
- bigquery.models.getData
- bigquery.jobs.create
For more information about IAM roles and permissions in BigQuery, see Introduction to IAM.
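If you script a pre-flight check, the permission lists above can be kept as data and compared against the permissions you hold. The sketch below only does that comparison; how you obtain the set of granted permissions (for example, from your IAM tooling) is outside its scope, and the task names and the `missing_permissions` helper are hypothetical.

```python
# Permissions this tutorial lists for each task (task names are hypothetical labels).
REQUIRED = {
    "create_dataset": {"bigquery.datasets.create"},
    "create_model": {
        "bigquery.jobs.create",
        "bigquery.models.create",
        "bigquery.models.getData",
        "bigquery.models.updateData",
    },
    "run_inference": {"bigquery.models.getData", "bigquery.jobs.create"},
}


def missing_permissions(task, granted):
    """Return, sorted, the permissions required for `task` that are absent from `granted`."""
    return sorted(REQUIRED[task] - set(granted))
```

For example, `missing_permissions("run_inference", ["bigquery.jobs.create"])` returns `["bigquery.models.getData"]`.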
Create a dataset
Create a BigQuery dataset to store your ML model.
Console
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, click your project name.
Click View actions > Create dataset.
On the Create dataset page, do the following:
For Dataset ID, enter bqml_tutorial.
For Location type, select Multi-region, and then select US (multiple regions in United States).
Leave the remaining default settings as they are, and click Create dataset.
bq
To create a new dataset, use the bq mk command with the --location flag. For a full list of possible parameters, see the bq mk --dataset command reference.
Create a dataset named bqml_tutorial with the data location set to US and a description of BigQuery ML tutorial dataset:
bq --location=US mk -d \
    --description "BigQuery ML tutorial dataset." \
    bqml_tutorial
Instead of using the --dataset flag, the command uses the -d shortcut. If you omit -d and --dataset, the command defaults to creating a dataset.
Confirm that the dataset was created:
bq ls
API
Call the datasets.insert method with a defined dataset resource.
{"datasetReference": {"datasetId": "bqml_tutorial"}}
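If you call the REST API directly, datasets.insert is an HTTP POST to the project's datasets collection. The sketch below only builds the URL and JSON body (it sends nothing) and adds a `location` of `US` so the dataset matches the multi-region used in the other tabs. The `dataset_insert_request` helper and the `my-project` project ID are hypothetical; authentication and the HTTP client are deliberately left out.

```python
import json


def dataset_insert_request(project_id, dataset_id, location="US", description=None):
    """Build (url, json_body) for a datasets.insert call; the request is not sent."""
    url = f"https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/datasets"
    body = {
        "datasetReference": {"projectId": project_id, "datasetId": dataset_id},
        "location": location,  # matches the US multi-region used elsewhere in this tutorial
    }
    if description is not None:
        body["description"] = description
    return url, json.dumps(body)


url, body = dataset_insert_request(
    "my-project", "bqml_tutorial", description="BigQuery ML tutorial dataset."
)
```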
BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.
import google.cloud.bigquery
# Create a client that authenticates with Application Default Credentials.
bqclient = google.cloud.bigquery.Client()
# Create the dataset; exists_ok=True avoids an error if it already exists.
bqclient.create_dataset("bqml_tutorial", exists_ok=True)
Prepare the sample data
Transform the data from the GA360_test.ga_sessions_sample table into a better structure for model training, and then write this data to a BigQuery table. The following query calculates the session duration for each user for each piece of content, which you can then use as implicit feedback to infer the user's preference for that content.
Follow these steps to create the training data table:
In the Google Cloud console, go to the BigQuery page.
Create the training data table. In the query editor, paste in the following query and click Run:
CREATE OR REPLACE TABLE `bqml_tutorial.analytics_session_data`
AS
WITH
  visitor_page_content AS (
    SELECT
      fullVisitorID,
      (
        SELECT MAX(IF(index = 10, value, NULL))
        FROM UNNEST(hits.customDimensions)
      ) AS latestContentId,
      (
        LEAD(hits.time, 1) OVER (PARTITION BY fullVisitorId ORDER BY hits.time ASC)
        - hits.time
      ) AS session_duration
    FROM
      `cloud-training-demos.GA360_test.ga_sessions_sample`,
      UNNEST(hits) AS hits
    WHERE
      # only include hits on pages
      hits.type = 'PAGE'
    GROUP BY
      fullVisitorId,
      latestContentId,
      hits.time
  )
# aggregate web stats
SELECT
  fullVisitorID AS visitorId,
  latestContentId AS contentId,
  SUM(session_duration) AS session_duration
FROM
  visitor_page_content
WHERE
  latestContentId IS NOT NULL
GROUP BY
  fullVisitorID,
  latestContentId
HAVING
  session_duration > 0
ORDER BY
  latestContentId;
View a subset of the training data. In the query editor, paste in the following query and click Run:
SELECT * FROM `bqml_tutorial.analytics_session_data` LIMIT 5;
The results should look similar to the following:
+---------------------+-----------+------------------+
| visitorId           | contentId | session_duration |
+---------------------+-----------+------------------+
| 7337153711992174438 | 100074831 | 44652            |
| 5190801220865459604 | 100170790 | 121420           |
| 2293633612703952721 | 100510126 | 47744            |
| 5874973374932455844 | 100510126 | 32109            |
| 1173698801255170595 | 100676857 | 10512            |
+---------------------+-----------+------------------+
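To see what the training-data query computes, its logic can be restated in plain Python. This is an illustrative in-memory re-implementation under simplifying assumptions (the hits are already filtered to page hits and reduced to visitor ID, hit time, and content ID), not something BigQuery runs; `session_durations` is a hypothetical helper name.

```python
from collections import defaultdict


def session_durations(hits):
    """hits: iterable of (visitor_id, time, content_id) page hits; content_id may be None.

    Mirrors the query: deduplicate hits, order each visitor's hits by time, use the
    gap to the visitor's next hit as the duration, then sum per (visitor, content)
    and keep only positive totals.
    """
    per_visitor = defaultdict(set)
    for visitor, time, content in hits:
        per_visitor[visitor].add((time, content))  # like GROUP BY visitor, content, time
    totals = {}
    for visitor, rows in per_visitor.items():
        ordered = sorted(rows, key=lambda row: row[0])
        # The last hit has no next hit (LEAD returns NULL), so zip() skips it.
        for (time, content), nxt in zip(ordered, ordered[1:]):
            if content is None:  # like WHERE latestContentId IS NOT NULL
                continue
            key = (visitor, content)
            totals[key] = totals.get(key, 0) + (nxt[0] - time)
    return {key: total for key, total in totals.items() if total > 0}  # HAVING > 0
```

Note that a hit with no content ID still marks where the previous hit's duration ends, just as it does in the query's window function.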
Create the model
Create a matrix factorization model and train it on the data in the analytics_session_data table. The model is trained to predict a confidence rating for every visitorId-contentId pair. The confidence rating is created by centering and scaling the session duration by the median session duration. Records where the session duration is more than 3.33 times the median are filtered out as outliers.
The following CREATE MODEL statement uses these columns to generate recommendations:
- visitorId: The visitor ID.
- contentId: The content ID.
- rating: The implicit rating from 0 to 1 calculated for each visitor-content pair, centered and scaled.
In the Google Cloud console, go to the BigQuery page.
In the query editor, paste in the following query and click Run:
CREATE OR REPLACE MODEL `bqml_tutorial.mf_implicit`
  OPTIONS (
    MODEL_TYPE = 'matrix_factorization',
    FEEDBACK_TYPE = 'implicit',
    USER_COL = 'visitorId',
    ITEM_COL = 'contentId',
    RATING_COL = 'rating',
    L2_REG = 30,
    NUM_FACTORS = 15)
AS
SELECT
  visitorId,
  contentId,
  0.3 * (1 + (session_duration - 57937) / 57937) AS rating
FROM
  `bqml_tutorial.analytics_session_data`
WHERE
  0.3 * (1 + (session_duration - 57937) / 57937) < 1;
The query takes about 10 minutes to complete, after which the mf_implicit model appears in the Explorer pane. Because the query uses a CREATE MODEL statement to create a model, you don't see query results.
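The rating expression in the query simplifies to 0.3 * session_duration / 57937, where 57937 is the median session duration used in the query. The WHERE clause keeps a row only when this value is below 1, that is, when session_duration < 57937 / 0.3, roughly 3.33 times the median, which is where the outlier cutoff described earlier comes from. Because the training table keeps only positive durations, the surviving ratings fall between 0 and 1. The Python below restates the same arithmetic so you can check individual values; the function names are hypothetical.

```python
MEDIAN_SESSION_DURATION = 57937  # the median value hard-coded in the CREATE MODEL query


def implicit_rating(session_duration, median=MEDIAN_SESSION_DURATION):
    """Center on the median, scale by the median, shift by 1, and multiply by 0.3."""
    return 0.3 * (1 + (session_duration - median) / median)


def keep_for_training(session_duration, median=MEDIAN_SESSION_DURATION):
    """Same test as the WHERE clause: keep rows whose rating is below 1."""
    return implicit_rating(session_duration, median) < 1
```

For example, a session exactly at the median gets a rating of 0.3, while one at four times the median is filtered out.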
Get training statistics
Optionally, you can view the model's training statistics in the Google Cloud console.
A machine learning algorithm builds a model by creating many iterations of the model using different parameters, and then selecting the version of the model that minimizes loss. This process is called empirical risk minimization. The model's training statistics let you see the loss associated with each iteration of the model.
Follow these steps to view the model's training statistics:
In the Google Cloud console, go to the BigQuery page.
In the left pane, click Explorer.
If you don't see the left pane, click Expand left pane to open the pane.
In the Explorer pane, expand your project and click Datasets.
Click the bqml_tutorial dataset. You can also use the search feature or filters to find the dataset.
Click the Models tab.
Click the mf_implicit model and then click the Training tab.
In the View as section, click Table. The results should look similar to the following:
+-----------+--------------------+--------------------+
| Iteration | Training Data Loss | Duration (seconds) |
+-----------+--------------------+--------------------+
| 5         | 0.0027             | 47.27              |
| 4         | 0.0028             | 39.60              |
| 3         | 0.0032             | 55.57              |
| ...       | ...                | ...                |
+-----------+--------------------+--------------------+
The Training Data Loss column represents the loss metric calculated after the model is trained. Because this is a matrix factorization model, this column shows the mean squared error.
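Mean squared error is the average of the squared differences between observed and predicted ratings. The sketch below shows only that definition; which visitor-content pairs BigQuery ML includes, and how it weights them for implicit feedback, isn't described in this tutorial. `mean_squared_error` is a hypothetical helper, not a BigQuery API.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between observed and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("expected two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
```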
Evaluate the model
Evaluate the performance of the model by using the ML.EVALUATE function. The ML.EVALUATE function evaluates the predicted content ratings returned by the model against the evaluation metrics calculated during training.
Follow these steps to evaluate the model:
In the Google Cloud console, go to the BigQuery page.
In the query editor, paste in the following query and click Run:
SELECT * FROM ML.EVALUATE(MODEL `bqml_tutorial.mf_implicit`);
The results should look similar to the following:
+------------------------+-----------------------+---------------------------------------+---------------------+
| mean_average_precision | mean_squared_error    | normalized_discounted_cumulative_gain | average_rank        |
+------------------------+-----------------------+---------------------------------------+---------------------+
| 0.4434341257478137     | 0.0013381759837648962 | 0.9433280547112802                    | 0.24031636088594222 |
+------------------------+-----------------------+---------------------------------------+---------------------+
For more information about the ML.EVALUATE function output, see Output.
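The normalized_discounted_cumulative_gain metric rewards rankings that put the most relevant items first. As a rough illustration, the sketch below computes the textbook linear-gain NDCG for a single ranked list of relevance values, where 1.0 means the list is already in the ideal order. It is an assumption that BigQuery ML uses this exact form; its relevance values, cutoffs, and averaging across users may differ, and the function names are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: relevance divided by log2(position + 1), positions from 1."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal (descending) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```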
Get the predicted ratings for a subset of visitor-content pairs
Use the ML.RECOMMEND function to get the predicted rating for each piece of content for five site visitors.
Follow these steps to get predicted ratings:
In the Google Cloud console, go to the BigQuery page.
In the query editor, paste in the following query and click Run:
SELECT
  *
FROM
  ML.RECOMMEND(
    MODEL `bqml_tutorial.mf_implicit`,
    (
      SELECT visitorId
      FROM `bqml_tutorial.analytics_session_data`
      LIMIT 5
    ));
The results should look similar to the following:
+-------------------------------+---------------------+-----------+
| predicted_rating_confidence   | visitorId           | contentId |
+-------------------------------+---------------------+-----------+
| 0.0033608418060270262         | 7337153711992174438 | 277237933 |
| 0.003602395397293956          | 7337153711992174438 | 158246147 |
| 0.0053197670652785356         | 7337153711992174438 | 299389988 |
| ...                           | ...                 | ...       |
+-------------------------------+---------------------+-----------+
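In a matrix factorization model, a predicted score is typically the dot product of a user's latent factor vector and an item's latent factor vector. The sketch below produces rows with the same three columns as the ML.RECOMMEND output above, scoring every item for each requested visitor. The factor values and the `recommend` helper are hypothetical, and BigQuery ML may include additional terms (such as intercepts) that aren't shown here.

```python
def recommend(user_factors, item_factors, users):
    """Score every item for each requested user with a latent-factor dot product.

    Returns (predicted_rating_confidence, user, item) rows.
    """
    rows = []
    for user in users:
        u = user_factors[user]
        for item, v in item_factors.items():
            rows.append((sum(a * b for a, b in zip(u, v)), user, item))
    return rows


# Hypothetical two-dimensional factors for one visitor and two content IDs.
rows = recommend({"v1": [1.0, 0.0]}, {"c1": [0.5, 0.2], "c2": [0.1, 0.9]}, ["v1"])
```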
Generate recommendations
Use the predicted ratings to generate the top five recommended content IDs for each visitor ID.
Follow these steps to generate recommendations:
In the Google Cloud console, go to the BigQuery page.
Write the predicted ratings to a table. In the query editor, paste in the following query and click Run:
CREATE OR REPLACE TABLE `bqml_tutorial.recommend_content`
AS
SELECT
  *
FROM
  ML.RECOMMEND(MODEL `bqml_tutorial.mf_implicit`);
Select the top five results per visitor. In the query editor, paste in the following query and click Run:
SELECT
  visitorId,
  ARRAY_AGG(
    STRUCT(contentId, predicted_rating_confidence)
    ORDER BY predicted_rating_confidence DESC
    LIMIT 5) AS rec
FROM
  `bqml_tutorial.recommend_content`
GROUP BY
  visitorId;
The results should look similar to the following:
+---------------------+-----------------+---------------------------------+
| visitorId           | rec:contentId   | rec:predicted_rating_confidence |
+---------------------+-----------------+---------------------------------+
| 867526255058981688  | 299804319       | 0.88170525357178664             |
|                     | 299935287       | 0.54699439944935124             |
|                     | 299410466       | 0.53424780863188659             |
|                     | 299826767       | 0.46949603950374219             |
|                     | 299809748       | 0.3379991197434149              |
+---------------------+-----------------+---------------------------------+
| 2434264018925667659 | 299824032       | 1.3903516407308065              |
|                     | 299410466       | 0.9921995618196483              |
|                     | 299903877       | 0.92333625294129218             |
|                     | 299816215       | 0.91856701667757279             |
|                     | 299852437       | 0.86973661454890561             |
+---------------------+-----------------+---------------------------------+
| ...                 | ...             | ...                             |
+---------------------+-----------------+---------------------------------+
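The ARRAY_AGG query groups rows by visitor and keeps the five highest-confidence items in each group. If you export the recommend_content table and post-process it outside BigQuery, the same selection can be sketched in Python; `top_n_per_visitor` and the input row layout are assumptions for illustration.

```python
import heapq
from collections import defaultdict


def top_n_per_visitor(rows, n=5):
    """rows: iterable of (visitor_id, content_id, predicted_rating_confidence).

    Groups by visitor and keeps the n highest-confidence items, highest first,
    like ARRAY_AGG(... ORDER BY predicted_rating_confidence DESC LIMIT 5).
    """
    grouped = defaultdict(list)
    for visitor, content, confidence in rows:
        grouped[visitor].append((confidence, content))
    return {
        visitor: [(content, conf) for conf, content in heapq.nlargest(n, items, key=lambda t: t[0])]
        for visitor, items in grouped.items()
    }
```

heapq.nlargest avoids fully sorting each visitor's list, which helps when each visitor has many predicted ratings.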
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
- You can delete the project you created.
- Or you can keep the project and delete the dataset.
Delete your dataset
Deleting your project removes all datasets and all tables in the project. If you prefer to reuse the project, you can delete the dataset you created in this tutorial:
If necessary, open the BigQuery page in the Google Cloud console.
In the navigation, click the bqml_tutorial dataset you created.
Click Delete dataset on the right side of the window. This action deletes the dataset and everything in it, including its tables, the model, and all the data.
In the Delete dataset dialog, confirm the delete command by typing the name of your dataset (bqml_tutorial) and then click Delete.
Delete your project
To delete the project:
What's next
- Trycreating a matrix factorization model based on explicit feedback.
- For an overview of BigQuery ML, seeIntroduction to BigQuery ML.
- To learn more about machine learning, see theMachine learning crash course.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.