Tutorial: Build a movie recommender using matrix factorization with ML.NET

  • 2021-11-29

This tutorial shows you how to build a movie recommender with ML.NET in a .NET console application. The steps use C# and Visual Studio 2022, which is required to target .NET 8.

In this tutorial, you learn how to:

  • Select a machine learning algorithm
  • Prepare and load your data
  • Build and train a model
  • Evaluate a model
  • Deploy and consume a model

You can find the source code for this tutorial at the dotnet/samples repository.

Machine learning workflow

You will use the following steps to accomplish your task, as well as any other ML.NET task:

  1. Load your data
  2. Build and train your model
  3. Evaluate your model
  4. Use your model

Prerequisites

Select the appropriate machine learning task

There are several ways to approach recommendation problems, such as recommending a list of movies or a list of related products. In this case, you predict the rating (1-5) that a user will give to a particular movie and recommend that movie if the predicted rating is higher than a defined threshold. The higher the rating, the higher the likelihood that the user likes the movie.

Create a console application

Create a project

  1. Create a C# Console Application called "MovieRecommender". Click the Next button.

  2. Choose .NET 8 as the framework to use. Click the Create button.

  3. Create a directory named Data in your project to store the data set:

    In Solution Explorer, right-click the project and select Add > New Folder. Type "Data" and select Enter.

  4. Install the Microsoft.ML and Microsoft.ML.Recommender NuGet Packages:

    Note

    This sample uses the latest stable version of the NuGet packages mentioned unless otherwise stated.

    In Solution Explorer, right-click the project and select Manage NuGet Packages. Choose "nuget.org" as the Package source, select the Browse tab, search for Microsoft.ML, select the package in the list, and select Install. Select the OK button on the Preview Changes dialog and then select the I Accept button on the License Acceptance dialog if you agree with the license terms for the packages listed. Repeat these steps for Microsoft.ML.Recommender.

  5. Add the following using directives at the top of your Program.cs file:

    using Microsoft.ML;
    using Microsoft.ML.Trainers;
    using MovieRecommendation;

Download your data

  1. Download the two datasets, recommendation-ratings-train.csv and recommendation-ratings-test.csv, and save them to the Data folder you previously created.

  2. In Solution Explorer, right-click each of the *.csv files and select Properties. Under Advanced, change the value of Copy to Output Directory to Copy if newer.

Load your data

The first step in the ML.NET process is to prepare and load your model training and testing data.

The recommendation ratings data is split into Train and Test datasets. The Train data is used to fit your model. The Test data is used to make predictions with your trained model and evaluate model performance. It's common to have an 80/20 split with Train and Test data.

Below is a preview of the data from your *.csv files:

[Screenshot: preview of the CSV dataset]

In the *.csv files, there are four columns:

  • userId
  • movieId
  • rating
  • timestamp

In machine learning, the columns that are used to make a prediction are called Features, and the column with the returned prediction is called the Label.

You want to predict movie ratings, so the rating column is the Label. The other three columns, userId, movieId, and timestamp, are all Features used to predict the Label.

Features     Label
userId       rating
movieId
timestamp

It's up to you to decide which Features are used to predict the Label. You can also use methods like permutation feature importance to help with selecting the best Features.

In this case, you should eliminate the timestamp column as a Feature because the timestamp does not really affect how a user rates a given movie and thus would not contribute to making a more accurate prediction:

Features     Label
userId       rating
movieId

Next you must define your data structure for the input class.

Add a new class to your project:

  1. In Solution Explorer, right-click the project, and then select Add > New Item.

  2. In the Add New Item dialog box, select Class and change the Name field to MovieRatingData.cs. Then, select Add.

The MovieRatingData.cs file opens in the code editor. Add the following using directive to the top of MovieRatingData.cs:

using Microsoft.ML.Data;

Create a class called MovieRating by removing the existing class definition and adding the following code in MovieRatingData.cs:

public class MovieRating
{
    [LoadColumn(0)]
    public float userId;
    [LoadColumn(1)]
    public float movieId;
    [LoadColumn(2)]
    public float Label;
}

MovieRating specifies an input data class. The LoadColumn attribute specifies which columns (by column index) in the dataset should be loaded. The userId and movieId columns are your Features (the inputs you will give the model to predict the Label), and the rating column is the Label that you will predict (the output of the model).

Create another class, MovieRatingPrediction, to represent predicted results by adding the following code after the MovieRating class in MovieRatingData.cs:

public class MovieRatingPrediction
{
    public float Label;
    public float Score;
}

In Program.cs, replace the Console.WriteLine("Hello World!") with the following code:

MLContext mlContext = new MLContext();

The MLContext class is a starting point for all ML.NET operations, and initializing mlContext creates a new ML.NET environment that can be shared across the model creation workflow objects. It's similar, conceptually, to DbContext in Entity Framework.

At the bottom of the file, create a method called LoadData():

(IDataView training, IDataView test) LoadData(MLContext mlContext)
{

}

Note

This method will give you an error until you add a return statement in the following steps.

Initialize your data path variables, load the data from the *.csv files, and return the Train and Test data as IDataView objects by adding the following as the next line of code in LoadData():

var trainingDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "recommendation-ratings-train.csv");
var testDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "recommendation-ratings-test.csv");

IDataView trainingDataView = mlContext.Data.LoadFromTextFile<MovieRating>(trainingDataPath, hasHeader: true, separatorChar: ',');
IDataView testDataView = mlContext.Data.LoadFromTextFile<MovieRating>(testDataPath, hasHeader: true, separatorChar: ',');

return (trainingDataView, testDataView);

Data in ML.NET is represented as an IDataView interface. IDataView is a flexible, efficient way of describing tabular data (numeric and text). Data can be loaded from a text file or in real time (for example, from a SQL database or log files) to an IDataView object.

The LoadFromTextFile() method defines the data schema and reads in the file. It takes in the data path variables and returns an IDataView. In this case, you provide the path for your Test and Train files and indicate both the text file header (so it can use the column names properly) and the comma character data separator (the default separator is a tab).
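To make the roles of the header flag, the separator, and the column indexes concrete, here is a minimal plain C# sketch that parses a few comma-separated rows by hand. It is not how ML.NET reads files internally; the in-memory lines and their values are illustrative assumptions, and the tuple field names simply mirror the MovieRating fields.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical in-memory stand-in for the first lines of a ratings file
// (illustrative values, not copied from the tutorial's datasets).
string[] lines =
{
    "userId,movieId,rating,timestamp",
    "1,1,4,964982703",
    "1,3,4,964981247",
};

// Skip the header row (compare hasHeader: true), split on ',' (compare
// separatorChar: ','), and read columns 0, 1, and 2 (compare LoadColumn).
// Column 3, the timestamp, is never read.
var rows = lines
    .Skip(1)
    .Select(line => line.Split(','))
    .Select(f => (
        UserId: float.Parse(f[0], CultureInfo.InvariantCulture),
        MovieId: float.Parse(f[1], CultureInfo.InvariantCulture),
        Label: float.Parse(f[2], CultureInfo.InvariantCulture)))
    .ToArray();

foreach (var row in rows)
{
    Console.WriteLine($"user={row.UserId} movie={row.MovieId} label={row.Label}");
}
```

Because only indexes 0 through 2 are mapped, dropping the timestamp as a Feature needs no change to the files themselves.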

Add the following code to call your LoadData() method and return the Train and Test data:

(IDataView trainingDataView, IDataView testDataView) = LoadData(mlContext);

Build and train your model

Create the BuildAndTrainModel() method, just after the LoadData() method, using the following code:

ITransformer BuildAndTrainModel(MLContext mlContext, IDataView trainingDataView)
{

}

Note

This method will give you an error until you add a return statement in the following steps.

Define the data transformations by adding the following code to BuildAndTrainModel():

IEstimator<ITransformer> estimator = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "userIdEncoded", inputColumnName: "userId")
    .Append(mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "movieIdEncoded", inputColumnName: "movieId"));

Since userId and movieId represent users and movie titles, not real values, you use the MapValueToKey() method to transform each userId and each movieId into a numeric key type Feature column (a format accepted by recommendation algorithms) and add them as new dataset columns:

userId   movieId   Label   userIdEncoded   movieIdEncoded
1        1         4       userKey1        movieKey1
1        3         4       userKey1        movieKey2
1        6         4       userKey1        movieKey3
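The idea behind a value-to-key mapping can be sketched in plain C# with a dictionary. This is a conceptual illustration only, not ML.NET's implementation: the sample IDs are made up, and numbering keys from 1 in order of first appearance is an assumption of the sketch.

```csharp
using System;
using System.Collections.Generic;

// Raw IDs as they might appear in the movieId column (illustrative values).
float[] movieIds = { 1, 3, 6, 3, 1 };

// Assign each distinct value the next key, in order of first appearance.
// Starting the keys at 1 is an assumption of this sketch.
var keyOf = new Dictionary<float, uint>();
var encoded = new uint[movieIds.Length];
for (int i = 0; i < movieIds.Length; i++)
{
    if (!keyOf.TryGetValue(movieIds[i], out uint key))
    {
        key = (uint)keyOf.Count + 1;
        keyOf[movieIds[i]] = key;
    }
    encoded[i] = key;
}

Console.WriteLine(string.Join(", ", encoded)); // prints "1, 2, 3, 2, 1"
```

Repeated IDs map to the same key, so sparse or arbitrary identifiers become a dense index range that a factorization algorithm can use as matrix row and column positions.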

Choose the machine learning algorithm and append it to the data transformation definitions by adding the following as the next line of code in BuildAndTrainModel():

var options = new MatrixFactorizationTrainer.Options
{
    MatrixColumnIndexColumnName = "userIdEncoded",
    MatrixRowIndexColumnName = "movieIdEncoded",
    LabelColumnName = "Label",
    NumberOfIterations = 20,
    ApproximationRank = 100
};

var trainerEstimator = estimator.Append(mlContext.Recommendation().Trainers.MatrixFactorization(options));

The MatrixFactorizationTrainer is your recommendation training algorithm. Matrix Factorization is a common approach to recommendation when you have data on how users have rated products in the past, which is the case for the datasets in this tutorial. There are other recommendation algorithms for when you have different data available (see the Other recommendation algorithms section below to learn more).

In this case, the Matrix Factorization algorithm uses a method called "collaborative filtering", which assumes that if User 1 has the same opinion as User 2 on a certain issue, then User 1 is more likely to feel the same way as User 2 about a different issue.

For instance, if User 1 and User 2 rate movies similarly, then User 2 is more likely to enjoy a movie that User 1 has watched and rated highly:

         Incredibles 2 (2018)      The Avengers (2012)       Guardians of the Galaxy (2014)
User 1   Watched and liked movie   Watched and liked movie   Watched and liked movie
User 2   Watched and liked movie   Watched and liked movie   Has not watched -- RECOMMEND movie
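As a rough intuition for how matrix factorization turns this idea into a number, the general technique represents each user and each movie as a short vector of latent factors and predicts a rating as their dot product. The sketch below assumes made-up factor values of rank 3; in a trained model the factors are learned from the ratings, and the trainer's internal details may differ from this simplified picture.

```csharp
using System;

// Hypothetical latent factors of rank 3 (the tutorial sets ApproximationRank = 100).
// These numbers are made up for illustration; a real model learns them from data.
double[] userFactors = { 1.2, 0.8, 0.5 };
double[] movieFactors = { 1.5, 1.0, 2.0 };

// The predicted rating is the dot product of the user and movie factor vectors.
double Predict(double[] u, double[] m)
{
    double sum = 0;
    for (int k = 0; k < u.Length; k++)
    {
        sum += u[k] * m[k];
    }
    return sum;
}

double score = Predict(userFactors, movieFactors);
Console.WriteLine($"Predicted rating: {score:F2}");
```

Users with similar factor vectors get similar predictions for the same movie, which is how the model can score a movie that one of them has not yet rated.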

The Matrix Factorization trainer has several Options, which you can read more about in the Algorithm hyperparameters section below.

Fit the model to the Train data and return the trained model by adding the following as the next line of code in the BuildAndTrainModel() method:

Console.WriteLine("=============== Training the model ===============");
ITransformer model = trainerEstimator.Fit(trainingDataView);

return model;

The Fit() method trains your model with the provided training dataset. Technically, it executes the Estimator definitions by transforming the data and applying the training, and it returns the trained model, which is a Transformer.

For more information on the model training workflow in ML.NET, see What is ML.NET and how does it work?.

Add the following as the next line of code below the call to the LoadData() method to call your BuildAndTrainModel() method and return the trained model:

ITransformer model = BuildAndTrainModel(mlContext, trainingDataView);

Evaluate your model

Once you have trained your model, use your test data to evaluate how your model is performing.

Create the EvaluateModel() method, just after the BuildAndTrainModel() method, using the following code:

void EvaluateModel(MLContext mlContext, IDataView testDataView, ITransformer model)
{

}

Transform the Test data by adding the following code to EvaluateModel():

Console.WriteLine("=============== Evaluating the model ===============");
var prediction = model.Transform(testDataView);

The Transform() method makes predictions for multiple provided input rows of a test dataset.

Evaluate the model by adding the following as the next line of code in the EvaluateModel() method:

var metrics = mlContext.Regression.Evaluate(prediction, labelColumnName: "Label", scoreColumnName: "Score");

Once you have the prediction set, the Evaluate() method assesses the model: it compares the predicted values with the actual Labels in the test dataset and returns metrics on how the model is performing.

Print your evaluation metrics to the console by adding the following as the next line of code in the EvaluateModel() method:

Console.WriteLine("Root Mean Squared Error : " + metrics.RootMeanSquaredError.ToString());
Console.WriteLine("RSquared: " + metrics.RSquared.ToString());

Add the following as the next line of code below the call to the BuildAndTrainModel() method to call your EvaluateModel() method:

EvaluateModel(mlContext, testDataView, model);

The output so far should look similar to the following text:

=============== Training the model ===============
iter      tr_rmse          obj
   0       1.5403   3.1262e+05
   1       0.9221   1.6030e+05
   2       0.8687   1.5046e+05
   3       0.8416   1.4584e+05
   4       0.8142   1.4209e+05
   5       0.7849   1.3907e+05
   6       0.7544   1.3594e+05
   7       0.7266   1.3361e+05
   8       0.6987   1.3110e+05
   9       0.6751   1.2948e+05
  10       0.6530   1.2766e+05
  11       0.6350   1.2644e+05
  12       0.6197   1.2541e+05
  13       0.6067   1.2470e+05
  14       0.5953   1.2382e+05
  15       0.5871   1.2342e+05
  16       0.5781   1.2279e+05
  17       0.5713   1.2240e+05
  18       0.5660   1.2230e+05
  19       0.5592   1.2179e+05
=============== Evaluating the model ===============
Root Mean Squared Error : 0.994051469730769
RSquared: 0.412556298844873

In this output, there are 20 iterations. In each iteration, the training error (tr_rmse) decreases.

The root mean squared error (RMS or RMSE) is used to measure the differences between the model predicted values and the test dataset observed values. Technically, it's the square root of the average of the squares of the errors. The lower it is, the better the model is.

R Squared indicates how well data fits a model. It typically ranges from 0 to 1. A value of 0 means that the data is random or otherwise can't be fit to the model. A value of 1 means that the model exactly matches the data. You want your R Squared score to be as close to 1 as possible.
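Both metrics follow directly from their definitions. The following plain C# sketch computes them by hand for four made-up rating pairs; the values are illustrative assumptions and are unrelated to the tutorial's datasets or to how ML.NET computes its metrics internally.

```csharp
using System;
using System.Linq;

// Illustrative actual ratings and model predictions (made-up values).
double[] actual = { 4.0, 3.0, 5.0, 2.0 };
double[] predicted = { 3.5, 3.0, 4.0, 2.5 };

// RMSE: square root of the mean of the squared errors.
double sumSquaredError = actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Sum();
double rmse = Math.Sqrt(sumSquaredError / actual.Length);

// R squared: 1 - (residual sum of squares / total sum of squares around the mean).
double mean = actual.Average();
double totalSumSquares = actual.Sum(a => (a - mean) * (a - mean));
double rSquared = 1 - sumSquaredError / totalSumSquares;

Console.WriteLine($"RMSE: {rmse}");
Console.WriteLine($"RSquared: {rSquared}");
```

Here the squared errors sum to 1.5 and the total sum of squares is 5, so R Squared is 1 - 1.5 / 5 = 0.7 and RMSE is the square root of 0.375.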

Building successful models is an iterative process. This model has initial lower quality as the tutorial uses small datasets to provide quick model training. If you aren't satisfied with the model quality, you can try to improve it by providing larger training datasets or by choosing different training algorithms with different hyperparameters for each algorithm. For more information, check out the Improve your model section below.

Use your model

Now you can use your trained model to make predictions on new data.

Create the UseModelForSinglePrediction() method, just after the EvaluateModel() method, using the following code:

void UseModelForSinglePrediction(MLContext mlContext, ITransformer model)
{

}

Use the PredictionEngine to predict the rating by adding the following code to UseModelForSinglePrediction():

Console.WriteLine("=============== Making a prediction ===============");
var predictionEngine = mlContext.Model.CreatePredictionEngine<MovieRating, MovieRatingPrediction>(model);

The PredictionEngine is a convenience API, which allows you to perform a prediction on a single instance of data. PredictionEngine is not thread-safe. It's acceptable to use in single-threaded or prototype environments. For improved performance and thread safety in production environments, use the PredictionEnginePool service, which creates an ObjectPool of PredictionEngine objects for use throughout your application. See this guide on how to use PredictionEnginePool in an ASP.NET Core Web API.

Note

PredictionEnginePool service extension is currently in preview.

Create an instance of MovieRating called testInput and pass it to the Prediction Engine by adding the following as the next lines of code in the UseModelForSinglePrediction() method:

var testInput = new MovieRating { userId = 6, movieId = 10 };

var movieRatingPrediction = predictionEngine.Predict(testInput);

The Predict() function makes a prediction on a single row of data.

You can then use the Score, or the predicted rating, to determine whether you want to recommend the movie with movieId 10 to user 6. The higher the Score, the higher the likelihood of a user liking a particular movie. In this case, let's say that you recommend movies with a predicted rating greater than 3.5.

To print the results, add the following as the next lines of code in the UseModelForSinglePrediction() method:

if (Math.Round(movieRatingPrediction.Score, 1) > 3.5)
{
    Console.WriteLine("Movie " + testInput.movieId + " is recommended for user " + testInput.userId);
}
else
{
    Console.WriteLine("Movie " + testInput.movieId + " is not recommended for user " + testInput.userId);
}

Add the following as the next line of code after the call to the EvaluateModel() method to call your UseModelForSinglePrediction() method:

UseModelForSinglePrediction(mlContext, model);

The output of this method should look similar to the following text:

=============== Making a prediction ===============
Movie 10 is recommended for user 6

Save your model

To use your model to make predictions in end-user applications, you must first save the model.

Create the SaveModel() method, just after the UseModelForSinglePrediction() method, using the following code:

void SaveModel(MLContext mlContext, DataViewSchema trainingDataViewSchema, ITransformer model)
{

}

Save your trained model by adding the following code in the SaveModel() method:

var modelPath = Path.Combine(Environment.CurrentDirectory, "Data", "MovieRecommenderModel.zip");

Console.WriteLine("=============== Saving the model to a file ===============");
mlContext.Model.Save(model, trainingDataViewSchema, modelPath);

This method saves your trained model to a .zip file (in the "Data" folder), which can then be used in other .NET applications to make predictions.

Add the following as the next line of code after the call to the UseModelForSinglePrediction() method to call your SaveModel() method:

SaveModel(mlContext, trainingDataView.Schema, model);

Use your saved model

Once you have saved your trained model, you can consume the model in different environments. See Save and load trained models to learn how to operationalize a trained machine learning model in apps.

Results

After following the steps above, run your console app (Ctrl + F5). Your results from the single prediction above should be similar to the following. You may see warnings or processing messages, but these messages have been removed from the following results for clarity.

=============== Training the model ===============
iter      tr_rmse          obj
   0       1.5382   3.1213e+05
   1       0.9223   1.6051e+05
   2       0.8691   1.5050e+05
   3       0.8413   1.4576e+05
   4       0.8145   1.4208e+05
   5       0.7848   1.3895e+05
   6       0.7552   1.3613e+05
   7       0.7259   1.3357e+05
   8       0.6987   1.3121e+05
   9       0.6747   1.2949e+05
  10       0.6533   1.2766e+05
  11       0.6353   1.2636e+05
  12       0.6209   1.2561e+05
  13       0.6072   1.2462e+05
  14       0.5965   1.2394e+05
  15       0.5868   1.2352e+05
  16       0.5782   1.2279e+05
  17       0.5713   1.2227e+05
  18       0.5637   1.2190e+05
  19       0.5604   1.2178e+05
=============== Evaluating the model ===============
Root Mean Squared Error : 0.977175077487166
RSquared: 0.43233349213192
=============== Making a prediction ===============
Movie 10 is recommended for user 6
=============== Saving the model to a file ===============

Congratulations! You've now successfully built a machine learning model for recommending movies. You can find the source code for this tutorial at the dotnet/samples repository.

Improve your model

There are several ways that you can improve the performance of your model so that you can get more accurate predictions.

Data

Adding more training data that has enough samples for each user and movie ID can help improve the quality of the recommendation model.

Cross validation is a technique for evaluating models that randomly splits the data into subsets (instead of extracting test data from the dataset once, like you did in this tutorial) and, in turn, uses some of the subsets as train data and the rest as test data. Because every row is used for both training and testing across the rounds, this method usually gives a more reliable estimate of model quality than a single train-test split.
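The splitting step of cross validation can be sketched in plain C# on row indexes alone. This is a minimal illustration of k-fold partitioning, not ML.NET's cross-validation API; the row count, fold count, and seed are assumptions chosen for the example.

```csharp
using System;
using System.Linq;

const int rowCount = 10;   // illustrative dataset size
const int foldCount = 5;   // illustrative number of folds

// Shuffle the row indexes with a fixed seed so the split is repeatable.
var random = new Random(42);
int[] shuffled = Enumerable.Range(0, rowCount).OrderBy(_ => random.Next()).ToArray();

int totalTested = 0;
for (int fold = 0; fold < foldCount; fold++)
{
    // Rows whose shuffled position falls in this fold are held out for testing;
    // every other row is used for training in this round.
    int[] test = shuffled.Where((_, position) => position % foldCount == fold).ToArray();
    int[] train = shuffled.Except(test).ToArray();
    totalTested += test.Length;
    Console.WriteLine($"Fold {fold}: train={train.Length} rows, test={test.Length} rows");
}

Console.WriteLine($"Rows used for testing across all folds: {totalTested}");
```

Each row lands in exactly one test fold, so a model trained and evaluated once per fold is scored on every row while never being tested on data it was trained on in that round.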

Features

In this tutorial, you only use three columns provided by the dataset: userId and movieId as Features, and rating as the Label.

While this is a good start, in reality you might want to add other attributes or Features (for example, age, gender, geo-location, etc.) if they are included in the dataset. Adding more relevant Features can help improve the performance of your recommendation model.

If you are unsure about which Features might be the most relevant for your machine learning task, you can also make use of Feature Contribution Calculation (FCC) and permutation feature importance, which ML.NET provides to discover the most influential Features.

Algorithm hyperparameters

While ML.NET provides good default training algorithms, you can further fine-tune performance by changing the algorithm's hyperparameters.

For Matrix Factorization, you can experiment with hyperparameters such as NumberOfIterations and ApproximationRank to see if that gives you better results.

For instance, in this tutorial the algorithm options are:

var options = new MatrixFactorizationTrainer.Options
{
    MatrixColumnIndexColumnName = "userIdEncoded",
    MatrixRowIndexColumnName = "movieIdEncoded",
    LabelColumnName = "Label",
    NumberOfIterations = 20,
    ApproximationRank = 100
};

Other Recommendation Algorithms

The matrix factorization algorithm with collaborative filtering is only one approach for performing movie recommendations. In many cases, you may not have the ratings data available and only have movie history available from users. In other cases, you may have more than just the user’s rating data.

Algorithm and scenario:

  • One Class Matrix Factorization: Use this when you only have userId and movieId. This style of recommendation is based upon the co-purchase scenario, or products frequently bought together, which means it will recommend to customers a set of products based upon their own purchase order history.
  • Field Aware Factorization Machines: Use this to make recommendations when you have more Features beyond userId, productId, and rating (such as product description or product price). This method also uses a collaborative filtering approach.

New user scenario

One common problem in collaborative filtering is the cold start problem, which is when you have a new user with no previous data to draw inferences from. This problem is often solved by asking new users to create a profile and, for instance, rate movies they have seen in the past. While this method puts some burden on the user, it provides some starting data for new users with no rating history.

Resources

The data used in this tutorial is derived from the MovieLens dataset.

Next steps

In this tutorial, you learned how to:

  • Select a machine learning algorithm
  • Prepare and load your data
  • Build and train a model
  • Evaluate a model
  • Deploy and consume a model

Advance to the next tutorial to learn more